Nebula

From ArchWiki

Nebula is a user-space mesh virtual private network (VPN) daemon that uses tunneling and encryption to create a secure private mesh network between participating hosts.

Installation

Install the nebula package.

Basic concepts and terminology

Nebula is a mesh VPN technology, inspired by tinc. In a mesh VPN, individual nodes form direct tunnels between each other. This allows for high speed direct communication between nodes, without the need to go through a central node. Nodes are authenticated using certificates signed by a certificate authority.

This is in contrast to WireGuard, which is a peer-to-peer VPN technology (although mesh network managers for WireGuard exist, e.g. innernet, and wesher from the AUR).

This is also different from OpenVPN, which uses a star topology (also called hub and spoke).

Certificate authority
The certificate authority issues host certificates by signing them.
Lighthouse
In a Nebula network, there is typically at least one lighthouse node that serves as an information hub for other nodes. Lighthouse nodes help other nodes find each other and form a network mesh.
Node
Any host that participates in the Nebula network, including lighthouse nodes.
Nebula IP
IP address of a node within the Nebula network. Also known as VPN IP.
Routable IP
The "normal" or "native" IP address of a node. This can be a public IP address or a private IP address, depending on where the node is located and how its network is configured. A node can have multiple routable IP addresses.

Example: Simple mesh VPN

Network setup

In this example, we have 3 nodes:

  • lighthouse
    • Nebula IP: 192.168.100.1
    • Routable IP: 12.34.56.78
  • hostA
    • Nebula IP: 192.168.100.101
    • Routable IP: 10.0.0.22
  • hostB
    • Nebula IP: 192.168.100.102
    • Routable IP: 23.45.67.89

The lighthouse has a public static IP address and is reachable by hostA and hostB. hostA lives behind a NAT. hostB has a public IP address.

In our case, we will use a /24 subnet for the VPN network. We will call this network "My Nebula Network".

Certificate and key generation

First, generate the CA certificate and private key with nebula-cert ca -name "My Nebula Network". This will create two files:

  • ca.crt: the CA certificate
  • ca.key: the CA private key

Subsequently, generate the certificate and private key files for the nodes in the network:

$ nebula-cert sign -name lighthouse -ip 192.168.100.1/24
$ nebula-cert sign -name hostA -ip 192.168.100.101/24
$ nebula-cert sign -name hostB -ip 192.168.100.102/24

Notice that we did not specify ca.crt and ca.key. By default, nebula-cert looks for those files in the current directory.

After this step, we will have these files:

  • lighthouse.crt, lighthouse.key
  • hostA.crt, hostA.key
  • hostB.crt, hostB.key

Configuration

Create this configuration file on the lighthouse node:

/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/lighthouse.crt
  key: /etc/nebula/lighthouse.key

lighthouse:
  am_lighthouse: true
  
listen:
  port: 4242

firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any

Create this configuration file on hostA:

/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/hostA.crt
  key: /etc/nebula/hostA.key

static_host_map:
  "192.168.100.1": ["12.34.56.78:4242"]

lighthouse:
  hosts:
    - "192.168.100.1"

punchy:
  punch: true

firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any
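
hostA is the only node behind a NAT in this example, which is why its configuration carries a punchy section. The annotated fragment below is a sketch of that section with two further keys, respond and delay, taken from the upstream example configuration. They are not part of the example above, and the values shown are illustrative assumptions rather than recommendations.

```yaml
# /etc/nebula/config.yml on hostA (fragment)
punchy:
  # Keep punching at a regular interval so that the NAT mapping does not expire
  punch: true
  # Assumption (not in the example above): when a peer is trying to reach
  # this node, also connect back out to that peer in case its hole punching fails
  respond: true
  # Assumption: wait this long before sending a punch response
  delay: 1s
```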

Finally, use this configuration file for hostB:

/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/hostB.crt
  key: /etc/nebula/hostB.key

static_host_map:
  "192.168.100.1": ["12.34.56.78:4242"]

lighthouse:
  hosts:
    - "192.168.100.1"

firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any

Distribute certificates and private keys

Because the certificates and private keys were generated by the certificate authority, they need to be distributed to each node. SCP and SFTP are suitable for this purpose.

Specifically:

  • ca.crt should be copied to all 3 nodes: lighthouse, hostA, and hostB
  • lighthouse.crt and lighthouse.key should be copied to the lighthouse node
  • hostA.crt and hostA.key should be copied to hostA
  • hostB.crt and hostB.key should be copied to hostB
Note: The ca.key file does not have to be copied over to any node. Keep it safe (do not lose it) and secure (do not leak it).

Start the nebula daemon

On each node, start nebula.service. Optionally, enable it so that it will be started on boot.

Note that the order in which the nodes start the nebula daemon does not matter. The lighthouse node can even be started last. Each node keeps trying to connect to its list of known lighthouse nodes, so the network recovers quickly from interruptions.
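
In the upstream example configuration, lighthouse.interval sets how often a node reports its current addresses to the lighthouses. The fragment below is a sketch for hostA or hostB. The value 60 is the one shown in the upstream example and is used here as an assumption, not as a tuned recommendation.

```yaml
# /etc/nebula/config.yml on hostA or hostB (fragment)
lighthouse:
  # Seconds between updates sent from this node to each lighthouse
  interval: 60
  hosts:
    - "192.168.100.1"
```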

Test for mesh functionality

With a mesh network, every node is directly connected to every other node. So, even if the connection between lighthouse and both hostA and hostB is slow, traffic between hostA and hostB can be fast, as long as there is a direct link between those two.

This can be demonstrated by a simple ping test on hostA:

$ ping -c 5 12.34.56.78
PING 12.34.56.78 (12.34.56.78) 56(84) bytes of data.
64 bytes from 12.34.56.78: icmp_seq=1 ttl=56 time=457 ms
64 bytes from 12.34.56.78: icmp_seq=2 ttl=56 time=480 ms
64 bytes from 12.34.56.78: icmp_seq=3 ttl=56 time=262 ms
64 bytes from 12.34.56.78: icmp_seq=4 ttl=56 time=199 ms
64 bytes from 12.34.56.78: icmp_seq=5 ttl=56 time=344 ms

--- 12.34.56.78 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 199.141/348.555/480.349/108.654 ms
$ ping -c 5 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=218 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=241 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=264 ms
64 bytes from 192.168.100.1: icmp_seq=4 ttl=64 time=288 ms
64 bytes from 192.168.100.1: icmp_seq=5 ttl=64 time=163 ms

--- 192.168.100.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 162.776/234.874/288.073/42.902 ms
$ ping -c 5 192.168.100.102
PING 192.168.100.102 (192.168.100.102) 56(84) bytes of data.
64 bytes from 192.168.100.102: icmp_seq=1 ttl=64 time=106 ms
64 bytes from 192.168.100.102: icmp_seq=2 ttl=64 time=2.14 ms
64 bytes from 192.168.100.102: icmp_seq=3 ttl=64 time=4.53 ms
64 bytes from 192.168.100.102: icmp_seq=4 ttl=64 time=4.29 ms
64 bytes from 192.168.100.102: icmp_seq=5 ttl=64 time=5.39 ms

--- 192.168.100.102 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 2.136/24.535/106.344/40.918 ms

Notice that the connection between hostA and lighthouse is slow, but the connection between hostA and hostB is very fast. Also notice that the first packet between hostA and hostB is delayed (106 ms), presumably while the direct tunnel is being established, whereas subsequent packets take only a few milliseconds.

Configuration options

listen.port
This is the listening port for the nebula daemon, which by default is 4242. On a lighthouse node, or a node with a static IP address, set this to any other number in order to personalize your setup and reduce the chances of unwanted service discovery and DDoS attacks on that port. Then update static_host_map to reflect the change.
On a node with a dynamic IP address, it is recommended to set this to 0, so that the nebula daemon uses a random port for communication.
logging.level
By default, the nebula daemon logs INFO-level messages. At this level every handshake is logged, which can generate a lot of log messages. Set it to warning to reduce the number of messages logged.
relay
This option can be used if a node cannot be reached directly from another node. Relay nodes help forward the communication between such nodes.
firewall
This option can be used to allow only certain traffic to and from a node.
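
The options above can be combined as in the following sketch for a node with a dynamic IP address. It is an illustration under stated assumptions, not a recommended setup: the lighthouse at 192.168.100.1 is assumed to double as a relay, and the inbound rules (ICMP and SSH on port 22) are example choices only.

```yaml
# /etc/nebula/config.yml on a node with a dynamic IP address (fragment)
listen:
  # 0 lets the daemon pick a random port
  port: 0

logging:
  # Log warnings and errors only, instead of every handshake
  level: warning

relay:
  # Nebula IPs that peers may use to relay packets to this node.
  # Assumption: the lighthouse acts as relay and has "am_relay: true"
  # set in its own relay section
  relays:
    - "192.168.100.1"
  use_relays: true

firewall:
  outbound:
    # Allow all outgoing traffic
    - port: any
      proto: any
      host: any
  inbound:
    # Allow ping from any Nebula host
    - port: any
      proto: icmp
      host: any
    # Allow SSH from any Nebula host (port 22 is an illustrative choice)
    - port: 22
      proto: tcp
      host: any
```

With rules like these, inbound traffic that matches no rule is dropped, so services other than ping and SSH are not reachable over the Nebula network until a rule is added for them.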

Troubleshooting

My lighthouse node takes forever to handshake

If your lighthouse node needs a long time to handshake, and it prints multiple handshake messages all at once when the handshake completes, the system may not support recvmmsg(). To work around this issue, add this configuration option:

/etc/nebula/config.yml
listen:
  batch: 1

This problem usually happens if your Linux kernel is too old (<2.6.34). The proper solution is to upgrade it.

See also