VxLAN and namespaces – basic concepts, creating a simple isolated router

Hello! I have spent the last two or three weeks toying around with OpenStack. Having never dealt with any SDN before, I was surprised by how well it works and how easy it is to extend our networks across multiple nodes. Looking for answers about what sits under the hood, I started watching some older presentations on the OpenStack YouTube channel (which is pretty cool by the way, give it a look if you’re interested – https://www.youtube.com/user/OpenStackFoundation/) and ran into a lot of different concepts (which, I guess, is a result of OpenStack being as modular as it is).

After some poking around I have learnt a fair bit and I would like to share it with you. Below you can find a network diagram – it may not tell you much at the moment, but I will cover all the bits later. Let’s crack on!

Interfaces

If an interface in the diagram does not have an IP address next to it, it literally has no IP configured on the device. For example, looking at the diagram above we can see that eth1 and eth2 don’t have any addresses configured. We can confirm that by issuing the ip a command in the console:

[root@centos1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:5a:0e:c3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.201.11/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe5a:ec3/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master internalbr state UP group default qlen 1000
    link/ether 52:54:00:4e:ff:57 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe4e:ff57/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master externalbr state UP group default qlen 1000
    link/ether 52:54:00:30:37:fd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe30:37fd/64 scope link 
       valid_lft forever preferred_lft forever
5: internalbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 1a:eb:df:5b:c9:d9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3452:4ff:fe3c:5b31/64 scope link 
       valid_lft forever preferred_lft forever
6: vxlan-99: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master internalbr state UNKNOWN group default qlen 1000
    link/ether 1a:eb:df:5b:c9:d9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::18eb:dfff:fe5b:c9d9/64 scope link 
       valid_lft forever preferred_lft forever
7: externalbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:30:37:fd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe30:37fd/64 scope link 
       valid_lft forever preferred_lft forever
8: vxeth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master externalbr state UP group default qlen 1000
    link/ether ea:12:0e:f5:c2:e3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::e812:eff:fef5:c2e3/64 scope link 
       valid_lft forever preferred_lft forever

Networks

Before we begin the configuration, it’s worth mentioning a few terms you should get familiar with: overlay network, underlay network, veth pairs and namespaces.

Overlay network – highlighted in green, it is a virtual network that sits “on top” of our infrastructure and relies on the underlay network for transport. In my example, it is the 10.255.0.0/24 network. Hosts on the overlay network don’t need to know any details of the underlay network.

Underlay network – the network that provides connectivity between the hosts carrying the overlay. In our example, it is the red link between our two CentOS boxes. For the overlay network to work, all hosts participating in the same overlay need a way to reach each other so they can transport its traffic.

veth pairs – this one should be pretty easy. If we want to provide connectivity between things like bridges or namespaces, we use veth pairs – a veth pair is nothing more than the virtual equivalent of a cable we would use to connect two devices to each other. Imagine you have two switches – what would you do first to allow packets to go from one switch to the other? Well, probably start with a cable! Here we have the same cable – except it’s virtual. We are going to use one of these to connect our “router” to the internal bridge, as shown in the quick sketch below.
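
As a quick illustration (the cable0/cable1 interface names here are hypothetical and not used anywhere else in this post), creating a veth pair is a single command and both ends then show up as regular interfaces:

# create a veth pair – two ends of the same virtual cable
ip link add cable0 type veth peer name cable1
# both ends appear in the interface list, ready to be plugged into bridges or namespaces
ip link show type veth
# deleting one end removes the whole pair again
ip link del cable0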

Namespace – you have probably heard of VRFs, and if not, I suggest you go and check them out. Network namespaces give us a way to create completely isolated network environments with their own interfaces and routing tables. Things like OpenStack use namespaces for a few functions such as virtual routers, the DHCP service etc. A quick demonstration follows below.
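
For example (the namespace name demo is hypothetical and only used for this demonstration), a freshly created namespace contains nothing but its own loopback interface and an empty routing table:

# create a throwaway namespace
ip netns add demo
# only a loopback interface (still down) exists inside it
ip netns exec demo ip a
# and it has its own, completely empty routing table
ip netns exec demo ip route
# clean up the demonstration namespace
ip netns del demo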

Internal bridge

Let’s get started! First of all we will be setting up the internal bridge so hosts on the overlay network in this particular physical network segment can all see each other.

Before we start bridging, let’s make sure the required kernel module is loaded and IP forwarding is enabled, using the following commands run as root:

modprobe br_netfilter
echo '1' > /proc/sys/net/ipv4/ip_forward
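
If you want to double-check that both changes took effect, a quick sanity check could look like this:

# confirm the module is loaded
lsmod | grep br_netfilter
# should print 1
cat /proc/sys/net/ipv4/ip_forward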

Now we can create the bridge and bring it up with the following commands:

ip link add name internalbr type bridge
ip link set internalbr up

Let’s add eth1 to the bridge and bring it up as well:

ip link set eth1 master internalbr
ip link set eth1 up

Now let’s create a VxLAN interface and bind it to our underlay network interface – in our case, eth0:

ip link add vxlan-99 type vxlan id 99 dev eth0 dstport 0
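
If you’d like to verify the parameters it was created with, ip can show the VxLAN-specific details:

# -d (details) shows the vxlan id, the underlay device and the UDP destination port in use
ip -d link show vxlan-99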

Then we can add the interface to our bridge and bring it up:

ip link set vxlan-99 master internalbr
ip link set vxlan-99 up

And put an endpoint entry in the forwarding database:

# on centos1 I will run the following command:
bridge fdb append to 00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.12
# and on centos2 I will do this:
bridge fdb append to 00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.11

After that’s done, let’s try pinging Client2 from Client1:

[root@localhost ~]# ping 10.255.0.2
PING 10.255.0.2 (10.255.0.2) 56(84) bytes of data.
64 bytes from 10.255.0.2: icmp_seq=1 ttl=64 time=0.989 ms
64 bytes from 10.255.0.2: icmp_seq=2 ttl=64 time=0.827 ms
64 bytes from 10.255.0.2: icmp_seq=3 ttl=64 time=0.940 ms
64 bytes from 10.255.0.2: icmp_seq=4 ttl=64 time=1.18 ms

Cool! But what exactly have we done?

We first added a bridge called “internalbr”. This bridge lets us connect multiple “cables” to the same “switch”, giving us layer 2 connectivity. In our case, we connected the eth1 and vxlan-99 interfaces to the same switch. That way, any traffic from Client1 arriving on CentOS1’s eth1 will also be visible to all other interfaces connected to this bridge.

After we created the bridge and plugged in eth1, we created a special vxlan interface and pointed it at our eth0 interface. It points at eth0 because it will use eth0 to reach the vxlan interfaces on the other nodes (in our case, on CentOS2). We specified dstport 0 so the kernel uses its default UDP port for the encapsulated traffic (on Linux that is the legacy port 8472 rather than the IANA-assigned 4789), and we did not configure any multicast group because we will rely purely on unicast endpoints that we add by hand. Once the vxlan interface was up and running, we connected it to the bridge and, as I said before, the vxlan-99 and eth1 interfaces can now see each other’s traffic, which is crucial for the vxlan operation.
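
If you’re curious what this looks like on the wire, you can capture the underlay traffic on eth0 while pinging between the clients (adjust the port if your kernel uses a different default):

# encapsulated VxLAN traffic between the two nodes, assuming the default port 8472
tcpdump -ni eth0 udp port 8472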

What about this weird bridge fdb command that we ran afterwards? Well, because hosts on the overlay network are oblivious to the underlying network that provides connectivity between the nodes, they still use the usual method to reach hosts on their own network – so if we decide to ping 10.255.0.2/24 from 10.255.0.1/24, Client1 will first send an ARP request to the broadcast address to find out the MAC address of the 10.255.0.2 device and then attempt to send it an ICMP message. Because of how unicast VxLAN operates, we first need to add an entry to each bridge’s forwarding database that essentially says “hey, if you get an ARP request, please also send it to the 172.16.201.12 endpoint”. That way, the same ARP request will be visible on both the CentOS1 and CentOS2 bridges, and Client2 will be able to reply. We can add multiple endpoints using the same command (see the sketch below), so we essentially make all nodes aware that they should be passing broadcast messages to each other.
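
For instance, if we later added a third node (the 172.16.201.13 address below is hypothetical – it doesn’t exist in this lab), each existing node would simply get one more all-zeros entry:

# on centos1, flood broadcast/unknown-unicast traffic to both remote endpoints
bridge fdb append to 00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.12
bridge fdb append to 00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.13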

We can see what happens in the CentOS1 bridge’s forwarding database after successfully pinging Client2 (52:54:00:a6:04:bf) from Client1 (52:54:00:64:d3:c7):

[root@CentOS1 ~]# bridge fdb show | grep vxlan-99
# --- Redacted --- 
00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.12 self permanent
52:54:00:a6:04:bf dev vxlan-99 dst 172.16.201.12 self 

And on CentOS2:

[root@CentOS2 ~]# bridge fdb show | grep vxlan-99
# --- Redacted ---
00:00:00:00:00:00 dev vxlan-99 dst 172.16.201.11 self permanent
52:54:00:64:d3:c7 dev vxlan-99 dst 172.16.201.11 self 

Below the permanent entry we put in, we can also find the learned MAC entry for Client2 and Client1 (on CentOS1 and CentOS2 respectively). Thanks to that, the next time a frame arrives for that destination MAC address, the bridge won’t need to send it to all the endpoints configured with the 00:00:00:00:00:00 address – it will send it directly to the endpoint the address was learned from (172.16.201.12 from CentOS1’s point of view, 172.16.201.11 from CentOS2’s).

Creating a router

Okay, but what if we have multiple networks that we would like to connect to each other? What about internet connectivity?

Well, let’s have a look at the diagram again. We can see a few new elements coming into play – the “router” namespace and the “externalbr” bridge. Externalbr will be used to bridge the eth2 interface (providing access to the routed 192.168.1.0/24 network), and the router namespace will provide routing between the two networks.

Let’s start off by creating the externalbr bridge and plugging in eth2:

ip link add name externalbr type bridge
ip link set dev externalbr up
# plug in eth2
ip link set dev eth2 master externalbr
ip link set dev eth2 up

Now we need to create the namespace. We can do that with the following command:

ip netns add router
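
To confirm it exists, you can list the namespaces on the host:

# the new router namespace should be listed
ip netns list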

Now let’s create two veth pairs – one to connect our namespace to the external bridge, and one to connect it to the internal bridge:

# veth to external bridge
ip link add vreth0 type veth peer name vxeth0
# veth to internal bridge
ip link add vreth1 type veth peer name vieth1

Plug in one end of each pair into each bridge:

# Plug in vxeth0 into externalbr
ip link set dev vxeth0 master externalbr
ip link set dev vxeth0 up
# Plug in vieth1 into internalbr
ip link set dev vieth1 master internalbr
ip link set dev vieth1 up

And now let’s plug the loose ends into our namespace and bring them up:

ip link set vreth0 netns router
ip link set vreth1 netns router
ip netns exec router ip link set dev vreth0 up
ip netns exec router ip link set dev vreth1 up
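
At this point both vreth0 and vreth1 should only be visible from inside the namespace:

# list the interfaces that now live inside the router namespace
ip netns exec router ip link show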

Now let’s assign an IP address to each interface attached to the router so we can route traffic between the 10.255.0.0/24 network, our provider 192.168.1.0/24 network and the outside world:

# interface attached to external bridge - make sure the 192.168.1.X IP is 
# unique on both CentOS1 and CentOS2
ip netns exec router ip addr add 192.168.1.244/24 dev vreth0
# default route from the external bridge to the internet
ip netns exec router ip route add default via 192.168.1.1
# interface attached to internal bridge. The 10.255.0.254 IP can be
# reused on both CentOS1 and CentOS2
ip netns exec router ip addr add 10.255.0.254/24 dev vreth1
# enable forwarding in the namespace:
ip netns exec router sh -c "echo '1' > /proc/sys/net/ipv4/ip_forward"
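
A quick look inside the namespace should now show both addresses and the default route:

# addresses on vreth0 and vreth1
ip netns exec router ip addr show
# routing table with the default route via 192.168.1.1
ip netns exec router ip route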

Now on our Client1 and Client2 devices we need to make sure our default route points at 10.255.0.254:

ip route add default via 10.255.0.254
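
Before testing anything further away, it’s worth confirming the clients can reach the gateway itself:

# from Client1 or Client2 – the router namespace should answer
ping -c 4 10.255.0.254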

Now let’s ping my home router from Client1 and Client2:

[root@client1 ~]# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=63 time=1.43 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=63 time=1.40 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=63 time=1.02 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=63 time=0.889 ms

[root@client2 ~]# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=63 time=1.02 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=63 time=1.34 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=63 time=17.7 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=63 time=3.53 ms

Cool! If you want to go out to the internet, make sure you have masquerading enabled in the router namespace, or make sure the default gateway (in my case my home router – 192.168.1.1) can route traffic for the 10.255.0.0/24 network back to one of the nodes.

If you want to masquerade traffic leaving via vreth0, run the following on both CentOS1 and CentOS2:

ip netns exec router iptables -t nat -A POSTROUTING -o vreth0 -j MASQUERADE
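
To confirm the rule is in place and actually matching traffic, you can list the NAT table inside the namespace (the packet counters should increase once the clients start talking to the outside world):

# show the POSTROUTING chain with packet/byte counters
ip netns exec router iptables -t nat -L POSTROUTING -n -v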

I purposely left the 10.255.0.254/24 IP configured on both CentOS1 and CentOS2 because the conflict doesn’t matter here – this sort of traffic will always reach the correct host (because they are connected to the same bridge). This is also how OpenStack handles its switching.

Hopefully you enjoyed this as much as I did! If you have any questions, or I skipped over something that doesn’t make much sense – ask right away! I will try my best to answer your questions 🙂
