amirius edited this page Nov 29, 2017 · 6 revisions
Table of Contents
  1. Introduction
    1. Topology
  2. Overlay Configuration
  3. Tunnel Configuration
    1. Flat Configuration
    2. Hierarchical Configuration
  4. Features and Limitations
  5. Further Resources

Introduction

Since L3 tunneling is fundamentally a routing technology, the switch on which tunnels are to be configured needs to have routing enabled. See Static Routing for more details.

Topology

In the abstract, the reason to create an IP-in-IP tunnel is to connect two IP networks separated by another IP network. In the example here, the two domains to be connected are represented by two hosts with the arbitrarily chosen addresses 192.168.1.33 and 192.168.2.33. Each host is connected to a tunnel endpoint (switch1 at 1.2.3.4 and switch2 at 1.2.3.5, both within 1.2.3.4/31), which wraps up the host traffic and delivers it through a tunnel to the other endpoint. The encapsulated traffic travels over a transport network, here addressed 192.168.99.0/24.

In tunneling parlance, the traffic flowing between the two separated IP domains is called overlay traffic, and the network where it flows is the overlay network. The encapsulated traffic, on the other hand, is called underlay traffic, and the network where it flows is the underlay network.

+--------------+         +--------------+
|              |         |              |
|    host1     |         |    host2     |
|              |         |              |
| 192.168.1.33 |         | 192.168.2.33 |
|      +       |         |      +       |
|      |       |         |      |       |
+--------------+         +--------------+
       |                        |
+--------------+         +--------------+
|      |       |         |      |       |
|      +       |         |      +       |   Overlay
| 192.168.1.1  |         | 192.168.2.1  | - - - - - -
|              |         |              |   Underlay
|   switch1    |         |   switch2    |
|              |         |              |
|   1.2.3.4    |         |   1.2.3.5    |
|      +       |         |      +       |
|      |       |         |      |       |
| 192.168.99.1 |         | 192.168.99.2 |
|      +       |         |      +       |
|     | |      |         |     | |      |
+--------------+         +--------------+
      | |______________________| |
      '--------------------------'

The switch, as a tunneling gateway, naturally handles both overlay and underlay traffic. Both can be in the same VRF (possibly the default one), or each can be in a different VRF. See below for details of each of these configurations.

Currently, mlxsw offloads GRE tunnels, but not all possible configurations are supported. Refer to Features and Limitations for the list of constraints that the tunnel needs to satisfy to be offloaded.

Besides setting up a tunnel device, one also needs a local route matching the tunnel's local address, which is offloaded to decapsulate packets, and possibly one or more routes that direct traffic to the tunnel, which are offloaded to encapsulate packets.

Overlay Configuration

First, set up the connection to the local overlay network and a route for traffic destined for the remote overlay network (192.168.2.0/24 as seen from host1, 192.168.1.0/24 as seen from host2):

host1 $ ip link set dev eth0 up
host1 $ ip address add dev eth0 192.168.1.33/24
host1 $ ip route add 192.168.2.0/24 via 192.168.1.1
host2 $ ip link set dev eth0 up
host2 $ ip address add dev eth0 192.168.2.33/24
host2 $ ip route add 192.168.1.0/24 via 192.168.2.1

On the switch, set up the overlay interface accordingly:

sw1 $ ip link set dev sw1p49 up
sw1 $ ip address add dev sw1p49 192.168.1.1/24
sw2 $ ip link set dev sw1p49 up
sw2 $ ip address add dev sw1p49 192.168.2.1/24

Tunnel Configuration

You need a GRE module in order to set up GRE tunnels:

sw $ modprobe ip-gre

There are two ways to set up a GRE tunnel endpoint. Either overlay and underlay are each in a different VRF (which we call hierarchical configuration), or they share the same VRF (flat configuration). The following sections describe how to set up each of these options.

Flat Configuration

In flat configuration, overlay and underlay traffic share the same VRF:

   +------------------( switch )-------------------+
   |                                               |
   |   overlay          GRE         transport      |
---|-+ 192.168.1.1      1.2.3.4 +-- 192.168.99.1 +=|===
   |                                               |
   +-----------------------------------------------+

First, set up the tunnel itself:

sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit
sw1 $ ip link set dev g up
sw1 $ ip address add dev g 1.2.3.4/32
sw2 $ ip tunnel add name g mode gre local 1.2.3.5 remote 1.2.3.4 tos inherit
sw2 $ ip link set dev g up
sw2 $ ip address add dev g 1.2.3.5/32

Or, if you want to use GRE keys:

sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit \
         key 123

Or:

sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit \
         ikey 456 okey 789

Note that the tunnel remote address must be reachable from this node. For example:

sw1 $ ip link set dev sw1p51 up
sw1 $ ip address add dev sw1p51 192.168.99.1/24
sw1 $ ip route add 1.2.3.5/32 via 192.168.99.2
sw2 $ ip link set dev sw1p51 up
sw2 $ ip address add dev sw1p51 192.168.99.2/24
sw2 $ ip route add 1.2.3.4/32 via 192.168.99.1

At this point, it is possible to direct traffic at the tunnel:

sw1 $ ip route add 192.168.2.0/24 dev g
sw1 $ ip route add 2001:db8:2::/56 dev g
sw2 $ ip route add 192.168.1.0/24 dev g
sw2 $ ip route add 2001:db8:1::/56 dev g
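The flat setup is symmetric: sw1 and sw2 run the same commands with local and remote values swapped. The following is a minimal sketch of that symmetry, not part of mlxsw. The function name `flat_tunnel_cmds` and its argument order are made up for illustration, it only prints the commands rather than running them, and it assumes the underlay port has already been brought up and addressed as shown above.

```shell
#!/bin/sh
# Print (do not run) the flat-configuration commands for one endpoint.
# Hypothetical helper; arguments:
#   $1 local tunnel address     $2 remote tunnel address
#   $3 remote overlay prefix    $4 underlay next hop towards the remote endpoint
flat_tunnel_cmds() {
    local_ip=$1 remote_ip=$2 overlay=$3 nexthop=$4
    cat <<EOF
ip tunnel add name g mode gre local $local_ip remote $remote_ip tos inherit
ip link set dev g up
ip address add dev g $local_ip/32
ip route add $remote_ip/32 via $nexthop
ip route add $overlay dev g
EOF
}

# sw1 and sw2 differ only in swapped arguments:
flat_tunnel_cmds 1.2.3.4 1.2.3.5 192.168.2.0/24 192.168.99.2
flat_tunnel_cmds 1.2.3.5 1.2.3.4 192.168.1.0/24 192.168.99.1
```

Review the printed commands before piping them to a shell on the respective switch.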

To verify that the individual routes have been offloaded:

sw $ ip route show table local dev g
local 1.2.3.4 dev g proto kernel scope host src 1.2.3.4 offload
sw $ ip route show dev g
192.168.2.0/24 scope link offload
sw $ ip -6 route show dev g
2001:db8:2::/56 metric 1024 offload pref medium
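Checking the flag by eye does not scale to many routes. The sketch below is a hypothetical helper (the name `all_offloaded` is not an iproute2 or mlxsw facility) that succeeds only if every route line on standard input carries the word `offload`, as in the sample output above. It assumes the flag is printed exactly as shown here; other iproute2 versions may format it differently.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if there is at least one route line on
# stdin and every non-empty line carries the "offload" flag as a whole word.
all_offloaded() {
    awk '
        NF { n++ }
        NF && !/(^| )offload( |$)/ { bad = 1 }
        END { exit (bad || !n) }'
}

# Intended use on a switch (not executed here):
#   ip route show dev g    | all_offloaded && echo "IPv4 routes offloaded"
#   ip -6 route show dev g | all_offloaded && echo "IPv6 routes offloaded"
```

Treating empty input as a failure is a design choice of this sketch, so that a missing route is not mistaken for an offloaded one.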

Hierarchical Configuration

This is similar in spirit to the flat configuration, but now the GRE netdevice has a bound device that selects the VRF to use for underlay traffic. Typically this would be a different VRF than the one with the GRE netdevice itself, but it does not have to be.

Note: Bound devices are offloaded correctly only when their master is a VRF device. In that case, the bound device is only used to select the VRF for underlay traffic. Without a VRF, the bound device actually selects the interface through which encapsulated traffic egresses. mlxsw does not recognize that use: it assumes that a bound device always just selects an underlay VRF, even when the bound device is not a member of a VRF. That is why this tutorial uses a dummy device; it is the only device that makes sense as an anchor to select a VRF.

This is what the setting looks like:

   +------------------( switch )-------------------+
   |                                               |   <-- VRF ol
   |   overlay           GRE                       |
---|-+ 192.168.1.1        ^                        |
   |                      |                        |
   | - - - - - - - - - - -|- - - - - - - - - - - - |
   |                      v                        |   <-- VRF ul
   |                    dummy       transport      |
   |                    1.2.3.4 +-- 192.168.99.1 +=|===
   |                                               |
   +-----------------------------------------------+

First, create the VRFs themselves. For more details on that, see Virtual Routing and Forwarding (VRF):

sw $ ip link add name ol type vrf table 10
sw $ ip link set dev ol up
sw $ ip link add name ul type vrf table 20
sw $ ip link set dev ul up

Next create a dummy device to use to select the underlay VRF:

sw1 $ ip link add name d type dummy
sw1 $ ip link set dev d master ul
sw1 $ ip link set dev d up
sw1 $ ip address add dev d 1.2.3.4/32
sw2 $ ip link add name d type dummy
sw2 $ ip link set dev d master ul
sw2 $ ip link set dev d up
sw2 $ ip address add dev d 1.2.3.5/32

Now create a tunnel, using the dummy as a bound device:

sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 dev d tos inherit
sw1 $ ip link set dev g master ol
sw1 $ ip link set dev g up
sw2 $ ip tunnel add name g mode gre local 1.2.3.5 remote 1.2.3.4 dev d tos inherit
sw2 $ ip link set dev g master ol
sw2 $ ip link set dev g up

You can of course set input and/or output GRE key like in the case of flat configuration.

At this point, it is possible to direct traffic at the tunnel:

sw1 $ ip route add vrf ol 192.168.2.0/24 dev g
sw1 $ ip route add vrf ol 2001:db8:2::/56 dev g
sw2 $ ip route add vrf ol 192.168.1.0/24 dev g
sw2 $ ip route add vrf ol 2001:db8:1::/56 dev g

Also remember to put the ports that connect to the overlay and underlay networks into the right VRFs. For example:

sw $ ip link set dev sw1p49 master ol
sw $ ip link set dev sw1p51 master ul
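A misplaced port is easy to miss, so it can help to check VRF membership explicitly. The sketch below is a hypothetical helper (the name `link_master` is made up) that extracts the master device from one line of `ip -o link show` output. It assumes the usual one-line format in which the word `master` is followed by the master's name.

```shell
#!/bin/sh
# Hypothetical helper: print the master device named in `ip -o link show`
# output read from stdin. Prints nothing if the link has no master.
link_master() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "master") { print $(i + 1); exit } }'
}

# Intended use on a switch (not executed here):
#   [ "$(ip -o link show dev sw1p49 | link_master)" = ol ] || echo "sw1p49 is not in ol"
#   [ "$(ip -o link show dev sw1p51 | link_master)" = ul ] || echo "sw1p51 is not in ul"
#   [ "$(ip -o link show dev d | link_master)" = ul ]      || echo "d is not in ul"
#   [ "$(ip -o link show dev g | link_master)" = ol ]      || echo "g is not in ol"
```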

Decap-only Tunnels

Tunnel decap is offloaded as soon as there is a local route matching the local address of a tunnel. However, in the slow path, if the decapsulated packets are to be forwarded to hosts, one of the following conditions needs to hold:

– There actually needs to be a corresponding route that would direct traffic from those hosts to the tunnel device (i.e. an encapsulating route)
– Reverse path filtering needs to be disabled: sysctl -w net.ipv4.conf.all.rp_filter=0
– The decapsulated traffic needs to be IPv6

mlxsw ignores the rp_filter setting and offloads as if it were disabled. This might create a discrepancy between how slow path and fast path packets are processed.

Another way to create a decap-only tunnel is to introduce the encapsulating routes after all, but set the bound device down. In that scenario, Linux (and mlxsw) does not forward encapsulated traffic, but the existence of the route satisfies reverse path filtering.
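When relying on the rp_filter condition in the slow path, keep in mind that the kernel documents the effective setting for an interface as the maximum of the `all` value and the per-interface value (0 is off, 1 is strict, 2 is loose). The sketch below computes that maximum. The function name `effective_rp_filter` is hypothetical, and the usage comment assumes that `g` is the interface on which decapsulated packets are received.

```shell
#!/bin/sh
# Hypothetical helper: the effective rp_filter mode for an interface is the
# larger of the "all" value ($1) and the per-interface value ($2).
effective_rp_filter() {
    all=$1 ifc=$2
    if [ "$all" -gt "$ifc" ]; then echo "$all"; else echo "$ifc"; fi
}

# Intended use on a switch (not executed here):
#   effective_rp_filter "$(sysctl -n net.ipv4.conf.all.rp_filter)" \
#                       "$(sysctl -n net.ipv4.conf.g.rp_filter)"
# A result of 0 means reverse path filtering is off for that interface.
```

If the result is non-zero, the per-interface value may need to be cleared as well.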

Configuration Changes

As of Linux 4.15, changes to the configuration of a tunnel netdevice, or of its bound netdevice, lead to updates of the switch ASIC configuration. If the configuration ends up not being offloadable, all impacted tunnels are moved to the slow path.

The opposite logic, which would notice that a netdevice became eligible for offloading due to configuration changes, is currently not implemented. What falls to the slow path stays there.

Features and Limitations

In Linux 4.15

Only tunnels satisfying the following conditions are currently offloaded:

– Only GRE tunnels (IPv4 underlay)
– Both local and remote addresses shall be given (NBMA tunnels and LWT are currently not supported)
– TTL and TOS shall both be inherit (note that the latter is not a default setting in Linux)
– No two tunnels that share underlay VRF shall share a local address (i.e. dispatch based on tunnel key is not supported)
– Sequence numbers and checksumming shall not be used
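Several of these constraints can be checked before the tunnel is even created. The following sketch lints the arguments one intends to pass to `ip tunnel add`. The function `check_offloadable` and its messages are hypothetical and not part of iproute2 or mlxsw. It assumes TTL defaults to inherit when not given, and it cannot check the constraint about two tunnels sharing a local address in one underlay VRF, because that depends on other tunnels.

```shell
#!/bin/sh
# Hypothetical lint for `ip tunnel add` arguments. Prints one line per
# violated constraint and returns non-zero if any was found.
check_offloadable() {
    mode='' local_ip='' remote_ip='' tos='' ttl=inherit rc=0
    while [ $# -gt 0 ]; do
        opt=$1; shift
        # Option values are not consumed here; they pass through the case
        # below on the next iteration without matching any keyword.
        case $opt in
            mode)         mode=${1-} ;;
            local)        local_ip=${1-} ;;
            remote)       remote_ip=${1-} ;;
            tos|dsfield)  tos=${1-} ;;
            ttl|hoplimit) ttl=${1-} ;;
            seq|iseq|oseq|csum|icsum|ocsum)
                echo "$opt: sequence numbers and checksums must not be used"; rc=1 ;;
        esac
    done
    [ "$mode" = gre ] || { echo "mode: only gre is offloaded"; rc=1; }
    for addr in "$local_ip" "$remote_ip"; do
        case $addr in
            ''|any) echo "local/remote: both addresses must be given"; rc=1 ;;
        esac
    done
    [ "$tos" = inherit ] || { echo "tos: must be inherit"; rc=1; }
    case $ttl in
        inherit|0) ;;
        *) echo "ttl: must be inherit"; rc=1 ;;
    esac
    return "$rc"
}

# Example: the flat-configuration tunnel from this page passes the lint.
check_offloadable name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit
```

A clean result does not guarantee offload; it only rules out the violations this sketch knows about.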

In Linux 4.14

Everything mentioned for Linux 4.15 as well as the following:

– Forming encapsulating routes to two tunnels that have the same local address and underlay VRF leads to invocation of the abort mechanism (see Static Routing)
– Nothing is offloaded until an encapsulating route is added (i.e. the decap-only flow is not supported)
– Changes to configuration done after the tunnel is offloaded are not reflected. This can be circumvented by removing and re-adding all encapsulating routes at once (not one at a time).
– State of the bound device (up/down) is not reflected
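The remove-and-re-add workaround for Linux 4.14 can be sketched as a generator of `ip -batch` input that deletes every encapsulating route first and only then adds them back. The function name `readd_batch` is hypothetical. The sketch assumes the flat configuration (no `vrf` keyword), expects the caller to list the encapsulating prefixes explicitly, and does not preserve route attributes such as metric.

```shell
#!/bin/sh
# Hypothetical helper: print ip-batch lines that delete all given routes on
# a tunnel device and then re-add them, so that no route is left in place
# while the others are being removed.
#   $1 tunnel device, remaining arguments: encapsulating prefixes
readd_batch() {
    dev=$1; shift
    for prefix in "$@"; do echo "route del $prefix dev $dev"; done
    for prefix in "$@"; do echo "route add $prefix dev $dev"; done
}

# Intended use on a switch (not executed here):
#   readd_batch g 192.168.2.0/24 2001:db8:2::/56 | ip -batch -
```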

The tunnel may have i-key and/or o-key set, and if it has both, the two may differ.

Further Resources

  1. man ip-tunnel
  2. https://www.deepspace6.net/docs/iproute2tunnel-en.html