Using Mahimahi on P2P Networks

In a recent research project, I needed to emulate different network conditions on a per-link basis in a peer-to-peer network. This post documents how I did it with Mahimahi, along with the gotchas I ran into.

The need to emulate different network conditions has existed for a long time. People want it for various reasons: testing new congestion control algorithms, evaluating application performance under harsh network conditions, and so on. Specifically, we want to artificially set the bandwidth and delay of a link so that it behaves like another, usually worse-performing, link. Tools for this existed as early as 1997 [1]. On today's Linux, traffic-control (tc) qdiscs, netem in particular, can throttle bandwidth and add delay. However, users often want to change the emulated network conditions frequently, ideally following a pre-recorded trace. Mahimahi targets exactly this use case.

Mahimahi spawns shells in which all network traffic to or from applications inside the shell is subject to user-controlled queuing (for bandwidth throttling and fair sharing) and delay. At a high level, Mahimahi creates a private network segment for each shell and acts as a virtual router so that the shell can talk to the outside world. It starts the shell (and the applications inside it) in a separate network namespace (NS hereafter) and creates a dummy network interface in that NS, so that all traffic within the NS goes through this interface. The interface is attached to the Mahimahi process, which queues and delays the traffic. Mahimahi then creates another interface in the "parent" NS, relays the traffic through it, and configures iptables in the parent NS so that the parent performs NAT for the child NS. The following figure summarizes how Mahimahi encapsulates traffic inside the NS.

This architecture works well for client-server applications such as web browsing, video streaming, or simply testing end-to-end congestion control algorithms. However, it cannot handle peer-to-peer applications out of the box. The reason is simple: Mahimahi relies on NAT, and processes behind a NAT cannot be reached from outside; they have to initiate connections themselves. This is exactly why deploying P2P applications on today's Internet is hard: most personal devices sit behind NAT, whether it is run by your home router, your ISP, or your company's IT department. When a device behind NAT wants to listen on a network port (to provide a service or to let peers connect to it), we need to configure port forwarding on the gateway: tell it to open a port on behalf of the listening device, and rewrite the destination address of packets arriving on that port so that they are forwarded to the device.
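On Linux, this kind of port forwarding usually comes down to a DNAT rule in the PREROUTING chain of the nat table; connection tracking then rewrites the replies back automatically. Experiment scripts tend to be rerun, and iptables -A appends a duplicate rule every time, so here is a minimal sketch of an idempotent wrapper that checks for the rule with iptables -C before appending it. The function name forward_port and the IPTABLES override variable are my own (hypothetical), and UDP is the default protocol only because that is what my experiments used.

```shell
#!/usr/bin/env bash
# Idempotently install a DNAT (port-forwarding) rule in the nat table.
# Usage: forward_port IN_IFACE DST_IP PORT TARGET_IP [PROTO]
# Set IPTABLES to override the command, e.g. IPTABLES="sudo iptables".
forward_port() {
  local iface=$1 dst=$2 port=$3 target=$4 proto=${5:-udp}
  local ipt=${IPTABLES:-iptables}
  local -a rule
  # Shared rule body for the check (-C) and append (-A) commands.
  rule=(PREROUTING -i "$iface" -p "$proto" -d "$dst" --dport "$port"
        -j DNAT --to-destination "$target")
  # $ipt is deliberately unquoted so "sudo iptables" splits into two words.
  # iptables -C exits with status 0 only if an identical rule already exists.
  if $ipt -t nat -C "${rule[@]}" 2>/dev/null; then
    echo "already present: ${rule[*]}"
  else
    $ipt -t nat -A "${rule[@]}"
  fi
}

# Example (needs root), equivalent to the host-side rule used later on:
# IPTABLES="sudo iptables" forward_port eth0 1.2.3.4 9000 100.64.0.2 udp
```

The IPTABLES override also makes it easy to substitute a stub when testing the script on a machine where you are not root.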

Clearly, we need to do the same thing so that a process inside a Mahimahi shell can accept incoming connections. The situation is slightly complicated by the fact that Mahimahi shells are designed to be composed. Mahimahi provides three types of shells: the delay shell, which delays packets; the link shell, which applies a custom bandwidth throttler and queuing; and the loss shell, which randomly drops packets. To emulate sophisticated network behavior, users nest Mahimahi shells. For example, one could nest a link shell inside a delay shell to emulate a link with limited bandwidth and a long propagation delay. For our problem, this means we need to configure port forwarding at every layer of nested Mahimahi shells.
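The link shell takes its bandwidth from packet-delivery trace files (such as the /tmp/linkfile passed to mm-link later in this post). As I understand Mahimahi's trace format, each line is a timestamp in milliseconds at which one 1500-byte packet may be delivered, and the trace loops when it reaches the end; one line per millisecond is therefore 1500 × 8 bits per ms, or 12 Mbps. Under those assumptions, R Mbps is exactly R packets every 12 ms, so a minimal sketch of a constant-rate trace spreads R delivery opportunities evenly over a 12 ms period. The function name constant_trace is hypothetical.

```shell
#!/usr/bin/env bash
# Print a Mahimahi-style trace for a constant rate of RATE_MBPS (integer).
# Each output line is a millisecond timestamp at which one 1500-byte packet
# may be sent; RATE_MBPS Mbps equals RATE_MBPS packets every 12 ms.
constant_trace() {
  local rate=$1 period=12 i
  if (( rate < 1 )); then
    echo "rate must be a positive integer (Mbps)" >&2
    return 1
  fi
  for (( i = 1; i <= rate; i++ )); do
    # ceil(i * period / rate): spread packets evenly; the last lands at 12 ms.
    echo $(( (i * period + rate - 1) / rate ))
  done
}
```

For example, constant_trace 24 > /tmp/linkfile writes a 24 Mbps trace, and nesting a link shell inside a delay shell as described above then looks like mm-delay 40 mm-link /tmp/linkfile /tmp/linkfile.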

To configure port forwarding, we install a DNAT rule in the nat table of iptables. Suppose the host has address 1.2.3.4 on its public interface eth0, and we want to open port 9000 on it and forward the UDP traffic arriving there (I was using QUIC in my experiments) to our Mahimahi shell. In my observation, Mahimahi assigns 100.64.0.2 to the first Mahimahi shell. So on the host (not inside any Mahimahi shell), we need to run

iptables -A PREROUTING -i eth0 -t nat -p udp -d 1.2.3.4 --dport 9000 -j DNAT --to-destination 100.64.0.2

Now suppose we start a nested Mahimahi shell inside the first one. In my observation, the nested shell gets the address 100.64.0.4, and the dummy interface inside each shell is always called ingress. So inside the first shell, we need to run

iptables -A PREROUTING -i ingress -t nat -p udp -d 100.64.0.2 --dport 9000 -j DNAT --to-destination 100.64.0.4

Note that we run this inside the first shell. This is because each NS has its own iptables rules, and we want to configure the rules of the first Mahimahi shell so that it forwards traffic to the nested shell.
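The two rules follow one pattern: in each namespace on the path, match packets that arrive on that namespace's outward-facing interface and are addressed to that namespace's own address, then rewrite the destination to the next address inward. A minimal sketch that prints (rather than runs) one rule per hop, assuming IPv4 addresses; dnat_chain is a hypothetical helper name, and the addresses passed to it are simply the ones I observed for two levels of nesting. For deeper nesting, check the actual addresses inside each shell instead of extrapolating the pattern.

```shell
#!/usr/bin/env bash
# Print one DNAT rule per namespace along a chain of nested shells (IPv4 only).
# Usage: dnat_chain PORT PROTO IFACE:ADDR [IFACE:ADDR ...] FINAL_ADDR
#   IFACE:ADDR = inbound interface and address of one namespace, outermost
#   first; each hop forwards to the next hop's address, the last to FINAL_ADDR.
dnat_chain() {
  local port=$1 proto=$2
  shift 2
  local -a hops=("$@")
  local n=${#hops[@]} i iface addr next
  if (( n < 2 )); then
    echo "need at least one IFACE:ADDR hop and a final address" >&2
    return 1
  fi
  for (( i = 0; i < n - 1; i++ )); do
    iface=${hops[i]%%:*}      # text before the first colon
    addr=${hops[i]#*:}        # text after the first colon
    next=${hops[i+1]#*:}      # next hop's address (FINAL_ADDR has no colon)
    echo "# hop $((i + 1))"
    echo "iptables -t nat -A PREROUTING -i $iface -p $proto -d $addr --dport $port -j DNAT --to-destination $next"
  done
}

# The two rules from this post (hop 1 on the host, hop 2 in the first shell):
dnat_chain 9000 udp eth0:1.2.3.4 ingress:100.64.0.2 100.64.0.4
```

With a third level of shells, you would pass one more IFACE:ADDR pair and run the extra hop inside the second shell.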

You can test this setup by starting an iperf server on port 9000 inside the inner shell and connecting to 1.2.3.4:9000 from another machine. (Packets the host sends to its own address never arrive on eth0, so the rule above does not match them. Also, since only UDP is forwarded, run iperf in UDP mode; iperf3 additionally needs a TCP control connection on the same port.) If it works, you have succeeded. The remaining question is how to automate this process so that we can use this setup in experiment scripts. That requires a small, slightly dirty hack. Mahimahi accepts, as command-line arguments, a command to execute when the shell starts. For example, the following command runs wget as soon as a delay shell with 80 ms of one-way delay starts:

mm-delay 80 wget 'google.com'

Nothing fancy here, right? However, to make our setup work, we need to complete a chain of tasks:

  1. Run iptables on the host
  2. Start the first Mahimahi shell
  3. Run iptables inside the first Mahimahi shell
  4. Start the second Mahimahi shell
  5. Start our application
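Chaining these steps is where a detail of Mahimahi matters. As far as I can tell, it runs the command it is given directly rather than through a shell. If the chain is typed unquoted after mm-delay, the calling shell consumes the && and runs the later steps outside the Mahimahi shell, after it exits; if it is quoted, Mahimahi just receives && as an ordinary argument. The sketch below uses a hypothetical stand-in, run_direct, that executes its arguments the same way, to show why wrapping the chain in bash -c fixes this.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an exec-style launcher such as mm-delay: it runs
# its arguments as a command directly, with no shell parsing in between.
run_direct() { "$@"; }

# Quoted "&&" reaches the command as a plain argument: prints one line,
# "step3 && echo step4".
run_direct echo step3 '&&' echo step4

# Wrapped in bash -c, a shell interprets "&&": prints "step3", then "step4".
run_direct bash -c 'echo step3 && echo step4'
```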

Task 1 is straightforward and can be run as a separate command. However, when we start the first Mahimahi shell (step 2), we need to combine steps 3, 4, and 5 into a single command and pass it to Mahimahi. That is, the first Mahimahi shell needs to set up its iptables rule, start another Mahimahi shell, and tell that shell to start our application. The hack is to write tasks 3, 4, and 5 as a one-line Bash script and run it with bash -c. You cannot simply pass the one-liner to Mahimahi, because Mahimahi runs its command directly rather than through a shell, so it does not understand Bash syntax such as &&. The following command switches to a dedicated user I created for experiments and performs tasks 1-5 as that user. In it, %s stands for the host's public IP address (1.2.3.4 above), and ens3 is the public interface on my machine (eth0 above). I inserted newlines for readability; remove them before running the command, since, for example, a newline after mm-delay 80 would end that command early.

su - test -c "
    sudo iptables -A PREROUTING -i ens3 -t nat -p udp -d %s --dport 9000 -j DNAT --to-destination 100.64.0.2 && 
    mm-delay 80
        bash -c "\""
            sudo iptables -A PREROUTING -i ingress -t nat -p udp -d 100.64.0.2 --dport 9000 -j DNAT --to-destination 100.64.0.4 &&
            mm-link /tmp/linkfile /tmp/linkfile --
                sudo <your-application> --listen-to 0.0.0.0:9000 &> /home/test/your.log
        "\"
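Writing the nested quotes by hand is error-prone. Because mm-delay and mm-link take the command to run as separate arguments, only two strings actually need to be built: the script for bash -c and the whole command for su -c. A minimal sketch that builds both with Bash's printf %q, which quotes each word so that Bash parses it back into the same words. It prints the command instead of running it. Every value, including the application path and the helper name q, is a hypothetical placeholder mirroring the command above, and the sketch assumes the test user's login shell is Bash.

```shell
#!/usr/bin/env bash
# Assemble the nested su / mm-delay / bash -c / mm-link command without
# hand-written quotes. All values below are placeholders.
PUBLIC_IF=ens3                # host's public interface
PUBLIC_IP=1.2.3.4             # host's public address
PORT=9000
OUTER_ADDR=100.64.0.2         # first shell's address (observed, not guaranteed)
INNER_ADDR=100.64.0.4         # nested shell's address (observed, not guaranteed)
APP=(sudo /path/to/your-application --listen-to "0.0.0.0:$PORT")  # hypothetical
LOG=/home/test/your.log

# Shell-quote each argument so that Bash parses it back into the same words.
q() { printf '%q ' "$@"; }

# Steps 3-5: the script that bash -c runs inside the first (delay) shell.
inner="sudo iptables -t nat -A PREROUTING -i ingress -p udp -d $OUTER_ADDR --dport $PORT -j DNAT --to-destination $INNER_ADDR && $(q mm-link /tmp/linkfile /tmp/linkfile -- "${APP[@]}") &> $LOG"

# Steps 1-2: the command su runs as the test user; mm-delay gets bash -c "$inner".
outer="sudo iptables -t nat -A PREROUTING -i $PUBLIC_IF -p udp -d $PUBLIC_IP --dport $PORT -j DNAT --to-destination $OUTER_ADDR && $(q mm-delay 80 bash -c "$inner")"

# Only print the final command; run it yourself once it looks right.
printf 'su - test -c %q\n' "$outer"
```

Generating the command this way also makes it easy to vary the delay, trace files, or port from an experiment script without re-balancing quotes.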