In a recent research project, I needed to emulate various network conditions on a per-link basis in a peer-to-peer network. This blog post documents how I achieved this using Mahimahi, and the gotchas I ran into along the way.
The need to emulate different kinds of network conditions has existed for a long time. People want it for various reasons: testing new congestion control algorithms, evaluating application performance under harsh network conditions, and so on. Specifically, we want to artificially set the bandwidth and the delay of a link so as to emulate the behavior of another, usually worse-performing, network link.
Tools for this have existed since as early as 1997 [1]. In today's Linux, netem can be used to throttle bandwidth and add delay. However, in many situations, users want the ability to frequently change the simulated network condition, ideally according to a pre-recorded trace. Mahimahi is a tool that targets this use case. It spawns shells in which all network traffic to or from applications inside the shell is subject to user-controlled queuing (for bandwidth throttling and fair sharing) and delay. At a high level, Mahimahi creates a private network segment for each of these shells and works like a virtual router so that the shells can talk to the outside. Mahimahi starts the shell (and the applications inside it) in a separate network namespace (NS hereafter) and creates a dummy network interface for this NS, so that all traffic within the NS goes through this dummy interface. This interface is then attached to Mahimahi, which queues and delays the traffic. Mahimahi creates another interface in the "parent" NS and channels the traffic there, and configures iptables in the parent NS so that the parent NS performs NAT for the child NS. The following figure summarizes how Mahimahi encapsulates traffic inside the NS.
This is a very nice architecture for server-client applications, such as web browsing, video streaming, or simply testing end-to-end congestion control algorithms. However, one thing it cannot handle out of the box is peer-to-peer applications. The reason is simple: Mahimahi relies on NAT, and processes behind NAT cannot be reached from the outside. The process itself has to initiate the connection. This is exactly why people have a hard time deploying P2P applications on today's internet: almost all personal devices are behind NAT! Your home router, your ISP, or your company IT all do this, for various reasons. When a device is behind NAT but wants to listen on a network port (so as to provide some service or allow peers to connect to it), we need to configure port forwarding on the gateway, manually telling it to open a port on behalf of the listening device and to rewrite the IP packet destination so that all traffic on this port is forwarded to that device.
Clearly, we need to do the same thing so that a process inside the Mahimahi shell can accept incoming connections. The situation is complicated a little by the fact that Mahimahi is designed to be composed. Specifically, Mahimahi provides three types of shells: the delay shell, which delays packets; the link shell, which applies custom bandwidth throttling and queuing; and the loss shell, which randomly drops packets. To emulate sophisticated network behaviors, users are supposed to nest Mahimahi shells. For example, one could nest a link shell inside a delay shell to emulate a link with limited bandwidth and long propagation delay. What this means for our problem is that we need to configure port forwarding at every layer of nested Mahimahi shells.
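To make the link shell a bit more concrete: as far as I understand Mahimahi's trace format, each line of a trace file is a millisecond timestamp at which one 1500-byte packet may be delivered, so a constant 12 Mbit/s link corresponds to one line per millisecond. The sketch below generates a constant-rate trace under that assumption. The name make_trace is hypothetical and not part of Mahimahi.

```shell
#!/bin/sh
# make_trace RATE_MBPS LENGTH_MS
# Hypothetical helper (not part of Mahimahi). Prints a constant-rate trace,
# assuming each line is a millisecond timestamp that allows one 1500-byte
# packet (12000 bits), so 12 Mbit/s corresponds to one line per millisecond.
make_trace() {
    awk -v rate="$1" -v len="$2" 'BEGIN {
        sent = 0
        for (t = 1; t <= len; t++) {
            due = int(t * rate / 12)   # packets that should be allowed by time t
            while (sent < due) { print t; sent++ }
        }
    }'
}

# 6 Mbit/s for 4 ms: one delivery opportunity every other millisecond.
make_trace 6 4
```

Redirecting the output to a file, for example make_trace 12 1000 > /tmp/linkfile, gives a trace that can serve as both the uplink and the downlink argument of mm-link. Nesting is then a matter of chaining the commands, for example mm-delay 40 mm-link /tmp/linkfile /tmp/linkfile -- bash.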
To configure port forwarding, we need to install a DNAT rule in iptables. Suppose the host has address 18.104.22.168 on its public interface eth0, and we want to open UDP port 9000 on it (I was using QUIC in the experiments) and forward the traffic to our Mahimahi shell. From what I observed, Mahimahi assigns 100.64.0.2 to the first Mahimahi shell. So on the host (not within any Mahimahi shell), we need to run
iptables -A PREROUTING -i eth0 -t nat -p udp -d 18.104.22.168 --dport 9000 -j DNAT --to-destination 100.64.0.2
Then suppose we start a nested Mahimahi shell inside the first shell. From what I observed, the nested shell gets IP 100.64.0.4, and the dummy interfaces are always called ingress. So inside the first shell, we need to run
iptables -A PREROUTING -i ingress -t nat -p udp -d 100.64.0.2 --dport 9000 -j DNAT --to-destination 100.64.0.4
Note that we run this inside the first shell. This is because each NS has its own iptables rules, and we want to configure the rules of the first Mahimahi shell so that it forwards traffic to the nested shell.
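The two rules differ only in the incoming interface, the matched destination, and the forwarding target, so they can be generated by one helper. The sketch below only prints the commands so that they can be inspected first. The name dnat_rule is hypothetical, and the addresses are the ones observed above, which may differ on other setups.

```shell
#!/bin/sh
# dnat_rule IFACE DST_IP PORT TARGET_IP
# Hypothetical helper: prints (does not run) the iptables command that
# forwards UDP traffic arriving on IFACE for DST_IP:PORT to TARGET_IP.
dnat_rule() {
    printf 'iptables -t nat -A PREROUTING -i %s -p udp -d %s --dport %s -j DNAT --to-destination %s\n' \
        "$1" "$2" "$3" "$4"
}

# Rule for the host: public interface to the first shell.
dnat_rule eth0 18.104.22.168 9000 100.64.0.2
# Rule for the first shell: its ingress interface to the nested shell.
dnat_rule ingress 100.64.0.2 9000 100.64.0.4
```

Each printed line has to be run as root in the matching namespace. Running iptables -t nat -L PREROUTING -n -v there afterwards should list the rule together with its packet counters, which is handy for checking whether traffic actually hits it.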
You may test this setup by starting an iperf server on port 9000 inside the inner shell (in UDP mode, since the rules above only forward UDP) and trying to connect to it at 18.104.22.168:9000. If it works, then you have succeeded. The remaining question is: how do we automate this process so that we can use this setup in our experiment scripts? Achieving that requires a small hack. Mahimahi accepts, as command line arguments, a command to be executed when the shell starts. For example, you can use the following command to start a wget as soon as the delay shell starts
mm-delay 80 wget 'google.com'
Nothing fancy here, right? However, we need to complete a chain of tasks to make our setup work:
1. Set iptables on the host
2. Start the first Mahimahi shell
3. Set iptables inside the first Mahimahi shell
4. Start the second Mahimahi shell
5. Start our application
Here, task 1 is straightforward and can be done as a separate command. However, when we start the first Mahimahi shell (task 2), we need to concatenate tasks 3, 4, and 5 into a single command and pass it to Mahimahi. That is, the first Mahimahi shell needs to set the iptables rule, start another Mahimahi shell, and tell that shell to start our application. The hack here is to compose tasks 3, 4, and 5 as a one-line Bash script and run it using bash -c. Note that you cannot simply pass the one-liner to Mahimahi, since Mahimahi does not know how to interpret Bash scripts. The following command switches to a special user I created for experiments and performs tasks 1-5 as that user. Here ens3 is the public interface of the machine (eth0 in the example above), and %s is a placeholder for the host's public address. Note that I inserted newlines for readability, and you should remove them before passing it into Bash.
su - test -c "
sudo iptables -A PREROUTING -i ens3 -t nat -p udp -d %s --dport 9000 -j DNAT --to-destination 100.64.0.2 &&
mm-delay 80 bash -c "\""
sudo iptables -A PREROUTING -i ingress -t nat -p udp -d 100.64.0.2 --dport 9000 -j DNAT --to-destination 100.64.0.4 &&
mm-link /tmp/linkfile /tmp/linkfile -- sudo <your-application> --listen-to 0.0.0.0:9000 &> /home/test/your.log
"\"
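When the command is assembled by a script, building it from the inside out keeps the quoting manageable. The following is a minimal sketch of one way to do that, under a few assumptions: the function names and the APP value are hypothetical, the inner command is wrapped in single quotes rather than escaped double quotes, and the application command therefore must not contain single quotes itself. The script only prints the composed command.

```shell
#!/bin/sh
# Hypothetical helpers; addresses and interface names are those used in this post.
HOST_IF=ens3
HOST_IP=18.104.22.168
PORT=9000
OUTER_IP=100.64.0.2   # first Mahimahi shell (as observed)
INNER_IP=100.64.0.4   # nested Mahimahi shell (as observed)

# Tasks 3-5: this part runs inside the first shell.
inner_cmd() {
    printf 'sudo iptables -t nat -A PREROUTING -i ingress -p udp -d %s --dport %s -j DNAT --to-destination %s && mm-link /tmp/linkfile /tmp/linkfile -- %s' \
        "$OUTER_IP" "$PORT" "$INNER_IP" "$1"
}

# Tasks 1-2, with tasks 3-5 handed to the delay shell through bash -c.
full_cmd() {
    printf "sudo iptables -t nat -A PREROUTING -i %s -p udp -d %s --dport %s -j DNAT --to-destination %s && mm-delay 80 bash -c '%s'\n" \
        "$HOST_IF" "$HOST_IP" "$PORT" "$OUTER_IP" "$(inner_cmd "$1")"
}

APP='sudo <your-application> --listen-to 0.0.0.0:9000'
full_cmd "$APP"
```

Once the printed string looks right, it can be passed on as su - test -c "$(full_cmd "$APP")".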