The 80 Gbps barrier has finally been broken (and yes we are rounding up) !!!!
Well at least it has been reached by someone other than MikroTik. It’s taken us quite a while to get all the right pieces to push 80 Gbps of traffic through the CC1072 but with the latest round of servers that just got delivered to our lab, we were able to go beyond our previous high water mark of 54 Gbps all the way to just under 80 Gbps. There have been a number of questions about this particular router and what the performance will look like in the real world. While this is still a lab test, we are using non-MikroTik equipment and iperf which is considered an extremely accurate performance measuring tool for TCP and UDP.
Video of the CCR1072-1G-8S+ in action (Turn up your volume to hear the roar of the ESXi servers as they approach 80 Gbps)
How we did it – The Hardware
CCR1072-1G-8S+ – Obviously you can’t have a test of the CCR1072 without one to test on. Our CCR1072-1G-8S+ is a pre-production model so there are some minor differences between it and the units that are shipping right now.
HP DL360-G6 – It’s hard to find a better value in the used server market than the HP DL360 series. This series of server can be found on ebay for under $500 USD in general and makes a great ESXi host for development and testing work.
Intel x520-DA2 10 Gbps PCI-E NIC – This NIC was definitely not the cheapest 10 gig NIC on the market, but after some research, we found it was one of the most compatible and stable across a wide variety of x86 hardware. The only downside to using this particular NIC is the requirement to use INTEL branded SPF+ modules. We tried several types of generics and after reading some forums posts, we were able to figure out that it only takes a few different types of INTEL SFP+ modules. This didn’t impact performance in any way but doubled the cost of the SFPs required.
How we did it – The Software
RouterOS – We tried several thoughought the testing cycle, but settled on 6.30.4 bugfix.
iperf3 – If you need to generate traffic for a load test, this is probably one of the best programs to use.
VM Ware ESXi 6.0 – There were certainly a fair amount of hypervisors to choose from, and we even considered a Docker deployment for the performance advantages but in the end, VM Ware was the easiest to deploy and manage since we needed some flexibility in the vswitch and 9000 MTU.
CentOS 6.6 Virtual Machines – We chose CentOS as the base OS for TCP throughput testing because it supports VM Ware tools and the VMXNET3 paravirtualized NIC. It is not possible to get a VM to 10 Gbps of throughput without using a paravirtualized NIC so this limits the range of Linux distros to mainstream types like Ubuntu and CentOS. Also, we tend to use CentOS more than the other flavors of linux and so it allowed us to get the VMs ready quickly to generate traffic.
Network Topology for the test – We essentially had to put two VLANs on every physical interface so that the link could be fully loaded by using the 1072 to route between VLANs. A VM was built on each VLAN and tagged in the vswitch for simplicity. In earlier tests, we tried to use LACP between ESXi and the 1072 but had problems loading the links fully with the hashing algorithms we had available.
RouterOS Config for the test
Conclusions, challenges and what we learned
The CCR1072-1G-8S+ is an extremely powerful router and had little difficulty in reaching full throughput once we were able to muster enough resources to load all of the links. As RouterOS continues to develop and moves into version 7, this router has the potential to be extremely disruptive to mainstream network vendors. At $3,000.00 USD list price, this router is ab absolute steal for the performance, 1U footprint and power consumption. Look for more and more of these to show up in data centers, ISPs and enterprises.
There have been a number of things that we have had to work through to get to 80 Gbps but we finally got there.
- VMWARE ESXi – LACP Hashing – Initially we built LACP channels between the ESXi hosts and the 1072 expecting to load the links by using multiple source and destination IPs but we ran into issues with traffic getting stuck on one side of the LACP channel and had to move to independent links.
- Paravirtualized NIC – The VM guest OS must support a paravirtualized NIC like VMXNET3 to reach 10 Gbps speeds.
- CPU Resources – TCP consumes an enormous amount of CPU cycles and we were only able to get 27 Gbps per ESXi host (we had two) and 54 Gbps total
- TCP Offload – Most NICs these days allow for offloading of TCP segmentation by hardware in the NIC rather than the CPU, we were never able to get this working properly in VM WARE to reduce the load on the hypervisor host CPUs.
- MTU – While the CCR1072 supports 10,226 bytes max MTU, VMWARE vswitch only supports 9000 so we lost some potential throughput by having to lower the MTU by more than 10%
What we learned
- MTU – In order for the CCR1072-1G-8S+ (and really any MikroTik) to reach maximum throughput, the highest L2/L3 MTU (we used 9000) must be used on:
- The CCR
- The Guest OS
- The Hypervisor VSWITCH
- Hypervisor Throughput – Although virtualization has come a long way, there is still a fair amount of performance lost especially in the network stack when putting a software layer between the Guest OS and the Hardware NIC.
- CPU Utilization with no firewall rules or queues is very low. We haven’t had a chance to test with queues or firewall rules but CPU utilization never went above 10% across all cores.
- TCP tuning for 10 Gbps in the Guest OS is very important – see this link: https://fasterdata.es.net/host-tuning/linux/
Please comment and let us know what CCR1072-1G-8S+ test you would like to see next!!!