WISP Design – Migrating from Bridged to Routed

TFW adding the 301st subscriber to your bridged WISP….

 

 

Why are bridged networks so popular?

  • Getting an ISP network started can be a daunting task, especially if you don’t have a networking background.
  • Understanding L1/L2/L3 is not easy – I spent a number of years working in IT before I really started to grasp concepts like subnetting, the OSI model and Layer 2 vs. Layer 3. It takes a while.
  • Bridged networks are very attractive when first starting out. No subnetting is required, and the entire network can be NATted out an upstream router with minimal configuration (see the sketch below).
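
As an illustration, here is roughly what that minimal setup looks like in MikroTik RouterOS (v6 syntax). The interface names and addressing below are invented for the example, not a recommendation:

/interface bridge
add name=bridge-wisp
/interface bridge port
add bridge=bridge-wisp interface=ether2
add bridge=bridge-wisp interface=wlan1
# one flat subnet for the entire network, NATted out the upstream port
/ip address
add address=10.0.0.1/24 interface=bridge-wisp
/ip firewall nat
add chain=srcnat out-interface=ether1 action=masquerade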

 

What does a “bridged” network look like?

  • Bridged networks use a single Layer 3 subnet on one Layer 2 broadcast domain (typically built from switches and software/hardware bridges) that is extended to all towers in the WISP.
  • Bridging can be done with or without VLANs, but bridged WISP networks are most commonly untagged.
  • The diagram below is a very common example of a bridged WISP network.

 

What is the difference between switching and bridging?

These days, there isn’t much difference between the two terms. “Switch” is a marketing term for a multiport, hardware-accelerated bridge that became popular in the 1990s to distinguish it from hubs, which did not separate collision domains. Both types can use VLANs and spanning tree, and both forward Layer 2 frames to multiple ports.

  • Bridging (Software) – Most radios run some variant of Linux and use a bridge that depends on the CPU to forward frames. Most routers will also allow ports to be bridged together in software; speed depends on system resources and load.
  • Bridging (Hardware) – Most commonly, you’ll find this in vendors like MikroTik and Juniper. Certain hardware models allow the bridge to be offloaded to hardware instead of the CPU so that frames can be forwarded at wire speed.
  • Switching (Hardware) – This category includes nearly all Ethernet switches. Frames are forwarded using ASICs that depend on a CAM table to hold MAC addresses.
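
To make the software vs. hardware distinction concrete, here is a hedged RouterOS example (v6.41+ bridge syntax); whether offloading actually happens depends on the switch chip in the particular model:

/interface bridge
add name=bridge1
/interface bridge port
add bridge=bridge1 interface=ether2 hw=yes
add bridge=bridge1 interface=ether3 hw=yes
# "/interface bridge port print" shows an "H" flag on ports the switch chip is forwarding;
# ports without the flag fall back to CPU (software) bridging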

 

What are some of the issues with bridged networks?

 

  • Broadcast – One L2 broadcast will pass through every radio and backhaul in a WISP network.
  • ARP traffic – Also an L2 broadcast type; ARP storms and heavy ARP traffic can easily cripple a bridged network.
  • MAC table size limitations – Some equipment types have limited MAC table sizes. In CPU-based equipment, the limitation is often either RAM or default settings in the Linux bridge, which is typically limited to 1024 entries per bridge.
  • Scale – Typically, anything beyond a /24 (254 hosts) will start to have issues without a number of L2 enhancements like client isolation, MAC filtering, etc. At some point those solutions don’t scale.
  • Subnetting – Ideally you don’t want multiple subnets on the same broadcast domain, for security and for isolation of failure domains.
  • Performance – Most routers are more efficient at routing packets than at bridging packets in a larger network.
  • Security – Implementing security policies and isolating customers and protected infrastructure is much easier at L3.

 

How does routing help?

Routing separates broadcast domains

Here is what a single broadcast domain looks like in a bridged network

 

 

And to compare, here is the same network but routed

 

How do I migrate?

 

Answer: Patience and planning (and VLANs)

The question you’ve probably been waiting for: how do I migrate? The dirty little secret is that you don’t have to migrate all at once.

There are a few different ways you can use VLANs to migrate the network one tower at a time.
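
As a rough sketch of the idea, a converted tower can keep the legacy bridged domain alive on a tagged VLAN while its new routed uplink rides a second VLAN on the same backhaul. The VLAN IDs, interface names, and addressing below are made-up examples:

# on a tower router that has already been converted to routed
/interface vlan
add interface=ether1 name=vlan300-legacy vlan-id=300
add interface=ether1 name=vlan310-transit vlan-id=310
/interface bridge
add name=bridge-legacy
/interface bridge port
add bridge=bridge-legacy interface=vlan300-legacy
add bridge=bridge-legacy interface=ether2
# ether2 feeds downstream towers still on the old bridged design
/ip address
add address=10.255.0.2/30 interface=vlan310-transit
# new routed uplink back toward the core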

 

Where do I start?

 

  • Prep work – Be sure to back up all configs and, if possible, put together a diagram of your network that includes physical connections and VLANs/subnets where applicable. This can be done with Visio, LucidChart, or network mapping software.
  • Pick a good time – Look at your monitoring software and pick a day and time that represents your lowest volume of traffic
  • Be realistic about time – If you think it will take 1 hour, plan for 4 – you’ll be amazed how often 1 hour turns into 4 🙂
  • Have a rollback plan – Understand what steps you need to take to roll back – even better, write it down!

 

Types of migration

 

Type 1 – Last mile back to the core – start at the very end of a chain of towers and work your way back in, one tower at a time (a config sketch follows the list below).

  • Benefits
    • Lower risk, only affecting one or two towers at a time at the end of a chain of towers.
    • Doesn’t specifically require VLANs in some network topologies but they are still recommended
    • Easy rollback, if it doesn’t work, replace the original config and analyze what went wrong
    • If you’re successful, you can move to the next closest tower and repeat the process, which continues to shrink the broadcast domain. This also has the side benefit of helping to stabilize your bridged network as you migrate by making it smaller.
  • Drawbacks
    • Distance – in some networks, getting all the way out to the edge during a late night maintenance window can be a challenge
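
Here is a sketch of what the end-of-chain tower might look like once converted, assuming OSPF is the IGP; the addressing and interface names (including the customer-facing bridge) are invented for the example:

# end-of-chain tower after conversion (example addressing)
/ip address
add address=10.20.30.1/24 interface=bridge-customers
# new per-tower subscriber subnet
add address=10.255.1.2/30 interface=ether1
# routed backhaul toward the next tower back in the chain
/routing ospf network
add network=10.20.30.0/24 area=backbone
add network=10.255.1.0/30 area=backbone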

 

 

Type 2 – Core out to the last mile – start at the core or where your bandwidth comes in (often the same place) and work your way out, one tower at a time (a core-side sketch follows the list below).

  • Benefits
    • Physically closer to migrate the first hop
    • Can use VLANs to keep the existing bridged topology but route to towers that are converted
    • Same as the first method: each successful tower migration shrinks the broadcast domain, which also helps to stabilize your remaining bridged network as you migrate by making it smaller.
  • Drawbacks
    • Risk – if you have an issue with the first hop, you may take down more towers than the one or two affected with the first method.
    • Requires more config – you need to preserve the legacy broadcast domain through converted towers and then go back and clean it up later.
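
A core-side sketch of this approach, with the legacy gateway moved onto a tagged VLAN and a routed transit VLAN to the first converted tower; the VLAN IDs and addresses are examples only:

# core router during a core-out migration
/interface vlan
add interface=ether2 name=vlan300-legacy vlan-id=300
add interface=ether2 name=vlan401-tower1 vlan-id=401
/ip address
add address=10.0.0.1/22 interface=vlan300-legacy
# old bridged gateway, now on a tagged interface
add address=10.255.2.1/30 interface=vlan401-tower1
/ip route
add dst-address=10.20.31.0/24 gateway=10.255.2.2
# static route to tower 1's new subscriber subnet (or run OSPF instead)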

 

 

Type 3 – Build L3 to a new tower from the core – If you happen to have a new tower build on deck that will connect directly back to the core (where the gateway for the bridged L3 network is), then you can build the new tower as L3. This helps you understand what’s involved; you can then use one of the previous two methods to migrate the rest of the network.

  • Benefits
    • Lowest risk option – building out new sites in a different design is one of the lowest risk ways to migrate
    • No need for rollback – it’s new and not in service, so if you don’t get it right on the first try, you can keep working on it
  • Drawbacks
    • There aren’t any major drawbacks to this approach, except that the rest of the network must still be migrated using one of the previous two methods.

 

 

Using switch-centric design to assist with migration

To pass VLANs and the legacy broadcast domain through the network more easily, consider putting all physical links into a switch at the core and at each tower instead of directly into the router.

This type of design makes operation of the WISP significantly easier, as adding new subnets and services doesn’t always require a trip to the tower to add cabling.

It also makes config migration easier when upgrading the tower router by putting most of the interface references into VLANs instead of physical interfaces.
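
A rough RouterOS sketch of the router side of that design, with every service on a VLAN riding one trunk port to the tower switch; the VLAN numbers and subnets are examples only:

/interface vlan
add interface=ether1 name=vlan110-mgmt vlan-id=110
add interface=ether1 name=vlan120-customers vlan-id=120
add interface=ether1 name=vlan130-backhaul-west vlan-id=130
/ip address
add address=10.30.110.1/24 interface=vlan110-mgmt
add address=10.30.120.1/24 interface=vlan120-customers
add address=10.255.3.1/30 interface=vlan130-backhaul-west
# adding a new subnet or service later is a switch-port VLAN change, not a new cable to the router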

Example of a switch-centric tower design

 

 

Closing thoughts…

This should help get you started down the road to migration. Using a virtual lab like EVE-NG or GNS3 to work through the concepts before you deploy in production is a good addition to the process.

Take your time, think through what you want to do, and write down your plan. You’ll often find gaps when you create the list of steps, and correcting them before the migration saves time.

Good luck!

 

Need help with your Migration? Call the WISP experts at IP ArchiTechs

https://iparchitechs.com/contact-us/

ISP Design – Building production MPLS networks with IP Infusion’s OcNOS.

Moving away from incumbent network vendors

 


 

One of the challenges service providers have faced in the last decade is lowering the cost per port or per megabit while maintaining the same availability and service levels.

Add to that the constant pressure from subscribers to increase capacity and meet the rising demand for real-time content.

This can be an especially daunting task when routers with the feature sets ISPs need cost an absolute fortune – especially as new port speeds are released.

Whitebox, also called disaggregated networking, has started changing the rules of the game. ISPs are working to figure out how to integrate and move to production on disaggregated models to lower the cost of investing in higher speeds and feeds.

Whitebox often faces the perception problem of being more difficult to implement than traditional vendors – which is exactly why I wanted to highlight some of the work we’ve been doing at iparchitechs.com integrating whitebox into production ISP networks using IP Infusion’s OcNOS.

Things are really starting to heat up in the disaggregated network space after the announcement by Amazon a few days ago that it intends to build and sell whitebox switches.

As I write this, I’m headed to Networking Field Day 18 where IP Infusion will be presenting and I expect whitebox will again be a hot topic.

This will be the second time IPI has presented at Networking Field Day but the first time that I’ve had a chance to see them present firsthand.

It’s especially exciting for me as I work on implementing IPI on a regular basis and integrating OcNOS into client networks.

 

What is OcNOS?


IP Infusion has been making network operating systems (NOS) for more than 20 years under the banner of its whitelabel NOS – ZebOS.

As disaggregated networking started to become popular, IPI created OcNOS, an ONIE-compatible NOS built with elements and experience from 20 years of software development on ZebOS.

There is a great overview of OcNOS from Networking Field Day 15 here:

 

What does a production OcNOS based MPLS network look like?

Here is an overview of the EVE-NG lab we built based on an actual implementation.

 

IPI-VPLS-2

Use case – Building an MPLS core to deliver L2 overlay services

Although certainly not a new use case or implementation, MPLS and VPLS are very expensive to deploy using major vendors and are still a fundamental requirement for most ISPs.

This is where IPI really shines, as it has feature sets like MPLS FRR, TE, and the newer Segment Routing for OSPF and IS-IS that can be used on a platform that is significantly cheaper than incumbent network vendors.

The cost difference is so large that ISPs are often able to buy switches with higher overall port speeds than they could from a major vendor. This in turn creates a significant competitive advantage, as ISPs can take the same budget (or less) and roll out 100 gig instead of 10 gig, as an example.

Unlike in enterprise networks, cost is consistently a significant driver when selecting network equipment for ISPs. This is especially true for startup ISPs that may be limited in the amount of capital that can be spent in a service area to keep ROI numbers relatively sane for investors.

Lab Overview

In the lab (and production) network we have above, OcNOS is deployed as the MPLS core at each data center and MikroTik routers are being used as MPLS PE routers.

VPLS is being run from one DC to the other and delivered via the PE routers to the end hosts.
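
As a very rough illustration of the PE side (not the actual lab configuration, which appears in the screenshots below), a MikroTik VPLS attachment might look something like this; the loopback addresses, VPLS ID, and interface names are invented for the example:

# one of the MikroTik PE routers (illustrative only, RouterOS v6 syntax)
/mpls ldp
set enabled=yes lsr-id=10.255.255.1 transport-address=10.255.255.1
/mpls ldp interface
add interface=vlan501-core
/interface vpls
add name=vpls-customer1 remote-peer=10.255.255.2 vpls-id=100:1 disabled=no
/interface bridge
add name=bridge-customer1
/interface bridge port
add bridge=bridge-customer1 interface=vpls-customer1
add bridge=bridge-customer1 interface=ether2
# ether2 is the customer-facing port; the VPLS tunnel bridges it to the far DC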

Because the port density on whitebox switches is so high compared to a traditional aggregation router, we could even use LACP channels if dark fiber was available to increase the transport bandwidth between the DCs without a significant monetary impact on the cost of the deployment.

The type of switches you’d use in production depends greatly on the speeds and feeds required, but for startup ISPs, we’ve had lots of success with the Dell 4048s and Edge-Core 5812.


How hard is it to configure and deploy?

It’s not hard at all!

If you know how to use the up and down arrow keys in the bootloader and TFTP/FTP to load an image onto a piece of network hardware, you’re halfway there!

Here is a screenshot of the GRUB bootloader for an ONIE switch (this one is a Dell), where you select which OS to boot the switch into:

ONIE GRUB

The configuration is relatively straightforward as well if you’re familiar with industry standard Command Line Interfaces (CLI).

While this lab was configured in a more traditional way using a terminal session to paste commands in, OcNOS can easily be orchestrated and automated using tools like Ansible (also presenting at Networking Field Day 18) or protocols like NETCONF as well as a REST API.

Lab configs

I’ve included the configs from the lab in order to give engineers a better idea of what OcNOS actually looks like for a production deployment.

IPI-MPLS-1

 

IPI-MPLS-2

 

IPI-MPLS-3

 

IPI-MPLS-4

 

 

MikroTik PE-1

 

 

 MikroTik PE-2