Going from Static Routing to Dynamic Routing from Edge to Towers (Part 1)

Setting up BGP on the edge.

So, this series covers my thoughts, preparations and experiences converting a network that was statically routed from customer to upstream. By the end, we had BGP running on our two core routers, isolated layer 2 links between routers, and eBGP managing routes to and from my core and towers. I also did most of this work during normal operating hours. I don’t recommend that, but I managed it without significant interruptions to my customers, and I was able to keep a sane sleep schedule.

A disclaimer: I am not recommending that anyone follow what I did at any point in these articles. There are many good options, and arguably some of my solutions are not among them, but I feel the experience was good for learning and for demonstrating cool things about networking. I will try to mention other good ideas as I go along and point out where my decisions might not be best for everyone. Second, I won’t be going into the exact details of my config, unless it illustrates what I

Situation and Goals

Now that we have the disclaimers out of the way, I want to give you a brief overview of how the network was set up and why I felt, and my boss knew, that we needed to change how we tell our routers to route. This network has been built over 12 years. It started out using hacked Linksys WiFi routers as both access points and routers, and there have been several intervening steps since, but 8 years ago we converted almost entirely to MikroTik routers at both our towers and our core network, and that is what I have to work with today. Each tower runs a CCR-1009, and in our core we have two CCR-1036 routers, one for each of our upstreams. The simplified diagram shows mostly the layer 2 links across the network. What is missing are lots of customer links inside the massive layer 2 network and various bonded links that are not really important to this discussion.

A VERY simplified look at the network I was working with. Customers were connecting into lots of different parts of this network, and it was much more complicated than this, though most were behind tower routers like RO, KB, PV and HU.
Unless otherwise noted, most links were wireless.

This network also came with massive firewall lists: a year ago we had ~500 firewall rules in each of the core routers, with very little documentation, and some of the rules had been written without an understanding of how the MikroTik firewall actually works. I have been paring them down to a more manageable number, but doing so takes a massive amount of effort and stress. At some point I will write up some notes on that work, but this article is going to be long enough as it is. In short, no change could be implemented without a lot of testing to figure out which rules were breaking things.

My goals were pretty simple: increase network redundancy; simplify the network layout (more PTP layer 2 connections, and few or no large layer 2/3 networks connecting my core to towers); get more benefit from having multiple connections out to the internet; be able to use more types of connectivity; and have a more flexible network architecture for growth.

With those requirements, my only real option for my edge routers was to get a public AS number, set up eBGP to each of my upstreams, and run iBGP between my core routers.

Actually implementing BGP:

So, my first step was to get BGP running on our “core routers”. To do that I had to do a lot of learning, since I knew nothing beyond generalities. Luckily for me, the resources of the internet are plentiful. I want to send a huge thanks to Greg Sowell and his BGP lab tool: being able to get a full internet feed was invaluable for understanding the breadth of effects on my network. It also let me test my filtering rules and my overall BGP setup, and it exposed lots of flaws in my assumptions. Unfortunately, my test lab did not have the same equipment as my production network (it actually couldn’t take a single full BGP feed), and the sheer number of managed switches and different types of links made duplicating the network, even roughly, impossible on my budget. So testing in a lab only had so much benefit. Nonetheless, it let me get the basics figured out and made the implementation in the real network go much more smoothly.

Next up was to get my company an AS number from ARIN. (This is not technically necessary: I could have worked out a private AS with each of my upstreams and gotten similar results, but that won’t work in all situations.) Then I contacted my upstreams and let them know what I was doing. They had us fill out some paperwork, which was pretty straightforward and probably the easiest portion of this whole process. My upstreams each sent me the information I would need to set up the BGP session and requested the info they needed from me (the port for BGP and other basic details).
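If you go the private-AS route instead, the number has to come from the ranges RFC 6996 reserves for private use: 64512 through 65534 for 2-byte ASNs and 4200000000 through 4294967294 for 4-byte ASNs. The Python helper below is a hypothetical sanity check you might run before putting an ASN into a config; the name `is_private_asn` is made up for illustration, and the sketch deliberately ignores other special-purpose ranges such as documentation ASNs and AS_TRANS.

```python
# Private-use ASN ranges from RFC 6996 (Python ranges exclude the stop value).
PRIVATE_ASN_RANGES = (
    range(64512, 65535),            # 64512 through 65534 (2-byte private)
    range(4200000000, 4294967295),  # 4200000000 through 4294967294 (4-byte private)
)


def is_private_asn(asn: int) -> bool:
    """Hypothetical helper: True if asn falls in an RFC 6996 private-use range."""
    if not 0 <= asn <= 4294967295:
        raise ValueError(f"ASN out of 32-bit range: {asn}")
    return any(asn in r for r in PRIVATE_ASN_RANGES)


for asn in (64512, 65535, 4200000001):
    print(asn, is_private_asn(asn))
```

Routes carrying private ASNs are not supposed to be propagated to the public internet, so an upstream using this approach typically strips the private AS before re-announcing your prefixes.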

You don’t really just “turn on” BGP; you have to work with your upstreams and schedule a time for both of you to be on the phone as you turn up the connection. Nobody likes doing work during service windows (aka the middle of the night), so we didn’t. Honestly, if you do your prep work right you are not going to break anything; the rest is handled by your routers. The biggest risk is your filters, or I guess your router could tip over while loading the full BGP table, but you should be able to test and verify ahead of time that your router can do its job.

The transitions went pretty smoothly on my end, all things considered. I prepared ahead of time by testing my config and making sure the general details were set correctly. Even so, my upstream and I confirmed the settings with each other just before we turned everything on. Note that we decided to run two separate BGP sessions, one for IPv4 and one for IPv6. It wasn’t necessary, but it felt like a good idea in case one protocol acted up.

Things I learned:

Routers send traffic along the most specific (longest-prefix) active route that contains the destination IP address. That means you can keep your default gateway route up on your routers while BGP loads its routes: the router keeps pushing traffic out the default route, and as BGP installs more specific routes into the table, traffic for those prefixes follows them instead. This can be handy if your routers take a while to load all their BGP routes. Let your upstream know if you are keeping the default route.
BGP probably doesn’t work the way you think it does, especially in your particular router manufacturer’s implementation. Keep a close eye on the documentation and on how routes are loaded.
Filter, then test your filters to make sure they are working. I won’t repeat what has already been said many times about what to filter and why.
It’s totally fine to bring up your BGP session with your incoming and outgoing filters set to block everything at first, including your own subnets. That lets you verify that the BGP session comes up, then move on to allowing your subnets out and letting routes in. Then just verify with your upstream that they see the routes coming from you. You might also consider initially blocking every prefix more specific than a /22 or /23, and then, once you have verified everything is working, allowing more and more of the internet in. (If you do block everything more specific than a /22 or /23, make sure you have a default route set up: some networks are only announced as /24s, and without a default route they become unreachable.)
If you have not dealt with ARIN (or whichever regional registry covers you) before, their rules may look hard to understand at first, and there is a lot of old, outdated or incorrect information out there. If you are unsure, it helps to chat with someone who deals with them regularly, or you can just ask them yourself.
You are not the only one who can screw up a config, so be very clear with your upstream about what you are doing. Mine forgot a step and broke their network a bit when I started announcing subnets they had been announcing for me. There isn’t a lot you can do about that, and don’t be a jerk about it, but clear communication might help.
Watch the bandwidth going through your routers when you make any significant change; a substantial drop normally means something has gone wrong. My upstream saw a drop and figured it was a blip, and that blip turned into a multi-hour headache.
There are a bunch of online “looking glasses”, which are basically services run by companies that let you check out their routing table and see what they see. It’s pretty fun to see your subnets propagating across the internet, and it’s also a useful way to confirm that your upstream is actually announcing all of your subnets correctly.
If your router supports BFD, you might consider using it for your BGP session. Ask your upstream provider whether they support BFD and will turn it on for your connection. (In my experience, unless you are using an unreliable link, you will see benefits from using BFD.)
It’s a good idea to separate different protocols into different sessions, I separated IPv4 and IPv6 into their own sessions. This “could” allow one of the protocols to continue working if there was an issue with the other session/protocol. Not a lot of backup, but it does give you a little bit of protection and some flexibility for future configs. (Thanks Nick Buraglio for the tip)
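The longest-prefix-match rule and the “start with conservative inbound filters” advice work together: a route your filter drops is still reachable through the default route you kept up. The Python sketch below models that interaction; it is not router configuration, and the `lookup` and `accept_inbound` helpers, the /23 cutoff, the peer names and all the prefixes are hypothetical, chosen only to mirror the suggestions above. Real routers also weigh things like administrative distance and BGP path attributes, which this ignores.

```python
import ipaddress
from typing import Iterable, Optional

Net = ipaddress.IPv4Network


def lookup(table: dict[Net, str], dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.IPv4Address(dst)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def accept_inbound(prefix: Net, own: Iterable[Net], max_len: int = 23) -> bool:
    """Hypothetical inbound filter: reject our own space and anything more specific than max_len."""
    if any(prefix.overlaps(o) for o in own):
        return False
    return prefix.prefixlen <= max_len


# Made-up prefix standing in for "our own subnets".
OWN = [ipaddress.IPv4Network("203.0.112.0/22")]

# The static default route stays in place while BGP loads.
table: dict[Net, str] = {ipaddress.IPv4Network("0.0.0.0/0"): "static-default"}

# Made-up routes "learned" from upstreams, paired with the peer that sent them.
learned = [
    (ipaddress.IPv4Network("198.18.0.0/15"), "upstream-A"),    # accepted
    (ipaddress.IPv4Network("198.51.100.0/24"), "upstream-A"),  # rejected: more specific than /23
    (ipaddress.IPv4Network("203.0.113.0/24"), "upstream-B"),   # rejected: inside our own space
]

for prefix, peer in learned:
    if accept_inbound(prefix, OWN):
        table[prefix] = peer

print(lookup(table, "198.18.5.9"))    # -> upstream-A (the learned /15 beats the default)
print(lookup(table, "198.51.100.7"))  # -> static-default (its /24 was filtered out)
```

Raising `max_len` step by step (23, then 24) is one way to picture “allowing more and more of the internet in” once everything checks out.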

Once I got my two core routers running eBGP to my upstreams, I didn’t see many big changes in my network performance, as expected: effectively, I had not changed anything about the flow of my network besides how traffic could get back to me. There were some changes, though: slightly lower latency to a couple of services and slightly better utilization of one of my upstreams’ bandwidth to me.

My next step was to connect my two edge routers together and have them share routes with each other so each could send traffic out the better path (in other words, set up iBGP). I also needed to decide how to connect my towers back to my core network so they could keep running if one of my core routers went down.