Going from Static Routing to Dynamic Routing, from Edge to Towers (Part 2)

In my last post I went over the issues I was facing and the first steps I took to implement BGP with my upstreams. In this post I will look more at the internal issues I was seeing and the decisions I made to handle them.

Game plan: Network Isolation, VLANs. Routing Protocol, eBGP:

First off, there are many different ways to skin this cat of dynamic routing. Many people would point to OSPF, MPLS, or some other system that gives a similar end product with better results for one reason or another. I am not recommending my solution to other people; I am merely documenting my thought process and the issues involved in switching over a network. The way I am running BGP isn’t ideal for many networks. OSPF was my initial choice, but because of the limits of my abilities, my networking hardware, and the time frame I was working with, I decided to go this route. So have mercy in your considerations: I don’t even have a degree in this stuff, and I am working with what I have.

A year ago I had experimented with implementing OSPF over the network. It took weeks of learning and testing, and I got some really good results, but when it came to actually implementing it for a single tower back to my core network, I couldn’t get it to work. So the operation was scrapped for another day.

So, I did seriously consider setting up OSPF instead of BGP, but:

I was gun-shy about trying to set it up again
I knew I had most of the knowledge already to do the project with BGP
I would lessen the amount of training I would have to do when showing someone else how the network runs
I didn’t see the benefits of OSPF being as helpful to my network. I mostly want the dynamic routing to pick the path I want and not use something else unless it must.
Finally, I had just recently been promoted and was given the task to make a decision quickly and implement it, so I did.

I considered looking at other protocols, but I didn’t feel I had time to start from scratch, and I certainly didn’t have time to both learn a protocol and learn how Mikrotik “implemented” it.

So I went back into my lab and emerged with a game plan, but I also realized something: I couldn’t do this in the middle of the night. There were at least four years of poorly documented firewall/NAT/route rules in each of a dozen routers, which I had been fixing but never had the time to clean up completely. There were routes that went places even I didn’t know, uncommented and undocumented firewall rules, NAT rules for IPs we never used, and some of the fiber was acting up. All of this meant I ended up having to do open heart surgery on the network. Never a good idea, but lots of fun.

Here’s what I was going for: a unique layer 2 between each router that had a direct connection to another router. Isolated layer 2s between each tower and its sub tower(s). Isolation anywhere and everywhere I could justify it. eBGP would be the system that enabled each router to know where and how to forward traffic. If I could just run a cable between routers, I pretty much did that, but most of the time I used VLANs to pass traffic and VLAN filtering on my switches to prevent unexpected traffic from moving around.

“eBGP?” you say. Yes: I wanted to control my network very tightly, and I didn’t want to have to set up a BGP peering between every single one of my routers or set up route reflectors in my network. Each tower would be treated like just another network entity on the internet: it talks to its peers and shares the routes that are connected or peered to it, although I didn’t want to push the entire internet routing table to each of my towers. I figured if eBGP works well enough for the internet, it would be fine for my uses. A side benefit was that as I added towers to the BGP network, I only had to change settings on routers that were going to be directly connected to each other. Note that to actually do eBGP, I used private AS numbers and had to make sure I didn’t leak them outside of my network, using aggregation on my edge/core routers.
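To make the "don’t leak private AS numbers" point concrete, here is a minimal Python sketch of the check involved. It is not how my routers do it (that is handled by aggregation on the edge); it only shows which AS numbers count as private, using the ranges reserved by RFC 6996. The function names and the sample AS path are made up for illustration.

```python
# Private-use AS number ranges reserved by RFC 6996 (16-bit and 32-bit).
PRIVATE_ASN_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_private_asn(asn: int) -> bool:
    """Return True if the AS number falls in a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


def leaked_private_asns(as_path):
    """Return the private AS numbers found in an AS path, in order.

    An empty result means the path is safe to announce to an upstream.
    """
    return [asn for asn in as_path if is_private_asn(asn)]


if __name__ == "__main__":
    # Hypothetical path: two private tower ASNs followed by a public ASN.
    print(leaked_private_asns([65010, 64999, 174]))
```

Any path that still contains one of these numbers when it reaches an upstream is a leak, which is why the private AS numbers have to disappear at the edge.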

Actual implementation, the Edge:

First off, I had to connect my two core routers together through iBGP (I still managed to find a spot for it), have them share all their routes with each other, and let them route things out whichever path was more efficient. This took forever, mostly because I didn’t understand exactly how Mikrotik implemented BGP. A side issue was that we were doing our NAT on our core/edge routers. While this was fine for our statically routed network, it needed tuning to work with the new dynamic routing system. (The edge routers were NATing to different IPs, which would cause issues for customers that have resources in different network ranges: a single customer might present as two different public IP addresses to a service if their traffic routed out both of our upstreams.)

The best option, and probably where we will end up in the future, would be to move where we are doing our NAT from the edge to a location slightly inside our network. Instead, what I did was to change our firewall rules from:

chain=srcnat action=src-nat to-addresses=X.X.X.17 src-address=172.16.4.0/24 out-interface=sfp1

To:

chain=srcnat action=src-nat to-addresses=X.X.X.17 src-address=172.16.4.0/24 dst-address-list=!Internal-Networks

I used an address list called “Internal-Networks” that listed every single subnet we used internally, a handy list I had previously created during the purge of my firewall rules. This way, the packet would get NATed and then sent out whichever upstream had the preferred route. I also did not need to buy or add more equipment to my network, always a plus in the boss’s book. Now, this isn’t as efficient as the old rule: every single connection has to be compared against the address list, and that is much slower than just checking which interface the connection is heading out of. But I have a lot of unused processing power on my core routers, so I made the sacrifice for now.
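For anyone who wants to see the logic of the new rule spelled out, here is a rough Python sketch of the decision it makes: NAT the connection if the source is in 172.16.4.0/24 and the destination is not in the internal list. The contents of INTERNAL_NETWORKS below are placeholders I picked for the example (my real “Internal-Networks” list is different), and the function name is hypothetical.

```python
import ipaddress

# Placeholder stand-in for the "Internal-Networks" address list (assumed values).
INTERNAL_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Source range from the src-nat rule above.
NAT_SOURCE = ipaddress.ip_network("172.16.4.0/24")


def should_src_nat(src: str, dst: str) -> bool:
    """Mimic: src-address=172.16.4.0/24 dst-address-list=!Internal-Networks."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip not in NAT_SOURCE:
        return False  # rule does not match this source at all
    # Match only when the destination is NOT in any internal subnet.
    return not any(dst_ip in net for net in INTERNAL_NETWORKS)


if __name__ == "__main__":
    print(should_src_nat("172.16.4.10", "8.8.8.8"))   # heading to the internet
    print(should_src_nat("172.16.4.10", "10.1.2.3"))  # staying internal
```

The loop over the list is also a fair picture of why this costs more than the old out-interface match: the decision now depends on walking a list of subnets instead of a single comparison.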

With that implemented, I connected my two edge/core routers together and tried to have them share routes so their respective BGP instances could decide which path would be more efficient for traffic to head out over. After a week of fighting with my config (not straight, I had other things to deal with as well), I figured out I had two different instances of BGP running on my routers, one between my routers and one for my upstreams. I needed a single instance so all the public routes would be compared by BGP before being inserted into the routing table. Thanks to Greg Sowell, who was helping me at the time, I figured out my mistake.

When I finally had the iBGP system up and running, I saw the real benefits, particularly in that I was utilizing my cheaper upstream more because they had better peering agreements than my main upstream. Not only was I saving myself money, but various services saw significant performance improvements.

This is because not only was my customer traffic following a, likely, more optimized route to its destinations, but it could now come back over a, likely, more optimized route. I say “likely” because BGP doesn’t promise the lowest latency or largest bandwidth path to your destination, but instead the path that traverses the fewest networks… Most of the time. While this is good enough “most of the time”, I have already done some route optimization so my traffic gets a little guidance in how it gets out to the internet. (More on this later)
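As a rough illustration of that “fewest networks, most of the time” behaviour, here is a heavily simplified Python sketch of BGP best-path selection. It only looks at two attributes, local preference (higher wins) and then AS path length (shorter wins); real implementations, Mikrotik’s included, compare several more things. The Route class, upstream names, and AS numbers are all made up for the example (the AS numbers come from the range reserved for documentation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    upstream: str          # hypothetical label for where the route was learned
    as_path: tuple         # AS numbers the route traverses
    local_pref: int = 100  # higher is preferred; 100 is a common default


def best_route(routes):
    """Pick the route with the highest local-pref, then the shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


if __name__ == "__main__":
    main = Route("main", (64496, 64497, 64498))
    cheap = Route("cheap", (64499, 64498))
    # With equal local-pref, the shorter AS path wins.
    print(best_route([main, cheap]).upstream)
    # Raising local-pref is one way to give traffic "a little guidance".
    pinned = Route("main", (64496, 64497, 64498), local_pref=200)
    print(best_route([pinned, cheap]).upstream)
```

Nothing in this comparison knows about latency or bandwidth, which is exactly why the shortest AS path is only “likely” to be the better route.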

Even with those improvements, I was still stuck in my layer 2 statically routed network. The issue was compounded because we were about to introduce a loop into our network. I will talk about how I handled that next time.

In the next article I will deal with connecting my towers to my core network and how I am controlling my network traffic.