…but tools have to be used responsibly.
first of all, a short disclaimer - I’d like to make it perfectly clear, before we go into this long piece, that I’m a:
…big fan of discussing the merits of technology and technology overall. I love technology. I believe that having the opportunity to create networks and solutions that really connect people and give us a chance to exchange information is something I could do for the rest of my life, with full focus and commitment.
so, off we go!
we recently had a short but interesting exchange (to the point that you can have interesting exchanges on what’s today essentially just another social media platform full of trash and narcissists - LinkedIn) with Pavel Odintsov about the “payload match extension” draft for FlowSpec. only a day later, almost as if to prove my point, CenturyLink/Level3 was hit by a very nasty outage, caused by BGP FlowSpec going berserk.
what happened is that apparently, as part of normal operations, they were trying to block some malicious traffic. instead of specific filter criteria, they distributed a wildcard for the source or destination specification. obviously, Murphy’s Law immediately kicked in - the problem hit exactly where it could cause the most damage: in their BGP peering sessions. some customers had their BGP sessions still up, so routers were trying to route traffic over non-functional forwarding paths, and this complicated things even further (there was no clean failover to other links despite them being available, just less preferred). the outage lasted over 5 hours and seriously impacted connectivity for most of the US and some other parts of the world. that obviously was not the first outage caused by problems with a FlowSpec implementation or configuration. only a week before, a similar thing happened to Verizon Wireless.
going back to our discussion - I tried to make the point that stacking complexity on top of complexity never ends well, and doesn’t scale. in this case it’s about a complex extension (BGP FlowSpec payload matching) being proposed for deployment on top of an already complex protocol (BGP FlowSpec itself). and that true engineering is much more than happily jumping from one technology to another, claiming that “technology is great”. of course it is, but the people that use it should also use their brains.
honestly speaking, I was - and to some extent still am - a big fan of FlowSpec itself. the idea was initially drafted by Pedro Marques (Cisco Systems), Nischal Sheth (Juniper Networks), Robert Raszuk (Cisco Systems), Barry Greene (Juniper Networks), Jared Mauch (NTT America), and Danny McPherson (Arbor Networks) back in 2009, and soon enough we had the first implementations in networking gear. the mechanisms previously available to fight DDoS attacks at SP scale were not very granular - in fact, they boiled down to deciding whether to drop an IP packet based on source or destination (BGP blackholing), redirect it somewhere (BGP sinkholing) and/or apply some QoS policy to it (QoS Policy Propagation via BGP).
the problem with FlowSpec from the very start was that it’s complex. not complex for a human to read or understand, but complex for the receiver to validate in a BGP update, and complex to implement properly in data plane hardware. while the atomic operations mentioned above (drop, forward, apply QoS policy) were already implemented in hardware by all leading networking vendors at that time, they were built on well-known and well-polished hardware capabilities. routers are made to route packets (which, paradoxically, includes dropping them if needed). to properly handle all specified FlowSpec operations at the same time, routers had to use much more complicated classifiers. that includes capabilities found, up to now, in what are typically three separate router subsystems:
- routing subsystem, providing the option to filter based on source and destination IP address; RFC 5575 defines those operations as FlowSpec Type 1 (Destination Prefix) and Type 2 (Source Prefix)
- filtering subsystem, providing ACL-like capabilities; RFC 5575 defines port matching operations (Type 4, 5 and 6), ICMP matching (Type 7 and Type 8), TCP flags matching (Type 9), IP Fragment (Type 12) and IP Protocol number (Type 3)
- QoS subsystem; RFC 5575 defines DSCP matching operations (Type 11) and packet length (Type 10)
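just for reference, the type-to-subsystem mapping above can be sketched as a small lookup table. this is a toy illustration in Python, not any real router’s internals - the subsystem grouping follows the three buckets listed above, and the type numbers are per RFC 5575:

```python
# RFC 5575 flow specification component types, grouped by the router
# subsystem that typically implements the matching logic (per the
# three-bucket breakdown in the text above).
FLOWSPEC_COMPONENTS = {
    1:  ("Destination Prefix", "routing"),
    2:  ("Source Prefix", "routing"),
    3:  ("IP Protocol", "filtering"),
    4:  ("Port", "filtering"),
    5:  ("Destination Port", "filtering"),
    6:  ("Source Port", "filtering"),
    7:  ("ICMP Type", "filtering"),
    8:  ("ICMP Code", "filtering"),
    9:  ("TCP Flags", "filtering"),
    10: ("Packet Length", "qos"),
    11: ("DSCP", "qos"),
    12: ("Fragment", "filtering"),
}

def subsystems_for(types):
    """Return the set of router subsystems a rule using these
    component types would have to engage simultaneously."""
    return {FLOWSPEC_COMPONENTS[t][1] for t in types}
```

a rule combining, say, Type 1 with Types 4 and 10 already forces all three subsystems to cooperate on every matching packet.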
if your favorite router architecture doesn’t have those, or they are implemented by a different set of features - don’t worry, that’s not really important at this point. what’s important is that the rules defining packet characteristics could suddenly be quite complex, for example:
- TCP packets with only SYN and ACK set, going to destination ports 22, 23, 143 and a range of 6499-6899, with a destination address of a specific customer prefix; each criterion by itself can be easily addressed by existing ACL filtering engines, and once you stack them up it becomes more complex - but not as complex as:
- packets of between 100 and 280 bytes, with the source defined as “any”, the destination address defined as our own network prefixes, destination port 53, protocol set to UDP and DSCP set to AF31; for such a specification you need the filtering engine, the QoS engine and likely packet length matching logic (if your router has one) active at the same time for each and every packet (because you never know what’s coming next)
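to make the second example concrete, here’s a hypothetical in-memory representation of such a rule and the engines it would keep busy. all field names and the example prefix are purely illustrative - this is not any vendor’s API, just a sketch of why one rule lights up three subsystems at once:

```python
# illustrative representation of the second rule above; 192.0.2.0/24
# stands in for "our own network prefixes"
rule = {
    "packet_length": (100, 280),   # min, max in bytes
    "source": "0.0.0.0/0",         # "any"
    "destination": "192.0.2.0/24",
    "destination_port": 53,
    "protocol": "udp",
    "dscp": 26,                    # AF31
}

# determine which engines must inspect every packet while this
# single rule is active
engines = set()
if {"source", "destination"} & rule.keys():
    engines.add("routing lookup")
if {"protocol", "destination_port"} & rule.keys():
    engines.add("ACL/filtering")
if {"dscp", "packet_length"} & rule.keys():
    engines.add("QoS/length classifier")
```

one rule, three engines, every packet - that’s the cost model in a nutshell.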
if you paid attention and implemented this carefully, execution of complex filters and actions wasn’t too taxing. at the same time, right from the beginning, one thing was clear - it’s not as scalable as BGP blackholing or sinkholing. you could put literally hundreds of thousands of prefixes in the routing table of even the smaller service provider edge routers.
some of the SPs I know tend to keep anything between 50k and 100k prefixes (/32s and /128s) on average in their blackholing or sinkholing tables during “normal” daily operations - that’s the reality of a typical day on today’s internet, under a rain of smaller and bigger DDoSes.
with FlowSpec… well, you can fit only somewhere between 3k and 8k of the simplest filters. with anything as complex as the examples above, scale drops quickly and sometimes unpredictably. will technology move forward and next-generation platforms offer increased scale? sure, they may. but again, proper engineering is different from brute-forcing through a problem.
the configuration complexity of BGP FlowSpec itself is not that big. what’s problematic is validation on the receiving end (the router, that is) of whether the flow specification actually makes sense. early adopters found that out the hard way, as implementation bugs and vulnerabilities were pretty common, and unfortunately still are. they will of course happen in the future as well, because we’re dealing with a new protocol extension that binds together features from different router subsystems, and it’s complex. mind you, those are the 12 “easy” classifiers from the basic BGP FlowSpec specification.
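for the curious, the core validation rule from RFC 5575 (section 6) can be sketched roughly like this - heavily simplified, with only an exact-prefix lookup where the real procedure does longest-prefix matching and additional checks, so treat it as a sketch of the idea, not the full algorithm:

```python
def flowspec_is_valid(dest_prefix, flowspec_originator, unicast_rib):
    """Simplified RFC 5575-style check: accept a flow specification
    only if its originator also originated the best-match unicast
    route covering the embedded destination prefix.

    unicast_rib is a toy dict mapping prefix -> originator; real
    implementations do a longest-prefix-match lookup instead."""
    best_match_originator = unicast_rib.get(dest_prefix)
    return (best_match_originator is not None
            and best_match_originator == flowspec_originator)
```

even this tiny sketch shows why validation is fragile in practice: it couples the FlowSpec machinery to the state of the unicast RIB, so bugs in either side can let a bad filter through - or reject a good one.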
with the recently submitted FlowSpec payload matching draft that Pavel was so kind to highlight, the authors make a couple of interesting observations. while implementing it could potentially simplify the whole initial FlowSpec RFC (thanks to more flexible matching operators), fundamentally we’re still at the mercy of code quality on the receiving end, as there’s no way to formally validate within the protocol itself whether the definition makes sense. on top of that, there’s the regular expression filter specification - which sounds great for your x86 or ARM laptop, but not so much for a hardware router that’s built to be very fast at looking at headers, not at payloads.
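the cost asymmetry is easy to see: a header match is a fixed-offset comparison, while a payload match has to scan the whole packet for every rule. a toy Python stand-in for what the forwarding hardware would have to do (the signature pattern is made up for illustration):

```python
import re

# header match: compare two bytes at a fixed offset - constant time,
# independent of packet size (the destination port sits at bytes 2-3
# of the TCP/UDP header)
def match_dest_port(l4_header, port):
    return int.from_bytes(l4_header[2:4], "big") == port

# payload match: the engine must scan the entire payload, per packet,
# per rule - cost grows with packet size and pattern complexity
SIGNATURE = re.compile(rb"malicious-pattern")  # illustrative pattern

def match_payload(payload):
    return SIGNATURE.search(payload) is not None
```

on a general-purpose CPU the difference is a few nanoseconds; in a forwarding ASIC designed around fixed-offset header lookups, the payload scan simply doesn’t fit the pipeline at all.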
BGP FlowSpec is a complex tool by itself, with many scale constraints. is introducing complexity to your network bad? yes, it is.
if you need proof - look at basically all the major outages of the last three decades. they happened because somebody decided that one additional tweak, knob, change, protocol or configuration add-on would be a good idea. and while those affected typically test the new knob itself, such changes are rarely tested as a system. and complex systems fail in complex ways.
if you happen to have time to read books like Russ White’s Navigating Network Complexity or Ivan Pepelnjak’s blog, you’ll notice the same theme repeated again and again by people that have spent literally their lives designing and troubleshooting networks - complexity is the enemy of a well-engineered, scalable and sustainable network.
RFC 1925, called “The Twelve Networking Truths” and written down in 1996, has two points worth noting in the context of this discussion:
(3) With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. […]
(12) In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.
you can force your solution over common sense, but the effects may be less than stellar. you can make things complex to the point where you need to build another layer of abstraction just to be able to comprehend them - but following the KISS principle (as in “Keep It Simple, Stupid”) tends to fare better.
and that’s the story with FlowSpec - it suffers because of its complexity and the fact that existing edge and core router hardware architectures are simply not flexible enough to handle DPI functions at the same scale and with the same capabilities as routing.
yes. have you ever wondered why “The Internet” scales well and still works despite being over 40 years old? because the underlying protocols and concepts are simple. and they were designed with simple principles in mind. in this specific context - that the future holds so many unknowns we’d better be conservative in our own behavior and at the same time flexible with regard to external entities or forces (the so-called “robustness principle”, also known as Postel’s law):
TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.
so yes, I like to engage in all the geeky discussions about where and how technology can be used to serve a specific purpose, and how new features can find their place as something useful rather than something superfluous. but just the fact that something works, or can be done, doesn’t mean that it should be done.
just to underline it again, I believe BGP FlowSpec overall is an interesting tool that gives you additional capabilities. I even introduced BGP FlowSpec into my BGP Blackholing PL project, to make sure people working in real-life operational networks would be able to understand the major differences between traditional tools like BGP QPPB, BGP blackholing, BGP sinkholing, uRPF and ACLs, and new concepts like BGP FlowSpec. part of the reason is that I treat BGP Blackholing PL as an opportunity for others to learn, in a very safe environment, how to operate and properly treat external sources (feeds) of information over the BGP protocol. however, a couple of years ago, when talking in person to major European SPs at our annual Cisco SP Security Forum event, I learned (after delivering a BGP blackholing/sinkholing/FlowSpec deep dive session, with config examples, remediation examples and all that hardcore stuff!) that most of them are actually stepping back from using FlowSpec on routers, and instead opting for dedicated software and appliances. why could that be? because of those bugs I mentioned above?
unfortunately yes, but it also has a lot to do with the fact that adding such fragile components to already fragile networking stacks or whole Network Operating Systems only increases operational complexity and the chances of failure - in the form of an outage.
so some SPs have adopted or will adopt FlowSpec as one of their network security teams’ tools (as in the CenturyLink/Level3 example that started this whole thread). but it requires much more effort to make sure it won’t go berserk after one of the filter specification updates and sever links or drop network-critical communication in an unpredictable manner. our very own Cisco Service Provider team even validated FlowSpec as part of a recent peering design refresh - but look at how carefully crafted the templates they have in mind are, precisely to avoid causing a breakdown. we’ll see.
BGP FlowSpec is a complex tool by itself, with many scale constraints. and when I hear praise that it’s a great tool that will save the world, I literally cringe.
why? well, the idea of matching traffic based on packet payload, on SP edge/aggregation routers, with the payload specification announced over yet another extension to BGP, smells very much like a software-ish idea. turning a hardware-optimized router into a glorified DPI engine is definitely one of those ideas that look good to “software” people. or to someone working for a vendor that already tried to put everything possible into BGP a decade ago, and failed miserably when the bloated, single-threaded daemon couldn’t even serve the IPv4 Unicast AF in a stable and predictable manner. are we sure that this time we’ve learned all the lessons there were to learn, and that this proposal somehow fixes the previous omissions? and again, to be clear - I’m not picking on the draft authors, or on Pavel. I’m picking, directly and with full consciousness, on the YOLO approach of deploying software in networks because it’s “great”.
if you really do believe it’s a great idea that solves world hunger, and you’re coming from the software world, I know perfectly well the reasoning behind your belief. you simply don’t have the operational experience, and haven’t worked with the problems caused by such “great ideas” injected into the middle of a network.
the fact that you’ve written applications or even created whole ecosystems in whats-the-name-of-this-week’s-favorite-software-framework is great, and kudos to you. I was a professional programmer myself for a couple of years and I know how hard, and at the same time satisfying, this work is. I’m not trying to disrespect anyone, and especially not programmers. but such experience has nothing to do with the assumption, or even belief, that having a programming background immediately means you understand every other technology area with a flick of a finger.
and again, I’m not pulling things out of the blue just to make a point. I’m speaking from experience. I’ve seen programmers reinvent the wheel when faced with networking problems in any number of ways, trying to apply the “let’s restart until it works” way of thinking to anything they don’t understand or don’t want to understand. a close friend of mine watched in disbelief as a well-paid programming team working with cloud environments reinvented tagging (802.1Q to be exact - as around “4000” tags should be enough, right?) after 6 months of hard work (no, they didn’t want to listen). and their task? they were to create separation between application tenants running on single physical and/or virtual servers. imagine that! they reinvented the frigging frame tagging concept to signify what goes where. what a colossal waste of time, money and the actual lives of talented people! and if you claim that as a programmer you have experience working for a “big cloud or content provider”, I’m again calling this not only totally irrelevant - actually, harmful. we have plenty of proof that people coming in from big cloud and content providers typically have a skewed view of the world, and tend to suffer from the ’not-invented-here’ syndrome. that’s why if you talk to (for example) a typical ex-Google programmer about solving some kind of problem, he’ll spin up a couple of hundred Kubernetes containers with BigTable running in the background before you even finish describing the actual issue at hand.
to sum it up - innovation is great, ideas are great, but when they surface, there will be people crazy enough to try to implement them. and sometimes they lack the experience to decide whether that’s actually a good idea. don’t encourage them.
does that mean there’s no hope? on the contrary - there’s always hope :)
first of all, try to select tools appropriate for the job, not the other way around. that’s one of the most important principles in engineering. there’s an old saying that if all you have is a hammer, everything starts to look like a nail. it’s a very easy trap to fall into, and I have to admit, I’ve been there as well.
in this specific case - when you’re looking at the world from a BGP perspective, everything can be bolted on top. text communication in communities? sure. chess? well, why not? traffic filtering and QoS? yeah, why not? don’t. you’ll end up like the examples above sooner rather than later.
if you’re in the market for an anti-DDoS function for your network, there are fine companies offering dedicated solutions. if you’re after a DPI function - there are others as well. if you’re looking to classify applications at the edge of your network and then apply some policies - again, there are validated and proven technologies that work and can be scaled out for edge applications. will those solutions work with your networking infrastructure? sure they will. but keep their “soft” underbelly focused on its primary purpose, and don’t let them offload all the heavy lifting back to your hardware - if you agreed to that, why would you need them in the first place?
if you ask that question and the answer starts with “because of the potential of ML…”, or “because of AI-inferred…” or “but blockchain has…” - run. run as fast as you can and don’t look back!
if, like me, you believe that pushing BGP to carry more and more garbage is wrong, you should follow Robert Raszuk’s advice and subscribe to the IETF IDR WG mailing list, to help stop such things from being proposed in the first place.
obviously - if you want to tinker with fragile constructs that can be made to work but won’t scale and will increase the complexity of your network - you have every right to adapt BGP to yet another role. of course, we have more and more raw CPU power today, and GBs of RAM on our de facto server-class Route Processor cards in all kinds of today’s routers. but that just means you’ll be brute-forcing the design today. will it work in a month? a year? three years? and again - let me quote rule #3 from RFC 1925 once more:
(3) With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. […]
please, don’t force pigs to fly. think of the users (your users!) those pigs will be flying over. would you feel comfortable in your customers’ shoes, standing below one of them?