For two hours on Sunday, February 24, 2008, viewers across most of the world were unable to reach YouTube. The root cause of this outage was a flaw in the Border Gateway Protocol (BGP), which governs how routing data propagates across the Internet.
It all began the previous Friday when the Pakistan Telecommunication Authority ordered the nation's Internet service providers to black out a YouTube video it feared would trigger riots. Pakistan Telecommunication Company Ltd. (PTCL), Pakistan's largest Internet provider, responded by blocking the entire YouTube site through a quirk in BGP that is readily exploitable by anyone who controls a border router (a BGP-enabled router) of an autonomous system (AS) such as PTCL.
As explained in RFC 4271, an AS is a network or collection of networks that "appears to other ASes to have a single coherent interior routing plan, and presents a consistent picture of the destinations that are reachable through it." Examples of ASes are the U.S. DOD's unclassified network, which is the world's largest; the Level 3 Communications backbone; and AT&T WorldNet Services. By abusing the interactions among ASes, attackers can exploit flaws in BGP to deny service, reroute traffic through or to malicious hosts, expose network topologies, or even trigger instabilities that can damage vast swathes of the Internet. However, as detailed below, tools are available that can help sysadmins prevent these problems or fix them more easily.
PTCL connects to the Internet solely through another AS, PCCW Ltd., a Hong Kong telecommunications company. To block YouTube, PTCL had its routers advertise a bogus route for YouTube's addresses, but its technicians failed to warn PCCW to block that route. Consequently, it spread through PCCW into the rest of the Internet.
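One reason a bogus route can pull in traffic once it escapes is that routers forward each packet along the most specific (longest) prefix that matches its destination. The sketch below illustrates that rule only. It assumes the prefixes widely reported for this incident (a /22 announced by YouTube and a /24 inside it announced by PTCL), which this article does not itself state, and the table layout and the best_route helper are hypothetical.

```python
import ipaddress

def best_route(table, destination):
    """Return the (prefix, label) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, label) for net, label in table if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the entry with the most fixed bits wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Assumed prefixes, as widely reported for the February 2008 incident.
table = [
    (ipaddress.ip_network("208.65.152.0/22"), "legitimate origin"),
    (ipaddress.ip_network("208.65.153.0/24"), "bogus origin"),
]

# An address inside the /24 follows the more specific, bogus route.
print(best_route(table, "208.65.153.238"))
# An address in the /22 but outside the /24 still follows the legitimate route.
print(best_route(table, "208.65.152.10"))
```

Under these assumptions, any router that accepted both announcements would send traffic for addresses inside the /24 toward the bogus origin, regardless of how trustworthy the broader route was.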
The YouTube case was unusual because of its high profile. However, BGP blackouts of lesser-known sites are common. For example, a study in 2006 found that some 200 to 1,200 (between 0.2% and 1%) of the entries in a typical routing table were invalid.
On average, a bogus prefix can misroute 50% of the Internet traffic bound for the affected addresses, but losses may range from 0% to 100%. A prefix combines a network IP address with a netmask to identify a block of addresses. It looks something like 192.168.0.0/24, which combines the network address 192.168.0.0 with the netmask 255.255.255.0 (the /24 means the first 24 bits are fixed) and covers the address block 192.168.0.0 through 192.168.0.255.
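The relationship between a prefix, its netmask, and its address block can be checked with Python's standard ipaddress module. This is a small sketch using the example prefix written in full dotted form as 192.168.0.0/24; the describe_prefix helper is a hypothetical name introduced here for illustration.

```python
import ipaddress

def describe_prefix(cidr):
    """Break a CIDR prefix into the parts named in the text."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),  # the network IP address
        "netmask": str(net.netmask),          # the mask implied by the /length
        "first": str(net[0]),                 # first address in the block
        "last": str(net[-1]),                 # last address in the block
        "count": net.num_addresses,           # size of the block
    }

info = describe_prefix("192.168.0.0/24")
print(info)

# Membership test: does a given address fall inside the block?
inside = ipaddress.ip_address("192.168.0.77") in ipaddress.ip_network("192.168.0.0/24")
print(inside)
```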
Routing errors can be hard to detect because Internet connectivity changes often. According to a Sprint technical report, "there is continuous BGP 'noise' (around 50–200 updates/minute) interspersed with high 'churn' periods (9000 updates/minute)."
Worse, it's hard to tell whether a routing change is legitimate. Most ASes try to prefer a route that an AS advertises for its own addresses over a route for those addresses advertised by an outside source. However, BGP makes it difficult to detect routes created by outsiders, and does nothing to prevent their creation, even though such routes are likely to be bogus.
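Because BGP itself offers no such check, an operator who wants to spot outsider-created routes has to compare announcements against some independent record of who should originate what. The following is only a rough sketch of that idea, not a description of any real tool: the EXPECTED_ORIGINS table, the check_announcement function, and the documentation-range prefixes and AS numbers are all hypothetical.

```python
import ipaddress

# Hypothetical record of prefixes and the AS expected to originate each one.
# Prefixes and AS numbers come from ranges reserved for documentation.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

def check_announcement(prefix, origin_as, expected=EXPECTED_ORIGINS):
    """Classify an announcement as 'ok', 'suspect', or 'unknown'."""
    announced = ipaddress.ip_network(prefix)
    for known, owner in expected.items():
        # subnet_of() is true for the same prefix or any more specific one.
        if announced.version == known.version and announced.subnet_of(known):
            return "ok" if origin_as == owner else "suspect"
    # No record covers this prefix, so nothing can be concluded.
    return "unknown"

print(check_announcement("192.0.2.0/24", 64500))    # expected origin
print(check_announcement("192.0.2.128/25", 64511))  # more specific, other AS
print(check_announcement("203.0.113.0/24", 64500))  # no record on file
```

A check like this is only as good as the record behind it, which is the difficulty the text describes: BGP supplies no authoritative source for the expected origins.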
The inherent complexity of BGP is another issue. Indeed, the causes of route oscillations and convergence difficulties are so obscure that researchers fail to agree on which flaws in the protocol make them possible.
Internet growth aggravates these problems. For example, from 1997 through 2005, routing tables grew from 3,000 to more than 17,000 ASes, and from 50,000 to more than 180,000 routing prefixes.
Several fixes to BGP have been proposed and rejected. In 1996, BBN Technologies proposed Secure BGP (S-BGP), which would have established a public-key infrastructure to prevent abuse of the system. Opponents said it would require routers with more memory and processing power than is feasible. The Internet Corporation for Assigned Names and Numbers (ICANN) proposed a trusted central database, but this, too, was rejected as unworkable.
An improved version of BGP is still over the horizon. Consequently, network defenders must remain alert to routing disruptions and be prepared to combat them as well as possible in today's environment.
How Attackers Can Exploit BGP
Because a nation state can easily control the border routers of many ASes, both overtly and covertly, cyberwar is potentially the most serious scenario for BGP exploitation. However, any individual can get into the act by breaking into a BGP-enabled router. The Internet has many poorly managed edge networks whose border routers are more likely to have default passwords or remotely exploitable vulnerabilities, or to allow Telnet logins.
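Even without control of a border router, an attacker who is off the path between two routers may send one of them TCP reset (RST) segments whose source addresses are forged to match those of a victim router. Because BGP sessions run over TCP, an accepted reset tears the session down. This makes it appear as if the victim router has gone offline, so its routes are removed from the routing table.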
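A rough calculation shows why defenders treat forged resets as a practical concern. Under the original TCP rules, a receiver accepts a reset whose sequence number falls anywhere inside its current receive window, so a sender does not need to hit the exact value. The sketch below is back-of-the-envelope arithmetic only: it assumes the addresses and ports of the session are already known, and the window size and packet rate are illustrative values, not measurements.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide

def rst_guesses_needed(window_bytes):
    """Segments needed so that one is certain to land in the receive window."""
    # Ceiling division: one guess per window-sized slice of the sequence space.
    return -(-SEQ_SPACE // window_bytes)

def seconds_to_cover(window_bytes, packets_per_second):
    """Worst-case time to sweep the whole sequence space at a given rate."""
    return rst_guesses_needed(window_bytes) / packets_per_second

# Illustrative assumptions: a 16 KiB window and 10,000 segments per second.
guesses = rst_guesses_needed(16384)
print(guesses)
print(seconds_to_cover(16384, 10_000))
```

Larger windows shrink the number of guesses proportionally, which is one reason long-lived sessions such as those between BGP routers need protection beyond sequence numbers alone.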
The TCP MD5 Signature Option (RFC 2385), adopted in 1998, was supposed to make it impossible to forge resets. However, the discovery of mathematical flaws in MD5 and the power of today's processors mean that it takes only hours for a laptop to find a valid sequence number.