Distance Vector Routing Protocols
Most routing protocols fall into one of two classes: distance vector or link state. The basics of distance vector routing protocols are examined here; the next section covers link state routing protocols. Distance vector algorithms are based on the work done of R. E. Bellman,1 L. R. Ford, and D. R. Fulkerson2 and for this reason occasionally are referred to as Bellman-Ford or Ford-Fulkerson algorithms.
The name distance vector is derived from the fact that routes are advertised as vectors of (distance, direction), where distance is defined in terms of a metric and direction is defined in terms of the next-hop router. For example, "Destination A is a distance of 5 hops away, in the direction of next-hop router X." As that statement implies, each router learns routes from its neighboring routers' perspectives and then advertises the routes from its own perspective. Because each router depends on its neighbors for information, which the neighbors in turn may have learned from their neighbors, and so on, distance vector routing is sometimes facetiously referred to as "routing by rumor."
Distance vector routing protocols include the following:
- Routing Information Protocol (RIP) for IP
- Xerox Networking System's XNS RIP
- Novell's IPX RIP
- Cisco's Internet Gateway Routing Protocol (IGRP)
- DEC's DNA Phase IV
- AppleTalk's Routing Table Maintenance Protocol (RTMP)
Common Characteristics
A typical distance vector routing protocol uses a routing algorithm in which routers periodically send routing updates to all neighbors by broadcasting their entire route tables.3
The preceding statement contains a lot of information. Following sections consider it in more detail.
Periodic Updates
Periodic updates means that at the end of a certain time period, updates will be transmitted. This period typically ranges from 10 seconds for AppleTalk's RTMP to 90 seconds for Cisco's IGRP. At issue here is the fact that if updates are sent too frequently, congestion may occur; if updates are sent too infrequently, convergence time may be unacceptably high.
Neighbors
In the context of routers, neighbors always means routers sharing a common data link. A distance vector routing protocol sends its updates to neighboring routers4 and depends on them to pass the update information along to their neighbors. For this reason, distance vector routing is said to use hop-by-hop updates.
Broadcast Updates
When a router first becomes active on a network, how does it find other routers and how does it announce its own presence? Several methods are available. The simplest is to send the updates to the broadcast address (in the case of IP, 255.255.255.255). Neighboring routers speaking the same routing protocol will hear the broadcasts and take appropriate action. Hosts and other devices uninterested in the routing updates will simply drop the packets.
Full Routing Table Updates
Most distance vector routing protocols take the very simple approach of telling their neighbors everything they know by broadcasting their entire route table, with some exceptions that are covered in following sections. Neighbors receiving these updates glean the information they need and discard everything else.
Routing by Rumor
Figure 4.3 shows a distance vector algorithm in action. In this example, the metric is hop count. At time t0, routers A through D have just become active. Looking at the route tables across the top row, at t0 the only information any of the four routers has is its own directly connected networks. The tables identify these networks and indicate that they are directly connected by having no next-hop router and by having a hop count of 0. Each of the four routers will broadcast this information on all links.
Figure 4.3 Distance vector protocols converge hop-by-hop.
At time t1, the first updates have been received and processed by the routers. Look at router A's table at t1. Router B's update to router A said that router B can reach networks 10.1.2.0 and 10.1.3.0, both 0 hops away. If the networks are 0 hops from B, they must be 1 hop from A. Router A incremented the hop count by 1 and then examined its route table. It already knew about 10.1.2.0, and the hop count (0) was less than the hop count B advertised, (1), so A disregarded that information.
Network 10.1.3.0 was new information, however, so A entered this in the route table. The source address of the update packet was router B's interface (10.1.2.2) so that information is entered along with the calculated hop count.
Notice that the other routers performed similar operations at the same time t1. Router C, for instance, disregarded the information about 10.1.3.0 from B and 10.1.4.0 from C but entered information about 10.1.2.0, reachable via B's interface address 10.1.3.1, and 10.1.5.0, reachable via C's interface 10.1.4.2. Both networks were calculated as 1 hop away.
At time t2, the update period has again expired and another set of updates has been broadcast. Router B sent its latest table; router A again incremented B's advertised hop counts by 1 and compared. The information about 10.1.2.0 is again discarded for the same reason as before. 10.1.3.0 is already known, and the hop count hasn't changed, so that information is also discarded. 10.1.4.0 is new information and is entered into the route table.
The network is converged at time t3. Every router knows about every network, the address of the next-hop router for every network, and the distance in hops to every network.
Time for an analogy. You are wandering in the Sangre de Cristo mountains of northern New Mexicoa wonderful place to wander if you aren't lost. But you are lost. You come upon a fork in the trail and a sign pointing west, reading "Taos, 15 miles." You have no choice but to trust the sign. You have no clue what the terrain is like over that 15 miles; you don't know whether there is a better route or even whether the sign is correct. Someone could have turned it around, in which case you will travel deeper into the forest instead of to safety!
Distance vector algorithms provide road signs to networks.5 They provide the direction and the distance, but no details about what lies along the route. And like the sign at the fork in the trail, they are vulnerable to accidental or intentional misdirection. Following are some of the difficulties and refinements associated with distance vector algorithms.
Route Invalidation Timers
Now that the internetwork in Figure 4.3 is fully converged, how will it handle reconvergence when some part of the topology changes? If network 10.1.5.0 goes down, the answer is simple enoughrouter D, in its next scheduled update, flags the network as unreachable and passes the information along.
But what if, instead of 10.1.5.0 going down, router D fails? Routers A, B, and C still have entries in their route tables about 10.1.5.0; the information is no longer valid, but there's no router to inform them of this fact. They will unknowingly forward packets to an unreachable destinationa black hole has opened in the internetwork.
This problem is handled by setting a route invalidation timer for each entry in the route table. For example, when router C first hears about 10.1.5.0 and enters the information into its route table, C sets a timer for that route. At every regularly scheduled update from router D, C discards the update's already-known information about 10.1.5.0 as described in "Routing by Rumor." But as C does so, it resets the timer on that route.
If router D goes down, C will no longer hear updates about 10.1.5.0. The timer will expire, C will flag the route as unreachable and will pass the information along in the next update.
Typical periods for route timeouts range from three to six update periods. A router would not want to invalidate a route after a single update has been missed, because this event may be the result of a corrupted or lost packet or some sort of network delay. At the same time, if the period is too long, reconvergence will be excessively slow.
Split Horizon
According to the distance vector algorithm as it has been described so far, at every update period each router broadcasts its entire route table to every neighbor. But is this really necessary? Every network known by router A in Figure 4.3, with a hop count higher than 0, has been learned from router B. Common sense suggests that for router A to broadcast the networks it has learned from router B back to router B is a waste of resources. Obviously, B already knows about those networks.
A route pointing back to the router from which packets were received is called a reverse route. Split horizon is a technique for preventing reverse routes between two routers.
Besides not wasting resources, there is a more important reason for not sending reachability information back to the router from which the information was learned. The most important function of a dynamic routing protocol is to detect and compensate for topology changesif the best path to a network becomes unreachable, the protocol must look for a next-best path.
Look yet again at the converged internetwork of Figure 4.3 and suppose that network 10.1.5.0 goes down. Router D will detect the failure, flag the network as unreachable, and pass the information along to router C at the next update interval. However, before D's update timer triggers an update, something unexpected happens. C's update arrives, claiming that it can reach 10.1.5.0, one hop away! Remember the road sign analogy? Router D has no way of knowing that C is not advertising a legitimate next-best path. It will increment the hop count and make an entry into its route table indicating that 10.1.5.0 is reachable via router C's interface 10.1.4.1, just 2 hops away.
Now a packet with a destination address of 10.1.5.3 arrives at router C. C consults its route table and forwards the packet to D. D consults its route table and forwards the packet to C, C forwards it back to D, ad infinitum. A routing loop has occurred.
Implementing split horizon prevents the possibility of such a routing loop. There are two categories of split horizon: simple split horizon and split horizon with poisoned reverse.
The rule for simple split horizon is, When sending updates out a
The routers in Figure 4.4 implement simple split horizon. Router C sends an update to router D for networks 10.1.1.0, 10.1.2.0, and 10.1.3.0. Networks 10.1.4.0 and 10.1.5.0 are not included because they were learned from router D. Likewise, updates to router B include 10.1.4.0 and 10.1.5.0 with no mention of 10.1.1.0, 10.1.2.0, and 10.1.3.0.
Figure 4.4 Simple split horizon does not advertise routes back to the neighbors from whom the routes were learned.
Simple split horizon works by suppressing information. Split horizon with poisoned reverse is a modification that provides more positive information.
The rule for split horizon with poisoned reverse is, When sending
In the scenario of Figure 4.4, router C would in fact advertise 10.1.4.0 and 10.1.5.0 to router D, but the network would be marked as unreachable. Figure 4.5 shows what the route tables from C to B and D would look like. Notice that a route is marked as unreachable by setting the metric to infinity; in other words, the network is an infinite distance away. Coverage of a routing protocol's concept of infinity continues in the next section.
Figure 4.5 Split horizon with poisoned reverse advertises reverse routes but with an unreachable (infinite) metric.
Split horizon with poisoned reverse is considered safer and stronger than simple split horizona sort of "bad news is better than no news at all" approach. For example, suppose that router B in Figure 4.5 receives corrupted information causing it to believe that subnet 10.1.1.0 is reachable via router C. Simple split horizon would do nothing to correct this misperception, whereas a poisoned reverse update from router C would immediately stop the potential loop. For this reason, most modern distance vector implementations use split horizon with poisoned reverse. The trade-off is that routing update packets are larger, which may exacerbate any congestion problems on a link.
Counting to Infinity
Split horizon will break loops between neighbors, but it will not stop loops in a network such as the one in Figure 4.6. Again, 10.1.5.0 has failed. Router D sends the appropriate updates to its neighbors router C (the dashed arrows) and router B (the solid arrows). Router B marks the route via D as unreachable, but router A is advertising a next-best path to 10.1.5.0, which is 3 hops away. B posts that route in its route table.
Figure 4.6 Split horizon will not prevent routing loops here.
B now informs D that it has an alternative route to 10.1.5.0. D posts this information and updates C, saying that it has a 4-hop route to the network. C tells A that 10.1.5.0 is 5 hops away. A tells B that the network is now 6 hops away.
"Ah," router B thinks, "router A's path to 10.1.5.0 has increased in length. Nonetheless, it's the only route I've got, so I'll use it!"
B changes the hop count to 7, updates D, and around it goes again. This situation is the counting-to-infinity problem because the hop count to 10.1.5.0 will continue to increase to infinity. All routers are implementing split horizon, but it doesn't help.
The way to alleviate the effects of counting to infinity is to define
This method is also how routers advertise a network as unreachable. Whether it is a poisoned reverse route, a network that has failed, or a network beyond the maximum network diameter of 15 hops, a router will recognize any 16-hop route as unreachable.
Setting a maximum hop count of 15 helps solve the counting-to-infinity problem, but convergence will still be very slow. Given an update period of 30 seconds, a network could take up to 7.5 minutes to reconverge and is susceptible to routing errors during this time. The two methods for speeding up reconvergence are triggered updates and holddown timers.
Triggered Updates
Triggered updates, also known as flash updates, are very simple: If a metric changes for better or for worse, a router will immediately send out an update without waiting for its update timer to expire. Reconvergence will occur far more quickly than if every router had to wait for regularly scheduled updates, and the problem of counting to infinity is greatly reduced, although not completely eliminated. Regular updates may still occur along with triggered updates. Thus a router might receive bad information about a route from a not-yet-reconverged router after having received correct information from a triggered update. Such a situation shows that confusion and routing errors may still occur while an internetwork is reconverging, but triggered updates will help to iron things out more quickly.
A further refinement is to include in the update only the networks that actually triggered it, rather than the entire route table. This technique reduces the processing time and the impact on network bandwidth.
Holddown Timers
Triggered updates add responsiveness to a reconverging internetwork. Holddown timers introduce a certain amount of skepticism to reduce the acceptance of bad routing information.
If the distance to a destination increases (for example, the hop count increases from 2 to 4), the router sets a holddown timer for that route. Until the timer expires, the router will not accept any new updates for the route.
Obviously, a trade-off is involved here. The likelihood of bad routing information getting into a table is reduced but at the expense of the reconvergence time. Like other timers, holddown timers must be set with care. If the holddown period is too short, it will be ineffective, and if it is too long, normal routing will be adversely affected.
Asynchronous Updates
Figure 4.7 shows a group of routers connected to an Ethernet backbone. The routers should not broadcast their updates at the same time; if they do, the update packets will collide. Yet this situation is exactly what can happen when a several routers share a broadcast network. System delays related to the processing of updates in the routers tend to cause the update timers to become synchronized. As a few routers become synchronized, collisions will begin to occur, further contributing to system delays, and eventually all routers sharing the broadcast network may become synchronized.
Figure 4.7 If update timers become synchronized, collisions may occur.
Asynchronous updates may be maintained by one of two methods:
Each router's update timer is independent of the routing process and is, therefore, not affected by processing loads on the router.
A small random time, or timing jitter, is added to each update period as an offset.
If routers implement the method of rigid, system-independent timers, then all routers sharing a broadcast network must be brought online in a random fashion. Rebooting the entire group of routers simultaneously could result in all the timers attempting to update at the same time.
Adding randomness to the update period is effective if the variable is large enough in proportion to the number of routers sharing the broadcast network. Sally Floyd and Van Jacobson6 have calculated that a too-small randomization will be overcome by a large enough network of routers and that to be effective the update timer should range as much as 50% of the median update period.