Backbone Core Network Design
In early data networking, the topology for the network backbone was relatively simple: Operations were centralized, so a star topology made the most sense and, in some cases, was the only topology the technology would support. This made the center of the star a single point of failure, but because no real traffic flows existed between spokes on the star, this was not a major cause for concern. With the move toward multiple client-server and peer-to-peer relationships, the choice of core network topology is not as clear.
The purpose of the backbone is to connect regional distribution networks and, in some instances, to provide connectivity to other peer networks. A national infrastructure usually forms a significant part of the operational cost of the network. Given its position at the top of the network hierarchy, two requirements of the backbone topology are clear: it must be reliable and it must scale.
Making the Backbone Reliable
Reliability can be achieved in two ways. First, you can build more reliable routers through the use of "carrier-class" characteristics, such as multiple CPUs, power supplies, and generators, or even redundant routers. Ultimately, however, any backbone will include WAN links that rely on a great deal of equipment and environmental stability for their operation, which presents a real risk of eventual failure. If the carrier's up-time guarantees are not sufficient, you have no choice but to design a backbone that is resilient to link failure.
The second option is simply to connect all distribution networks with a full mesh. Although a full mesh minimizes hop count within the network, the approach has several drawbacks:
First, given N regional distribution networks, you must have N(N-1)/2 backbone links in the core (see the sketch after this list). This creates expense in WAN circuitry, as well as in router and WAN switch hardware (channelized or ATM technology can reduce these issues).
Moreover, PVC sizing requires that the traffic levels between any two distribution networks be well understood, or that the network have the capability to circumvent congestion. Although traffic engineering calculations and circumventing congestion are common in the telephone network, IP networks and their associated routing protocols do not provide this capability as readily. One good reason is that the resources required by any TCP/IP session are not known a priori, and IP networks are traditionally engineered as best-effort. Chapter 14 explores how to move beyond best-effort by providing differentiated service in IP networks.
A full PVC mesh can also negate one of the benefits of multiplexing, or trunking, in a best-effort network: round-trip time and TCP window size permitting, any user can burst traffic up to the full line rate of the trunk. Furthermore, the routing complexity of a full mesh can consume bandwidth, computational, and operational management resources.
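The following short sketch illustrates how quickly the N(N-1)/2 circuit count grows compared with a simple ring. The node counts and the comparison are illustrative only; real designs also weigh circuit distance, bandwidth, and hardware port costs.

def full_mesh_links(n: int) -> int:
    """Number of links needed to fully mesh n distribution networks."""
    return n * (n - 1) // 2

def ring_links(n: int) -> int:
    """Number of links needed to connect n distribution networks in a ring."""
    return n  # each node has exactly two neighbors

for n in (4, 8, 16, 32):
    print(f"N={n:3d}  full mesh={full_mesh_links(n):4d} links  ring={ring_links(n):3d} links")

# N=  4  full mesh=   6 links  ring=  4 links
# N=  8  full mesh=  28 links  ring=  8 links
# N= 16  full mesh= 120 links  ring= 16 links
# N= 32  full mesh= 496 links  ring= 32 links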
Most backbone topologies are, therefore, initially designed on the basis of financial constraints, user population density, application requirements, and WAN service availability. This initial design can be refined quite effectively by statistical analysis of traffic levels after the backbone is operational and as new WAN technologies become available. Data network requirements analysis is a relatively new art; see [McCabe, 1998] for thorough coverage of this area.
Building the Backbone Topology
Because you have a basic need for resilience in the backbone, a good starting point for the backbone topology is a ring connecting all distribution networks. This ring could represent the minimum cost of WAN circuits, tempered by an initial estimate of major traffic flows and possibly by very particular delay requirements (although these are rare, with notable exceptions being high-performance networks).
Next, existing links can be fattened, or direct connections between backbone routers can be added, as required or as cost-effectiveness permits. This incremental approach should be considered when selecting WAN technologies, routing nodes, and interface types.
Backbone routing protocols, such as IBGP properly coupled with OSPF, IS-IS, or Enhanced IGRP, can rapidly circumvent failures through simple link-costing mechanisms. However, the bandwidth allocations within the core topology should account for failure modes. What happens when the ring is broken due to WAN or node failure? Is the re-routed path sufficient to carry the additional traffic load? Although TCP performs extremely well in congested environments compared with other protocols, congestion can still render the network useless for most practical applications.
Analysis of historical traffic levels, captured by SNMP, for example, provides a relatively accurate estimate of the consolidated load on the remaining links during various failure modes.
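The sketch below shows one crude form such a failure-mode check might take, assuming a hypothetical four-node backbone ring, made-up peak loads collected via SNMP, and the pessimistic assumption that a failed segment's traffic lands on every surviving link.

LINK_CAPACITY_MBPS = 45  # assumed DS3 trunks

# Hypothetical historical peak load on each ring segment, in Mbps.
peak_load = {
    ("A", "B"): 20,
    ("B", "C"): 15,
    ("C", "D"): 25,
    ("D", "A"): 18,
}

def worst_case_after_failure(failed_link):
    """Crude estimate: the failed segment's load is added to every surviving link."""
    rerouted = peak_load[failed_link]
    return {
        link: load + rerouted
        for link, load in peak_load.items()
        if link != failed_link
    }

for failed in peak_load:
    loads = worst_case_after_failure(failed)
    overloaded = [link for link, mbps in loads.items() if mbps > LINK_CAPACITY_MBPS]
    print(f"failure of {failed}: remaining loads {loads}, overloaded: {overloaded or 'none'}")

In practice, the reroute pattern depends on the routing protocol metrics in use, so a real study would replay the failure against the actual topology rather than spread the load uniformly.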
Traditionally, the use of a ring topology made it difficult to estimate the traffic levels between individual distribution networks. SNMP statistics, for example, provided only input and output byte counts for WAN interfaces, making it difficult to determine the appropriate sizing for new direct links between distribution networks.
Typically, this had to be accomplished through cumbersome approaches, such as placing "sniffers" on WAN links, or through router accounting capabilities that scaled rather poorly. However, IP accounting facilities, such as Netflow, now provide a scalable way for network managers to collect and analyze traffic flows, based on source and destination addresses, as well as many other flow parameters. This significantly eases traffic engineering and accounting activities. It is now possible to permanently collect and archive flow data for network design or billing purposes.
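As a rough illustration of this kind of analysis, the following sketch rolls simplified flow records up into a traffic matrix between regional distribution networks. The record fields, address blocks, and region names are placeholders for illustration, not the actual Netflow export format.

from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical mapping of address blocks to regional distribution networks.
REGION_PREFIXES = {
    "SFO": ip_network("10.1.0.0/16"),
    "LAX": ip_network("10.2.0.0/16"),
    "NYC": ip_network("10.3.0.0/16"),
}

def region_of(addr: str) -> str:
    ip = ip_address(addr)
    for region, prefix in REGION_PREFIXES.items():
        if ip in prefix:
            return region
    return "OTHER"

# Simplified flow records: (source address, destination address, bytes).
flows = [
    ("10.1.4.10", "10.2.9.20", 1_500_000),
    ("10.2.9.20", "10.1.4.10", 400_000),
    ("10.1.7.33", "10.3.2.2", 2_250_000),
]

traffic_matrix = defaultdict(int)
for src, dst, nbytes in flows:
    traffic_matrix[(region_of(src), region_of(dst))] += nbytes

for (src_region, dst_region), total in sorted(traffic_matrix.items()):
    print(f"{src_region} -> {dst_region}: {total} bytes")

A matrix like this, accumulated over weeks or months, is exactly the input needed to decide whether a new direct link between two distribution networks is justified and how large it should be.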
NOTE
Netflow is a high-performance switching algorithm that collects comprehensive IP accounting information and exports it to a collection agent.
Load sharing is possible on the backbone network. With Cisco routers, this can be done on either a per-packet or a per-flow basis. The latter is usually recommended because it avoids possible packet re-ordering, is efficiently implemented, and avoids the potential for widely varying round-trip times, which interfere with the operation of TCP. These issues are not a problem for per-packet load sharing over parallel WAN circuits, but they can be a problem when each alternate path involves one or more routed hops.
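The difference is easy to see in a simplified model: per-flow sharing hashes the flow identifiers so that a given session always takes the same path, whereas per-packet sharing sprays successive packets across paths that may have different delays. The sketch below mimics the idea only; it is not any particular router's implementation.

import zlib
from itertools import cycle

PATHS = ["path-1", "path-2"]

def per_flow_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> str:
    """All packets of a flow hash to the same path, so they stay in order."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return PATHS[zlib.crc32(key) % len(PATHS)]

# Per-flow: every packet of this TCP session takes the same path.
for _ in range(3):
    print("per-flow  :", per_flow_path("10.1.4.10", "10.2.9.20", 34567, 80))

# Per-packet: successive packets alternate paths, which can reorder a flow
# when the alternate paths have different round-trip times.
round_robin = cycle(PATHS)
for _ in range(3):
    print("per-packet:", next(round_robin))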
It is possible to connect regional networks directly, avoiding the backbone altogether and possibly providing more direct routing. For example, in Figure 4-2, the DCs in SFO and LAX could be connected by a direct link. Traffic between the SFO and LAX regional networks could then travel over this link rather than over the backbone.
However, this exercise should be viewed as the effective consolidation of two regional distribution networks, and the overall routing architecture for the newly combined regions should be re-engineered to reflect this.
On an operational note, the backbone network and routers may be managed by a different operational team than the regional networks. One historical example is the arrangement between the NSFNET backbone and the regional networks described in Chapter 1. Today, many smaller ISPs use the NSPs for WAN connectivity.
In this situation, the routing relationship between the backbone and the distribution networks is likely to be slightly different because an Exterior Gateway Protocol such as BGP will be used. In this book, the operators of the backbone and regional networks are generally considered to be the same, which makes it possible for the two to share a hierarchical IGP. In Chapter 16, "Design and Configuration Case Studies," you will examine a case study for scaling very large enterprise networks in which this is not the case.