After completing this chapter, you will be able to
- Design enterprise campus network infrastructures
- Review high-availability campus design features and make recommendations
- Describe Layer 2 campus design options and make recommendations
- Describe Layer 3 campus design options and make recommendations
- Discuss options for Layer 2 to Layer 3 boundary placement in the campus
- Describe infrastructure service considerations, including IP telephony, QoS, and Cisco Catalyst Integrated Security features
The complexity inherent in today's campus networks necessitates a design process capable of separating solutions into basic elements. The Cisco hierarchical network model achieves this goal by dividing the network infrastructure into modular components. Each module represents a functional service layer within the campus hierarchy.
Designing High Availability in the Enterprise Campus
The Cisco hierarchical network model enables the design of high-availability modular topologies. Through the use of scalable building blocks, the network can support evolving business needs. The modular approach makes the network easier to scale, troubleshoot, and understand. It also promotes deterministic traffic patterns.
This section reviews design models, recommended practices, and methodologies for high availability in the Cisco Enterprise Campus Architecture infrastructure.
Enterprise Campus Infrastructure Review
The building blocks of the enterprise campus infrastructure are the access layer, the distribution layer, and the core layer. The principal features associated with each layer are hierarchical design and modularity. A hierarchical design avoids the need for a fully meshed network in which all nodes are interconnected. A modular design enables a component to be placed in service or taken out of service with little or no impact on the rest of the network. This methodology also facilitates troubleshooting, problem isolation, and network management.
Access Layer
The access layer is the point of entry into the network for end devices, as illustrated in Figure 2-1.
Figure 2-1 Access Layer
The campus access layer aggregates end users and provides uplinks to the distribution layer. The access layer can support multiple features:
- High availability: At the access layer, high availability is supported through various hardware and software attributes. In hardware, system-level redundancy can be provided with redundant supervisor engines and redundant power supplies, and default gateway redundancy can be provided through dual connections from access switches to redundant distribution layer switches. In software, high availability is supported through the use of first-hop redundancy protocols (FHRPs), such as the Hot Standby Router Protocol (HSRP), Virtual Router Redundancy Protocol (VRRP), and Gateway Load Balancing Protocol (GLBP).
- Convergence: The access layer supports inline Power over Ethernet (PoE) for IP telephony and wireless access points, allowing customers to converge voice onto their data network and providing roaming wireless LAN (WLAN) access for users.
- Security: The access layer provides services for additional security against unauthorized access to the network through the use of tools such as IEEE 802.1X, port security, DHCP snooping, Dynamic ARP Inspection (DAI), and IP Source Guard (several of these features appear in the configuration sketch following this list).
- Quality of service (QoS): The access layer allows prioritization of mission-critical network traffic using traffic classification and queuing as close to the ingress of the network as possible. It supports the use of the QoS trust boundary.
- IP multicast: The access layer supports efficient network and bandwidth management using software features such as Internet Group Management Protocol (IGMP) snooping.
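The following configuration sketch combines several of these access layer features on a single access port. It is only an illustration: the interface number, the data and voice VLANs (10 and 110), and the port-security limits are assumptions rather than values from this text, and exact commands vary by Catalyst platform and software release.

```
! Illustrative access-switch sketch; VLANs, interface, and limits are assumed.
mls qos
ip dhcp snooping
ip dhcp snooping vlan 10,110
ip arp inspection vlan 10,110
ip igmp snooping
!
interface GigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 10
 switchport voice vlan 110
 ! Limit the MAC addresses learned on the port
 switchport port-security
 switchport port-security maximum 3
 switchport port-security violation restrict
 ! Extend the QoS trust boundary only to an attached Cisco IP phone
 mls qos trust device cisco-phone
 mls qos trust cos
 ! IP Source Guard filters traffic based on the DHCP snooping binding table
 ip verify source
 spanning-tree portfast
```

Uplinks toward the distribution layer would additionally be configured as trusted for DHCP snooping and DAI (ip dhcp snooping trust and ip arp inspection trust).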
Distribution Layer
The distribution layer aggregates traffic from all nodes and uplinks from the access layer and provides policy-based connectivity, as illustrated in Figure 2-2.
Figure 2-2 Distribution Layer
Availability, load balancing, QoS, and provisioning are the important considerations at this layer. High availability is typically provided through dual paths from the distribution layer to the core and from the access layer to the distribution layer. Layer 3 equal-cost load sharing allows both uplinks from the distribution to the core layer to be used.
The distribution layer is the place where routing and packet manipulation are performed and can be a routing boundary between the access and core layers. The distribution layer represents a redistribution point between routing domains or the demarcation between static and dynamic routing protocols. The distribution layer performs tasks such as controlled routing and filtering to implement policy-based connectivity and QoS. To further improve routing protocol performance, the distribution layer summarizes routes from the access layer. For some networks, the distribution layer offers a default route to access layer routers and runs dynamic routing protocols when communicating with core routers.
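As an illustration of summarizing access layer routes toward the core, the following hedged sketch advertises a single summary on a distribution-to-core uplink using EIGRP. The autonomous system number, addresses, and interface are assumptions used only for the example.

```
! Hedged sketch: summarize access-layer subnets toward the core (EIGRP).
router eigrp 100
 network 10.1.0.0 0.0.255.255
!
interface TenGigabitEthernet1/1
 description Uplink to campus core
 ip summary-address eigrp 100 10.1.0.0 255.255.0.0
```

With OSPF, equivalent behavior is typically achieved with area range summarization on the distribution switch acting as the area border router.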
The distribution layer uses a combination of Layer 2 and multilayer switching to segment workgroups and isolate network problems, preventing them from impacting the core layer. The distribution layer may be used to terminate VLANs from access layer switches. The distribution layer connects network services to the access layer and implements QoS, security, traffic loading, and routing policies. The distribution layer provides default gateway redundancy using an FHRP, such as HSRP, GLBP, or VRRP, to allow for the failure or removal of one of the distribution nodes without affecting endpoint connectivity to the default gateway.
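A minimal HSRP sketch for default gateway redundancy on one switch of a distribution pair might look like the following; the VLAN, addresses, and priority values are hypothetical.

```
! Hedged sketch: HSRP active router for VLAN 10 on the first distribution switch.
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 priority 110
 standby 10 preempt
```

The peer distribution switch would carry 10.1.10.3 with the default priority (100), so it takes over the virtual gateway address only if the first switch or its uplink fails.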
Core Layer
The core layer provides scalability, high availability, and fast convergence to the network, as illustrated in Figure 2-3. The core layer is the backbone for campus connectivity and the aggregation point for the other layers and modules in the Cisco Enterprise Campus Architecture. The core provides a high level of redundancy and can adapt to changes quickly. Core devices are most reliable when they can accommodate failures by rerouting traffic and can respond quickly to changes in the network topology. The core devices implement scalable protocols and technologies, alternate paths, and load balancing. The core layer also aids scalability during future network growth.
Figure 2-3 Core Layer
The core is a high-speed, Layer 3 switching environment using hardware-accelerated services. For fast convergence around a link or node failure, the core uses redundant point-to-point Layer 3 interconnections because this design yields the fastest and most deterministic convergence results. The core layer is designed to avoid any packet manipulation, such as checking access lists and filtering, which would slow down the switching of packets.
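The redundant point-to-point Layer 3 interconnections described here are typically routed interfaces rather than switched trunks. The following sketch shows one such core link; the interface, addressing, and OSPF process number are assumptions.

```
! Hedged sketch: routed point-to-point interconnection between core switches.
interface TenGigabitEthernet2/1
 description Point-to-point link to peer core switch
 no switchport
 ip address 10.0.0.1 255.255.255.252
 ! Skip DR/BDR election on the point-to-point link
 ip ospf network point-to-point
!
router ospf 1
 network 10.0.0.0 0.0.0.3 area 0
```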
Not all campus implementations require a campus core. The core and distribution layer functions can be combined at the distribution layer for a smaller campus.
Without a core layer, the distribution layer switches need to be fully meshed, as illustrated in Figure 2-4. This design is difficult to scale and increases the cabling requirements, because each new building distribution switch needs full-mesh connectivity to all the other distribution switches. The routing complexity of a full-mesh design increases as new neighbors are added.
Figure 2-4 Is a Core Layer Needed?
In Figure 2-4, a second distribution module of two interconnected switches requires four additional links for full-mesh connectivity to the first module. A third distribution module supporting a third building would require 8 additional links to connect to all the existing distribution switches, for a total of 12 links. A fourth module supporting a fourth building would require 12 new links, for a total of 24 links between the distribution switches. Four distribution modules impose eight Interior Gateway Protocol (IGP) neighbors on each distribution switch.
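As a quick check of this arithmetic, assume m distribution modules of two switches each and a full mesh of module-to-module links:

```
\text{inter-module links} = 4\binom{m}{2} = 2m(m-1)
\qquad m=2:\ 4,\quad m=3:\ 12,\quad m=4:\ 24
```

Each new module m therefore adds 4(m-1) links, which matches the 4, 8, and 12 additional links described above.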
As a recommended practice, deploy a dedicated campus core layer to connect three or more buildings in the enterprise campus, or four or more pairs of building distribution switches in a very large campus. The campus core helps make scaling the network easier by addressing the requirements for the following:
- Gigabit density
- Data and voice integration
- LAN, WAN, and MAN convergence
High-Availability Considerations
In the campus, high availability is concerned with minimizing the impact of link and node failures and optimizing recovery times to reduce convergence time and downtime.
Implement Optimal Redundancy
The recommended design is redundant distribution layer switches and redundant connections to the core with a Layer 3 link between the distribution switches. Access switches should have redundant connections to redundant distribution switches, as illustrated in Figure 2-5.
Figure 2-5 Optimal Redundancy
As a recommended practice, the core and distribution layers are built with redundant switches and fully meshed links to provide maximum redundancy and optimal convergence. Access switches should have redundant connections to redundant distribution switches. The network bandwidth and capacity are engineered to withstand a switch or link failure, supporting convergence of 120 to 200 ms around most failure events. Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP) timer manipulation attempts to quickly redirect the flow of traffic away from a router that has experienced a failure toward an alternate path.
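The timer manipulation mentioned here usually takes the form of subsecond hello and dead intervals plus SPF and LSA throttling. The following OSPF sketch uses representative values only; appropriate timers depend on platform, scale, and testing.

```
! Hedged sketch: OSPF fast-convergence tuning; values are illustrative.
router ospf 1
 ! SPF throttling: 10 ms initial delay, 100 ms hold, 5000 ms maximum wait
 timers throttle spf 10 100 5000
 timers throttle lsa all 10 100 5000
!
interface TenGigabitEthernet1/1
 ! Fast hellos: 1-second dead interval with four hellos per second
 ip ospf dead-interval minimal hello-multiplier 4
```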
In a fully redundant topology with tuned IGP timers, adding redundant supervisors with Cisco NSF and SSO may cause longer convergence times than single supervisors with tuned IGP timers. NSF attempts to maintain the flow of traffic through a router that has experienced a failure. NSF with SSO is designed to keep links up and maintain the Layer 3 state during a routing convergence event. However, because an interaction occurs between the IGP timers and the NSF timers, the tuned IGP timers can cause NSF-aware neighbors to reset the neighbor relationships.
In nonredundant topologies, using Cisco NSF with SSO and redundant supervisors can provide significant resiliency improvements.
Provide Alternate Paths
The recommended distribution layer design is redundant distribution layer switches and redundant connections to the core with a Layer 3 link between the distribution switches, as illustrated in Figure 2-6.
Figure 2-6 Provide Alternate Paths
Although dual distribution switches connected individually to separate core switches will reduce peer relationships and port counts in the core layer, this design does not provide sufficient redundancy. In the event of a link or core switch failure, traffic will be dropped.
An additional link providing an alternate path to a second core switch from each distribution switch offers redundancy to support a single link or node failure. A link between the two distribution switches is needed to support summarization of routing information from the distribution layer to the core.
Avoid Single Points of Failure
In the campus, Cisco NSF with SSO and redundant supervisors has the greatest impact at the access layer. An access switch failure is a single point of failure that causes an outage for the end devices connected to it. You can reduce the outage to one to three seconds at the access layer, as shown in Figure 2-7, by using SSO in a Layer 2 environment or Cisco NSF with SSO in a Layer 3 environment.
Figure 2-7 Avoid Single Points of Failure
Cisco NSF with SSO
Cisco NSF with SSO is a supervisor redundancy mechanism in Cisco IOS Software that allows extremely fast supervisor switchover at Layers 2 to 4.
SSO allows the standby route processor (RP) to take control of the device after a hardware or software fault on the active RP. SSO synchronizes the startup configuration, startup variables, and running configuration, as well as dynamic runtime data, including Layer 2 protocol states for trunks and ports; hardware Layer 2 and Layer 3 tables (MAC, Forwarding Information Base [FIB], and adjacency tables); and access control list (ACL) and QoS tables.
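Enabling SSO on a chassis with redundant supervisors is a short configuration task; the following is a minimal sketch (verification command output varies by platform).

```
! Hedged sketch: enable stateful switchover between redundant supervisors.
redundancy
 mode sso
!
! Verify with: show redundancy states
```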
Cisco NSF is a Layer 3 function that works with SSO to minimize the amount of time a network is unavailable to its users following a switchover. The main objective of Cisco NSF is to continue forwarding IP packets following an RP switchover. Cisco NSF is supported by the EIGRP, OSPF, Intermediate System-to-Intermediate System (IS-IS), and Border Gateway Protocol (BGP) for routing. A router running these protocols can detect an internal switchover and take the necessary actions to continue forwarding network traffic using Cisco Express Forwarding while recovering route information from the peer devices. With Cisco NSF, peer networking devices continue to forward packets while route convergence completes and do not experience routing flaps.
Routing Protocol Requirements for Cisco NSF
Usually, when a router restarts, all its routing peers detect that routing adjacency went down and then came back up. This transition is called a routing flap, and the protocol state is not maintained. Routing flaps create routing instabilities, which are detrimental to overall network performance. Cisco NSF helps to suppress routing flaps.
Cisco NSF allows for the continued forwarding of data packets along known routes while the routing protocol information is being restored following a switchover. With Cisco NSF, peer Cisco NSF devices do not experience routing flaps because the interfaces remain up during a switchover and adjacencies are not reset. Data traffic is forwarded while the standby RP assumes control from the failed active RP during a switchover. User sessions established before the switchover are maintained.
The ability of the intelligent line cards to remain up through a switchover and to be kept current with the FIB on the active RP is crucial to Cisco NSF operation. While the control plane builds a new routing protocol database and restarts peering agreements, the data plane relies on pre-switchover forwarding-table synchronization to continue forwarding traffic. After the routing protocols have converged, Cisco Express Forwarding updates the FIB table and removes stale route entries, and then it updates the line cards with the refreshed FIB information.
The switchover must be completed before the Cisco NSF dead and hold timers expire; otherwise, the peers will reset the adjacency and reroute the traffic.
Cisco NSF protocol enhancements enable a Cisco NSF-capable router to signal neighboring Cisco NSF-aware devices during a switchover.
A Cisco NSF-aware neighbor is needed so that Cisco NSF-capable systems can rebuild their databases and maintain their neighbor adjacencies across a switchover.
Following a switchover, the Cisco NSF-capable device requests that the Cisco NSF-aware neighbor devices send state information to help it rebuild its routing tables.
The signaling asks that the neighbor relationship not be reset. As the Cisco NSF-capable router receives updates from and communicates with other routers on the network, it can begin to rebuild its neighbor list. After neighbor relationships are reestablished, the Cisco NSF-capable router begins to resynchronize its database with all of its Cisco NSF-aware neighbors.
Based on the platform and Cisco IOS Software release, Cisco NSF with SSO support is available for many routing protocols (a configuration sketch follows the list):
- EIGRP
- OSPF
- BGP
- IS-IS
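The following hedged sketch shows how the NSF (graceful restart) capability is typically enabled under each routing process; the process numbers, autonomous system numbers, and IS-IS tag are example values.

```
! Hedged sketch: enable Cisco NSF per routing protocol.
router eigrp 100
 nsf
!
router ospf 1
 nsf
!
router bgp 65000
 bgp graceful-restart
!
router isis CAMPUS
 nsf cisco
```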
Cisco IOS Software Modularity Architecture
The Cisco Catalyst 6500 series with Cisco IOS Software Modularity supports high availability in the enterprise. Figure 2-8 illustrates the key elements and components of the Cisco IOS Software Modularity Architecture.
Figure 2-8 Cisco IOS Software Modularity Architecture
When Cisco IOS Software patches are needed on systems without Cisco IOS Software Modularity, the new image must be loaded on the active and redundant supervisors, and the supervisor must be reloaded or the switchover to the standby completed to load the patch.
The control plane functions (which manage routing protocol updates and management traffic) on the Catalyst 6500 series run on dedicated CPUs on the Multilayer Switch Feature Card (MSFC) complex. A completely separate data plane is responsible for traffic forwarding. When the hardware is programmed for nonstop operation, the data plane continues forwarding traffic even if there is a disruption in the control plane. The Catalyst 6500 series switches benefit from the more resilient control plane offered by Cisco IOS Software Modularity.
The Cisco Catalyst 6500 series with Cisco IOS Software Modularity enables several Cisco IOS control plane subsystems to run in independent processes. Cisco IOS Software Modularity boosts operational efficiency and minimizes downtime:
- It minimizes unplanned downtime through fault containment and stateful process restarts, raising the availability of converged applications.
- It simplifies software changes through subsystem in-service software upgrades (ISSU), significantly reducing code certification and deployment times and decreasing business risks.
- It enables process-level, automated policy control by integrating Cisco IOS Embedded Event Manager (EEM), offloading time-consuming tasks to the network and accelerating the resolution of network issues. EEM is a combination of processes designed to monitor key system parameters such as CPU utilization, interface counters, Simple Network Management Protocol (SNMP) data, and syslog events. It acts when specific events occur or when threshold counters are exceeded (a sample EEM applet follows this list).
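As an example of this kind of automated policy control, the following hedged EEM applet sketch raises a syslog message and captures process data when CPU utilization crosses a threshold. The SNMP OID, threshold, polling interval, and file name are illustrative assumptions.

```
! Hedged sketch: EEM applet that reacts to sustained high CPU utilization.
event manager applet HIGH-CPU
 ! Poll a CPU utilization OID every 60 seconds; trigger at 80 percent or more
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op ge entry-val 80 poll-interval 60
 action 1.0 syslog msg "CPU utilization is at or above 80 percent"
 action 2.0 cli command "enable"
 action 3.0 cli command "show processes cpu sorted | redirect flash:high-cpu.txt"
```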
Example: Software Modularity Benefits
Cisco IOS Software Modularity on the Cisco Catalyst 6500 series provides these benefits:
- Operational consistency: Cisco IOS Software Modularity does not change the operational point of view. Command-line interfaces (CLI) and management interfaces such as SNMP or syslog are the same as before. New EXEC and configuration mode commands, along with new show commands, have been added to support the new functionality.
- Protected memory: Cisco IOS Software Modularity enables a memory architecture where processes make use of a protected address space. Each process and its associated subsystems live in an individual memory space. Using this model, memory corruption across process boundaries becomes nearly impossible.
- Fault containment: The benefit of protected memory space is increased availability because problems occurring in one process cannot affect other parts of the system. For example, if a less-critical system process fails or is not operating as expected, critical functions required to maintain packet forwarding are not affected.
- Process restartability: Building on the protected memory space and fault containment, the modular processes are now individually restartable. For test purposes or nonresponding processes, the process restart process-name command is provided to manually restart processes. Restarting a process allows fast recovery from transient errors without the need to disrupt forwarding. Integrated high-availability infrastructure constantly checks the state of processes and keeps track of how many times a process restarted in a defined time interval. If a process restart does not restore the system, the high-availability infrastructure will take more drastic actions, such as initiating a supervisor engine switchover or a system restart.
- Modularized processes: Several control plane functions have been modularized to cover the most commonly used features. Examples of modular processes include but are not limited to these:
- Routing process
- Internet daemon
- Raw IP processing
- TCP process
- User Datagram Protocol (UDP) process
- Cisco Discovery Protocol process
- Syslog daemon
- Any EEM components
- File systems
- Media drivers
- Install manager
- Subsystem ISSU: Cisco IOS Software Modularity allows selective system maintenance during runtime through individual patches. By providing versioning and patch-management capabilities, Cisco IOS Software Modularity allows patches to be downloaded, verified, installed, and activated without the need to restart the system. Because data plane packet forwarding is not affected during the patch process, the network operator now has the flexibility to introduce software changes at any time through ISSU. A patch affects only the software components associated with the update.