NSF/SSO, NSR, Graceful Restart to Ensure Robust Routing
Nonstop forwarding (NSF) refers to the capability of the data plane to keep forwarding hitlessly when the routing plane momentarily disappears, most likely because it is failing over to a standby route processor (RP). The routing information and topology might change during this time and leave the forwarding information base (FIB) stale, so switchover times should be as short as possible. The Cisco ASR 1000 provides RP-to-RP switchover times of less than 50 ms (or IOS daemon [IOSD] to IOSD for the ASR 1002-F/ASR 1002/ASR 1004).
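To make the NSF idea concrete, here is a minimal Python sketch, assuming a toy model in which the data plane holds a FIB programmed by the active RP and keeps performing longest-prefix-match lookups on it while the control plane is away. The class and method names (ToyRouter, program, rp_switchover_begin, and so on) are hypothetical and do not correspond to any ASR 1000 software interface.

```python
import ipaddress
from typing import Dict, Optional


class ToyRouter:
    """Hypothetical model: the FIB lives in the data plane and survives an RP switchover."""

    def __init__(self) -> None:
        self.fib: Dict[ipaddress.IPv4Network, str] = {}  # prefix -> next hop
        self.control_plane_up = True

    def program(self, prefix: str, next_hop: str) -> None:
        # Only an active control plane can install routes into the FIB.
        if not self.control_plane_up:
            raise RuntimeError("no active control plane to program the FIB")
        self.fib[ipaddress.IPv4Network(prefix)] = next_hop

    def rp_switchover_begin(self) -> None:
        # The control plane goes away, but the FIB is deliberately NOT flushed (NSF).
        self.control_plane_up = False

    def rp_switchover_end(self) -> None:
        self.control_plane_up = True

    def forward(self, dst: str) -> Optional[str]:
        # Longest-prefix match on the retained FIB; no control-plane involvement.
        addr = ipaddress.IPv4Address(dst)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None
        return self.fib[max(matches, key=lambda net: net.prefixlen)]


router = ToyRouter()
router.program("10.0.0.0/8", "192.0.2.1")
router.program("10.1.0.0/16", "192.0.2.2")
router.rp_switchover_begin()
print(router.forward("10.1.2.3"))  # still forwarded via 192.0.2.2 during the switchover
```

The stale-FIB risk is visible in the model as well: while control_plane_up is False, nothing can update the FIB, which is why the switchover window should be kept short.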
Stateful switchover (SSO) refers to the capability of the control plane to preserve configuration and various protocol states across the switchover, thereby reducing the time before the newly active control plane is fully usable. This is also useful for scheduled hitless upgrades performed through In-Service Software Upgrade (ISSU). The time for the newly active RP to reach SSO may vary with the type and scale of the configuration.
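The checkpointing idea behind SSO can be sketched as follows. This is a minimal Python model, assuming a hypothetical two-RP pair in which every configuration or state change on the active RP is mirrored synchronously to the standby; real SSO uses its own inter-RP checkpointing infrastructure, and the class names and state keys here are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RouteProcessor:
    name: str
    config: Dict[str, str] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)  # e.g., ARP or NAT entries


class SsoPair:
    """Hypothetical SSO model: changes on the active RP are checkpointed to the standby."""

    def __init__(self) -> None:
        self.active = RouteProcessor("RP0")
        self.standby = RouteProcessor("RP1")

    def apply_config(self, key: str, value: str) -> None:
        self.active.config[key] = value
        self.standby.config[key] = value  # checkpoint (synchronous for simplicity)

    def update_state(self, key: str, value: str) -> None:
        self.active.state[key] = value
        self.standby.state[key] = value  # checkpoint

    def switchover(self) -> None:
        # The standby becomes active with the state it already holds. In a real
        # system the former active reloads and must be resynchronized; this
        # sketch simply swaps the roles.
        self.active, self.standby = self.standby, self.active


pair = SsoPair()
pair.apply_config("router ospf", "100")
pair.update_state("arp 172.16.1.2", "0011.2233.4455")
pair.switchover()
print(pair.active.name, pair.active.state)
```

What matters is the timing: because the standby already holds the state when it takes over, it does not have to relearn it, which is what shortens the time to a usable control plane.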
Graceful restart (GR) refers to the capability of a neighboring control plane to delay advertising the absence of a peer that is going through a control-plane switchover for a "grace period," thus minimizing disruption during that time (assuming the peer's standby control plane does come up). GR is based on per-protocol extensions that are interoperable across vendors. The grace period has a major downside, however: if the peer fails completely and never comes back, the network waits out the grace period before reconverging, which slows down overall convergence. This brings us to the final concept: nonstop routing (NSR).
NSR is an internal (vendor-specific) mechanism that extends routing-protocol awareness to the standby routing plane, so that after a failover the newly active routing plane can take charge of the already established sessions without the peers noticing.
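The helper-side behavior of GR, and its downside when the restarting peer never returns, can be sketched as a simple decision function. This is a minimal Python model, assuming only a grace period and the time (if any) at which the restarting neighbor comes back; real OSPF helpers (RFC 3623) also leave helper mode early when they detect a topology change, which this sketch ignores. The function and enum names are hypothetical.

```python
from enum import Enum
from typing import Optional


class HelperOutcome(Enum):
    RESYNCED = "neighbor returned in time; adjacency and routes kept"
    FLUSHED = "grace period expired; adjacency dropped, network reconverges"


def helper_outcome(grace_period_s: float, back_after_s: Optional[float]) -> HelperOutcome:
    """What a GR helper does with a restarting neighbor (None = it never comes back)."""
    if back_after_s is not None and back_after_s <= grace_period_s:
        return HelperOutcome.RESYNCED
    return HelperOutcome.FLUSHED


def stale_hold_time_s(grace_period_s: float, back_after_s: Optional[float]) -> float:
    """Seconds the helper keeps using the neighbor's routes before the outcome is decided."""
    if back_after_s is not None and back_after_s <= grace_period_s:
        return back_after_s
    # Downside of GR: a dead neighbor is declared only after the whole grace period.
    return grace_period_s


for back in (5, None):
    print(back, helper_outcome(300, back).name, stale_hold_time_s(300, back))
```

With a 300-second grace period, a neighbor that returns after 5 seconds is resynchronized, whereas a neighbor that never returns keeps the network on stale routes for the full 300 seconds. NSR sidesteps this trade-off: because the standby routing plane already owns the sessions, peers never see a restart.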
Table 12-1 shows the compatibility and support matrix for ASR 1000 IOS XE software 2.2, and outlines the various states that are preserved during FP/ESP failover.
Table 12-1. Protocols and Their State Preservation via NSF/SSO
| Technology Focus | NSF | SSO |
|---|---|---|
| Routing protocols | Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First Version 2 (OSPFv2), OSPFv3, Intermediate System-to-Intermediate System (IS-IS), and Border Gateway Protocol Version 4 (BGPv4) | |
| IPv4 services | — | Address Resolution Protocol (ARP), Hot Standby Routing Protocol (HSRP), IPsec, Network Address Translation (NAT), IPv6 Neighbor Discovery Protocol (NDP), Unicast Reverse Path Forwarding (uRPF), Simple Network Management Protocol (SNMP), Gateway Load Balancing Protocol (GLBP), Virtual Router Redundancy Protocol (VRRP), Multicast (Internet Group Management Protocol [IGMP]) |
| IPv6 services | — | IPv6 Multicast (Multicast Listener Discovery [MLD], Protocol Independent Multicast-Source Specific Multicast [PIM-SSM], MLD Access group) |
| L2/L3 protocols | — | Frame Relay, PPP, Multilink PPP (MLPPP), High-Level Data Link Control (HDLC), 802.1Q, bidirectional forwarding detection (BFD) |
| Multiprotocol Label Switching (MPLS) | — | MPLS Layer 3 VPN (L3 VPN), MPLS Label Distribution Protocol (LDP) |
| SBC | — | SBC Data Border Element (DBE) |
See the "Further Reading" section at the end of this chapter to find out where to look for complete route scale testing details.
Use Case: Achieving High Availability Using NSF/SSO
To command higher revenues and consistent profitability, service providers and enterprises are increasingly putting mission-critical, time-sensitive services on their IP infrastructure. One of the key challenges is achieving and delivering high network availability under strict service level agreement (SLA) requirements. It is widely recognized that network availability is directly linked to the overall total cost of ownership (TCO).
An enterprise uses an ASR 1006 with an ASR1000-ESP10 in the core of its network. The router runs OSPF to connect to multiple distribution hub routers, not all of which are Cisco devices.
The goal is to reduce the route/prefix recomputation churn caused by RP switchover and reestablishment of OSPF peers.
To meet these requirements, implement Internet Engineering Task Force (IETF) NSF for OSPF, because it interoperates with any vendor's router that is NSF-aware (a term for a neighboring router that understands the GR protocol extensions). With IETF NSF in place, when the NSF-capable ASR 1000 switches over from the active RP to the standby RP, no packets are lost and the downstream neighbors do not restart their adjacencies.
Figure 12-1 shows the ASR 1000 core router and its neighbors, which are all NSF-aware and can act as helpers during RP SSO.
Figure 12-1 Logical view of many regional WAN aggregation routers coming into a consolidated WAN campus edge router.
To turn on IETF NSF and helper mode on all the distribution hub routers, including the Cisco ASR 1000, perform the following configuration steps:
- Step 1. Configure NSF within the given OSPF process ID:
ASR1006# configure terminal
ASR1006(config)# router ospf 100
ASR1006(config-router)# nsf ietf restart-interval 300
- Step 2. Verify that NSF is enabled on the helper router and on the ASR 1006:
Router-helper# show ip ospf 100
Routing Process "ospf 100" with ID 172.16.1.2
----output truncated----
IETF Non-Stop Forwarding enabled
restart-interval limit: 300 sec
IETF NSF helper support enabled
Cisco NSF helper support enabled
Reference bandwidth unit is 100 mbps
Area BACKBONE(0)

ASR1006# show ip ospf 100
Routing Process "ospf 100" with ID 10.1.1.1
----output truncated----
IETF Non-Stop Forwarding enabled
restart-interval limit: 300 sec
IETF NSF helper support enabled
Cisco NSF helper support enabled
- Step 3. Verify that one RP is active and the other is in standby (using the show platform command) and that the OSPF neighbor relationships are established (using the show ip ospf neighbor command). You can also view the prefixes downloaded into both the active and standby Embedded Service Processor (ESP) before failing over the router:
! active ESP:
ASR1006# show platform software ip fp active cef summary
Forwarding Table Summary
Name      VRF id  Table id  Protocol  Prefixes  State
----------------------------------------------------------------
Default   0       0         IPv4      10000     cpp: 0x10e265d8 (created)
! standby ESP:
ASR1006# show platform software ip fp standby cef summary
Forwarding Table Summary
Name      VRF id  Table id  Protocol  Prefixes  State
----------------------------------------------------------------
Default   0       0         IPv4      10000     cpp: 0x10e265d8 (created)
The preceding output shows that 10,000 IPv4 prefixes are present in the forwarding tables of both ESPs before the failover.
- Step 4. Induce the RP switchover by entering the redundancy force-switchover command in privileged EXEC mode on the active RP. The following output from the newly active RP shows that an IETF NSF restart has just taken place:
ASR1006# show ip ospf 100
----output truncated----
IETF Non-Stop Forwarding enabled
restart-interval limit: 300 sec, last IETF NSF restart 00:00:10 ago
IETF NSF helper support enabled
Cisco NSF helper support enabled
- Step 5. Because forwarding continues throughout the process, the RP switchover causes no packet loss. While the switchover is in progress, you can use the show platform command to verify that the formerly active RP is rebooting (it is listed in the "booting" state).
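The Step 3 comparison of active and standby ESP prefix counts can also be scripted. The following Python sketch assumes the column layout shown in the show platform software ip fp active cef summary output and its standby counterpart (Name, VRF id, Table id, Protocol, Prefixes, State), and assumes the output has already been collected as text by some other means. The functions parse_cef_summary and esps_in_sync are hypothetical helpers, not a Cisco tool.

```python
from typing import Dict, Tuple

# Sample text copied from the Step 3 output (identical for active and standby ESP).
ACTIVE_OUTPUT = """\
Forwarding Table Summary
Name      VRF id  Table id  Protocol  Prefixes  State
----------------------------------------------------------------
Default   0       0         IPv4      10000     cpp: 0x10e265d8 (created)
"""
STANDBY_OUTPUT = ACTIVE_OUTPUT


def parse_cef_summary(output: str) -> Dict[Tuple[str, str], int]:
    """Map (table name, protocol) to prefix count from a cef summary printout."""
    counts: Dict[Tuple[str, str], int] = {}
    in_rows = False
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("---"):
            in_rows = True  # data rows follow the dashed separator
            continue
        if not in_rows or not text:
            continue
        # The State column may contain spaces, so split into at most six fields.
        fields = text.split(maxsplit=5)
        if len(fields) < 5 or not fields[4].isdigit():
            continue
        name, _vrf_id, _table_id, protocol, prefixes = fields[:5]
        counts[(name, protocol)] = int(prefixes)
    return counts


def esps_in_sync(active_output: str, standby_output: str) -> bool:
    """True when both ESPs report the same prefix count for every table."""
    return parse_cef_summary(active_output) == parse_cef_summary(standby_output)


print(parse_cef_summary(ACTIVE_OUTPUT))
print(esps_in_sync(ACTIVE_OUTPUT, STANDBY_OUTPUT))
```

Comparing per-table counts rather than raw text keeps the check independent of column spacing, which can vary between software releases.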
An ASR1000-ESP10 failover, by contrast, causes a small amount of packet loss (the packets being processed inside the QuantumFlow Processor [QFP] at the moment of failover), amounting to much less than 1 ms worth of transit traffic.
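To put "less than 1 ms worth of transit traffic" in perspective, the following Python sketch computes how many Ethernet frames fit into a given window at line rate. It assumes the ESP10 is forwarding at its nominal 10 Gbps and counts the standard 20 bytes of per-frame Ethernet wire overhead (preamble, start-of-frame delimiter, and inter-frame gap); real traffic rarely runs at line rate, so these figures are upper bounds.

```python
ETH_WIRE_OVERHEAD_BYTES = 20  # 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap


def frames_in_window(rate_bps: int, window_us: int, frame_bytes: int) -> int:
    """Whole frames that can cross a link at rate_bps during window_us microseconds."""
    bits_in_window = rate_bps * window_us // 1_000_000
    bits_per_frame = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8
    return bits_in_window // bits_per_frame


RATE_BPS = 10_000_000_000  # assumption: ESP10 at its nominal 10 Gbps

for window_us in (1_000, 50_000):  # 1 ms, and the 50-ms APS benchmark
    for frame_bytes in (64, 1518):  # minimum and maximum untagged Ethernet frames
        n = frames_in_window(RATE_BPS, window_us, frame_bytes)
        print(f"{window_us / 1000:g} ms, {frame_bytes}-byte frames: {n:,}")
```

Under these assumptions, a full millisecond at line rate is at most 14,880 minimum-size frames, whereas a 50-ms outage at the same rate would cover 744,047 of them. Integer arithmetic is used throughout so that the counts are exact.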
NSF/SSO allows the RPs to fail over without any packet loss and the ESPs to fail over with extremely small packet loss. In both cases the Cisco ASR 1000 delivers a core benefit of a carrier-class router: failover times that beat even the 50-ms Automatic Protection Switching (APS) gold standard.
In today's networks, where SLAs are enforced and networks are participating in life- and mission-critical scenarios, a robust infrastructure with faster failover based on modern architectures is a must.