This chapter discusses some of the common problems in EIGRP and how to resolve those problems. Debugs, configurations, and useful show commands are also given where necessary.
NOTE
Debugs can be CPU-intensive and can adversely affect your network. Therefore, debugs are not recommended on a production network unless instructed by Cisco's Technical Assistance Center (TAC).
Sometimes, there might be multiple causes for the same problem. Therefore, if one scenario doesn't fix the network problem, always check into other scenarios.
Troubleshooting EIGRP Neighbor Relationships
This section discusses methods of troubleshooting issues regarding EIGRP neighbor relationships. The following are the most common causes of problems with EIGRP neighbor relationships:
Unidirectional link
Uncommon subnet, primary, and secondary address mismatch
Mismatched masks
K value mismatches
Mismatched AS numbers
Stuck in active
Layer 2 problem
Access list denying multicast packets
Manual change (summary route, metric change, route filter)
Figure 7-1 illustrates a general troubleshooting flowchart on EIGRP neighbor relationships.
Figure 7-1 General Flowchart on Troubleshooting EIGRP Neighbor Relationships
Consulting the EIGRP Log for Neighbor Changes
Whenever EIGRP resets its neighbor relationship, the reset is noted in the log along with the reason for it. In earlier Cisco IOS Software releases, this feature must be enabled explicitly by configuring the eigrp log-neighbor-changes command under router EIGRP. In Cisco IOS Software Release 12.1.3 and later, eigrp log-neighbor-changes is the default setting for the router. An entry in the EIGRP neighbor log looks something like this:
%DUAL-5-NBRCHANGE: IP-EIGRP EIGRP AS number: Neighbor neighbor IP address is down: reason for neighbor down.
Table 7-1 documents the neighbor changes that you can find in the EIGRP log, along with the meaning and required action to fix the problem based on the log message.
Table 7-1 Neighbor Changes Documented in the EIGRP Log
Log Message | Meaning | Action for Troubleshooting
NEW ADJACENCY | Indicates that a new neighbor has been established. | No action is required.
PEER RESTARTED | Indicates that the other neighbor initiates the reset of the neighbor relationship. The router getting the message is not the one resetting the neighbor. | No action is required on the router that is getting the message. Gather EIGRP neighbor log information on the other neighbor.
HOLD TIME EXPIRED | Indicates that the router has not heard any EIGRP packets from the neighbor within the hold-time limit. | Because this is a packet-loss problem, check for a Layer 2 problem. Troubleshoot by using the flowchart shown in Figure 7-2.
RETRY LIMIT EXCEEDED | Indicates that EIGRP did not receive the acknowledgement from the neighbor for EIGRP reliable packets and that EIGRP already has tried to retransmit the reliable packet 16 times without any success. | Troubleshoot using the flowchart shown in Figure 7-3.
ROUTE FILTER CHANGED | Indicates that the EIGRP neighbor is resetting because there is a change in the route filter (distribute-list command under router EIGRP). | No action is needed. This is normal behavior in EIGRP, which needs to reset the neighbor when the route filter is changed, and to resynchronize the EIGRP topology table between neighbors.
INTERFACE DELAY CHANGED | Indicates that the EIGRP neighbor is resetting because there is a manual configuration change in the delay parameter on the interface. | No action is needed. This is normal behavior in EIGRP, which needs to reset the neighbor when the delay parameter is changed.
INTERFACE BANDWIDTH CHANGED | Indicates that the EIGRP neighbor is resetting because there is a manual configuration change in the interface bandwidth on the interface. | No action is needed. This is normal behavior in EIGRP, which needs to reset the neighbor when the bandwidth parameter is changed.
STUCK IN ACTIVE | Indicates that the EIGRP neighbor is resetting because EIGRP is stuck in active state. The neighbor getting reset is the result of stuck in active. | Troubleshoot from the stuck in active point of view. Refer to the section "EIGRP Neighbor Problem Cause: Stuck in Active."
Figure 7-2 Flowchart for Troubleshooting EIGRP Neighbor Relationship When Getting Neighbor Log Message HOLD TIME EXPIRED
Figure 7-3 Flowchart for Troubleshooting EIGRP Neighbor Relationship When Getting Neighbor Log RETRY LIMIT EXCEEDED
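The lookup in Table 7-1 can be sketched as a small script that reads a neighbor-change log line and suggests the next step. This is a minimal sketch, not a Cisco tool: the regular expression follows the message template shown above, the optional interface field and the sample log line are assumptions, and the exact reason strings can differ between software releases.

```python
import re

# Pattern follows the message template in this section; the optional
# "(interface)" group is an assumption about real-world log lines.
NBRCHANGE = re.compile(
    r"%DUAL-5-NBRCHANGE: IP-EIGRP (?P<asn>\d+): "
    r"Neighbor (?P<neighbor>\d+\.\d+\.\d+\.\d+)(?: \((?P<interface>[^)]+)\))? "
    r"is (?P<state>up|down): (?P<reason>.+)"
)

# Condensed from Table 7-1: reason -> suggested next step.
ACTIONS = {
    "new adjacency": "No action required.",
    "peer restarted": "Gather the EIGRP neighbor log on the other router.",
    "hold time expired": "Check for a Layer 2 problem (Figure 7-2).",
    "retry limit exceeded": "Follow the flowchart in Figure 7-3.",
    "route filter changed": "No action required.",
    "interface delay changed": "No action required.",
    "interface bandwidth changed": "No action required.",
    "stuck in active": "Troubleshoot the stuck in active condition.",
}


def classify(line):
    """Return parsed fields plus a suggested action, or None if no match."""
    match = NBRCHANGE.search(line)
    if match is None:
        return None
    reason = match.group("reason").strip().rstrip(".").lower()
    return {
        "asn": int(match.group("asn")),
        "neighbor": match.group("neighbor"),
        "state": match.group("state"),
        "reason": reason,
        "action": ACTIONS.get(reason, "Unknown reason; inspect manually."),
    }


# Hypothetical log line, written to match the template for illustration.
sample = ("%DUAL-5-NBRCHANGE: IP-EIGRP 1: Neighbor 10.88.18.2 (Serial0) "
          "is down: retry limit exceeded")
result = classify(sample)
print(result["neighbor"], "->", result["action"])
```

Any reason that is not in the table falls through to a manual-inspection hint rather than a guess.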
EIGRP Neighbor Problem Cause: Unidirectional Link
Sometimes, a problem with a WAN connection causes EIGRP to have a one-way neighbor relationship. A one-way neighbor relationship usually is caused by a unidirectional connection between the neighbors, and the cause of a unidirectional connection is usually a Layer 2 problem. For example, the link might be experiencing many CRC errors, there might be a switch problem, or ping tests with large or small packets might fail. In this case, you need to call the group that is responsible for the link and have it check the integrity of the link. Sometimes, a simple misconfigured access list causes EIGRP to form a one-way neighbor relationship. Figure 7-4 illustrates an example of an EIGRP problem as a result of a unidirectional link.
Figure 7-4 Network Topology Vulnerable to an EIGRP Neighbor Problem Because of a Unidirectional Link
In Figure 7-4, Routers RTR A and RTR B are connected by a WAN connection. The circuit from RTR A to RTR B is fine, but the circuit from RTR B to RTR A is broken. The results from the show ip eigrp neighbor command on RTR A will not show anything because RTR B's EIGRP hello packet can't make it to RTR A. Example 7-1 shows the output from show ip eigrp neighbor on RTR B.
Example 7-1 show ip eigrp neighbors Command Output on RTR B
RtrB#show ip eigrp neighbors
IP-EIGRP neighbors for process 1
H   Address       Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                              (sec)           (ms)         Cnt  Num
1   10.88.18.2    S0          14    00:00:15  0      5000  4    0
RTR B shows RTR A as a neighbor because RTR A's EIGRP hello packet has no problem reaching RTR B. From the output of the show command, the SRTT is at 0 ms, the retransmission timeout (RTO) timer is at 5000 ms, and the Q count is at 4 and is not decrementing. These three numbers give the biggest clue that this is a unidirectional link problem. The following is the meaning of SRTT, RTO, and Q count:
Smooth round-trip time (SRTT): The number of milliseconds it takes for an EIGRP packet to be sent to this neighbor and for the local router to receive an acknowledgment of that packet
Retransmission timeout (RTO), in milliseconds: The amount of time that the software waits before retransmitting a packet from the retransmission queue to a neighbor
Q count: The number of EIGRP packets (Update, Query, and Reply) that the software is waiting to send
Referring to Example 7-1, the fact that the SRTT timer is 0 indicates that no acknowledgement packets are being received. The Q count is not decrementing, which indicates that the router is trying to send EIGRP packets but no acknowledgement is being received. RTR B will retry 16 times to resend the packet; eventually, RTR B will reset the neighbor relationship with the log indicating RETRY LIMIT EXCEEDED, and the process starts again. Also, keep in mind that the 16 retransmissions of the same packet are sent using unicast, not multicast. Therefore, the RETRY LIMIT EXCEEDED message indicates a problem with transmitting unicast packets over the link, and this is most likely a Layer 1 or Layer 2 problem.
The solution to this problem is to troubleshoot from a Layer 2 perspective. In this example, a call to the WAN provider is needed to find out why the circuit from RTR B to RTR A is broken. After the link from RTR B to RTR A is fixed, the problem will be resolved. Output from show ip eigrp neighbors in Example 7-2 shows the neighbor relationship after the WAN link has been fixed.
Example 7-2 show ip eigrp neighbors Command Output Confirms Problem Resolution
RtrB#show ip eigrp neighbors
IP-EIGRP neighbors for process 1
H   Address       Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                              (sec)           (ms)         Cnt  Num
1   10.88.18.2    S0          14    01:26:30  149    894   0    291
Notice that the Q count column is 0 and that the SRTT and RTO have valid values now.
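The three clues described in this section (SRTT of 0, RTO of 5000 ms, and a nonzero Q count) can be checked mechanically. The following is a minimal sketch that assumes the nine-column layout shown in Examples 7-1 and 7-2; the function names are hypothetical. A single snapshot cannot show that the Q count is not decrementing, so in practice the check would be repeated over several samples.

```python
def parse_neighbors(output):
    """Return one dict per data row of `show ip eigrp neighbors` output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have nine columns and start with the numeric handle (H):
        # H, Address, Interface, Hold, Uptime, SRTT, RTO, Q Cnt, Seq Num.
        if len(fields) == 9 and fields[0].isdigit():
            rows.append({
                "address": fields[1],
                "interface": fields[2],
                "srtt_ms": int(fields[5]),
                "rto_ms": int(fields[6]),
                "q_count": int(fields[7]),
            })
    return rows


def suspect_one_way(row):
    """Heuristic from the text: SRTT 0, RTO 5000 ms, packets still queued."""
    return row["srtt_ms"] == 0 and row["rto_ms"] == 5000 and row["q_count"] > 0


BROKEN = """\
IP-EIGRP neighbors for process 1
H   Address       Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                              (sec)           (ms)         Cnt  Num
1   10.88.18.2    S0          14    00:00:15  0      5000  4    0
"""

FIXED = """\
IP-EIGRP neighbors for process 1
H   Address       Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                              (sec)           (ms)         Cnt  Num
1   10.88.18.2    S0          14    01:26:30  149    894   0    291
"""

for label, text in (("before fix", BROKEN), ("after fix", FIXED)):
    for row in parse_neighbors(text):
        print(label, row["address"], "suspect:", suspect_one_way(row))
```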
EIGRP Neighbor Problem Cause: Uncommon Subnet
Many times, EIGRP won't establish neighbor relationships because the neighbors are not in the same subnet. Usually, the cause of this problem is router misconfiguration. When EIGRP has problems establishing neighbor relationships because of an uncommon subnet, the following error message appears:
IP-EIGRP: Neighbor ip address not on common subnet for interface
Figure 7-5 shows the flowchart for troubleshooting the problem when the "Neighbor not on common subnet" error appears on the router.
Figure 7-5 Problem-Resolution Flowchart
According to the troubleshooting flowchart in Figure 7-5, the three causes of getting the "EIGRP neighbor not on common subnet" error message are the following:
The IP address has been misconfigured on interfaces.
The primary and secondary IP addresses of the neighboring interface don't match.
A switch or hub between the EIGRP neighbors is misconfigured or is leaking multicast packets to other ports.
Misconfiguration of the IP Address on the Interfaces
Sometimes, an EIGRP neighbor that is not on a common subnet with other EIGRP neighbors is simply the result of misconfiguring the IP address on the interfaces. For example, the network administrator might mistype IP address 192.168.3.1 255.255.255.252 as 192.168.3.11 255.255.255.252, which causes EIGRP to complain about the neighbor not being on a common subnet.
Primary and Secondary IP Addresses of the Neighboring Interface Don't Match
As mentioned in Chapter 6, "Understanding Enhanced Interior Gateway Routing Protocol (EIGRP)," EIGRP sources the hello packet from the primary address of the interface. If the primary network address on one router is used as a secondary network address on the second router, and vice versa, no neighbor relationship will be formed and the routers will complain about the neighbor not being on a common subnet. Figure 7-6 illustrates such a scenario.
Figure 7-6 Network Topology Vulnerable to EIGRP Neighbor Problems Because of Primary and Secondary IP Address Mismatch
In Figure 7-6, Router A and Router B have a primary address in the 10.1.1.0/24 network range, while Router C has an address range of 50.1.1.0/24 configured. When Router A or Router B sends out the EIGRP hello packet, the source of the hello packet will be either 10.1.1.1 or 10.1.1.2, depending on which router sends out the hello. When Router C receives the hello packet from Router A or Router B, it notices that the source is from the 10.1.1.0 network. Because Router C has an IP address of 50.1.1.3 configured on the interface, Router C will not process the hello packet from Router A or Router B because they are from a different network. Therefore, no neighbor relationship is formed from Router C to either Router A or Router B.
The solution for this example is to match all the IP addresses on the segment to the primary address space. For the network in Figure 7-6, you need to configure Router C to be in the primary address space of 10.1.1.0/24.
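The common-subnet test that Router C applies to an incoming hello can be illustrated with Python's standard ipaddress module. This is a simplified sketch of the check as the text describes it, comparing the hello's source address against the receiving interface's primary address only; the function name is hypothetical, and the /24 mask for Router C is taken from the 50.1.1.0/24 range given for Figure 7-6.

```python
import ipaddress


def on_common_subnet(local_interface, hello_source):
    """local_interface: 'address/prefix' of the receiving interface's
    primary address. hello_source: source IP of the received hello."""
    network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(hello_source) in network


# Router C (primary 50.1.1.3/24) receives a hello sourced from Router A.
print(on_common_subnet("50.1.1.3/24", "10.1.1.1"))
# Router B (primary 10.1.1.2/24) receives the same hello.
print(on_common_subnet("10.1.1.2/24", "10.1.1.1"))
```

The first check fails and the second succeeds, which matches the behavior described for Figure 7-6: only routers whose primary address is in 10.1.1.0/24 accept the hello.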
Switch or Hub Between EIGRP Neighbor Connection Is Misconfigured or Is Leaking Multicast Packets to Other Ports
If the IP address configuration is correct on the interface between EIGRP neighbors, you might want to check the configuration on the switch or the hub that connects the EIGRP neighbors. If a single LAN hub connects EIGRP neighbors that belong to different LAN segments, the hub passes broadcast and multicast packets between the ports of the two logical LAN segments. So, the multicast EIGRP hello from LAN segment 1 will be seen by the neighbor located in LAN segment 2 if a single hub connects all the LAN devices on different LAN segments. The solution is to break up the broadcast domain by using a separate hub for each LAN segment. Alternatively, if you only want to stop seeing the error message, configure no eigrp log-neighbor-warnings under the EIGRP configuration.
If a LAN switch connects the LAN devices, you might want to check the configuration of the switch. Make sure that the switch is not configured so that different LAN segments reside within the same VLAN. Make sure that the switch is configured so that each LAN segment has its own broadcast domain and does not share its broadcast domain with other LAN segments.
EIGRP Neighbor Problem Cause: Mismatched Masks
Sometimes, a simple misconfiguration on the interface subnet mask causes an EIGRP neighbor problem. Figure 7-7 illustrates a network diagram for such a scenario.
Figure 7-7 Network Topology Vulnerable to EIGRP Neighbor Problems Because of Mismatched Masks
Example 7-3 shows the configuration for Routers A, B, and C.
Example 7-3 Router A, B, and C Configurations for the Network in Figure 7-7
Router A#
interface serial 0
 ip address 10.1.1.2 255.255.255.128
interface serial 1
 ip address 10.1.3.1 255.255.255.0

Router B#
interface serial 0
 ip address 10.1.1.1 255.255.255.0
interface ethernet 0
 ip address 10.1.2.1 255.255.255.0

Router C#
interface ethernet 0
 ip address 10.1.2.2 255.255.255.0
interface serial 0
 ip address 10.1.3.2 255.255.255.0
Notice the mismatched mask on the serial interface of Router A and Router B. Router A has a mask of 255.255.255.128, while Router B has a mask of 255.255.255.0 on Serial 0. Initially, EIGRP has no problem forming the neighbor relationship between Router A and Router B because 10.1.1.1 and 10.1.1.2 are in the same subnet. The problem occurs after the neighbor relationship is established, when Router A and Router B begin to exchange EIGRP topology tables and install routes based on the EIGRP topology table, as demonstrated in Example 7-4.
Example 7-4 Routing Tables from Router B and Router C
Router B#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default, U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

C    10.1.1.0/24   Serial 0
D    10.1.1.0/25   10.1.2.2

Router C#show ip route eigrp
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default, U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

D    10.1.1.0/24   10.1.2.1
D    10.1.1.0/25   10.1.3.1
When Router B sends Router A an EIGRP update, Router A responds to the update with an EIGRP acknowledgement packet with a destination address of 10.1.1.1 to Router B. When Router B receives the packet, it forwards the ACK packet to Router C instead of processing it because Router B has a more specific route from Router C. Router B has a more specific route of 10.1.1.0/25 with the next hop to 10.1.2.2. This /25 route overrides the /24 route because /25 is more specific than /24. When Router C receives the ACK packet from Router B, it looks at its routing table for the 10.1.1.1 entry, and the routing table points to Router A. Router C then forwards the ACK packet back to Router A. This creates a routing loop. The packet to 10.1.1.1 loops from Router A to Router B, from Router B to Router C, and back from Router C to Router A. As a result, Router B won't process the ACK packet from Router A; Router B will think that Router A never ACK'ed the update packet, and Router B will reset the neighbor after 16 retries.
The solution for this problem is to configure the correct subnet mask, 255.255.255.0, on Router A's Serial 0 interface.
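The forwarding decision behind this loop is an ordinary longest-prefix match. The following sketch reproduces only the route lookup, using the entries from Example 7-4; the lookup function and table names are hypothetical, and it does not model any other part of packet processing on the routers.

```python
import ipaddress


def lookup(table, destination):
    """Return the next hop of the most specific matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix length wins, so /25 beats /24.
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


# Entries taken from Example 7-4.
ROUTER_B = {"10.1.1.0/24": "Serial 0", "10.1.1.0/25": "10.1.2.2"}
ROUTER_C = {"10.1.1.0/24": "10.1.2.1", "10.1.1.0/25": "10.1.3.1"}

print("Router B sends 10.1.1.1 to", lookup(ROUTER_B, "10.1.1.1"))
print("Router C sends 10.1.1.1 to", lookup(ROUTER_C, "10.1.1.1"))

# With Router A's mask corrected, the /25 route is no longer advertised.
ROUTER_B_FIXED = {"10.1.1.0/24": "Serial 0"}
print("After the fix, Router B uses", lookup(ROUTER_B_FIXED, "10.1.1.1"))
```

Router B hands the packet to 10.1.2.2 (Router C), and Router C hands it to 10.1.3.1 (Router A), which is the loop described above. Once the /25 route is gone, Router B resolves 10.1.1.1 through its connected /24 on Serial 0.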
EIGRP Neighbor Problem Cause: Mismatched K Values
For EIGRP to establish a neighbor relationship, the K constants that weight the EIGRP metric must be the same on both routers. Refer to Chapter 6 for an explanation of the K values. By default, the K values are set so that only the bandwidth and the delay of the interface are used to calculate the EIGRP metric; the remaining K values are set to 0. Sometimes, the network administrator wants other interface factors, such as load and reliability, to determine the EIGRP metric, and changes the K values accordingly. However, the K values must be the same for all the routers, or EIGRP won't establish a neighbor relationship. Figure 7-8 shows an example of this case.
Figure 7-8 Network Vulnerable to EIGRP Neighbor Problems Because of Mismatched K Values
For the network in Figure 7-8, K1 is bandwidth and K3 is delay. The network administrator changed the K values of RTR B to 1 for K1 through K4, while RTR A retains the defaults, in which only K1 and K3 are 1. In this example, RTR A and RTR B will not form an EIGRP neighbor relationship because the K values don't match. Example 7-5 shows the configuration for RTR B.
Example 7-5 Configuration for RTR B in Figure 7-8
RTR B#
router eigrp 1
 network xxxx
 metric weights 0 1 1 1 1 0
RTR B's configuration includes the extra metric weights command. The first number is the type of service (ToS) number, which, because it's not supported, gets a value of 0. The five numbers after the ToS are the K1 through K5 values.
Troubleshooting this problem requires careful scrutiny of the router's configuration. The solution for this problem is to change all the K values to be the same on all the neighboring routers. In this example, changing the K values on Router A to match those of Router B will solve the problem, as demonstrated in Example 7-6.
Example 7-6 Configuring the K Values on Router A to Match Router B
RTR A#
router eigrp 1
 network xxxx
 metric weights 0 1 1 1 1 0
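When many routers are involved, comparing the metric weights lines by eye is error-prone. The following sketch pulls K1 through K5 out of a configuration fragment and falls back to the defaults when the command is absent. The helper name and the configuration strings are illustrative; the strings mirror Example 7-5 for RTR B and RTR A's configuration before the fix.

```python
DEFAULT_K = (1, 0, 1, 0, 0)  # K1..K5 defaults: only K1 and K3 are nonzero


def k_values(config):
    """Return (K1, K2, K3, K4, K5) from a router eigrp config fragment."""
    for line in config.splitlines():
        words = line.split()
        if words[:2] == ["metric", "weights"]:
            # words[2] is the ToS field; the next five are K1 through K5.
            return tuple(int(word) for word in words[3:8])
    return DEFAULT_K


RTR_A = """router eigrp 1
 network xxxx"""

RTR_B = """router eigrp 1
 network xxxx
 metric weights 0 1 1 1 1 0"""

ka, kb = k_values(RTR_A), k_values(RTR_B)
print("RTR A:", ka)
print("RTR B:", kb)
print("K values match:", ka == kb)
```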
EIGRP Neighbor Problem Cause: Mismatched AS Number
EIGRP won't form any neighbor relationships with neighbors in different autonomous systems. If the AS numbers are mismatched, no adjacency is formed. This problem is usually caused by misconfiguration on the routers. Figure 7-9 illustrates such a problem.
Figure 7-9 Network Experiencing an EIGRP Neighbor Problem Because of Mismatched AS Numbers
In the network shown in Figure 7-9, RTR A and RTR B are supposed to be in EIGRP AS 1, and the proper network numbers have been configured; however, no EIGRP neighbor relationship is formed between RTR A and RTR B. Begin by checking the configuration of RTR A and RTR B in Example 7-7.
Example 7-7 Configurations for RTR A and RTR B in Figure 7-9
RTR B#show running-config
interface serial 0
 ip address 10.1.1.1 255.255.255.0
router eigrp 11
 network 10.0.0.0

RTR A#show running-config
interface serial 0
 ip address 10.1.1.2 255.255.255.0
router eigrp 1
 network 10.0.0.0
You should notice the misconfiguration immediately. RTR B's Serial 0 interface is configured to be in EIGRP AS number 11, while RTR A's Serial 0 is configured to be in EIGRP AS number 1. Because the AS numbers don't match across the link, no EIGRP neighbor relationship will be formed. To resolve this problem, simply configure both routers with the same EIGRP AS number, as shown in Example 7-8. In this example, both routers will be configured to be in EIGRP AS 1.
Example 7-8 Configuring Both Routers with the Same EIGRP AS Numbers
RTR A#
router eigrp 1
 network 10.0.0.0

RTR B#
router eigrp 1
 network 10.0.0.0
EIGRP Neighbor Problem Cause: Stuck in Active
Sometimes, EIGRP resets the neighbor relationship because of a "stuck in active" condition. The error message is
%DUAL-3-SIA: Route network mask stuck-in-active state in IP-EIGRP AS. Cleaning up
This section discusses the method of troubleshooting the EIGRP stuck in active error.
Reviewing the EIGRP DUAL Process
To resolve an EIGRP stuck in active error, you need to understand the DUAL process in EIGRP. Refer to Chapter 6 for thorough coverage of the DUAL process, although it is reviewed here as well.
EIGRP is an advanced distance-vector protocol. Unlike OSPF or other link-state protocols, it has no LSA flooding to give it an overall view of the network. EIGRP relies only on its neighbors for information on network reachability and availability. EIGRP keeps a list of backup routes called feasible successors. When the primary route is not available, EIGRP immediately uses the feasible successor as the backup route. This shortens convergence time. Now, if the primary route is gone and no feasible successor is available, the route goes into active state. The only way for EIGRP to converge quickly is to query its neighbors about the unavailable route. If the neighbor doesn't know the status of the route, the neighbor asks its neighbors, and so on, until the edge of the network is reached. The query stops if one of the following occurs:
All queries are answered from all the neighbors.
The end of network is reached.
The lost route is unknown to the neighbors.
The problem is that, if there are no query boundaries, EIGRP potentially can ask every router in the network for a lost route. When EIGRP first queries its neighbor, a stuck in active timer starts. By default, the timer is three minutes. If, in three minutes, EIGRP doesn't receive the query response from all its neighbors, EIGRP declares that the route is stuck in active state and resets the neighbor that has not responded to the query. Figure 7-10 illustrates the query process of EIGRP when a route is lost.
Figure 7-10 Illustration of EIGRP Query Process When a Route Is Lost
In Figure 7-10, Router A lost its Ethernet interface. Because it doesn't have a feasible successor, the route becomes active and Router A queries its neighbors, Router B and Router C. Now, Router B doesn't know how to reach the lost network, so it asks its neighbors, Router D and Router E. Similarly, Router C asks its neighbors, Router F and Router G. Because Routers D, E, F, and G also don't know how to reach the lost network, they query the downstream neighbors. At this point, the edge of the network is reached and the edge router doesn't have any more neighbors to query. The edge router then replies back to Routers D, E, F, and G. Those routers reply back to Routers B and C, and finally to Router A. The query process then stops. Figure 7-10 shows the cascade effect of the EIGRP query process, in which the query travels from the original router to the edge of the network and back to the original router.
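The cascade in Figure 7-10 can be modeled as a recursive walk in which a router replies upstream only after every downstream neighbor has replied. This is a deliberately simplified sketch: real EIGRP queries all neighbors in parallel and the timers are not modeled. The edge routers H through K are hypothetical names for the unnamed downstream neighbors of Routers D, E, F, and G.

```python
# Topology loosely modeled on Figure 7-10. Routers H..K are hypothetical
# names for the unnamed edge routers mentioned in the text.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": ["H"], "E": ["I"], "F": ["J"], "G": ["K"],
    "H": [], "I": [], "J": [], "K": [],
}


def query(router, depth=0, log=None):
    """Query each downstream neighbor; a neighbor replies only after
    all of its own downstream neighbors have replied."""
    if log is None:
        log = []
    indent = "  " * depth
    for neighbor in TOPOLOGY[router]:
        log.append(f"{indent}{router} queries {neighbor}")
        query(neighbor, depth + 1, log)
        log.append(f"{indent}{neighbor} replies to {router}")
    return log


events = query("A")
print("\n".join(events))
```

In this model a reply reaches Router A only after the whole subtree behind that neighbor has answered, which is why a single slow router or link far from the origin can hold up convergence for everyone upstream of it.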
Determining Active/Stuck in Active Routes with show ip eigrp topology active
You must answer two questions to troubleshoot the EIGRP stuck in active problem:
Why is the route active?
Why is the route stuck?
Determining why the route is active is not a difficult task. Sometimes, a route that constantly goes active is the result of a flapping link. Or, if the route is a host route (/32 route), it's possible that it is from a dial-in connection that gets disconnected. However, trying to determine why the active route becomes stuck is a much harder task, and more important to learn. Usually, an active route gets stuck for one of the following reasons:
Bad or congested links
Low router resources, such as low memory or high CPU on the router
Long query range
Excessive redundancy
By default, the stuck in active timer is only three minutes. In other words, if the EIGRP neighbor doesn't hear a reply for the query in three minutes, neighbors are reset. This adds difficulty in troubleshooting EIGRP stuck in active because every time an active route is stuck, you have only three minutes to track down the active route query path and hopefully find the cause.
The tool that you need to troubleshoot the EIGRP stuck in active error is the show ip eigrp topology active command. This command shows what routes are currently active, how long the routes have been active, and which neighbors have and have not replied to the query. From the output, you can determine which neighbors have not replied to the query, and you can track the query path and find out the status of the query by hopping to the neighbors that have not replied. Example 7-9 shows sample output from the show ip eigrp topology active command.
Example 7-9 Sample Output of show ip eigrp topology active Command
Router#show ip eigrp topology active
IP-EIGRP Topology Table for AS(1)/ID(10.1.4.2)
A 20.2.1.0/24, 1 successors, FD is Inaccessible, Q
    1 replies, active 00:01:43, query-origin: Successor Origin
         via 10.1.3.1 (Infinity/Infinity), Serial1/0
         via 10.1.4.1 (Infinity/Infinity), Serial1/1, serno 146
    Remaining replies:
         via 10.1.5.2, r, Serial1/2
As the output in Example 7-9 indicates, the route for 20.2.1.0 is in active state and has been active for 1 minute and 43 seconds. query-origin is Successor Origin, which means that this route's successor sent the query to this router. At this point, it has gotten replies from 10.1.3.1 and 10.1.4.1; the reply is infinity, which means that these two routers also don't know about the route 20.2.1.0. The most important output of the show ip eigrp topology active command is the Remaining replies: section. From the output of Example 7-9, this router shows that the neighbor 10.1.5.2 from interface Serial1/2 has not replied to the query.
To proceed further with troubleshooting, you must Telnet to the 10.1.5.2 router to see the status of its EIGRP active routes using the same command, show ip eigrp topology active. Sometimes, the router does not list the neighbors that have not replied to the queries under the Remaining replies: section. Example 7-10 shows another output of show ip eigrp topology active.
Example 7-10 Another Sample Output of the show ip eigrp topology active Command
Router#show ip eigrp topology active
IP-EIGRP Topology Table for AS(110)/ID(175.62.8.1)
A 11.11.11.0/24, 1 successors, FD is Inaccessible
    1 replies, active 00:02:06, query-origin: Successor Origin
         via 1.1.1.2 (Infinity/Infinity), r, Serial1/0, serno 171
         via 10.1.1.2 (Infinity/Infinity), Serial1/1, serno 173
In Example 7-10, the only difference from Example 7-9 is that there is no Remaining replies: section listing the neighbors that have not replied to the router. However, this doesn't mean that all of the neighbors have replied to the queries. In Example 7-10, neighbor 1.1.1.2 has an r next to its address. This also means that the neighbor has not replied to the queries. In other words, the router has two ways of representing neighbors that have not replied to the queries. One is to list them under the Remaining replies: section; the other is to put an r next to the neighbor interface IP address. When using the show ip eigrp topology active command, the router can use any combination of these methods to represent neighbors that have not yet replied to the queries, as demonstrated in Example 7-11.
Example 7-11 Output of show ip eigrp topology active That Shows a Combination Representation of Neighbors That Have Not Replied to the Queries
Router#show ip eigrp topology active
IP-EIGRP Topology Table for AS(110)/ID(175.62.8.1)
A 11.11.11.0/24, 1 successors, FD is Inaccessible
    1 replies, active 00:02:06, query-origin: Successor Origin
         via 1.1.1.2 (Infinity/Infinity), r, Serial1/0, serno 171
         via 10.1.1.2 (Infinity/Infinity), Serial1/1, serno 173
    Remaining replies:
         via 10.1.5.2, r, Serial1/2
In Example 7-11, the neighbors that have not replied to the queries are 1.1.1.2 and 10.1.5.2. Only one of the nonreplying neighbors 10.1.5.2 is listed under the Remaining replies: section; the other neighbor, 1.1.1.2, that has not replied is listed with the other replying neighbor. To summarize, when issuing the show ip eigrp topology active command, the most important part to look for is the neighbors that have not replied to the query. To look for such a neighbor, look for neighbors that have the r next to their interface IP addresses.
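Because the r flag can appear in either place, a script that scans every via line catches both representations. The following is a minimal sketch that assumes the comma-separated via line layout shown in Examples 7-9 through 7-11; the function name is hypothetical, and the sample text is the body of Example 7-11.

```python
import re


def pending_replies(output):
    """Return addresses of neighbors that carry the 'r' flag."""
    pending = []
    for line in output.splitlines():
        match = re.match(r"\s*via\s+(\d+\.\d+\.\d+\.\d+)", line, re.IGNORECASE)
        if match is None:
            continue
        # Fields after the first comma hold flags, interface and serno.
        flags = [part.strip() for part in line.split(",")[1:]]
        if "r" in flags:
            pending.append(match.group(1))
    return pending


SAMPLE = """\
A 11.11.11.0/24, 1 successors, FD is Inaccessible
    1 replies, active 00:02:06, query-origin: Successor Origin
         via 1.1.1.2 (Infinity/Infinity), r, Serial1/0, serno 171
         via 10.1.1.2 (Infinity/Infinity), Serial1/1, serno 173
    Remaining replies:
         via 10.1.5.2, r, Serial1/2
"""

print(pending_replies(SAMPLE))
```

The result contains 1.1.1.2 and 10.1.5.2, the two neighbors identified in the text as not having replied, and omits 10.1.1.2, which has no r flag.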
Methodology for Troubleshooting the Stuck in Active Problem
The methods for troubleshooting an EIGRP stuck in active problem and the show ip eigrp topology active command are useful only when the problem is happening. When the stuck in active event is over and the network stabilizes, it is extremely difficult, if not impossible, to backtrack the problem and find out the cause.
Figure 7-11 shows the flowchart for troubleshooting the EIGRP stuck in active problem.
Figure 7-11 Flowchart for Resolving the EIGRP Stuck in Active Problem
Consider the network shown in Figure 7-12 for an example of troubleshooting the EIGRP stuck in active problem.
Figure 7-12 Network Topology for EIGRP Stuck in Active Troubleshooting Example
In Figure 7-12, Router A has an Ethernet interface with network 20.2.1.0/24 that just went away. Router A doesn't have a feasible successor to go to as a backup route. Router A has no choice but to put the 20.2.1.0/24 route into active state and query its neighbor, Router B. Notice the output of show ip eigrp topology active in Router A. The 20.2.1.0/24 route has gone active for 1 minute and 12 seconds, and the neighbor that has not responded is listed as 10.1.1.2 from Serial0, which is Router B. The next step is to Telnet to Router B to see the active route status in Router B. Figure 7-13 shows the active route status in Router B by performing the command show ip eigrp topology active.
Figure 7-13 Active Route Status on Router B for Troubleshooting EIGRP Stuck in Active Example
In Figure 7-13, the command show ip eigrp topology active on Router B shows that the route 20.2.1.0/24 is also in active status in Router B and that it has gone active for 1 minute and 23 seconds. Most importantly, Router B can't reply to Router A about route 20.2.1.0/24 because Router B is still waiting for the neighbor with IP address of 10.1.3.2 (Router D) from Serial1/2 to reply to the query. The next step is to go to Router D to see the status of the active route 20.2.1.0/24 and see why Router D has not replied to the query. Figure 7-14 shows the output of show ip eigrp topology active on Router D.
Figure 7-14 Active Route Status on Router D for Troubleshooting EIGRP Stuck in Active Example
Router D also put the route 20.2.1.0/24 in active state, and it has been in active state for 1 minute and 43 seconds. Router D can't answer Router B's query because Router D is waiting for the router with the IP address of 10.1.5.2 from Serial1/2 (Router E) to respond to the query. The next step is to go to Router E to see the status of the active route 20.2.1.0/24 and to find out why Router E is not replying to the query. Figure 7-15 shows the status of the active route on Router E.
Figure 7-15 Active Route Status on Router E for the Troubleshooting EIGRP Stuck in Active Example
The output for show ip eigrp topology active didn't show anything for Router E. This indicates that, as far as Router E is concerned, there are no routes in active state. Now you should Telnet back to Router D to double-check whether the router is still in the active state for route 20.2.1.0/24. Telnetting back to Router D shows that Router D is still in active state for route 20.2.1.0/24, but Router E doesn't have any routes in active state. What's going on?
Router A went active for route 20.2.1.0/24 and is waiting for Router B to reply to the query.
Router B can't reply because it is waiting for Router D's query response.
Router D can't reply because it is waiting for Router E to reply to the query.
Finally, the show ip eigrp topology active command in Router E shows that Router E does not think that any routes are active, while going back to Router D shows that the route 20.2.1.0/24 is still in active state.
From this sequence of events, you can see that there is clearly a discrepancy between Router D and Router E. More investigation is needed between these routers.
A look at Router D and Router E's router CPU utilization and memory usage doesn't show a problem. Both routers' CPU utilization and available memory are normal. You need to look at Router D's neighbor list to see if there is a problem with the neighbors. Example 7-12 shows Router D's EIGRP neighbor list.
Example 7-12 Router D's EIGRP Neighbor List
RTRD#show ip eigrp neighbors
IP-EIGRP neighbors for process 1
H   Address     Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                            (sec)           (ms)         Cnt  Num
2   10.1.5.2    Se1/2       13    00:00:14  0      5000  1    0
1   10.1.3.1    Se1/0       13    01:22:54  227    1362  0    385
0   10.1.4.1    Se1/1       10    01:24:08  182    1140  0    171
From Example 7-12, notice that there is a problem in Router D with EIGRP sending a reliable packet to the neighbor with IP address of 10.1.5.2 (Router E). The Q count is 1, and performing the show ip eigrp neighbors command a few times in succession shows that the Q count is not decrementing.
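Checking that the Q count does not decrement across repeated captures can be scripted. The following is a minimal sketch, assuming the column layout shown in Example 7-12. The names `parse_q_counts` and `stuck_neighbors` are hypothetical, and the second and third captures are invented for illustration; only the first capture comes from the example.

```python
import re

# One data row of "show ip eigrp neighbors":
# H, Address, Interface, Hold, Uptime, SRTT, RTO, Q Cnt, Seq Num
ROW = re.compile(
    r"^\s*(?P<h>\d+)\s+(?P<addr>\d+\.\d+\.\d+\.\d+)\s+(?P<intf>\S+)\s+"
    r"(?P<hold>\d+)\s+(?P<uptime>\S+)\s+(?P<srtt>\d+)\s+(?P<rto>\d+)\s+"
    r"(?P<q_cnt>\d+)\s+(?P<seq>\d+)\s*$"
)


def parse_q_counts(output):
    """Return {neighbor address: Q count} for one capture of the table."""
    counts = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:  # header and prompt lines do not match and are skipped
            counts[match.group("addr")] = int(match.group("q_cnt"))
    return counts


def stuck_neighbors(captures):
    """Neighbors whose Q count is nonzero in every capture."""
    per_capture = [parse_q_counts(c) for c in captures]
    if not per_capture:
        return []
    return sorted(
        addr for addr in per_capture[0]
        if all(counts.get(addr, 0) > 0 for counts in per_capture)
    )


capture_1 = """\
2   10.1.5.2    Se1/2       13    00:00:14  0      5000  1    0
1   10.1.3.1    Se1/0       13    01:22:54  227    1362  0    385
0   10.1.4.1    Se1/1       10    01:24:08  182    1140  0    171
"""
# Hypothetical later captures: timers move on, but the Q count stays at 1.
capture_2 = capture_1.replace("00:00:14", "00:00:16")
capture_3 = capture_1.replace("00:00:14", "00:00:18")

print(stuck_neighbors([capture_1, capture_2, capture_3]))
```

A neighbor that shows up in this list has had at least one reliable packet queued for it in every capture, which matches the symptom seen for 10.1.5.2.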
The RTO counter is at its maximum value of 5000 ms. This indicates that Router D is trying to send a reliable packet to Router E, but Router E never acknowledges the reliable packet back to Router D. Because Router E doesn't appear to have a high CPU or memory problem, you should test the link reliability between Router D and Router E. Now send five ping packets from Router D to IP address 10.1.5.2 (Router E's serial interface) to see what happens. Example 7-13 shows the result of the ping test.
Example 7-13 Result of ping Test from Router D to Router E
Router D#ping 10.1.5.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.1.5.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
The ping test in Example 7-13 shows the success rate is 0 percent. This test shows that a link problem exists between Router D and Router E. The link is capable of passing a multicast packet to establish an EIGRP neighbor relationship, but it is having problems transmitting a unicast packet. This link problem is the root cause of the EIGRP stuck in active problem in this example. The way to troubleshoot the EIGRP stuck in active problem is to chase the query path hop by hop and check the status of the active route at each hop.
The aforementioned process is typical troubleshooting methodology for combatting the EIGRP stuck in active problem.
Sometimes, chasing the query path hop by hop leads to a loop, or there are simply too many neighbors that didn't reply to the query. In this case, simplify and reduce the complexity of the EIGRP topology by cutting down the redundancy. The simpler the EIGRP topology is, the simpler it is to troubleshoot an EIGRP stuck in active problem.
The ultimate solution for preventing the EIGRP stuck in active problem is to manually summarize the routes whenever possible and to have a hierarchical network design. The more networks EIGRP summarizes, the less work EIGRP has to do when a major convergence takes place. Summarization therefore reduces the number of queries being sent out and ultimately reduces the occurrence of an EIGRP stuck in active error. Figure 7-16 shows an example of a poor network design that will not scale in a large EIGRP network.
Figure 7-16 Example of a Nonscalable EIGRP Network
In Figure 7-16, each core router represents a region of the entire network, and there is no hierarchy in the IP addressing scheme. The Core 1 router is injecting routes 1.1.1.0, 3.3.4.0, 1.1.2.0, and 2.2.3.0 into the core network. The addresses are so scattered that no manual summarization is possible. The other core routers are experiencing the same problem. The Core 3 and Core 4 routers can't summarize any routes into the core network. As a result, if the Ethernet link of the 3.3.3.0 network keeps flapping, the query would travel to the Core 3 router and then the query also would be seen in the Core 1 and Core 4 regions. Ultimately, the query will traverse all the routers in the internetwork; this would dramatically increase the likelihood of an EIGRP stuck in active problem. The best practice is to redesign the IP addressing scheme. Each region should take only one block of IP addresses; this way, the core routers would be capable of summarizing the routes into the core, resulting in a smaller routing table in the core, and a query would be contained within one region. Figure 7-17 shows an improved and more scalable EIGRP network design.
Figure 7-17 Scalable EIGRP Network Design Improvement on Network in Figure 7-16
Comparing Figures 7-16 and 7-17, you can see that the network presented in Figure 7-17 is more structured. The Core 1 region takes only the 1.0.0.0 block of IP addresses, the Core 4 region takes only the 2.0.0.0 block, and the Core 3 region takes only the 3.0.0.0 block of IP addresses. This enables the three core routers to summarize their routes into the core. If the Ethernet network of 3.3.3.0 flaps in the Core 3 region, the query would be confined to the Core 3 region and would not travel the entire network to affect all the routers in the network. Summarization and hierarchy are the best design practices for a large-scale EIGRP network.
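The difference between the two addressing schemes can be checked with Python's standard ipaddress module. The sketch below is illustrative only. It assumes /24 masks for the Core 1 routes named for Figure 7-16 (the text gives no masks), treats the 1.0.0.0 block as 1.0.0.0/8, and uses a made-up list of readdressed prefixes; the helper name `uncovered` is hypothetical.

```python
import ipaddress


def uncovered(summary, prefixes):
    """Return the prefixes that do NOT fall inside the proposed summary."""
    block = ipaddress.ip_network(summary)
    return [p for p in prefixes
            if not ipaddress.ip_network(p).subnet_of(block)]


# Routes injected by Core 1 in the nonscalable design (masks assumed /24).
scattered = ["1.1.1.0/24", "3.3.4.0/24", "1.1.2.0/24", "2.2.3.0/24"]

# Hypothetical Core 1 routes after readdressing into the 1.0.0.0 block.
readdressed = ["1.1.1.0/24", "1.1.2.0/24", "1.1.3.0/24", "1.1.4.0/24"]

print(uncovered("1.0.0.0/8", scattered))    # routes a 1.0.0.0/8 summary misses
print(uncovered("1.0.0.0/8", readdressed))  # empty list: one summary suffices
```

When the list of uncovered prefixes is empty, a single summary route can represent the whole region in the core, which is what keeps a query for a flapping network inside that region.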