Foundation Topics
Resolving InterVLAN Routing Issues
As mentioned in Chapter 4, "Basic Cisco Catalyst Switch Troubleshooting," for traffic to pass from one VLAN to another VLAN, that traffic has to be routed. Several years ago, one popular approach to performing interVLAN routing with a Layer 2 switch was to create a router on a stick topology, where a Layer 2 switch is interconnected with a router via a trunk connection, as seen in Figure 5-1.
Figure 5-1 Router on a Stick
In Figure 5-1, router R1's Fast Ethernet 1/1/1 interface has two subinterfaces, one for each VLAN. Router R1 can route between VLANs 100 and 200, while simultaneously receiving and transmitting traffic over the trunk connection to the switch.
More recently, many switches have risen above their humble Layer 2 beginnings and started to route traffic. Some literature refers to these switches that can route as Layer 3 switches. Other sources might call such switches multilayer switches, because of the capability of a switch to make forwarding decisions based on information from multiple layers of the OSI model.
This section refers to these switches as Layer 3 switches because the focus is on the capability of the switches to route traffic based on Layer 3 information (that is, IP address information). Specifically, this section discusses troubleshooting Layer 3 switch issues and contrasts troubleshooting a Layer 3 switch versus a router.
Contrasting Layer 3 Switches with Routers
Because a Layer 3 switch performs many of the same functions as a router, it is important for a troubleshooter to distinguish between commonalities and differences in these two platforms.
Table 5-2 lists the characteristics that Layer 3 switches and routers have in common, as well as those characteristics that differ.
Table 5-2. Layer 3 Switch and Router Characteristics: Compare and Contrast
Layer 3 Switch/Router Shared Characteristics | Layer 3 Switch/Router Differentiating Characteristics |
Both can build and maintain a routing table using both statically configured routes and dynamic routing protocols. | Routers usually support a wider selection of interface types (for example, non-Ethernet interfaces). |
Both can make packet forwarding decisions based on Layer 3 information (for example, IP addresses). | Switches leverage application-specific integrated circuits (ASIC) to approach wire speed throughput. Therefore, most Layer 3 switches can forward traffic faster than their router counterparts. |
 | A Cisco IOS version running on routers typically supports more features than a Cisco IOS version running on a Layer 3 switch, because many switches lack the specialized hardware required to run many of the features available on a router. |
Control Plane and Data Plane Troubleshooting
Many router and Layer 3 switch operations can be categorized as control plane or data plane operations. For example, routing protocols operate in a router's control plane, whereas the actual forwarding of data is handled by a router's data plane.
Fortunately, the processes involved in troubleshooting control plane operations are identical on both Layer 3 switch and router platforms. For example, the same command-line interface (CLI) commands could be used to troubleshoot an Open Shortest Path First (OSPF) issue on both types of platforms.
Data plane troubleshooting, however, can vary between Layer 3 switches and routers. For example, if you were troubleshooting data throughput issues, the commands you issued might vary between types of platforms, because Layer 3 switches and routers have fundamental differences in the way traffic is forwarded through the device.
First, consider how a router uses Cisco Express Forwarding (CEF) to efficiently forward traffic through a router. CEF creates a couple of tables that reside at the data plane. These are the forwarding information base (FIB) and the adjacency table. These tables are constructed from information collected from the router's control plane (for example, the control plane's IP routing table and Address Resolution Protocol [ARP] cache). When troubleshooting a router, you might check control plane operations with commands such as show ip route. However, if the observed traffic behavior seems to contradict information shown in the output of control plane verification commands, you might want to examine information contained in the router's CEF Forwarding Information Base (FIB) and adjacency tables. You can use the commands presented in Table 5-3 to view information contained in a router's FIB and adjacency table.
Table 5-3. Router Data Plane Verification Commands
Command | Description |
show ip cef | Displays the router's Layer 3 forwarding information, in addition to multicast, broadcast, and local IP addresses. |
show adjacency | Verifies that a valid adjacency exists for a connected host. |
Example 5-1 and Example 5-2 provide sample output from the show ip cef and show adjacency commands, respectively.
Example 5-1. show ip cef Command Output
R4# show ip cef
Prefix              Next Hop            Interface
0.0.0.0/0           10.3.3.1            FastEthernet0/0
0.0.0.0/32          receive
10.1.1.0/24         10.3.3.1            FastEthernet0/0
10.1.1.2/32         10.3.3.1            FastEthernet0/0
10.3.3.0/24         attached            FastEthernet0/0
10.3.3.0/32         receive
10.3.3.1/32         10.3.3.1            FastEthernet0/0
10.3.3.2/32         receive
10.3.3.255/32       receive
10.4.4.0/24         10.3.3.1            FastEthernet0/0
10.5.5.0/24         10.3.3.1            FastEthernet0/0
10.7.7.0/24         10.3.3.1            FastEthernet0/0
10.7.7.2/32         10.3.3.1            FastEthernet0/0
10.8.8.0/24         attached            FastEthernet0/1
10.8.8.0/32         receive
10.8.8.1/32         receive
10.8.8.4/32         10.8.8.4            FastEthernet0/1
10.8.8.5/32         10.8.8.5            FastEthernet0/1
10.8.8.6/32         10.8.8.6            FastEthernet0/1
10.8.8.7/32         10.8.8.7            FastEthernet0/1
10.8.8.255/32       receive
192.168.0.0/24      10.3.3.1            FastEthernet0/0
224.0.0.0/4         drop
224.0.0.0/24        receive
255.255.255.255/32  receive
Example 5-2. show adjacency Command Output
R4# show adjacency
Protocol Interface                 Address
IP       FastEthernet0/0           10.3.3.1(21)
IP       FastEthernet0/1           10.8.8.6(5)
IP       FastEthernet0/1           10.8.8.7(5)
IP       FastEthernet0/1           10.8.8.4(5)
IP       FastEthernet0/1           10.8.8.5(5)
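To make the role of the FIB more concrete, the following Python sketch performs a longest-prefix-match lookup against a handful of entries copied from Example 5-1. It is only an illustration: the function name fib_lookup and the dictionary layout are assumptions made for this example, and the linear scan stands in for the optimized lookup structures that CEF actually uses.

```python
import ipaddress

# A few entries copied from Example 5-1: prefix -> (next hop, interface).
FIB = {
    "0.0.0.0/0": ("10.3.3.1", "FastEthernet0/0"),
    "10.1.1.0/24": ("10.3.3.1", "FastEthernet0/0"),
    "10.3.3.0/24": ("attached", "FastEthernet0/0"),
    "10.8.8.0/24": ("attached", "FastEthernet0/1"),
    "10.8.8.1/32": ("receive", None),
    "10.8.8.4/32": ("10.8.8.4", "FastEthernet0/1"),
}


def fib_lookup(destination, fib=FIB):
    """Return (prefix, next_hop, interface) for the longest matching prefix.

    Hypothetical helper: a simple linear scan, not how CEF is implemented.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, (next_hop, interface) in fib.items():
        network = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop, interface)
    if best is None:
        return None
    return str(best[0]), best[1], best[2]


if __name__ == "__main__":
    for dest in ("10.8.8.4", "10.8.8.9", "172.16.5.5"):
        print(dest, "->", fib_lookup(dest))
```

A host route such as 10.8.8.4/32 wins over the covering 10.8.8.0/24 entry, and a destination with no more specific match falls through to the 0.0.0.0/0 default entry.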
Although many Layer 3 switches also leverage CEF to efficiently route packets, some Cisco Catalyst switches take the information contained in CEF's FIB and adjacency table and compile that information into Ternary Content Addressable Memory (TCAM). This special memory type uses a mathematical algorithm to very quickly look up forwarding information.
The specific way a switch's TCAM operates depends on the switch platform. However, from a troubleshooting perspective, you can examine information stored in a switch's TCAM using the show platform series of commands on Cisco Catalyst 3560, 3750, and 4500 switches. Similarly, TCAM information for a Cisco Catalyst 6500 switch can be viewed with the show mls cef series of commands.
Comparing Routed Switch Ports and Switched Virtual Interfaces
On a router, an interface often has an IP address, and that IP address might be acting as a default gateway to hosts residing off of that interface. However, if you have a Layer 3 switch with multiple ports belonging to a VLAN, where should the IP address be configured?
You can configure the IP address for a collection of ports belonging to a VLAN under a virtual VLAN interface. This virtual VLAN interface is called a Switched Virtual Interface (SVI). Figure 5-2 shows a topology using SVIs, and Example 5-3 shows the corresponding configuration. Notice that two SVIs are created: one for each VLAN (that is, VLAN 100 and VLAN 200). An IP address is assigned to an SVI by going into interface configuration mode for a VLAN. In this example, because both SVIs are local to the switch, the switch's routing table knows how to forward traffic between members of the two VLANs.
Figure 5-2 SVI Used for Routing
Example 5-3. SVI Configuration
Cat3550# show run
...OUTPUT OMITTED...
!
interface GigabitEthernet0/7
 switchport access vlan 100
 switchport mode access
!
interface GigabitEthernet0/8
 switchport access vlan 100
 switchport mode access
!
interface GigabitEthernet0/9
 switchport access vlan 200
 switchport mode access
!
interface GigabitEthernet0/10
 switchport access vlan 200
 switchport mode access
!
...OUTPUT OMITTED...
!
interface Vlan100
 ip address 192.168.1.1 255.255.255.0
!
interface Vlan200
 ip address 192.168.2.1 255.255.255.0
Although SVIs can route between VLANs configured on a switch, a Layer 3 switch can be configured to act more as a router (for example, in an environment where you are replacing a router with a Layer 3 switch) by using routed ports on the switch. Because the ports on many Cisco Catalyst switches default to operating as switch ports, you can issue the no switchport command in interface configuration mode to convert a switch port to a routed port. Figure 5-3 and Example 5-4 illustrate a Layer 3 switch with its Gigabit Ethernet 0/9 and 0/10 ports configured as routed ports.
Figure 5-3 Routed Ports on a Layer 3 Switch
Example 5-4. Configuration for Routed Ports on a Layer 3 Switch
Cat3550# show run
...OUTPUT OMITTED...
!
interface GigabitEthernet0/9
 no switchport
 ip address 192.168.1.2 255.255.255.0
!
interface GigabitEthernet0/10
 no switchport
 ip address 192.168.2.2 255.255.255.0
!
...OUTPUT OMITTED...
When troubleshooting Layer 3 switching issues, keep the following distinctions in mind between SVIs and routed ports:
- A routed port is considered to be in the down state if it is not operational at both Layer 1 and Layer 2.
- An SVI is considered to be in a down state only when none of the ports in the corresponding VLAN are active.
- A routed port does not run switch port protocols such as Spanning Tree Protocol (STP) or Dynamic Trunking Protocol (DTP).
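The first two distinctions in the preceding list can be restated as a pair of small Python predicates. This is a simplified sketch for reasoning about interface state, not a model of any particular platform: the function names are hypothetical, and factors such as an administratively shut down interface are deliberately ignored.

```python
def routed_port_is_up(layer1_up, layer2_up):
    """A routed port is up only when both Layer 1 and Layer 2 are operational."""
    return layer1_up and layer2_up


def svi_is_up(vlan_port_states):
    """An SVI is down only when none of the ports in its VLAN are active."""
    return any(vlan_port_states)


if __name__ == "__main__":
    # One active port in the VLAN is enough to keep the SVI up.
    print(svi_is_up([False, True, False]))
    # A routed port with a Layer 2 problem is down, even if Layer 1 is fine.
    print(routed_port_is_up(layer1_up=True, layer2_up=False))
```

The practical consequence is that a single failed link takes down a routed port, whereas an SVI stays up as long as at least one port in the VLAN remains active.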
Router Redundancy Troubleshooting
Many devices, such as PCs, are configured with a default gateway. The default gateway parameter identifies the IP address of a next-hop router. As a result, if that router were to become unavailable, devices that relied on the default gateway's IP address would be unable to send traffic off their local subnet.
Fortunately, Cisco offers technologies that provide next-hop gateway redundancy. These technologies include HSRP, VRRP, and GLBP.
This section reviews the operation of these three first-hop redundancy protocols and provides a collection of Cisco IOS commands that can be used to troubleshoot an issue with one of these three protocols.
Note that although this section discusses router redundancy, keep in mind that the term router is referencing a device making forwarding decisions based on Layer 3 information. Therefore, in your environment, a Layer 3 switch might be used in place of a router to support HSRP, VRRP, or GLBP.
HSRP
Hot Standby Router Protocol (HSRP) uses virtual IP and MAC addresses. One router, known as the active router, services requests destined for the virtual IP and MAC addresses. Another router, known as the standby router, can service such requests in the event the active router becomes unavailable. Figure 5-4 illustrates a basic HSRP topology.
Examples 5-5 and 5-6 show the HSRP configuration for routers R1 and R2.
Figure 5-4 Basic HSRP Operation
Example 5-5. HSRP Configuration on Router R1
R1# show run
...OUTPUT OMITTED...
interface FastEthernet0/0
 ip address 172.16.1.1 255.255.255.0
 standby 10 ip 172.16.1.3
 standby 10 priority 150
 standby 10 preempt
...OUTPUT OMITTED...
Example 5-6. HSRP Configuration on Router R2
R2# show run
...OUTPUT OMITTED...
interface Ethernet0/0
ip address 172.16.1.2 255.255.255.0
standby 10 ip 172.16.1.3
...OUTPUT OMITTED...
Notice that both routers R1 and R2 have been configured with the same virtual IP address of 172.16.1.3 for an HSRP group of 10. Router R1 is configured to be the active router with the standby 10 priority 150 command. Router R2 has a default HSRP priority of 100 for group 10, and with HSRP, higher priority values are more preferable. Also, notice that router R1 is configured with the standby 10 preempt command, which means that if router R1 loses its active status, perhaps because it is powered off, it will regain its active status when it again becomes available.
Converging After a Router Failure
By default, HSRP sends hello messages every three seconds. Also, if the standby router does not hear a hello message within ten seconds by default, the standby router considers the active router to be down. The standby router then assumes the active role.
Although this ten-second convergence time applies for a router becoming unavailable for a reason such as a power outage or a link failure, convergence happens more rapidly if an interface is administratively shut down. Specifically, an active router sends a resign message if its active HSRP interface is shut down.
Also, consider the addition of another router to the network segment whose HSRP priority for group 10 is higher than 150. If it were configured for preemption, the newly added router would send a coup message, to inform the active router that the newly added router was going to take on the active role. If, however, the newly added router were not configured for preemption, the currently active router would remain the active router.
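The takeover rule just described can be sketched in a few lines of Python. The class and function names are assumptions made for this illustration, router R3 is a hypothetical newcomer that does not appear in the figures, and the sketch ignores what happens when priorities are equal.

```python
from dataclasses import dataclass


@dataclass
class HsrpRouter:
    name: str
    priority: int = 100     # default HSRP priority
    preempt: bool = False   # preemption must be configured explicitly


def active_after_join(current_active, newcomer):
    """Return the router that holds the active role after `newcomer` joins."""
    if newcomer.preempt and newcomer.priority > current_active.priority:
        # The newcomer sends a coup message and takes over the active role.
        return newcomer
    # Without preemption (or without a higher priority), nothing changes.
    return current_active


if __name__ == "__main__":
    r1 = HsrpRouter("R1", priority=150, preempt=True)
    r3 = HsrpRouter("R3", priority=200, preempt=True)   # hypothetical router
    print(active_after_join(r1, r3).name)
```

Setting preempt=False on the hypothetical R3 leaves R1 in the active role, even though R3 has the higher priority.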
HSRP Verification and Troubleshooting
When verifying an HSRP configuration or troubleshooting an HSRP issue, you should begin by determining the following information about the HSRP group under inspection:
- Which router is the active router
- Which routers, if any, are configured with the preempt option
- What is the virtual IP address
- What is the virtual MAC address
The show standby brief command can be used to show a router's HSRP interface, HSRP group number, and preemption configuration. Additionally, this command identifies the router that is currently the active router, the router that is currently the standby router, and the virtual IP address for the HSRP group. Examples 5-7 and 5-8 show the output from the show standby brief command issued on routers R1 and R2, where router R1 is currently the active router.
Example 5-7. show standby brief Command Output on Router R1
R1# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/0       10  150  P Active   local           172.16.1.2      172.16.1.3
Example 5-8. show standby brief Command Output on Router R2
R2# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Et0/0       10  100    Standby  172.16.1.1      local           172.16.1.3
In addition to an interface's HSRP group number, the interface's state, and the HSRP group's virtual IP address, the show standby interface_id command also displays the HSRP group's virtual MAC address. Issuing this command on router R1, as shown in Example 5-9, shows that the virtual MAC address for HSRP group 10 is 0000.0c07.ac0a.
Example 5-9. show standby fa 0/0 Command Output on Router R1
R1# show standby fa 0/0
FastEthernet0/0 - Group 10
State is Active
1 state change, last state change 01:20:00
Virtual IP address is 172.16.1.3
Active virtual MAC address is 0000.0c07.ac0a
Local virtual MAC address is 0000.0c07.ac0a (v1 default)
Hello time 3 sec, hold time 10 sec
Next hello sent in 1.044 secs
Preemption enabled
Active router is local
Standby router is 172.16.1.2, priority 100 (expires in 8.321 sec)
Priority 150 (configured 150)
IP redundancy name is "hsrp-Fa0/0-10" (default)
The default virtual MAC address for an HSRP group, as seen in Figure 5-5, is based on the HSRP group number. Specifically, the virtual MAC address for an HSRP group begins with a vendor code of 0000.0c, followed with a well-known HSRP code of 07.ac. The last two hexadecimal digits are the hexadecimal representation of the HSRP group number. For example, an HSRP group of 10 yields a default virtual MAC address of 0000.0c07.ac0a, because 10 in decimal equates to 0a in hexadecimal.
Figure 5-5 HSRP Virtual MAC Address
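The derivation of the default virtual MAC address is simple enough to express as a short Python sketch. The function names are hypothetical, and the sketch covers only the default (v1) address format discussed here, with group numbers from 0 through 255. The second helper converts Cisco's dotted notation to the hyphenated notation that a Windows client displays in its ARP cache, which makes the two easier to compare.

```python
def hsrp_v1_virtual_mac(group):
    """Build the default HSRP virtual MAC address for a group number."""
    if not 0 <= group <= 255:
        raise ValueError("this sketch handles group numbers 0 through 255 only")
    # Vendor code 0000.0c, HSRP code 07.ac, then the group number in hex.
    return f"0000.0c07.ac{group:02x}"


def to_hyphenated(cisco_mac):
    """Convert 0000.0c07.ac0a style notation to 00-00-0c-07-ac-0a."""
    digits = cisco_mac.replace(".", "")
    return "-".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    mac = hsrp_v1_virtual_mac(10)
    print(mac)                  # 0000.0c07.ac0a
    print(to_hyphenated(mac))   # 00-00-0c-07-ac-0a
```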
Once you know the current HSRP configuration, you might then check to see if a host on the HSRP virtual IP address' subnet can ping the virtual IP address. Based on the topology previously shown in Figure 5-4, Example 5-10 shows a successful ping from Workstation A.
Example 5-10. Ping Test from Workstation A to the HSRP Virtual IP Address
C:\>ping 172.16.1.3
Pinging 172.16.1.3 with 32 bytes of data:
Reply from 172.16.1.3: bytes=32 time=2ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255
Reply from 172.16.1.3: bytes=32 time=1ms TTL=255
Ping statistics for 172.16.1.3:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 2ms, Average = 1ms
You can also use a client to verify that the virtual MAC address it has learned matches the virtual MAC address reported by one of the HSRP routers. Example 5-11 shows Workstation A's ARP cache entry for the HSRP virtual IP address of 172.16.1.3. Notice in the output that the MAC address learned via ARP matches the HSRP virtual MAC address reported by one of the HSRP routers.
Example 5-11. Workstation A's ARP Cache
C:\>arp -a
Interface: 172.16.1.4 --- 0x4
Internet Address Physical Address Type
172.16.1.3 00-00-0c-07-ac-0a dynamic
You can use the debug standby terse command to view important HSRP changes, such as a state change. Example 5-12 shows this debug output on router R2 after router R1's Fast Ethernet 0/0 interface is shut down. Notice that router R2's state changes from Standby to Active.
Example 5-12. debug standby terse Command Output on Router R2: Changing HSRP to Active
R2#
*Mar 1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby: c/Active timer expired (172.16.1.1)
*Mar 1 01:25:45.930: HSRP: Et0/0 Grp 10 Active router is local, was 172.16.1.1
*Mar 1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby router is unknown, was local
*Mar 1 01:25:45.930: HSRP: Et0/0 Grp 10 Standby -> Active
*Mar 1 01:25:45.930: %HSRP-6-STATECHANGE: Ethernet0/0 Grp 10 state Standby -> Active
*Mar 1 01:25:45.930: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Standby -> Active
*Mar 1 01:25:48.935: HSRP: Et0/0 Grp 10 Redundancy group hsrp-Et0/0-10 state Active -> Active
*Mar 1 01:25:51.936: HSRP: Et0/0 Grp 10 Redundancy group hsrp-Et0/0-10 state Active -> Active
When router R1's Fast Ethernet 0/0 interface is administratively brought up, router R1 reassumes its previous role as the active HSRP router for HSRP group 10, because router R1 is configured with the preempt option. The output shown in Example 5-13 demonstrates how router R2 receives a coup message, letting router R2 know that router R1 is taking back its active role.
Example 5-13. debug standby terse Command Output on Router R2: Changing HSRP to Standby
R2#
*Mar 1 01:27:57.979: HSRP: Et0/0 Grp 10 Coup in 172.16.1.1 Active pri 150 vIP 172.16.1.3
*Mar 1 01:27:57.979: HSRP: Et0/0 Grp 10 Active: j/Coup rcvd from higher pri router (150/172.16.1.1)
*Mar 1 01:27:57.979: HSRP: Et0/0 Grp 10 Active router is 172.16.1.1, was local
*Mar 1 01:27:57.979: HSRP: Et0/0 Grp 10 Active -> Speak
*Mar 1 01:27:57.979: %HSRP-6-STATECHANGE: Ethernet0/0 Grp 10 state Active -> Speak
*Mar 1 01:27:57.979: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Active -> Speak
*Mar 1 01:28:07.979: HSRP: Et0/0 Grp 10 Speak: d/Standby timer expired (unknown)
*Mar 1 01:28:07.979: HSRP: Et0/0 Grp 10 Standby router is local
*Mar 1 01:28:07.979: HSRP: Et0/0 Grp 10 Speak -> Standby
*Mar 1 01:28:07.979: HSRP: Et0/0 Grp 10 Redundancy "hsrp-Et0/0-10" state Speak -> Standby
VRRP
Virtual Router Redundancy Protocol (VRRP), similar to HSRP, allows a collection of routers to service traffic destined for a single IP address. Unlike HSRP, the IP address serviced by a VRRP group does not have to be a virtual IP address. The IP address can be the address of a physical interface on the virtual router master, which is the router responsible for forwarding traffic destined for the VRRP group's IP address. A VRRP group can have multiple routers acting as virtual router backups, as shown in Figure 5-6, any of which could take over in the event of the virtual router master becoming unavailable.
Figure 5-6 Basic VRRP Operation
GLBP
Gateway Load Balancing Protocol (GLBP) can load balance traffic destined for a next-hop gateway across a collection of routers, known as a GLBP group. Specifically, when a client sends an Address Resolution Protocol (ARP) request, in an attempt to determine the MAC address corresponding to a known IP address, GLBP can respond with the MAC address of one member of the GLBP group. The next such request would receive a response containing the MAC address of a different member of the GLBP group, as depicted in Figure 5-7. Within the group, one router acts as the active virtual gateway (AVG), which is responsible for replying to ARP requests from hosts. However, multiple routers acting as active virtual forwarders (AVFs) can forward traffic.
Figure 5-7 Basic GLBP Operation
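The way the AVG spreads clients across forwarders can be illustrated with a minimal Python sketch. It assumes a simple round-robin rotation, which matches the behavior described above, and it uses placeholder strings rather than real MAC addresses. The class name and method name are hypothetical.

```python
from itertools import cycle


class ActiveVirtualGateway:
    """Toy model of an AVG that answers ARP requests for the virtual IP."""

    def __init__(self, forwarder_macs):
        # Rotate through the forwarders' virtual MAC addresses in order.
        self._next_mac = cycle(forwarder_macs)

    def answer_arp(self):
        """Return the virtual MAC address handed to the next requesting client."""
        return next(self._next_mac)


if __name__ == "__main__":
    # Placeholder labels, not real addresses.
    avg = ActiveVirtualGateway(["vmac-of-AVF-1", "vmac-of-AVF-2", "vmac-of-AVF-3"])
    for client in ("Workstation A", "Workstation B", "Workstation C", "Workstation D"):
        print(client, "->", avg.answer_arp())
```

Because each client caches a different MAC address for the same gateway IP address, traffic from different clients is forwarded by different routers in the group.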
Troubleshooting VRRP and GLBP
Because VRRP and GLBP perform a similar function to HSRP, you can use a similar troubleshooting philosophy. Much like HSRP's show standby brief command, similar information can be gleaned for VRRP operation with the show vrrp brief command and for GLBP operation with the show glbp brief command.
Although HSRP, VRRP, and GLBP have commonalities, it is important for you as a troubleshooter to understand the differences. Table 5-4 compares several characteristics of these first-hop router redundancy protocols.
Table 5-4. Comparing HSRP, VRRP, and GLBP
Characteristic | HSRP | VRRP | GLBP |
Cisco proprietary | Yes | No | Yes |
Interface IP address can act as virtual IP address | No | Yes | No |
More than one router in a group can simultaneously forward traffic for that group | No | No | Yes |
Hello timer default value | 3 seconds | 1 second | 3 seconds |
Cisco Catalyst Switch Performance Troubleshooting
Switch performance issues can be tricky to troubleshoot, because the problem reported is often subjective. For example, if a user reports that the network is running "slow," the user's perception might mean that the network is slow compared to what he expects. However, network performance might very well be operating at a level that is hampering productivity and at a level that is indeed below its normal level of operation. At that point, as part of the troubleshooting process, you need to determine what network component is responsible for the poor performance. Rather than a switch or a router, the user's client, server, or application could be the cause of the performance issue.
If you do determine that the network performance is not meeting technical expectations (as opposed to user expectations), you should isolate the source of the problem and diagnose the problem on that device. This section assumes that you have isolated the device causing the performance issue, and that device is a Cisco Catalyst switch.
Cisco Catalyst Switch Troubleshooting Targets
Cisco offers a variety of Catalyst switch platforms, with different port densities, different levels of performance, and different hardware. Therefore, troubleshooting one of these switches can be platform dependent. Many similarities do exist, however. For example, all Cisco Catalyst switches include the following hardware components:
- Ports: A switch's ports physically connect the switch to other network devices. These ports (also known as interfaces) allow a switch to receive and transmit traffic.
- Forwarding logic: A switch contains hardware that makes forwarding decisions. This hardware rewrites a frame's headers.
- Backplane: A switch's backplane physically interconnects a switch's ports. Therefore, depending on the specific switch architecture, frames flowing through a switch enter via a port (that is, the ingress port), flow across the switch's backplane, and are forwarded out of another port (that is, an egress port).
- Control plane: A switch's CPU and memory reside in a control plane. This control plane is responsible for running the switch's operating system.
Figure 5-8 depicts these switch hardware components. Notice that the control plane does not directly participate in frame forwarding. However, the forwarding logic contained in the forwarding hardware comes from the control plane. Therefore, there is an indirect relationship between frame forwarding and the control plane. As a result, a continuous load on the control plane could, over time, impact the rate at which the switch forwards frames. Also, if the forwarding hardware is operating at maximum capacity, the control plane begins to provide the forwarding logic. So, although the control plane does not architecturally appear to impact switch performance, it should be considered when troubleshooting.
Figure 5-8 Cisco Catalyst Switch Hardware Components
The following are two common troubleshooting targets to consider when diagnosing a suspected switch issue:
- Port errors
- Mismatched duplex settings
The sections that follow evaluate these target areas in greater detail.
Port Errors
When troubleshooting a suspected Cisco Catalyst switch issue, a good first step is to check port statistics. For example, examining port statistics can let a troubleshooter know if an excessive number of frames are being dropped. If a TCP application is running slow, the reason might be that TCP flows are going into TCP slow start, which causes the window size, and therefore the bandwidth efficiency, of TCP flows to be reduced. A common reason that a TCP flow enters slow start is packet drops. Similarly, packet drops for a UDP flow used for voice or video could result in noticeable quality degradation, because dropped UDP segments are not retransmitted.
Although dropped frames are most often attributed to network congestion, another possibility is that the cabling could be bad. To check port statistics, a troubleshooter could leverage a show interfaces command. Consider Example 5-14, which shows the output of the show interfaces gig 0/9 counters command on a Cisco Catalyst 3550 switch. Notice that this output shows the number of inbound and outbound frames seen on the specified port.
Example 5-14. show interfaces gig 0/9 counters Command Output
SW1# show interfaces gig 0/9 counters
Port            InOctets   InUcastPkts   InMcastPkts   InBcastPkts
Gi0/9           31265148         20003          3179             1
Port           OutOctets  OutUcastPkts  OutMcastPkts  OutBcastPkts
Gi0/9           18744149          9126            96             6
To view errors that occurred on a port, you could add the keyword of errors after the show interfaces interface_id counters command. Example 5-15 illustrates sample output from the show interfaces gig 0/9 counters errors command.
Example 5-15. show interfaces gig 0/9 counters errors Command Output
SW1# show interfaces gig 0/9 counters errors
Port      Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize
Gi0/9             0          0          0          0          0
Port     Single-Col  Multi-Col   Late-Col Excess-Col  Carri-Sen      Runts     Giants
Gi0/9          5603          0       5373          0          0          0          0
Table 5-5 provides a reference for the specific errors that might show up in the output of the show interfaces interface_id counters errors command.
Table 5-5. Errors in the show interfaces interface_id counters errors Command
Error Counter | Description |
Align-Err | An alignment error occurs when frames do not end with an even number of octets, while simultaneously having a bad Cyclic Redundancy Check (CRC). An alignment error normally suggests a Layer 1 issue, such as cabling or port (either switch port or NIC port) issues. |
FCS-Err | A Frame Check Sequence (FCS) error occurs when a frame has an invalid checksum, although the frame has no framing errors. Like the Align-Err error, an FCS-Err often points to a Layer 1 issue. |
Xmit-Err | A transmit error (that is, Xmit-Err) occurs when a port's transmit buffer overflows. A speed mismatch between inbound and outbound links often results in a transmit error. |
Rcv-Err | A receive error (that is, Rcv-Err) occurs when a port's receive buffer overflows. Congestion on a switch's backplane could cause the receive buffer on a port to fill to capacity, as frames await access to the switch's backplane. However, most likely, a Rcv-Err is indicating a duplex mismatch. |
UnderSize | An undersize frame is a frame with a valid checksum but a size less than 64 bytes. This issue suggests that a connected host is sourcing invalid frame sizes. |
Single-Col | A Single-Col error occurs when a single collision occurs before a port successfully transmits a frame. High bandwidth utilization on an attached link or a duplex mismatch are common reasons for a Single-Col error. |
Multi-Col | A Multi-Col error occurs when more than one collision occurs before a port successfully transmits a frame. Similar to the Single-Col error, high bandwidth utilization on an attached link or a duplex mismatch are common reasons for a Multi-Col error. |
Late-Col | A late collision is a collision that is not detected until well after the frame has begun to be forwarded. While a Late-Col error could indicate that the connected cable is too long, this is an extremely common error seen in mismatched duplex conditions. |
Excess-Col | The Excess-Col error occurs when a frame experiences sixteen successive collisions, after which the frame is dropped. This error could result from high bandwidth utilization, a duplex mismatch, or too many devices on a segment. |
Carri-Sen | The Carri-Sen counter is incremented when a port wants to send data on a half-duplex link. This is normal and expected on a half-duplex port, because the port is checking the wire, to make sure no traffic is present, prior to sending a frame. This operation is the carrier sense procedure described by the Carrier Sense Multiple Access with Collision Detect (CSMA/CD) operation used on half-duplex connections. Full-duplex connections, however, do not use CSMA/CD. |
Runts | A runt is a frame that is less than 64 bytes in size and has a bad CRC. A runt could result from a duplex mismatch or a Layer 1 issue. |
Giants | A giant is a frame size greater than 1518 bytes (assuming the frame is not a jumbo frame) that has a bad FCS. Typically, a giant is caused by a problem with the NIC in an attached host. |
Mismatched Duplex Settings
As seen in Table 5-5, duplex mismatches can cause a wide variety of port errors. Keep in mind that almost all network devices, other than shared media hubs, can run in full-duplex mode. Therefore, if you have no hubs in your network, all devices should be running in full-duplex mode.
A new recommendation from Cisco is that switch ports be configured to autonegotiate both speed and duplex. Two justifications for this recommendation are as follows:
- If a connected device supported only half-duplex, it would be better for a switch port to negotiate down to half-duplex and run properly than to be forced to run full-duplex, which would result in multiple errors.
- The automatic medium-dependent interface crossover (auto-MDIX) feature can automatically detect if a port needs a crossover or a straight-through cable to interconnect with an attached device and adjust the port to work regardless of which cable type is connected. You can enable this feature in interface configuration mode with the mdix auto command on some models of Cisco Catalyst switches. However, the auto-MDIX feature requires that the port autonegotiate both speed and duplex.
In a mismatched duplex configuration, a switch port at one end of a connection is configured for full-duplex, whereas a switch port at the other end of a connection is configured for half-duplex. Among the different errors previously listed in Table 5-5, two of the biggest indicators of a duplex mismatch are a high Rcv-Err counter or a high Late-Col counter. Specifically, a high Rcv-Err counter is common to find on the full-duplex end of a connection with a mismatched duplex, while a high Late-Col counter is common on the half-duplex end of the connection.
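The two indicators just described lend themselves to a quick automated check. The following Python sketch parses the text of a show interfaces interface_id counters errors command and reports which end of a mismatch a port resembles. It is a rough aid rather than a diagnostic tool: the function names are hypothetical, and the threshold of 1000 is an arbitrary assumption that you would tune for your own environment.

```python
def parse_counters(output):
    """Parse 'counters errors' output into a {counter_name: value} dictionary."""
    counters = {}
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # Each header row (starting with "Port") is followed by one row of values.
    for header, values in zip(rows, rows[1:]):
        if header[0] == "Port":
            # values[0] is the port name; the rest line up with header[1:].
            counters.update(zip(header[1:], map(int, values[1:])))
    return counters


def duplex_mismatch_hint(counters, threshold=1000):
    """Return a hint based on the Late-Col and Rcv-Err counters.

    The threshold is an assumed value for illustration only.
    """
    if counters.get("Late-Col", 0) > threshold:
        return "half-duplex end of a possible mismatch"
    if counters.get("Rcv-Err", 0) > threshold:
        return "full-duplex end of a possible mismatch"
    return "no strong duplex mismatch indicator"


# Sample text modeled on a port with a high late-collision count.
SAMPLE = """\
Port      Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize
Gi0/9             0          0          0          0          0
Port     Single-Col  Multi-Col   Late-Col Excess-Col  Carri-Sen  Runts  Giants
Gi0/9          5603          0       5373          0          0      0       0
"""

if __name__ == "__main__":
    print(duplex_mismatch_hint(parse_counters(SAMPLE)))
```

A hint from a script like this is only a starting point. Confirming the duplex setting on both ends of the link is still required before concluding that a mismatch exists.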
To illustrate, examine Examples 5-16 and 5-17, which display output based on the topology depicted in Figure 5-9. Example 5-16 shows the half-duplex end of a connection, and Example 5-17 shows the full-duplex end of a connection.
Figure 5-9 Topology with Duplex Mismatch
Example 5-16. Output from the show interfaces gig 0/9 counters errors and the show interfaces gig 0/9 | include duplex Commands on a Half-Duplex Port
SW1# show interfaces gig 0/9 counters errors
Port      Align-Err   FCS-Err     Xmit-Err    Rcv-Err     UnderSize
Gi0/9     0           0           0           0           0
Port      Single-Col  Multi-Col   Late-Col    Excess-Col  Carri-Sen   Runts       Giants
Gi0/9     5603        0           5373        0           0           0           0
SW1# show interfaces gig 0/9 | include duplex
Half-duplex, 100Mb/s, link type is auto, media type is 10/100/1000BaseTX
Example 5-17. Output from the show interfaces fa 5/47 counters errors and the show interfaces fa 5/47 | include duplex Commands on a Full-Duplex Port
SW2# show interfaces fa 5/47 counters errors
Port      Align-Err   FCS-Err     Xmit-Err    Rcv-Err     UnderSize   OutDiscards
Fa5/47    0           5248        0           5603        27          0
Port      Single-Col  Multi-Col   Late-Col    Excess-Col  Carri-Sen   Runts       Giants
Fa5/47    0           0           0           0           0           227         0
Port      SQETest-Err  Deferred-Tx  IntMacTx-Err IntMacRx-Err Symbol-Err
Fa5/47    0            0            0            0            0
SW2# show interfaces fa 5/47 | include duplex
Full-duplex, 100Mb/s
If you suspect a duplex mismatch but have access to only one of the switches, you could change the duplex settings on the switch you do control. Then, you could clear the interface counters to see if the errors continue to increment. You could also perform the same activity (for example, a file transfer) that the user was performing when they noticed the performance issue. By comparing the current performance to the performance experienced by the user, you might be able to conclude that the problem has been resolved by correcting a mismatched duplex configuration.
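The two indicators described earlier (a high Late-Col counter on the half-duplex end and a high Rcv-Err counter on the full-duplex end) can be sketched as a simple check. The following Python is only an illustrative heuristic: the function name, the dictionary layout, and the comparison rule are assumptions made for this sketch, not a Cisco-defined test. The counter values are taken from Examples 5-16 and 5-17.

```python
def guess_duplex_role(counters):
    """Guess which end of a suspected duplex mismatch a port is on.

    `counters` maps counter names from the `show interfaces ... counters
    errors` output to integers. The comparison below is a hypothetical
    heuristic for illustration; always confirm against a baseline.
    """
    late_col = counters.get("Late-Col", 0)
    rcv_err = counters.get("Rcv-Err", 0)
    if late_col > 0 and late_col >= rcv_err:
        # Late collisions point to the half-duplex end of the link.
        return "likely half-duplex end"
    if rcv_err > 0:
        # Receive errors without late collisions point to the full-duplex end.
        return "likely full-duplex end"
    return "no mismatch indicators"


# Counter values copied from Examples 5-16 (SW1) and 5-17 (SW2).
sw1_gi0_9 = {"Rcv-Err": 0, "Late-Col": 5373}
sw2_fa5_47 = {"Rcv-Err": 5603, "Late-Col": 0}

print("SW1 Gi0/9:", guess_duplex_role(sw1_gi0_9))
print("SW2 Fa5/47:", guess_duplex_role(sw2_fa5_47))
```

A check like this only narrows the search. Clearing the counters and watching whether they continue to increment remains the more reliable confirmation.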
TCAM Troubleshooting
As previously mentioned, the two primary components of forwarding hardware are forwarding logic and backplane. A switch's backplane is rarely the cause of a switch performance issue, because most Cisco Catalyst switches have high-capacity backplanes. It is conceivable, however, that in a modular switch chassis the backplane will not have the throughput to support a fully populated chassis in which each card supports the highest combination of port densities and port speeds.
The architecture of some switches allows groups of switch ports to be handled by separate hardware. Therefore, you might experience a performance gain by simply moving a cable from one switch port to another. However, to strategically take advantage of this design characteristic, you must be very familiar with the architecture of the switch with which you are working.
A multilayer switch's forwarding logic can impact switch performance. Recall that a switch's forwarding logic is compiled into a special type of memory called ternary content addressable memory (TCAM), as illustrated in Figure 5-10. TCAM works with a switch's CEF feature to provide extremely fast forwarding decisions. However, if a switch's TCAM is unable, for whatever reason, to forward traffic, that traffic is forwarded by the switch's CPU, which has a limited forwarding capability.
Figure 5-10 Populating the TCAM
The process of the TCAM sending packets to a switch's CPU is called punting. Consider a few reasons why a packet might be punted from a TCAM to its CPU:
- Routing protocols, in addition to other control plane protocols such as STP, that send multicast or broadcast traffic will have that traffic sent to the CPU.
- Someone connecting to a switch administratively (for example, establishing a Telnet session with the switch) will have their packets sent to the CPU.
- Packets using a feature not supported in hardware (for example, packets traveling over a GRE tunnel) are sent to the CPU.
- If a switch's TCAM has reached capacity, additional packets will be punted to the CPU. A TCAM might reach capacity if it has too many installed routes or configured access control lists.
From the events listed, the event most likely to cause a switch performance issue is a TCAM filling to capacity. Therefore, when troubleshooting switch performance, you might want to investigate the state of the switch's TCAM. Please be sure to check documentation for your switch model, because TCAM verification commands can vary between platforms.
As an example, the Cisco Catalyst 3550 Series switch supports a collection of show tcam commands, whereas Cisco Catalyst 3560 and 3750 Series switches support a series of show platform tcam commands. Consider the output from the show tcam inacl 1 statistics command issued on a Cisco Catalyst 3550 switch, as shown in Example 5-18. The number 1 indicates TCAM number one, because the Cisco Catalyst 3550 has three TCAMs. The inacl refers to access control lists applied in the ingress direction. Notice that fourteen masks are allocated, while 402 are available. Similarly, seventeen entries are currently allocated, and 3311 are available. Therefore, you could conclude from this output that TCAM number one is not approaching capacity.
Example 5-18. show tcam inacl 1 statistics Command Output on a Cisco Catalyst 3550 Series Switch
Cat3550# show tcam inacl 1 statistics
Ingress ACL TCAM#1: Number of active labels: 3
Ingress ACL TCAM#1: Number of masks allocated: 14, available: 402
Ingress ACL TCAM#1: Number of entries allocated: 17, available: 3311
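To make the "not approaching capacity" conclusion concrete, the allocated and available counts can be turned into a utilization percentage. The Python below is a minimal sketch: the function name is hypothetical, and it assumes that allocated plus available equals the total capacity of that TCAM resource, which the output in Example 5-18 suggests but does not state.

```python
import re


def tcam_utilization(line):
    """Return the percentage in use from an 'allocated: N, available: M' line.

    Assumes allocated + available is the total capacity for that resource.
    """
    match = re.search(r"allocated:\s*(\d+),\s*available:\s*(\d+)", line)
    if match is None:
        raise ValueError("no allocated/available pair found in line")
    allocated = int(match.group(1))
    available = int(match.group(2))
    total = allocated + available
    return 100.0 * allocated / total if total else 0.0


# Lines copied from Example 5-18.
masks = "Ingress ACL TCAM#1: Number of masks allocated: 14, available: 402"
entries = "Ingress ACL TCAM#1: Number of entries allocated: 17, available: 3311"

print(f"masks in use:   {tcam_utilization(masks):.2f}%")
print(f"entries in use: {tcam_utilization(entries):.2f}%")
```

Both figures come out at only a few percent, which is consistent with the conclusion that TCAM number one has ample headroom.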
On some switch models (for example, a Cisco Catalyst 3750 platform), you can use the show platform ip unicast counts command to see if a TCAM allocation has failed. Similarly, you can use the show controllers cpu-interface command to display a count of packets being forwarded to a switch's CPU.
On most switch platforms, TCAMs cannot be upgraded. Therefore, if you conclude that a switch's TCAM is the source of the performance problems being reported, you could either use a switch with higher-capacity TCAMs or reduce the number of entries in a switch's TCAM. For example, you could try to optimize your access control lists or leverage route summarization to reduce the number of route entries maintained by a switch's TCAM. Also, some switches (for example, Cisco Catalyst 3560 or 3750 Series switches) enable you to change the amount of TCAM memory allocated to different switch features. For example, if your switch ports were configured as routing ports, you could reduce the amount of TCAM space used for storing MAC addresses, and instead use that TCAM space for Layer 3 processes.
High CPU Utilization Level Troubleshooting
The load on a switch's CPU is often low, even when the switch is forwarding a high volume of traffic, thanks to the TCAM. Because the TCAM maintains a switch's forwarding logic, the CPU is rarely tasked to forward traffic. The show processes cpu command that you earlier learned for use on a router can also be used on a Cisco Catalyst switch to display CPU utilization levels, as demonstrated in Example 5-19.
Example 5-19. show processes cpu Command Output on a Cisco Catalyst 3550 Series Switch
Cat3550# show processes cpu
CPU utilization for five seconds: 19%/15%; one minute: 20%; five minutes: 13%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
1 0 4 0 0.00% 0.00% 0.00% 0 Chunk Manager
2 0 610 0 0.00% 0.00% 0.00% 0 Load Meter
3 128 5 25600 0.00% 0.00% 0.00% 0 crypto sw pk pro
4 2100 315 6666 0.00% 0.05% 0.05% 0 Check heaps
...OUTPUT OMITTED...
Notice in the output in Example 5-19 that the switch is reporting a 19 percent CPU load, with 15 percent of the CPU load used for interrupt processing. The difference between these two numbers is 4, suggesting that 4 percent of the CPU load is consumed with control plane processing.
Although such load utilization values might not be unusual for a router, these values might be of concern for a switch. Specifically, a typical CPU load percentage dedicated to interrupt processing is no more than five percent. A value as high as ten percent is considered acceptable. However, the output given in Example 5-19 shows a fifteen percent utilization. Such a high level implies that the switch's CPU is actively involved in forwarding packets that should normally be handled by the switch's TCAM. Of course, this value might only be of major concern if it varies from baseline information. Therefore, your troubleshooting efforts benefit from having good baseline information.
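The arithmetic and thresholds just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the three labels are assumptions introduced here, and the five and ten percent cutoffs are the guideline values from the preceding paragraph rather than limits enforced by the switch.

```python
import re


def split_cpu_load(header_line):
    """Parse the 'five seconds: total%/interrupt%' field of show processes cpu.

    Returns (total, interrupt, process), where process = total - interrupt.
    """
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", header_line)
    if match is None:
        raise ValueError("five-second utilization field not found")
    total = int(match.group(1))
    interrupt = int(match.group(2))
    return total, interrupt, total - interrupt


def assess_interrupt_load(interrupt_pct):
    """Label the interrupt load using the guideline values from the text."""
    if interrupt_pct <= 5:
        return "typical"
    if interrupt_pct <= 10:
        return "acceptable"
    return "investigate"


# Header line copied from Example 5-19.
header = "CPU utilization for five seconds: 19%/15%; one minute: 20%; five minutes: 13%"
total, interrupt, process = split_cpu_load(header)
print(f"total={total}% interrupt={interrupt}% process={process}%")
print("interrupt load:", assess_interrupt_load(interrupt))
```

In practice, the label matters less than how the value compares with the baseline collected for that particular switch.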
Periodic spikes in processor utilization are also not a major cause for concern if such spikes can be explained. Consider the following reasons that might cause a switch's CPU utilization to spike:
- The CPU processing routing updates
- Issuing a debug command (or other processor-intensive commands)
- Simple Network Management Protocol (SNMP) being used to poll network devices
If you determine that a switch's high CPU load is primarily the result of interrupts, you should examine the switch's packet switching patterns and check the TCAM utilization. If, however, the high CPU utilization is primarily the result of processes, you should investigate those specific processes.
A high CPU utilization on a switch might be a result of STP. Recall that an STP failure could lead to a broadcast storm, where Layer 2 broadcast frames endlessly circulate through a network. Therefore, when troubleshooting a performance issue, realize that a switch's high CPU utilization might be a symptom of another issue.
Trouble Ticket: HSRP
This trouble ticket focuses on HSRP. HSRP was one of three first-hop redundancy protocols discussed in this chapter's "Router Redundancy Troubleshooting" section.
Trouble Ticket #2
You receive the following trouble ticket:
- A new network technician configured HSRP on routers BB1 and BB2, where BB1 was the active router. The configuration was initially working; however, now BB2 is acting as the active router, even though BB1 seems to be operational.
This trouble ticket references the topology shown in Figure 5-11.
Figure 5-11 Trouble Ticket #2 Topology
As you investigate this issue, you examine baseline data collected after HSRP was initially configured. Examples 5-20 and 5-21 provide show and debug command output collected when HSRP was working properly. Notice that router BB1 was acting as the active HSRP router, whereas router BB2 was acting as the standby HSRP router.
Example 5-20. Baseline Output for Router BB1
BB1# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   150    Active   local           172.16.1.3      172.16.1.4
BB1# debug standby
HSRP debugging is on
*Mar 1 01:14:21.487: HSRP: Fa0/1 Grp 1 Hello in 172.16.1.3 Standby pri 100 vIP 172.16.1.4
*Mar 1 01:14:23.371: HSRP: Fa0/1 Grp 1 Hello out 172.16.1.1 Active pri 150 vIP 172.16.1.4
BB1# u all
All possible debugging has been turned off
BB1# show standby fa 0/1 1
FastEthernet0/1 - Group 1
  State is Active
    10 state changes, last state change 00:12:40
  Virtual IP address is 172.16.1.4
  Active virtual MAC address is 0000.0c07.ac01
    Local virtual MAC address is 0000.0c07.ac01 (v1 default)
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 1.536 secs
  Preemption disabled
  Active router is local
  Standby router is 172.16.1.3, priority 100 (expires in 9.684 sec)
  Priority 150 (configured 150)
  IP redundancy name is "hsrp-Fa0/1-1" (default)
BB1# show run
...OUTPUT OMITTED...
hostname BB1
!
interface Loopback0
 ip address 10.3.3.3 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.2.1 255.255.255.0
!
interface FastEthernet0/1
 ip address 172.16.1.1 255.255.255.0
 standby 1 ip 172.16.1.4
 standby 1 priority 150
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
Example 5-21. Baseline Output for Router BB2
BB2# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   100    Standby  172.16.1.1      local           172.16.1.4
BB2# show run
...OUTPUT OMITTED...
hostname BB2
!
interface Loopback0
 ip address 10.4.4.4 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.2.2 255.255.255.0
!
interface FastEthernet0/1
 ip address 172.16.1.3 255.255.255.0
 standby 1 ip 172.16.1.4
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
As part of testing the initial configuration, a ping was sent to the virtual IP address of 172.16.1.4 from router R2 in order to confirm that HSRP was servicing requests for that IP address. Example 5-22 shows the output from the ping command.
Example 5-22. PINGing the Virtual IP Address from Router R2
R2# ping 172.16.1.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.4, timeout is 2 seconds:
!!!!!
As you begin to gather information about the reported problem, you reissue the show standby brief command on routers BB1 and BB2. As seen in Examples 5-23 and 5-24, router BB1 is administratively up with an HSRP priority of 150, whereas router BB2 is administratively up with a priority of 100.
Example 5-23. Examining the HSRP State of Router BB1's FastEthernet 0/1 Interface
BB1# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   150    Standby  172.16.1.3      local           172.16.1.4
Example 5-24. Examining the HSRP State of Router BB2's FastEthernet 0/1 Interface
BB2# show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   100    Active   local           172.16.1.1      172.16.1.4
Take a moment to look through the baseline information, the topology, and the show command output. Then, hypothesize the underlying cause, explaining why router BB2 is currently the active HSRP router, even though router BB1 has a higher priority. Finally, on a separate sheet of paper, write out a proposed action plan for resolving the reported issue.
Suggested Solution
Upon examination of BB1's output, it becomes clear that the preempt feature is not enabled for the Fast Ethernet 0/1 interface on BB1. The absence of the preempt feature explains the reported symptom. Specifically, if BB1 had at one point been the active HSRP router for HSRP group 1, and either router BB1 or its Fast Ethernet 0/1 interface became unavailable, BB2 would have become the active router. Then, if BB1 or its Fast Ethernet 0/1 interface once again became available, BB1 would assume a standby HSRP role, because BB1's FastEthernet 0/1 interface was not configured for the preempt feature.
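The reasoning above can be expressed as a small decision rule. The following Python is a deliberately simplified sketch, not an implementation of HSRP: the function name and data layout are hypothetical, and it ignores timers, equal-priority tiebreaks, and interface tracking. It models only the question of what happens when a previously failed router rejoins the group.

```python
def active_after_return(current_active, returning, priority, preempt):
    """Return the name of the active router after `returning` rejoins.

    priority maps router name -> HSRP priority.
    preempt maps router name -> True if preemption is configured.
    The returning router takes over only if it is configured to preempt
    and has a higher priority than the current active router.
    """
    if preempt[returning] and priority[returning] > priority[current_active]:
        return returning
    return current_active


# Priorities taken from the baseline output for BB1 and BB2.
priority = {"BB1": 150, "BB2": 100}

# BB1 failed, BB2 became active, and BB1 has now come back.
without_preempt = active_after_return(
    "BB2", "BB1", priority, {"BB1": False, "BB2": False}
)
with_preempt = active_after_return(
    "BB2", "BB1", priority, {"BB1": True, "BB2": False}
)

print("preempt disabled on BB1:", without_preempt)
print("preempt enabled on BB1: ", with_preempt)
```

With preemption disabled, BB2 stays active despite BB1's higher priority, which matches the reported symptom.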
To resolve this configuration issue, the preempt feature is added to BB1's Fast Ethernet 0/1 interface, as shown in Example 5-25. After enabling the preempt feature, notice that router BB1 regains its active HSRP role.
Example 5-25. Enabling the Preempt Feature on Router BB1's FastEthernet 0/1 Interface
BB1# conf term
Enter configuration commands, one per line. End with CNTL/Z.
BB1(config)#int fa 0/1
BB1(config-if)#standby 1 preempt
BB1(config-if)#end
BB1#
*Mar 1 01:17:39.607: %HSRP-5-STATECHANGE: FastEthernet0/1 Grp 1 state Standby -> Active
BB1#show standby brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Fa0/1       1   150  P Active   local           172.16.1.3      172.16.1.4