Problems Associated with Extended Layer 2 Networks
A common practice is to add redundancy when interconnecting data centers to avoid split-subnet scenarios and interruption of communication between servers, as illustrated in Figure 1-2. A split subnet is not necessarily a problem if the routing metric makes one site preferred over the other. Also, if the servers at each site are part of a cluster and communication is lost, mechanisms such as the quorum disk avoid a split-brain condition.
Figure 1-2 Layout of multiple data center interconnect with redundant N-PEs in each data center.
Adding redundancy to an extended Ethernet network typically means relying on STP to keep the topology loop free. STP domains should be kept as small as possible and confined within the data center. Cisco does not recommend deploying legacy 802.1D STP because its old timer-based mechanisms make the recovery time too slow for most applications, including typical clustering software.
An extended Layer 2 network does introduce some problems to contend with, however.
STP operates at Layer 2 of the Open Systems Interconnection (OSI) model, and the primary function of STP is to prevent the loops that redundant links create in bridged networks. By exchanging bridge protocol data units (BPDU) between bridges, STP elects the ports that eventually forward or block traffic.
The conservative default values for the STP timers impose a maximum network diameter of seven. Therefore, two bridges cannot be more than seven hops away from each other.
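The link between the default timers and a diameter of seven can be checked with a short calculation. The sketch below assumes the commonly cited tuning formulas from Cisco's STP timer guidance (max age = 4 × hello + 2 × diameter − 2, forward delay = (4 × hello + 3 × diameter) / 2, rounded up to whole seconds). The function name is hypothetical and is not part of any Cisco tool.

```python
import math


def stp_timers_for_diameter(diameter: int, hello: int = 2) -> dict:
    """Derive STP max age and forward delay (in seconds) for a network diameter.

    Assumption: uses the commonly cited tuning formulas, rounding the
    forward delay up to a whole second.
    """
    if diameter < 2:
        raise ValueError("diameter must be at least 2 bridges")
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


# A diameter of 7 with a 2-second hello gives the familiar defaults.
print(stp_timers_for_diameter(7))
# A smaller diameter allows shorter timers.
print(stp_timers_for_diameter(4))
```

Under these assumptions, a diameter of seven and a 2-second hello time reproduce the default max age of 20 seconds and forward delay of 15 seconds.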
When a BPDU propagates from the root bridge toward the leaves of the tree, the age field increments each time the BPDU goes through a bridge. Eventually, a bridge discards the BPDU when the age field exceeds the maximum age. Therefore, convergence of the spanning tree is affected if the root bridge is too far away from some bridges in the network.
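The aging behavior can be illustrated with a simplified model. The sketch below assumes that the root originates BPDUs with an age of zero, that each relaying bridge adds one second, and that a bridge discards a BPDU whose age has reached the maximum age. The function and parameter names are hypothetical, and real implementations differ in detail.

```python
def bridges_reached(path_length: int, max_age: int = 20, age_step: int = 1) -> int:
    """Count how many bridges along a path from the root accept the BPDU.

    Assumptions: the root sends the BPDU with message age 0, each bridge
    adds `age_step` seconds before relaying it, and a bridge discards a
    BPDU whose message age has reached `max_age`.
    """
    message_age = 0
    reached = 0
    for _ in range(path_length):
        if message_age >= max_age:
            break  # this bridge discards the BPDU; nothing is relayed further
        reached += 1
        message_age += age_step  # age grows as the BPDU is relayed downstream
    return reached


print(bridges_reached(path_length=5))               # short path: every bridge is reached
print(bridges_reached(path_length=30))              # long path: aging cuts the BPDU off
print(bridges_reached(path_length=30, max_age=6))   # a lower max age shrinks the reach
```

In this model the hard cutoff is the maximum age itself. The recommended diameter of seven is much smaller, which is consistent with the default timers being conservative.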
An aggressive value for the max-age parameter and the forward delay can lead to an unstable STP topology. In such cases, the loss of some BPDUs can cause a loop to appear. Take special care if you plan to change STP timers from the default value to achieve faster STP convergence.
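Before lowering the timers, it can help to check which diameter a given timer set still supports. The sketch below inverts the commonly cited Cisco tuning formulas (max age = 4 × hello + 2 × diameter − 2, forward delay = (4 × hello + 3 × diameter) / 2). The formulas are an assumption here, and the function names are hypothetical.

```python
def supported_diameter(max_age: int, forward_delay: int, hello: int = 2) -> int:
    """Return the largest diameter the given timers (in seconds) allow.

    Assumption: inverts the commonly cited tuning formulas and takes the
    stricter of the two limits, rounded down to a whole number of bridges.
    """
    by_max_age = (max_age + 2 - 4 * hello) // 2
    by_forward_delay = (2 * forward_delay - 4 * hello) // 3
    return min(by_max_age, by_forward_delay)


def timers_are_safe(actual_diameter: int, max_age: int,
                    forward_delay: int, hello: int = 2) -> bool:
    """True if the timers support at least the actual network diameter."""
    return supported_diameter(max_age, forward_delay, hello) >= actual_diameter


# Default timers support the documented diameter of seven.
print(supported_diameter(max_age=20, forward_delay=15))
# Aggressive timers on a five-bridge-wide network fail the check.
print(timers_are_safe(actual_diameter=5, max_age=10, forward_delay=7, hello=1))
```

A failed check indicates that the timers are more aggressive than these formulas allow for the network's actual diameter, which is the situation in which lost BPDUs are most likely to destabilize the topology.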
Rapid STP (RSTP) converges faster than legacy STP because it does not depend on timers to make a rapid transition. However, STP still does not provide the required robustness for large-scale Layer 2 deployments:
- Network stability is compromised as a result of slow response to network failures (slow convergence). Even new spanning-tree developments such as RSTP and Multiple Spanning Tree (MST) assume good-quality physical connections such as dark fiber or WDM connections. These STP protocols are not built to accommodate frequent link flapping, high error rates, unidirectional failures, or failure to report loss of signal. These behaviors, which are typical and frequent on long- and medium-distance links, could lead to slow STP convergence or even instability.
- The primary reason for multisite data centers is disaster recovery. However, because data centers typically require Layer 2 connectivity, failure in one data center can affect other data centers, which could lead to a blackout of all data centers at the same time.
- A broadcast storm propagates to every data center, which, if uncontrolled, could result in a network-wide outage.
- STP blocks links, which prevents load balancing of traffic across redundant paths in the core network.