Fully Redundant Layer 2 and Layer 3 Designs
Up to this point, all the topologies that have been presented are fully redundant. This section explains the various aspects of a redundant and scalable Data Center design by presenting multiple possible design alternatives, highlighting sound practices, and pointing out practices to be avoided.
The Need for Redundancy
Figure 4-12 explains the steps in building a redundant topology.
Figure 4-12 depicts the logical steps in designing the server farm infrastructure. The process starts with a Layer 3 switch that provides ports for direct server connectivity and routing to the core. A Layer 2 switch could be used, but the Layer 3 switch limits the broadcasts and flooding to and from the server farms. This is option a in Figure 4-12. The main problem with the design labeled a is that it has multiple single points of failure: There is a single NIC and a single switch, and if either one fails, the server and its applications become unavailable.
The solution is twofold:
Make the components of the single switch redundant, such as dual power supplies and dual supervisors.
Add a second switch.
Redundant components make the single switch more fault tolerant, yet if the switch as a whole fails, the server farm is unavailable. Option b shows the next step, in which a redundant Layer 3 switch is added.
Figure 4-12 Multilayer Redundant Design
By having two Layer 3 switches and spreading servers on both of them, you achieve a higher level of redundancy in which the failure of one Layer 3 switch does not completely compromise the application environment. When the servers are dual-homed, they can still recover from the failure of one Layer 3 switch by using the connection to the second switch.
In options a and b, the port density is limited to the capacity of the two switches. As the demands for more ports increase for the server and other service devices, and when the maximum capacity has been reached, adding new ports becomes cumbersome, particularly when trying to maintain Layer 2 adjacency between servers.
The mechanism used to grow the server farm is presented in option c. You add Layer 2 access switches to the topology to provide direct server connectivity. Figure 4-12 depicts the Layer 2 switches connected to both Layer 3 aggregation switches. The two uplinks, one to each aggregation switch, provide redundancy from the access to the aggregation switches, giving the server farm an alternate path to reach the Layer 3 switches.
The design described in option c still has a problem. If the Layer 2 switch fails, the servers lose their only means of communication. The solution is to dual-home servers to two different Layer 2 switches, as depicted in option d of Figure 4-12.
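The progression from option a to option d can be checked mechanically. The following is a minimal Python sketch, not something taken from Figure 4-12: it models each option as an undirected graph and lists the devices whose individual failure cuts the server off from the core. The node names (nic1, acc1, agg1, and so on) and the exact links are illustrative assumptions, and the core is treated as a single node that is assumed to be redundant in its own right.

```python
from collections import deque

def reachable(links, src, dst, failed=None):
    """Return True if dst is reachable from src while `failed` is down."""
    adj = {}
    for a, b in links:
        if failed in (a, b):
            continue  # a failed device takes all of its links with it
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src="server", dst="core"):
    """List every device whose failure alone isolates src from dst."""
    devices = {n for link in links for n in link} - {src, dst}
    return sorted(d for d in devices if not reachable(links, src, dst, failed=d))

# Hypothetical link lists for the four options (names are assumptions).
OPTION_A = [("server", "nic1"), ("nic1", "agg1"), ("agg1", "core")]
OPTION_B = [("server", "nic1"), ("server", "nic2"),
            ("nic1", "agg1"), ("nic2", "agg2"),
            ("agg1", "core"), ("agg2", "core")]
OPTION_C = [("server", "nic1"), ("nic1", "acc1"),
            ("acc1", "agg1"), ("acc1", "agg2"),
            ("agg1", "core"), ("agg2", "core")]
OPTION_D = [("server", "nic1"), ("server", "nic2"),
            ("nic1", "acc1"), ("nic2", "acc2"),
            ("acc1", "agg1"), ("acc1", "agg2"),
            ("acc2", "agg1"), ("acc2", "agg2"),
            ("agg1", "core"), ("agg2", "core")]

for name, links in [("a", OPTION_A), ("b", OPTION_B),
                    ("c", OPTION_C), ("d", OPTION_D)]:
    print(name, single_points_of_failure(links))
```

Under these assumptions, option c reports the single NIC and the single access switch as the remaining weak points, which matches the reasoning above for dual-homing the servers in option d.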
NOTE
Throughout this book, the terms access layer and access switches refer to the switches used to provide port density. The terms aggregation layer and aggregation switches refer to the switches used both to aggregate the traffic to and from the access switches and to connect service devices (load balancers, SSL offloaders, firewalls, caches, and so on).
The aggregation switches are Layer 3 switches, which means that they have a built-in router that can forward traffic at wire speed.
The access switches are predominantly Layer 2 switches, yet they could be Layer 3 switches merely operating in Layer 2 mode for the server farms.
Layer 2 and Layer 3 in Access Layer
Option d in Figure 4-12 is detailed in option a of Figure 4-13.
Figure 4-13 Layer 3 and Layer 2 in the Data Center
Figure 4-13 presents the scope of the Layer 2 domain(s) from the servers to the aggregation switches. Redundancy in the Layer 2 domain is achieved mainly by using spanning tree, whereas in Layer 3, redundancy is achieved through the use of routing protocols.
Historically, routing protocols have proven more stable than spanning tree, which makes one question the wisdom of using Layer 2 instead of Layer 3 at the access layer. This topic was discussed previously in the "Need for Layer 2 at the Access Layer" section. As shown in option b in Figure 4-13, the need for Layer 2 at the access layer does not prevent the building of pure Layer 3 designs: You can route between the access and aggregation layers and still support Layer 2 between access switches.
The design depicted in option a of Figure 4-13 is the most generic design that provides redundancy, scalability, and flexibility. Flexibility relates to the fact that the design makes it easy to add service appliances at the aggregation layer with minimal changes to the rest of the design. A simpler design such as that depicted in option b of Figure 4-13 might better suit the requirements of a small server farm.
Layer 2, Loops, and Spanning Tree
The Layer 2 domains should make you think immediately of loops. Every network designer has experienced Layer 2 loops in the network. When Layer 2 loops occur, packets are replicated an infinite number of times, bringing down the network. Under normal conditions, the Spanning Tree Protocol keeps the logical topology free of loops. Unfortunately, physical failures such as unidirectional links, incorrect wiring, rogue bridging devices, or bugs can cause loops to occur.
Fortunately, the introduction of 802.1w has addressed many of the limitations of the original spanning tree algorithm, and features such as Loopguard fix the issue of malfunctioning transceivers or bugs.
Still, the experience of deploying legacy spanning tree drives network designers to try to design the Layer 2 topology free of loops. In the Data Center, this is sometimes possible. An example of this type of design is depicted in Figure 4-14. As you can see, the Layer 2 domain (VLAN) that hosts the subnet 10.0.0.x is not trunked between the two aggregation switches, and neither is 10.0.1.x. Notice that GigE3/1 and GigE3/2 are not bridged together.
Figure 4-14 Loop-Free Layer 2 Design
TIP
It is possible to build a loop-free access layer if you manage to keep subnets specific to a single access switch. If subnets must span multiple access switches, you should have a "looped" topology. This is the case when you have dual-attached servers because NICs configured for "teaming" typically use a floating IP and MAC address, which means that both interfaces belong to the same subnet.
Keep in mind that a "loop-free" topology is not necessarily better. Specific requirements such as those mandated by content switches actually might require the additional path provided by a "looped" topology.
Also notice that a "looped" topology simply means that any Layer 2 device can reach any other Layer 2 device through at least two different physical paths. This does not mean that you have a "forwarding loop," in which packets are replicated an infinite number of times: Spanning tree prevents this from happening.
In a "looped" topology, malfunctioning switches can cause Layer 2 loops. In a loop-free topology, there is no chance for a Layer 2 loop because there are no redundant Layer 2 paths.
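The distinction between a "looped" and a loop-free topology can be illustrated with a short Python sketch. It walks the Layer 2 links of one VLAN and returns the links that close a physical cycle; a topology is "looped" exactly when that list is not empty. This is not the spanning-tree algorithm: real spanning tree chooses which port to block through root election and path cost, whereas this sketch simply depends on the order of the links. The switch names and link lists are assumptions made for illustration, and an EtherChannel should be entered as one logical link.

```python
def redundant_links(links):
    """Return the links that close a cycle in an undirected Layer 2 graph."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    closing = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))   # both ends already connected: a second path
        else:
            parent[root_a] = root_b  # first path between these two groups
    return closing

# Hypothetical VLAN that is NOT trunked between the aggregation switches.
LOOP_FREE = [("acc1", "agg1"), ("acc1", "agg2")]
# Same VLAN once it is also carried on the trunk between the aggregation switches.
LOOPED = LOOP_FREE + [("agg1", "agg2")]

print(redundant_links(LOOP_FREE))  # no redundant Layer 2 path
print(redundant_links(LOOPED))     # one link closes the triangle
```

In the looped case, the returned link is the kind of redundant path that a loop-prevention mechanism must keep from forwarding; in the loop-free case, there is nothing to block and therefore nothing that can malfunction into a forwarding loop.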
If the number of ports must increase for any reason (dual-attached servers, more servers, and so forth), you could follow the approach of daisy-chaining Layer 2 switches, as shown in Figure 4-15.
Figure 4-15 Alternate Loop-Free Layer 2 Design
To help you visualize a Layer 2 loop-free topology, Figure 4-15 shows each aggregation switch broken up as a router and a Layer 2 switch.
The problem with topology a is that breaking the links between the two access switches would create a discontinuous subnet. This problem can be fixed with an EtherChannel between the access switches.
The other problem occurs when there are not enough ports for servers. If a number of servers need to be inserted into the same subnet 10.0.0.x, you cannot add a switch between the two existing servers, as presented in option b of Figure 4-15. This is because there is no workaround to the failure of the middle switch, which would create a split subnet. This design is not intrinsically wrong, but it is not optimal.
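The split-subnet problem in both options can be reproduced with a small Python sketch that computes the Layer 2 segments left after a failure. The switch names and the daisy-chain layouts below are assumptions made for illustration rather than a transcription of Figure 4-15; each link stands for one logical connection, so an EtherChannel counts as a single entry that survives the loss of one member link.

```python
def l2_segments(switches, links, failed_switch=None, failed_link=None):
    """Return the connected Layer 2 segments that remain after a failure."""
    alive = [s for s in switches if s != failed_switch]
    adj = {s: set() for s in alive}
    for a, b in links:
        if failed_link in ((a, b), (b, a)):
            continue                      # this logical link is down
        if a in adj and b in adj:         # skip links to the failed switch
            adj[a].add(b)
            adj[b].add(a)
    segments, seen = [], set()
    for start in alive:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                      # depth-first walk of one segment
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        segments.append(sorted(group))
    return segments

# Hypothetical option a: two access switches chained between the aggregation switches.
SWITCHES_A = ["agg1", "acc1", "acc2", "agg2"]
LINKS_A = [("agg1", "acc1"), ("acc1", "acc2"), ("acc2", "agg2")]

# Hypothetical option b: a third access switch inserted in the middle of the chain.
SWITCHES_B = ["agg1", "acc1", "acc3", "acc2", "agg2"]
LINKS_B = [("agg1", "acc1"), ("acc1", "acc3"), ("acc3", "acc2"), ("acc2", "agg2")]

print(l2_segments(SWITCHES_A, LINKS_A))
print(l2_segments(SWITCHES_A, LINKS_A, failed_link=("acc1", "acc2")))
print(l2_segments(SWITCHES_B, LINKS_B, failed_switch="acc3"))
```

With no failure, the subnet forms one segment. Losing the inter-switch link in option a, or the middle switch in option b, leaves two segments that each sit behind a different aggregation switch, which is the discontinuous subnet described above.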
Both the topologies depicted in Figures 4-14 and 4-15 should migrate to a looped topology as soon as you have any of the following requirements:
An increase in the number of servers on a given subnet
Dual-attached NICs
The spread of existing servers for a given subnet on a number of different access switches
The insertion of stateful network service devices (such as load balancers) that operate in active/standby mode
Options a and b in Figure 4-16 show how introducing additional access switches on the existing subnet creates "looped topologies." In both a and b, GigE3/1 and GigE3/2 are bridged together.
Figure 4-16 Redundant Topologies with Physical Layer 2 Loops
If the requirement is to implement a topology that brings Layer 3 to the access layer, the topology that addresses the requirements of dual-attached servers is pictured in Figure 4-17.
Figure 4-17 Redundant Topology with Layer 3 to the Access Switches
Notice in option a of Figure 4-17, almost all the links are Layer 3 links, whereas the access switches have a trunk (on a channel) to provide the same subnet on two different switches. This trunk also carries a Layer 3 VLAN, which basically is used merely to make the two switches neighbors from a routing point of view. The dashed line in Figure 4-17 shows the scope of the Layer 2 domain.
Option b in Figure 4-17 shows how to grow the size of the server farm with this type of design. Notice that when deploying pairs of access switches, each pair has a set of subnets disjoint from the subnets of any other pair. For example, one pair of access switches hosts subnets 10.0.1.x and 10.0.2.x; the other pair cannot host the same subnets simply because it connects to the aggregation layer with Layer 3 links.
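The rule that each pair of access switches owns its own subnets lends itself to a simple planning check. The sketch below uses Python's standard ipaddress module to flag any subnet that appears, in whole or in part, on more than one pair. It assumes that 10.0.1.x and 10.0.2.x are /24 networks; the pair names and the subnets given to the second pair are hypothetical.

```python
import ipaddress
from itertools import combinations

def overlapping_subnets(pairs):
    """Return (pair_a, pair_b, net_a, net_b) for subnets shared across pairs."""
    conflicts = []
    for (name_a, nets_a), (name_b, nets_b) in combinations(pairs.items(), 2):
        for net_a in nets_a:
            for net_b in nets_b:
                # overlaps() is True for identical or nested networks.
                if ipaddress.ip_network(net_a).overlaps(ipaddress.ip_network(net_b)):
                    conflicts.append((name_a, name_b, net_a, net_b))
    return conflicts

# Valid plan: every access pair hosts its own subnets (masks are assumed).
VALID_PLAN = {
    "pair1": ["10.0.1.0/24", "10.0.2.0/24"],
    "pair2": ["10.0.3.0/24", "10.0.4.0/24"],  # hypothetical subnets
}

# Invalid plan: 10.0.2.0/24 would have to exist behind two routed pairs.
INVALID_PLAN = {
    "pair1": ["10.0.1.0/24", "10.0.2.0/24"],
    "pair2": ["10.0.2.0/24", "10.0.3.0/24"],
}

print(overlapping_subnets(VALID_PLAN))
print(overlapping_subnets(INVALID_PLAN))
```

An empty result means the plan respects the constraint; any entry identifies a subnet that would need Layer 2 adjacency across the Layer 3 links to the aggregation layer.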
NOTE
If you compare the design in Figure 4-17 with option b in Figure 4-12, the natural questions are these: Why is there an aggregation layer, and why are the access switches not connected directly to the core? These are valid points, and the answer actually depends on the size of the Data Center. Remember that the access layer is added for reasons of port density, whereas the aggregation layer is used mainly to attach appliances, such as load-balancing devices, firewalls, caches, and so on.
So far, the discussions have centered on redundant Layer 2 and Layer 3 designs. The Layer 3 switch provides the default gateway for the server farms in all the topologies introduced thus far. Default gateway support, however, could also be provided by other service devices, such as load balancers and firewalls. The next section explores the alternatives.