Detecting, Troubleshooting, and Preventing Congestion in Storage Networks
- By Paresh Gupta, Edward Mazurek
- Published Feb 9, 2024 by Cisco Press. Part of the Networking Technology series.
- Copyright 2024
- Edition: 1st
- eBook
- ISBN-10: 0-13-788715-9
- ISBN-13: 978-0-13-788715-6
The complete practical guide to architecting storage networks for maximum efficiency and quickly troubleshooting congestion issues.
Data is the most important entity in a data center. In addition to being stored securely for the long term, data must be accessible to applications with low latency so that high performance can be maintained 24/7. To meet this performance goal, storage infrastructure services must be designed accordingly.
In Detecting, Troubleshooting, and Preventing Congestion in Storage Networks, leading Cisco experts take a practical approach to explaining the congestion-handling mechanisms of transport technologies such as FC, FCoE, RoCE, and TCP. The authors share a proven troubleshooting methodology developed through years of firsthand experience, as well as analytical techniques for monitoring storage fabrics and gaining predictive insights.
Through real-world experiences and case studies, you'll learn what questions to ask, how to start planning, what exists today, what components still must evolve, and how to drive value by building custom applications for detecting congestion in large-scale storage networks.
- Optimize user experience with faster resolution of congestion in storage networks in production data centers
- Master congestion handling mechanisms in technologies like FC, FCoE, RoCE, and TCP
- Apply the techniques to networks connected to all storage array types and vendors, including Dell, HPE, IBM, NetApp, and Hitachi
- Use real-world case studies and a troubleshooting methodology to ensure storage SLAs are consistently met
- Increase uptime with custom analytical tools for predicting and resolving congestion
- Boost storage infrastructure efficiency in a hybrid cloud model
- Save on employee training and reduce support ticket hassles with vendors
Table of Contents
Introduction
Chapter 1 Introduction to Congestion in Storage Networks
Types of Storage in a Data Center
Storage Protocols, Transports, and Networks
Storage Networks
Congestion in Storage Networks: An Overview
NVMe over Fabrics
Quality of Service (QoS)
Summary
References
Chapter 2 Understanding Congestion in Fibre Channel Fabrics
Fibre Channel Flow Control
Congestion Spreading in Fibre Channel Fabrics
Frame Flow Within a Fibre Channel Switch
The Effects of Bit Errors on Congestion
B2B Credit Loss and Recovery
Fibre Channel Counters Summary
Summary
References
Chapter 3 Detecting Congestion in Fibre Channel Fabrics
Congestion Detection Workflow
Congestion Detection Metrics
Congestion Detection Metrics on Cisco MDS Switches
Automatic Alerting
Detecting Congestion Using Remote Monitoring Platforms
Detecting Congestion Due to Slow Drain and Overutilization
Slow Drain and Overutilization at the Same Time
Detecting Congestion on Long-Distance Links
Summary
References
Chapter 4 Troubleshooting Congestion in Fibre Channel Fabrics
Troubleshooting Methodology and Workflow
Hints and Tips for Troubleshooting Congestion
Cisco MDS NX-OS Commands for Troubleshooting Congestion
Case Study 1: Finding Congestion Culprits and Victims in a Single-Switch Fabric
Case Study 2: Credit Loss Recovery Causing Frame Drops
Case Study 3: Overutilization on a Single Device Causing Massive Congestion Problems
Case Study 4: Long-Distance ISLs Causing Congestion
Summary
References
Chapter 5 Solving Congestion with Storage I/O Performance Monitoring
Why Monitor Storage I/O Performance?
How and Where to Monitor Storage I/O Performance
Cisco SAN Analytics Architecture
Understanding I/O Flows in a Storage Network
I/O Flow Metrics
I/O Operations and Network Traffic Patterns
Summary
References
Chapter 6 Preventing Congestion in Fibre Channel Fabrics
An Overview of Eliminating or Reducing Congestion
Link Capacity
Congestion Recovery by Disconnecting the Culprit Device
Congestion Recovery by Dropping Frames
Traffic Segregation
Congestion Prevention Using Rate Limiters on Storage Arrays
Congestion Prevention Using Dynamic Ingress Rate Limiting on Switches
Preventing Congestion by Notifying the End Devices
Network Design Considerations
Summary
References
Chapter 7 Congestion Management in Ethernet Storage Networks
Ethernet Flow Control
Understanding Congestion in Lossless Ethernet Networks
Detecting Congestion in Lossless Ethernet Networks
Troubleshooting Congestion in Lossless Ethernet Networks
Preventing Congestion in Lossless Ethernet Networks
Lossless Traffic with VXLAN
Summary
References
Chapter 8 Congestion Management in TCP Storage Networks
Understanding Congestion in TCP Storage Networks
Storage I/O Performance Monitoring
Preventing Congestion in TCP Storage Networks
Detecting Congestion in TCP Storage Networks
Troubleshooting Congestion in TCP Storage Networks
iSCSI and NVMe/TCP in a Lossless Network
iSCSI and NVMe/TCP with VXLAN
Fibre Channel over TCP/IP (FCIP)
Modified TCP Implementations
Summary
References
Chapter 9 Congestion Management in Cisco UCS Servers
Cisco UCS Architecture
Understanding Congestion in a UCS Domain
Detecting Congestion in a UCS Domain
The UCS Traffic Monitoring (UTM) App
Summary
References