This chapter explains how to use storage I/O performance monitoring to address network congestion problems.
This chapter covers the following topics:
Why Monitor Storage I/O Performance?
How and Where to Monitor Storage I/O Performance
Cisco SAN Analytics Architecture
Understanding I/O Flows in a Storage Network
I/O Flow Metrics
I/O Operations and Network Traffic Patterns
Case Studies
Why Monitor Storage I/O Performance?
Storage I/O performance monitoring provides deeper insight into network traffic, which helps in addressing network congestion accurately. This information complements what the network ports already provide by counting packets sent and received, bytes sent and received, and link errors. Storage I/O performance monitoring also brings visibility into the upper layers of the stack and can explain why a network has or lacks traffic by providing the following information:
The upper-layer protocol—SCSI or NVMe—that generated the network traffic
Upper-layer protocol errors such as SCSI queue full, reservation conflict, NVMe namespace not ready, and so on
IOPS, throughput, I/O size, and so on
How long I/O operations take to complete, the delay caused by storage arrays, and the delay caused by hosts
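The throughput-related metrics in this list depend on one another: throughput is the product of IOPS and average I/O size. The following minimal sketch shows how they could be derived from a set of completed I/O operations observed over a monitoring interval. The `IoRecord` layout, the field names, and the sample values are assumptions made for illustration only; they do not represent the output of any particular product.

```python
from dataclasses import dataclass


@dataclass
class IoRecord:
    """One completed I/O operation (hypothetical record layout)."""
    size_bytes: int       # amount of data read or written
    completion_s: float   # time taken to complete the operation, in seconds


def summarize(records, interval_s):
    """Derive basic I/O metrics for operations completed within interval_s seconds."""
    count = len(records)
    total_bytes = sum(r.size_bytes for r in records)
    total_time = sum(r.completion_s for r in records)
    return {
        "iops": count / interval_s,
        "throughput_bytes_per_s": total_bytes / interval_s,
        "avg_io_size_bytes": total_bytes / count if count else 0.0,
        "avg_completion_s": total_time / count if count else 0.0,
    }


# Illustrative values only: three operations completed in a 1-second interval.
records = [IoRecord(4096, 0.0005), IoRecord(8192, 0.0010), IoRecord(4096, 0.0015)]
stats = summarize(records, interval_s=1.0)
print(stats)
```

Because throughput equals IOPS multiplied by average I/O size, a rise in throughput can come from more operations, larger operations, or both. That distinction is not visible from port byte counters alone.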
This performance can also be monitored for every flow, giving granular insights into the traffic on a network port. Flow-level performance monitoring is extremely useful because most production environments are virtualized, so a single host port typically carries the traffic of many virtual machines (VMs). When a host causes congestion due to overutilization of its link, the network can detect this condition, as explained in earlier chapters. Storage I/O performance monitoring can additionally detect the cause of the high amount of traffic and which VM is requesting it.
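As a rough illustration of why per-flow visibility matters, the following sketch aggregates the bytes observed on one host port by flow and reports which flow contributes the most. Keying each flow by a VM name, along with the sample values, is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical samples seen on one host port during one interval:
# (flow identifier, bytes transferred). The flow is keyed by VM name here.
samples = [
    ("vm-a", 4_000_000),
    ("vm-b", 90_000_000),
    ("vm-a", 6_000_000),
    ("vm-c", 1_000_000),
]


def bytes_per_flow(samples):
    """Sum the bytes transferred by each flow."""
    totals = defaultdict(int)
    for flow, nbytes in samples:
        totals[flow] += nbytes
    return dict(totals)


totals = bytes_per_flow(samples)
busiest = max(totals, key=totals.get)              # flow with the most bytes
share = totals[busiest] / sum(totals.values())     # its fraction of port traffic
print(f"{busiest} accounts for {share:.0%} of the traffic on this port")
```

The port counters alone would show only the combined total. The per-flow breakdown is what attributes most of it to a single VM.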
Likewise, when a host causes congestion due to slow drain, investigating the SCSI- and NVMe-level performance and error metrics can explain why the host has become slower in processing the traffic. It is also possible to determine whether a particular VM has caused the entire host to slow down. Storage I/O performance monitoring can also predict the likelihood of network congestion. These and many more benefits of storage I/O performance monitoring are explained in this chapter, and case studies are provided.
Storage I/O performance monitoring is a detailed subject. Its use cases include application and storage performance insights, storage provisioning recommendations, infrastructure optimization, change management, audits, reporting, and so on. The scope of this book, however, is limited to congestion use cases. We recommend continuing your education on this topic beyond this book. Refer to the References section later in this chapter.
This chapter focuses on the SCSI and NVMe protocols in the block-storage stack for performance monitoring. However, these protocols initiate I/O operations only when an application needs to read or write data. Therefore, monitoring higher layers in the stack, up to the application layer, can provide even more insights into why the network has traffic. Application-level monitoring, such as that provided by the Cisco AppDynamics observability platform, is beyond the scope of this book. This is another area in which we recommend continuing your education beyond this book.