The Network Operator's Arsenal: Management Tools
We conclude this chapter by taking a look at some of the tools that assist people who manage networks for a living—people like Pat, Chris, and Sandy. Ultimately, it is the goal of network management technology to provide tools that make people efficient. Having an impression of what such tools can do provides a helpful context for material covered in later chapters.
We start with simple and relatively basic tools and move progressively toward tools of greater complexity, concluding with tools that are typically found only in large-scale network operations.
The list is by no means complete but covers many of the most important tool categories. It illustrates the kaleidoscope of different functionality that is available to network providers. Perhaps it also explains why it is not uncommon to find literally hundreds of different management applications at large service providers. Don't worry, though. Many environments use far fewer applications, as with enterprise IT departments of medium-size businesses like the one encountered in the example with Chris. In addition, although the breadth of tools and functions might seem overwhelming at first, in later chapters we discuss how to bring order to all of this. For example, in Chapters 4, "The Dimensions of Management," and 5, "Management Functions and Reference Models: Getting Organized," we discuss systemic ways of categorizing and organizing management functionality; Chapter 10, "Management Integration: Putting the Pieces Together," picks up on the challenge of how to integrate different tools into one operational support environment.
Device Managers and Craft Terminals
Craft terminals, sometimes also referred to as device managers (not to be confused with element managers, discussed shortly), provide a user-friendly way for humans to interact with individual network equipment. Craft terminals are used to log into equipment one device at a time, view its current status, view and possibly change its configuration settings, and trigger the equipment to execute certain actions, such as performing diagnostic self-tests and downloading new software images. Frequently, craft terminals provide a graphical view of the equipment that shows the physical configuration of the equipment with its different cards and ports, viewed from both the front and the back sides. Figure 2-8 shows an example of such a view. The view might even be animated to show which LEDs will be currently lit or blinking, depending on the device's status.
Figure 2-8 Sample Screen of a Device Manager Offering a Graphical Device (Chassis) View (CiscoView for Catalyst 6500)
Contrary to most other management tools, craft terminals generally do not retain any information about the managed equipment in a database, nor do they offer electronic interfaces to other management applications. All they provide is a remote real-time view of the equipment you want to look at, one at a time. In some cases, managed equipment might already provide a "built-in" craft interface, for example, by way of a mini-web server that renders a device view. In this case, separate craft terminal software is not needed because all that a user needs to do is point a web browser at the device.
Craft terminals are often used by field technicians, who might have craft terminal software loaded onto their notebook computers with which they connect to the device that needs to be managed through a universal serial bus (USB) or serial interface, much as you find on most PCs. In general, craft terminal functionality can also be launched from other management applications, such as from management platforms (see the section, "Management Platforms"), to provide a remote graphical view of the managed device. This was also the case in the earlier scenario, when Chris was using the function of a craft terminal that he launched from a management platform to take a look at the router at the edge of the branch that was having a performance problem.
Network Analyzers
Network analyzers come under many different names, including packet sniffers, packet analyzers, and traffic analyzers. They are used to view and analyze current traffic on a network, generally to understand the way in which the network is behaving and to diagnose and troubleshoot particular problems. Network analyzers capture or "sniff" packets that flow over a port of a network device, such as a router or switch, and present them in a human-readable format that an experienced network operator can interpret. In the earlier example, Chris was planning to use a network analyzer to analyze the type of traffic that occurred during times when the network performance problem was observed.
Element Managers
Element managers are systems that are used to manage equipment in a network. Typically, element managers are designed for equipment of a specific type and of a particular vendor; in fact, they are often provided by an equipment vendor. Element managers are similar to craft terminals, in that they allow operators to access devices to view their status and configuration, and possibly modify their parameter settings. The functions of element managers, however, far exceed the functions of craft terminals. For example, element managers typically include a database in which they retain information about all the various devices (at least, for those that are supported) in the network.
This enables users to view how devices are configured without the devices themselves needing to be repeatedly queried. More important, it enables users to back up and archive how devices are configured, to restore device configurations if that ever is required, and to manage the distribution of software images to the devices. In addition, element managers can receive event messages from the devices, which enables users to monitor the various pieces of equipment across the entire network, not just one device at a time. Element managers might also be able to automatically discover equipment that is deployed on the network. The tool that Chris used in the earlier example to manage the IP PBX was an element manager.
Element managers also often offer an electronic interface to other applications. This allows other applications to manage the equipment through the element manager instead of having to interface to the equipment directly. This can have important advantages:
- Less possibility exists that data about the network will run out of synch between different applications. The element manager not only serves as an authoritative data store about the device, but also coordinates management requests that applications might issue concurrently.
- The interface that the element manager offers might be easier to use and, hence, build to than the interfaces offered by the devices themselves. The element manager can also shield applications from minor variations in device interfaces.
- The management load on the managed equipment is reduced. Not only can the element manager coordinate requests that are received from other management applications, but, in many cases, it can respond to requests by providing information about the device from its own database instead of needing to talk to the device.
Management Platforms
Management platforms are general-purpose management applications that are used to manage networks. The functionality of management platforms is generally comparable to that of element managers. However, management platforms are typically designed to be vendor independent, offering device support for equipment of multiple vendors. Typically, the primary task of a management platform is to monitor the network to make sure it is functioning properly. Therefore, it was also the main tool that Chris used in the earlier example. Management platforms are often accompanied by development toolkits. Those toolkits enable users, systems integrators, and third-party management application developers to adapt and extend the management platform. Its functionality can be customized and adapted to different environments, it can be extended with new capabilities, and it can be integrated with additional management applications whose functionality is made accessible through the management platform.
These capabilities can make management platforms resemble a sort of "operating system" for management applications. Indeed, in some ways, analogies between a management platform and a PC operating system can be drawn.
For example, the PC operating system includes basic functionality such as a file explorer and Internet browser, and might come bundled with a basic word-processing program and spreadsheet, with an abundance of additional applications available that run on top of it and make the PC operating system more useful. Those applications leverage certain operating system infrastructure, such as the file system. The management platform, on the other hand, provides out-of-the-box support for basic management needs such as network monitoring and discovery, with additional add-on applications available to cater to more advanced needs. Those applications use management platform infrastructure. An example are functions that allow applications to communicate with network devices, as well as functions that keep an inventory of the equipment in the network and that cache their configuration in an internal database. Also, where a PC operating system offers plug-in support for additional device drivers, management platforms need to support similar capabilities to support additional networking equipment.
Collectors and Probes
Collectors and probes are auxiliary systems that offload applications from simple functions.
Collectors are used to gather and store different types of data from the network. An example is Netflow collectors, which collect data about traffic that traverses a router. Such data can be generated by routers in high volumes and is commonly represented in a format known as Netflow. Another example is loggers, which collect so-called syslog messages from network equipment that provides a trail of the processing and activities that occur at a router.
Probes are similar to collectors but are "active," in the sense that they trigger certain activities in the network and collect the responses—for example, they perform periodic tests. In the earlier scenario, Chris used a probe to take periodic measurements of the network response time over a certain link.
In each case, the data that is collected is made available to other applications, such as a management platform.
Intrusion Detection Systems
Intrusion detection systems (IDSs) help network providers to detect suspicious communication patterns on the network that might be indicative of an ongoing attack. Attacks include attempted break-ins into routers or, much more common, into servers, and denial-of-service (DoS) attacks that could be caused by Internet worms designed to overload and, hence, effectively shut down a service. IDSs use a wide variety of techniques, including analyzing traffic on the network, listening to alarms, inspecting activity logs, and observing load patterns. IDSs help operators quickly recognize such threats and mitigate their effects—for example, by shutting off network ports through which attacks occur.
Performance Analysis Systems
Performance analysis systems enable users to analyze traffic and performance data, with the goal of recognizing trends and patterns in that traffic. They have to deal with massive amounts of data that has been collected over long periods of time; hence, they frequently involve data mining (techniques to recognize common patterns in large amounts of data), as well as advanced visualization techniques to display data in the form of graphical patterns that make sense to a user. Users such as Sandy from our earlier scenario use this information for a variety of activities, such as for network planning. Sandy can use information she gathers from a performance analysis system to anticipate where additional capacity will be needed in the near future and to tune data center performance based on an analysis of bottlenecks. Information gathered from performance analysis might even be helpful for tasks such as the development of pricing structures that will encourage communication behavior that helps "even out" the communications load on the network. Recognizing which services lead to a disproportionate load and frequently cause congestion in portions of the network might cause a service provider to charge extra for them.
Alarm Management Systems
Alarm management systems are specialized in collecting and monitoring alarms from the network. They help users to quickly sift through and make sense of the volumes of event and alarm messages that are received from the network. Often alarm management systems have additional capabilities to group ("correlate") alarms that are likely to belong together, to offer initial diagnoses for the root cause of an alarm, or to provide impact analysis to forecast the fallout that an alarm might have. Sometimes, based on their analysis, alarm management systems generate additional synthetic event messages that aggregate and interpret the findings from a set of raw alarms. In many cases, alarm management systems also serve as preprocessors for other management applications, such as trouble ticket systems, like the one that we encountered in the earlier scenario with Pat.
Of course, other tools, such as management platforms and element managers, already include a certain degree of alarm management functionality. Dedicated alarm management applications, however, generally offer functionality that is more sophisticated and goes above and beyond what more general-purpose applications are offering.
Figure 2-9 shows a view of a screen of a typical alarm management application, displaying a list of alarm messages that can be expanded or searched and filtered for various purposes. For each alarm message, the screen shows a brief summary of what the message is about, along with information on which device it originated from, what category of alarm it belongs to, when it occurred, and (through its color coding) how severe the condition is.
Figure 2-9 Sample Screen of an Alarm Management Application (Cisco Info Center)
Trouble Ticket Systems
Trouble ticket systems are used to track how problems in a network (such as those that are indicated by alarms) are being resolved. Note that this is different from managing the alarms themselves. Trouble ticket systems are used to capture information about problems that were observed in the network and to track the resolution of those problems. In many cases, trouble tickets are generated by users of the network who experience a problem, although they might also be created proactively by an application that monitors the network and detects a problem.
A trouble ticket system supports the resolution of problems in many ways. For example, the trouble ticket system can automatically assign trouble tickets to a ticket owner who has to take responsibility, or it can automatically escalate tickets that take too long to resolve. The trouble ticket system can also report statistics about the resolution process and generally ensures that problems are followed up on. Of course, the scenario that featured Pat was centered heavily on the use of a trouble ticket system.
Work Order Systems
Work order systems are used to assign and track individual maintenance jobs in a network. They also help organize and manage the workforce that carries them out. For each job, a work order is assigned whose resolution is then tracked. Similar to trouble ticket systems, work order systems offer a myriad of functions to capture information about jobs, to manage the assignment of jobs to a work force, to make sure those jobs are properly taken care of, and, in general, to track what the work force that is maintaining the networking infrastructure is doing. We encountered a work order system in conjunction with the scenario of Pat when someone needed to be dispatched to replace a piece of faulty equipment.
Workflow Management Systems and Workflow Engines
A workflow management system helps manage the execution of workflows. A workflow is basically a predefined process or procedure that consists of multiple steps that can involve different owners and organizations. Workflow management systems pertain to business processes in general and are not specific to network management. However, they can be applied to network management when the processes and workflows in question involve the running of a network.
A workflow management system helps keep track of the steps in a workflow and ensures that predefined procedures are followed and policies are enforced. Workflows are usually defined using a concept called finite state machines. Each step along the way constitutes a state, and transitions between states occur according to well-defined interfaces and when well-defined events occur. The individual tasks are then pushed through these finite state machines as applicable, managed through the core of the workflow management system, the so-called workflow engine.
Both trouble ticket and work order systems can, in fact, be considered specialized examples of workflows. However, a workflow management system is more general in nature and highly customizable, to allow for the incorporation of any type of workflow.
Inventory Systems
Inventory systems are used to track the assets of a network provider. They come in two flavors:
- Network inventory systems track physical inventory in a network, mainly the equipment that is deployed, but sometimes also spare parts. Inventory information includes the type of equipment, the software version that is installed on it, cards within the equipment, the location of the equipment, and so forth. We encountered a network inventory system in the scenario involving Pat when the network technicians replaced a part in the network and the work order system automatically updated the network inventory accordingly.
- Service inventory systems track the instances of services that have been deployed over the network and that can be traced to individual users and end customers. For example, this could be DSL and phone services for residential customers of a telecommunications service provider. They might also include information on which network equipment and which ports are used to physically realize the service. Knowing this information makes it easier to assess who will be affected in case maintenance operations need to be performed or in case of a network failure.
In addition to inventory systems, facility management systems are used to document and keep track of the physical cable and ducts in buildings that are used to interconnect networking equipment.
Service Provisioning Systems
Service provisioning systems facilitate the deployment of services over a network, such as Digital Subscriber Line (DSL) or telephone service for residential customers of large service providers. Service provisioning systems translate requests to turn on or to remove a service into a series of steps and configurations that are then driven into the network.
Service provisioning systems are typically very complex applications that can be found only in operational support environments of large service providers; we did not encounter any in our earlier scenarios. They allow service providers to roll out services on a very large scale, often at a rate of tens of thousands of service requests per day. In many cases, service provisioning systems do not even interact with human operators, except possibly in case of exceptions that require human intervention. For this reason, perhaps surprisingly, they often offer no graphical user interface (GUI) or only a very rudimentary one. Instead, requests are issued from another system, for example from a service order management system via an electronic application programming interface (API). Such an interface allows another system to automatically interact with the system without user involvement, for example to request a piece of information or to hand off a request. For example, a service order management system (which we encounter in the section that follows) might use the API to automatically dispatch a request for provisioning a service to the service provisioning system when the order for the services becomes due.
Service Order–Management Systems
Service order–management systems are used to manage orders for services by customers of large service providers. (As with service provisioning systems, such systems are generally not encountered in enterprise environments.) They are part of a larger category of systems that deal with customer relationship management (CRM), which, for example, also includes help-desk functions.
Managing service orders involves a set of specialized workflows, similar to managing work orders or trouble tickets. Service order management systems help service providers track and fulfill orders for services and automate many, if not most, steps along the way. This includes identifying needed equipment, locating required ports, performing customer credit checks, scheduling the fulfillment of service orders, and eventually dispatching requests to turn up services to a service provisioning system.
Note the distinction between service order management systems and service provisioning systems. The former help manage workflows and processes of an organization. The latter are applications that interact with a network to configure it in a certain way. Compare this to the earlier distinction between trouble ticket systems and alarm management systems.
Billing Systems
Last in our list, but not least, are billing systems. We did not discuss billing systems in any of our earlier scenarios, but we should not lose sight of the reason many network providers (service providers, in particular, not enterprise IT departments) are in the business of running networks in the first place: to make money. Billing systems are essential to the realization of revenues. They analyze accounting and usage data to identify which communication services were provided to whom at what time. Subsequently, a tariffing scheme that defines how services need to be charged for is applied to that data to generate a bill.
Many other functions than billing systems themselves are associated with billing. For example, fraud detection systems help detect suspicious patterns in activity that could indicate that services are being stolen. Billing systems might also need to interface with other systems that are used for customer relationship management so that, for example, customer databases can be updated with information on which customers are past due.