Decisions and Investments
Let’s say that you love the use cases of NetDevOps because they resonate with your current challenges. So now you ask yourself, “How do I start, and do I buy something?”
In order to adopt NetDevOps, or any other technology, you will have to make several decisions and possibly some investments. This section covers the four main verticals you should consider:
Starting point
New skillsets
New tools
Organizational changes
These might seem like a lot of investments; however, considering the benefits, they are worth it. NetDevOps has some initial investments that decline over time, while its benefits grow over time, as shown in Figure 2-13.
FIGURE 2-13 NetDevOps Investments Versus Benefits Chart
Because this is a fairly new field compared to others in networking, it is hard to find trustworthy resources about it. The four main verticals described in this chapter are derived from the authors’ own experience in the field for the last five years working with NetDevOps. They are not industry standards.
Where to Start
When you start something new, you must begin somewhere. For example, when you are learning a new technology, you can start by reading a book or watching a video online. For some things, where you start does not matter because you’ll ultimately reach the same destination; however, when it comes to adoption of NetDevOps practices, choosing the place to start is very important.
So where should you start? There is no silver bullet or a single place where all organizations should start; rather, each organization should undergo an analysis to decide what is best for its situation. This preliminary analysis should evaluate roughly where the organization is in terms of the following:
Challenges/pain points
Skills
Technology stack
Why these three? There are other verticals you can consider, but evaluating these three typically results in a good starting point. Besides, the state of these three verticals is often well known to organizations, making this initial analysis cheap and fast. You do not need to produce a formal analysis with documentation, although you can do that if you wish. The result of this analysis should be an understanding of where you are in regard to these three topics.
After you understand where you are, whether documented or not, you should add more weight to the first vertical, challenges/pain points. You should start your journey with use cases in mind. Do not try to embark on the NetDevOps journey because of trends or buzzwords. Solving the challenges you have identified that are affecting your organization is the priority.
Prioritize the identified challenges based on their importance for your business but at the same time measure the complexity of each challenge. The result should be an ordered list. This balance between complexity and benefit is sometimes hard to understand, so use your best judgment because this is not an exact science.
So far, you have not factored in the skills and the technology stack verticals from the analysis. This is where they come in. From the ordered list of challenges, add which technologies are involved from the technology stack and what skills would be required to solve them. Some of those skills you might already have, while others you might not. The same goes for the technologies.
Skills come in second in our three verticals. Although the next section focuses solely on skills and how they influence your NetDevOps journey, they also play a role in defining a starting point. Prioritize use cases that you already have the skillsets to implement. Technology comes next, because it is easier to pick up a new technology than a new skillset. However, this does not mean that adopting a new technology is easy, because it is not, and that is why we include it as our third factor.
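The ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-to-5 scales, the `Challenge` class, the `rank` function, and the sample data are all hypothetical, and subtracting complexity from business value is just one simple way to balance the two. The sort key applies the three verticals in the order discussed: challenges first, missing skills second, missing technologies third.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    name: str
    business_value: int  # 1 (low) to 5 (high), your own estimate
    complexity: int      # 1 (simple) to 5 (hard), your own estimate
    skills_needed: set[str] = field(default_factory=set)
    tech_needed: set[str] = field(default_factory=set)


def rank(challenges, skills_owned, tech_owned):
    """Order challenges by value versus complexity, then by missing skills, then by missing technologies."""
    def key(challenge):
        missing_skills = len(challenge.skills_needed - skills_owned)
        missing_tech = len(challenge.tech_needed - tech_owned)
        # A higher (value - complexity) score should sort first, so negate it.
        score = challenge.business_value - challenge.complexity
        return (-score, missing_skills, missing_tech)

    return sorted(challenges, key=key)


# Hypothetical inventory for illustration only.
challenges = [
    Challenge("Full CI/CD for changes", 5, 4,
              {"jenkins", "ansible"}, {"automation", "ci/cd", "version control"}),
    Challenge("Slow copy/paste changes", 4, 2,
              {"ansible", "python"}, {"automation"}),
    Challenge("Unnoticed config changes", 4, 2,
              {"git"}, {"version control"}),
]
ordered = rank(challenges, skills_owned={"git", "python"}, tech_owned={"version control"})
for position, challenge in enumerate(ordered, start=1):
    print(position, challenge.name)
```

Because this is not an exact science, treat the resulting order as a conversation starter rather than a verdict; adjusting the estimates and rerunning the sort is cheap.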
For the technology stack, there will be many different nuances, and some use cases will not require all NetDevOps components to solve. For example, if your challenge is that people can make modifications to your network device configurations that go unnoticed, creating snowflake networks, and you need a way of maintaining a single source of truth, the only component you need is a version control system repository. Similarly, if your challenge is lack of speed and error-prone copy/paste configuration activities, you might just need to apply automation instead of also using CI/CD pipelines.
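To make the first example concrete, the following minimal sketch compares an intended configuration with a running configuration using Python's standard `difflib` module. It assumes the intended configuration would come from your version control repository and the running configuration from the device; here both are hardcoded, made-up strings, and the `config_drift` function name is hypothetical.

```python
import difflib


def config_drift(intended: str, running: str) -> list[str]:
    """Return unified-diff lines between the intended and running configuration."""
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            running.splitlines(),
            fromfile="source-of-truth",
            tofile="running-config",
            lineterm="",
        )
    )


# Made-up configurations; in practice these would be read from the
# repository and retrieved from the device, respectively.
intended = """hostname edge-01
ntp server 192.0.2.10
snmp-server community example-ro RO"""

running = """hostname edge-01
ntp server 192.0.2.10
ntp server 192.0.2.99
snmp-server community example-ro RO"""

drift = config_drift(intended, running)
for line in drift:
    print(line)
```

An empty result means the device matches the repository; any other result points at a change that was made outside the single source of truth.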
Understanding the minimum number of NetDevOps components you will need to adopt makes the journey to success shorter, which leads us to the highest contributing factor to the success rate of NetDevOps adoption: the ability to show successes at the early stages of adoption. Do not underestimate this. It not only motivates the teams involved, but it is a great way to show stakeholders their investments are paying off. Experiencing failures early or going for a long time without anything to show for it is the downfall of many adoption journeys.
However, you cannot really show success if you do not have success criteria. After you decide which use cases you will solve using which NetDevOps components, make sure you define what success looks like. Following up on the previous example of snowflake networks and configuration changes that go unnoticed, the success criterion could be to have 80% of devices that serve the same function running the same configuration, and to measure this you would audit the network. For the second example, the slow and error-prone configuration changes environment, you could measure your current estimated time to implement a new configuration and the number of minutes of downtime caused by changes. Then you could define a success criterion of lowering these numbers by 20%.
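Both example criteria reduce to simple arithmetic, sketched below. The function names and sample data are hypothetical, and the sketch assumes configurations have already been normalized (for example, with device-specific lines such as hostnames removed) so that devices with the same function can be compared directly.

```python
import hashlib
from collections import Counter, defaultdict


def consistency_ratio(devices):
    """Fraction of devices whose config matches the most common config for their function."""
    hashes_by_function = defaultdict(list)
    for function, config in devices:
        digest = hashlib.sha256(config.encode()).hexdigest()
        hashes_by_function[function].append(digest)
    # For each function, count the devices that share the most common config.
    matching = sum(
        Counter(hashes).most_common(1)[0][1]
        for hashes in hashes_by_function.values()
    )
    return matching / len(devices)


def reduction(baseline, current):
    """Relative reduction of a metric versus its baseline (0.2 means 20% lower)."""
    return (baseline - current) / baseline


# Hypothetical audit result: (device function, normalized configuration).
audit = [
    ("access", "cfg-a"),
    ("access", "cfg-a"),
    ("access", "cfg-a"),
    ("access", "cfg-b"),
    ("core", "cfg-c"),
]
print(f"consistent devices: {consistency_ratio(audit):.0%}")
print(f"downtime reduction: {reduction(baseline=50, current=40):.0%}")
```

Running the same measurement before and after adoption gives you the numbers to show stakeholders, or the early warning that the plan needs adjusting.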
Having specific success criteria allows you to show progress and improvement; however, it can also show that you are not actually solving your initial use case. This can be equally beneficial because it enables you to adjust your initial plan. In other words, failing quickly is less expensive.
To summarize, start with understanding where you are right now in terms of challenges, skills, and technology. Make identifying challenges the number-one priority because you want to ensure you are solving something relevant for your organization. Next, prioritize your challenges based on the skills you already have while minimizing the number of NetDevOps components involved, favoring the ones you already have. Before you implement your NetDevOps strategy, make sure you have clearly defined success criteria and plan to show milestone successes early.
Skills
Skills are an influential factor in NetDevOps. In Chapter 1, you learned that many components of NetDevOps are not traditional networking components, and although some folks take them for granted, automation, programming, and orchestration are not an evolution of networking; rather, they are a horizontal domain of knowledge.
Most organizations have network engineers equipped with the traditional skillset of “hardcore” networking. This includes routing protocols, switching configurations, networking security such as access control lists (ACLs) or Control Plane Policing (CoPP), and all the rest. But in the same way that most software developers are not familiar with the Border Gateway Protocol (BGP), most traditional network engineers are not familiar with Jenkins or Groovy.
The profile of a network engineer is evolving, and nowadays we’re starting to see more and more a mix of networking, infrastructure as code (IaC), and orchestration knowledge. However, that might not be the case at your organization. If it is not, there are two schools of thought: upskilling/training and hiring.
You can choose to train your engineers in the skills you have identified to be missing from your “where to start” analysis, or you can hire folks who already have those skills. One option is not better than the other; each organization must make its own decision.
If you choose to train your engineers, you must take into consideration that, as previously mentioned, some of these NetDevOps skills are not a natural evolution of networking and can require considerable effort to learn. For example, software-defined networks (SDN) can be seen as an evolution of networking, and in some way they are the next networking topology. However, writing an automation playbook in a programming language is not an evolution of writing network configurations. Although the line is becoming blurry, and some terms like “network engineer v2” and “next-generation network engineer” have started to emerge, historically speaking, networking and automation have been two different domains.
Not all skills are generic skills such as automation or networking. A skill family that often is overlooked is tool skills. An engineer proficient at automation will not be an expert in all the automation tools; for example, the engineer might have worked extensively with Ansible but never with Terraform. This is particularly important if your chosen strategy is to hire, because most of the time you do not want to upskill a new hire on a new tool if you can hire someone with tool knowledge. In terms of training, this also factors in. Training someone in a tool is easier if that person already has knowledge of the tool domain. In other words, training someone in Golang is easier if they already know how to program in Python.
Another consideration is how you want to distribute your skills in each role; the upcoming section “Organizational Changes” covers how you can distribute your skills: all in one NetDevOps team or separate automation and networking teams. The number and distribution of engineers’ skillsets differ based on each organization’s needs.
Lastly, because you are reading this book, you probably want to become a NetDevOps engineer or transform your organization into one that uses NetDevOps engineering practices; however, not every network engineer will become a NetDevOps engineer. Expert-level networking skills are still required, and many folks may not have to take part in orchestration and automation tasks. This is a common misconception.
Tooling
Tools are an important part of NetDevOps. As you have learned, tools are enablers and not actual DevOps practices; however, some folks still commonly label tools as DevOps. Nonetheless, tools will still represent a big part of your investment, not only because of their price but also because of the tool-specific skills and knowledge they require. After you and your organization acquire these skills, changing tools results in added effort and cost.
Within the NetDevOps umbrella, you can separate the tools into the following different categories:
Infrastructure as code
Continuous integration/continuous delivery (or deployment)
Source/version control
Testing
Monitoring
The following list provides examples of tools in each category. Note that this is not an exhaustive list; each category has a plethora of tools to offer.
IaC: Ansible, Terraform, Pulumi
CI/CD: Jenkins, GitHub Actions, AWS CodePipeline
Testing: EVE-NG, GNS3, Cisco Modeling Labs (CML)
Source control: GitHub, GitLab, Bitbucket
Monitoring: Datadog, ELK stack, Splunk
Note that the IaC, CI/CD, source control, and testing tools were covered in Chapter 1. Monitoring is a well-known tool vertical in networking that has evolved over time, from the older SNMP pull-based monitoring to the newer push-based telemetry models. Common tools to achieve this functionality include proprietary solutions such as Cisco DNA Center, SolarWinds Network Performance Monitor, and Splunk, as well as open source solutions you can tailor to your liking, such as the ELK (Elasticsearch, Logstash, Kibana) stack. Monitoring in NetDevOps also encompasses the monitoring of CI/CD pipelines and automation tasks. This is an extended scope compared to only monitoring the network.
Here is a set of characteristics you can use to select the best fit for your organization in any of these tool categories:
Cloud or on-premises
Managed or unmanaged
Open source, proprietary, or in-house
Integration ecosystem
The first characteristic is where the solution will be hosted. All the tools have to exist somewhere, and some have a bigger footprint than others (for example, a CI/CD server with many agents versus an IaC tool that only needs a single server). For the location, you have two choices: the cloud or on-premises. Some folks will argue you have three options, with the third being a co-location facility (for example, a service provider data center). However, this option is encompassed in the on-premises category in this two-location system. The cloud refers to on-demand resources accessed over the Internet. It is a huge trend right now and typically benefits from a pay-as-you-go model. Of the many cloud vendors, the most well known are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The benefit from using the cloud is that you do not have to manage or secure the physical infrastructure. You can just access your resources as you need them. In contrast, the on-premises option will give you more flexibility when it comes to controlling the storage of your data, and you do not have to rely on Internet connectivity.
The second characteristic is manageability. In the cloud versus on-premises characteristic, you saw that with the cloud, physical management is the responsibility of the provider. Some tools offload more than that; they offload all the management to the provider, other than the actual configuration. For example, you can have access to a working Jenkins instance, and all you need is to create your workflows. You do not need to install Jenkins, configure networking for the instance, or anything like that. For on-premises setups, this is less common, although some service providers manage the installation of tools for you. In general, you should only consider this a feature for cloud-hosted tools. Examples of managed tools are Amazon Managed Grafana (versus hosting your own Grafana in a virtual machine on the cloud), Amazon OpenSearch Service (versus hosting your own Elasticsearch cluster in virtual machines), Terraform Cloud (versus hosting your own Terraform environment in containers or virtual machines), and CloudBees-hosted Jenkins (versus hosting your own Jenkins in AWS).
The third characteristic is of special importance because it greatly contributes to the price. All tool categories will have open source solutions; however, these solutions will not have support other than community support. For enterprise environments, this might be a problem. Nonetheless, many organizations offer enterprise-grade support for open source tools. Therefore, don’t rule out open source tools just because they are open source. Ansible, for example, is a free and open source configuration tool, but Red Hat has enterprise support plans for it. An advantage of open source tools is the wealth of knowledge you will be able to find online versus the more exclusive proprietary tools. In contrast to open source tools, proprietary tools are solutions owned by the individual or company who published them. They are “closed” in the sense that changes must be made by the party that published the tool. However, this type of tool usually has several support offerings. This is the most widely adopted tool type in medium- and large-sized companies.
The other option is to build your own tools. Although this option is uncommon, it is not unheard of; some organizations decide to build their own in-house tools using programming languages such as Python and Java. This option requires highly specialized staff, and after the development phase, you will also need to provide support. The advantage is that the tool will have only the functionality you need and not numerous features you never use, as commonly happens with commercial off-the-shelf (COTS) tools. On top of that, if the tool needs modifications (for example, you find a buggy behavior), you are the vendor, so you can immediately apply a fix. If you are starting out in automation and orchestration, however, this option is not recommended for you.
The final characteristic is the tool ecosystem. This characteristic is sometimes undervalued, but having a tool integrate natively with the rest of your tooling can be a very big advantage (versus having to script a lot of functionality). For example, most source control tool vendors now offer embedded CI/CD tools. This greatly simplifies integrations, and you can run your workflows directly from the source control repository. Examples of this are GitLab CI/CD and GitHub Actions. You will also see very tight and native integrations when you are using all the tooling from the same cloud vendor, such as the Azure DevOps ecosystem and the Amazon Code* services.
Ensure you are prioritizing skills over tools. A characteristic that did not make it on the list is the available skills in the market. Your prioritization should start with the tool skills you have available; however, you should consider how widely adopted within the industry a tool is because this will greatly improve your ability to hire folks who are proficient with the tool or to find training materials for your own folks.
Now that you know what characteristics to look for in a tool, you need to adapt this to your organization’s needs and liking. Be aware of the tradeoffs; for example, you might continue to use Ansible for provisioning, even though Terraform might be a better solution for your use case, because you already have Ansible skills and existing support from Red Hat. Again, there is no single “works every time” choice, but aligning your tool choice to your organization’s strategy helps. For example, if your organization is implementing a cloud-first strategy, cloud-hosted tools are preferred. Likewise, if your organization is not an IT-focused organization and instead focuses on a different core business, offloading the management of the tools (sometimes called “undifferentiated heavy lifting”) to focus your resources on differentiation activities is likely the right choice.
Finally, because so many tools are available right now, you need to be careful of tool sprawl, meaning having too many tools and even unused tools. It is okay to use specialized tools that are very good at a specific task or action, but it is important not to let the tools take over your organization. Likewise, retire old tools if they are no longer being used. As you have learned, NetDevOps is a set of practices, not tools, but using the right tools for the job is one of those practices.
Organizational Changes
Network operators and network architects (that is, folks who design and configure networks) are typically already on the same team or work closely aligned. In the development world, however, developers and operations were often on completely separate teams. This is good news because adopting NetDevOps has less of an impact on an organization’s structure than DevOps did back in the day in software development.
However, being on the same team does not mean you do not need any organizational changes at all. For some more traditional teams, adopting these practices might require hiring new folks, as described previously in this chapter, which is an organizational change already.
You can tackle NetDevOps in one of two ways: join automation and networking together into one team or have automation and networking in separate teams but collaborating.
If you choose to separate automation and networking into different teams, which is not recommended, your organization needs to find a way of bridging these two areas. For example, your networking folks create configuration templates per device type and platform, while your automation team creates the automation scripts and orchestration pipelines to deliver the configurations. If you choose this approach, remember that each of these teams has little insight into the other’s domain expertise and challenges, so communication and collaboration are key. Working in isolation will greatly impair their ability to deliver fast and working solutions.
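The hand-off between the two teams in this example can be pictured with a small sketch. It uses Python's standard `string.Template` only to stay self-contained; a real setup would more likely use a dedicated templating engine, and the template contents, variable names, and `render` function here are all hypothetical.

```python
from string import Template

# Owned by the networking team: one template per device type and platform.
ACCESS_SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "vlan $vlan_id\n"
    " name $vlan_name\n"
)


# Owned by the automation team: render the template, then hand the result
# to whatever delivery mechanism the pipeline uses (not shown here).
def render(template: Template, variables: dict) -> str:
    # substitute() raises KeyError when a variable is missing, so an
    # incomplete hand-off between the teams fails loudly instead of
    # producing a partial configuration.
    return template.substitute(variables)


config = render(
    ACCESS_SWITCH_TEMPLATE,
    {"hostname": "access-01", "vlan_id": 10, "vlan_name": "users"},
)
print(config)
```

The template and its list of variables act as the contract between the teams: the networking team decides what a correct configuration looks like, and the automation team decides how it gets delivered.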
If you choose to relabel your networking team as a NetDevOps team, your folks will be doing end-to-end tasks. This is the recommended approach. With this method, everyone has an understanding of the use cases and challenges, which is beneficial in the successful adoption of these practices. This is the most ambitious approach, and you may face more resistance from folks who prefer the traditional approaches to networking.
Although this might seem like a simple rebranding exercise, it is not, and it is important to have support from the organization and key management stakeholders. This support is paramount to address resistance from less-inclined engineers as well as to address and justify a potential initial loss of productivity or higher costs due to ramp-up.
Another important organizational change is to adopt open communication and encourage failing fast. When new processes, tools, skills, and technologies are being adopted, mistakes will happen and questions will arise. It is important to foster a culture of collaboration and open communication, where engineers are encouraged to present their questions and doubts while experimenting and, in some cases, failing. This is a NetDevOps principle. Although this might not seem like an organizational change, many organizations do not openly embrace open communication and failing, and they are surprised by this aspect when adopting NetDevOps.
The junction of the networking and automation domains, which likely have been working separately until now, should be reflected in your organizational structure. However, independent of your decisions regarding skills, tooling, and where to start, you should understand that it can take time to successfully reshape your organization into this new paradigm.