Groups and Policies
When an organization first starts using Workload Optimizer, the number of pending actions can be significant, especially in a large, active, or poorly optimized environment. New organizations generally take a conservative approach at first and execute actions manually, verifying as they go that the actions are improving the environment and moving it closer to the desired state. Ultimately, though, the power of Workload Optimizer is best realized through a judicious implementation of groups and policies to simplify the operating environment and to automate actions where possible.
Groups
Workload Optimizer provides the capability of creating logical groups of resources (VMs, hosts, datastores, disk arrays, and so on) for ease of management, visibility, and automation. Groups can be either static (a fixed list of named resources) or dynamic. Dynamic groups self-update their membership based on specific filter criteria—a query, effectively—to significantly simplify management. For example, you could create a dynamic group of VMs that belong to a specific application’s test environment, and you could further restrict membership of that group to just those running Microsoft Windows (see Figure 9-7, where .* is used as a catchall wildcard in the filter criteria).
Figure 9-7 Creating dynamic groups based on filter criteria
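Conceptually, a dynamic group is a stored query that is re-evaluated against the inventory rather than a fixed member list. The following sketch models that idea in Python; the VM fields and group names are illustrative assumptions, not Workload Optimizer's actual schema.

```python
import re

# Hypothetical model of a dynamic group: membership is a filter query,
# not a static list. Field names ("name", "os") are assumptions.
def dynamic_group(vms, name_pattern, os_pattern=".*"):
    """Return VMs whose name and OS match the filter criteria.
    '.*' acts as a catchall wildcard, as in Figure 9-7."""
    return [vm for vm in vms
            if re.fullmatch(name_pattern, vm["name"])
            and re.fullmatch(os_pattern, vm["os"])]

vms = [
    {"name": "payroll-test-01", "os": "Microsoft Windows Server 2019"},
    {"name": "payroll-test-02", "os": "Ubuntu 22.04"},
    {"name": "payroll-prod-01", "os": "Microsoft Windows Server 2019"},
]

# Test-environment VMs for a hypothetical payroll app, Windows only:
members = dynamic_group(vms, r"payroll-test-.*", r"Microsoft Windows.*")
```

Because membership is recomputed each time the query runs, a newly provisioned VM that matches the filter joins the group without any user input—the self-updating behavior described above.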
Generally, dynamic groups are preferred due to their self-updating nature. As new resources are provisioned or decommissioned, or as their status changes, their dynamic group membership adjusts accordingly, without any user input. This benefit is difficult to overstate, especially in larger environments. You should use dynamic groups whenever possible and static groups only when necessary.
Groups can be used in numerous ways. From the search screen, you can select a given group to automatically scope the supply chain view to just that group. This is a handy way to zoom in on a specific subset of the infrastructure in a visibility or troubleshooting scenario or to customize a given widget in a dashboard. Groups can also be used to easily narrow the scope of a given plan or placement scenario (as described in the next section).
Policies
One of the most critical benefits of the use of groups arises when they are combined with policies. In Workload Optimizer, all actions are governed by one or more policies, including default global policies, user-defined custom policies, and imported placement policies. Policies provide extremely fine-grained control over the actions and automation behavior of Workload Optimizer.
Policies fall under two main categories: placement and automation. In both cases, groups (static or dynamic) are used to limit the scope of the policy.
Placement Policies
Placement policies govern which consumers (VMs, containers, storage volumes, datastores) can reside on which providers (VMs, physical hosts, volumes, disk arrays).
Affinity/Anti-affinity Policies
The most common use for placement policies is to create affinity or anti-affinity rules to meet business needs. For example, say that you have two dynamic groups, both owned by the testing and development team: one of VMs and another of physical hosts. To ensure that the testing and development VMs always run on testing and development hosts, you can create a placement policy that enables this constraint in Workload Optimizer, as shown in Figure 9-8.
Figure 9-8 Creating a new placement policy
The constraint you put into the policy then restricts the underlying economic decision engine that generates actions. The buying decisions that the VMs within the testing and development group make when shopping for resources are restricted to just the testing and development hosts, even if there might be other hosts that could otherwise serve those VMs. You might similarly constrain certain workloads with a specific license requirement to only run on (or never run on) a given host group that is (or isn’t) licensed for that purpose.
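The narrowing effect of an affinity policy on the decision engine's "shopping list" can be sketched as a filter over candidate providers. The group names and the policy structure below are illustrative assumptions, not the product's internal representation.

```python
# Sketch of how an affinity placement policy restricts the hosts a VM
# may buy resources from. Group names are hypothetical.
def eligible_hosts(vm, hosts, policies):
    """Return the providers a VM may shop from, honoring placement policies."""
    candidates = list(hosts)
    for policy in policies:
        if vm["group"] == policy["consumer_group"]:
            # Affinity: only hosts in the policy's provider group qualify,
            # even if other hosts could otherwise serve the VM.
            candidates = [h for h in candidates
                          if h["group"] == policy["provider_group"]]
    return candidates

hosts = [
    {"name": "esx-testdev-1", "group": "testdev-hosts"},
    {"name": "esx-prod-1", "group": "prod-hosts"},
]
policy = {"consumer_group": "testdev-vms", "provider_group": "testdev-hosts"}

vm = {"name": "build-agent-7", "group": "testdev-vms"}
allowed = eligible_hosts(vm, hosts, [policy])
```

An anti-affinity rule would invert the comparison, excluding the named provider group instead of requiring it—the licensing example above is exactly that pattern.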
Merge Policies
Another placement policy type that can be especially useful is a merge policy. Such a policy logically combines two or more groups of resources such that the economic engine treats them as a single, fungible asset when making decisions.
The most common example of a merge policy is one that combines one or more VM clusters such that VMs can be moved between clusters. Traditionally, VM clusters are siloed islands of compute resources that can’t be shared. Sometimes this is done for specific business reasons, such as separating accounting for different data center tenants; in other words, sometimes the silos are built intentionally. But many times, they are unintentional: The fragmentation and subsequent underutilization of resources is merely a by-product of the artificial boundary imposed by the infrastructure and hypervisor. In such a scenario, you can create a merge policy that logically joins multiple clusters’ compute and storage resources, enabling Workload Optimizer to consider the best location for any given workload without being constrained by cluster boundaries. This ultimately leads to optimal utilization of all resources in a continued push toward the desired state.
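The effect of merging clusters can be sketched as widening a VM's candidate pool from its own cluster to the union of all merged clusters. The cluster names, host records, and free-memory figures below are hypothetical.

```python
# Sketch: a merge policy makes the decision engine treat several clusters
# as one fungible pool of capacity. All names and numbers are illustrative.
clusters = {
    "cluster-a": [{"host": "a1", "free_mem_gb": 8},
                  {"host": "a2", "free_mem_gb": 4}],
    "cluster-b": [{"host": "b1", "free_mem_gb": 64}],
}

def placement_candidates(vm_cluster, merge_groups):
    """Without a merge policy, a VM shops only within its own cluster;
    a merge policy widens the pool to every merged cluster."""
    pool = {vm_cluster}
    for group in merge_groups:
        if vm_cluster in group:
            pool |= set(group)
    return [h for c in sorted(pool) for h in clusters[c]]

# A memory-hungry VM in cluster-a, with clusters a and b merged:
merged = placement_candidates("cluster-a", [{"cluster-a", "cluster-b"}])
best = max(merged, key=lambda h: h["free_mem_gb"])
```

Without the merge policy, the VM would be confined to cluster-a's nearly full hosts; with it, the engine can place the workload on cluster-b's far less utilized capacity.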
Automation Policies
The second category of policy in Workload Optimizer is automation policies, which govern how and when Workload Optimizer generates and executes actions. Like a placement policy, an automation policy is restricted to a specific scope of resources based on groups; unlike placement policies, however, automation policies can be restricted to run at specific times with schedules. Global default policies govern any resources that aren’t otherwise governed by another policy. It is therefore important to use extra caution when modifying a global default policy, as any changes can be far-reaching.
Automation policies provide great control—either broad or extremely fine-grained—over the behavior of the decision engine and how actions are executed. For example, it’s common for organizations to enable nondisruptive VM resize-up actions for CPU and memory (for hypervisors that support such actions), but some organizations wish to further restrict these actions to specific groups of VMs (for example, testing and development only and not production), or to occur during certain pre-approved change windows, or to control the growth increment. Most critically, automation policies enable supported actions to be executed automatically by Intersight, eliminating the need for human intervention.
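The relationship between scoped policies and the global default can be sketched as a precedence rule: a resource is governed by a matching group-scoped policy if one exists, and falls back to the global default otherwise. The policy schema and action-mode names below are illustrative assumptions.

```python
# Sketch of policy resolution: scoped automation policies take precedence
# over the global default. Schema and mode names are assumptions.
GLOBAL_DEFAULT = {"scope": None, "resize_up": "manual"}

def effective_mode(resource, policies, action="resize_up"):
    """Return the action mode governing a resource: the first scoped
    policy that covers its group wins; otherwise the global default."""
    for p in policies:
        if p["scope"] is not None and resource["group"] in p["scope"]:
            return p[action]
    return GLOBAL_DEFAULT[action]

# One scoped policy automates resize-up for test/dev VMs only:
policies = [{"scope": {"testdev-vms"}, "resize_up": "automatic"}]

testdev_mode = effective_mode({"group": "testdev-vms"}, policies)
prod_mode = effective_mode({"group": "prod-vms"}, policies)
```

This fallback behavior is why editing a global default is so far-reaching: every resource not covered by a scoped policy inherits the change at once.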
Best Practices
When implementing placement and automation policies, a crawl–walk–run approach is advisable.
Crawl
Crawling involves creating the necessary groups for a given policy, creating a policy scoped to those groups, and setting the policy’s action to manual so that actions are generated but not automatically executed.
This method lets administrators double-check the group membership and manually validate that actions are being generated as expected, and only for the desired groups of resources. Any needed adjustments can be made before manually executing the actions and validating that they do indeed move the environment closer to the desired state.
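The crawl stage can be sketched as an execution gate: with the policy's action mode set to manual, generated actions accumulate in a pending queue instead of running. The action strings and mode values below are illustrative.

```python
# Sketch of the "crawl" stage: actions are generated either way, but a
# manual policy mode queues them for review rather than executing them.
def process_actions(actions, mode):
    """Split generated actions into executed and pending based on mode."""
    executed, pending = [], []
    for action in actions:
        (executed if mode == "automatic" else pending).append(action)
    return executed, pending

actions = ["Move VM web-01 to host esx-02",
           "Resize up vMem on VM db-01"]
executed, pending = process_actions(actions, mode="manual")
# In crawl mode nothing runs on its own: an administrator inspects
# `pending`, verifies the scoping, and executes each action by hand.
```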
Walk
Walking involves changing an automation policy’s execution mechanism to automatic for relatively low-risk actions. The most common of these actions are VM and datastore placements, nondisruptive upsizing of datastores and volumes, and nondisruptive VM resize-up actions for CPU and memory.
Modern hypervisors and storage arrays can handle these actions with little to no impact on the running workloads, and automating them generally provides the greatest bang for the buck for most environments. More conservative organizations may want to begin automating a lower-priority subset of their resources (such as testing and development systems), as defined by groups. Combining these “walk” actions with a merge policy to join multiple clusters provides even more opportunity for optimization in a reasonably safe manner.
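The walk stage amounts to a dispatch rule: automate only action types the organization considers low risk, and only within approved groups. The set of low-risk action types below reflects the examples above; the type names and group names are assumptions.

```python
# Sketch of the "walk" stage: automatic execution is limited to low-risk
# action types within approved (e.g. test/dev) groups. Names are assumed.
LOW_RISK = {"vm_placement", "datastore_placement",
            "storage_resize_up", "vm_resize_up"}

def dispatch(action, automated_groups):
    """Automate low-risk actions in approved groups; queue the rest."""
    if action["type"] in LOW_RISK and action["group"] in automated_groups:
        return "execute"
    return "queue_for_review"

automated = {"testdev-vms"}
safe = dispatch({"type": "vm_resize_up", "group": "testdev-vms"}, automated)
risky = dispatch({"type": "vm_resize_up", "group": "prod-vms"}, automated)
```

Broadening automation then becomes a matter of growing `automated_groups` (and, later, `LOW_RISK`) as confidence builds, rather than rewriting the rule.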
Run
Finally, running typically involves more complex policy interactions, such as schedule implementations, before- and after-action orchestration steps, and rollout of automation across the organization, including production environments.
During the run stage, it is critical to have well-defined groups that restrict unwanted actions. Many off-the-shelf applications such as SAP have extremely specific resource requirements that must be met to receive full vendor support. In such cases, organizations typically create an application-specific group and add a policy for that group that disables all action generation for it, effectively telling Workload Optimizer to ignore the application. This can also be done for custom applications for which the development teams have similarly stringent resource requirements.
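This guardrail can be sketched as an exclusion applied before action generation: resources in a disabled group produce no actions at all. The group and resource names below are hypothetical.

```python
# Sketch of the "run" stage guardrail: a group-scoped policy that
# disables all action generation for a vendor-certified application,
# effectively telling the engine to ignore it. Names are illustrative.
DISABLED_GROUPS = {"sap-prod"}   # e.g. SAP systems with vendor-fixed sizing

def generate_actions(resources):
    """Generate optimization actions, skipping disabled groups entirely."""
    return [f"optimize {r['name']}" for r in resources
            if r["group"] not in DISABLED_GROUPS]

resources = [{"name": "sap-app-01", "group": "sap-prod"},
             {"name": "web-01", "group": "general"}]
actions = generate_actions(resources)
```

Because the exclusion is group-based, a newly deployed SAP instance that lands in the `sap-prod` group (via a dynamic group filter) is ignored automatically, with no per-VM configuration.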