The HyperFlex HX Data Platform disaster recovery feature allows you to protect virtual machines (VMs) by setting up replication so that protected virtual machines running on one cluster replicate to the other cluster in a pair of network-connected clusters and vice versa. The two paired clusters typically are located at a distance from each other, with each cluster serving as the disaster recovery site for virtual machines running on the other cluster. Once protection has been set up on a VM, the HX Data Platform periodically takes a replication snapshot of the running VM on the local cluster and replicates (copies) the snapshot to the paired remote cluster. In the event of a disaster at the local cluster, you can use the most recently replicated snapshot of each protected VM to recover and run the VM at the remote cluster. Each cluster that serves as a disaster recovery site for another cluster must be sized with adequate spare resources so that, in the event of a disaster, it can run the newly recovered virtual machines in addition to its normal workload.
This chapter describes the HyperFlex HX Data Platform disaster recovery feature and the configuration steps needed to enable replication between two HyperFlex clusters. It also covers the available backup solutions that can be integrated with HyperFlex HX Data Platform.
Data Protection
There are several schools of thought on data protection. Some people believe that high availability and durability are part of data protection. Some say that stretch clusters are also part of data protection. However, two very basic parameters or variables allow you to determine the data protection solution you should use: recovery time objective (RTO) and recovery point objective (RPO).
RTO refers to how much time it takes for a service or a virtual machine to come back up after a disaster or failure has occurred. RPO indicates how much data loss is acceptable once services are restored; it is usually expressed as a window of time before the failure, because any data written after the last recoverable copy was taken is lost.
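As a worked illustration of these two definitions, the following minimal Python sketch computes the RPO and RTO actually experienced in a single incident. The function names and the timeline values are hypothetical and are not part of any HyperFlex tool or API; the sketch only assumes that the most recent recoverable copy, the failure, and the service restoration each have a known timestamp.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recoverable_copy: datetime, failure: datetime) -> timedelta:
    """Data written between the last recoverable copy and the failure is lost."""
    return failure - last_recoverable_copy


def achieved_rto(failure: datetime, service_restored: datetime) -> timedelta:
    """Time the service was unavailable after the failure."""
    return service_restored - failure


# Hypothetical timeline for one protected VM (illustrative values only).
last_copy = datetime(2024, 1, 1, 9, 45)   # last recoverable copy taken
failure = datetime(2024, 1, 1, 10, 0)     # failure occurs
restored = datetime(2024, 1, 1, 10, 20)   # VM is running again

rpo = achieved_rpo(last_copy, failure)
rto = achieved_rto(failure, restored)
print(f"RPO experienced: {rpo}, RTO experienced: {rto}")
```

In this example, 15 minutes of data are lost and the service is unavailable for 20 minutes. A data protection solution meets its objectives only if these measured values stay within the RPO and RTO targets set for the workload.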
HyperFlex offers the following protection options, listed here from low RTO/RPO level to high RTO/RPO level:
Local resiliency (high availability and durability):
Two (RF-2) or three (RF-3) copies of VM data
Data striped and distributed across all local nodes
Redundant network paths
An HA-aware hypervisor
Zero RPO and zero or very low RTO
Site-level resiliency (stretch clusters):
Four copies (RF 2+2) of VM data
Protection against local failures and site failures
Protection against “split-brain” conditions
VM data mirrored across sites
An HA-aware hypervisor
Zero RPO and zero or very low RTO
Snapshots (VM-centric snapshots):
VM-centric, instant, and space optimized
Redirect-on-write snapshots
Scheduled with a retention policy
Quiesced and crash consistent
Rapid provisioning using ReadyClones
“Now”/hourly/daily/weekly RPO and RTO in minutes
Replication and disaster recovery (VM-centric replication and disaster recovery):
VM-centric replication
Periodic asynchronous replication to remote site (WAN distance)
Snapshot based
Failover, fast failback, and test recovery
Minutes/hourly/daily RPO and RTO in minutes
Backup and archive (third-party backup vendor integration):
Fully verified Cisco Validated Design (CVD) on UCS infrastructure
Integrated with HyperFlex native snapshots
Accelerated transfers and a short backup window
Hourly/daily RPO and RTO in minutes/hours
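The options above can be captured as a small lookup table, which can be handy when shortlisting them against a workload's requirements. The following Python sketch is illustrative only: the `ProtectionOption` class and the `candidates` function are hypothetical names, the RPO and RTO strings are copied from the list above rather than from any measured values, and the `zero_rpo` flag simply records which options the list describes as having zero RPO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionOption:
    name: str
    rpo: str        # wording taken from the list above
    rto: str        # wording taken from the list above
    zero_rpo: bool  # True where the list states "Zero RPO"


# Ordered from low RTO/RPO level to high RTO/RPO level, as in the list above.
OPTIONS = [
    ProtectionOption("Local resiliency", "zero", "zero or very low", True),
    ProtectionOption("Site-level resiliency (stretch clusters)",
                     "zero", "zero or very low", True),
    ProtectionOption("Snapshots", "now/hourly/daily/weekly", "minutes", False),
    ProtectionOption("Replication and disaster recovery",
                     "minutes/hourly/daily", "minutes", False),
    ProtectionOption("Backup and archive", "hourly/daily", "minutes/hours", False),
]


def candidates(require_zero_rpo: bool) -> list[str]:
    """Return option names, keeping only zero-RPO options when required."""
    return [o.name for o in OPTIONS if o.zero_rpo or not require_zero_rpo]


for option in OPTIONS:
    print(f"{option.name}: RPO {option.rpo}, RTO {option.rto}")
```

Calling `candidates(True)` narrows the list to local resiliency and stretch clusters. The filter only shortlists options; it does not rank them, because the options protect against different kinds of failure and are typically combined rather than used as alternatives.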