SRM 6 and vSphere 6 Storage DRS (SDRS) Improvements for Array Based Replication

With the announcement of vSphere 6, one of the touted features was improved integration between Site Recovery Manager 6 and vSphere 6’s Storage DRS feature when using array based replication. Since SRM 5.5 and vSphere 5.5, the two capabilities have been supported together with some caveats. With the newly announced release, the integration between the two will be much simpler. This post will focus on array based replication but, as noted in Cormac’s post, the SDRS and vSphere Replication integration has also improved.

Okay, Tell Me More About Those Array Based Replication Caveats

Since SRM 5.5 you have been able to use Storage vMotion to move protected VMs between datastores belonging to the same consistency group. However, there was nothing to warn users when they were migrating a VM outside of the consistency group (or even onto non-replicated datastores), or to prevent Storage DRS from doing the same.

To work around this, there are two key things customers have been doing:

So What Changes in vSphere 6 and SRM 6?

The SRM and Storage DRS teams have worked together to make SDRS aware of which datastores are being replicated and of their consistency group membership. When SRM discovers replicated devices, it associates information with the appropriate datastores in vSphere (e.g. that the datastore is replicated, and its consistency group). SDRS then uses this information when deciding which automatic moves it can make: it will not perform any automatic migration that would impair the recoverability of a VM with SRM.

With this change you don’t have to be as fine-grained with your storage clusters, or alternatively as coarse in your failover granularity. Now you can mix non-replicated and replicated datastores in the same cluster, and even replicated datastores belonging to different consistency groups.
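The decision rule described above can be sketched as a small model. This is only an illustration of the idea, not how SDRS is implemented: the `Datastore` class, the function names, and the datastore and consistency group names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Datastore:
    name: str
    replicated: bool
    # None for non-replicated datastores (hypothetical modelling choice)
    consistency_group: Optional[str] = None


def move_preserves_recoverability(source: Datastore, target: Datastore) -> bool:
    """Return True if an automatic move would not impair recoverability."""
    if not source.replicated:
        # The VM is not on replicated storage, so there is nothing to impair.
        return True
    # A VM on replicated storage must stay inside its consistency group.
    return target.replicated and target.consistency_group == source.consistency_group


def allowed_targets(source: Datastore, cluster: List[Datastore]) -> List[str]:
    """Names of datastores in the cluster that are safe automatic move targets."""
    return [
        ds.name
        for ds in cluster
        if ds != source and move_preserves_recoverability(source, ds)
    ]


# A mixed datastore cluster: two consistency groups plus a non-replicated datastore.
cluster = [
    Datastore("ds1", True, "cg-a"),
    Datastore("ds2", True, "cg-a"),
    Datastore("ds3", True, "cg-b"),
    Datastore("ds4", False),
]
print(allowed_targets(cluster[0], cluster))  # ['ds2']
```

In this model a VM on ds1 can only be moved automatically to ds2, while a VM on the non-replicated ds4 can go anywhere in the cluster.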

This reduces the operational burden of working with SRM and SDRS and provides much more flexibility in how you choose to organize your datastores and datastore clusters.


VMware SRM Topologies

One of the topics that people often ask me about is which datacenter topologies VMware SRM supports. The good news is that SRM is flexible enough to support a number of topologies, covering more complex use cases beyond the typical two-site deployment.

Shared Recovery Site (VC at Remote Sites)

The shared recovery site use case is one of the more commonly seen topologies outside of a two datacenter deployment. This is a use case that has been supported by SRM for multiple releases and provides a good option for customers that are looking to protect virtual machines at remote offices that have their own vCenter instances.

With this use case the customer is able to share the resources used for recovery amongst several remote offices. This provides flexibility in providing the resources required at the recovery site. A conservative approach would provide sufficient capacity to failover all remote sites to the shared recovery site. An alternative approach would be to oversubscribe the recovery site resources on the assumption that you wouldn’t need to recover all the remote sites at once. This second approach obviously has some benefits in terms of the potential for reduced capital expense at the recovery site but may add risk if there is a chance that all the remote sites may actually need to fail over at the same time.
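The trade-off between the two sizing approaches comes down to simple arithmetic. Here is a minimal sketch, assuming each remote site's demand can be expressed as a single number in some consistent unit; the function name, site names, and figures are hypothetical.

```python
from typing import Dict, Optional


def recovery_capacity(
    site_demands: Dict[str, float],
    max_concurrent_failovers: Optional[int] = None,
) -> float:
    """Capacity to provision at the shared recovery site.

    site_demands maps each remote site to its resource demand.
    max_concurrent_failovers is how many sites are assumed to fail over
    at once; None means all of them (the conservative approach).
    """
    demands = sorted(site_demands.values(), reverse=True)
    if max_concurrent_failovers is None:
        return sum(demands)
    if max_concurrent_failovers < 0:
        raise ValueError("max_concurrent_failovers must not be negative")
    # Size for the worst case: the largest sites failing over together.
    return sum(demands[:max_concurrent_failovers])


demands = {"office-1": 40, "office-2": 25, "office-3": 15}
print(recovery_capacity(demands))     # 80, enough to fail over every site
print(recovery_capacity(demands, 1))  # 40, oversubscribed: one site at a time
```

Sizing for the largest sites rather than the average keeps the oversubscribed figure honest: it still covers any combination of that many sites failing over together.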

One of the advantages of this topology, with remote vCenters at each site, is that local IT retains full manageability of the infrastructure. Even in the event of a network outage between the central data center and the remote site, the hosts at the remote site remain manageable. In contrast, one of the disadvantages of this approach is that both SRM servers in each pair must be at the same major and minor version, and each vCenter server must be a supported version for the SRM server joined to it. This means that upgrades of SRM or vCenter across the set of data centers should be carefully planned and coordinated.
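When planning upgrades it can help to check the version rule across all pairs up front. The sketch below only covers the SRM side (same major and minor version within a pair); vCenter compatibility depends on the published support matrix and is not modelled. The pairing names and version strings are made up for illustration.

```python
from typing import Dict, List, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a dotted version string such as '5.8.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def mismatched_pairs(pairs: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of pairs whose two SRM servers differ in major.minor."""
    return [
        name
        for name, (first, second) in pairs.items()
        if major_minor(first) != major_minor(second)
    ]


pairs = {
    "office-1": ("5.8.0", "5.8.1"),  # same major.minor, patch level differs
    "office-2": ("5.5.1", "5.8.0"),  # minor versions differ
}
print(mismatched_pairs(pairs))  # ['office-2']
```

The parser assumes every version string has at least a major and a minor component.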

Shared Recovery Site (Central VC)

While the above topology is a common approach to providing shared recovery site capabilities it is not the only approach to achieving this. If there isn’t a need for each remote site to have its own vCenter instance then a simple single-pair SRM deployment could be adapted to meet the use case.

For this deployment you would deploy two vCenter instances, one to manage the resources at the recovery site and one to manage the resources at all of the remote sites. An instance of SRM would be deployed for each vCenter and the two paired together. You could then create recovery plans for each remote site while using the single shared vCenter to consolidate management of the remote sites.

This approach has the advantage that upgrades of the vCenter and SRM instances would be somewhat simpler than with a vCenter at each remote site (fewer moving parts). Additionally, you would be able to have different levels of recovery plan in the same SRM deployment. For example, you could define a recovery plan for each remote office but also recovery plans for each region, grouping the remote offices together. The disadvantage of this approach is that the remote infrastructure can’t be managed by vCenter when a remote site loses connectivity to it.

Additional Multi-Pair Topologies

In addition to the shared recovery site topologies described above, I am often asked about support for other multi-datacenter topologies. Prior to the release of SRM 5.8, VMware supported some additional topologies using the Request for Product Qualification (RPQ) process.

While the shared recovery topology is still the main use case for multi-pair topologies, the technology provided by SRM underneath actually supports more flexible configurations. The SRM documentation has recently been updated to remove some of the previous limitations and allow some more complex topologies to be supported based on the concept of multiple SRM pairs. From the SRM 5.8 Installation and Configuration Guide:

“In addition to the shared recovery site configuration, Site Recovery Manager also allows and supports shared protected site (1:N) and many-to-many (N:N) configurations.”

You can also begin with a standard two-site SRM deployment and later add site pairings to build up more complex topologies; this is supported as well.

With the new flexibility, a new set of potential use cases is now supported. For example, a three-site topology allowing a round-robin style of protection would be possible:

  • Site A’s workloads being protected to Site B
  • Site B’s workloads being protected to Site C
  • Site C’s workloads finally being protected back to Site A

Another topology that might be of interest would be to have a traditional SRM pairing between two core data centers while also providing DR protection for remote offices to one (or even both) of the core data centers.

The examples presented above are not intended to be exhaustive. As long as you don’t exceed the configuration maximums (use 10 or fewer SRM pairings) there are a wide variety of topologies that are enabled.
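To reason about a planned topology, one option is to list the protection directions and count the site pairings they imply. The sketch below makes two assumptions of its own: that a single pairing between two sites covers protection in both directions, and that the limit of 10 quoted above applies to the total number of pairings. Check the configuration maximums for your release before relying on either.

```python
from typing import FrozenSet, Iterable, Set, Tuple

MAX_PAIRINGS = 10  # limit quoted in the text above


def pairings_for(protections: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Derive the set of site pairings needed for the given protections.

    Each protection is a (protected_site, recovery_site) tuple. A pairing is
    modelled as an unordered pair, so A->B and B->A share one pairing.
    """
    pairs: Set[FrozenSet[str]] = set()
    for protected, recovery in protections:
        if protected == recovery:
            raise ValueError(f"{protected} cannot be protected to itself")
        pairs.add(frozenset((protected, recovery)))
    return pairs


def within_limit(protections: Iterable[Tuple[str, str]]) -> bool:
    """True if the implied number of pairings does not exceed MAX_PAIRINGS."""
    return len(pairings_for(protections)) <= MAX_PAIRINGS


# The three-site round-robin example from the list above.
round_robin = [("A", "B"), ("B", "C"), ("C", "A")]
print(len(pairings_for(round_robin)))  # 3
print(within_limit(round_robin))       # True
```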

SRM 5.8 Improvements In Managing Multiple SRM Pairs

With the introduction of web client support in SRM 5.8, there have been several improvements in managing more complex topologies with SRM. The first improvement is the increase in supported scale, up to 5,000 protected virtual machines from 1,000 in previous releases. This makes it easier to consider deploying SRM across multiple pairs of data centers.

One of the big user interface improvements is gained by integrating with the inventory lists of the web client. It is now easy to quickly switch between multiple site pairings.

The lists of protection groups and recovery plans are also consolidated, so you can see plans belonging to multiple site pairings at once.

In addition to the consolidated views the various wizards to create new protection groups and recovery plans (for example) let users choose the target SRM pairing during creation.

Finally, the SRM installer has also been enhanced to make it easier to deploy multiple pairs of SRM servers without having to specify command line parameters to the installer. For the initial SRM pairing you would use the default identifier. For subsequent pairings, both SRM instances in the pair should be installed using the same custom Plug-in Identifier, and each pairing should have a distinct identifier.
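The two identifier rules (same identifier within a pair, distinct identifiers across pairings) are easy to check before you start installing. A minimal sketch, with hypothetical pairing names and identifier strings; "default" here is just a stand-in for the installer's default identifier.

```python
from typing import Dict, List, Tuple


def check_plugin_identifiers(pairings: List[Tuple[str, str, str]]) -> List[str]:
    """Validate planned Plug-in Identifiers.

    Each entry is (pairing_name, id_at_first_site, id_at_second_site).
    Returns a list of human-readable problems; empty means the plan is fine.
    """
    problems: List[str] = []
    seen: Dict[str, str] = {}  # identifier -> first pairing that used it
    for name, first_id, second_id in pairings:
        if first_id != second_id:
            problems.append(f"{name}: instances use different identifiers")
            continue
        if first_id in seen:
            problems.append(f"{name}: identifier already used by {seen[first_id]}")
        else:
            seen[first_id] = name
    return problems


plan = [
    ("hq-dr", "default", "default"),
    ("office-1", "office1", "office1"),
    ("office-2", "office1", "office1"),   # reuses office-1's identifier
    ("office-3", "office3", "office-3"),  # the two instances disagree
]
for problem in check_plugin_identifiers(plan):
    print(problem)
```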

Replication Topology Support

The SRM 5.8 Installation and Configuration guide states:

Site Recovery Manager supports point-to-point replication. Site Recovery Manager does not support replication to multiple targets, even in a multi-site configuration.

While multiple SRM pairs do allow you to fail over distinct workloads to different recovery sites, SRM doesn’t currently support orchestrating the failover of the same workload to different recovery sites. This means that a replicated datastore can only be managed by one SRM pairing.
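When laying out a multi-pair design, this constraint can be checked by looking for any replicated datastore that has been assigned to more than one pairing. The function below is a sketch of that check; the datastore and pairing names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def datastores_in_multiple_pairings(
    assignments: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Find datastores assigned to more than one SRM pairing.

    Each assignment is a (datastore, pairing) tuple. The result maps every
    conflicting datastore to the sorted list of pairings that claim it.
    """
    by_datastore: Dict[str, Set[str]] = defaultdict(set)
    for datastore, pairing in assignments:
        by_datastore[datastore].add(pairing)
    return {
        datastore: sorted(pairings)
        for datastore, pairings in by_datastore.items()
        if len(pairings) > 1
    }


assignments = [
    ("ds-app", "A-B"),
    ("ds-db", "A-C"),
    ("ds-app", "A-C"),  # same datastore claimed by a second pairing
]
print(datastores_in_multiple_pairings(assignments))  # {'ds-app': ['A-B', 'A-C']}
```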

Conclusion

Hopefully this article has outlined some of the newly supported capabilities available to SRM users. Planning multi-site disaster recovery strategies takes careful thought as to what you are looking to achieve but the new capabilities make it even easier to address a variety of recovery requirements.

Other Resources

If you’re interested in learning more about SRM 5.8 then I’d recommend the official documentation, the product resource page, the uptime blog, and Eric Shanks’ guide to SRM 5.8.

Using PowerCLI to Change SRM Recovery Settings

One of the updated capabilities in the new Site Recovery Manager 5.8 release is the ability to change some recovery settings like the recovery priority and script callouts via the SRM API. With the release of PowerCLI 5.8 R1 these capabilities are also exposed for scripting via the SRM API!

To make it easier to script against the API in PowerCLI I’ve been working on some helper SRM functions and have updated them to support some of the new capabilities.

Here’s a short example using PowerCLI to update the recovery priority of a VM and add a new post recovery call-out.

And here’s some code to perform something similar to what was done in the video using the custom SRM functions [at 2014-11-17].


# Load the custom functions
. ./SrmFunctions.ps1
. ./Examples/ReportConfiguration.ps1

# Connect to protected site VC & SRM
$creds = Get-Credential
$vca = Connect-VIServer vc-w12-01a.corp.local -Credential $creds
$srma = Connect-SrmServer -Server $vca -Credential $creds -RemoteCredential $creds

# Output Current SRM Configuration Report
Get-SrmConfigReport

# get recovery plan
$rp = Get-RecoveryPlan "Anaheim"

# get protected VM
$pvm = $rp | Get-ProtectedVM | Select-Object -First 1

# view recovery settings
$rs = $pvm | Get-RecoverySettings -RecoveryPlan $rp

# update recovery priority
$rs.RecoveryPriority = "highest"

# create new command callout
$srmCmd = New-SrmCommand -Command '/bin/bash /root/failover.sh' -Description 'Run standard linux failover script' -RunInRecoveredVm

# add command as post recovery command callout
Add-PostRecoverySrmCommand -RecoverySettings $rs -SrmCommand $srmCmd

# update the recovery settings on the SRM server
Set-RecoverySettings -ProtectedVm $pvm -RecoveryPlan $rp -RecoverySettings $rs

# validate recovery settings (view in report)
Get-SrmConfigReportProtectedVm

Site Recovery Manager (SRM) 5.8 is Available!

I am pleased to say that Site Recovery Manager 5.8 is now available!

This was a really exciting release to be working on as there are a ton of new features, including (but not limited to):

  • Integration into the vSphere Web Client
  • Protect up to 5,000 VMs and concurrently recover 2,000 VMs (compared to limits of 1,000 VMs at GA of SRM 5.5) [KB 2081158]
  • New integration options using vCenter Orchestrator plug-in for Site Recovery Manager enabling new capabilities including:
    • Provision VMs with vCloud Automation Center and DR protect automatically with SRM
    • Create SRM protection groups
    • Create inventory mappings
    • and more!
  • Dramatically simpler IP customization at scale with subnet-level IP customization rules
  • Performance improvements to storage failover operations, up to 75% faster in larger environments
  • Streamlined deployment using new support for vPostgreSQL, which provides an optional embedded database during SRM installation
  • Improved support for Integrated Windows Authentication when using remote Microsoft SQL Server instances as the Site Recovery Manager database
  • Compatibility with vSphere Storage I/O Control

As you can see from the list of changes there has been a lot of hard work put into this release by many people!

If you are wondering, SRM 5.8 supports the same set of SRAs as SRM 5.5, so there is no need to wait for an update from your storage vendor. Check out the drivers and tools tabs for the SRAs and the vCenter Orchestrator plug-in.

VMworld 2014

Well, VMworld 2014 has arrived. There were some really interesting announcements (looking forward to the demo for Project Fargo). This is my third VMworld and, as usual, it will be filled with customer and partner meetings as well as whatever sessions I can squeeze in. I know I could always catch the session recordings after the conference, but I do like the live participation.

This year I will also be presenting on Site Recovery Manager and vCloud Automation Center integration. My sessions are on Tuesday and Thursday (register online or see tweets about the session #BCO1893), and there are currently over 800 people registered to attend, so I expect it will be fairly busy.