VMware SRM Topologies

One of the topics that people often ask me about is what datacenter topologies VMware SRM supports. The good news is that SRM does have flexible capabilities for a number of topologies to help support more complex use cases beyond the typical two site deployment.

Shared Recovery Site (VC at Remote Sites)

The shared recovery site use case is one of the more commonly seen topologies outside of a two datacenter deployment. This is a use case that has been supported by SRM for multiple releases and provides a good option for customers that are looking to protect virtual machines at remote offices that have their own vCenter instances.

robo-remote-vc

With this use case the customer is able to share the resources used for recovery amongst several remote offices. This provides flexibility in sizing the resources required at the recovery site. A conservative approach would provide sufficient capacity to fail over all remote sites to the shared recovery site. An alternative approach would be to oversubscribe the recovery site resources on the assumption that you wouldn’t need to recover all the remote sites at once. This second approach has the potential to reduce capital expense at the recovery site but adds risk if there is a chance that all the remote sites may need to fail over at the same time.
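To make the trade-off between the two sizing approaches concrete, here is a minimal sketch of the arithmetic. The office names, the CPU figures and the assumption that at most two offices fail at the same time are all hypothetical, so treat it as an illustration rather than a sizing tool.

```powershell
# Hypothetical per-office recovery requirements (GHz of CPU); not real data.
$remoteSites = @(
    [pscustomobject]@{ Name = 'Office-A'; CpuGhz = 40 }
    [pscustomobject]@{ Name = 'Office-B'; CpuGhz = 25 }
    [pscustomobject]@{ Name = 'Office-C'; CpuGhz = 60 }
    [pscustomobject]@{ Name = 'Office-D'; CpuGhz = 15 }
)

# Conservative: enough capacity to fail over every remote site at once.
$conservative = ($remoteSites | Measure-Object -Property CpuGhz -Sum).Sum

# Oversubscribed: assume (hypothetically) that at most N sites fail together,
# so size the recovery site for the N largest offices.
$maxConcurrentFailures = 2
$oversubscribed = ($remoteSites |
    Sort-Object -Property CpuGhz -Descending |
    Select-Object -First $maxConcurrentFailures |
    Measure-Object -Property CpuGhz -Sum).Sum

"Conservative sizing: $conservative GHz"
"Oversubscribed sizing (N=$maxConcurrentFailures): $oversubscribed GHz"
```

The gap between the two numbers is the capital you might save, and also the capacity you would be short if every office failed at once.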

One of the advantages of this topology, with remote vCenters at each site, is that local IT retain full manageability of the infrastructure. Even in the event of a network outage between the central data center and the remote site, the hosts at the remote site remain accessible. In contrast, one of the disadvantages of this approach is that both SRM servers in each pair must be at the same major and minor version, and each vCenter server must be a supported version for the SRM server joined to it. This means that upgrades of SRM or vCenter across the set of data centers should be carefully planned and coordinated.
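When planning upgrades it can help to check the major and minor version rule across all pairs up front. The following is a rough sketch that works on a hand-built inventory; the pair names, the version numbers and the Test-SrmPairVersion helper are hypothetical, and it does not attempt to check vCenter compatibility.

```powershell
# Hypothetical inventory of SRM pairs and the version installed on each side.
$srmPairs = @(
    [pscustomobject]@{ Pair = 'Office-A <-> Central'; LocalVersion = [version]'5.8.0'; RemoteVersion = [version]'5.8.0' }
    [pscustomobject]@{ Pair = 'Office-B <-> Central'; LocalVersion = [version]'5.5.1'; RemoteVersion = [version]'5.8.0' }
)

# Hypothetical helper: returns $true when both sides match on major and minor version.
function Test-SrmPairVersion {
    param(
        [Parameter(Mandatory = $true)] [version] $LocalVersion,
        [Parameter(Mandatory = $true)] [version] $RemoteVersion
    )
    ($LocalVersion.Major -eq $RemoteVersion.Major) -and
    ($LocalVersion.Minor -eq $RemoteVersion.Minor)
}

# Collect the pairs that break the rule.
$mismatched = @($srmPairs | Where-Object {
    -not (Test-SrmPairVersion -LocalVersion $_.LocalVersion -RemoteVersion $_.RemoteVersion)
})

$mismatched | ForEach-Object { "Version mismatch: $($_.Pair)" }
```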

Shared Recovery Site (Central VC)

While the above topology is a common approach to providing shared recovery site capabilities it is not the only approach to achieving this. If there isn’t a need for each remote site to have its own vCenter instance then a simple single-pair SRM deployment could be adapted to meet the use case.

robo-central-vc

For this deployment you would deploy two vCenter instances, one to manage the resources at the recovery site and one to manage the resources at the remote sites. An instance of SRM would be deployed for each vCenter and the two paired together. You could then create recovery plans for each remote site but have a single shared vCenter to consolidate management of the remote sites.

This approach has the advantage that upgrades of the vCenter and SRM instances would be somewhat simpler than in the previous shared recovery site topology (fewer moving parts). Additionally, you would be able to have different levels of recovery plan in the same SRM deployment. For example, you could define a recovery plan for each remote office but also recovery plans for each region, grouping the remote offices together. The disadvantage of this approach is that the remote infrastructure couldn’t be managed by vCenter when a remote site loses connectivity to the central vCenter.
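The idea of office-level and region-level recovery plans can be sketched as a simple grouping exercise. This is plain PowerShell over made-up data (the protection group names, offices and regions are hypothetical) and it only models plan membership; it does not call SRM.

```powershell
# Hypothetical protection groups, one per remote office, tagged with a region.
$protectionGroups = @(
    [pscustomobject]@{ Name = 'PG-Leeds';      Office = 'Leeds';      Region = 'North' }
    [pscustomobject]@{ Name = 'PG-Manchester'; Office = 'Manchester'; Region = 'North' }
    [pscustomobject]@{ Name = 'PG-Bristol';    Office = 'Bristol';    Region = 'South' }
)

# One recovery plan per office ...
$officePlans = $protectionGroups | ForEach-Object {
    [pscustomobject]@{ Plan = "Office - $($_.Office)"; ProtectionGroups = @($_.Name) }
}

# ... plus one plan per region containing the protection group of every office in it.
$regionPlans = $protectionGroups | Group-Object -Property Region | ForEach-Object {
    [pscustomobject]@{ Plan = "Region - $($_.Name)"; ProtectionGroups = @($_.Group.Name) }
}

@($officePlans) + @($regionPlans) | ForEach-Object {
    '{0}: {1}' -f $_.Plan, ($_.ProtectionGroups -join ', ')
}
```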

Additional Multi-Pair Topologies

In addition to the shared recovery site topologies defined above I am often asked about support for other multi-datacenter topologies. Prior to the release of SRM 5.8 VMware supported some additional topologies using the Request for Product Qualification (RPQ) process.

While the shared recovery topology is still the main use case for multi-pair topologies, the technology provided by SRM underneath actually supports more flexible configurations. The SRM documentation has recently been updated to remove some of the previous limitations and allow some more complex topologies to be supported based on the concept of multiple SRM pairs. From the SRM 5.8 Administration Guide:

“In addition to the shared recovery site configuration, Site Recovery Manager also allows and supports shared protected site (1:N) and many-to-many (N:N) configurations.”

Starting with a standard two-site SRM deployment and adding further site pairings later, to build out a more complex topology, is also supported.

With the new flexibility, a new set of potential use cases is now supported. For example, a three-site topology allowing a round-robin style of protection would be possible:

  • Site A’s workloads being protected to Site B
  • Site B’s workloads being protected to Site C
  • Site C’s workloads finally being protected back to Site A

other-srm-1
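The round-robin arrangement above is just "each site protects to the next one, and the last wraps around to the first". A tiny sketch of that, with the site names taken from the list above and everything else purely illustrative:

```powershell
# Each site is protected to the next one in the list; the last wraps to the first.
$sites = 'Site A', 'Site B', 'Site C'

$pairings = for ($i = 0; $i -lt $sites.Count; $i++) {
    [pscustomobject]@{
        ProtectedSite = $sites[$i]
        RecoverySite  = $sites[($i + 1) % $sites.Count]
    }
}

$pairings | ForEach-Object { "$($_.ProtectedSite) -> $($_.RecoverySite)" }
```

Three sites arranged this way need three SRM pairings, one per arrow.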

Another topology that might be of interest would be to have a traditional SRM pairing between two core data centers while also providing DR protection for remote offices to one (or even both) of the core data centers.

other-srm-2

The examples presented above are not intended to be exhaustive. As long as you don’t exceed the configuration maximums (use 10 or fewer SRM pairings) there are a wide variety of topologies that are enabled.
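If you are sketching a topology on paper, a quick way to count the pairings it needs is to treat protection in both directions between two sites as one pairing. The following assumes exactly that, uses hypothetical site names, and takes the limit of 10 from the sentence above.

```powershell
$maxPairings = 10   # limit quoted in the text above

# Hypothetical protection relationships. Core-1 -> Core-2 and Core-2 -> Core-1
# are assumed to share a single SRM pairing.
$relationships = @(
    @{ From = 'Core-1';   To = 'Core-2' }
    @{ From = 'Core-2';   To = 'Core-1' }
    @{ From = 'Office-1'; To = 'Core-1' }
    @{ From = 'Office-2'; To = 'Core-2' }
)

# Normalise each relationship to an order-independent key, then de-duplicate.
$pairings = $relationships |
    ForEach-Object { (@($_.From, $_.To) | Sort-Object) -join ' <-> ' } |
    Sort-Object -Unique

'{0} pairing(s) in use, limit is {1}' -f @($pairings).Count, $maxPairings
if (@($pairings).Count -gt $maxPairings) {
    Write-Warning 'Topology exceeds the supported number of SRM pairings.'
}
```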

SRM 5.8 Improvements In Managing Multiple SRM Pairs

With the introduction of the web client in SRM 5.8 there have been several improvements in managing more complex topologies with SRM. The first improvement is the increase in supported SRM scale, up to 5,000 virtual machines from 1,000 in previous releases. This makes it easier to consider deploying SRM across multiple pairs of data centers.

One of the big user interface improvements is gained by integrating with the inventory lists of the web client. It is now easy to quickly switch between multiple site pairings.

srm-site-summary-screen

The lists of protection groups and recovery plans are also consolidated so you can see plans belonging to multiple site pairings at once.

multi-site-protection-groups

In addition to the consolidated views the various wizards to create new protection groups and recovery plans (for example) let users choose the target SRM pairing during creation.

Finally, the SRM installer has also been enhanced to make it easier to deploy multiple pairs of SRM servers without having to specify command line parameters to the installer. For the initial SRM pairing you would use the default identifier. For subsequent pairings, each SRM instance in the pair should be installed using the same custom Plug-in Identifier (and each pairing should have a distinct identifier).

srm-multi-install
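The two identifier rules (same identifier on both sides of a pairing, distinct identifier per pairing) are easy to get wrong once you have more than a couple of pairs. Here is a small sketch that checks a hand-maintained list of installs; the server names, pairing names and identifiers are all hypothetical.

```powershell
# Hypothetical install records: one row per SRM server instance.
$srmInstances = @(
    [pscustomobject]@{ Server = 'srm-central-1'; Pairing = 'Central/Office-A'; PluginId = 'default' }
    [pscustomobject]@{ Server = 'srm-office-a';  Pairing = 'Central/Office-A'; PluginId = 'default' }
    [pscustomobject]@{ Server = 'srm-central-2'; Pairing = 'Central/Office-B'; PluginId = 'office-b' }
    [pscustomobject]@{ Server = 'srm-office-b';  Pairing = 'Central/Office-B'; PluginId = 'office-b' }
)

# Rule 1: both instances in a pairing use the same identifier.
$inconsistentPairings = @($srmInstances | Group-Object -Property Pairing | Where-Object {
    @($_.Group.PluginId | Sort-Object -Unique).Count -ne 1
})

# Rule 2: no two pairings share an identifier.
$sharedIds = @($srmInstances | Group-Object -Property PluginId | Where-Object {
    @($_.Group.Pairing | Sort-Object -Unique).Count -gt 1
})

"Pairings with mismatched identifiers: $($inconsistentPairings.Count)"
"Identifiers reused across pairings: $($sharedIds.Count)"
```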

Replication Topology Support

The SRM 5.8 administration guide states:

“Site Recovery Manager supports point-to-point replication. Site Recovery Manager does not support replication to multiple targets, even in a multi-site configuration.”

While SRM pairs do allow you to fail over distinct workloads to different recovery sites, SRM doesn’t currently support orchestrating the failover of the same workload to different recovery sites. This means that SRM only supports managing a replicated datastore with one SRM pairing.
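A design that accidentally assigns the same replicated datastore to two pairings is worth catching early. The sketch below looks for that in a hypothetical assignment list (datastore and pairing names are made up, and one entry is deliberately invalid to show the output).

```powershell
# Hypothetical mapping of replicated datastores to the SRM pairing that manages them.
$datastoreAssignments = @(
    [pscustomobject]@{ Datastore = 'ds-finance'; Pairing = 'Site A <-> Site B' }
    [pscustomobject]@{ Datastore = 'ds-web';     Pairing = 'Site A <-> Site C' }
    [pscustomobject]@{ Datastore = 'ds-finance'; Pairing = 'Site A <-> Site C' }  # deliberately invalid
)

# A datastore that shows up under more than one pairing breaks the rule above.
$conflicts = @($datastoreAssignments | Group-Object -Property Datastore | Where-Object {
    @($_.Group.Pairing | Sort-Object -Unique).Count -gt 1
})

$conflicts | ForEach-Object {
    '{0} is assigned to multiple pairings: {1}' -f $_.Name, ($_.Group.Pairing -join '; ')
}
```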

Conclusion

Hopefully this article has outlined some of the newly supported capabilities available to SRM users. Planning multi-site disaster recovery strategies takes careful thought as to what you are looking to achieve but the new capabilities make it even easier to address a variety of recovery requirements.

Other Resources

If you’re interested in learning more about SRM 5.8 then I’d recommend the official documentation, the product resource page, the uptime blog, and Eric Shanks’ guide to SRM 5.8.

Using PowerCLI to Change SRM Recovery Settings

One of the updated capabilities in the new Site Recovery Manager 5.8 release is the ability to change some recovery settings like the recovery priority and script callouts via the SRM API. With the release of PowerCLI 5.8 R1 these capabilities are also exposed for scripting via the SRM API!

To make it easier to script against the API in PowerCLI I’ve been working on some helper SRM functions and have updated them to support some of the new capabilities.

Here’s a short example using PowerCLI to update the recovery priority of a VM and add a new post recovery call-out.

And here’s some code to perform something similar to what was done in the video using the custom SRM functions (as of 2014-11-17).


# Load the custom functions
. ./SrmFunctions.ps1
. ./Examples/ReportConfiguration.ps1

# Connect to protected site VC & SRM
$creds = Get-Credential
$vca = Connect-VIServer vc-w12-01a.corp.local -Credential $creds
$srma = Connect-SrmServer -Server $vca -Credential $creds -RemoteCredential $creds

# Output Current SRM Configuration Report
Get-SrmConfigReport

# get recovery plan
$rp = Get-RecoveryPlan "Anaheim"

# get protected VM
$pvm = $rp | Get-ProtectedVM | Select-Object -First 1

# view recovery settings
$rs = $pvm | Get-RecoverySettings -RecoveryPlan $rp

# update recovery priority
$rs.RecoveryPriority = "highest"

# create new command callout
$srmCmd = New-SrmCommand -Command '/bin/bash /root/failover.sh' -Description 'Run standard linux failover script' -RunInRecoveredVm

# add command as post recovery command callout
Add-PostRecoverySrmCommand -RecoverySettings $rs -SrmCommand $srmCmd

# update the recovery settings on the SRM server
Set-RecoverySettings -ProtectedVm $pvm -RecoveryPlan $rp -RecoverySettings $rs

# validate recovery settings (view in report)
Get-SrmConfigReportProtectedVm

Site Recovery Manager (SRM) 5.8 is Available!

I am pleased to say that Site Recovery Manager 5.8 is now available!

This was a really exciting release to be working on as there are a ton of new features, including (but not limited to):

  • Integration into the vSphere Web Client
  • Protect up to 5,000 VMs and concurrently recover 2,000 VMs (compared to limits of 1,000 VMs at GA of SRM 5.5) [KB 2081158]
  • New integration options using vCenter Orchestrator plug-in for Site Recovery Manager enabling new capabilities including:
    • Provision VMs with vCloud Automation Center and DR protect automatically with SRM
    • Create SRM protection groups
    • Create inventory mappings
    • and more!
  • Dramatically simpler IP customization at scale with subnet-level IP customization rules
  • Performance improvements to storage failover operations, up to 75% faster in larger environments
  • Streamlined deployment using new support for vPostgreSQL that provides optional embedded database option during SRM installation.
  • Improved support for Integrated Windows Authentication when using remote Microsoft SQL Server instances as the Site Recovery Manager database
  • Compatibility with vSphere Storage I/O Control

As you can see from the list of changes there has been a lot of hard work put into this release by many people!

In case you are wondering, SRM 5.8 supports the same set of SRAs as SRM 5.5, so there is no need to wait for an update from your storage vendor. Check out the drivers and tools tabs for the SRAs and the vCenter Orchestrator plugin.

VMworld 2014

Well VMworld 2014 has arrived. There were some really interesting announcements (looking forward to the demo for Project Fargo). This is my third VMworld and as usual it will be filled with customer and partner meetings as well as whatever sessions I can squeeze in. I know I could always catch the session recordings after the conference but I do like the live participation.

This year I will also be presenting on Site Recovery Manager and vCloud Automation Center integration. My sessions are on Tuesday and Thursday (register online or see tweets about the session #BCO1893), and there are currently over 800 people registered to attend, so I expect it will be fairly busy.

Building More PowerCLI Custom Functions for SRM, Step by Step Walkthrough

One of my colleagues Ken Werneburg recently wrote a great blog post about executing an SRM failover via PowerCLI. Ken does a great job of explaining some of the caveats around this feature and how you would actually make the call using PowerCLI. Looking at the code provided I thought it would be a good candidate for encapsulating in a custom function. I wrote recently about using custom functions to simplify using the SRM API from PowerCLI and I think this use case would be a great example.

In Ken’s example he is coding against the raw API from PowerCLI and you have to deal with code like this:

$RPmoref = … # Set the recovery plan we want to use

# define the recovery plan mode we want to use ('1' is a test)
$RPmode = New-Object VMware.VimAutomation.Srm.Views.SrmRecoveryPlanRecoveryMode
$RPmode.Value__ = 1

# start the test
$RPmoref.Start($RPmode)

While this is not too hard to follow, it does require the user to create recovery mode objects using “magic” numbers in the code. Using custom functions we can hide that away and deliver something much nicer. What I would like us to get to is something like:

Start-RecoveryPlan -RecoveryPlan $Plan -RecoveryMode Test

Or even:

Get-RecoveryPlan -Name 'Failover Site A' | Start-RecoveryPlan

So, how do we get there?

First let’s determine how we want to be able to call the functions and specify the parameters we will accept. Here’s my first crack at this:

<#
.SYNOPSIS
Start a Recovery Plan action like test, recovery, cleanup, etc.

.PARAMETER RecoveryPlan
The recovery plan to start

.PARAMETER RecoveryMode
The recovery mode to invoke on the plan. May be one of "test" (the default), "failover", "cleanup", and "reprotect"
#>
Function Start-RecoveryPlan () {
    Param(
        [Parameter (Mandatory=$true, ValueFromPipeline=$true)] $RecoveryPlan,
        [VMware.VimAutomation.Srm.Views.SrmRecoveryPlanRecoveryMode] $RecoveryMode = 'Test'
    )

    #TODO
}

<#
.SYNOPSIS
Stop a running Recovery Plan action.

.PARAMETER RecoveryPlan
The recovery plan to stop
#>
Function Stop-RecoveryPlan () {
    Param(
        [Parameter (Mandatory=$true, ValueFromPipeline=$true)] $RecoveryPlan
    )

    #TODO
}

A couple of things to note here. First we can directly use the SrmRecoveryPlanRecoveryMode type when defining the parameter. The PowerCLI team have made this an enum type so we get some nice behavior out of the box in terms of type casting. The second thing to note is that we are going to default to a ‘Test’ operation. I am doing this as I don’t want to get into the scenario where we default to a recovery and someone initiates a disruptive failover instead of a test or cleanup, simply because of a scripting error.
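To show what that "nice behavior" looks like without needing PowerCLI loaded, here is a stand-alone sketch using a made-up enum. DemoRecoveryMode and Show-Mode are hypothetical stand-ins, not the PowerCLI type; only the fact that 1 means a test comes from the example above, and the other value is invented for the demo. It needs PowerShell 5 or later for the enum keyword.

```powershell
# Stand-in enum for illustration only; this is NOT the PowerCLI type.
# "Test = 1" mirrors the example above; the Failover value is made up.
enum DemoRecoveryMode {
    Test     = 1
    Failover = 2
}

function Show-Mode {
    param(
        # A string default is converted to the enum when the parameter is bound.
        [DemoRecoveryMode] $RecoveryMode = 'Test'
    )
    "Mode: $RecoveryMode (underlying value $([int]$RecoveryMode))"
}

Show-Mode                          # default applies
Show-Mode -RecoveryMode failover   # the string cast is case-insensitive
Show-Mode -RecoveryMode 1          # an integer is converted to the matching name
```

A name that is not in the enum fails at parameter binding, before any of the function body runs, which is exactly what you want for something as disruptive as a failover.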

Now we have the outline of the functions let’s try and fill them in a little. First we’ll look at the Start-RecoveryPlan function. For this we want to be able to take in the recovery plan and mode parameters and call the SRM API with that information, maybe with a bit of error checking in there as well. If we take the example from Ken’s blog and put it into our function we should get something like:

Function Start-RecoveryPlan () {
    Param(
        [Parameter (Mandatory=$true, ValueFromPipeline=$true)] $RecoveryPlan,
        [VMware.VimAutomation.Srm.Views.SrmRecoveryPlanRecoveryMode] $RecoveryMode = 'Test'
    )

    # Validate with informative error messages
    $rpinfo = $RecoveryPlan.GetInfo()

    if ($rpinfo.State -eq 'Protecting') {
        throw "This recovery plan action needs to be initiated from the other SRM instance"
    }

    $RecoveryPlan.Start($RecoveryMode)
}

The only real change we have made is to avoid tripping ourselves up by running the recovery plan from the protected site. The API expects the plan to be run from the recovery site, so we add a check that the recovery plan is not in the 'Protecting' state, which is the state a plan is in by default when seen from the protected site API.

Given that executing a recovery plan could be a disruptive event it makes sense to prompt the user for confirmation before executing the failover. We can do this by annotating our cmdlet with SupportsShouldProcess=$True and checking the value of $pscmdlet.ShouldProcess in our function.
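If you haven’t used ShouldProcess before, this is roughly how it behaves. The function below is a throwaway demo (Invoke-DemoAction is a hypothetical name and it does not touch SRM); it just shows what -WhatIf and -Confirm:$false do to the result of ShouldProcess.

```powershell
function Invoke-DemoAction {
    [CmdletBinding(SupportsShouldProcess = $true, ConfirmImpact = 'High')]
    param(
        [Parameter(Mandatory = $true)] [string] $Target
    )

    # ShouldProcess returns $false under -WhatIf, or when the user declines the prompt.
    if ($PSCmdlet.ShouldProcess($Target, 'Demo action')) {
        "Performed demo action on $Target"
    }
}

# -WhatIf: prints a "What if" message and skips the action.
$dryRun = @(Invoke-DemoAction -Target 'Plan A' -WhatIf)

# -Confirm:$false: suppresses the prompt that ConfirmImpact = 'High' would otherwise show.
$realRun = @(Invoke-DemoAction -Target 'Plan A' -Confirm:$false)
```

Without either switch, the High impact level means an interactive user gets a confirmation prompt under the default $ConfirmPreference.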

So for one final time let’s put it together for our Start-RecoveryPlan function:

Function Start-RecoveryPlan () {
    [cmdletbinding(SupportsShouldProcess=$True,ConfirmImpact="High")]
    Param(
        [Parameter (Mandatory=$true, ValueFromPipeline=$true)] $RecoveryPlan,
        [VMware.VimAutomation.Srm.Views.SrmRecoveryPlanRecoveryMode] $RecoveryMode = 'Test'
    )

    # Validate with informative error messages
    $rpinfo = $RecoveryPlan.GetInfo()

    # Prompt the user to confirm they want to execute the action
    if ($pscmdlet.ShouldProcess($rpinfo.Name, $RecoveryMode)) {
        if ($rpinfo.State -eq 'Protecting') {
            throw "This recovery plan action needs to be initiated from the other SRM instance"
        }

        $RecoveryPlan.Start($RecoveryMode)
    }
}
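One thing to be aware of with the function as written: it accepts pipeline input but has no process block, so if you pipe several plans into it only the last one is acted on. The stand-alone demo below (hypothetical function names, no SRM involved) shows the difference a process block makes.

```powershell
function Show-LastOnly {
    param([Parameter(ValueFromPipeline = $true)] $Item)
    # No process block: the body runs once, after the pipeline has ended,
    # so $Item only holds the last object that was piped in.
    "saw $Item"
}

function Show-Each {
    param([Parameter(ValueFromPipeline = $true)] $Item)
    process {
        # Runs once for every object that arrives on the pipeline.
        "saw $Item"
    }
}

$lastOnly = @('Plan A', 'Plan B', 'Plan C' | Show-LastOnly)
$each     = @('Plan A', 'Plan B', 'Plan C' | Show-Each)

"Without process block: $($lastOnly -join ', ')"
"With process block:    $($each -join ', ')"
```

For a single recovery plan the behavior is the same either way, but wrapping the body in process { } would be the safer choice if you ever expect to pipe in more than one plan.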

We can do similar updates for the corresponding Stop-RecoveryPlan function as well. When we put them together with some of the custom functions we have defined earlier it makes it fairly easy to put together very concise scripts like this:

# First let's connect to VC and both SRM servers
$localvc = …
$un = …
$pw = …
Connect-VIServer -Server $localvc -User $un -Password $pw
Connect-SrmServer -User $un -Password $pw -RemoteUser $un -RemotePassword $pw

#Then let's find a recovery plan and execute a failover

Get-RecoveryPlan -Name '_Manchester Site Failover' | Start-RecoveryPlan -RecoveryMode Failover
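For completeness, here is one way the Stop-RecoveryPlan function mentioned above might be filled in. This is a sketch under assumptions: I am assuming the recovery plan object exposes a Cancel() method alongside GetInfo(), and the mock plan object exists only so the example can run without an SRM server.

```powershell
function Stop-RecoveryPlan {
    [CmdletBinding(SupportsShouldProcess = $true, ConfirmImpact = 'High')]
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)] $RecoveryPlan
    )
    process {
        $rpinfo = $RecoveryPlan.GetInfo()

        # Prompt the user to confirm they want to cancel the running action
        if ($PSCmdlet.ShouldProcess($rpinfo.Name, 'Cancel')) {
            # Assumption: the plan object exposes a Cancel() method.
            $RecoveryPlan.Cancel()
        }
    }
}

# Mock plan object (hypothetical) standing in for a real SRM recovery plan.
$mockPlan = [pscustomobject]@{ Cancelled = $false }
$mockPlan | Add-Member -MemberType ScriptMethod -Name GetInfo -Value {
    [pscustomobject]@{ Name = 'Demo Plan'; State = 'Running' }
}
$mockPlan | Add-Member -MemberType ScriptMethod -Name Cancel -Value {
    $this.Cancelled = $true
}

# -Confirm:$false skips the interactive prompt for this demo run.
$mockPlan | Stop-RecoveryPlan -Confirm:$false
"Mock plan cancelled: $($mockPlan.Cancelled)"
```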

As always a picture (or in this case a video) can say a thousand words…

I’ve also shared these functions in my SRM Cmdlets github repository to make it easier to illustrate how you can develop this. As always I’ll reiterate that I am pretty new to PowerShell so if you have any feedback, especially on the scripting style, please let me know!