VMware Site Recovery VMworld 2017 Session

During VMworld 2017 I shared a tech preview, with GS Khalsa, of the VMware Site Recovery service that’s now available as an add-on to VMware Cloud on AWS. While we’ve already made several enhancements to the service, over and above what you’ll see in the tech preview, I think it still illustrates many of the exciting new options available with VMware Site Recovery today!

Check out the session online

Launch Often! VMware Cloud on AWS

At VMworld, back in August, the first version of VMware Cloud on AWS was launched. Now three months later we’re doing it again! As the Product Manager owning the storage and disaster recovery initiatives it’s been a great experience to work with the joint VMware and AWS teams as we delivered the storage platform for VMware Cloud on AWS (built with vSAN), and are now delivering new Disaster Recovery (DR) capabilities with VMware Site Recovery.

Delivering improved resiliency and DR options has been an important focus for VMware Cloud on AWS. This new capability allows customers to protect their mission-critical workloads running on-premises to VMware Cloud on AWS, or vice-versa. We also support protection between VMware Cloud on AWS SDDCs. This enables customers to protect workloads across different AWS Availability Zones, or even between AWS Regions with the newly announced support for US East (N. Virginia).

It’s also been a great experience to work closely with some of our forward-looking customers as we’ve been developing VMware Cloud on AWS. Listen to one of these early customers share their view of the collaboration between VMware and AWS, and the new capabilities we’re delivering.

More details on the VMware Site Recovery solutions can be found on the VMware Cloud Services site:

SRM 6.5 PowerCLI Module Changes

With the recent release of PowerCLI 6.5.1 the PowerCLI team moved to a more modular approach to delivering their capabilities. This new PowerCLI release also made some changes related to SRM, from their launch blog:

The SRM cmdlets have been removed from the Core module and a new SRM module has been created. The new module is named VMware.VimAutomation.Srm and features updated cmdlets that enable users to interact with the API views for the SRM 6.5 API!

The PowerCLI SRM module provides easy access to the SRM public API. To make it easier to work with the new SRM 6.5 public API I have updated my SRM advanced functions to work with the new PowerCLI 6.5.1 release and the new SRM 6.5 APIs. This new version of SRM-Cmdlets, v0.2, is not backwards compatible with earlier versions of PowerCLI and is intended for use with SRM 6.5. If you are using earlier versions of PowerCLI or SRM you should stick with the earlier release of these cmdlets.

I am hosting the SRM-Cmdlets project on GitHub, so you can get access to the latest enhancements there and provide feedback via GitHub issues.

The commands available in the v0.2 release are:

  • Add-SrmPostRecoveryCommand
  • Add-SrmPreRecoveryCommand
  • Add-SrmProtectionGroupToRecoveryPlan
  • Export-SrmRecoveryPlanResultAsXml
  • Get-SrmPlaceholderVM
  • Get-SrmProtectedDatastore
  • Get-SrmProtectedVM
  • Get-SrmProtectionGroup
  • Get-SrmProtectionGroupFolder
  • Get-SrmRecoveryPlan
  • Get-SrmRecoveryPlanFolder
  • Get-SrmRecoveryPlanResult
  • Get-SrmRecoverySetting
  • Get-SrmReplicatedDatastore
  • Get-SrmReplicatedVM
  • Get-SrmServer
  • Get-SrmServerApiEndpoint
  • Get-SrmServerVersion
  • Get-SrmTestVM
  • Get-SrmUnProtectedVM
  • New-SrmCommand
  • New-SrmProtectionGroup
  • New-SrmRecoveryPlan
  • Protect-SrmVM
  • Remove-SrmPostRecoveryCommand
  • Remove-SrmPreRecoveryCommand
  • Remove-SrmProtectionGroup
  • Remove-SrmProtectionGroupFromRecoveryPlan
  • Remove-SrmRecoveryPlan
  • Set-SrmRecoverySetting
  • Start-SrmDiscoverDevice
  • Start-SrmRecoveryPlan
  • Stop-SrmRecoveryPlan
  • Unprotect-SrmVM

This includes some new commands as well as some updates to existing commands. Hopefully these commands provide some useful examples of working with the SRM public API in PowerCLI 6.5.1.

Automating vSphere Replication and SRM with vRealize Orchestrator

With the release of the vSphere Replication plugin for vRealize Orchestrator there is a whole set of new automation capabilities when it comes to Disaster Recovery for your VMware environment. The new plugin enables you to configure vSphere Replication:

  • to vCloud Air Disaster Recovery Service
  • to another vCenter deployment
  • or even within the same vCenter deployment

I’d recommend reading the release notes for more details on what capabilities are on offer. In this blog post I want to demonstrate how you can easily combine the vSphere Replication plugin with the SRM plugin to automate the end to end replication and DR protection of your Virtual Machines. Without further ado here’s where we want to end up:

In this short video you saw that the administrator selected a handful of Virtual Machines they wanted to protect, invoked a vRO workflow from the context menu, and was able to completely configure the replication and protection of the VMs in just a few seconds! In fact most of the video was me driving the UI to show you all the items (replication schedules, protected VMs, protection groups, recovery plans, etc.) that were created by the workflow.

So What do We Need to Make It Work?

The demonstration used the following products:

How Did You Build The Workflow?

The workflow was built by linking together various out of the box workflows provided by the VR and SRM plugins. I actually built two workflows: one that links together the capabilities of the two plugins, and another that wraps that more complex workflow with some predefined attributes and a little scripting to simplify the presentation and make it easy to call from the vSphere Web Client.

As you can see, the workflow assigned to the context menu is basically a wrapper. It uses a little scripting to let me offer predefined RPO tiers when presenting the workflow, and some predefined attributes to simplify what I choose to present to the user.

vro-workflow-1

The workflow that ties the VR and SRM workflows together looks more complex, but most of it is simply chaining together out of the box workflows.

vro-workflow-2

The green and blue highlighted sections are the workflows provided by the VR and SRM plugins for vRO. The red highlighted section was custom built to handle a simple lookup from an instance of VC:VirtualMachine to an instance of SRM:UnassignedReplicatedVm. The script to do this is a one liner:

replVm = Server.findForType("SRM:UnassignedReplicatedVm", sourceVm.id);
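
The one-liner above handles a single VM and assumes the lookup succeeds. As a rough sketch (not part of the original workflow), the same lookup could be wrapped so that several VMs are resolved in one pass and any VM without a matching SRM object is reported instead of being passed on. The helper name lookupReplicatedVms and the injected findForType parameter are hypothetical, and I am assuming the lookup returns null or undefined when nothing matches.

```javascript
// Hypothetical helper, not part of the VR or SRM plugins.
// sourceVms:   array of objects with .id and .name (e.g. VC:VirtualMachine instances).
// findForType: function (type, id) returning the matching object, or null/undefined
//              when there is no match (assumed behaviour).
function lookupReplicatedVms(sourceVms, findForType) {
    var found = [];
    var missing = [];
    for (var i = 0; i < sourceVms.length; i++) {
        var replVm = findForType("SRM:UnassignedReplicatedVm", sourceVms[i].id);
        if (replVm) {
            found.push(replVm);
        } else {
            // Keep the VM name so the workflow can report what was skipped.
            missing.push(sourceVms[i].name);
        }
    }
    return { found: found, missing: missing };
}
```

Inside a vRO scriptable task the second argument would be a small function that calls Server.findForType(type, id). Passing it in as a parameter also makes the helper easy to exercise outside of vRO with a stub.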

Conclusion

Hopefully this has given you a glimpse into what you can achieve with these new automation capabilities. While there is a learning curve to vRealize Orchestrator there is also a lot of potential to streamline your operations. Being able to combine the automation capabilities of VR and SRM opens up a lot of new possibilities to explore.

Update: Presenting RPOs as a Dropdown list

One of my colleagues asked me how I was able to present the RPO values as a drop-down list. Doing that required configuring the presentation properties of the workflow. Within vRealize Orchestrator you are able to control how parameters are presented to the end user in the presentation tab. Here I simply added a set of Predefined answers to a field I called ReplicationTier (you can also order and group the parameters that you present to the end user in this tab).

vro-workflow-presentation

The selected value is then passed to a short script that parses the value selected by the user and determines what the RPO value, in minutes, should be set as:

RPO = 4 * 60; // set default to 240 minutes, i.e. 4 hours.
if (ReplicationTier.indexOf("Gold") > -1) {
    RPO = 15; // 15 minute RPO
} else if (ReplicationTier.indexOf("Silver") > -1) {
    RPO = 4 * 60; // 4 hour RPO
} else if (ReplicationTier.indexOf("Bronze") > -1) {
    RPO = 12 * 60; // 12 hour RPO
}

This is the Lookup RPO from tier script task in the first workflow schema image shared above. Of course this is not necessary. You could just ask for the RPO value as a number input (with min and max values), but I thought the dropdown selection was a nice option to demonstrate.
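
If the list of tiers grows, the if/else chain can get unwieldy. Here is a minimal table-driven sketch of the same lookup, using the same tier keywords and RPO values as the script above. The names RPO_TIERS, DEFAULT_RPO_MINUTES and lookupRpoMinutes are my own for illustration and are not part of the workflow shown in the screenshots.

```javascript
// Tier keyword -> RPO in minutes. Values match the if/else script above.
var RPO_TIERS = [
    { keyword: "Gold", minutes: 15 },
    { keyword: "Silver", minutes: 4 * 60 },
    { keyword: "Bronze", minutes: 12 * 60 }
];

// Same default as the original script: 240 minutes, i.e. 4 hours.
var DEFAULT_RPO_MINUTES = 4 * 60;

// Returns the RPO for the first tier whose keyword appears in the selected
// answer, or the default when no keyword matches.
function lookupRpoMinutes(replicationTier) {
    var tier = String(replicationTier || "");
    for (var i = 0; i < RPO_TIERS.length; i++) {
        if (tier.indexOf(RPO_TIERS[i].keyword) > -1) {
            return RPO_TIERS[i].minutes;
        }
    }
    return DEFAULT_RPO_MINUTES;
}
```

In the scriptable task this would be used as RPO = lookupRpoMinutes(ReplicationTier); adding a tier then only means adding a row to the table.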

What You Need to Know About SRM 6.0

With the launch of VMware Site Recovery Manager 6.0 here are some useful resources for people that want to learn more about the new features, roll out a fresh deployment, or who are looking to upgrade.

What’s New

Here are my posts about some key changes in SRM 6:

I’d also recommend checking out the blogs by GS Khalsa for SRM 6.0 and Jeff Hunter for vSphere Replication 6.0.

Where Can I Download the Bits?

Planning a Fresh Install or Upgrade to SRM 6?

Here are some resources you will find useful when planning the setup and deployment of SRM:

In addition to the SRM documentation I highly recommend reading the vSphere installation documentation and the following KB articles and white papers that are relevant to a multi-site and multi-product deployment:

Network Ports to Open for SRM and vSphere Replication 6.0

Because you do actually want to replicate the VMs don’t you?

What Do I Need to Know About Deploying SRM 6 in Larger Environments?

SRM 6.0 now supports up to 2,000 VMs replicated with vSphere Replication (up from 500 in the 5.8 release). SRM 6.0 continues to support protection of up to 5,000 VMs with array based replication.

What Site Topologies Does SRM 6 Support?

SRM 6.0 continues to support a variety of deployment topologies:

Where Can I Learn More?

The SRM Administration Guide is a great resource, and Eric Shanks’ SRM 5.8 guide is also largely applicable to SRM 6.0. If you have any questions on SRM I’d recommend posting in the SRM community at VMTN.

 

SRM 6 Inventory Mapping Improvements

With the release of SRM 5.8 the user interface was significantly updated to integrate with the vSphere Web Client for the first time. As part of the update of the user interface we improved a lot of things, like being able to add paired array managers at the same time, creating reverse inventory mappings, or my personal favorite enabling rule based IP reconfiguration.

When demoing these new improvements to customers the feedback on these changes was very positive. One piece of feedback that I heard consistently (even in the SRM 5.8 beta) was the need to make it easier to create inventory mappings, especially at scale. As a result of that customer feedback, one of the UI enhancements in the recent SRM 6.0 release is streamlined inventory mapping for networks and folder structures.

Introducing A New Option To Create Inventory Mappings by Matching Folder and Network Names

Picture a scenario where you have a large number of folders or networks that you want to create inventory mappings for. In SRM 5.8 that process involved you either creating the mapping one-by-one in the user interface (and being able to create the reverse mapping automatically) or automating the process via the SRM public API or VRO plug-in. Now with SRM 6.0 you can just select the root of a hierarchy you want to map and all the child elements will be automatically matched by name for you.

If you maintain consistent folder or network naming across sites this could potentially save a lot of time in creating the initial inventory mappings, especially for large inventories.
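
To make the idea of name-based matching concrete, here is a small conceptual sketch. It is not how SRM implements the feature; it simply pairs up the children of two roots whose names are identical and lists the ones left over. The function name matchChildrenByName is hypothetical, and exact, case-sensitive matching on a single level of children is my assumption.

```javascript
// Conceptual sketch only, not SRM's implementation.
// sourceChildren / targetChildren: arrays of folder or network names found
// directly under the selected root at each site.
function matchChildrenByName(sourceChildren, targetChildren) {
    var targetNames = {};
    targetChildren.forEach(function (name) {
        targetNames[name] = true;
    });

    var mapped = [];
    var unmatched = [];
    sourceChildren.forEach(function (name) {
        if (Object.prototype.hasOwnProperty.call(targetNames, name)) {
            mapped.push({ source: name, target: name });
        } else {
            // No identically named element on the other site: map it manually.
            unmatched.push(name);
        }
    });
    return { mapped: mapped, unmatched: unmatched };
}
```

With source children Finance, HR and Engineering and target children Finance, HR and Sales, the sketch maps Finance and HR and leaves Engineering to be mapped by hand, which is why consistent naming across sites matters so much.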

Walkthrough Auto-Mapping Folders by Name

A lot of customers use folders to organize their VM inventory. In this example the VMs are organized by department and the same naming scheme is used on both sites for consistency. Here are the two sites, Anaheim and Boulder, with all the departmental folders organized under a top level “Production” folder at each site.

srm-6-mapping-1

When creating an inventory mapping you are now prompted to either select the existing mapping behavior where you select items manually, or the new automatic behavior based on matching names. Choose the new option and proceed to the next screen.

srm-6-mapping-2

Now you will select the source and target “roots” for the folders you want to map. You choose the root folder on the left for the first site, followed by the target folder on the right for the second site. Then click the Add mappings button to generate the automatic mappings.

srm-mapping-3

A small confirmation box will pop-up showing the results of the automatic mapping.

srm-mapping-4

Next you can review the suggested mappings and go on to the next step in the mapping process.

srm-mapping-5

The final step is to decide whether you want to create the reverse mappings or not for these folders. If you do you can either select them one by one, or just click the “Select all applicable” link to select them all at once and then complete the folder mapping dialog by clicking finish.

srm-mapping-6

Walkthrough Auto-Mapping Networks by Name

Just as we could use name based matching to speed up the creation of inventory mappings for folders, we can do the same thing for networks. Here’s an abbreviated walkthrough showing the same approach to configuring inventory mappings of distributed switches.

First we can see that we have some distributed switches where the associated port groups have matching names across the two sites.

srm-mapping-7

In the same way we could select automatic mapping for folders we can select the option to automatically map our networks as well.

srm-mapping-8

For the next step we select the distributed switches as the root of the mapping on both sites and click the “Add mappings” button to generate the matches.

srm-mapping-9

After dismissing the popup and reviewing the proposed mappings we can continue on with the rest of the wizard to completion.

Summary

If you are doing a small scale SRM deployment with just a couple of folders or port groups these enhancements are not going to be a huge deal. If however you deal with tens of folders or networks and have adopted consistent naming across both sites, there is the potential for this to make your initial setup of SRM much more efficient.

Further Reading

SRM 6.0 Simplified Certificates

One of the improvements I was most happy to see in VMware Site Recovery Manager 6.0 was the simplified experience deploying SRM with external certificates. With earlier SRM releases external certificates were used to both authenticate the SRM instances with each other and also authenticate SRM servers to their associated vCenter instance. This dual purpose meant that there were several requirements and restrictions placed on external certificates that made it more difficult to quickly deploy SRM when using external certs.

With the integration of SRM 6.0 with SSO the certificate requirements (imposed by the dual usage of certificates) could be relaxed compared to earlier releases. These improvements will make it easier to deploy SRM with external certificates. The SRM 6.0 Installation and Configuration guide provides full details of the updated certificate requirements. A short list of the improvements, taken from the guide:

  • “If you use a custom certificate for vCenter Server and Platform Services Controller, you are not obliged to use a custom certificate for Site Recovery Manager, and the reverse.”
  • “Unlike in previous releases, there is no requirement for the certificate to also be a client certificate.”
  • “The Subject Name does not need to be the same for both members of a Site Recovery Manager Server pair.”

Another improvement in this release is that SRM 6.0 will warn customers who try to use certificates with SHA1 signature algorithms (SHA256 or stronger is recommended). Also in this release the insecure MD5 signature algorithm is no longer supported with SRM.

While improved certificate handling is a fairly small improvement (and there’s still more room to improve) I do think it is indicative of the focus that the SRM team has been putting on improving the overall operational experience of the product.

Further Reading

SRM 6 and vSphere 6 Storage DRS (SDRS) Improvements for Array Based Replication

With the announcement of vSphere 6 one of the touted features was improved integration between Site Recovery Manager 6 and vSphere 6’s Storage DRS feature when using array based replication. Since SRM 5.5 and vSphere 5.5 the two capabilities have been supported together with some caveats. With the newly announced release the integration between the two will be much simpler. This post will focus on array based replication but as noted in Cormac’s post the SDRS and vSphere Replication integration has also improved.

Okay, Tell Me More About Those Array Based Replication Caveats

Since SRM 5.5 you have been able to use Storage vMotion to move protected VMs between datastores belonging to the same consistency group. However there was nothing to warn users when they were migrating a VM outside of the consistency group (or even onto non-replicated datastores) or prevent Storage DRS doing the same.

To work around this there are two key things customers have been doing:

So What Changes in vSphere 6 and SRM 6?

The SRM and Storage DRS teams have worked together to make SDRS aware of which datastores are being replicated and the consistency group membership. When SRM discovers replicated devices it associates information with the appropriate datastores in vSphere (e.g. that it is replicated and the consistency group). This information is then used by SDRS in deciding which automatic moves it can make. SDRS will not perform any automatic migrations that would impair the recoverability of a VM with SRM.

With this change you don’t have to be as fine-grained with your storage clusters, or alternatively as coarse in your failover granularity. Now you can mix non-replicated and replicated datastores in the same cluster, and even replicated datastores belonging to different consistency groups.

This reduces the operational burden of working with SRM and SDRS and provides much more flexibility in how you choose to organize your datastores and datastore clusters.
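
The rule SDRS follows can be pictured with a small conceptual check. This is only a sketch of the behaviour described above, not the actual SDRS algorithm: the datastore objects, their replicated and consistencyGroup fields, and the function name moveKeepsRecoverability are all hypothetical. I am also assuming that a VM sitting on a non-replicated datastore has no array based protection to lose, so any move is acceptable for it.

```javascript
// Conceptual sketch only, not the SDRS algorithm.
// A datastore is described as { replicated: boolean, consistencyGroup: string|null }.
function moveKeepsRecoverability(sourceDatastore, targetDatastore) {
    if (!sourceDatastore.replicated) {
        // Assumption: nothing to protect, so the move cannot impair recoverability.
        return true;
    }
    // A VM on a replicated datastore must stay inside the same consistency group.
    return targetDatastore.replicated === true &&
        targetDatastore.consistencyGroup === sourceDatastore.consistencyGroup;
}
```

In a mixed datastore cluster, a check like this would allow a move between two datastores in the same consistency group and rule out a move to a datastore in a different group or to a non-replicated datastore.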

srm-sdrs

[Update March 17th 2015] – There is now a KB article, How Site Recovery Manager Handles Storage DRS Tagging (2108196), that goes into some more detail on how SRM and SDRS interoperate using tags.

Further Reading

VMware SRM Topologies

One of the topics that people often ask me about is what datacenter topologies VMware SRM supports. The good news is that SRM does have flexible capabilities for a number of topologies to help support more complex use cases beyond the typical two site deployment.

Shared Recovery Site (VC at Remote Sites)

The shared recovery site use case is one of the more commonly seen topologies outside of a two datacenter deployment. This is a use case that has been supported by SRM for multiple releases and provides a good option for customers that are looking to protect virtual machines at remote offices that have their own vCenter instances.

robo-remote-vc

With this use case the customer is able to share the resources used for recovery amongst several remote offices. This provides flexibility in providing the resources required at the recovery site. A conservative approach would provide sufficient capacity to failover all remote sites to the shared recovery site. An alternative approach would be to oversubscribe the recovery site resources on the assumption that you wouldn’t need to recover all the remote sites at once. This second approach obviously has some benefits in terms of the potential for reduced capital expense at the recovery site but may add risk if there is a chance that all the remote sites may actually need to fail over at the same time.
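
The difference between the two sizing approaches is simple arithmetic. As a rough illustration, the sketch below compares the capacity needed to fail over every remote site with the capacity needed if you assume only a limited number of sites will fail over at once. The function name, the demand figures and the "largest N sites" rule for the oversubscribed case are all assumptions made for the example.

```javascript
// Illustrative sizing arithmetic, with made-up numbers.
// siteDemands: capacity each remote site needs at the recovery site (any unit).
// maxConcurrentFailovers: how many remote sites you plan to recover at once.
function recoverySiteCapacity(siteDemands, maxConcurrentFailovers) {
    var largestFirst = siteDemands.slice().sort(function (a, b) { return b - a; });
    var add = function (total, value) { return total + value; };
    return {
        // Conservative: enough for every remote site at the same time.
        conservative: largestFirst.reduce(add, 0),
        // Oversubscribed: enough for the largest N sites only.
        oversubscribed: largestFirst.slice(0, maxConcurrentFailovers).reduce(add, 0)
    };
}

var sizing = recoverySiteCapacity([40, 25, 25, 10], 2);
// sizing.conservative is 100, sizing.oversubscribed is 65
```

The gap between the two numbers is the potential capital saving, and also the shortfall you would face if every remote site did need to fail over together.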

One of the advantages of this topology, with remote vCenters at each site, is that local IT retain full manageability of the infrastructure. Even in the event of a network outage between the central data center and the remote site the hosts at the remote site remain accessible. In contrast one of the disadvantages of this approach is that each SRM pair must be at the same major and minor version and each vCenter server must be a supported version for the SRM server joined to it. This means that upgrades of SRM or vCenter across the set of data centers should be carefully planned and coordinated.

Shared Recovery Site (Central VC)

While the above topology is a common approach to providing shared recovery site capabilities it is not the only approach to achieving this. If there isn’t a need for each remote site to have its own vCenter instance then a simple single-pair SRM deployment could be adapted to meet the use case.

robo-central-vc

For this deployment you would deploy two vCenter instances, one to manage the resources at the recovery site and one to manage the resources at the remote site. An instance of SRM for each vCenter would be deployed and the two paired together. You could then create recovery plans for each remote site but have a single shared vCenter to consolidate management of the remote sites.

This approach has the advantage that upgrades of the vCenter and SRM instances would be somewhat simpler than the shared recovery site (fewer moving parts). Additionally you would be able to have different levels of recovery plan in the same SRM deployment. For example you could define a recovery plan for each remote office but also recovery plans for each region, grouping the remote offices together. The disadvantage of this approach is that the remote infrastructure couldn’t be managed by vCenter when you lose connectivity.

Additional Multi-Pair Topologies

In addition to the shared recovery site topologies defined above I am often asked about support for other multi-datacenter topologies. Prior to the release of SRM 5.8 VMware supported some additional topologies using the Request for Product Qualification (RPQ) process.

While the shared recovery topology is still the main use case for multi-pair topologies, the technology provided by SRM underneath actually supports more flexible configurations. The SRM documentation has recently been updated to remove some of the previous limitations and allow some more complex topologies to be supported based on the concept of multiple SRM pairs. From the SRM 5.8 Installation and Configuration Guide:

“In addition to the shared recovery site configuration, Site Recovery Manager also allows and supports shared protected site (1:N) and many-to-many (N:N) configurations.”

You can also begin with a standard two site SRM deployment and add site pairings later to build more complex topologies; this is a supported approach.

With the new flexibility a new set of potential use cases is now supported. For example a three site topology allowing a round robin like protection use case would be possible:

  • Site A’s workloads being protected to Site B
  • Site B’s workloads being protected to Site C
  • Site C’s workloads finally being protected back to Site A

other-srm-1

Another topology that might be of interest would be to have a traditional SRM pairing between two core data centers while also providing DR protection for remote offices to one (or even both) of the core data centers.

other-srm-2

The examples presented above are not intended to be exhaustive. As long as you don’t exceed the configuration maximums (use 10 or fewer SRM pairings) there are a wide variety of topologies that are enabled.
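
When sketching out a topology it can help to count how many SRM pairings it implies. The snippet below is a minimal planning aid, not an SRM API: it takes a list of protection directions, collapses them into distinct site pairs, and compares the count with the limit of 10 pairings mentioned above. The function names are hypothetical, and treating protection in both directions between two sites as a single pairing is my assumption.

```javascript
// Planning aid only. The limit comes from the configuration maximum quoted above.
var MAX_SRM_PAIRINGS = 10;

// protections: array of { from: "Site A", to: "Site B" } entries.
// Assumption: A -> B and B -> A share one SRM pairing.
function countPairings(protections) {
    var seen = {};
    protections.forEach(function (p) {
        var key = [p.from, p.to].sort().join(" <-> ");
        seen[key] = true;
    });
    return Object.keys(seen).length;
}

function withinPairingLimit(protections) {
    return countPairings(protections) <= MAX_SRM_PAIRINGS;
}

// The round robin example from the list above.
var roundRobin = [
    { from: "Site A", to: "Site B" },
    { from: "Site B", to: "Site C" },
    { from: "Site C", to: "Site A" }
];
```

For the round robin example countPairings(roundRobin) gives three pairings, comfortably inside the limit.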

SRM 5.8 Improvements In Managing Multiple SRM Pairs

With the introduction of the web client in SRM 5.8 there have been several improvements in managing more complex topologies with SRM. The first improvement is the increase in supported SRM scale, up to 5,000 virtual machines from 1,000 in previous releases. This makes it easier to consider deploying SRM across multiple pairs of data centers.

One of the big user interface improvements is gained by integrating with the inventory lists of the web client. It is now easy to quickly switch between multiple site pairings.

srm-site-summary-screen

The lists of protection groups and recovery plans are also consolidated so you can see plans belonging to multiple site pairings at once.

multi-site-protection-groups

In addition to the consolidated views the various wizards to create new protection groups and recovery plans (for example) let users choose the target SRM pairing during creation.

Finally, the SRM installer has also been enhanced to make it easier to deploy multiple pairs of SRM servers without having to specify command line parameters to the installer. For the initial SRM pairing you would use the default identifier. For subsequent pairings, each SRM instance in the pair should be installed using the same custom Plug-in Identifier (and each pairing should have a distinct identifier).

srm-multi-install

Replication Topology Support

The SRM 5.8 Installation and Configuration guide states:

Site Recovery Manager supports point-to-point replication. Site Recovery Manager does not support replication to multiple targets, even in a multi-site configuration.

While SRM pairs do allow you to failover distinct workloads to different recovery sites, SRM doesn’t currently support orchestrating the failover of the same workload to different recovery sites. This means that SRM only supports managing a replicated datastore with one SRM pairing.
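
A quick way to sanity check a replication design against this restriction is to look for any datastore that is replicated to more than one recovery site. The sketch below works on a plain list describing the planned replication; the field names datastore and targetSite and the function name findMultiTargetDatastores are hypothetical and do not come from SRM.

```javascript
// Planning aid only, not an SRM API.
// replications: array of { datastore: "ds-name", targetSite: "Site B" } entries.
// Returns the names of datastores that have more than one distinct target site.
function findMultiTargetDatastores(replications) {
    var targetsByDatastore = {};
    replications.forEach(function (r) {
        if (!Object.prototype.hasOwnProperty.call(targetsByDatastore, r.datastore)) {
            targetsByDatastore[r.datastore] = {};
        }
        targetsByDatastore[r.datastore][r.targetSite] = true;
    });
    return Object.keys(targetsByDatastore).filter(function (name) {
        return Object.keys(targetsByDatastore[name]).length > 1;
    });
}
```

An empty result means every replicated datastore has a single target, so each one can be managed by exactly one SRM pairing.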

Conclusion

Hopefully this article has outlined some of the newly supported capabilities available to SRM users. Planning multi-site disaster recovery strategies takes careful thought as to what you are looking to achieve but the new capabilities make it even easier to address a variety of recovery requirements.

Other Resources

If you’re interested in learning more about SRM 5.8 then I’d recommend the official documentation, the product resource page, the uptime blog, and Eric Shanks’ guide to SRM 5.8.

Further Reading