15.5 Failover

In a failover operation, the failover workload within a PlateSpin Protect VM container takes over the business function of a failed production workload.

15.5.1 Detecting Offline Workloads

PlateSpin Protect constantly monitors your protected workloads. If an attempt to monitor a workload fails a predefined number of times, PlateSpin Protect generates a Workload is offline event. The criteria that determine and log a workload failure are part of a workload protection’s Tier settings. See the Tier Settings row in Workload Protection Details.
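The threshold behavior described above can be illustrated with a minimal sketch. This is not PlateSpin Protect code or its API. The class name OfflineDetector, the failure_threshold parameter, and the choice to count consecutive failures and reset the count on a successful attempt are all assumptions made for illustration. The actual criteria come from the Tier settings.

```python
from dataclasses import dataclass


@dataclass
class OfflineDetector:
    """Hypothetical model of threshold-based offline detection."""

    failure_threshold: int          # assumed stand-in for the Tier setting
    consecutive_failures: int = 0
    offline: bool = False

    def record_attempt(self, succeeded: bool) -> bool:
        """Record one monitoring attempt.

        Returns True only for the attempt that triggers the
        'Workload is offline' event.
        """
        if succeeded:
            # Assumption: a successful attempt resets the failure count.
            self.consecutive_failures = 0
            self.offline = False
            return False
        self.consecutive_failures += 1
        if not self.offline and self.consecutive_failures >= self.failure_threshold:
            self.offline = True
            return True
        return False


detector = OfflineDetector(failure_threshold=3)
attempts = [False, False, True, False, False, False, False]
events = [detector.record_attempt(ok) for ok in attempts]
```

In this sketch the event is raised once, on the attempt that reaches the threshold, and is not repeated while the workload stays offline.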

If notifications are configured along with SMTP settings, PlateSpin Protect simultaneously sends a notification email to the specified recipients. See Configuring Email Notification Services for Events and Replication Reports.

If a workload failure is detected while the status of the replication is Idle, you can proceed to the Run Failover command. If a workload fails while an incremental replication is underway, the job stalls. In this case, abort the command (see Aborting Commands), and then proceed to the Run Failover command. See Performing a Failover.
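The decision described above can be summarized in a short sketch. The function name failover_steps and the status label "Incremental" are hypothetical and are used only for illustration. Only the Idle status and the Run Failover command are named in this guide, and the sketch does not call any PlateSpin Protect interface.

```python
def failover_steps(replication_status: str) -> list[str]:
    """Return the operator actions to take after a workload failure.

    'Idle' is the status named in this guide. 'Incremental' is a
    hypothetical label for an incremental replication that is underway.
    """
    if replication_status == "Idle":
        return ["Run Failover"]
    if replication_status == "Incremental":
        # The stalled job must be aborted before the failover can run.
        return ["Abort", "Run Failover"]
    raise ValueError(f"status not covered by this sketch: {replication_status}")


idle_plan = failover_steps("Idle")
stalled_plan = failover_steps("Incremental")
```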

Figure 15-1 shows the Web Interface’s Dashboard page upon detecting a workload failure. Note the applicable tasks in the Tasks and Events pane:

Figure 15-1 The Dashboard Page upon Workload Failure Detection (‘Workload Offline’)

15.5.2 Performing a Failover

Failover settings, including the failover workload’s network identity and LAN settings, are saved together with the workload’s protection details at configuration time. See Failover Settings in Workload Protection Details.

You can use the following methods to perform a failover:

  • Select the required workload on the Workloads page and click Run Failover.

  • Click the corresponding command hyperlink of the Workload is offline event in the Tasks and Events pane. See Figure 15-1.

  • Run a Prepare for Failover command to boot the failover VM ahead of time. You still have the option to cancel the failover (useful in staged failovers).

Use one of these methods to start the failover process and select a recovery point to apply to the failover workload (see Recovery Points). Click Execute and monitor the progress. Upon completion, the replication status of the workload should indicate Live.

For testing the failover workload or testing the failover process as part of a planned disaster recovery exercise, see Using the Test Failover Feature.

15.5.3 Using the Test Failover Feature

PlateSpin Protect lets you test the failover functionality and verify the integrity of the failover workload. To do this, use the Test Failover command, which boots the failover workload in an isolated networking environment.

When you execute the command, PlateSpin Protect applies the Test Failover Settings, as saved in the workload protection details, to the failover workload. See Test Failover Settings in Workload Protection Details.

To use the Test Failover feature:

  1. Define an appropriate time window for testing and ensure that there are no replications underway. The replication status of the workload must be Idle.

  2. On the Workloads page, select the required workload, click Test Failover, select a recovery point (see Recovery Points), and then click Execute.

    Upon completion, PlateSpin Protect generates a corresponding event and a task with a set of applicable commands.

  3. Verify the integrity and business functionality of the failover workload. Use the VMware vSphere Client to access the failover workload in the VM container.

  4. Mark the test as a failure or a success. Use the corresponding commands in the task (Mark Test Failure, Mark Test Success). The selected action is saved in the history of events associated with the workload and is retrievable by reports. Dismiss Task discards the task and the event.

    Upon completion of the Mark Test Failure or Mark Test Success tasks, PlateSpin Protect discards temporary settings that were applied to the failover workload, and the protection returns to its pre-test state.
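The procedure above can be modeled as a small state sketch, which may help when scripting a checklist for a disaster recovery exercise. The class FailoverTestRun, its fields, and the recovery point label are hypothetical. The sketch mirrors only the rules stated in the steps (Idle precondition, result recorded in the history, return to the pre-test state) and does not interact with PlateSpin Protect.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverTestRun:
    """Hypothetical model of one Test Failover cycle."""

    replication_status: str = "Idle"
    history: list = field(default_factory=list)
    test_active: bool = False

    def execute(self, recovery_point: str) -> None:
        # Step 1: no replication may be underway.
        if self.replication_status != "Idle":
            raise RuntimeError("replication status must be Idle before testing")
        # Step 2: start the test from the selected recovery point.
        self.test_active = True
        self.history.append(f"Test Failover started from {recovery_point}")

    def mark(self, success: bool) -> None:
        # Step 4: record the outcome and return to the pre-test state.
        if not self.test_active:
            raise RuntimeError("no test failover is awaiting a result")
        self.history.append("Mark Test Success" if success else "Mark Test Failure")
        self.test_active = False


run = FailoverTestRun()
run.execute("recovery-point-1")   # hypothetical recovery point label
run.mark(success=True)
```

Step 3, verifying the failover workload, is a manual activity and is deliberately left out of the model.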