15.8 Scenarios for Backup and Restore

15.8.1 Scenario: Losing a Hard Drive Containing eDirectory in a Single-Server Network

Indira is the administrator for a single-server network at Stationery Supply, Inc. Indira can't rely on replication for fault tolerance, because her environment has only one server. The Backup Tool functionality provides a simple solution for Indira to back up and restore eDirectory. It's server-centric and it's fast.

On eDirectory 8.7.3 or later, Indira sets up unattended backups for her server using batch files to run the Backup Tool.

Indira wants to do a full backup of eDirectory every Sunday night, and an incremental backup every weeknight. She sets the unattended backups to run shortly before her full and incremental file system backups each night, so her tape backups contain the eDirectory backup files as well as the file system data. She has contracted with a remote data storage company to send the tape backups offsite.
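Indira's schedule can be sketched as a small decision function that a nightly batch job might consult. This is only an illustration: the function name is hypothetical, treating Saturday as a night with no eDirectory backup is an assumption (it reads "weeknight" as Monday through Friday), and the sketch does not invoke the Backup Tool itself.

```python
import datetime
from typing import Optional


def backup_kind_for(day: datetime.date) -> Optional[str]:
    """Return the kind of eDirectory backup scheduled for the night of `day`.

    Hypothetical helper: "full" on Sunday, "incremental" on weeknights,
    and None when no backup is scheduled.
    """
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 6:
        return "full"
    if weekday <= 4:
        return "incremental"
    # Assumption: "weeknight" means Monday-Friday, so Saturday has no backup.
    return None
```

A batch file could call such a function first and then start the matching full or incremental backup shortly before the file system backup runs.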

Every Monday morning, Indira checks the backup log to make sure the full backup was successful. She also checks the logs occasionally during the week to make sure the incremental backups were successful.

Indira decides not to turn on roll-forward logs for the following reasons:

She does not have a separate storage device on her server, so turning on roll-forward logs would not provide any additional backup of eDirectory. If there were a storage device failure, the logs would be lost along with eDirectory, so there is no point in creating them.
The tree does not change very much, and she is satisfied with being able to restore only up to last night's backup. She doesn’t need to be able to restore eDirectory to the moment before a failure.
Because the server does not participate in a replica ring with other servers, roll-forward logs are not required for the restore verification process to be successful.

Stationery Supply, Inc. decides to reorganize the staff, so Indira does a manual backup before and after making significant changes to the tree. Her strategy is to make a new backup of changes during the middle of a weekday when necessary, instead of running roll-forward logs all the time.

To make sure her backup strategy is ready to go when she needs it, Indira tests it occasionally. She doesn’t have the budget to purchase a second server for testing, so she makes arrangements with a test lab in her town. Using a server like hers in the test lab, she installs her operating system and tries to approximate the environment of her eDirectory database. She restores her backups and checks to make sure eDirectory is restored as she expects.

One Wednesday morning, the hard drive containing eDirectory on the server has a failure. Indira obtains a new hard drive and the backup files from the full backup on Sunday evening, the incremental backup on Monday evening, and the incremental backup on Tuesday evening. She installs the new hard drive and installs eDirectory on it. Then she restores the full and incremental backups. Any changes to the tree that were made on Wednesday morning before the hard drive failure are lost because Indira was not running roll-forward logs on the server. But Indira is satisfied with restoring only to last night's backup. She doesn’t feel that running roll-forward logs would be worth the administrative overhead.

15.8.2 Scenario: Losing a Hard Drive Containing eDirectory in a Multiserver Environment

Jorge at Outdoor Recreation, Inc. has 10 servers running eDirectory. He does full backups every Sunday night and incremental backups nightly, running the eDirectory backup shortly before the file system backup to tape.

All of the servers participate in a replica ring. Jorge uses roll-forward logging for all the servers. On each of his servers, he has placed the roll-forward logs on a different storage device from eDirectory. He monitors the free space and rights on those storage devices to make sure the roll-forward logs don't fill them up. Occasionally he backs up the roll-forward logs to tape and removes all except the one in use by eDirectory, to free up space.
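The free-space monitoring that Jorge does could be automated along the following lines. This is a minimal sketch using only the Python standard library. The function name and the threshold are assumptions, and the directory passed in would be wherever the roll-forward logs actually live on a given server.

```python
import shutil


def free_space_ok(log_dir: str, min_free_bytes: int) -> bool:
    """Report whether the device holding `log_dir` still has enough free space.

    `log_dir` is the (site-specific) roll-forward log directory and
    `min_free_bytes` is an administrator-chosen threshold; both are
    assumptions made for this sketch.
    """
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    return usage.free >= min_free_bytes
```

A scheduled job could call this for each server's roll-forward log directory and alert Jorge when it returns False, so that he can back up and remove old logs before the device fills.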

The administrative overhead of turning on continuous roll-forward logging is worth it to Jorge, because it gives him the up-to-the-moment backup required for servers that participate in a replica ring. This way, if he needs to restore a server, the restored server will match the synchronization state that other servers in the replica ring expect.

In his test lab, Jorge periodically tests his backup files to make sure his backup strategy will meet his goals.

One Thursday at 2:00 p.m., the Linux server named Inventory_DB1 has a hard drive failure on the drive containing eDirectory.

Jorge needs to gather the last full backup and the incremental backups since then, which will restore the database up to the point of last night's incremental backup at 1:00 a.m. The roll-forward logs have been recording the changes to the database since last night's backup, so Jorge will include them in the restore to bring the database back to the state it was in just before the hard drive failure.
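The selection rule in the paragraph above (the last full backup plus every incremental backup taken after it, up to the failure) can be sketched as follows. The `Backup` record and the function name are hypothetical and exist only for this illustration; the roll-forward logs that cover the time after the last incremental backup are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "full" or "incremental"
    finished: datetime  # when the backup completed


def restore_chain(backups: List[Backup], failure: datetime) -> List[Backup]:
    """Return the last full backup before `failure` and the backups after it.

    Backups that finished after the failure time are ignored.
    """
    usable = sorted(
        (b for b in backups if b.finished <= failure),
        key=lambda b: b.finished,
    )
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup precedes the failure")
    # Everything from the most recent full backup onward is needed.
    return usable[full_positions[-1]:]
```

For Jorge's Thursday failure, such a function would return the Sunday night full backup followed by the three incremental backups from Monday, Tuesday, and Wednesday nights.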

Jorge takes the following steps:

He gets a replacement hard drive for the server.
He gets the tape of the full backup for the server from the previous Sunday night.

The batch file he uses to run full backups every Sunday night places the backup file in /adminfiles/backup/backupfull.bk.

He had specified a file size limit of 200 MB in the backup configuration settings, so there are two backup files:
- backupfull.bk.00001 (250 MB)
- backupfull.bk.00002 (32 MB)
He also gets the tapes containing the incremental backups for Monday, Tuesday, and Wednesday nights.

The batch file he uses to run incremental backups every weeknight places the backup file in /adminfiles/backup/backupincr.bk.

Because he runs the same batch file every weeknight for the incremental backups of eDirectory, they all have the same filename. He needs to give them new names when he copies them back onto the server, because they all must be placed in the same directory during the restore.
Jorge installs the replacement hard drive.

In this case, the Linux operating system for the server was not on the hard drive that failed, so he does not need to install Linux.
Jorge restores the file system from tape backup for the disk partitions that were affected.
Jorge reinstalls eDirectory, putting the server into a new temporary tree (the restore puts it back into the original tree again later).
Jorge creates an /adminfiles/restore directory on the server, to hold the files to be restored.
He copies the full backup (the set of two files) into that directory.
He copies the incremental backups for Monday, Tuesday, and Wednesday nights into the directory.

Each of them is named backupincr.bk, so when he copies them into the directory he changes the filenames to:
- backupincr.mon.bk
- backupincr.tues.bk
- backupincr.wed.bk
NOTE: Full and incremental backups aren't required to be in the same directory together, but all the incremental backups must be in the same directory.
He uses iManager to restore eDirectory:
1. He goes into iManager and clicks eDirectory Maintenance > Restore.
2. He logs in to the server, using the context of the new temporary tree.
3. In the Restore Wizard - File Configuration screen, he does the following:
  
  Enters /adminfiles/restore for the location where he placed the backup files.
  
  Enters /adminfiles/restore/restore.log for the location where the restore log should be created.
4. In the Restore Wizard - Optional screen, he does the following:
  
  Checks Restore Database.
  
  Checks Restore Roll-Forward Logs.
  
  Enters the location of the roll-forward logs.
  
  (This is the separate location that he created specifically to hold the roll-forward logs. Because he placed them on a different hard drive than eDirectory, the hard drive failure did not affect them and they are still available.)
  
  Checks Restore Security Files.
  
  Checks Activate the Restored Database after Verification.
  
  Checks Open the Database after Completion of Restore.
  
  (He wants eDirectory to open if the restore verification is successful.)
He starts the restore and enters the filenames of the incremental backup files when prompted.
The restore verification is successful, so the database opens, back in its original tree.

The restore verification was successful because roll-forward logs were running on the server when the hard drive failed, and Jorge included the logs in the restore.
Jorge re-creates the roll-forward logs configuration on the server after the restore is complete, then he creates a new full backup.

The settings are reset to the default during a restore, which means roll-forward logging is turned off, so he has to turn it back on. The new full backup is necessary so that he is prepared for any failures that might occur before the next unattended full backup is scheduled to take place.

Jorge checks the way the server is running, and it appears to be normal.
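The renaming step in the procedure above, where three identically named incremental backup files must end up in one directory, could be scripted roughly as shown here. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes each night's backupincr.bk has already been retrieved from tape into its own separate location.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def stage_incrementals(sources: Dict[str, Path], restore_dir: Path) -> List[Path]:
    """Copy same-named incremental backups into `restore_dir` under unique names.

    `sources` maps a label such as "mon" to the path of that night's
    backup file; backupincr.bk with label "mon" becomes backupincr.mon.bk.
    """
    restore_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for label, src in sources.items():
        dest = restore_dir / f"{src.stem}.{label}{src.suffix}"
        if dest.exists():
            # Refuse to overwrite a file that is already staged.
            raise FileExistsError(str(dest))
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

Called with the labels mon, tues, and wed and a restore directory of /adminfiles/restore, this produces the three filenames that Jorge later enters when the restore prompts for the incremental backup files.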

15.8.3 Scenario: Losing an Entire Server in a Multiple-Server Environment

Bob is the administrator for 15 servers at GK Designs Company. He does full backups every Saturday night and incremental backups nightly, running the eDirectory backup shortly before the file system backup to tape.

All of the servers participate in a replica ring. Bob uses roll-forward logging for all the servers.

An electrical fire destroys one of the servers in a branch across town. Fortunately, all but one of the partitions held by this server are also replicated on other servers. Bob had turned on roll-forward logs on that server, but they were lost along with all the other server data, so he can't restore the eDirectory database on that server to the state it was in just before the server went down.

However, he is able to re-create the server's eDirectory identity by restoring with the existing backup files. Because Bob can't include the roll-forward logs in the restore, the server does not match the synchronization state that the other servers expect (see Transitive Vectors and the Restore Verification Process), so the restore verification process is not successful. This means that by default the eDirectory database is not opened after the restore.

Bob addresses the situation by removing this server from the replica ring, using DSRepair to change all the outdated replica information on the server to external references, and then re-adding a new copy of each partition to this server using replication from the other servers that hold the up-to-date replicas. These steps are described in Recovering the Database If Restore Verification Fails.

The one partition on this server that Bob had not replicated was a container that held network printing objects for the branch office location, such as a fax/printer and a wide-format color printer. This partition information can't be recovered by the method noted above because no other server has a replica. Bob must re-create the objects in that partition, and this time he chooses to replicate them on other servers for better fault tolerance in the future.
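A check like the one below could have flagged Bob's unreplicated printing partition before the fire. It is a sketch that works on a plain inventory of which server holds which partitions. The function name and the inventory format are assumptions, and gathering the inventory from eDirectory is outside its scope.

```python
from typing import Dict, List, Set


def single_replica_partitions(replicas_by_server: Dict[str, Set[str]]) -> Dict[str, str]:
    """Return {partition: server} for partitions held by exactly one server.

    `replicas_by_server` is a hypothetical inventory mapping each server
    name to the set of partitions it holds a replica of.
    """
    holders: Dict[str, List[str]] = {}
    for server, partitions in replicas_by_server.items():
        for partition in partitions:
            holders.setdefault(partition, []).append(server)
    return {p: servers[0] for p, servers in holders.items() if len(servers) == 1}
```

Any partition the function reports has no fault tolerance through replication: if its only server is lost and the restore can't be verified, the objects must be re-created by hand.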

Bob also re-creates the roll-forward log configuration after the server is back on line (because the restore turns it off and resets the settings to the default), and creates a new full backup as a baseline.

15.8.4 Scenario: Losing Some Servers in a Multiple-Server Environment

Joe administers 20 servers across three locations. At one location, a pipe bursts and water destroys 5 out of 8 servers.

Joe has eDirectory backups for all the servers. However, all the servers participate in a replica ring, and he is concerned about bringing them back into the tree without the roll-forward logs, which were also lost. He is not sure which servers to restore eDirectory on first or how to address inconsistencies between replicas. Because of the complex issues involved, he calls NetIQ Support for help in deciding how to restore.

15.8.5 Scenario: Losing All Servers in a Multiple-Server Environment

Delores and her team at Human Resources Consulting, Inc. administer 50 servers at one location.

For fault tolerance during normal business circumstances, they have created three replicas of each partition of their tree, so that if one server is down, the objects in the partitions it holds are still available from another server. They have also planned for recovery of individual servers by backing up all their servers regularly with the Backup Tool, turning on roll-forward logging, and storing the backup tapes at a remote location.

For disaster recovery planning, Delores and her team have also designated two of their servers as DSMASTER servers. They use two servers because their tree is large enough that more than one DSMASTER server is needed to hold a replica of every partition. Every partition in the tree is replicated on one of the two DSMASTER servers. No partition has a replica on both DSMASTER servers, so there is no overlap between them. This design is an important part of their disaster recovery plan.
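The two properties of the DSMASTER design described above (every partition is on a DSMASTER server, and no partition is on more than one) can be checked mechanically. The sketch below assumes a hand-maintained plan of which partitions each DSMASTER server holds. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Set


def check_dsmaster_plan(all_partitions: Set[str],
                        dsmasters: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems with a DSMASTER plan; empty means it is sound.

    A sound plan places every partition on exactly one DSMASTER server.
    """
    problems: List[str] = []
    seen: Dict[str, str] = {}  # partition -> first DSMASTER found holding it
    for server in sorted(dsmasters):
        for partition in sorted(dsmasters[server]):
            if partition in seen:
                problems.append(
                    f"{partition} is on both {seen[partition]} and {server}")
            else:
                seen[partition] = server
    for partition in sorted(all_partitions - set(seen)):
        problems.append(f"{partition} is not on any DSMASTER server")
    return problems
```

Running such a check whenever partitions are added or moved would help keep the disaster recovery design intact as the tree changes.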

In their test lab, Delores and her team periodically test the backups to make sure their backup strategy will meet their goals.

One night the Human Resources Consulting, Inc. building is damaged by a hurricane, and all the servers in the data center are destroyed.

After this disaster, Delores and her team first restore the two DSMASTER servers, which hold replicas of every partition. They use the last full backup and the subsequent incremental backups, but can't include roll-forward logs in the restore because the logs were lost when the servers were destroyed. Because Delores and her team planned the two DSMASTER servers so that they do not share replicas, the restore verification process is successful for both servers even though the roll-forward logs are not part of the restore. After the DSMASTER servers are restored, all the objects in the tree for Human Resources Consulting, Inc. are available again.

The DSMASTER servers are important because Delores and her team can use them to re-create the tree without inconsistencies after a disaster.

They were using roll-forward logs so they could restore a server to the state it was in at the moment before it went down, bringing it back to the synchronization state expected by other servers in the replica ring. This allows the server to resume communication where it left off, and receive any updates it needs from the other replicas to keep the whole replica ring in sync.

However, in this disaster situation, Delores and her team do not have the roll-forward logs. Without the roll-forward logs, only one server in a replica ring can be restored without errors—the first one they restore. For the rest of the servers, the restore verification process will fail because the synchronization states don't match what the other servers expect (see Transitive Vectors and the Restore Verification Process). If the restore verification fails, the restore process will not activate the restored eDirectory database.

Delores and her team anticipated this, and they have planned for it. They use the two DSMASTER servers as a starting point, which gives them only one replica of each partition. Those servers can be restored without verification errors, and then the replicas they hold can be used as masters to be copied onto all the other servers.
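Why the first restored server in a ring passes verification while later ones fail can be illustrated with a deliberately simplified model. This is not the actual eDirectory verification algorithm: it reduces each synchronization state to a single integer stamp, and the function name and data layout are assumptions made purely for illustration.

```python
from typing import Dict


def verification_passes(restored_stamp: int, peer_views: Dict[str, int]) -> bool:
    """Toy model of restore verification (not eDirectory's real logic).

    `restored_stamp` is how far the restored database reaches.
    `peer_views` maps each other server currently in the replica ring to
    the latest stamp it has already seen from the restored server.
    Verification fails if any peer has seen more than was restored.
    """
    return all(seen <= restored_stamp for seen in peer_views.values())
```

In this model a DSMASTER server that shares no replicas has an empty `peer_views`, so the check passes trivially. A server restored later without roll-forward logs is behind what its ring partners have seen, so the check fails, which matches the behavior the scenario describes.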

After restoring the DSMASTER servers, restoring the rest of the servers requires some extra steps. Delores and her team must restore each of the remaining servers by doing the following:

Making sure that the replicas on the DSMASTER servers are designated as master replicas.
Removing all the servers except the DSMASTER servers from the replica ring.
Restoring the full and incremental backups for each of the other servers.

Delores and her team know that the restore verification process will fail for the rest of the servers, because they could not use roll-forward logs in the restore for any of the servers. This leaves them with a restored database that is not activated.
Activating the restored database, but keeping it locked, using advanced restore options.
Using DSRepair to change all the replica information to external references.
Unlocking the restored database.

At this point the server has the same identity it did before but it will not try to synchronize replica information. Instead, it is prepared to receive a new copy of the replicas it held before.
Adding the replicas back on to each server by replicating them from the copy on the DSMASTER server.

Delores and her team have a pretty good idea which replicas were held by each server, but they can read the header of the backup files for each server to see a list of the replicas that were on the server at the time of the last backup.
Re-creating the roll-forward log configuration after the servers are back on line (since the restore turns it off and resets the settings to the default), and creating a new full backup as a baseline to prepare for any other failures that might happen before the next unattended full backup is scheduled.

(These steps are explained in more detail in Recovering the Database If Restore Verification Fails.)

Delores and her team have a lot of work to do, but they can get the tree itself up relatively quickly, and they can expect to recover the eDirectory identity for all of their servers.