17.7 Recovering the Database If Restore Verification Fails

The restore process includes a verification step that compares the transitive vectors of the eDirectory database on the server being restored with those of the other servers in the replica ring. For more information on the restore process, see Overview of How the Backup Tool Does a Restore and Transitive Vectors and the Restore Verification Process.
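To make the comparison concrete, here is a simplified Python model of the check. It is a sketch under stated assumptions, not eDirectory's implementation. A transitive vector is modeled as a dict that maps a replica ID to a (seconds, event) timestamp. Verification is assumed to pass only when the restored database's own timestamp is at least as recent as the timestamp that every other replica has recorded for the restored server. The names `verify_restore`, `Timestamp`, and the server IDs are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical model: a timestamp is (seconds, event). Tuples compare
# lexicographically, so a later second, or a later event in the same second, is newer.
Timestamp = Tuple[int, int]
TransitiveVector = Dict[str, Timestamp]


def verify_restore(
    restored_id: str,
    restored_vector: TransitiveVector,
    peer_vectors: Dict[str, TransitiveVector],
) -> bool:
    """Return True if no peer has seen changes from the restored server that
    are newer than what the restored database itself records.

    Simplifying assumption: only the restored server's own entry is checked.
    """
    own = restored_vector.get(restored_id, (0, 0))
    for vector in peer_vectors.values():
        seen = vector.get(restored_id, (0, 0))
        if seen > own:
            # A peer received updates that the restored files do not contain,
            # so data is missing (for example, roll-forward logs were not applied).
            return False
    return True


if __name__ == "__main__":
    restored = {"srv1": (1700000000, 3), "srv2": (1700000100, 1)}
    peers = {
        "srv2": {"srv1": (1700000500, 0), "srv2": (1700000600, 2)},
        "srv3": {"srv1": (1700000000, 3), "srv2": (1700000600, 2)},
    }
    # srv2 has seen srv1 changes newer than the restored database: prints False
    print(verify_restore("srv1", restored, peers))
```

In this model, being behind on other servers' entries is not treated as a failure, because normal replication can bring those up to date. Only changes that originated on the restored server and are missing from its own database indicate lost data.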

If the transitive vectors do not match, the verification fails. This usually indicates that data is missing from the files you used for the restore. For example, data might be missing for the following reasons:

  • You did not turn on roll-forward logging before the last backup was performed.

  • You did not include the roll-forward logs in the restore.

  • The set of roll-forward logs you provided for the restore was not complete.
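One way to catch the last case before restoring is to look for gaps in the set of logs you are about to supply. The following Python sketch assumes, hypothetically, that each roll-forward log can be identified by an increasing integer sequence number and that the restore needs every number in the range without gaps. The function name `find_missing_logs` is illustrative and is not part of the Backup tool.

```python
from typing import Iterable, List


def find_missing_logs(sequence_numbers: Iterable[int]) -> List[int]:
    """Return sequence numbers missing between the lowest and highest log provided.

    Hypothetical: assumes each roll-forward log carries an increasing integer
    sequence number and that the restore cannot skip any number in the range.
    """
    present = sorted(set(sequence_numbers))
    if not present:
        return []
    have = set(present)
    return [n for n in range(present[0], present[-1] + 1) if n not in have]


if __name__ == "__main__":
    provided = [1, 2, 3, 5, 6, 9]
    missing = find_missing_logs(provided)
    if missing:
        # Prints: Incomplete set of roll-forward logs; missing: [4, 7, 8]
        print("Incomplete set of roll-forward logs; missing:", missing)
    else:
        print("No gaps found in the provided logs")
```

This check finds only gaps inside the range you supply. It cannot tell whether logs newer than the highest one you provided are missing, which is another way the set can be incomplete.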

By default, the restored eDirectory database will not open after the restore if it is inconsistent with the other replicas.

If you have all the backup files and roll-forward logs necessary for a complete restore but did not provide all of them during the restore, run the restore again with the complete set of files. If the second restore is complete, verification should succeed and the restored database will open.

If you do not have all the backup files and roll-forward logs needed for a complete restore, verification cannot succeed, and you must follow the instructions in this section to recover the server. Here is an outline of what you can and cannot recover if verification fails:

  • You can still recover the server's identity and file system rights.

  • You cannot recover any replicas on this server from backup, but after you follow the recovery procedure in this section, the server can again hold the replicas it contained. You must remove the server from the replica ring, then use advanced restore options and the DSRepair tool to bring the server to a state where it can be put back in the replica ring. You can then re-add the desired replicas to it.

  • If this server held the only copy of a partition (no other server held a replica of that partition), that partition cannot be recovered.
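Before you start, it can help to sort the failed server's partitions into those you can re-add later and those that are lost. The Python sketch below assumes you have already collected, for each partition, the list of servers in its replica ring (for example, from iManager or from the backup file header). The function name `classify_partitions` and the partition and server names are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_partitions(
    replica_rings: Dict[str, List[str]], failed_server: str
) -> Tuple[List[str], List[str]]:
    """Split the failed server's partitions into (re_addable, lost).

    re_addable: another server still holds a replica, so a fresh copy can be
                re-added to the failed server after recovery.
    lost:       the failed server held the only replica.
    """
    re_addable: List[str] = []
    lost: List[str] = []
    for partition, servers in sorted(replica_rings.items()):
        if failed_server not in servers:
            continue  # the failed server was not in this replica ring
        others = set(servers) - {failed_server}
        (re_addable if others else lost).append(partition)
    return re_addable, lost


if __name__ == "__main__":
    rings = {
        "OU=Sales.O=Acme": ["srv1", "srv2"],
        "OU=Labs.O=Acme": ["srv1"],
        "O=Acme": ["srv2", "srv3"],
    }
    re_addable, lost = classify_partitions(rings, "srv1")
    print("Re-add after recovery:", re_addable)
    print("Cannot be recovered:", lost)
```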

Use the instructions in this section after verification fails to recover the server's identity and file system rights, and to remove the server from the replica ring and re-add it. When you have completed these steps and replication is complete, the server should function as it did before the failure (except for any partitions that were not replicated and therefore cannot be recovered).

First, complete Cleaning Up the Replica Ring. Then continue with Repair the Failed Server and Re-add Replicas to the Server.

17.7.1 Cleaning Up the Replica Ring

This procedure explains how to:

  • Reassign master replicas. If the failed server holds a master replica of any partition, you must use DSRepair to designate a new master replica on a different server in the replica list.

  • Remove replica list references to the failed server. Each server participating in a replica ring that included the failed server must be told that the failed server is no longer available.

Prerequisites

  • eDirectory is installed on the machine where you are trying to restore the failed server.

  • A restore was attempted, and the restore verification failed.

  • The eDirectory database is open and running, and the database named RST is still on the machine (left there by the restore process).

  • You know which replicated partitions were stored on the failed server. The replicas this server held are listed in the header of the backup file.

Procedure

To clean up the replica ring:

  1. At the console of one of the servers that shared a replica with the failed server, load DSRepair with the switch that lets you access the advanced options.

    • Windows: Use the -a switch.

    • Linux: Use the -Ad switch.

    For more information on how to run DSRepair with advanced options using the -a or -Ad switches, see DSRepair Options.

    WARNING: If you use DSRepair with -a or -Ad, some of the advanced options can cause damage to your tree.

  2. Select Replica and Partition Operations.

  3. Select the partition from whose replica ring you want to remove the failed server.

  4. Select View Replica Ring to see a list of servers that have replicas of the partition.

  5. (Conditional) If the failed server held the master replica, select another server in the replica ring, then select Designate This Server As the New Master Replica.

    The replica ring now has a new master replica. All replicas participating in the ring are notified that there is a new master.

  6. Wait for the master replica to be established. Make sure the other servers in the ring acknowledge the change before proceeding.

  7. Go back to View Replica Ring. Select the name of the failed server, then select Remove This Server from the Replica Ring.

    If you have not loaded DSRepair with -a or -Ad (depending on the platform) for advanced options, you will not see this option in the list.

    WARNING: Do not do this if the failed server holds the master replica. You can check this in the list of servers in the ring. If the failed server is the master, first designate a different server as the master as described in Step 5, then return to this step and remove the failed server from the replica ring.

  8. Log in as Admin.

  9. After reading the explanation message, enter your agreement to continue.

  10. Exit DSRepair.

    All servers participating in that replica ring are notified.

  11. Repeat this procedure on one server for each replica ring that the failed server participated in.
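The ordering rule in this procedure (designate a new master before removing a failed server that held the master) can be expressed as a small planning sketch. This Python model is illustrative only: `ReplicaRing` and `cleanup_plan` are hypothetical names, and the choice of the first remaining server as the new master is an assumed placeholder policy. In practice you choose the new master yourself in DSRepair.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaRing:
    master: str
    servers: List[str]  # every server in the ring, including the master


def cleanup_plan(rings: Dict[str, ReplicaRing], failed: str) -> List[str]:
    """List the DSRepair actions, in order, for each ring that contains the
    failed server: designate a new master first (if needed), then remove."""
    actions: List[str] = []
    for partition, ring in sorted(rings.items()):
        if failed not in ring.servers:
            continue  # this ring never included the failed server
        others = sorted(s for s in ring.servers if s != failed)
        if not others:
            # Sole replica: there is no ring left to clean up, and the data is lost.
            actions.append(f"{partition}: sole replica on {failed}, cannot be recovered")
            continue
        if ring.master == failed:
            # Assumed placeholder policy: pick the first remaining server.
            actions.append(f"{partition}: designate {others[0]} as the new master")
            actions.append(f"{partition}: wait for the new master to be acknowledged")
        actions.append(f"{partition}: remove {failed} from the replica ring")
    return actions


if __name__ == "__main__":
    rings = {
        "P1": ReplicaRing(master="srv1", servers=["srv1", "srv2", "srv3"]),
        "P2": ReplicaRing(master="srv2", servers=["srv1", "srv2"]),
    }
    for step in cleanup_plan(rings, "srv1"):
        print(step)
```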

To finish preparing the failed server to get new copies of the replicas, continue with the next procedure, Repair the Failed Server and Re-add Replicas to the Server.

17.7.2 Repair the Failed Server and Re-add Replicas to the Server

This procedure lets you change the replica information on the server to external references, so that the server does not consider itself to be part of a replica ring. After you remove the replicas from the server in this way, you can unlock the database.

After removing the replicas, you complete the procedure by re-adding the replicas to the server. This way, the server receives a new, up-to-date copy of each replica. When each replica has been re-added, the server should function as it did before the failure.

To remove replicas using DSRepair, and re-add them using replication:

  1. Make sure you have completed Cleaning Up the Replica Ring.

  2. Specify the advanced restore option to override the restore, then specify a log filename:

    dsbk restadv -v -l logfilename

    This advanced restore option renames the RST database (the database that was restored but failed the verification) to NDS but keeps the database locked.

  3. At the server console, change all the replica information on the server into external references using advanced options in DSRepair.

    • Windows: Click Start > Settings > Control Panel > NetIQ eDirectory Services. Select dsrepair.dlm. In the Startup Parameters field, type -XK2 -rd. Click Start.

    • Linux: Enter the following command:

      ndsrepair -R -Ad -xk2

    The -rd switch (Windows) or -R switch (Linux) repairs the local database and the replica.

    WARNING: If used incorrectly, DSRepair advanced options can cause damage to your tree.

  4. When the repair is finished, remove the lockout and open the database using the following advanced restore options in the eMBox Client:

    dsbk restadv -o -k -l logfilename

    The -o option opens the database, and the -k option removes the lockout.

  5. Use iManager to add the server back into the replica ring:

    1. In NetIQ iManager, click the Roles and Tasks button.

    2. Click Partition and Replica Management > Replica View.

    3. Specify the name and context of the partition you want to replicate, then click OK.

    4. Click Add Replica.

    5. Next to the Server Name field, click the Browse button, then select the server you just restored.

    6. Select the type of replica you want, click OK, then click Done.

    7. Repeat these steps for each replica ring that the server was participating in.

  6. Wait for the replication process to complete.

    The replication process is complete when the state of the replicas changes from New to On. You can check the state in iManager. See Viewing Information about a Replica for more information.

  7. To restore NICI security files, restore the NICI files by themselves first, then restart the NDSD server, and then restore the DIB.

  8. (Conditional) If you want to use roll-forward logging on this server, you must re-create your configuration for roll-forward logging to make sure it is turned on and the logs are being saved in a fault-tolerant location. After turning on the roll-forward logs, you must also do a new full backup.

    This step is necessary because during a restore, the configuration for roll-forward logging is set back to the default, which means that roll-forward logging is turned off and the location is set back to the default. The new full backup is necessary so that you are prepared for any failures that might occur before the next unattended full backup is scheduled to take place.

    For more information about roll-forward logs and their location, see Using Roll-Forward Logs.
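On Linux, Steps 2 through 4 of the procedure above use the command-line tools in a fixed order. The following Python sketch collects exactly those documented commands and, by default, only prints them (a dry run). It assumes you run the commands from a Linux shell rather than the eMBox Client, that `dsbk` and `ndsrepair` are on the PATH, and that each tool returns a nonzero exit status on failure, which you should verify for your version. It does not handle any interactive prompts the tools may display. The function names and the log path are hypothetical.

```python
import shlex
import subprocess
from typing import List


def recovery_commands(log_file: str) -> List[List[str]]:
    """Return the Linux commands from Steps 2-4 of this procedure, in order."""
    return [
        # Step 2: rename the RST database to NDS; the database stays locked.
        ["dsbk", "restadv", "-v", "-l", log_file],
        # Step 3: repair the local database and replica, changing the replica
        # information to external references (advanced options).
        ["ndsrepair", "-R", "-Ad", "-xk2"],
        # Step 4: open the database (-o) and remove the lockout (-k).
        ["dsbk", "restadv", "-o", "-k", "-l", log_file],
    ]


def run(log_file: str, dry_run: bool = True) -> None:
    for argv in recovery_commands(log_file):
        print("+", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a nonzero exit status, so
            # the sequence stops instead of opening a half-repaired database.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run("/var/tmp/restadv.log")  # hypothetical log path; dry run only prints
```

The remaining steps (re-adding replicas in iManager, waiting for the replica state to change from New to On, and re-enabling roll-forward logging) are left as manual steps in this sketch.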