IBM Support

About stale restorereplica oplogs

Question & Answer


Question

What is a stale restorereplica oplog, and how can it impact the IBM® Rational® ClearCase MultiSite® restore process and synchronizations?

Answer





The MultiSite restorereplica process involves a sequence of oplogs starting with a restorereplica and ending with restore_complete.


The restore_complete oplog is generated when all sites have responded to the restoring site, or when another restorereplica command is run using the -override flag (in ClearCase 2002.05.00 and later) or the -complete flag (for versions earlier than 2002.05.00).

Review technote 1131381 for more information about the MultiSite restorereplica process.

A restorereplica oplog is stale when the restoring site loses its restorereplica oplog.

This can occur due to one of the following reasons:
  • The restorereplica command is run again at the site, without the -replace option.

  • The site overwrites the VOB storage (restoring from backup again).

  • The site removes the VOB storage (replaces the replica instead).

An oplog cannot be "stale" at the originating site.

Note: It is impossible to generate a restore_complete oplog for a stale restorereplica oplog.
Syncreplica packets generated from a replica with a stale restorereplica oplog may be rejected at the restoring site with the error:

multitool error Replica incarnation for "<replica-name>" is old: <old-timestamp>, should be <new-timestamp>.

Review technote 1151039 for more information about incarnation errors.

Identifying a stale restorereplica oplog
A stale restorereplica oplog causes performance issues (exports can run slowly) until it is removed. If restorereplica was recently run on a replica in a family that is now experiencing slow syncreplica operations or unexpectedly large packets, a stale restorereplica oplog may be the cause.

Either of the following two procedures can help identify a potentially stale restorereplica oplog:
  • Manually dump the restorereplica oplog
    A manual dump of the oplogs can be used to help verify the presence of the restorereplica oplog.

    Run the following command once for each replica in the family, including .deleted replicas. Start with the replica that is suspected to have a stale restorereplica oplog.
    A full list of replicas, including .deleted replicas, can be obtained by running multitool lsepoch <local replica>

    multitool dumpoplog -invob <vobtag> -vreplica <replica> -long -name -from 0xFFFFFFFF

    If a restorereplica oplog is present, the command returns the full oplog; if not, nothing is returned.
  • Run the show_restores.pl script
    Run the show_restores.pl script (attached below), which outputs details about existing restorereplica oplogs. Refer to the readme file (attached below) for instructions on running the script.

For further investigation, send the above output to Rational Client Support, along with the output of cleartool -ver and the operating system information from the VOB server where the command or script was run.
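The manual dump procedure above requires one dumpoplog invocation per replica, starting with the suspect replica. The following Python sketch only builds those command lines so they can be reviewed before being run; it uses the flags exactly as given in this technote. The function name, the VOB tag, and the replica names are hypothetical and chosen for illustration.

```python
# Flags taken verbatim from the dumpoplog command shown in this technote.
DUMP_FLAGS = ["-long", "-name", "-from", "0xFFFFFFFF"]


def dumpoplog_commands(vobtag, replicas, suspect=None):
    """Return one 'multitool dumpoplog' argument list per replica.

    The suspect replica, if it is in the list, is moved to the front so
    that it is checked first.
    """
    ordered = list(replicas)
    if suspect in ordered:
        ordered.remove(suspect)
        ordered.insert(0, suspect)
    return [
        ["multitool", "dumpoplog", "-invob", vobtag, "-vreplica", name] + DUMP_FLAGS
        for name in ordered
    ]


if __name__ == "__main__":
    # Hypothetical VOB tag and replica names; take the real replica list
    # from the lsepoch output for your replica family.
    for argv in dumpoplog_commands("/vobs/example", ["rep-1", "rep-2", "rep-3"], suspect="rep-2"):
        print(" ".join(argv))
```

Each printed line can be pasted into a shell on the VOB server. Any replica whose command returns output has a restorereplica oplog present.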


Show_restores perl script:
show_restores.pl

Readme instructions for show_restores script:
show_restores.html

Removing a stale restorereplica oplog
  • The restorereplica oplogs can be removed with vob_scrubber.
  • The dumped oplog shows information such as:

    op= restorereplica

    replica_oid= d2a1259a.7a194c48.bcd1.6f:e6:ee:f4:1b:e2 (rep-1)

    oplog_id= 4294967295

    op_time= 25-Jul-07.19:33:04UTC  create_time= 25-Jul-07.19:37:18UTC

    If the "create_time" is old, calculate its age in days and adjust the "oplog -keep" line of vob_scrubber_params to a number that is less than the age of the stale oplog.

    Caution: This procedure will remove all oplogs older than the specified age based on their create_time. IBM Rational recommends keeping oplogs up to the age of your oldest VOB backup. If the stale oplog is too new for safe removal by scrubbing, use the procedure below or contact IBM Rational Client Support.

  • Where the originating replica still exists, run the complete restorereplica procedure again and ensure that a syncreplica packet containing the replacement restorereplica oplog is sent to all sites. They will import the restorereplica oplog with the new date to replace the stale one.

Note: The replica need not be restored from backup again; however, the original restore operation should already have been completed.
  1. Run restorereplica with optimization: multitool restorereplica <one available replica>
    Synchronize the restorereplica oplog to all replicas in the family, taking care that each replica imports the restorereplica oplog before the restore complete oplog that is sent in step 3.
    Review the "Optimizing the restoration process" information in the ClearCase MultiSite Administrator's Guide under the restorereplica topic (cleartool man restorereplica) for more information.
  2. Create and send an update packet to the restoring replica from the replica used as the argument in step 1. Import the packet at the restoring replica.
    Restoration should now be complete.
  3. Send a packet to all replicas in the family with the above restore complete oplog.

This procedure generates a new restorereplica oplog that overwrites any existing ones at the remote replicas, but it requires an update from only a single replica.
The subsequent "restoration complete" oplog should remove the newly created restorereplica oplog.
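For the vob_scrubber approach described earlier in this section, the age calculation can be sketched in Python as follows. The sketch reads the create_time field from dumped oplog text in the format shown above and returns the largest whole number of days that is still less than the oplog's age. The function names are hypothetical, and the two-digit year handling (69 to 99 read as 19xx, otherwise 20xx) is an assumption rather than documented ClearCase behavior.

```python
import math
import re
from datetime import datetime, timezone

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Matches a field such as: create_time= 25-Jul-07.19:37:18UTC
CREATE_TIME_RE = re.compile(
    r"create_time=\s*(\d{1,2})-([A-Za-z]{3})-(\d{2})\.(\d{2}):(\d{2}):(\d{2})UTC"
)


def parse_create_time(dump_text):
    """Return the create_time from dumped oplog text as a UTC datetime, or None."""
    match = CREATE_TIME_RE.search(dump_text)
    if match is None:
        return None
    day, mon, yy, hh, mm, ss = match.groups()
    month = MONTHS.get(mon.title())
    if month is None:
        return None
    # Assumption: two-digit years 69-99 mean 19xx, all others mean 20xx.
    year = int(yy) + (1900 if int(yy) >= 69 else 2000)
    return datetime(year, month, int(day), int(hh), int(mm), int(ss),
                    tzinfo=timezone.utc)


def largest_keep_days(create_time, now):
    """Largest whole number of days strictly less than the oplog's age.

    Returns None when the oplog is one day old or newer, because no
    positive whole-day value would be less than its age.
    """
    age_days = (now - create_time).total_seconds() / 86400
    keep = math.ceil(age_days) - 1
    return keep if keep >= 1 else None


if __name__ == "__main__":
    sample = "op_time= 25-Jul-07.19:33:04UTC  create_time= 25-Jul-07.19:37:18UTC"
    created = parse_create_time(sample)
    print(largest_keep_days(created, datetime.now(timezone.utc)))
```

Compare the result against the age of your oldest VOB backup before changing vob_scrubber_params, because scrubbing removes every oplog older than the chosen value, not only the stale one.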

If a stale restorereplica is present and you are unable to remove it with the options above, you should send the output collected in the previous section along with the output of a cleartool -ver and operating system information from the VOB server(s) to Rational Client Support for a tool to remove the stale oplog.

Note: If the VOB storage of a replica has been removed (or is otherwise unrecoverable), obtain advice through Rational Client Support on how best to remove its stale restorereplica.


Document Information

Modified date:
16 June 2018

UID

swg21131690