When is an exact copy not an exact copy?

A client was upgrading their core Oracle databases from 9.2 to 11.2. In the days before the upgrade, I had built a clone of the Production environment, including its standby databases, to validate the upgrade steps. On the day itself, the upgrade went smoothly. Afterwards, however, 2 of the 5 standby databases started returning unusual Oracle error messages when we queried their status to confirm they were in sync. In every other respect, the standby databases appeared normal.

After a few hours of debating whether we could safely go live with the new system, Oracle narrowed the root cause down to some unusual entries in the standby databases' control files. The Production standby databases had been running for almost two years by then, accumulating entries in their control files. The cloning process had cleared those entries out as a side effect, so they never had the chance to cause a problem during the trial upgrade. Thankfully, the offending records aged out of the control files after two weeks, and the failing function started working correctly again.

The lesson I drew from this was that even with exact copies, you can't control every variable. Production is always unique, and a little extra time in the implementation window can be a lifesaver.

The 'why' is almost always more interesting than the 'what'