This is How We Handle Problems

I had a production issue tonight. Still have one, actually. I’ve owned up to it, and here’s the email I’m sending to my management.

At 9:00 PM I took a backup of database_a and database_b prior to running the database migration scripts. Once the backups were finished, I began the migration process at approximately 9:20 PM.

I stopped the migration process at 10:15 PM after multiple failures and restarts. There were too many unknown cross-dependencies to continue rolling forward. At that point I called SUPPORT PERSON and explained the situation. I also called MANAGEMENT and left a voice mail. I then began restoring the production databases on SERVER_A. No changes were made to SERVER_FIGHTING_MONGOOSE or SERVER_C.

Once I had restored database_b and database_a on SERVER_A, I began seeing multiple failures from replication and the rest of SQL Server indicating severe problems with the physical disk structure. I immediately stopped all replication involving database_a and database_b on SERVER_A and began an integrity check of both databases using SQL Server’s built-in integrity check tool, DBCC CHECKDB. The CHECKDB for database_b finished at 11:15 PM with a clean bill of health. The CHECKDB for database_a is still running as of 11:31 PM.

Once the CHECKDB process for database_a is complete, I will begin re-initializing the subscription for database_a on SERVER_A. Following the successful completion of the database_a re-initialization, I will begin re-initializing the subscription to database_b. If you have any questions, feel free to contact me at 867-5309.

See what I did there?

  • I stated how we got into this mess – I dropped a running chainsaw into the SAN.
  • I outlined my decision-making process and took ownership of rolling back our production migration.
  • I described the situation after the migration was rolled back and provided an assessment based on what I had observed.
  • I outlined a course of action to mitigate our problems and restore our production database to an operational state as soon as possible.

Am I proud? Not really. I like it when things work. Am I tired and cranky? Yes. Will I get this fixed before I go to bed? Hell yeah. Is this something that I, in a sick way, live for? Only because it reminds me to keep studying and to stay on my toes.