Thursday, May 11, 2006

Bad Day

My day today :

  • Main production database decides to suck up every ounce of CPU power (from the server it resides on) and grinds the server to a complete halt.

  • At midday (12pm)...!

  • We managed to somehow gracefully shut down the beast.

  • Manager, thinking we might as well reboot the server, asks me to restart the box.

  • The server blue screens (The blue screen of death)...

  • The error msg shows : STOP: Cxxxxxxx unknown hard error

(Thank you to the Engineer at Microsoft for putting in such meaningful error messages).

  • Restart the server again - do recovery operation - cross fingers and pray it comes back up alive.

  • Server is Online

  • Recover database - Start database - database starts to grind server to a halt again - CPU 100%

  • Shut down database - Set job queues to Zero - Start database

  • Nothing runs when job queues are at Zero - which makes database useless - Business people start calling again

  • Alter job queues to 1/2 of what they were before (so at least something runs)

  • Everything back Online - 2pm

  • Sit back - breathe - Monitor Server - Start writing Incident report - Wish I had taken a sickie today and just played WoW at home.

I knew I should have stayed at home today. . . :(

