My day today:
- Main production database decided to suck up every ounce of CPU power on the server it resides on and ground the server to a complete halt.
- At midday (12pm)...!
- We somehow managed to gracefully shut down the beast.
- Manager, thinking we might as well reboot the server, asks me to restart the box.
- The server blue screens (the Blue Screen of Death)...
- The error message shows: STOP: Cxxxxxxx unknown hard error
(Thank you to the engineer at Microsoft for putting in such meaningful error messages.)
- Restart the server again - do recovery operation - cross fingers and pray it comes back up alive.
- Server is online
- Recover database - Start database - database starts to grind server to a halt again - CPU 100%
- Shut down database - Set job queues to zero - Start database
- Nothing runs when job queues are at zero - which makes the database useless - Business people start calling again
- Alter job queues to half of what they were before (so at least something runs)
- Everything back online - 2pm
- Sit back - breathe - Monitor server - Start writing incident report - Wish I had taken a sickie today and just played WoW at home.
I knew I should have stayed at home today... :(