Here you see our server room during a power outage caused by a malfunctioning transformer on a pole down the street. Kentucky Utilities told us that we had lost two phases of the power feed coming into campus, so some things worked and others did not. In the server room, one large backup power supply lost power and one didn’t, so most devices with dual power supplies were still running.

Anticipating the possibility of a total loss of power, we walked through our emergency procedures to cleanly shut down as many systems as we could. It turned out to be a nice review of our procedures as well as a good exercise for several members of the IT staff to walk through the steps and discuss the scenario. Did I mention that this outage started at 4:30pm on a Friday? Power was restored later that night and we had everything back up and running around 10pm.
I can readily summarize the potential disasters from our outage because we had just updated our emergency procedures documentation the week before (more on this later):
- At best, we’ll get about 15-20 minutes of uptime during a total power outage. Anything that does not automatically shut itself down when power loss is detected will go down hard, and we have plenty of things that will suffer that fate.
- We didn’t have a printed copy of our emergency procedures on hand, and in the dark we wouldn’t have been able to read them easily anyway.
- From the dark picture, you can’t see the worst problem of all – we have water sprinkler heads in our server room just like any other part of the building. Should we ever have a fire in the building that sets off the system, we are going to be in a world of hurt.
At least there are a few points that make me feel a little better (if only a little):
- We have emergency procedures documented and we know which systems are a priority to shut down cleanly.
- We have well-maintained backups of all systems, and those backups are stored in a different building from the systems themselves.
- Our administration is aware of our procedures and the precarious nature of our environment. We hope to make changes in the future as we grow our campus (more on this later, too).
- We turned a long work day into an excuse to grab pizza and beer on a Friday night.
Now, for the “more on this later” part. Just one week before, we had updated our emergency documents because I was going to present a summary of our vulnerabilities to my fellow cabinet members. Our board of trustees had recently formed a technology committee and I had given some folks a tour of the facilities and talked through our issues, which had generated a lot of questions.
I thought it was a good idea to make sure that all of our senior administrators had a grasp of exactly what kind of downtime we’d be looking at after a disaster. After all, several of us are actively involved in planning the future of our campus. What better way to solve our problem than to have everyone thinking about how a new building would make a great place for a data center with chemical fire suppression, diesel generator facilities and no bathrooms overhead!
I had presented this information to the cabinet on Tuesday, just three days before the outage.
On top of that, earlier on the very Friday of the outage I had visited some employees of the city’s emergency planning department. They had offered to give us some emergency alert beacons that we could place in strategic locations on campus, so I was meeting with them to discuss the particulars. We met in their emergency command post area that you see below. Ironically, we talked about how this area has no backup power sources whatsoever. We grumbled about budgets and planning and how everything is acceptable on paper right up until the time it breaks. If only I had known what was coming later that day!

To round it out, apparently Kentucky Utilities has no way to detect an outage like ours. They only found out about it once we started calling.
To be sure, we are not alone in our situation. I’ve seen many places that have no regular backups or that back up to a disk sitting right on top of the system in the same room. I’ve seen all manner of technology shops with no disaster plans or procedures to speak of. And on this particular Friday, I got a glimpse of what is going to happen to the city’s emergency planning headquarters when the power goes out.
But we can’t pat ourselves on the back for having properly documented emergency procedures while we continue to have sprinkler heads in our server rooms. It is much too easy to say “well, we’ve lived with it all this time…” or “at least we’ve got good backups” and feel like we’ve got it covered. You have to make the problem a priority and address it in some way if you are ever going to fix it. Maybe that should be our disaster recovery documentation.