How Many Days Can You Survive?

I saw a great post from DCAC on disaster plans and putting them to use after a fire in an LA data center. DCAC wasn't affected, and I wouldn't expect them to be. Denny, Joey, Monica, John, Kerry, and Meagan are all experts in running systems efficiently and effectively. They think about how to keep things working in the event of a disaster, and they have delivered presentations all over the world to help you get better at managing your own servers.

However, Denny brings up a great question in the post. How many days could you survive without IT systems? Days? Weeks? Have you ever been in that situation?

I'm sure many of us have experienced a failure of some sort: an application crash, dying hardware, or, most commonly, a lost Internet connection. These are all small disasters that are typically fixed quickly. Hopefully, they aren't issues you experience every week. If they are, maybe you need to call DCAC or someone else to fix more fundamental issues.

While it's unlikely that you'd lose all your systems, it could happen if you concentrate your resources in one data center, one region, one cloud provider, etc. We try to avoid single points of failure in IT, and that applies not just to individual applications but to your overall infrastructure design. Most of us depend on one authentication system, and a failure of our Active Directory could lock everyone out of our systems. Such failures are rare, and hopefully, your administrators have enough redundancy (and backups) to recover from this type of disaster.

I have experienced a few large failures at one enterprise. We were hit by a few viruses, including the SQL Slammer worm in the early 2000s. Our network was shut down for a couple of days, during which almost every system, from email to CRM to ticketing, was unavailable. Everyone had to use whatever paper systems they could to keep the business running.
While we likely lost some revenue from these outages, we learned we could survive a few days without our network. We also learned that we needed better virus scanning, better employee education, and a few more resources for the tech staff. Before those events, everyone assumed an outage would be bad, but had no idea how bad. I have no idea how much this cost us, but it didn't appear in a 10-Q, so it must not have been too bad.

I think lots of businesses could find ways to keep working if some systems were down. However, there can be costs, sometimes significant ones. Even if the company doesn't go out of business, some people might be let go because of lost revenue. That might not be the tech staff, at least in the short term, but how would you feel if your DR plan didn't work (or you didn't have one) and some co-workers lost their jobs?

The move to the cloud, and to more software-as-a-service systems, might help you survive local disasters, but if you have too many systems concentrated in one place, it is worth preparing a contingency plan. After all, even if this fire had happened in an Azure or AWS data center, moving and restoring all the systems from one data center to another could take time. Your systems might not even be the vendor's top priority, as cloud vendors have some large customers. It probably won't take months, but I wouldn't want to bet my job on any cloud vendor getting everything moved in less than a week.

If you're not in the cloud, make sure you have a plan. If you don't know how to build one, call DCAC or another consultant for help.

Steve Jones - SSC Editor