Hurricane Matthew Should Remind You to Check Your DR/BC Plans

The news is full of tragedy from Hurricane Matthew at the moment, and our hearts go out to those affected by the storm and its aftermath.

This storm has hit much of the Southeast US hard, and it should serve as a poignant reminder to practice, review, and triple-check your organization's DR and BC plans. You should review your processes and procedures yearly, with updates at least quarterly and whenever major changes to your operations or environment occur. Most organizations seem to exercise these plans quarterly, or at least twice a year, often running a full test once a year and tabletop exercises for the rest.

This seems to be an effective cycle and approach. 

We hope that everyone stays safe from the hurricane, and we are hoping for minimal impacts, but we also hope organizations take a look at their plans and give them a once-over. You never know when you just might need to be better prepared.

Got Disaster Recovery?

As the recent heavy storms in the Midwest have brought to my attention in a personal way, even the best-laid plans can have weaknesses. In my case, it was an inconvenience, but a good lesson.

I got a reminder about cascading failures in complex systems via the AT&T data network collapse (thanks to a crushed data center), as well as a frontline wake-up call about the importance of calculating generator gasoline supplies properly.

So, while you read this, I am probably out adding 30 gallons to my reserve. Plus, working on a “lessons learned” document with my family to more easily remember the things we continually have to re-invent every time there is a power outage of any duration. 
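That fuel math is worth writing down rather than re-inventing during every outage. Here is a back-of-the-envelope sketch; the consumption rate and outage lengths are illustrative assumptions, not specs for any real generator, so check your unit's manual for actual figures.

```python
# Rough generator-fuel planning sketch. The consumption rate and outage
# durations below are illustrative assumptions, not specs for a real unit.

def fuel_needed(gallons_per_hour: float, hours_of_outage: float) -> float:
    """Gallons required to ride out an outage of the given length."""
    return gallons_per_hour * hours_of_outage

def runtime(gallons_on_hand: float, gallons_per_hour: float) -> float:
    """Hours of runtime the current reserve actually buys you."""
    return gallons_on_hand / gallons_per_hour

# Example: a portable generator burning a hypothetical 0.75 gal/hour.
print(fuel_needed(0.75, 48))   # 36.0 gallons to cover a 48-hour outage
print(runtime(30, 0.75))       # 40.0 hours from a 30-gallon reserve
```

The point of the exercise is the second function: size the reserve from the outage you want to survive, not from how many cans happen to fit in the garage.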

I share with you these personal lessons for a couple of reasons. First, I hope you’ll take a few moments and update/review your own personal home plans for emergencies. I hope you’ll never need them, but knowing how to handle the basics is a good thing. Then move on to how you’ll manage trivialities of personal comfort like bandwidth, coffee & beer. 🙂

Lastly, I hope you take time to review your company's DR/BC plans as well. Now might be a good time to do exactly what I hope AT&T, Amazon, Netflix, Instagram, etc. are doing and get those plans back in line, with attention to the idea that failures can, and often do, cascade. This wasn't an earthquake, tsunami, or hurricane (though we did have 80+ mph winds) – it was a thunderstorm. Albeit a big thunderstorm, but a thunderstorm nonetheless. We can do better. We should expect better. I hope we all get better at such planning.

As always, thanks for reading, and until next time, stay safe out there.

PS – The outpouring of personal kindness and support from friends, acquaintances and family members has been amazing. Thank you so much to all of the wonderful folks who offered to help. You are all spectacular! Thank you!

Are Your Disaster Recovery Plans Ready For A Disaster?

One data center just found out that theirs wasn't, and many of its customers were also caught with no backup servers, relying only on the data center's disaster recovery. On Saturday, the ThePlanet data center experienced an explosion in its power room that knocked approximately 9,000 servers offline, affecting over 7,500 customers. ThePlanet was unable to restore power to those servers for over a day, because the fire department would not let them turn the backup power on.

Two separate issues can be seen here. First, the data center's disaster recovery plan failed to recover it from a disaster. An explosion in the power room is quite unlikely, but it can happen, as seen here, and they were not prepared for it. Perhaps they could have worked with the fire department during disaster recovery policy creation to identify ways that backup power could be supplied while the power room was down. Or, with five data centers (as ThePlanet has), they could have kept hot spare servers at the other sites to fail over to. We don't know the details of their policy or exactly what happened yet, so we can only speculate about ways the downtime could have been prevented.

Second, many customers found out the hard way not to rely on someone else's disaster recovery plans. These sites could have failed over to a site at another data center, or even to a backup at their own site, but they weren't prepared, assuming that nothing could happen to the data center their server was at.

The lesson learned from this mistake is that disasters happen, and you need to be prepared. No disaster scenario should be ignored just because "it's not likely to happen." So take a look at your plans, and if you host at a data center and your website is critical, make sure there is a backup at a separate data center or at your own site.
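A backup at a second site only helps if it is actually current, so it pays to check freshness automatically. Below is a minimal sketch, assuming each site records the Unix timestamp of its last successful backup; the site names and the one-day threshold are hypothetical placeholders, not any real service's API.

```python
# Minimal backup-freshness check across sites. Assumes each site reports
# the Unix timestamp of its last successful backup; site names and the
# threshold are hypothetical placeholders.
import time

MAX_AGE_SECONDS = 24 * 3600  # flag any backup older than one day

def stale_sites(last_backup, now=None):
    """Return the sites whose most recent backup exceeds the age threshold."""
    now = time.time() if now is None else now
    return [site for site, ts in last_backup.items()
            if now - ts > MAX_AGE_SECONDS]

# Example: primary data center, a second data center, and an on-site copy.
now = 1_000_000.0
backups = {
    "primary-dc": now - 2 * 3600,     # 2 hours old: fresh
    "secondary-dc": now - 30 * 3600,  # 30 hours old: flagged
    "on-site": now - 1 * 3600,        # 1 hour old: fresh
}
print(stale_sites(backups, now))  # ['secondary-dc']
```

Run something like this on a schedule and alert on a non-empty result; an offsite backup that silently stopped updating is exactly the kind of quiet failure these incidents expose.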