Everything Fails

Everything is horrible! Wait, that’s not the message I want to send at all.

[Image caption: I wonder if these guys have an HA/DR plan…]

Planning for Failure Should Be Comprehensive

Think about the last time you thought about high availability and disaster recovery… You’re lying, nobody ever thinks about HA and DR. Not until something is already on fire, at least. Now, pretending you did think about HA and DR at some point in the distant past, how far down the rabbit hole did you go? Were there two servers? Did each server have redundant NICs? Redundant power supplies? Were you using RAID? Did you think about the UPS? Every component in the system needs to be considered when you’re looking into HA and DR. Using an AlwaysOn Availability Group, clustering, or database mirroring isn’t enough on its own. Those features are only one layer; the hardware, network, and power underneath them can still fail.

Failure Has Consequences

Let’s use a specific example instead of talking in the abstract. We’ll assume that you’ve decided those super fast consumer grade SSDs are the way to go. You’ve planned the rest of your deployment. You’ve got an AlwaysOn Availability Group. You’re ready to go. Right? There’s still one more thing to talk about: power. See, most of those consumer grade SSDs don’t have any kind of battery or capacitor backing their write cache. And, as you might know, disks lie: a drive can report a write as complete while the data is still sitting in volatile cache. So we can’t really be sure that our writes are permanently stored unless the computer shuts down safely. Which always happens when the power goes out, right? In this particular case, we need to keep worrying about power. What happens if the power fails? Is this server connected to a UPS? What happens when the UPS kicks in? Is there a backup generator? Will the server stay on? If it can’t, can the server be shut down automatically, and what does that look like?
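
To make the "disks lie" point concrete, here is a minimal Python sketch of what an application can do to ask for a durable write. The function name `durable_write` and the file name are made up for illustration. Note what it does not promise: `os.fsync` asks the operating system to push data to the device, but a drive with a volatile cache and no power-loss protection may still acknowledge the write before the data is truly safe.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it to the storage device.

    Hypothetical helper for illustration. Even after this returns, a drive
    that caches writes in volatile memory can lose them on power failure.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device


if __name__ == "__main__":
    # Usage example: write to a scratch directory and read the data back.
    target = os.path.join(tempfile.mkdtemp(), "example.dat")
    durable_write(target, b"committed")
    with open(target, "rb") as f:
        print(f.read())
```

The software side is the easy part. Whether the bytes survive a sudden power cut still depends on the drive, the UPS, and whether the server gets a clean shutdown.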

Ask Awful Questions

Being prepared has everything to do with asking yourself terrible questions. Work through the entire stack and come up with as many ways for things to fail as you can. Explore how you’d prevent these scenarios. You can’t provide a mitigation for _everything_ that you come up with, but it’s good to think of these things. Once you’ve got your List of Awfulness, work the feasible things into your HA and DR plans. Make sure that you’re covered as best as you can. Sometimes it makes sense to sweat the small stuff.
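
One way to keep a List of Awfulness honest is to write it down as data. The sketch below is only an illustration: the `FailureMode` class, its fields, and the sample entries (including which ones are marked feasible) are assumptions, not a prescribed format. It separates the items you can mitigate from the ones you are knowingly accepting.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One awful question about one component (hypothetical structure)."""
    component: str
    question: str
    feasible_to_mitigate: bool
    mitigation: str = ""


def plan_items(modes):
    """Failure modes that belong in the HA and DR plan."""
    return [m for m in modes if m.feasible_to_mitigate]


def accepted_risks(modes):
    """Failure modes you thought about but cannot reasonably mitigate."""
    return [m for m in modes if not m.feasible_to_mitigate]


if __name__ == "__main__":
    # Sample entries are illustrative; the feasibility flags are assumptions.
    awfulness = [
        FailureMode("power", "What happens if the power fails?", True,
                    "UPS plus automatic shutdown"),
        FailureMode("network", "What if a NIC dies?", True, "redundant NICs"),
        FailureMode("storage", "Does the SSD lose cached writes?", False),
    ]
    for m in plan_items(awfulness):
        print(f"PLAN: {m.component}: {m.mitigation}")
    for m in accepted_risks(awfulness):
        print(f"ACCEPTED RISK: {m.component}: {m.question}")
```

Keeping the accepted risks on the list matters as much as the mitigations: it records that the question was asked, even when the answer was "we can't fix that right now."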

  “Explosión” by kinojam is licensed under CC BY-NC-SA 2.0