As Adam Savage, the unbereted half of the A-team Mythbusters, is fond of noting, failure is always an option, whether you’re talking meat-fueled cannons, head-on tractor-trailer collisions, or Amazon Web Services. Moreover, failure is inevitable. Desirable, even. But MTBF increases (certainly desirable, though not inevitable) are to be expected, as long as you take the time to learn from your failures. And this one, having the widespread effects that it did and coming as early in the game as it did, was especially instructive.
So, what did we learn from the recent AWS failure?
- James Cameron is psychic, the architect of the AWS outage, or a robot (pardon – cybernetic organism) puppeted by Skynet, which gets a kick out of telegraphing the exact date of our destruction and laughing up its virtual sleeve as we proceed to blithely ignore it. That, or Coincidence just can’t help making a joke now and then.
- Running in multiple availability zones (AZs) in the same region won’t protect you when the region itself experiences difficulties. And since AZs are not fully isolated from one another, problems can propagate from one to the next.
- Elastic Block Store (EBS) and Relational Database Service (RDS) are not covered by Amazon’s SLA. EC2 connectivity is, but that wasn’t a problem in this case.
- This problem is not unique to the cloud. Any application running in any single data center, whether yours or someone else’s, is downtime waiting to happen.
- Multisite redundancy (all your eggs replicated in multiple baskets located in multiple physical locations) is a good thing, but it’s a relatively expensive thing.
- You have to quantify the cost of being down for a given length of time and compare it to the cost of spreading your service over multiple physical regions. The AWS outage gives us all an incentive to figure this out and a real-world case we can plug the numbers into.
- What doesn’t kill us makes us stronger.
- Test your failover responses so that you don’t get killed by the next outage. Because it will come. After all, it’s always an option.
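To put rough numbers on the downtime-versus-redundancy comparison from the list above, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function names are hypothetical, and the outage frequency, outage length, hourly cost, and extra multi-region spend are invented figures, not data from the AWS outage.

```python
def annual_downtime_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly loss from outages, in the same currency as cost_per_hour."""
    return outages_per_year * hours_per_outage * cost_per_hour


def multi_region_net_benefit(single_region_loss, multi_region_loss, extra_yearly_spend):
    """Positive: redundancy saves more than it costs. Negative: it does not."""
    return (single_region_loss - multi_region_loss) - extra_yearly_spend


# All figures below are invented for illustration; plug in your own.
single = annual_downtime_cost(outages_per_year=1, hours_per_outage=24, cost_per_hour=5_000)
multi = annual_downtime_cost(outages_per_year=1, hours_per_outage=1, cost_per_hour=5_000)
net = multi_region_net_benefit(single, multi, extra_yearly_spend=60_000)

print(f"single-region loss: {single}, multi-region loss: {multi}, net benefit: {net}")
```

With these made-up inputs the net benefit comes out positive, so the second region would pay for itself. Cut the hourly cost of downtime far enough and the sign flips, which is exactly the comparison worth running before you start buying baskets.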