5 Lessons Learned from the June 29, 2012 AWS Outage
Discussing a difficult situation is never fun, and I have been wrestling with how to start this post. It’s about revealing unpleasant cloud truths, and not necessarily the truths you might be expecting to hear. I am not here to preach, but my message to you is important. For the past five years I have been working on a project that uses the cloud to its fullest potential, celebrating the victories and learning from the defeats.
I’m speaking to my fellow Amazon cloud citizens. My co-tenants, if you will, in the “Big House of Amazon.” We’re all living together in this man-made universe with its own version of “Newtonian Laws” and “Adam Smith” economics. 99.99% of the time all is well… until out of the blue it’s not, and chaos upends polite cloud society.
If you lost data or sustained painful hours of application downtime during Amazon’s June 29 US-East outage, then you can only wag your finger in blame while looking in the mirror.
I know, I know, the cloud is supposed to be cheap AND reliable. We’ve been telling ourselves that since 2007. But this latest outage is an important wake-up call: we’re living in a false cloud reality.
Lesson 1: Follow the Cloud Rules
Up front, you were told the “rules of the cloud”:
- Expect failure on every transaction
- Backup or replicate your data to other intra-cloud locations
- Buy an “insurance policy” for worst case scenarios
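The first rule above — expect failure on every transaction — means, in practice, that every call to a cloud service should be wrapped in retry logic with backoff. Here is a minimal Python sketch of that idea; the function names (`call_with_retries`, `flaky_put`) are illustrative, not part of any AWS SDK:

```python
import random
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.5):
    """Run `operation`, retrying on failure with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the failure to the caller
            # Exponential backoff with jitter spreads retries out so that
            # thousands of clients don't hammer a recovering service in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Hypothetical flaky operation: fails twice, then succeeds --
# the kind of transient error the rule tells you to plan for.
attempts = {"count": 0}

def flaky_put():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient service error")
    return "stored"

result = call_with_retries(flaky_put, base_delay=0.01)
```

The jitter matters as much as the backoff: during a regional event like June 29, synchronized retries from every tenant can keep a recovering service down.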
These rules fly in the face of the popular notion that the cloud is “cheaper” than do-it-yourself hosting.
There is a silver lining to this dark cloud event: everyone in the cloud will learn and improve, so we don’t have to repeat this episode ever again.