A Fire Took a Cloud Data Center Offline for 12 Hours. What Was the DR Plan?
Last week delivered two uncomfortable reminders that "the cloud" is, physically, a building.
On May 7, a fire at a NorthC data center near Amsterdam cut power to IBM Cloud's AMS3 zone. The region stayed degraded for roughly 12 hours. Brittany Ferries, a company that presumably never thought much about Dutch electrical rooms, watched its reservation system go down as collateral damage.
The same week, an availability zone in AWS us-east-1 suffered a thermal event. Overheating in a single zone rippled into multi-hour outages at Coinbase, FanDuel, and CME. One hot room, three household names offline.
The question nobody asks until it's burning
Neither of these was a security failure. Nobody got hacked. The infrastructure simply did what physical infrastructure occasionally does — it failed, completely, in one place.
So the only question that mattered that morning was: what happens to your business when one building goes away for a day?
For companies with a tested disaster recovery plan, the answer was a failover and a status page update. For everyone else, it was a very long day of refreshing someone else's incident dashboard — a position with no moves available.
What an actual answer looks like
- Backups that don't share fate. A backup in the same facility as production protects you from mistakes, not from fires. Off-site replication is the floor, not the ceiling.
- A standby that exists before you need it. With no standby in place, recovery time is dominated by provisioning. If the answer to "where would we restore to?" is "we'd figure it out," your realistic recovery time objective (RTO) is measured in days.
- A drill, on a calendar. Failover plans that have never been executed are hypotheses. The Amsterdam fire was somebody's first test — don't let your first test be real.
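The three checks above are simple enough to write down as an audit. What follows is a minimal sketch in Python, not a real tool: the `DRPlan` fields, the `audit` function, the facility names, and the 180-day drill interval are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DRPlan:
    # Every field here is a hypothetical stand-in for your own inventory.
    production_facility: str
    backup_facilities: List[str]
    standby_facility: Optional[str]  # None means "we'd figure it out"
    last_drill: Optional[date]       # None means the plan was never executed


def audit(plan: DRPlan, today: date, max_drill_age_days: int = 180) -> List[str]:
    """Return a list of findings; an empty list means all three checks pass."""
    findings = []

    # 1. Backups must not share fate with production.
    offsite = [f for f in plan.backup_facilities if f != plan.production_facility]
    if not offsite:
        findings.append("no backup outside the production facility")

    # 2. A standby must already exist, somewhere other than production.
    if plan.standby_facility is None:
        findings.append("no standby provisioned")
    elif plan.standby_facility == plan.production_facility:
        findings.append("standby shares a facility with production")

    # 3. The failover must have been drilled within the allowed interval
    #    (180 days is an assumed default, not a recommendation).
    if plan.last_drill is None:
        findings.append("failover has never been drilled")
    elif today - plan.last_drill > timedelta(days=max_drill_age_days):
        findings.append("last drill is older than the allowed interval")

    return findings


if __name__ == "__main__":
    # Example: everything lives in one building and nothing was ever tested.
    plan = DRPlan("site-a", ["site-a"], None, None)
    for finding in audit(plan, date.today()):
        print(finding)
```

A plan that returns no findings is not proven to work; it has only cleared the floor. The drill is still what turns the hypothesis into evidence.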
We've spent years building and drilling exactly these plans for businesses that can't afford the long day. The week the building catches fire is a bad week to start.