The dangers of big cloud providers
Dystopian fiction focuses on large technology companies and AI systems taking over the world. The reality is that they are taking it down.
As more of the planet is digitized, and those digital services are concentrated on a small number of cloud providers, we are following a recipe for disaster. And it’s only getting worse.
This December, the world’s largest cloud provider, Amazon Web Services, suffered two major outages across three different regions.
Considering the vast number of companies that rely on AWS, it’s not clear just how many companies and crucial systems were impacted. Recreational services like Netflix, Call of Duty, and Tinder were all brought down, but far more critical networks could have also been affected.
Among those caught up by the outages was Amazon’s own logistics network. The $1.7 trillion company will be able to survive the holiday shopping gridlock, but its employees are ultimately the ones who will suffer.
Not only were they offered unpaid time off during the outage, but they will also be forced to work overtime to catch up on the backlog. Amazon Flex drivers, its contract delivery workers, were also told their pay may come late due to the outage.
At such a large scale, with so many caught up in the outage, it’s hard to know just how many people will have suffered because of this.
As more services go digital, the problem will only get worse. Last year, a patient died after a ransomware attack took down a hospital’s IT system, delaying operations. It won’t be long before an AWS outage takes out an entire city’s hospitals.
There is also little hope that IT administrators and CIOs will take the risk seriously. Despite running the cloud itself, even Amazon failed to prepare properly.
Its status page was tied to a single region, rendering it useless during the first outage. Its logistics network and a multitude of internal programs did not follow best practices, and collapsed when only one region went down. If Amazon doesn’t use AWS properly, how can we expect others to?
Equally, don’t expect AWS and others to simply eliminate outages. Human error and device failures will always happen. As AWS gets bigger, the complexity of keeping it running is spiraling out of control. During December’s outage, staff joined a 600-person phone call where they speculated about external attacks and other nefarious activities.
“The more I read AWS’s analysis of the us-east-1 outage, the less confident I find myself in my understanding of failure modes and blast radii,” Duckbill Group’s Corey Quinn said on Twitter. “It’s not at all clear that AWS is fully aware of them, either.”
Don’t expect multi-cloud to necessarily help, either. “The idea of being in multiple clouds for resilience is a red herring,” Quinn said. “They end up going down three times as often because they now have exposure to everyone’s outages, not just AWS’s.”
So, we’re living in a world where more critical services are hosted on fewer cloud providers, which are themselves getting unmanageably complex. This means dangerous outages are inevitable.
We can do our best to mitigate and prepare by following best practices, but we should be ready for outages to happen nonetheless. And when they do, we should at least make sure the workers at the bottom are paid.