Amazon says 'unexpected behaviour' caused huge cloud outage

NEW YORK (BLOOMBERG) - Amazon.com said automated processes in its cloud computing business caused cascading outages across the Internet this week, affecting everything from Disney amusement parks and Netflix videos to robot vacuums and Adele ticket sales.

In a statement Friday (Dec 10), Amazon said the problem began Dec 7 when an automated computer program - designed to make its network more reliable - ended up causing a "large number" of its systems to behave unexpectedly. That, in turn, created a surge of activity on Amazon's networks, ultimately preventing users from accessing some of its cloud services.

"Basically, a bad piece of code was executed automatically and it caused a snowball effect," Forrester analyst Brent Ellis said. The outage persisted "because their internal controls and monitoring systems were taken offline by the storm of traffic caused by the original problem".

Amazon explained the failure in a highly technical statement posted online. The problems began about 10.30am New York time on Dec 7 and lasted several hours before Amazon managed to resolve them.

In the meantime, social media lit up with complaints from consumers angered that their smart home gadgetry and other internet-connected services had suddenly ceased to work.

Some experts said the explanation does not help users fully understand what went wrong.

"They don't explain what this unexpected behaviour was and they didn't know what it was. So they were guessing when trying to fix it, which is why it took so long," said Mr Corey Quinn, cloud economist at Duckbill Group.

Amazon Web Services (AWS) is generally a reliable service. The cloud division last suffered a major incident in 2017, when an employee accidentally turned off more servers than intended during repairs of a billing system.

Still, the latest outage reminded the world how many products and services are centralised in common data centres run by just a handful of big tech companies like Amazon, Microsoft and Alphabet's Google.

There is no easy fix to the problem. Some analysts believe companies should duplicate their services across multiple cloud computing providers so no one crash puts them out of commission.

Others say a "multi-cloud" strategy would be impractical and could make companies even more vulnerable because they would be exposed to everyone's outages, not just AWS'.

"We know this event impacted many customers in significant ways," the company said in the jargon-filled statement. "We will do everything we can to learn from this event and use it to improve our availability even further."
