Facebook blames 'faulty configuration change' for nearly six-hour outage across apps

Facebook and WhatsApp, along with Instagram, went dark at around midnight on Monday Singapore time. PHOTO: AFP

WASHINGTON (REUTERS, BLOOMBERG, NYTIMES) - Facebook Inc blamed a “faulty configuration change” for a nearly six-hour outage on Monday (Oct 4) that prevented the company’s 3.5 billion users from accessing its social media and messaging services such as WhatsApp, Instagram and Messenger.

In a blog post late on Monday, the company did not specify who executed the configuration change or whether it was planned.

Several Facebook employees who declined to be named had earlier said that they believed that the outage was caused by an internal mistake in how Internet traffic is routed to its systems.

The failures of internal communication tools and other resources that depend on that same network in order to work compounded the error, the employees said. Security experts have said an inadvertent mistake or sabotage by an insider were both plausible.

“We want to make clear at this time we believe the root cause of this outage was a faulty configuration change,” Facebook said in the blog.

The Facebook outage is the largest ever tracked by web monitoring group Downdetector.

The outage was the second blow to the social media giant in as many days after a whistle-blower on Sunday accused the company of repeatedly prioritising profit over clamping down on hate speech and misinformation.

As the world flocked to competing apps such as Twitter and TikTok, shares of Facebook fell 4.9 per cent, their biggest daily drop since last November, amid a broader sell-off in technology stocks on Monday. Shares rose about 0.5 per cent in after-hours trade following resumption of service.

“To every small and large business, family, and individual who depends on us, I’m sorry,” Facebook chief technology officer Mike Schroepfer tweeted earlier in the day, adding that it “may take some time to get to 100 per cent”.

“Facebook basically locked its keys in its car,” tweeted Professor Jonathan Zittrain, director of Harvard’s Berkman Klein Centre for Internet & Society.

Far-reaching impact 

The outage showcased just how dependent the world has become on a company that is under intense scrutiny. Although it lasted only about six hours, its impact was far-reaching and severe.

Facebook has built itself into a linchpin platform with messaging, live streaming, virtual reality and many other digital services. In some countries, such as Myanmar and India, Facebook is synonymous with the Internet.

More than 3.5 billion people around the world use Facebook, Instagram, Messenger and WhatsApp to communicate with friends and family, distribute political messaging, and expand their businesses through advertising and outreach.

Facebook is used to sign in to many other apps and services, leading to unexpected domino effects such as people not being able to log into shopping websites or sign into their smart TVs, thermostats and other Internet-connected devices.

“Today’s outage brought our reliance on Facebook – and its properties like WhatsApp and Instagram – into sharp relief,” said Cornell University communications professor Brooke Erin Duffy. “The abruptness of today’s outage highlights the staggering level of precarity that structures our increasingly digitally mediated work economy.”

Technology outages are not uncommon, but to have so many apps go dark from the world’s largest social media company at the same time was highly unusual. Facebook’s last significant outage was in 2019, when a technical error affected its sites for 24 hours, in a reminder that a snafu can cripple even the most powerful Internet companies.

Facebook, which is the world’s largest seller of online ads after Google, was losing about US$545,000 (S$740,000) in United States ad revenue per hour during the outage, according to estimates from ad measurement firm Standard Media Index.
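As a rough back-of-the-envelope check, the hourly estimate can be multiplied by the length of the outage. The short Python sketch below assumes a duration of 5.75 hours (midnight to about 5.45am Singapore time, as reported in this article) and a constant hourly loss; both are simplifications, and the figure covers US ad revenue only.

```python
# Back-of-the-envelope estimate of US ad revenue lost during the outage.
# Assumptions (not from Standard Media Index): a constant hourly loss and
# an outage lasting from midnight to 5.45am Singapore time.

HOURLY_LOSS_USD = 545_000          # estimate cited in the article
OUTAGE_HOURS = 5 + 45 / 60         # 5 hours 45 minutes = 5.75 hours

def estimated_loss(hourly_loss: float, hours: float) -> float:
    """Return the total estimated loss for an outage of the given length."""
    return hourly_loss * hours

total_loss = estimated_loss(HOURLY_LOSS_USD, OUTAGE_HOURS)
print(f"Estimated US ad revenue lost: US${total_loss:,.0f}")
```

Under these assumptions the total comes to a little over US$3.1 million.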

Past downtime at Internet companies has had little long-term effect on their revenue growth, however.

Twitter on Monday reported higher-than-normal usage, which led to some issues in people accessing posts and direct messages.

In one of the day’s most popular tweets, video-streaming company Netflix shared a meme from its new hit show Squid Game captioned “When Instagram & Facebook are down”, which showed a person labelled “Twitter” holding up a character on the verge of falling labelled “everyone”.

Inside a Facebook group for ad buyers, one member wisecracked after service returned that “lots of people searched today ‘how to run google ads for clients’ ”.

Workers scrambling 

Facebook’s services, including consumer apps such as Instagram, workplace tools it sells to businesses and internal programs, went dark at 1600 GMT (Monday midnight, Singapore time). Access started to return at around 5.45am Singapore time.

Soon after the outage started, Facebook acknowledged users were having trouble accessing its apps but did not provide any specifics about the nature of the problem or say how many users were affected.

The error message on Facebook’s webpage suggested an error in the Domain Name System (DNS), which allows Web addresses to take users to their destinations. A similar outage at cloud company Akamai Technologies took down multiple websites in July.

Inside Facebook, workers also scrambled because their internal systems stopped functioning. The company’s global security team “was notified of a system outage affecting all Facebook internal systems and tools”, according to an internal memo sent to employees. Those tools included security systems, an internal calendar and scheduling tools, the memo said.

Employees said they had trouble making calls from work-issued cellphones and receiving e-mails from people outside the company. Facebook’s internal communications platform, Workplace, was also taken out, leaving many unable to do their jobs. Some turned to other platforms to communicate, including LinkedIn and Zoom as well as Discord chat rooms.

Some Facebook employees who had returned to working in the office were also unable to enter buildings and conference rooms because their digital badges stopped working. Security engineers said they were hampered from assessing the outage because they could not get to server areas.

Facebook’s global security operations centre determined the outage was “a HIGH risk to the People, MODERATE risk to Assets and a HIGH risk to the Reputation of Facebook”, the company memo said.

A small team of employees was soon dispatched to Facebook’s Santa Clara, California, data centre to try a “manual reset” of the company’s servers, according to an internal memo.

Several Facebook workers called the outage the equivalent of a “snow day”, a sentiment that was publicly echoed by Instagram head Adam Mosseri.

Protocol failure

The Facebook outage on Monday occurred because of a problem with the company's Domain Name System, a crucial component of the Internet that is relatively unknown, at least to the masses.

Commonly known as DNS, it is like a phone book for the Internet. It is the tool that converts a Web domain, like Facebook.com, into the actual Internet protocol, or IP, address where the site resides. Think of Facebook.com as the person one might look up in the white pages, and the IP address as the physical address they will find.

When a DNS error occurs, Facebook.com can no longer be translated into the address of the servers that hold a user's profile page. That is apparently what happened inside Facebook, but at a scale that temporarily crippled the entire Facebook ecosystem.
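The phone-book analogy can be illustrated with a minimal Python sketch. The lookup table, the `resolve` function and the address (taken from the 192.0.2.0/24 range reserved for documentation) are hypothetical stand-ins, not Facebook's real records or any real DNS software.

```python
# Toy model of DNS as a "phone book" that maps domain names to IP addresses.
# Every name and address here is illustrative only.
from typing import Dict, Optional

PHONE_BOOK: Dict[str, str] = {
    "facebook.com": "192.0.2.10",  # hypothetical address from a documentation range
}

def resolve(domain: str, records: Dict[str, str]) -> Optional[str]:
    """Return the IP address listed for a domain, or None if no record is found."""
    return records.get(domain.lower())

# Normal operation: the name is found and an address comes back.
print(resolve("Facebook.com", PHONE_BOOK))

# A DNS failure, modelled here as the records being unavailable:
# the lookup returns None, so a browser would have nowhere to send the request.
print(resolve("Facebook.com", {}))
```

In this simplified model, the site's servers may be running normally, but without the lookup nobody can find them.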

The problem at Facebook appears to have its origins in the Border Gateway Protocol, or BGP. If DNS is the Internet's phone book, BGP is its postal service. When a user sends data over the Internet, BGP determines the best available paths that data can travel.

Minutes before Facebook's platforms stopped loading, a large number of changes were made to Facebook's BGP routes, public records show, Cloudflare chief technology officer John Graham-Cumming said in a tweet.
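A short Python sketch shows how route changes can make a destination unreachable. It assumes, for illustration only, that the changes amounted to routes being withdrawn; the article does not say what the changes were. The prefixes, peer names and helper functions are hypothetical, and picking the most specific matching prefix is a simplification of how real BGP routers choose paths.

```python
# Toy routing table: each advertised prefix maps to a next hop.
# Prefixes come from the 192.0.2.0/24 documentation range; peers are made up.
import ipaddress
from typing import Dict, Optional

RouteTable = Dict[ipaddress.IPv4Network, str]

def best_route(table: RouteTable, destination: str) -> Optional[str]:
    """Pick the next hop for the most specific prefix containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no route advertised: the destination is unreachable
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return table[most_specific]

def withdraw(table: RouteTable, prefix: str) -> None:
    """Remove an advertised prefix from the table, if it is present."""
    table.pop(ipaddress.ip_network(prefix), None)

routes: RouteTable = {
    ipaddress.ip_network("192.0.2.0/24"): "peer-A",
    ipaddress.ip_network("192.0.2.0/25"): "peer-B",
}

before = best_route(routes, "192.0.2.10")   # the /25 is more specific than the /24
withdraw(routes, "192.0.2.0/24")
withdraw(routes, "192.0.2.0/25")
after = best_route(routes, "192.0.2.10")    # nothing left that covers the address

print(before, after)
```

Once no advertised prefix covers an address, traffic bound for it has no path to follow, even if the machine at that address is still running.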

The outage added to Facebook’s mounting difficulties. For weeks, the company has been under fire over disclosures by a whistle-blower, Ms Frances Haugen, a former Facebook product manager who amassed thousands of pages of internal research.

She has since distributed the cache to the news media, lawmakers and regulators, revealing that Facebook knew of many harms that its services were causing, including that Instagram made teenage girls feel worse about themselves.

The revelations have prompted an outcry among regulators, lawmakers and the public. Ms Haugen, who revealed her identity on Sunday online and on “60 Minutes”, is scheduled to testify on Tuesday in Congress about Facebook’s impact on young users.