Here's what caused the longest Facebook outage last night
Late last night, popular social media platform Facebook and all its subsidiaries, including WhatsApp, Messenger, and Instagram, suffered a global outage that lasted nearly six hours. In the wake of the outage, alongside the memes, reports have surfaced suggesting it was caused by a failure in Border Gateway Protocol (BGP) routing. Here's how that could have taken the services offline.
Outage took down Facebook's internal tools, third-party login services too
The Facebook service outage last night took down all Facebook-owned platforms and services, including its internal webpages for employees (confirmed by tipster Jane Manchun Wong) and login services for third-party apps and platforms (confirmed by Pokémon GO developer Niantic). Soon after the outage, Facebook, Instagram, and WhatsApp separately tweeted acknowledgments of the outage. Facebook attempted to vaguely explain the cause via a blog post.
Initially, DNS issues were suspected to be the cause
Initial speculation suggested the outage was caused by a Domain Name System (DNS) issue. The DNS is like the internet's phonebook: it links the hostname in your browser's address bar to the IP address of the server where the website is hosted. When those lookups fail, browsers and apps cannot find the service at all, which is what users saw. However, experts suggested that DNS was just the symptom and the underlying issue was BGP routing.
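To make the phonebook comparison concrete, here is a minimal Python sketch of a name lookup. It is only an illustration: the hostnames, the addresses (taken from a range reserved for documentation), and the `resolve` helper are all hypothetical, and real DNS involves resolvers, caching, and many record types that are left out here.

```python
from typing import Optional

# Hypothetical records for illustration only.
# 192.0.2.0/24 is an address range reserved for documentation.
PHONEBOOK = {
    "www.example.com": "192.0.2.10",
    "chat.example.com": "192.0.2.20",
}


def resolve(hostname: str, records: dict) -> Optional[str]:
    """Return the IP address listed for a hostname, or None if there is no entry."""
    # Hostnames are case-insensitive and may carry a trailing dot.
    return records.get(hostname.lower().rstrip("."))


# Normal case: the name is in the phonebook.
print(resolve("www.example.com", PHONEBOOK))  # 192.0.2.10

# Outage case: the phonebook cannot be consulted, so the name leads nowhere.
print(resolve("www.example.com", {}))  # None
```

The second call models what users experienced: the name itself was still valid, but nothing could answer the question of where it lived.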
Later, experts clarified misconfiguration could've caused BGP routing to fail
BGP is the system networks use to tell each other which routes data should take to reach a destination on the data superhighway. Due to a misconfiguration, it appeared as though the BGP routes to Facebook vanished, sending all the queries destined for Facebook services into a bottomless abyss. WIRED reported that the BGP misconfiguration appears to have started at Facebook's end, bringing all its services down. The subsequent Facebook statement vaguely corroborated this theory.
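The effect of vanishing routes can be sketched with a toy routing table in Python. This is a simplified model under stated assumptions, not Facebook's configuration or a real BGP implementation: the `RouteTable` class, the peer names, and the prefixes (documentation-only ranges) are hypothetical, and real BGP also weighs path attributes and policies that are ignored here.

```python
import ipaddress
from typing import Optional


class RouteTable:
    """Toy routing table: maps announced prefixes to a next hop (hypothetical model)."""

    def __init__(self) -> None:
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix: str, next_hop: str) -> None:
        """Record that a prefix is reachable via next_hop."""
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        """Remove a previously announced prefix, if present."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        """Return the next hop for the most specific matching prefix, or None."""
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None  # no route: the traffic has nowhere to go
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
table.announce("192.0.2.0/24", "peer-A")      # documentation-only prefixes
table.announce("198.51.100.0/24", "peer-B")

print(table.lookup("192.0.2.53"))  # peer-A

# A bad configuration change withdraws the route.
table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.53"))  # None
```

Once the route is withdrawn, the destination address is unchanged, but no network knows how to reach it. That is the sense in which queries disappeared into an abyss.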
Facebook's statement said data centers couldn't communicate with each other
Facebook's official statement in the aftermath of the outage said the issue was caused by "configuration changes on the backbone routers that coordinate network traffic between our data centers." This reportedly set off a chain reaction affecting how the data centers communicate, bringing services to a grinding halt. Facebook added that the inability to use outage-stricken internal tools further complicated diagnosis and rectification operations.
Google didn't miss the opportunity for some tongue-in-cheek humor
"What do we do now? Gmail?" - Google UK (@GoogleUK), October 4, 2021