 
How a rare software bug triggered the global AWS outage
What's the story
Amazon Web Services (AWS) recently suffered a major outage that affected numerous websites and online services worldwide. The issue stemmed from a rare software bug in one of Amazon's most critical systems, not from a hardware failure or an external attack. The glitch was triggered by "faulty automation" within the company's internal systems, in which two independent programs raced each other while updating the same records.
Impact
Automation error led to the deletion of critical network entries
The automation error led to the deletion of critical network entries for AWS's DynamoDB database service. This set off a domino effect that temporarily disrupted several other AWS tools. The company has since disabled the faulty automation worldwide and plans to fix the bug before reactivating it. AWS also intends to implement new safety checks and improve system recovery times in case of similar incidents in the future.
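To make the failure mode concrete, here is a minimal Python sketch of how two programs racing to update the same record can end with that record being deleted. It is an illustration only, not AWS's code: the `RecordStore` class, the `db.example` name, the version numbers, and the addresses are all hypothetical, and the version check shown is just one possible kind of safety check, assumed here for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Hypothetical stand-in for a table of network entries."""
    records: dict = field(default_factory=dict)  # name -> (version, addresses)

    def apply_unchecked(self, name, version, addresses):
        # No staleness check: whichever program writes last wins.
        self.records[name] = (version, addresses)
        return True

    def apply_checked(self, name, version, addresses):
        # Assumed safety check: never replace a newer version with an older one.
        current = self.records.get(name)
        if current is not None and current[0] >= version:
            return False
        self.records[name] = (version, addresses)
        return True

    def cleanup(self, name, newest_version):
        # Delete the entry if it belongs to a version older than the newest one.
        current = self.records.get(name)
        if current is not None and current[0] < newest_version:
            del self.records[name]


def run(checked):
    """Replay one unlucky interleaving of a fast and a slow program."""
    store = RecordStore()
    apply = store.apply_checked if checked else store.apply_unchecked
    # The fast program applies the newer update (version 2) first.
    apply("db.example", 2, ["10.0.0.2"])
    # The slow program was delayed and only now applies its older update.
    apply("db.example", 1, ["10.0.0.1"])
    # The fast program then removes anything older than version 2.
    store.cleanup("db.example", newest_version=2)
    return store.records


if __name__ == "__main__":
    print("without check:", run(checked=False))
    print("with check:   ", run(checked=True))
```

Without the check, the delayed program overwrites the newer entry with an older one, and the cleanup step then deletes it, leaving no entry at all. With the check, the stale write is rejected and the entry survives.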
Apology
Amazon apologizes for disruption
Following the outage, Amazon apologized and acknowledged the widespread disruption it caused. "While we have a strong track record of operating our services with the highest levels of availability, we know how critical our services are to our customers, their applications and end users, and their businesses," said the company. It added that it would learn from the incident to prevent a recurrence.
Centralization concerns
Outage highlights vulnerability of internet infrastructure
The recent AWS outage has raised concerns over the internet's reliance on centralized infrastructure. Many web apps and services, including end-to-end encrypted messenger Signal, experienced temporary downtime due to the incident. Even crypto and trading platforms such as Coinbase and Robinhood were inaccessible for a while. More alarmingly, entire blockchain networks went offline because most of their nodes were running on AWS.