AWS Outage Explained: Why the Internet Broke While You Were Sleeping

The Internet started the week the way many of us would often like to: by refusing to go to work. On Monday morning, an outage at Amazon Web Services left a huge swath of the internet inaccessible, with sites and services including Snapchat, Fortnite, Venmo, PlayStation Network and, predictably, Amazon, down for a short period of time.

The outage began shortly after midnight PT and took Amazon about 3.5 hours to fully resolve. Social media and streaming services were among the more than 1,000 companies affected, and critical services such as online banking were also knocked offline. This morning you will likely find that most sites and services are operating as normal, but some knock-on effects will probably be noticeable throughout the day.

AWS, a cloud service provider owned by Amazon, powers huge parts of the Internet. So when it failed, many of the services we know and love went down with it. As with the Fastly and CrowdStrike outages of the past few years, the AWS outage shows how much of the Internet relies on the same infrastructure, and how quickly our access to the sites and services we rely on can be revoked if something goes wrong. Relying on a small number of large companies to underpin the web is like putting all our eggs in a tiny handful of baskets.

When it works it's great, but it only takes one small thing to bring the Internet to its knees in a matter of minutes.

How big was the AWS outage?

Just after midnight PT on October 20, AWS first logged an issue on its service status page, stating that it was “investigating increased error rates and latency for multiple AWS services in the US-EAST-1 region.” Around 2 a.m. PT, the company said it had identified a potential root cause of the problem, and within half an hour it began applying mitigations that produced significant signs of recovery.

“The underlying DNS issue has been fully resolved and most AWS service operations are now running normally,” AWS said at 3:35 a.m. PT. Asked for further comment, the company only pointed us back to the AWS status dashboard.

Around the same time that AWS said it first noticed the elevated error rates, Downdetector saw a sharp increase in outage reports across many online services, including banks, airlines and telecom operators. Once AWS fixed the issue, reports for some of these services dropped off, while others have yet to return to normal. (Disclosure: Downdetector is owned by the same parent company as CNET, Ziff Davis.)

Around 4 a.m. PT, Reddit was still down, while services like Ring, Verizon and YouTube were still seeing a significant number of reported issues. Reddit finally came back online around 4:30 a.m. PT, according to its status page, which we then verified.

In total, Downdetector received over 6.5 million reports, with 1.4 million coming from the US, 800,000 from the UK, and the rest mainly scattered across Australia, Japan, the Netherlands, Germany and France. More than 1,000 companies were affected, Downdetector added.

“An outage like this, where a core internet service takes down a large number of online services, only happens a few times a year,” Daniel Ramirez, Downdetector's chief product officer at Ookla, told CNET. “They are likely to become more common as companies are encouraged to rely entirely on cloud services and their data architectures are designed to make the most of a particular cloud platform.”

What caused the AWS outage?

AWS has not shared full details about what caused so much of the Internet to go down this morning. Now that the fix is in place, the next step will likely be to investigate what went wrong.

So far, the company has attributed the failure to a “DNS issue.” DNS stands for Domain Name System and refers to the service that translates human-readable Internet addresses (such as CNET.com) into machine-readable IP addresses that connect browsers to websites.

When a DNS error occurs, the translation cannot be completed, so the connection fails. DNS errors are a common obstacle on the Internet, but they usually occur on a small scale and affect individual sites or services. Because AWS is so widely used, a DNS error there can have equally widespread consequences.
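To make the translation step concrete, here is a minimal Python sketch of a DNS lookup and of what "the translation cannot be completed" looks like to a program. It is an illustration only, not a description of how AWS or any affected service handles DNS. The function name `resolve` and its `lookup` parameter are our own assumptions; the only real API used is the standard library's `socket.getaddrinfo`, which raises `socket.gaierror` when a name cannot be resolved.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the name cannot be resolved.
    `lookup` is a hypothetical hook so the resolver can be swapped out;
    by default it is the standard library's socket.getaddrinfo.
    """
    try:
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name-to-address translation failed, so there is no
        # IP address to connect to.
        return []
    # Each result is (family, type, proto, canonname, sockaddr),
    # and sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in results})
```

Calling `resolve("cnet.com")` on a working connection returns one or more IP addresses. During a DNS failure the same call returns an empty list, which is why a site can look "down" even though its servers are still running.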

According to Amazon, the problem was rooted in its US-EAST-1 region, which refers to the Northern Virginia area where many of its data centers are located. It's an important location for Amazon, as well as for many other internet companies, and it supports services spanning the US and Europe.

“The lesson here is resilience,” said Luke Kehoe, industry analyst at Ookla. “Many organizations still concentrate mission-critical workloads in a single cloud region. Distributing mission-critical applications and data across multiple regions and availability zones can significantly reduce the blast radius of future incidents.”
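As a rough sketch of what spreading a workload across regions can look like on the client side, the Python snippet below tries a list of regional endpoints in order and uses the first one that passes a health check. Everything here is hypothetical: the function `pick_region`, the `is_healthy` callback and the example hostnames are our own illustrative assumptions, not an AWS API or anything Ookla described.

```python
from typing import Callable, Iterable, Optional


def pick_region(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None.

    `endpoints` is an ordered list of hypothetical regional hostnames;
    `is_healthy` is a caller-supplied check (for example, a quick
    HTTP or DNS probe).
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except OSError:
            # Network-level errors, including DNS lookup failures
            # (socket.gaierror is a subclass of OSError), count as
            # unhealthy, so move on to the next region.
            continue
    return None
```

With a list such as `["us-east-1.example.test", "eu-west-1.example.test"]`, a failure in the first region means traffic falls through to the second rather than stopping entirely. Real multi-region setups also need data replication and routing changes, which this sketch leaves out.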

Was the AWS outage caused by a cyber attack?

DNS issues can be caused by attackers, but at this stage there is no evidence that this was the case with the AWS outage.

However, technical glitches can give hackers the opportunity to seek out and exploit vulnerabilities while companies' backs are turned and security is weakened, according to Marijus Briedis, chief technology officer at NordVPN. “This is a cybersecurity issue, not just a technical one,” he said in a statement. “True online security is not just about keeping hackers out, but about keeping you connected and protected if systems fail.”

In the coming hours, people should watch out for scammers hoping to take advantage of public awareness of the outage, Briedis added. Be especially wary of phishing attacks and emails asking you to change your password to protect your account.
