Around 9:45 a.m. PST, we detected anomalies in site publishing. We discovered that Amazon S3 was returning errors, and the Webflow app began having trouble uploading new files to S3. This affected file uploads and site snapshots, as we rely on S3 as the file system for user sites.
Around 10:15 a.m. PST, AWS reported elevated S3 error rates, and it was not until 1:33 p.m. PST that S3 started working more predictably.
This was a prolonged outage that prevented thousands of the largest sites on the internet from working properly. Sites hosted with Webflow that were recently published may have failed to load properly or may have shown 504 errors. The Webflow Designer also failed to load properly for many users, as we use S3 to save and load backups of sites. Many sites kept operating without issue, but only because Fastly was serving cached versions of them. As the TTLs on those sites and images expired on Fastly and on Amazon CloudFront (which serves images), requests to retrieve the assets from the source (S3) failed, so more and more sites and images stopped loading as the S3 outage continued into the afternoon.
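The way cached sites gradually failed can be illustrated with a small simulation. This is a minimal sketch in Python, not a description of how Fastly or CloudFront are implemented: the `TTLCache` class, the `OriginUnavailable` exception, the 60-second TTL, and the `/index.html` path are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class OriginUnavailable(Exception):
    """Raised when the origin (standing in for S3) cannot serve a request."""


class TTLCache:
    """Toy edge cache: serves a stored copy until its TTL expires."""

    def __init__(
        self,
        fetch_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_origin = fetch_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (body, expiry timestamp)
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh hit: the origin is never contacted
        # Miss or expired entry: the cache has to go back to the origin.
        body = self._fetch_origin(key)  # OriginUnavailable propagates to the caller
        self._entries[key] = (body, now + self._ttl)
        return body


def simulate() -> List[str]:
    """Replay the outage pattern with a fake clock (hypothetical values)."""
    now = [0.0]
    origin_up = [True]

    def origin(key: str) -> bytes:
        if not origin_up[0]:
            raise OriginUnavailable(key)
        return b"<html>site</html>"

    cache = TTLCache(origin, ttl_seconds=60, clock=lambda: now[0])
    results: List[str] = []

    # t=0s, origin up, t=30s origin down but entry fresh, t=61s entry expired.
    for timestamp, up in [(0.0, True), (30.0, False), (61.0, False)]:
        now[0] = timestamp
        origin_up[0] = up
        try:
            cache.get("/index.html")
            results.append("served")
        except OriginUnavailable:
            results.append("failed")
    return results
```

Calling `simulate()` returns `["served", "served", "failed"]`: the page survives the start of the origin outage because the cached copy is still fresh, then fails once the TTL has passed and the refetch cannot succeed. In this toy model, a longer TTL only delays the failure; it does not remove the dependency on the origin.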
The Webflow infrastructure and engineering teams have learned that using S3 (which is sold at 99.999% availability) as the single source of truth for all customer assets is not enough to meet the needs of Webflow customers. After the outage subsided, the engineering team met and put in place a plan to add redundancy measures such as:
Situations like these are difficult for everyone involved, and we apologize for the downtime and inconvenience this outage caused. We're working hard on making Webflow Hosting an even more reliable place to host your sites. Thank you for your continued support!
In the meantime, please make sure your custom domains are using the most up-to-date DNS records, which you can find in our support article on how to set up custom domain hosting for your Webflow site.