Degraded SSL Redirect Functionality for Root Domains
Incident Report for Webflow

Degraded SSL Redirect Service Postmortem

What Happened

At around 6:40 a.m. PDT on Friday, June 16th, a customer told us that their root domain (e.g., https://example.com) was failing to redirect to the "www" version of their website (e.g., https://www.example.com). Because our internal monitoring tools showed no obvious downtime for hosted sites, our customer support team began the routine process of verifying that the DNS for that domain was configured properly.

At approximately 9:30 a.m. PDT, we received three more support requests reporting that root-domain SSL redirects failed when a domain used a specific IP address (34.193.69.252) as the site's sole A record. Once our support team learned that the issue wasn't isolated to one domain, they immediately escalated it to our infrastructure engineering team.

Our engineering team quickly determined that one of the Amazon Web Services (AWS) servers used for SSL redirection had been automatically rebooted by AWS due to "degradation of the underlying hardware" it was running on. Sometime after the reboot, requests to the server began failing intermittently, and then failing outright. We replaced the server with a fresh instance, and redirect requests to that IP address started working again.

Because the server itself stayed online rather than crashing, our health checks and alerts (which notify our 24/7 on-call engineers) unfortunately didn't alert our engineering team right away. We're taking steps to keep this from happening again, as described below.

Impact of Outage

Based on what we know so far, this error affected a handful of sites whose only A record was the 34.193.69.252 IP address. For those sites, the following would have happened:

  • Visiting the root domain URL (e.g., https://example.com) would lead to a connection error
  • Visiting the "www" version URL (e.g., https://www.example.com) would work as expected

For sites that used the recommended configuration of two redundant A records (both 34.193.69.252 and 34.193.204.92), we believe the impact was minimal, because most modern browsers automatically try the other address when a connection to one of them fails.
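The fallback behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how any particular browser implements it: real browsers also race IPv4 and IPv6 addresses (Happy Eyeballs, RFC 8305) and apply their own timeouts and ordering. The function names are hypothetical; only the two IP addresses come from this report.

```python
import socket
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def connect_with_fallback(
    addresses: Sequence[str],
    connect: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each resolved A-record address in order; return the first success.

    Hypothetical helper illustrating client-side retry across redundant A records.
    """
    failures: List[Tuple[str, OSError]] = []
    for addr in addresses:
        try:
            return addr, connect(addr)
        except OSError as exc:  # refused, reset, timed out, unreachable, ...
            failures.append((addr, exc))
    # Only reached when every address failed, as with a single bad A record.
    raise ConnectionError(f"every address failed: {failures}")


def tcp_connect(addr: str, port: int = 443, timeout: float = 5.0) -> socket.socket:
    """Open a plain TCP connection to one address (TLS omitted for brevity)."""
    return socket.create_connection((addr, port), timeout=timeout)
```

With both records configured, a call such as `connect_with_fallback(["34.193.69.252", "34.193.204.92"], tcp_connect)` still succeeds when the first server refuses connections. With only one record, there is nothing to fall back to, which matches the connection errors described above.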

Out of the thousands of SSL domains hosted on Webflow's infrastructure, we only have direct knowledge of four sites that experienced the redirect failure. However, we recognize that this was a very frustrating experience for the customers who were affected.

Next Steps

We're taking the following steps to prevent this issue from happening in the future:

  • Adding more specific redirection health checks for both IP addresses, and adding a real-time view of those checks to our Status Page under the "SSL Redirect IP 1" and "SSL Redirect IP 2" system metrics. Any change in their status will immediately notify our on-call engineers, day or night.
  • Adding procedures to immediately replace any server that AWS restarts with a fresh instance, since in this case the reboot appears to have caused gradually degrading functionality even though the server was technically online.
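A redirection health check of the kind described in the first bullet might look like the sketch below. It is an assumption about how such a check could work, not Webflow's actual monitoring. It connects directly to one redirect IP, requests the root domain's home page without following redirects, and treats anything other than a 3xx response pointing at `https://www.<domain>` as a failure. Unlike a simple "is the server up" check, this would catch a server that is online but no longer redirecting. The class and function names, the 5-second timeout, and the accepted status codes are illustrative choices.

```python
import http.client
import socket
import ssl
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

REDIRECT_STATUSES = {301, 302, 307, 308}


class PinnedHTTPSConnection(http.client.HTTPSConnection):
    """Dial one specific IP while sending SNI and verifying the cert for `host`."""

    def __init__(self, host: str, ip: str, timeout: float = 5.0) -> None:
        self._pinned_ip = ip
        self._tls = ssl.create_default_context()
        super().__init__(host, 443, timeout=timeout, context=self._tls)

    def connect(self) -> None:
        # Bypass DNS so each redirect IP can be checked independently.
        raw = socket.create_connection((self._pinned_ip, self.port), self.timeout)
        self.sock = self._tls.wrap_socket(raw, server_hostname=self.host)


def fetch_redirect(host: str, ip: str) -> Tuple[int, Optional[str]]:
    """Request https://<host>/ via `ip`; http.client never follows redirects."""
    conn = PinnedHTTPSConnection(host, ip)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_is_healthy(status: int, location: Optional[str], root_host: str) -> bool:
    """Healthy means a redirect whose target is https://www.<root_host>/..."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    target = urlsplit(location)
    return target.scheme == "https" and target.hostname == f"www.{root_host}"


def probe(
    root_host: str,
    ip: str,
    fetch: Callable[[str, str], Tuple[int, Optional[str]]] = fetch_redirect,
) -> bool:
    """Return False on any network, TLS, or HTTP error, or on a wrong redirect."""
    try:
        status, location = fetch(root_host, ip)
    except (OSError, http.client.HTTPException):
        return False
    return redirect_is_healthy(status, location, root_host)
```

A scheduler (hypothetical here) could run `probe("example.com", ip)` against each of 34.193.69.252 and 34.193.204.92 on a fixed interval and page the on-call engineer whenever either returns `False`.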

We sincerely apologize for the downtime and inconvenience this issue caused for those affected. We're working hard on making Webflow Hosting an even more reliable place to host your sites. Thank you for your continued support!

Posted Jun 16, 2017 - 22:32 UTC

Resolved
We have confirmed that the issue is resolved, and we are working on a postmortem that will be attached to this incident later today.
Posted Jun 16, 2017 - 19:22 UTC
Monitoring
At 1:40 p.m. UTC on Friday, June 16th, Webflow was notified that one of our SSL redirection servers was failing to redirect root-domain requests (e.g., https://acme.com) to their www-subdomain URLs (e.g., https://www.acme.com). The root cause was traced to a degraded AWS server, which our engineering team replaced by 5:00 p.m. UTC.

We are working on a more detailed postmortem on this incident that will outline the steps we are taking to make sure this type of incident does not occur in the future.

If you are still seeing SSL redirection errors on your site, please contact our support team at support@webflow.com.
Posted Jun 16, 2017 - 18:17 UTC