A DoS attack against a Webflow site began around 7pm CET on 2/11. It affected sites that were published during this time and caused 503 "First Byte Errors," which meant our render cluster could not render dynamic (CMS) content and return it to our caching layer.
We applied fixes to our caching layer to stop it from passing DoS traffic through to our render cluster. We also added capacity to our render cluster.
Certain metrics were not being checked properly. We have since amended our infrastructure monitoring and system alerts so that on-call engineers are notified promptly in the future, reducing customer impact and downtime.