On Saturday, February 22nd, 2025, starting at 6:45 PM PT, our US-East RDS database cluster, which is responsible for campaign delivery for customers hosted in the US, entered a crash loop during a database table repacking process, restarting after every crash. While the database was restarting, platform communications could not be processed or sent out.
Sev2
The scope of this service disruption was restricted to customers on the US platform utilizing the US-East database cluster.
Some campaign communications, on any delivery channel, may have experienced slight delivery delays during this incident (8hrs 27mins).
Customers on Employee Intranet solutions, Web Experience, Studio, custom API solutions, and Embedded content endpoints would not have seen any impact.
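The incident duration quoted above can be cross-checked against the start and end times given in this report (6:45 PM PT on February 22nd and 3:12 AM PT on February 23rd). The short sketch below is only an arithmetic check; it uses naive timestamps on the assumption that both times are in the same Pacific offset, with no daylight saving change between them.

```python
from datetime import datetime

# Start and end of the crash loop as stated in this report (both PT).
# Naive datetimes are an assumption: no DST change falls between them.
incident_start = datetime(2025, 2, 22, 18, 45)
incident_end = datetime(2025, 2, 23, 3, 12)

duration = incident_end - incident_start

# Break the duration into whole hours and remaining minutes.
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours}hrs {minutes}mins")  # prints "8hrs 27mins"
```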
The root cause of this incident was a break in integrity between a database table and its index, following a repacking task run on the table to improve database performance. During the repacking, the table's index was identified as invalid and was skipped by the repacking process. The resulting inconsistency between the table and its index caused the database to enter a crash loop starting at 6:45 PM PT, restarting after every crash in an attempt to recover.
In the periods between a successful database restart and the next database crash, other database processes were running as expected and campaign communications were sent out successfully.
To mitigate this incident, the invalid index for the table was removed and recreated on Sunday, February 23rd, 2025, at 3:12 AM PT, in effect ending the crash loop on the database.
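As an illustration of the kind of check and repair described above, here is a minimal sketch, assuming the cluster runs PostgreSQL (this report does not name the database engine). The catalog query lists indexes flagged as invalid, and the helper builds the statements to drop and recreate one of them. The names `InvalidIndex`, `quote_ident`, `rebuild_statements`, and the example index are hypothetical; this is not the procedure actually used during the incident.

```python
from typing import List, NamedTuple

# Assumption: PostgreSQL. Lists every index the catalog marks as invalid,
# together with the CREATE INDEX statement needed to recreate it.
INVALID_INDEX_QUERY = """
SELECT n.nspname AS schema_name,
       i.relname AS index_name,
       pg_get_indexdef(x.indexrelid) AS index_def
FROM pg_index x
JOIN pg_class i ON i.oid = x.indexrelid
JOIN pg_namespace n ON n.oid = i.relnamespace
WHERE NOT x.indisvalid
"""


class InvalidIndex(NamedTuple):
    """One row returned by INVALID_INDEX_QUERY (hypothetical container)."""
    schema_name: str
    index_name: str
    index_def: str


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def rebuild_statements(idx: InvalidIndex) -> List[str]:
    """Return the statements that remove and then recreate an invalid index."""
    qualified = f"{quote_ident(idx.schema_name)}.{quote_ident(idx.index_name)}"
    return [f"DROP INDEX {qualified};", idx.index_def + ";"]


# Hypothetical example row; real rows would come from running the query.
example = InvalidIndex(
    schema_name="public",
    index_name="deliveries_idx",
    index_def="CREATE INDEX deliveries_idx ON public.deliveries USING btree (id)",
)
for statement in rebuild_statements(example):
    print(statement)
```

Running a query like this before a repacking task would surface invalid indexes ahead of time, rather than leaving them to be skipped during the repack.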
To prevent this incident from recurring, we will perform the following actions: