Summary:
On October 10th, 2025, we received alerts indicating a service disruption to the Web Experience channel, along with customer reports that impacted users were shown the following error on the Web Experience: ‘Oops! Something went wrong.’ This behavior was attributed to a change that enabled a verbose log setting on a back-end system, which generated an unexpected amount of load on the database for the US1 region. A verbose log setting provides detailed, step-by-step information to help with troubleshooting, debugging, and performance optimization by logging every action and event. More capacity was added to handle the additional load, which restored service for the Web Experience; however, a residual issue of slow navigation on the Web Experience remained, and it was fully mitigated by disabling the verbose log setting.
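The specific back-end system and its logging framework are not named in this report. As a minimal sketch of what a verbose log setting does, the snippet below uses Python's standard logging module with a hypothetical VERBOSE flag: switching it on causes every step of a request to be logged rather than only its outcome.

```python
import logging

VERBOSE = True  # hypothetical stand-in for the setting enabled on October 9th

logging.basicConfig(level=logging.DEBUG if VERBOSE else logging.INFO)
log = logging.getLogger("backend")

def handle_request(user_id: int) -> None:
    log.debug("request received for user %s", user_id)  # emitted only when verbose
    log.debug("loading profile for user %s", user_id)   # emitted only when verbose
    log.info("request completed for user %s", user_id)  # always emitted

# With VERBOSE on, each request produces 3 log events instead of 1.
handle_request(42)
```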
Impact:
The following system error message was returned to any end user attempting to access or use the Web Experience between 05:00 AM and 05:56 AM PT (56 minutes):
‘Oops! Something went wrong.’
Residual slowness was seen between 05:56 AM and 10:32 AM PT (4 hours 36 minutes).
Root Cause:
The root cause was determined to be a change made on October 9th to enable a verbose log setting on a back-end system. The setting generated an unexpected amount of load on the database for the US1 region which, coupled with normal peaks in traffic, resulted in several services entering a restart loop. This restart loop ultimately could not be contained without a disruption in service to our Customers.
The verbose log setting was enabled outside of business hours to assist with troubleshooting and debugging a prior incident on October 9th, in which Studio Insights Reports were unavailable for some users. The setting increases the granularity of logging by recording events on database activity for audit/compliance purposes, and it unexpectedly generated a high level of load on the database during peak morning traffic hours on the US East Coast.
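For illustration only, since the database and audit mechanism involved are not identified here: the sketch below uses Python's built-in sqlite3 module and a hypothetical VERBOSE_AUDIT flag to show how recording an audit event for every database action multiplies the write volume, which is the kind of extra load described above.

```python
import sqlite3

VERBOSE_AUDIT = True  # hypothetical flag standing in for the incident's setting

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, statement TEXT)")

def execute(sql: str, params: tuple = ()) -> None:
    """Run one application statement; with verbose auditing on, every
    statement also costs a second, synchronous write to the audit table."""
    conn.execute(sql, params)
    if VERBOSE_AUDIT:
        conn.execute("INSERT INTO audit_log (statement) VALUES (?)", (sql,))

for i in range(1000):
    execute("INSERT INTO orders (item) VALUES (?)", (f"item-{i}",))

# 1000 application inserts produced 2000 database writes in total; under
# peak traffic, that doubling is what creates the extra I/O load.
print(conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0])  # 1000
```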
The frequent application restarts, which caused the most significant impact to users, were the result of the database becoming bogged down writing the verbose log data. The extra writes made the database I/O (input/output) bound, blocking other services from interfacing with it during peak traffic.
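As a simplified sketch of that failure mode (the actual services, health-check mechanism, and timeout values are assumptions, not details from this incident): when the database is I/O bound, a service's health check against it exceeds its timeout, the service is restarted, and the restarted service immediately fails the same check, producing the restart loop.

```python
import time

DB_RESPONSE_TIME = 0.5      # seconds; assumed latency of the I/O-bound database
HEALTH_CHECK_TIMEOUT = 0.1  # seconds; assumed liveness-probe budget

def db_ping() -> float:
    """Simulate a round trip to a database saturated by verbose log writes."""
    time.sleep(HEALTH_CHECK_TIMEOUT)  # wait out the probe budget, then give up
    return DB_RESPONSE_TIME

def healthy() -> bool:
    return db_ping() <= HEALTH_CHECK_TIMEOUT

restarts = 0
for _ in range(3):
    if not healthy():
        restarts += 1
        print(f"health check timed out; restarting service (restart #{restarts})")
        # the restarted service immediately re-runs the same failing check;
        # restarting never clears the underlying database I/O bottleneck
print(f"service restarted {restarts} times without becoming healthy")
```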
Mitigation:
A platform incident was declared at 05:21 AM PT on Friday, October 10th, and the incident team restored service at 05:56 AM PT. Additional capacity was implemented to handle the unexpected load, which mitigated the unavailability of the Web Experience channel. However, some users continued to experience residual slowness and latency until 10:32 AM PT, when the verbose log setting was disabled. Database load then returned to normal levels.
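One design choice that supports this kind of mitigation is a runtime toggle: a verbose setting that operators can switch off without redeploying. The sketch below is a hypothetical Python example using an environment variable named VERBOSE_LOGGING; the variable name and mechanism are assumptions, not the actual controls used in this incident.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backend")

def apply_log_level() -> None:
    # Re-read a (hypothetical) VERBOSE_LOGGING switch so operators can turn
    # verbose output off at runtime instead of waiting on a redeploy.
    verbose = os.environ.get("VERBOSE_LOGGING", "false").lower() == "true"
    log.setLevel(logging.DEBUG if verbose else logging.INFO)

os.environ["VERBOSE_LOGGING"] = "false"  # the mitigation step: switch it off
apply_log_level()
log.debug("per-event detail")  # suppressed once verbose logging is disabled
log.info("normal operation")   # still emitted
```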
Recurrence Prevention:
The following actions have been taken, or have been identified as follow-up commitments, as part of the formal RCA (Root Cause Assessment) process: