Summary:
From approximately 11:08 am to 11:38 am PT (18:08 to 18:38 UTC) on Thursday, August 22, both Studio and Web Experience were unavailable. The outage was caused by the release of Version 2 of Personalized Fields (PFV2), a new feature shipped with the Q3 quarterly update that proved more resource-intensive than initially anticipated. This caused high CPU usage, increased query latency, and database connection pool exhaustion.
Impact:
This incident primarily affected users who attempted to access Studio and Web Experience between 11:08 am and 11:38 am PT. Affected users encountered the following errors on the platform frontend:
Root Cause:
The root cause was the release of Version 2 of Personalized Fields (PFV2), a new feature shipped with the Q3 quarterly update that proved more resource-intensive than initially anticipated. The feature caused a significant increase in CPU usage and query latency on the shared database cluster, as well as database connection pool exhaustion. As a result, Studio and Web Experience became unavailable and impacted users saw error messages.
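To illustrate why connection pool exhaustion makes a service unavailable rather than merely slow, here is a minimal sketch of a fixed-size pool. It assumes a pool that blocks for a short checkout timeout and then fails; `BoundedPool`, `PoolExhaustedError`, and `simulate` are hypothetical names and do not reflect the platform's actual database client. When slower queries hold connections longer, the pool runs out, and every new request fails immediately regardless of what it needs.

```python
import threading


class PoolExhaustedError(Exception):
    """Raised when no connection becomes free within the checkout timeout."""


class BoundedPool:
    """Hypothetical fixed-size pool standing in for a real database connection pool."""

    def __init__(self, size: int, checkout_timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = checkout_timeout

    def checkout(self) -> None:
        # Wait up to checkout_timeout seconds for a free connection, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhaustedError("all connections are in use")

    def release(self) -> None:
        self._slots.release()


def simulate(size: int, held: int) -> int:
    """Hold `held` connections (slow queries still running), then count how many
    of 3 new requests fail to obtain a connection."""
    pool = BoundedPool(size, checkout_timeout=0.01)
    for _ in range(held):
        pool.checkout()  # long-running queries keep these connections busy
    failures = 0
    for _ in range(3):
        try:
            pool.checkout()
        except PoolExhaustedError:
            failures += 1
    return failures


if __name__ == "__main__":
    print(simulate(size=10, held=4))   # 0: enough free connections
    print(simulate(size=10, held=10))  # 3: pool exhausted, every new request fails
```

The key point is that failures depend on how long connections are held, not on the number of users: a rise in query latency alone can push a fixed-size pool from healthy to fully exhausted.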
Mitigation:
The immediate impact was mitigated by temporarily disabling the newly released feature that was causing excessive resource consumption. The cache Time-To-Live (TTL) was also changed from 1 minute to 3 hours to reduce load and stabilize performance.
After service was restored, we tuned the platform and scaled up infrastructure outside business hours to accommodate the additional load introduced by the new feature.
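The effect of raising the cache TTL from 1 minute to 3 hours can be sketched with a minimal TTL cache. This assumes a simple per-key cache in front of an expensive database query; `TTLCache`, `count_loads`, and the key `"user-42"` are hypothetical and are not the platform's actual caching layer. With one request per second for a single key over 3 hours, a 60-second TTL reloads from the database 180 times, while a 3-hour TTL reloads once.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Minimal TTL cache with an injected clock so the example is deterministic."""

    def __init__(self, ttl_seconds: float, loader: Callable[[str], str]) -> None:
        self.ttl = ttl_seconds
        self.loader = loader  # hypothetical stand-in for the expensive database query
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, now: float) -> str:
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: no database work
        value = self.loader(key)  # miss or expired entry: reload from the database
        self._entries[key] = (now, value)
        return value


def count_loads(ttl_seconds: float, duration_seconds: int) -> int:
    """Request one key once per second and count how often the loader runs."""
    loads = 0

    def loader(key: str) -> str:
        nonlocal loads
        loads += 1
        return f"fields-for-{key}"

    cache = TTLCache(ttl_seconds, loader)
    for t in range(duration_seconds):
        cache.get("user-42", now=float(t))
    return loads


if __name__ == "__main__":
    three_hours = 3 * 60 * 60
    print(count_loads(60, three_hours))           # 180
    print(count_loads(three_hours, three_hours))  # 1
```

The trade-off is freshness: with a 3-hour TTL, a cached value can be up to 3 hours stale, which is why the longer TTL works as a load-shedding measure rather than a free improvement.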
Recurrence Prevention:
To prevent a recurrence of this incident, the following actions have been or are being implemented: