EU - Intermittent unavailability issues are being experienced for both Studio and Web Experience

Incident Report for Firstup

Postmortem

Summary:

On May 6th, 2025, at 9:16 AM UTC, we received a report that some users on the Firstup EU platform were unable to load the Studio UI. Initial investigation determined that the issue was reproducible but intermittent. A platform incident was declared at 9:25 AM UTC, and an incident response team continued the investigation.

Severity:

Sev-2

Scope:

Only customers on the Firstup EU platform were in scope of this incident.

Impact:

Some Studio users on the Firstup EU platform were unable to load their Studio instance and saw an infinite loading animation for the duration of this incident (2 hours 59 minutes). Refreshing the page intermittently allowed Studio to load for some users. Users on the Firstup US platform were not impacted by this incident.

Root Cause:

The root cause of this incident was determined to be an automation test that started running on the Firstup EU production platform at 8:30 AM UTC and appeared to create a significantly increased load on a backend system that serves Studio. The increased load raised latency on that system, leaving it unable to respond to Studio requests in a timely manner.

Mitigation:

The system self-recovered as the automation test job neared completion and latency decreased, and Studio users began to see Studio load successfully. The job finished running at around 11:29 AM UTC. We also temporarily halted all other automation testing on the production environment.

Recurrence Prevention:

To prevent this incident from happening again, we will:

  • Reduce the frequency and quantity of automation tests on the production environment.
  • Improve the stability of tests in the testing environment, to reduce reliance on the production environment for tests.
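
As an illustration of the first item above, one common way to limit the load a test job places on a shared system is a token-bucket throttle in the test runner. The sketch below is a minimal, hypothetical example and does not describe Firstup's actual tooling; the class name `TokenBucket`, its parameters, and the chosen rates are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical throttle: caps how many requests a test runner may send per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens refilled per second (assumed value)
        self.capacity = capacity        # maximum burst size (assumed value)
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable clock, useful for testing
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A test runner would call `try_acquire()` before each request and back off when it returns False, so the job's request rate stays bounded regardless of how many tests are queued.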
Posted May 13, 2025 - 16:57 UTC

Resolved

This incident is now resolved, and all impacted systems have remained stable and fully available. A postmortem for this incident will be published here as soon as it becomes available.
Posted May 09, 2025 - 17:26 UTC

Monitoring

All mitigation steps have been taken to resolve the issue, and stability has been restored. Both Studio and Member Experience are available. Full details of the root cause will follow.
Posted May 06, 2025 - 11:33 UTC

Update

We have identified the cause of this issue, and steps are being taken to mitigate it. We are also seeing an improvement in performance, and both Studio and Web Experience are loading correctly. The issue will be fully resolved in 10 to 15 minutes while final mitigation steps are taken.
Posted May 06, 2025 - 11:15 UTC

Investigating

We are urgently investigating intermittent unavailability of the platform affecting both Studio and Web Experience. EU customers are impacted. Updates will follow as soon as possible.
Posted May 06, 2025 - 09:52 UTC
This incident affected: Products (Creator Studio, Web Experience).