Web Experience is displaying a 'signed out..' error for some programs when trying to access.

Incident Report for Firstup

Postmortem

Summary: 

On August 5th, 2025, we began receiving Customer reports of the following error being displayed by the Web Experience channel for impacted users: "You've been signed out or your access has been expired". This behavior was attributed to unexpected system memory pressure, which resulted in resource starvation on the platform and subsequent intermittent user session invalidation. The initial attempt to mitigate impact by doubling the size of the system cache worked temporarily. However, further action was needed to quarantine a recently published campaign determined to be the cause of the out-of-memory condition. This secondary action fully restored service for the Web Experience channel.

Impact: 

A system error message was displayed intermittently to end-users accessing the Web Experience between 08:21 AM - 09:01 AM PT and again between 11:23 AM - 11:37 AM PT.

"You've been signed out or your access has been expired"

Other services and endpoints including Creator Studio and the Mobile Experience remained fully functional. 

Root Cause: 

The root cause was determined to be data store instances exceeding their memory capacity, which resulted in intermittent user session invalidation. The increased memory utilization was unexpected and was attributed to several factors:

  1. A large number of communications/campaigns were sent at the same time near US peak hours, which increased cache pressure.
  2. Non-optimal caching of a recently published Classic Studio campaign containing multiple large images in the campaign content body itself.
  3. An increase in ‘401 unauthorized’ errors due to a newly discovered software bug, which directly led to the "You've been signed out or your access has been expired" error message on the front end.

Mitigation: 

A platform incident was declared at 08:31 AM PT on Tuesday, August 5th, and at 08:41 AM PT the incident team determined the root cause to be related to memory exhaustion. The first mitigation was to double the size of the cache (data store) instance, which temporarily restored service. The incident recurred approximately 2 hours later, and further analysis revealed non-optimal storage-reuse (i.e. caching) behavior related to a recently published campaign. The implicated campaign was quarantined to address the immediate impact, which fully restored service for the Web Experience channel.

Recurrence Prevention: 

The following actions have either been taken or been identified as committed follow-up actions as part of the formal RCA (Root Cause Assessment) process:

  • Campaign body caching has been optimized so we will only cache the data that is being used in the response.  This change has already been completed, and we have observed significant improvement in cache optimization by both reducing the size of allowed cacheable objects and also caching the same object for multiple users.
  • Additional real-time monitoring has been implemented to improve threshold warnings for high memory usage.
  • We have implemented a hotfix for the behavior causing the user session invalidation error message so that all cache calls go through new logic that falls back to the source of truth and is resilient to Data Store cache issues. With this change, we will not see a recurrence of the unintended user-facing error message, and the chance of using up the full memory is drastically reduced.
  • Testing and Analysis - Rigorous load testing was performed on non-production internal environments to verify expected behavior for campaigns with large audiences that contain images inline with the content body.
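
The fallback behavior and the size limit on cacheable objects described in the list above can be illustrated with a minimal sketch. This is not the actual hotfix; it only assumes a generic cache-aside pattern. The names `InMemoryCache`, `CacheUnavailable`, `resilient_get`, and the `sessions` dictionary are hypothetical stand-ins for the data store cache, its failure mode, the new cache-call logic, and the source of truth.

```python
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the hypothetical cache when it cannot serve a request."""


class InMemoryCache:
    """Stand-in for the data store cache (hypothetical, for illustration only)."""

    def __init__(self, max_value_bytes: int) -> None:
        self.max_value_bytes = max_value_bytes  # cap on the size of a cacheable object
        self.healthy = True                     # set to False to simulate memory exhaustion
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        if not self.healthy:
            raise CacheUnavailable("cache is out of memory or unreachable")
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> bool:
        if not self.healthy:
            raise CacheUnavailable("cache is out of memory or unreachable")
        if len(value) > self.max_value_bytes:
            return False  # oversized objects are never cached
        self._data[key] = value
        return True


def resilient_get(
    cache: InMemoryCache,
    key: str,
    load_from_source: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Read through the cache, falling back to the source of truth on any cache problem."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # A cache failure is treated as a miss, never as "session invalid".
        return load_from_source(key)

    value = load_from_source(key)
    if value is not None:
        try:
            cache.set(key, value)  # best effort; a failed write must not fail the read
        except CacheUnavailable:
            pass
    return value


# Hypothetical source of truth holding one valid session.
sessions: Dict[str, bytes] = {"session:abc": b"user-42"}
cache = InMemoryCache(max_value_bytes=1024)

first = resilient_get(cache, "session:abc", sessions.get)          # miss, then cached
cache.healthy = False                                               # simulate the outage
during_outage = resilient_get(cache, "session:abc", sessions.get)  # served from source
```

In this sketch, a session is reported as missing only when the source of truth itself has no record of it, so a cache outage alone would not produce a sign-out.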
Posted Aug 26, 2025 - 12:36 UTC

Resolved

This incident has been resolved.
Posted Aug 26, 2025 - 12:32 UTC

Monitoring

Impact has now been fully mitigated and we do not expect a recurrence. This incident will remain in monitoring state while we evaluate the longer term mechanism required to completely resolve the issue.
Posted Aug 05, 2025 - 19:27 UTC

Identified

Root cause is identified and we are actively working to mitigate the issue.
Posted Aug 05, 2025 - 19:18 UTC

Update

Moving incident back to active state. Mitigation efforts have been insufficient to fully isolate Customers from impact. Symptoms remain similar where users are being signed out, but other connection errors may be intermittently displayed as well.
Posted Aug 05, 2025 - 18:39 UTC

Investigating

We have seen a recurrence of this issue and are currently investigating. The same authentication errors are being displayed for some programs. Updates to follow asap.
Posted Aug 05, 2025 - 18:37 UTC

Update

We are continuing to monitor for any further issues.
Posted Aug 05, 2025 - 16:05 UTC

Monitoring

Immediate impact has been mitigated. Errors are no longer appearing and successful login checks have been done. Stability has been restored and we are continuing to monitor. Root cause to follow.
Posted Aug 05, 2025 - 16:05 UTC

Update

The cause of the issue has been identified and we are currently taking action to mitigate, which will resolve the authentication errors being displayed.
Posted Aug 05, 2025 - 15:53 UTC

Investigating

We are currently investigating a platform incident impacting the Web Experience channel for some users. A 'You've been signed out or your access has been expired' error is appearing on some communities. Updates to follow asap.
Posted Aug 05, 2025 - 15:44 UTC
This incident affected: Products (Web Experience).