On September 30th, 2024, beginning at approximately 1:24 PM PDT (20:24 UTC), we started receiving reports of Shortcuts intermittently being unavailable and the Assistant returning an error in the Employee Experience. A platform incident was declared at 2:36 PM PDT (21:36 UTC) after initial investigations revealed the issue to be platform-wide.
Any user on the US platform accessing the Web or Mobile Experiences intermittently experienced missing Shortcuts and/or received an error message while accessing the Assistant. A refresh of the Employee Experience page occasionally restored these endpoints. All other services in the Employee Experience remained available and functional.
Shortcuts and the Assistant endpoints in the Employee Experience were intermittently unavailable during the incident.
The root cause was an unusually high number of new user integrations introduced within a short period of time, which exacerbated a newly uncovered, non-optimized content caching behavior. This caused downstream latency and increased error rates in the web service responsible for rendering Shortcuts and the Assistant notification page.
The immediate impact was mitigated by restarting the Employee Experience integrations API, and services were restored by 2:42 PM PDT (21:42 UTC). While investigations into the root cause continued, the incident recurred the following day – October 1st, 2024, at 12:54 PM PDT (19:54 UTC). The Employee Experience integrations API and the dependent Employee Experience user-integrations request processing service (Pythia) were restarted, restoring Shortcuts and the Assistant endpoints by 1:46 PM PDT (20:46 UTC). Cache resources for Pythia were increased to mitigate the observed latency.
To prevent this incident from recurring, our engineering incident response team:
Has developed a fix that optimizes how user-integrations requests use the cache, reducing memory consumption and eliminating the resulting latency.
Will be adding a monitoring and alerting dashboard for the Employee Experience user-integrations request processing service (Pythia).