Platform Service Disruption - User Sync Files Failing To Process
Incident Report for Firstup
Postmortem

Summary:

On April 30th, 2024, starting at 8:18 AM EDT, we began to receive reports that User Sync files were failing to process, and the following error message was returned:

  • Failed to decrypt uploaded file. Please ensure that the correct encryption key and format is used.
  • The encryption key expected to be used is [Key Fingerprint].

A platform incident was declared at 10:23 AM EDT and was fully mitigated by 10:59 AM EDT.

Scope and Impact:

The scope of this incident was limited to customers who encrypt their User Sync file before uploading it. Only encrypted User Sync files uploaded between 10:03 PM EDT on April 29th, 2024, and 10:59 AM EDT on April 30th, 2024, were impacted.
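For anyone checking their own upload logs against this window, the comparison can be sketched as below. This is a minimal illustration only: the function name `in_impact_window` is hypothetical, and it assumes your logs record timezone-aware timestamps. The window boundaries are the ones stated above.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4; a fixed offset is enough because the whole window falls in daylight time.
EDT = timezone(timedelta(hours=-4), "EDT")

# Impact window as stated in this report.
WINDOW_START = datetime(2024, 4, 29, 22, 3, tzinfo=EDT)
WINDOW_END = datetime(2024, 4, 30, 10, 59, tzinfo=EDT)


def in_impact_window(uploaded_at: datetime) -> bool:
    """Return True if a timezone-aware upload time falls inside the impact window."""
    if uploaded_at.tzinfo is None:
        raise ValueError("uploaded_at must be timezone-aware")
    return WINDOW_START <= uploaded_at <= WINDOW_END


# An upload logged at 13:30 UTC on April 30th is 9:30 AM EDT, inside the window.
print(in_impact_window(datetime(2024, 4, 30, 13, 30, tzinfo=timezone.utc)))  # True
```

Timestamps in any timezone compare correctly here because Python compares aware datetimes by their absolute instant.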

Root Cause:

The incident response team identified that this incident resulted from a regression introduced by a software release on April 29th, 2024. The OS image used to deploy the upgrade lacked packages required for decryption.

Mitigation:

At 10:59 AM EDT, the released upgrade was rolled back to its previous version, which contained the decryption packages, restoring normal decryption of encrypted User Sync files. We also identified and reprocessed all encrypted customer User Sync files that had failed to process during the incident.

Recurrence Prevention:

A technical team post-mortem found that the zip-based deployment of the OS provided no control over updating or re-deploying the upgrade. We therefore transitioned to an image-based deployment, which allows greater control over the OS image and its dependencies. The upgrade was redeployed on May 6th, 2024, using an OS image that included the necessary decryption packages.

We also:

  • Added monitoring and alerting for the health of external-registration (User Sync file processing).

  • Updated regression test packs to include testing user sync with encrypted files.
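
As an illustration of the kind of control that catches an image missing its decryption packages, a deploy-time preflight check might look like the sketch below. It is not Firstup's implementation. The report does not name the missing packages, so the `gpg` entry is an assumption, and `REQUIRED_TOOLS`, `missing_tools`, and `preflight` are hypothetical names.

```python
import shutil

# Assumed list: the report does not name the missing packages.
REQUIRED_TOOLS = ("gpg",)


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


def preflight(required=REQUIRED_TOOLS, which=shutil.which):
    """Raise if any required tool is absent, so a bad image fails at deploy time."""
    missing = missing_tools(required, which)
    if missing:
        raise RuntimeError("missing decryption dependencies: " + ", ".join(missing))
```

Calling `preflight()` during service startup or in a deployment pipeline step would make a missing dependency fail immediately instead of surfacing later as a decryption error on customer uploads. The `which` parameter exists only so the lookup can be replaced in tests.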

Posted May 28, 2024 - 16:29 UTC

Resolved
We have observed that PGP-encrypted user sync files continue to process successfully.

This platform service disruption is now resolved, and an RCA will be provided once a full incident postmortem has been completed.
Posted May 02, 2024 - 15:53 UTC
Update
All impacted user sync files have now been reprocessed successfully. Please note that only PGP-encrypted user sync files were in the scope of this incident. Additional details will be provided once a postmortem of the incident has been completed.

This incident is now considered resolved. We will be placing the impacted systems under monitoring for now.
Posted Apr 30, 2024 - 16:14 UTC
Monitoring
A fix for this issue has been deployed, and user sync files are now processing successfully.

We will re-run any previously failed user sync files and confirm once this is complete.
Posted Apr 30, 2024 - 15:10 UTC
Identified
We have identified the cause of this service disruption, and are working to fix it.

We will provide another update in 1 hour.
Posted Apr 30, 2024 - 14:59 UTC
Investigating
We are currently investigating reports that PGP-encrypted user sync files are failing to process.

We will provide you with an update within 1 hour.
Posted Apr 30, 2024 - 14:32 UTC
This incident affected: Products (New Studio, Classic Studio).