Platform Service Degradation - Email Delivery Delays
Incident Report for Firstup
Postmortem

Summary:

From approximately 4:00 PM PT on Tuesday, May 14th, 2024 until 10:06 AM PT on Wednesday, May 15th, 2024, outbound email that had previously been sent from sendgrid.net, a backstop sender domain, was instead sent via another, non-allowlisted domain. This resulted in delivery failures or delays for some Customers whose inbound email rules explicitly match the sendgrid.net sender domain.

Impact:

Any campaigns set to publish or re-engage users during the affected time window would have appeared to originate from a sender domain other than sendgrid.net for any Customers without their own authenticated sender domain records (a configuration that goes against Firstup and industry best practices). Depending on each Customer's email rules, those emails could have been blocked, quarantined, marked as spam, or subjected to other policy-specific actions.
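As a rough illustration of why a change in sender domain affects delivery, the sketch below models an inbound rule that allows mail only from explicitly listed sender domains. The function name, the "quarantine" outcome, and the example addresses are hypothetical; real mail gateways implement such rules in their own ways and may block, quarantine, or flag the message instead.

```python
def classify_inbound(sender_address: str, allowlisted_domains: set[str]) -> str:
    """Return "deliver" if the sender's domain is allowlisted, else "quarantine".

    Hypothetical model of a Customer-side inbound rule; "quarantine" stands in
    for whatever policy action a given mail system applies.
    """
    # Take everything after the last "@" as the sender domain.
    domain = sender_address.rsplit("@", 1)[-1].lower()
    for allowed in allowlisted_domains:
        # Accept the allowlisted domain itself or any of its subdomains.
        if domain == allowed or domain.endswith("." + allowed):
            return "deliver"
    return "quarantine"


rules = {"sendgrid.net"}
print(classify_inbound("bounces@em1234.sendgrid.net", rules))
print(classify_inbound("bounces@mail.other-domain.example", rules))
```

Under these assumptions, the first message matches the rule and the second does not, which corresponds to the behavior Customers saw once mail stopped originating from sendgrid.net.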

Root Cause:

The root cause was determined to be a default sender domain that was set while a new authenticated email domain was being configured for a newly onboarded Firstup customer. Previously, no default domain was configured, which resulted in sendgrid.net being used as a backstop sender domain, despite it not being included in the allowlisting article at https://support.firstup.io/hc/en-us/articles/4417455533975-Allowlist-Emails-from-Firstup .

Both user error and a UI design limitation in the 3rd party software used to add new authenticated domains contributed to the errantly created default sender domain. Specifically, the configuration page does not have an explicit save button or confirmation dialog protecting the checkbox that sets the new record as the default sender domain to use platform-wide. The checkbox was inadvertently selected while a screenshot of the configuration settings was being created to share with the newly onboarded customer.
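The fallback behavior described above can be sketched as a simple precedence order. This is only an illustration inferred from this report, not the email provider's actual implementation; the function name, parameters, and example domains are hypothetical.

```python
from typing import Optional

# Backstop named in this report; used only when nothing else is configured.
BACKSTOP_DOMAIN = "sendgrid.net"


def resolve_sender_domain(
    customer_domain: Optional[str],
    platform_default: Optional[str],
) -> str:
    """Pick the sender domain under an assumed precedence order:
    Customer-specific domain, then platform-wide default, then backstop."""
    if customer_domain:
        return customer_domain
    if platform_default:
        return platform_default
    return BACKSTOP_DOMAIN


# Before the incident: no platform-wide default, so the backstop applies.
print(resolve_sender_domain(None, None))
# During the incident: an unintended default overrides the backstop.
print(resolve_sender_domain(None, "mail.new-customer.example"))
# Customers with their own authenticated domain are unaffected either way.
print(resolve_sender_domain("mail.acme.example", "mail.new-customer.example"))
```

Under this model, only Customers without their own authenticated sender domain see a change when a platform-wide default is introduced, which matches the impact described in this report.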

Mitigation:

To mitigate, the default sender domain setting was restored to its previous state as soon as the root cause became clear.

Recurrence Prevention:

Firstup has committed to the following actions to fully resolve the incident and eliminate reliance on the mitigation measure currently in place.

  • A scheduled maintenance window has been posted for June 15th, 2024, outlining a planned update to a new sender domain, email.socialchorus.net, that should already be allowlisted. Specific details of the maintenance can be found at https://status.firstup.io/incidents/jfv1s06qyv3v
  • Firstup will again review any Customers who do not have authenticated sender domains configured, with the aim of setting up DMARC/SPF records and Customer-specific sender domains so that the backstop or default domain is never needed to send program-specific email.
  • A feature request has been filed with Sendgrid, the 3rd party email provider, to add protections around the checkbox in the user interface so that an unintentional user action cannot change the default sender domain.
Posted May 17, 2024 - 22:48 UTC

Resolved
We are currently investigating reports of delayed email deliveries.
Posted May 14, 2024 - 23:00 UTC