On May 22nd, 2023, one of our caching servers received an unexpectedly high load. This degraded the services that depend on it, and the impact cascaded to other services as well.
As a mitigation step, we added a circuit breaker to limit the scope of impact on dependent services, and we increased the memory on our caching server to restore service promptly.
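
To illustrate the circuit breaker idea, here is a minimal sketch in Python. It is not our production implementation: the class name, the `failure_threshold` and `reset_timeout` parameters, and their default values are all assumptions chosen for illustration. The breaker counts consecutive failures, stops calling the cache once the threshold is reached, and allows a single trial call after the timeout has elapsed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical breaker guarding calls to a caching server."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_timeout = reset_timeout          # assumed value, in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast instead of adding load to the cache.
                raise CircuitOpenError("cache calls suspended")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A dependent service would wrap each cache lookup in `breaker.call(...)` and handle `CircuitOpenError` with a fallback, such as serving a default value or returning a degraded response, so that a struggling cache does not drag down its callers.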
As a long-term preventative measure, we have increased the time-to-live for cached data, which reduces the frequency of calls to the caching server and therefore the load on it.
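
The effect of a longer time-to-live can be sketched with a small simulation. The code below is illustrative only: the `TTLCache` class, the `count_backend_calls` helper, the 10-second request interval, and the TTL values of 60 and 300 seconds are hypothetical and do not reflect our actual configuration. It counts how often a client-side cache has to go back to the caching server for the same key over a fixed period.

```python
import time


class TTLCache:
    """Hypothetical local cache that reloads an entry once its TTL has expired."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.loader = loader    # function that calls the caching server
        self.clock = clock      # injectable so the simulation can control time
        self._entries = {}      # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]     # still fresh: no call to the server
        value = self.loader(key)
        self._entries[key] = (value, now + self.ttl_seconds)
        return value


def count_backend_calls(ttl_seconds, duration=600, interval=10):
    """Simulate one request every `interval` seconds and count server calls."""
    now = [0]
    calls = [0]

    def loader(key):
        calls[0] += 1
        return f"value-for-{key}"

    cache = TTLCache(ttl_seconds, loader, clock=lambda: now[0])
    for t in range(0, duration, interval):
        now[0] = t
        cache.get("user:1")
    return calls[0]


if __name__ == "__main__":
    print(count_backend_calls(ttl_seconds=60))   # shorter TTL
    print(count_backend_calls(ttl_seconds=300))  # longer TTL
```

Under these assumed numbers, the 60-second TTL results in 10 calls to the server over ten minutes, while the 300-second TTL results in 2. The trade-off is that a longer time-to-live means clients may see older data for longer before it is refreshed.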