Starting at roughly 16:40 UTC on April 26, 2023, several services supporting customer interaction (including the Cloud Manager and Linode API) experienced a fault due to an unexpected increase in backend load. This fault did not directly affect the stability of customer resources; it affected only the management (creation, updating, and deletion) of those resources.
Customers would have noticed increased latency in job processing and, as a result of that latency, may later have seen errors caused by backend connections being left in a pending state.
The fault was addressed in two steps:
To prevent these issues going forward, we are planning to implement improved resiliency and alerting measures on the affected systems.
This issue was resolved around 22:40 UTC on May 16, 2023, and has since been monitored, with no recurrence reported.