Service Issue - Managed Databases

Incident Report for Linode

Postmortem

Starting around 15:45 UTC on May 17, 2023, a change pushed to a backend service on the Linode platform prevented customers from provisioning or fully deleting Managed Databases. Customers received errors when attempting to provision a new cluster. When an existing cluster was deleted, it disappeared from Cloud Manager and became inaccessible, but the cluster data itself was not immediately deleted.

This issue was first detected through tickets submitted to the Linode Support team. At 17:24 UTC, the ability for customers to provision new Managed Database clusters was restored. At 20:00 UTC, the ability to fully delete Managed Database clusters was restored, and the delete jobs remaining in the queue finished, completely deleting the data.

The root cause of this incident was a hotfix, released by a team to a backend service, that was not expected to directly affect the Managed Databases service. An additional hotfix was released to address the immediate issue.

To help prevent similar incidents in the future, we are working to improve communication between the related teams and to develop tooling and policies for testing changes in these areas more frequently.

Posted May 24, 2023 - 23:39 UTC

Resolved

This issue is resolved, and provisioning of Database Clusters should now complete normally. We identified an issue that caused the initial stage of cluster creation to fail. A fix was pushed to production at 17:15 UTC and has remained stable during monitoring. Any clusters provisioned from May 16th 17:54 UTC to May 17th 19:50 UTC will remain in a failed state and should be deleted and redeployed. If you continue to have issues provisioning a new cluster, please open a Support ticket for assistance.

Posted May 17, 2023 - 20:04 UTC

Update

We are continuing to monitor for any further issues.

Posted May 17, 2023 - 19:30 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted May 17, 2023 - 18:06 UTC

Update

We are continuing to work on a fix for this issue.

Posted May 17, 2023 - 17:11 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted May 17, 2023 - 16:21 UTC

Investigating

Our team is investigating a service issue affecting the Managed Databases service. During this time, users may experience issues when attempting to provision new databases.

We will share additional updates as we have more information.

Posted May 17, 2023 - 15:29 UTC

This incident affected: Managed Databases.