Connectivity Issue - Frankfurt Datacenter
Incident Report for Linode
Postmortem

For approximately 2 hours and 30 minutes, between 12:35 UTC and 15:05 UTC on May 24, 2023, our Frankfurt Data Center experienced an interruption in connectivity on our management network.

This did not affect direct connectivity to Linodes, including their ability to connect to other locations. Instead, it prevented jobs for services within Frankfurt, such as boot and reboot, from completing. It also affected connectivity to Object Storage services in Frankfurt from any location.

The root cause was an issue with a core router within the Data Center, which resulted in the suspension of production ports. This suspension effectively shut down the management network, which in turn prevented job requests from processing. The loss of management network connectivity also stopped traffic flow for Object Storage services in Frankfurt, rendering them inaccessible during this time.

To rectify this issue, we needed to reboot the affected core routers. This could be either a cold boot or a soft reboot, with a cold boot preferred to avoid initialization issues after rebooting.

We initially attempted to contact field technicians and facility Remote Hands engineers to organize a cold boot of these core routers. Once it was apparent that there was going to be a delay, we shifted to a soft reboot, which ultimately allowed us to restore connectivity at approximately 15:05 UTC, with full redundancy restored by 15:32 UTC. After continuing to monitor the situation, we marked the incident resolved at 16:44 UTC.
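
For readers who want to check the timeline, the intervals between the timestamps quoted above can be worked out with a short script. This is only an illustrative sketch: the event labels and the `elapsed` helper are assumptions made for this example, not names from any Linode tooling, and the times are the approximate UTC values stated in this postmortem.

```python
from datetime import datetime, timedelta

# Timestamps (UTC, May 24, 2023) as quoted in the postmortem text.
# The keys are hypothetical labels chosen for this example only.
FMT = "%Y-%m-%d %H:%M"
events = {
    "connectivity_lost": datetime.strptime("2023-05-24 12:35", FMT),
    "connectivity_restored": datetime.strptime("2023-05-24 15:05", FMT),
    "redundancy_restored": datetime.strptime("2023-05-24 15:32", FMT),
    "marked_resolved": datetime.strptime("2023-05-24 16:44", FMT),
}


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two labelled events."""
    return events[end] - events[start]


# Duration of the management network interruption.
outage = elapsed("connectivity_lost", "connectivity_restored")
# Time from restored connectivity until full redundancy was back.
to_redundancy = elapsed("connectivity_restored", "redundancy_restored")
# Monitoring period before the incident was marked resolved.
monitoring = elapsed("redundancy_restored", "marked_resolved")

print(f"Interruption:      {outage}")         # 2:30:00
print(f"Until redundancy:  {to_redundancy}")  # 0:27:00
print(f"Monitoring window: {monitoring}")     # 1:12:00
```

The first figure matches the "approximately 2 hours and 30 minutes" stated at the top of this report; the other two are simply derived from the times given above.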

Posted Jul 05, 2023 - 14:38 UTC

Resolved
This incident has been resolved.
Posted May 24, 2023 - 16:44 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 24, 2023 - 15:19 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted May 24, 2023 - 14:49 UTC
Update
We are investigating an issue that is causing jobs issued to services, such as a shutdown, reboot or boot, to fail. Linodes that are up and running remain unaffected and accessible, but their statistics will not be displayed in Cloud Manager. Customers may also experience connectivity issues to Object Storage at this time.
Posted May 24, 2023 - 13:46 UTC
Investigating
We are currently investigating this issue.
Posted May 24, 2023 - 13:04 UTC
This incident affected: Object Storage (EU-Central (Frankfurt) Object Storage) and Regions (EU-Central (Frankfurt)).