For approximately 2 hours and 30 minutes, between 12:35 UTC and 15:05 UTC on May 24, 2023, our Frankfurt Data Center experienced an interruption in connectivity on our management network.
The interruption did not affect direct connectivity to Linodes, including their ability to connect to other locations. It did, however, prevent jobs for services within Frankfurt, such as boot and reboot requests, from completing. It also affected connectivity to Object Storage services in Frankfurt from any location.
The root cause was an issue with a core router within the Data Center, which resulted in the suspension of production ports. This suspension effectively shut down the management network, which in turn prevented job requests from processing. The loss of management network connectivity also stopped traffic flow for Object Storage services in Frankfurt, rendering them inaccessible during this time.
To rectify this issue, we needed to reboot the affected core routers. This could be done with either a cold boot or a soft reboot; a cold boot was preferred because it avoids issues with initialization after rebooting.
We initially attempted to contact field technicians and facility Remote Hands engineers to organize a cold boot of these core routers. Once it became apparent that this would be delayed, we shifted to a soft reboot, which allowed us to restore connectivity at approximately 15:05 UTC. Full redundancy was restored by 15:32 UTC. After continuing to monitor the situation, we marked the incident resolved at 16:44 UTC.