Emergency Network Maintenance - US-East (Newark)
Scheduled Maintenance Report for Linode
Postmortem

On 2021-07-13 at 20:40 UTC, Linode's Network Operations team responded to alerts of networking issues in our Newark data center. Upon investigation, the issue presented as latency and packet loss between Linodes in the data center.

The Network Operations team was able to identify a problem with a primary core switch in a redundant switch pair that connects two pods. The team isolated the primary switch and continued to troubleshoot. 

Around this time, the Network Operations team reached out to our switch vendor to assist in troubleshooting. An emergency maintenance notification was posted to our Linode Statuspage at 21:35 UTC. After the notification was posted, the Network Operations team prepared the affected switch for a reboot. After the reboot, the switch still exhibited the same behavior. During this time, the Network Operations team continued to work with the vendor to identify the cause of these issues.

Shortly before 03:00 UTC on 2021-07-14, the secondary switch began exhibiting the same behavior as the primary switch. On the recommendation of the vendor, the Network Operations team decided to revert to a different version of the operating system and prepared the primary switch to be downgraded and reprovisioned. At 03:00 UTC, while the primary switch was being reprovisioned, the secondary switch entered a completely failed state, stopping all connectivity between the two pods.

Services were restored around 03:40 UTC, once the downgrade and reprovisioning of the primary switch were complete. The Network Operations team then isolated the secondary switch and performed the same downgrade and reprovisioning on it. We continued to monitor for latency and packet loss in Newark, and services were fully restored at 04:42 UTC.

We’re continuing to work with the vendor to determine the root cause, which is most likely a software bug. We had been running this code for many months without issue.

Posted Aug 11, 2021 - 01:42 UTC

Completed
The emergency network maintenance has been completed.
Posted Jul 14, 2021 - 04:42 UTC
Update
The Linode Network Operations Team is continuing to perform emergency network maintenance with our switch vendor. While performing this maintenance, we encountered a critical bug that impacted additional internal networking equipment. There will be periods of packet loss between Linodes within the US-East (Newark) Data Center.
Posted Jul 14, 2021 - 03:21 UTC
Update
The Linode Network Operations Team is continuing to perform emergency network maintenance. We appreciate your patience.
Posted Jul 14, 2021 - 02:32 UTC
Update
The Linode Network Operations Team is continuing to perform emergency network maintenance with our switch vendor.
Posted Jul 14, 2021 - 01:36 UTC
Update
The Linode Network Operations Team is continuing to perform emergency network maintenance, and we have to extend the maintenance period to troubleshoot with our switch vendor. No impact is expected for network traffic going to and from the internet; however, there may be brief periods of packet loss between Linodes within the US-East (Newark) Data Center. We will update the status page when more information is available.
Posted Jul 13, 2021 - 23:38 UTC
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jul 13, 2021 - 21:45 UTC
Scheduled
The Linode Network Operations Team will be performing emergency network maintenance on core switches in the US-East (Newark) Data Center on Tuesday, July 13th, 2021 from 21:45 UTC until 22:45 UTC. No impact is expected for network traffic going to and from the internet; however, there may be brief periods of packet loss between Linodes within the US-East (Newark) Data Center.
Posted Jul 13, 2021 - 21:35 UTC
This scheduled maintenance affected: Regions (US-East (Newark)).