Connectivity Issue - Newark
Incident Report for Linode
Postmortem

A pair of switches configured for redundancy, referred to here as SwitchA and SwitchB, connects two pods in the Newark data center.

This switch pair uses multi-chassis LAG, often referred to as CLAG or MLAG. At approximately 23:25 EST on 08/25/2021, links on a connected core switch (CoreSwitchA) flapped multiple times in succession.

The link flapping either caused, or resulted from, the MLAG process on SwitchA failing and restarting into a broken state. This failure caused brief loops to form in the Newark network. The NetOps team isolated SwitchA and worked to resolve the issues caused by these loops. On 08/27/2021, the NetOps team performed emergency maintenance to fully isolate and repair this switch.

Because of the multiple failures of this switch pair, the NetOps team has moved primary traffic and functions off these switches, and they will be decommissioned in the near future. In addition, the connectivity between these two pods has been re-engineered to use a different network architecture, which should improve overall network stability.

Posted Sep 29, 2021 - 18:45 UTC

Resolved
We have not observed any additional connectivity issues, and we are continuing to work with the vendor. If additional maintenance is required, we will post further updates on this status page.
Posted Aug 26, 2021 - 19:02 UTC
Monitoring
Network connectivity remains stable with the fixes that have been applied. We are continuing to monitor connectivity while we work with our vendor.
Posted Aug 26, 2021 - 18:08 UTC
Update
We are still working on a permanent solution for the connectivity issues in our Newark data center. We will be monitoring this issue to ensure that it remains stable in the meantime. If you are still experiencing issues, please open a Support ticket for assistance.
Posted Aug 26, 2021 - 14:48 UTC
Identified
Our team has identified the issue affecting connectivity in our Newark data center. We have implemented a temporary fix and are working quickly to implement a permanent one. We will provide an update as soon as the solution is in place.
Posted Aug 26, 2021 - 09:43 UTC
This incident affected: Regions (US-East (Newark)).