Network Connectivity - London
Incident Report for Linode

On the morning of January 20th, 2017, our London datacenter experienced three incidents of partially degraded internet connectivity:

  • 7:41 to 7:55 UTC (14m)
  • 8:26 to 8:36 UTC (10m)
  • 9:09 to 9:18 UTC (9m)

Our London datacenter is served by two leased, geographically diverse dark fiber spans that run between our colocation facility and nearby points of presence. All of our transit and peering connections are backhauled from these two points of presence.

Each of these three incidents was caused by multiple brief losses of light on our dark fiber spans:

[otn2] 7:41:18 to 7:46:58 UTC (5m 40s)
[otn2] 7:48:29 to 7:51:30 UTC (3m 1s)
[otn1] 8:26:34 to 8:27:58 UTC (1m 24s)
[otn1] 8:28:19 to 8:28:34 UTC (15s)
[otn1] 8:28:59 to 8:29:13 UTC (14s)
[otn1] 8:29:19 to 8:29:26 UTC (7s)
[otn1] 8:30:13 to 8:30:24 UTC (11s)
[otn2] 9:09:53 to 9:13:09 UTC (3m 16s)
[otn2] 9:13:57 to 9:14:36 UTC (39s)
[otn2] 9:15:02 to 9:17:27 UTC (2m 25s)

As the timeline above shows, the two spans never lost light at the same time. However, the rapidly changing link states triggered a pathological failure case in BGP, which made the service impact worse.
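The claim that the two spans were never down simultaneously can be checked directly against the timestamps listed above. The following is a minimal Python sketch that uses only those timestamps; the data layout and function names are illustrative and are not part of any Linode tooling.

```python
from datetime import datetime, timedelta

# Loss-of-light windows copied from the timeline above (UTC, Jan 20, 2017).
OUTAGES = [
    ("otn2", "07:41:18", "07:46:58"),
    ("otn2", "07:48:29", "07:51:30"),
    ("otn1", "08:26:34", "08:27:58"),
    ("otn1", "08:28:19", "08:28:34"),
    ("otn1", "08:28:59", "08:29:13"),
    ("otn1", "08:29:19", "08:29:26"),
    ("otn1", "08:30:13", "08:30:24"),
    ("otn2", "09:09:53", "09:13:09"),
    ("otn2", "09:13:57", "09:14:36"),
    ("otn2", "09:15:02", "09:17:27"),
]


def parse(ts: str) -> datetime:
    """Parse an HH:MM:SS timestamp (all events fall on the same day)."""
    return datetime.strptime(ts, "%H:%M:%S")


def windows(span: str) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) loss-of-light windows for one span."""
    return [(parse(s), parse(e)) for name, s, e in OUTAGES if name == span]


def overlaps(a: tuple[datetime, datetime], b: tuple[datetime, datetime]) -> bool:
    """Two intervals overlap if each one starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def spans_down_simultaneously() -> bool:
    """True if any otn1 window overlaps any otn2 window."""
    return any(overlaps(x, y) for x in windows("otn1") for y in windows("otn2"))


def total_down(span: str) -> timedelta:
    """Sum of all loss-of-light durations for one span."""
    return sum((e - s for s, e in windows(span)), timedelta())


if __name__ == "__main__":
    print("simultaneous loss on both spans:", spans_down_simultaneously())
    for span in ("otn1", "otn2"):
        print(span, "total loss of light:", total_down(span))
```

Run as a script, it reports no simultaneous loss, with 2m 11s of total loss of light on otn1 (five windows) and 15m 1s on otn2 (five windows).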

We have not yet determined the root cause of these light failures. We have verified that our own equipment was operating normally at the time. Additionally, there was no scheduled maintenance reported by any of our infrastructure providers during the relevant time periods.

To minimize the impact of similar incidents going forward, we have decided to prioritize implementing bidirectional forwarding detection (BFD) on our BGP sessions with our peers, wherever possible. Additionally, we will be implementing stricter BGP flap dampening.
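Route flap dampening, as described in RFC 2439, gives each route a penalty that increases on every flap and decays exponentially with a configurable half-life. The route is suppressed once the penalty exceeds a suppress limit and becomes usable again once the penalty decays below a reuse limit. The minimal Python sketch below illustrates the mechanism. The parameter values are assumptions chosen to match common vendor defaults (e.g., Cisco IOS), not Linode's configuration. Treating each otn1 loss-of-light window as exactly one route flap is likewise an assumption.

```python
import math
from datetime import datetime

# Assumed parameters (common vendor defaults), NOT Linode's actual configuration.
PENALTY_PER_FLAP = 1000.0
HALF_LIFE_S = 15 * 60
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0

# Start of each otn1 loss-of-light window, assumed here to be one flap each.
OTN1_FLAPS = ["08:26:34", "08:28:19", "08:28:59", "08:29:19", "08:30:13"]


def offsets(times: list[str]) -> list[float]:
    """Seconds elapsed since the first timestamp in the list."""
    base = datetime.strptime(times[0], "%H:%M:%S")
    return [(datetime.strptime(t, "%H:%M:%S") - base).total_seconds() for t in times]


def decay(penalty: float, elapsed_s: float) -> float:
    """Exponential decay: the penalty halves every HALF_LIFE_S seconds."""
    return penalty * math.pow(2.0, -elapsed_s / HALF_LIFE_S)


def simulate(flap_times_s: list[float]) -> list[tuple[float, float, bool]]:
    """Return (time, penalty, suppressed) immediately after each flap.

    Simplified model: a suppressed route is only reconsidered for reuse
    when the next flap is processed, and no maximum suppress time is applied.
    """
    penalty, last_t, suppressed = 0.0, None, False
    history = []
    for t in flap_times_s:
        if last_t is not None:
            penalty = decay(penalty, t - last_t)
            if suppressed and penalty < REUSE_LIMIT:
                suppressed = False
        penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_LIMIT:
            suppressed = True
        last_t = t
        history.append((t, penalty, suppressed))
    return history


def seconds_until_reuse(penalty: float) -> float:
    """Time for a penalty to decay below REUSE_LIMIT, assuming no further flaps."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_S * math.log2(penalty / REUSE_LIMIT)


if __name__ == "__main__":
    for t, p, s in simulate(offsets(OTN1_FLAPS)):
        print(f"t=+{t:4.0f}s penalty={p:7.1f} suppressed={s}")
```

With these assumed parameters, the route stays below the suppress limit after the second flap and is suppressed at the third (8:28:59). Making dampening "stricter" would typically mean lowering the suppress limit or lengthening the half-life, so that a burst like this one is suppressed sooner and held down longer.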

Alex Forster
Network Engineer, Linode

Posted Jan 20, 2017 - 18:28 UTC

This incident is resolved. We will be posting a post-mortem shortly.
Posted Jan 20, 2017 - 18:14 UTC
Connectivity to our London datacenter has been fully restored. We'll continue to monitor this situation and provide updates as necessary.
Posted Jan 20, 2017 - 12:53 UTC
We are currently experiencing connectivity issues within our London datacenter. Our Network Operations team is aware and is currently investigating.
Posted Jan 20, 2017 - 09:18 UTC
Connectivity has been restored and we are monitoring for any residual issues.
Posted Jan 20, 2017 - 08:53 UTC
We are aware of an issue within our London datacenter and are investigating at this time. We will provide additional information as it becomes available.
Posted Jan 20, 2017 - 07:55 UTC
This incident affected: London.