Linode Status - Connectivity Issues

Connectivity Issues - Fremont

Incident Report for Linode

Postmortem

At approximately 04:23:00 UTC on June 21, 2018, utility power to our Fremont data center was interrupted. The facility’s Uninterruptible Power Supply (UPS) system engaged, but the UPS unit serving a sizeable portion of Linode’s hardware deployment failed. As a result, a subset of the hardware fleet that serves our customer instances lost power and rebooted.

Utility power was restored at approximately 05:16:00 UTC on June 21, 2018. At that time, Linode staff worked to bring our infrastructure and the affected customer instances back online. Most Linode infrastructure and customer instances were online by approximately 08:10:00 UTC, and the incident was deemed resolved.

Our colocation provider is working with their UPS vendor to conduct a full investigation into the cause of the failure. We will provide more detail as it becomes available. At this time, the affected UPS has been repaired and is operating normally. We have confirmation that the repaired UPS operated normally during subsequent power loss events early last week.

This is the second outage we’ve experienced at this facility in the last 6 months, and we do not take these downtime events lightly. To reduce the impact of power loss issues going forward, we are moving our critical network infrastructure to a new area of the data center facility. This new area will provide fully redundant power feeds, which would prevent a full outage should an issue like this recur. We anticipate completing this phase in the next 30-60 days. Additionally, we plan to move the remaining Linode hardware deployment to the new area to take advantage of its additional power redundancy. We do not yet have an ETA for the completion of this second phase.

We do not foresee any further issues with the Fremont facility at this time. We appreciate your patience while we await the official RFO, root cause, and mitigation plan from our colocation provider.

Update 2018-07-11

Our colocation provider and their UPS vendor have completed a full investigation. Inspection of the UPS system indicated failed/burned components, and further diagnostics determined that a rectifier had failed. At this time, the faulty component has been replaced and the unit has been verified to be working properly.

Posted Jul 05, 2018 - 19:35 UTC

Resolved

Because we have not experienced any additional issues affecting our Fremont data center, this matter is now resolved.
If you are still experiencing any issues, please reach out to our Customer Support Team for assistance.

Posted Jun 21, 2018 - 08:43 UTC

Monitoring

At this time, we have addressed the services affected in our Fremont data center. We will be monitoring this issue to ensure everything remains stable.
If you are still experiencing any issues, please reach out to our Customer Support Team for assistance.

Posted Jun 21, 2018 - 08:10 UTC

Update

Our team is continuing to work to restore services in our Fremont data center. We will continue to provide additional information going forward.

Posted Jun 21, 2018 - 07:12 UTC

Update

At this time, power has been restored and our team is working to restore services in our Fremont data center. We will continue to provide additional information going forward.

Posted Jun 21, 2018 - 06:23 UTC

Identified

At this time, we have identified the connectivity issues affecting our Fremont data center as the result of a power outage. Our team is working as quickly as possible to restore connectivity. We will provide additional updates as they develop.

Posted Jun 21, 2018 - 05:34 UTC

Update

Our team is still in communication with our upstream provider to determine the cause of this outage. We are continuing our investigation and will provide additional updates as the issue develops.

Posted Jun 21, 2018 - 05:18 UTC

Investigating

We are aware of connectivity issues affecting Linodes in our Fremont data center and are currently investigating. We will continue to provide additional updates as this incident develops.

Posted Jun 21, 2018 - 04:36 UTC

This incident affected: Regions (US-West (Fremont)).