Connectivity Issues - Newark
Incident Report for Linode
Postmortem

At approximately 19:35:00 UTC on July 18, 2018, utility power to our Newark data center was interrupted and critical load was immediately transferred to generator power. At that time, the Uninterruptible Power Supply (UPS) system servicing Linode’s deployment alerted our colocation provider to a possible failure. UPS technicians were dispatched on site to diagnose the issue. They determined that one of the UPS units in the N+1 system was damaged, and that unit was taken offline for repair. Because the UPS system was able to run in an N configuration, our colocation provider decided to transfer back to utility power once it was restored.

At 23:04:00 UTC, data center power was switched from generator power to utility power, but the two remaining UPS units failed to take the load and switched to bypass. As a result, the power system servicing Linode’s deployment lost power for approximately three minutes.

Power was fully restored at approximately 23:07:00 UTC. Linode staff then worked to bring our infrastructure and affected customer instances back online. Most Linode infrastructure and customer instances were online by approximately 02:34:00 UTC on July 19, 2018, and the incident was deemed resolved.
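For readers who want the phase durations implied by the timestamps above, the following is a minimal sketch that computes them. The constant and function names are illustrative and not part of any Linode tooling, and the timestamps are the approximate values quoted in this report.

```python
from datetime import datetime, timedelta, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


# Approximate timestamps quoted in the report (names are illustrative).
UTILITY_LOST = utc("2018-07-18 19:35:00")    # utility power interrupted
SWITCH_BACK = utc("2018-07-18 23:04:00")     # transfer from generator to utility
POWER_RESTORED = utc("2018-07-18 23:07:00")  # power fully restored
MOSTLY_ONLINE = utc("2018-07-19 02:34:00")   # most instances back online


def phase_durations() -> dict:
    """Return the length of each phase of the incident as a timedelta."""
    return {
        "on_generator": SWITCH_BACK - UTILITY_LOST,
        "without_power": POWER_RESTORED - SWITCH_BACK,
        "recovery": MOSTLY_ONLINE - POWER_RESTORED,
        "power_loss_to_resolution": MOSTLY_ONLINE - SWITCH_BACK,
    }


if __name__ == "__main__":
    for name, duration in phase_durations().items():
        print(f"{name}: {duration}")
```

With these inputs, the deployment ran on generator power for about 3 hours 29 minutes, was without power for about 3 minutes, and took about 3 hours 27 minutes to recover after power returned.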

Our colocation provider has conducted a full investigation and determined that a power surge during the utility power failure damaged the inverter on one of the UPS units in the N+1 set servicing Linode’s deployment. This unit was taken offline, leaving the UPS system operating in an N state. During the switch back from generator power to utility power, the remaining UPS units were unable to handle the load due to a control board malfunction and went into bypass mode. This caused the approximately three-minute power interruption.

The malfunctioning control boards have since been checked and verified operational, and the UPS system is currently functional in an N state. A maintenance window has been scheduled to replace the damaged inverter on the failed UPS unit as well as the control boards on all UPS units in the set. This will restore the UPS system to N+1 redundancy and eliminate the potential for further control board malfunctions.
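The difference between the N+1 and N states described above can be illustrated with a small sketch. It assumes a set of three UPS units of which two are required to carry the load, which is inferred from the report's mention of two remaining units after one was taken offline. The class and its fields are hypothetical and do not model the colocation provider's actual equipment or the control board fault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpsSet:
    """Hypothetical model of a UPS set: N units are required to carry the load."""

    required_units: int  # N, the number of units needed for the critical load
    healthy_units: int   # units currently online and able to take load

    @property
    def spare_units(self) -> int:
        """Healthy units beyond the N required (negative if under capacity)."""
        return self.healthy_units - self.required_units

    @property
    def state(self) -> str:
        """Describe the redundancy state, e.g. 'N+1', 'N', or 'insufficient'."""
        if self.spare_units < 0:
            return "insufficient"
        if self.spare_units == 0:
            return "N"
        return f"N+{self.spare_units}"

    def after_failures(self, count: int) -> "UpsSet":
        """Return the set as it would be after `count` more units go offline."""
        return UpsSet(self.required_units, max(self.healthy_units - count, 0))


if __name__ == "__main__":
    normal = UpsSet(required_units=2, healthy_units=3)  # assumed 2+1 layout
    degraded = normal.after_failures(1)                 # damaged inverter
    print(normal.state, "->", degraded.state)
    print("after one more failure:", degraded.after_failures(1).state)
```

In the N state there is no spare unit, so any further fault in the remaining units leaves the set unable to carry the load. This is why the scheduled maintenance aims to restore N+1.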

To provide additional protection, our colocation provider has performed an audit of all power circuits servicing our deployment. Over the next quarter we will be transitioning critical hardware to power configurations with increased redundancy.

We do not foresee any further issues with the Newark facility at this time. Thank you for your patience and understanding. We apologize for any inconvenience this interruption has caused.

Posted Jul 27, 2018 - 15:44 UTC

Resolved
As we have not experienced additional connectivity issues affecting our Newark data center, this matter is now resolved. If you are still experiencing issues and our Support team has not contacted you, please feel free to open a ticket or send us an email at support@linode.com and we will be happy to assist. We'll be following up with a post-mortem regarding this incident in the near future.
Posted Jul 19, 2018 - 02:33 UTC
Monitoring
Our team has restored connectivity to Linodes within our Newark data center. We will continue to monitor for any additional issues. If you are still experiencing issues and our Support team has not contacted you, please feel free to open a ticket or send us an email at support@linode.com and we will be happy to assist.
Posted Jul 19, 2018 - 01:10 UTC
Update
Power has been restored to our Newark data center and our team is working quickly to restore connectivity. We will continue to provide additional updates as they develop.
Posted Jul 19, 2018 - 00:38 UTC
Identified
We have identified the connectivity issues affecting our Newark data center as the result of a power outage. Our team is working as quickly as possible to restore connectivity. We will provide additional updates as they develop.
Posted Jul 19, 2018 - 00:01 UTC
Update
Our team is still in communication with the Newark data center to determine the cause of this outage. We are continuing our investigation and will provide additional updates as the issue develops.
Posted Jul 18, 2018 - 23:40 UTC
Update
We are continuing to investigate this issue.
Posted Jul 18, 2018 - 23:05 UTC
Investigating
We are aware of connectivity issues affecting Linodes in our Newark data center and are currently investigating. We will continue to provide additional updates as this incident develops.
Posted Jul 18, 2018 - 23:04 UTC
This incident affected: Regions (US-East (Newark)).