On Mar 29, 2022, certificates utilized by the LKE API Server Loadbalancers for Dallas and Newark expired. This expiration went undetected by the team until we received internal alerting about unhealthy workloads. At ~3:10PM EST, we started actively investigating the issue. By ~4:50PM EST, the root cause was identified as the expired certificates. New certificates were generated and applied to the Loadbalancers by 5:14PM, and this resolved the issue. We are actively working on automating the renewal of these certificates and adding alerting that will notify us when the certificates are approaching expiry.
Posted Apr 07, 2022 - 16:32 UTC
This incident has been resolved.
Posted Mar 29, 2022 - 22:12 UTC
A fix has been implemented and we are monitoring the results.
Posted Mar 29, 2022 - 21:44 UTC
The issue has been identified and a fix is being implemented.
Posted Mar 29, 2022 - 21:14 UTC
We are currently investigating an issue with communication between customer clusters and their managed control planes. At this time, customers may be unable to communicate with their clusters, however deployed workloads are expected to be functioning properly.