Oracle NetSuite and Microsoft Azure cloud services in Sydney appear to have recovered from the outage that hit them on the evening of Wednesday 30th August.
The Azure website gives the following account of that evening's events:
Impact Statement: Starting at approximately 08:30 UTC on 30 August 2023, a utility power surge in the Australia East region tripped a subset of the cooling units offline in one datacenter, within one of the Availability Zones. While working to restore cooling, temperatures in the datacenter increased so we proactively powered down a small subset of selected compute and storage scale units, to avoid damage to hardware. Multiple downstream services were impacted, with targeted communications being distributed via Azure Service Health.
Current Status: Storage infrastructure has recovered. A subset of services still experiencing residual impact are on the path to mitigation.
Mitigation: We worked on recovering the failed cooling units and reducing the overall temperature within the impacted area. Once temperature levels were within operational thresholds, we began to restore power to the affected infrastructure and started a phased process to bring this infrastructure back online. Once storage infrastructure was fully restored, dependent compute scale units were then also restored to operation. As the underlying compute and storage scale units became healthy, compute and other dependent Azure services recovered.
While we have broadly recovered, a small subset of services are still working on post recovery checks, and we are closely monitoring the datacenter metrics for storage and compute resources to ensure they continue to show as healthy. For any residual customers with services still in the recovery process, we will communicate directly to them through Service Health in the Azure portal, which also triggers Service Health alerts.
The Oracle NetSuite website posted similar updates throughout the incident, including a statement that the outage was caused by “an interruption in the chiller plant as a result of a lightning storm”: