At 8:58 AM this morning we suffered a major outage. It was caused by a circuit breaker that tripped as servers entered high-load states. Newly installed hardware had raised our power usage above normal, and the technician installing the equipment did not check the current power draw against the breaker limits beforehand.
This caused nearly all systems to power down, and it took some time to bring them back online. By 9:20 AM we had restored most Linux high-availability services, most standalone services, and about 25% of Windows VMs. By 9:40 AM Windows high-availability services were also back online. Some VMs were affected by the unexpected power loss and required additional work to bring back online. We completed this as quickly as possible while also answering tickets, phone calls, and live chats during the outage.
We are having a new circuit installed today to increase our power capacity. We have also expanded our monitoring so that all staff can view our power usage in the same way as we view our network throughput. This will mean constant monitoring, as well as a paging notification service for our technicians, to ensure that this type of issue does not recur.
We sincerely apologise for this outage and the inconvenience it has caused, particularly as it occurred at the start of a working day. We work very hard to constantly improve our services and procedures, and this outage has prompted us to review all of our procedures to ensure not only that this specific circumstance does not recur, but also that our procedures take into account all possibilities.
We thank you very much for your continued support.