
WWW.BS25999.COM

Wednesday
Aug 20th
Reports on Recent Rackspace Outages
Rackspace has historically been one of the most reliable hosting providers in the market, even providing a 100% uptime guarantee. Unfortunately, a recent outage at its Dallas/Fort Worth data centre has highlighted the need for all organisations to review their outsourcing arrangements.

Quoting from the Rackspace web site:

100% Network Uptime Isn't Wishful Thinking, It's A Guaranteed Reality

So what happened?

It seems that a cascading sequence of events led to the partial data centre outage. First, on Sunday at about 4:00 a.m. CST, a mechanical failure hit the Web hosting company's Dallas/Fort Worth data centre, disrupting the web sites of some of its customers.

The company managed to fix the problem and get customers back online relatively quickly, but before Rackspace Managed Hosting could figure out exactly what had caused the minor meltdown, a vehicle struck the transformer that was feeding power into the Dallas/Fort Worth data centre.

The collision immediately disrupted power to the entire data centre. Emergency generators kicked in and operated as intended. When Rackspace transferred power to the secondary utility power system, the data centre's chilling units were cycled back up.

However, the utility provider then shut down power in order to allow emergency rescue teams safe access to the accident victim. At that exact moment, Rackspace was 15 minutes into cycling up the data centre's chilling units. Again, the backup generators kicked in immediately, but the transfer to backup power caused the chillers to stop and then begin cycling back up again, during which time the temperature in the data centre continued to rise.

This repeated cycling of the chillers resulted in increasing temperatures within the data centre. As a precautionary measure, to make sure the rising temperature did not damage customers' servers, Rackspace decided to take some of those servers offline.

The company is trying to determine what caused the initial mechanical failure in order to devise an action plan for the future. Rackspace said it would notify customers as soon as that was accomplished.

It seems that Rackspace were overwhelmed by circumstances beyond their control but evidently exercised a planned response; their actions were not those of a panicked company wondering what to do next. They kept their customers and the media well informed and have been open in their assessment of what went wrong while working on putting it right.

 

 

However, organisations should realise that, whatever an outsourced service provider says about availability, they still need to carry out their own practical risk assessment and impact analysis of the issues surrounding a loss of service provision.

There is no such thing as a 100% guarantee.
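
As a starting point for that kind of impact analysis, it can help to translate an availability percentage into the hours of downtime it implies over a year. The short Python sketch below does only that arithmetic. The function names and the cost-per-hour figure are hypothetical, chosen for illustration; they do not come from Rackspace or from any particular service level agreement, and a real assessment would need to consider far more than a single number.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


def expected_annual_loss(availability_percent: float, cost_per_hour: float) -> float:
    """Rough yearly cost of downtime; cost_per_hour is an assumed business figure."""
    return annual_downtime_hours(availability_percent) * cost_per_hour


if __name__ == "__main__":
    # Compare a few commonly quoted availability levels.
    for pct in (100.0, 99.99, 99.9, 99.0):
        print(f"{pct}% availability -> {annual_downtime_hours(pct):.2f} hours down per year")
```

Even a provider that actually delivers 99.9% availability is offline for nearly nine hours a year, which is why the promised figure should be set against what an hour of lost service would cost the organisation.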

Some web links for background reading on the issue:

http://gigaom.com/2007/11/12/rackspace-outage-hits-home/

http://barry.wordpress.com/2006/09/09/when-100-is-not-100/

http://valleywag.com/tech/followup/rackspace-outage-affects-texas-isp-321900.php

http://www.theregister.co.uk/2007/11/13/rackspace_texas_truck/

http://www.webpronews.com/topnews/2007/11/13/rackspace-wrecked-by-wayward-truck-driver

 

 
