
Warning: 99.999% Uptime is a Myth, and a Planet Killer

by Mark Ontkush, Boston, Massachusetts, USA on 09.27.07
Science & Technology (electronics)


Uptime is a measure of the time a particular computer system has been "up" and running. Not surprisingly, it's the opposite of downtime, when a system is not operational. In geek cant, uptime is sometimes measured in nines; "Three nines" gives you 99.9% reliability, about 8 hours and 46 minutes of downtime a year. The gold standard is "Five nines" - 99.999% reliability, which translates to a total downtime of a little over five minutes per year.
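The arithmetic behind the nines is easy to check. The sketch below is a minimal illustration, assuming a 365-day year; the function name `downtime_minutes_per_year` is made up for this example and does not come from any tool mentioned in the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for N nines of uptime."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001 (0.1% downtime)
    return MINUTES_PER_YEAR * unavailability


# Print the downtime budget for one through five nines.
for n in range(1, 6):
    uptime_percent = 100 * (1 - 10 ** -n)
    budget = downtime_minutes_per_year(n)
    print(f"{n} nines ({uptime_percent:.3f}%): {budget:,.1f} minutes/year")
```

Three nines works out to roughly 526 minutes a year (the 8 hours and 46 minutes above), and five nines to a bit more than 5 minutes.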

At first glance, it would seem that you want to keep your site up as much as possible, lots of nines. Problem is that it turns out to be exponentially expensive to do so, both cost-wise and e-wise, because massive redundancy is required to support more reliable systems. Instead of one server, you need several... instead of one 9 to 5 support employee, you need around-the-clockers... instead of one power line, you need several, each going to a different substation. All this leads to a Malthusian eWaste accumulation, increased labor expenditure, and according to the Uptime Institute, unsustainable power consumption. And one more thing; it's impossible.

Fact is that five nines is simply impossible to sustain over a long period of time. Remember, five nines is about 5 minutes of downtime a year. So, if your site is out for an hour, it must be up non-stop for more than 11 years to hit your number - wholly unlikely. Good news is that this is hardly a business killer; for example, according to Pingdom, even eBay is only good for three nines over a long period. Seems like they are doing fine.
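To see why a single outage is so hard to recover from, here is a rough sketch of the calculation, assuming a 365-day year and no other downtime in the window. The function name `years_to_absorb_outage` is hypothetical, chosen only for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def years_to_absorb_outage(outage_minutes: float, nines: int) -> float:
    """Years over which one outage must be averaged to still meet N nines.

    Assumes the outage is the only downtime in the whole period.
    """
    budget_per_year = MINUTES_PER_YEAR * 10 ** -nines  # allowed minutes/year
    return outage_minutes / budget_per_year


# A one-hour outage measured against a five-nines target.
print(round(years_to_absorb_outage(60, 5), 1))  # 11.4
```

Under these assumptions a one-hour outage uses up more than a decade of five-nines budget, in line with the figure in the text. Against a three-nines target the same hour is absorbed in about six weeks.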

The stark truth is that most companies don't need five nines. It sounds great in boardroom meetings ("If this saves one life, it's worth it!") and fills the pork barrel for those IT directors looking for a new project. But the numbers just don't make sense for most industries; studies show that a three-nines design is fine for most retailing applications. Yeah, if you are Walmart you need it for your transaction processing. Yeah, if you are LAX and a computer glitch leaves 17,000 passengers stranded, you need it. But for the vast majority of cases, there's not enough return on investment (ROI) to justify the cost to your business and the damage to the planet. Good enough is in fact good enough. :: Cnet

Comments (6)

I doubt TreeHugger, for example, runs on a single server.

Given that, most of the technologies used on a site of any size to ensure fail-over of a dead web server also exist to "load balance" requests across those servers under normal circumstances. So such systems not only ensure uptime, but provide a cushion against your site being overloaded by its users during hours of peak use.

[yes, but TH doesn't profess five nines uptime mjo]

Same for database servers, only in that case a "cluster" usually provides backup and data redundancy as well as fail-over. And if you're a web business--or almost any business, for that matter--you need backups of your data. Lose them, and you die.

That expensive switch might serve the servers... and all of the rest of the users in the company. Have the wrong box go down during business hours, and ALL of your employees could end up sitting around twiddling their thumbs.

[yes, there are perceived costs to downtime. I'm sure they can do something in the few hours a year between three and five nines mjo]

As such, since all of these systems have multiple uses and fulfill multiple--and critical--business functions, saying they exist solely to provide boardroom "five nines" bragging rights is a bit of a fallacy.

Further, having set up MANY of these types of systems, I assure you that every one of the companies that has done so did an extensive cost/benefit analysis beforehand, as no one wants to spend a single penny more on IT than needed.

[disagree, these problems are seldom fully thought through, that's the point. mjo]

Michael Long says:

This is the wrong language. The goal is actually 100% availability, which is entirely different from saying that something needs to be running 100% of the time.

Aron Rubin says:

As mentioned above, for most it will always come down to ROI. Execs and sales people always want 100% uptime when possible, but when accounting and IT show them what it will cost, they usually realize that 99.9% is more than acceptable.

I work in the medical IT industry, so for us downtime could mean someone's life, and so far the best that has been done barely approaches 99.99%.

[good input thanks mjo]

James says:

You're setting up a straw man, TH.

"Most companies don't need five nines." Very true. And in fact, most companies recognize that as well. In fact, most companies don't strive for five nines. No large company I have dealt with (up to 10K users in one of them) has made a stated goal to get better than four nines, and most would be quite pleased to reach three-and-a-half nines (about four hours 20 minutes) -- in the same way large construction or warehouse sites often keep track of how many days since the last safety infraction or job-related injury.

[good for them! maybe even two nines would do. mjo]

Everybody in the industry knows your great revelation that the cost to add those nines is exponential in hardware, and besides, usually the software just isn't up to it. And anybody who doesn't know these facts quickly gets replaced because they're spending too much money and not meeting their goals.

[from the other comments, looks like there is a lot of discussion as to how many nines are required mjo]

That Tall Guy says:

Actually, while I agree with you on most of the impact and cost observations, something people overlook is that server consolidation to a mainframe will likely _lower_ power and cooling costs while drastically improving uptime.

But that discussion aside, I think you'd be surprised by the number of companies who do need a much higher % of uptime than even eBay.

Many financial institutions including banks and stock exchanges (yes that is plural as there are many) have government mandates which stipulate availability characteristics.

If YouTube goes down, it may not impact your day that much. If Gmail goes out you might be pissed you can't do business... Yet, if your bank goes down and you can't get cash necessary to pay for goods and services, like critical medication, not like a grande latte, then you can bet there will be reparations. Consider what might happen in the case of a medical service provider...

I work on this scale, so it's a more "typical" environment for me, and I know the statement about five 9's being impossible to accomplish is wrong. There are many technologies, and institutions which use them, to accomplish that very goal.

As a final thought, outages are predominantly (i.e. greater than 40%, probably more like 60%) the result of "human error": operator mistakes, poor design, or improper processes and procedures.

Although technology gets the bum rap, in fact hardware and software are each the cause only about 20% of the time, with the remainder being miscellaneous causes like natural disasters. (Check Gartner for a chart.)

It's fairly easy to reduce the chance HW goes out: just 2x your infrastructure. Even SW can be load balanced and data made redundant.

What gets complicated much faster are the human issues: as the number of people N grows, the number of communication channels grows quadratically, as [N * (N-1)] / 2.

Tatlow says:

Yikes, I work for a planet killer! (Stratus Technologies, in the interest of full disclosure)

Tatlow makes great points (not the least of which is the virtualization, many-eggs-in-one-basket discussion). Remember the recent air-traffic control problem? Human error, not system failure.

Not many applications require five nines, but for those that do there can be no substitute. Many put email in that category. How about the cable TV system in a children's hospital? True story.

The fact is quad-core Intel-based fault-tolerant servers, running any Windows and Linux applications, easily achieve five nines, are priced between $14K-$45K, install as easily as an ordinary x86 server, and largely monitor and manage themselves.

Ken Donoghue says:
