Tier ratings: Part of a much larger puzzle
A critical debate over the Uptime Institute’s Tier Classification System has long been simmering in the data center industry. It is now time to bring this discussion back to a boil.
The tier rating system is often referred to as the industry benchmark for setting design standards and predicting site reliability. A data center’s tier rating can suggest how much downtime to expect and whether maintenance is being done properly.
But reliance on the tier ratings alone can be misleading. The expectation is that design and construction guarantee uptime reliability. Unfortunately, more often than not, that is not the case.
“You can build a Tier III facility and build a Tier II facility, and put them down the block from each other, and if the Tier II facility is managed and operated better, it could achieve a better result than a Tier III facility that is not as well operated,” argues Robert McClary, Senior Vice President and General Manager of Denver-based data center FORTRUST.
The Uptime Institute is aware of this limitation. As a first step, it has limited Tier Certification of Design Documents awards issued after Jan. 1, 2014 to two years in order to “thwart improper or unintended consequences from the tiers,” such as touting Tier III design certifications and then not building to them, says Meredith Courtemanche, Site Editor of SearchDataCenter.
The institute is also expanding its Management and Operations (M&O) Stamp of Approval program “to outline, prioritize and weigh behaviors necessary for data center owners and operators to achieve the maximum uptime for their existing data centers.” According to Uptime’s Christopher Brown, “adherence to the Management and Operations behaviors has been proven to minimize opportunities for human error – the number one cause of data center downtime.”
McClary contends the industry spends too much time, effort and talk on data center design, a “one-time, point-in-time assessment.” He believes the long haul counts for more. “Data Centers are long-term assets, and the discussion has to be around management operations, lifecycle and risk mitigation more so than a tier rating of the facility,” McClary says.
The key fact is that once a data center is designed and built, it is inevitably operated by humans.
“No tier standards that I’m aware of are capable of addressing or removing human error, mistakes, oversights, shortsightedness and other human stupidity from the design of data center facilities,” says Stephen J. Bigelow, Senior Technology Editor in the Data Center and Virtualization Media Group at TechTarget Inc.
SearchDataCenter’s Courtemanche advises data center customers to “look at the rating as step one, then dig deeper into data center power design, redundancy, connectivity and track record. Just like one Tier II facility is not necessarily equal to another Tier II facility, the same applies for M&O stamps or any other blanket designation.”
As for the industry, McClary calls the current conversation over Tier III and Tier IV certifications “out in the weeds,” a distraction from the real issues. In his view, it perpetuates the industry’s tendency to mitigate risks through over-provisioning. He says tiering exemplifies “a ‘break-fix’ philosophy: ‘If I design my data center with so much redundancy and so much resiliency, then the only thing that can take it down is human error.’”
Yet in a 2013 study, about one-third of data center managers, a plurality, named human error as the most likely cause of downtime. Past studies have attributed 75 percent of downtime to human error.
Human error is why some argue that tier certifications are a part of a much larger puzzle.
“In this industry we tend to look at the root cause of downtime, and if it is human error or is some element that we don’t think we can easily control, we try to design around it, throw money at it, throw people at it,” McClary says.
Instead, he advocates “having a conversation on how you manage and operate in order to avoid or mitigate or eliminate human error.”
McClary does not accept human error as inevitable. “You can train and you can create an organizational structure that mitigates or eliminates human error if you are willing to put forth the processes, the structure and the discipline it takes to do that.”
Operations is about mindset, McClary adds, plus “ownership discipline, training, and positive work environment. A lot of those can be measured; they can be viewed, they can be audited.”
For McClary, the best bottom-line metric of data center success is simple. “There’s really only one true measurement for data center, and that is the years of continuous uptime that you deliver against the number of unplanned outages that have occurred, and it’s that simple,” he says. “If you’re going to measure something it should be based on a result, not on a prediction.”
It seems almost too easy for McClary to argue for a results-based measurement regime. As of 2014, FORTRUST had logged more than 12 years of continuous uptime. Yet McClary knows it’s the pursuit of perfection that counts.
“Nothing is 100 percent. I know the math is inevitable — at some point in time that clock resets. But that is the true measure of the data center — everything else is predictive.”