Keep it cool: the greening of Microsoft’s data centers
Microsoft’s first European “mega data centre”, which opened recently on the western outskirts of Dublin, uses 50 per cent less energy than a traditional data centre built some three years ago.
Why Dublin? It’s only common sense: if you want to keep computers cool, keep them in a cool, moderate climate.
The $500 million, 28,000 square metre facility makes extensive use of what is technically called “free-air cooling” (or what is locally called bad weather). Ireland’s mild, damp climate – summer temperatures rarely exceed 24 °C – means that cooling systems rarely need to be powered up.
“It’s going to save us a lot of money,” Microsoft International President Jean-Philippe Courtois said at the recent opening. Just as importantly, it will minimise carbon emissions. In many data centres, cooling accounts for about half of the total energy use.
This is why a typical data centre has a Power Usage Effectiveness (PUE) rating of 2 – PUE being the metric used to gauge the energy efficiency of a data centre. It is calculated by dividing the total power or energy entering a data centre (i.e., the utility bill) by the power or energy used to run the computing infrastructure within it. PUE is therefore expressed as a ratio, with overall efficiency improving as the quotient decreases towards 1. Microsoft’s new data centre in Dublin has a rating of 1.25, with a goal of reaching 1.125 by 2012.
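To make the arithmetic concrete, the calculation can be expressed in a few lines of Python (a minimal sketch; the facility figures are illustrative, not Microsoft’s own):

    def pue(total_facility_kw, it_equipment_kw):
        """Power Usage Effectiveness: total power entering the facility
        divided by the power consumed by the IT equipment itself."""
        return total_facility_kw / it_equipment_kw

    # A facility drawing 2,000 kW overall, of which 1,000 kW reaches the
    # servers, scores a PUE of 2.0: half its energy goes on cooling and
    # other overheads.
    print(pue(2000, 1000))  # 2.0  (a typical data centre)
    print(pue(1250, 1000))  # 1.25 (Dublin's current rating)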
“Greening” the data centre has now become an important IT issue. Even though IT usage can displace activities that are less energy efficient – e-tailing [selling goods over the Internet] produces fewer emissions than conventional retailing, for example – the IT industry’s carbon footprint is now on a par with that of the aviation industry. Data centres account for most of that footprint’s growth.
Data centre greening has also become an important political issue. The European Union has produced a code of conduct for green data centres, to which Microsoft is signed up. The Dublin centre was built in accordance with the code and has been recognised by the European Commission’s Sustainable Energy Europe Campaign as a “best practice” in environmental sustainability design.
The US Environmental Protection Agency estimates that data centre energy consumption in the USA doubled between 2000 and 2006, by which time it had reached 61 terawatt-hours or about 1.5 per cent of total national consumption. The European Commission estimates that data centres in Western Europe consumed 56 terawatt-hours in 2007 – and that this figure will reach 104 terawatt-hours by 2020.
The ongoing shift to cloud computing, which concentrates hitherto highly distributed computing resources into a relatively small number of highly engineered, capital-intensive data centres, has been a major contributor to the sector’s recent surge in energy use.
Yet the very act of condensing computing infrastructure in the “cloud” has opened up opportunities for reducing that consumption, which could yield significant environmental and economic dividends.
Those opportunities span the entire data centre infrastructure, at the level of both hardware and software. “Microsoft’s software plus services business now depends on an ever-expanding network of data centres: hundreds of thousands of servers, many petabytes of data, hundreds of megawatts of power, and billions of dollars in capital and operational expenses,” says Dan Reed, Corporate Vice President for Extreme Computing at Microsoft Research in the US. “The commodity components and customised software currently used to build data centres and applications introduce capital, operating and energy inefficiencies, not only at Microsoft but across the entire computing industry.”
Feng Zhao, Assistant Managing Director at Microsoft Research Asia, began to look at how to tackle those inefficiencies about three years ago. “In some senses we were trailblazers in trying to start this effort,” he says. Outside of high-performance computing – where energy performance has always been an important consideration – the issue had not received much attention. “Traditionally, system designers do not think of energy as a first-class constraint – the primary goal is performance,” Zhao says.
Through a project called Data Center (DC) Genome, Zhao and his colleagues worked with the data centre infrastructure services team at Global Foundation Services to develop a holistic understanding of the energy flows within a data centre – one that can give operators a precise picture of temperature and humidity (and other environmental parameters if required). “There are so many parameters that govern how heat is generated, extracted and removed,” Zhao says. With cooling accounting for about half of a data centre’s energy use, optimising the process offers obvious pay-offs.
Instead of trying to model fluctuations in temperature across space and time, Zhao’s group is measuring them – at a fine-grained level of detail. The team has designed a robust wireless sensor network that captures multiple fingerprints of the data centre’s operating conditions. Server racks are peppered with low-cost sensors, called “genomotes”, which collect temperature and humidity data at multiple locations. These datapoints can be assembled into a three-dimensional map, which can be accessed through a browser-based application called DC Genome Explorer. The evolution of the system over time can be played back like a movie, and all of the information can be archived and called up at will.
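As a rough illustration of the idea – and only that, since the genomote data formats and the internals of DC Genome Explorer are not public – the mapping and playback could be sketched in Python along these lines:

    from collections import defaultdict

    # Hypothetical data model: each genomote sits at a rack position
    # (x, y, z) and reports timestamped temperature and humidity.
    readings = defaultdict(list)  # (x, y, z) -> [(timestamp, temp_c, humidity_pct)]

    def record(location, timestamp, temp_c, humidity_pct):
        """Log one genomote reading."""
        readings[location].append((timestamp, temp_c, humidity_pct))

    def snapshot(at_time):
        """Reconstruct the three-dimensional map at a given moment:
        for each sensor, take its latest reading at or before at_time."""
        frame = {}
        for location, series in readings.items():
            past = [r for r in series if r[0] <= at_time]
            if past:
                frame[location] = past[-1][1:]  # (temp_c, humidity_pct)
        return frame

    # "Playing back like a movie" is then just a sequence of snapshots:
    # frames = [snapshot(t) for t in range(start, end, step)]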
[Photo: Going green – the Dublin data centre is Microsoft’s most efficient to date, taking advantage of Ireland’s naturally cool climate.]
About a thousand such sensors have been deployed across Microsoft data centres, and many more are on the way. “They have already had a material impact on the way data centres are managed,” Zhao notes.
As part of the DC Genome initiative, Zhao and his colleagues are also working on a new, energy-efficient approach to server management. Typically, a data centre’s traffic is distributed evenly across its entire server infrastructure, but Zhao’s team is developing a system that will allow data centre operators to power down all but the core subset of servers needed at any given time. By analysing historic traffic patterns, it is possible to predict loads in advance and provision the appropriate number of servers – along with buffering capacity to cater for unpredicted traffic spikes.
“The question is how big is that buffer going to be,” he notes. If it’s too large, potential energy savings are wasted; if it’s too small, the end-user experience may be impaired. But Zhao reckons that energy savings of up to 30 per cent can be attained, while maintaining existing quality of service standards.
“With the energy sensors, you can achieve even more,” he adds. While load prediction and server provisioning algorithms can tell an operator how many server racks they can switch off, sensor data can help them identify which ones to select. “You want to turn off the machines at the hottest spots,” Zhao notes. Development of the server management technology is ongoing. “We don’t have this entire loop deployed,” he adds.
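Pieced together from Zhao’s description, the control loop might look something like the following Python sketch; the naive predictor, the per-server capacity and the buffer fraction are all assumptions made for illustration:

    import math

    SERVER_CAPACITY = 100.0  # requests/sec one server can handle (assumed)
    BUFFER_FRACTION = 0.2    # spare headroom for unpredicted spikes (assumed)

    def predict_load(history):
        """Naive stand-in for analysing historic traffic patterns:
        the average load seen at this hour on previous days."""
        return sum(history) / len(history)

    def servers_needed(predicted_load):
        """Provision enough servers for the predicted load, plus a buffer."""
        return math.ceil(predicted_load * (1 + BUFFER_FRACTION) / SERVER_CAPACITY)

    def choose_shutdowns(racks, n_active):
        """Use the sensor data to pick which racks to switch off:
        the machines at the hottest spots go first."""
        by_temp = sorted(racks, key=lambda r: r["temp_c"], reverse=True)
        return by_temp[: max(0, len(racks) - n_active)]

Sizing the buffer is exactly the trade-off Zhao describes: enlarge BUFFER_FRACTION and the energy savings shrink; trim it too far and the end-user experience is put at risk.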
Trishul Chilimbi, leader of the Runtime Analysis and Design (RAD) group at Microsoft Research, is tackling data centre energy efficiency from the standpoint of software engineering. He aims to reduce unnecessary computational effort – and with it, unnecessary energy consumption – by introducing a systematic method of approximation while maintaining defined quality of service (QoS) levels, so that the end-user experience remains unaffected.
The system – called Green – introduces approximations in elements of software code associated with particular tasks or functions. Green is implemented in a compiler that operates at two levels. A calibration phase builds a QoS model of the programme that incorporates the approximations. “It automatically builds a model of what is an appropriate level of approximation for your quality of service,” Chilimbi says. At runtime, the actual QoS loss is constantly sampled, to ensure that it remains within the defined limits. “So the level of approximation can change dynamically at runtime,” Chilimbi says. “The key thing is you have to maintain the QoS agreement.”
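In outline, the runtime half of such a scheme might resemble the following Python sketch. Green itself operates inside a compiler; the toy computation, QoS budget and sampling interval below are assumptions for illustration, not its actual mechanics:

    import math
    import random

    QOS_BUDGET = 0.005    # maximum tolerable relative error (assumed)
    SAMPLE_EVERY = 100    # re-check the actual QoS loss every N requests

    def approx_exp(x, terms):
        """Approximate exp(x) with a truncated Taylor series:
        fewer terms means cheaper computation but lower accuracy."""
        return sum(x ** k / math.factorial(k) for k in range(terms))

    terms = 10            # starting level; a calibration phase would pick this
    for i in range(10_000):
        x = random.uniform(0.0, 2.0)
        result = approx_exp(x, terms)
        if i % SAMPLE_EVERY == 0:  # sample the actual QoS loss at runtime
            loss = abs(result - math.exp(x)) / math.exp(x)
            if loss > QOS_BUDGET:
                terms += 1        # too much quality loss: approximate less
            elif loss < QOS_BUDGET / 10 and terms > 1:
                terms -= 1        # well inside budget: approximate harder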
Apart from a small number of domains, such as aviation and healthcare, in which absolute precision is essential, he says, the approach is widely applicable and easily implemented. “The actual Green infrastructure is extremely lightweight – there’s no noticeable overhead at all.”
Lightweight it may be, but the payoff can be dramatic. In rendering graphics, for example, a 5 per cent reduction in quality delivers a four-fold reduction in energy consumption. In a back-end implementation of Microsoft’s Bing search engine, a 0.27 per cent reduction in QoS improved performance by 22 per cent and lowered energy use by 14 per cent.
A second project, Gargoyle, is looking at opportunities for hardware-software co-design as a means of introducing greater energy efficiency in hardware performance. “Hardware has a very myopic view of software,” says Chilimbi: it only ‘sees’ a low-level set of instructions. Programming languages, for their part, have equally little visibility into the hardware. “The boundary has been this instruction set architecture, which hasn’t changed for the past decade,” he says.
By fundamentally altering this paradigm, a major improvement in energy efficiency could be attained. “If it’s not at least an order of magnitude I think it’s not worth pursuing,” Chilimbi says. That would require a major shift in systems design. “To justify that, the gains have to be high enough.”
The stakes certainly are.
The Everest Project at Microsoft Research is tackling another energy problem: the overcapacity built into data centres to cope with bursts of activity at peak periods. Through intelligent data management – offloading data from overloaded disks and consolidating it on lightly loaded ones – Everest saves up to 60 per cent of the energy used during idle periods, while improving server response times during peaks by up to 70-fold.
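Drawing on that description alone – Everest’s actual policies, thresholds and data structures are not spelled out here, so everything below is a placeholder – the consolidation step might be sketched in Python as follows:

    OVERLOAD_IOPS = 800  # a disk counts as overloaded above this rate (assumed)
    LIGHT_IOPS = 100     # a disk counts as lightly loaded below this (assumed)

    def rebalance(disks):
        """Plan moves that offload data from overloaded disks onto lightly
        loaded ones; disks left with no load can then be powered down."""
        hot = [d for d in disks if d["iops"] > OVERLOAD_IOPS]
        cool = sorted((d for d in disks if d["iops"] < LIGHT_IOPS),
                      key=lambda d: d["free_gb"], reverse=True)
        moves = []
        for src in hot:
            for dst in cool:
                if dst["free_gb"] >= src["hot_data_gb"]:
                    moves.append((src["id"], dst["id"], src["hot_data_gb"]))
                    dst["free_gb"] -= src["hot_data_gb"]
                    break
        return moves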