Cooling AI: Keeping Temps Down
www.informationweek.com
John Edwards, Technology Journalist & AuthorFebruary 11, 20256 Min ReadTithi Luadthong via Alamy StockData centers are one of the most energy-intensive building structures, consuming 10- to 50-times more energy per square foot than a typical commercial office building and accounting for approximately 2% of the nation's total electricity consumption, says Todd Grabowski, president of global data center solutions at Johnson Controls, an HVAC and facilities management firm, citing US Department of Energy statistics.In an email interview, Grabowski notes that a rapid shift to AI workloads is driving data center energy demand to record high levels, with AI tasks now consuming up to 10-times more power than conventional IT operations. High-performance computing racks will require 100 to 120 kilowatts (kW) per rack in the near future, he predicts.Data centers specifically designed to handle AI workloads generally rely on servers using a graphics processor unit (GPU), a device initially designed for digital image processing and to accelerate computer graphics. A major drawback of these systems is that they generate a high thermal design power (TDP), meaning they produce a large amount of heat per processor, per server, and per rack.AIs Thermal ImpactWhen running AI processes, GPUs can consume over a kilowatt of power, much higher than classical CPUs, which typically require a maximum of approximately 400 watts, says Nenad Miljkovic, a professor in the mechanical science and engineering department at the University of Illinois Urbana-Champaign. "Pure air cooling will not work for the majority of AI servers, so liquid cooling is required," he states in an online interview. "Liquid is better than air, since it has better properties, including higher thermal conductivity and heat capacity." Drawbacks, however, include higher cost, reduced reliability, and greater implementation complexity.Related:GPU-based servers are designed and used for high-performance computing, which can process substantial amounts of data quickly, Grabowski says. He observes that AI clusters operate most efficiently when latency is reduced by utilizing high-bandwidth fiber optic connections, strategically placed servers, and an optimized network topology that minimizes data travel distance. Grabowski predicts that most future data centers will feature dense racks generating a large amount of heat and packed into multi-story facilities.The real issue facing data center operators isn't cooling, but energy management, states David Ibarra, international regional leader with datacenter builder DPR Construction. "The industry has substantial operational experience in effectively cooling and managing cooling systems for large-scale data centers," he explains in an online interview. "The primary challenge facing AI datacenter operators is the increased power densities of GPU rack clusters within the server racks." Ibarra notes that cooling loads diversification requires managing not only new GPU racks, but also CPU-based racks, storage, and network racks. "Therefore, engineering and planning must consider the varying characteristics of cooling loads for each type of rack."Related:Seeking SustainabilityAs demand increases, a growing number of data center operators are transitioning from traditional air-cooling to a hybrid cooling system combining both liquid and air-cooling technologies. "This change is driven by the increasing demand for large AI GPU racks, which require liquid cooling to efficiently remove heat from their high-core-count processors," Ibarra says.To advance sustainability, Miljkovic suggests locating data centers close to renewable energy sources. "For example, near a nuclear power plant, where power is abundant, and security is good."Solar and wind power are often touted as solutions by green advocates yet aren't generally considered practical given the fact that new data centers can easily consume over 500 megawatts of power and frequently exceed a gigawatt or more. A more practical approach is using data center-generated heat, Miljkovic says. "All of the heat generated from the data center can be re-used for district heating if coolant temperatures are allowed to be higher, which they can [accomplish] with liquid cooling."Related:Additional AlternativesA growing number of AI data centers are being designed to mimic power plants. Some are actually being built on decommissioned power plant sites, using rivers, lakes, and reservoirs for cooling, says Jim Weinheimer, vice president of data center operations at cloud services provider Rackspace. "These [facilities] must be carefully designed and operated, but they have huge cooling capacity without consuming water," he observes via email.Local climate can also play an important role in data center cooling. Cold weather locations are increasingly favored for new data center builds. Lower ambient temperatures reduce the amount of cooling needed and, therefore, the need for water or other coolant required by the AI data center, says Agostinho Villela, Scala Data Centers' chief innovation and technology officer,in an online interview. Alternatively, closed loop systems can be used to conserve water, since they reduce the need to draw on external water sources. Data center heat recovery systems can also reduce the aggregate need for power by providing facility heat in the winter.AI-driven cooling optimization technology is also beginning to play a crucial role in sustainable data center operations. By deploying machine learning algorithms to monitor and manage cooling systems, data centers can dynamically adjust airflow, liquid flow, and compressor activity based on real-time thermal data. "This adaptive approach not only prevents energy wastage but also extends the lifespan of hardware by maintaining consistent and efficient cooling conditions," Villela says. "Such systems can even predict potential equipment overheating, enabling preemptive measures that reduce downtime and additional energy expenditures."Looking ForwardLimitations in chip size and density will eventually force data center operators to explore new designs and materials, including facilities that may completely change the way data centers operate, Weinheimer predicts. "It will be a combination of factors and new technologies that allow us to make the next leap in computing power, and the industry is very motivated to make it a reality --thats what makes it so exciting to be part of this industry."Considering the number of cooling methods being tested and evaluated, the only thing that seems certain is continued uncertainty. "Its a bit like the Wild West," Miljkovic observes. "Lots of uncertainty, but also lots of opportunity to innovate."Read more about:Cost of AIAbout the AuthorJohn EdwardsTechnology Journalist & AuthorJohn Edwards is a veteran business technology journalist. His work has appeared in The New York Times, The Washington Post, and numerous business and technology publications, including Computerworld, CFO Magazine, IBM Data Management Magazine, RFID Journal, and Electronic Design. He has also written columns for The Economist's Business Intelligence Unit and PricewaterhouseCoopers' Communications Direct. John has authored several books on business technology topics. His work began appearing online as early as 1983. Throughout the 1980s and 90s, he wrote daily news and feature articles for both the CompuServe and Prodigy online services. His "Behind the Screens" commentaries made him the world's first known professional blogger.See more from John EdwardsNever Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.SIGN-UPYou May Also LikeWebinarsMore WebinarsReportsMore Reports
0 Comments ·0 Shares ·59 Views