The exponential growth of AI computing power has ushered in unprecedented thermal challenges for modern data centers. High-performance AI accelerators like NVIDIA’s Blackwell GB200 and Meta’s Catalina platforms now demand cooling solutions capable of dissipating 4,000W+ per processor—far exceeding the limits of traditional air cooling. Liquid cooling technologies, particularly direct liquid cooling (DLC) and immersion cooling, have emerged as critical enablers for next-generation AI infrastructure. This article explores cutting-edge advancements in liquid cooling, including cold plate designs, immersion systems, and hybrid architectures, with real-world insights from industry leaders like CoolIT, Meta, Intel, and Alibaba Cloud.
DLC systems transfer heat by circulating coolant directly over heat-generating components. Cold plates—metallic blocks embedded with microchannel arrays—are core to this approach. Recent breakthroughs include:
·CoolIT’s 4000W Single-Phase DLC Cold Plate
Heat Dissipation: Achieves a 4,000W thermal load at a thermal resistance below 0.009 K/W, outperforming conventional solutions by 2x (see the worked example after this list).
Split-Flow Technology: Coolant enters through microchannel midpoints, optimizing flow distribution to hotspots (e.g., GPU/CPU cores).
OMNI™ Monolithic Design: Full-copper cold plates eliminate brazing seams, reducing leakage risks and thermal resistance.
Industry Adoption: Deployed in Dell PowerEdge and Lenovo ThinkSystem servers for hyperscale AI workloads.
·Intel’s FSW-Based Cold Plates
Friction-stir welding (FSW) replaces traditional brazing, enhancing structural integrity and thermal performance. FSW reduces welding temperatures by 40%, minimizing thermal stress on high-TDP processors.
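The headline cold plate numbers translate directly into temperature headroom. A minimal sketch in Python converts the 4,000W load and 0.009 K/W resistance into a die-to-coolant temperature rise; the inlet temperature and silicon limit mentioned in the comments are assumed typical values, not vendor specs:

```python
# Back-of-the-envelope check: what does R_th < 0.009 K/W buy you?
# The inlet temperature below is an assumed typical warm-water DLC value.

Q_WATTS = 4000.0        # thermal load handled by the cold plate (from the article)
R_TH = 0.009            # cold-plate thermal resistance, K/W (from the article)
T_INLET_C = 40.0        # assumed coolant inlet temperature

delta_t = Q_WATTS * R_TH            # temperature rise across the cold plate
t_surface = T_INLET_C + delta_t     # approximate case temperature at the plate

print(f"Temperature rise across cold plate: {delta_t:.1f} K")
print(f"Estimated surface temperature: {t_surface:.1f} °C")
# -> 36.0 K rise, ~76 °C surface at a 40 °C inlet, leaving margin below
#    typical ~90-105 °C silicon limits even at the full 4 kW load.
```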
Immersion cooling takes a complementary approach, submerging entire servers in dielectric fluid. Two variants are in production use:
·Single-Phase Immersion (SPILC):
Alibaba Cloud’s SPILC System: Using electronic fluorinated liquid (EFL-3), SPILC captures 97.3% of server heat at a 6 L/min flow rate. Counter-gravity flow improves heat transfer by 35% compared to co-flow configurations (a heat-balance sketch follows this list).
Meta’s Catalina AI Server: Combines air-cooled components (E1.S SSDs, OCP NICs) with liquid-cooled GB200 NVL72 GPUs. A hybrid approach balances efficiency (40°C coolant inlet) and scalability.
·Two-Phase Immersion (TPILC):
Fluids absorb heat via phase change (liquid-to-vapor). While TPILC improves COP by 72–79% over SPILC (Kanbur et al.), its complexity and costs limit adoption. Recent studies reveal TPILC underperforms cold plates at >300W/cm² fluxes due to critical heat flux (CHF) restrictions.
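To relate the SPILC capture fraction and flow rate above, here is a minimal heat-balance sketch. The fluid density and specific heat are generic assumed values for a fluorinated dielectric coolant, not published EFL-3 properties, and the server load is an assumed example:

```python
# Single-phase immersion heat balance: how much does the fluid warm up
# at a given flow rate? Q = m_dot * cp * dT.

RHO = 1600.0       # density, kg/m^3 (typical fluorinated fluid, assumed)
CP = 1100.0        # specific heat, J/(kg*K) (assumed)
CAPTURE = 0.973    # fraction of heat captured by the fluid (from the article)

def outlet_rise(server_watts: float, flow_l_min: float) -> float:
    """Coolant temperature rise across the tank."""
    m_dot = RHO * (flow_l_min / 1000.0 / 60.0)   # volumetric flow -> kg/s
    return CAPTURE * server_watts / (m_dot * CP)

for flow in (4, 6, 8, 10):
    print(f"{flow} L/min -> dT = {outlet_rise(2000, flow):.1f} K")
# The rise shrinks hyperbolically with flow (~11 K at 6 L/min for an
# assumed 2 kW server), which is why gains taper once dT is already small.
```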

Within cold plates, microchannel flow architecture largely determines hotspot control. Parallel vs. split-flow designs:
·Parallel Flow: Simple but prone to hotspots.
·Split Flow: Midpoint entry divides coolant into radial streams, reducing thermal gradient by 61% (Intel).
·Meta’s Dual-Mode Cold Plates: Catalina’s cold plates support a 3.9kW TDP via optimized microchannels and Frost LC-25 coolant (PG25-based).
·Copper vs. Aluminum: Copper conducts heat roughly twice as well as common aluminum alloys (~400 vs. ~170–200 W/m·K) but adds weight and cost (see the conduction sketch after this list).
·Boiling Enhancement Coatings (BECs): Microporous copper layers boost heat transfer coefficients by 15x in TPILC systems.
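The copper-versus-aluminum trade-off above can be made concrete with a one-dimensional conduction estimate, R = t / (k·A). The plate thickness and die footprint below are assumed illustrative values, not vendor dimensions:

```python
# Why copper for sub-0.01 K/W cold plates? Conduction through the base
# plate alone, before convection is even counted.

T_BASE = 0.003          # base plate thickness, m (assumed 3 mm)
A_DIE = 0.04 * 0.04     # heated footprint, m^2 (assumed 40 x 40 mm die)

K_COPPER = 400.0        # W/(m*K), pure copper
K_ALUMINUM = 180.0      # W/(m*K), typical aluminum alloy

for name, k in (("copper", K_COPPER), ("aluminum alloy", K_ALUMINUM)):
    r_cond = T_BASE / (k * A_DIE)
    print(f"{name:15s} base conduction: {r_cond * 1000:.1f} mK/W")
# Copper's base alone consumes ~4.7 mK/W of a 9 mK/W budget; an aluminum
# base would use ~10.4 mK/W, overshooting the budget before convection.
```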
·Coolant Distribution Units (CDUs):
Liquid-Liquid CDUs: Transfer heat from server loops to facility water (e.g., Boyd’s 100L/min systems).
Air-Liquid CDUs: Ideal for retrofitting air-cooled racks (e.g., CoolIT’s Dynamic Cold Plate).
·Alibaba Cloud’s Findings:
Counter-gravity flow reduces CPU temps by 33.8% over co-flow.
Optimal EFL-3 coolant (low viscosity) lowers thermal resistance by 10.5% vs. mineral oils.
Flow rate prioritization: Beyond 8 L/min, SPILC gains diminish while pumping power climbs steeply (see the flow-rate sketch below).
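The diminishing-returns finding above follows from basic scaling: convective resistance falls only slowly with flow (roughly flow^-0.8 in turbulent channels) while pump power grows roughly with the cube of flow. A normalized sketch with illustrative constants, not Alibaba data:

```python
# Diminishing returns above ~8 L/min: thermal gain vs. pump power,
# both normalized to a 6 L/min baseline. Exponents are standard
# turbulent-flow scalings, used here purely for illustration.

def relative_resistance(flow: float, ref: float = 6.0) -> float:
    """Convective thermal resistance relative to the baseline (~Re^-0.8)."""
    return (ref / flow) ** 0.8

def relative_pump_power(flow: float, ref: float = 6.0) -> float:
    """Pump power relative to the baseline (dP * V ~ V^3)."""
    return (flow / ref) ** 3

for f in (6, 8, 10, 12):
    print(f"{f:2d} L/min: resistance x{relative_resistance(f):.2f}, "
          f"pump power x{relative_pump_power(f):.1f}")
# Going from 8 to 12 L/min shaves only ~28% more off the convective
# resistance but raises pump power from ~2.4x to 8x the baseline.
```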
·Meta’s Catalina Case Study:
Coolant Flow: 100 L/min at 15 PSI using Frost LC-25 (a rack heat-balance sketch follows this list).
Hybrid Cooling: Combines rear-door cold plates (GPUs) with air-cooled front components.
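A quick heat balance shows what 100 L/min of Frost LC-25 can carry. The fluid properties approximate a 25% propylene-glycol/water mix, and the rack load is an assumed round number, not a Meta figure:

```python
# Rack-level heat balance for the quoted 100 L/min coolant flow.

RHO = 1020.0     # kg/m^3, ~PG25 mix at operating temperature (assumed)
CP = 3900.0      # J/(kg*K) (assumed)
FLOW_L_MIN = 100.0
RACK_KW = 120.0  # assumed liquid-cooled rack load for illustration

m_dot = RHO * FLOW_L_MIN / 60000.0           # kg/s
delta_t = RACK_KW * 1000.0 / (m_dot * CP)    # loop temperature rise, K

print(f"Mass flow: {m_dot:.2f} kg/s, loop dT: {delta_t:.1f} K")
# ~1.7 kg/s and an ~18 K rise: a 40 °C inlet returns high-50s °C water,
# warm enough for efficient facility-side heat rejection.
```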
·CHF Limitations: R1233zd refrigerants max out at ~80W/cm² (Google, 2024), making TPILC unfit for AI chips exceeding 300W/cm² (see the flux check below).
·Fluid Compatibility: Novec 649’s low boiling point (49°C) restricts ΔT in high-TDP scenarios.
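A short flux check shows why the CHF ceiling rules TPILC out for current accelerators. The die footprint is an assumed illustrative value, not a published dimension:

```python
# CHF sanity check: does two-phase immersion keep up with a modern
# accelerator die?

GPU_WATTS = 1200.0      # per-GPU power (from the article)
DIE_AREA_CM2 = 8.0      # assumed die/hotspot footprint
CHF_R1233ZD = 80.0      # W/cm^2, cited limit for R1233zd
COLD_PLATE_OK = 300.0   # W/cm^2, flux where cold plates still lead

flux = GPU_WATTS / DIE_AREA_CM2
print(f"Die heat flux: {flux:.0f} W/cm^2")
print(f"Within R1233zd CHF? {flux <= CHF_R1233ZD}")
print(f"Within cold-plate envelope? {flux <= COLD_PLATE_OK}")
# -> 150 W/cm^2: nearly 2x past the refrigerant's CHF, but comfortably
#    inside what microchannel cold plates handle.
```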
Catalina’s design combines off-the-shelf GB200 NVL72 racks with custom cooling:
·Liquid-Cooled GPUs: Cold plates handle 1200W Blackwell GPUs.
·Air-Cooled Support Components: E1.S SSDs, OCP NICs cooled via traditional airflow.
·Power Efficiency: PUE of 1.15, roughly 30% lower than comparable air-cooled deployments (see the sketch below).
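In absolute terms, since PUE is total facility power divided by IT power, the savings scale directly with IT load. A minimal comparison, assuming a 1 MW IT load and an air-cooled baseline of ~1.6 (both assumed figures, not from the article):

```python
# What a PUE of 1.15 means in absolute terms.

def facility_power(it_kw: float, pue: float) -> float:
    """Total facility draw given IT load and PUE = total / IT."""
    return it_kw * pue

IT_KW = 1000.0  # assumed 1 MW of IT load
for label, pue in (("liquid-cooled (PUE 1.15)", 1.15),
                   ("air-cooled baseline (PUE ~1.6, assumed)", 1.6)):
    total = facility_power(IT_KW, pue)
    print(f"{label}: {total:.0f} kW total, {total - IT_KW:.0f} kW overhead")
# Overhead drops from ~600 kW to 150 kW per MW of IT load: cooling and
# power-delivery losses shrink by roughly 75%.
```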
As seen in HPE’s 100% fanless DLC systems, air-assisted liquid cooling (AALC) uses passive airflow to supplement liquid loops during peak loads, reducing pump dependency.
·Phase-Change Cooling: Intel’s research on R134a refrigerant boiling in microchannels reduces GPU temps by 34.5% at 6 L/min.
·Sustainable Fluids: Bio-based dielectric coolants (e.g., Shell Nature 3.0) lower GWP by 75% vs. synthetic fluids.
·AI-Driven Thermal Control: Google’s ML-based CDU optimization cuts cooling energy consumption by 20% via predictive flow adjustments (a toy controller sketch follows this list).
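As a toy illustration of predictive flow control (not Google’s implementation; every class name, parameter, and constant below is an assumed placeholder), a controller can lead rising loads rather than chase temperatures after the thermal lag:

```python
# Toy sketch: anticipate rack load from a short history and set CDU flow
# ahead of the thermal lag, instead of reacting after temperature rises.

from collections import deque

class PredictiveFlowController:
    def __init__(self, min_flow=40.0, max_flow=100.0, kw_per_lpm=1.2):
        self.history = deque(maxlen=5)   # recent rack power samples, kW
        self.min_flow = min_flow         # L/min floor to protect the loop
        self.max_flow = max_flow         # pump/CDU ceiling
        self.kw_per_lpm = kw_per_lpm     # assumed heat removed per L/min

    def update(self, rack_kw: float) -> float:
        """Return a flow setpoint (L/min) for the next control interval."""
        self.history.append(rack_kw)
        trend = 0.0
        if len(self.history) >= 2:
            trend = self.history[-1] - self.history[0]  # rising or falling?
        predicted_kw = rack_kw + max(trend, 0.0)        # lead rising loads
        flow = predicted_kw / self.kw_per_lpm
        return min(max(flow, self.min_flow), self.max_flow)

ctrl = PredictiveFlowController()
for load in (60, 62, 70, 85, 85, 80):
    print(f"load {load} kW -> flow {ctrl.update(load):.0f} L/min")
```

Production systems replace the simple trend term with a learned load forecast, but the control structure (predict, then set flow ahead of demand) is the same idea.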
Liquid cooling is no longer optional for AI infrastructure. Cold plates dominate with their reliability in 4,000W+ scenarios, while SPILC provides scalable efficiency for hyperscale data centers. Hybrid systems like Meta’s Catalina highlight the industry’s shift toward adaptive thermal architectures, balancing performance, cost, and sustainability. As AI chip TDPs approach 5,000W by 2026, innovations in microchannel design, two-phase fluids, and AI-driven cooling will define the next frontier in thermal management.