Challenges and solutions for powering the latest processor generations in hyperscale data centers

1 Jun 2024

As AI models increase in size and complexity, training them requires more power per motherboard and per rack. At the same time, parallel processing by GPUs and TPUs, with 500 cores and more, means high operating currents and very steep transient loads. In short, the rise of AI data centres is as much a challenge for the power supply system as it is for processing power. Tackling this challenge will be the topic of Dr. Gerald Deboy's keynote speech at this year's PCIM Europe.

An interview with the PCIM Europe keynote speaker Dr. Gerald Deboy, Fellow, Infineon Technologies Austria AG, Villach, Austria

Training AI in hyperscale data centres presents new challenges not just for processing, but also for the power supply to the rack, the motherboard, and the processor.

Gerald Deboy received his M.Sc. and Ph.D. degrees in physics from the Technical University Munich, Germany. In 1994, he joined Infineon Technologies AG, Neubiberg, Germany, later moving to Infineon Technologies Austria AG, Villach, where he is Head of the Systems Innovation Group for Power Discretes and System Engineering, first as Distinguished Engineer and, since last year, as Fellow. He has authored and co-authored over 130 papers in national and international journals and has contributed to student textbooks. He holds more than 100 international patents and, together with David James Coe and Tatsuhiko Fujihira, is an inventor of the superjunction principle, which revolutionised energy savings in high-voltage switching converters. Its use in everything from laptops to EV charging stations has likely saved over 3.4 trillion kWh to date.

Dr. Deboy, what do you see as the current challenges in power distribution and conversion for hyperscale data centers?

The challenges start with keeping power distribution losses down at the higher power requirements of newer motherboards, which are typically already at 6 to 8 kW. This mandates a 48 V backplane instead of the traditional 12 V ecosystem. Quadrupling the voltage cuts the current to a quarter and therefore the resistive losses 16-fold, but it means we need a first conversion stage to an intermediate bus before converting again at the processor. At the same time, the higher transient requirements, with much higher load currents for processors with hundreds of cores, are pushing the optimal intermediate bus voltage further down. To this end, at Infineon we are already providing hybrid switched converters, combining magnetic and capacitive coupling, that convert to intermediate bus voltages of 6 or 8 V.
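As a rough check on the 16-fold figure, the sketch below computes I²R conduction loss for the same power delivered through the same conductor resistance at 12 V and at 48 V. The 7 kW load and 0.1 mΩ path resistance are hypothetical values chosen only for illustration, and the model ignores every loss mechanism except resistive loss in the distribution path.

```python
def conduction_loss_w(power_w: float, bus_voltage_v: float, resistance_ohm: float) -> float:
    """Resistive (I^2 * R) loss for delivering power_w at bus_voltage_v.

    Simplified model: DC current, one lumped path resistance,
    no converter, switching, or contact losses.
    """
    current_a = power_w / bus_voltage_v
    return current_a ** 2 * resistance_ohm


# Hypothetical values for illustration only.
POWER_W = 7_000.0          # mid-range of the 6 to 8 kW motherboard figure
PATH_RESISTANCE = 0.1e-3   # 0.1 mOhm lumped distribution resistance (assumed)

loss_12v = conduction_loss_w(POWER_W, 12.0, PATH_RESISTANCE)
loss_48v = conduction_loss_w(POWER_W, 48.0, PATH_RESISTANCE)

print(f"12 V: {loss_12v:.1f} W, 48 V: {loss_48v:.1f} W, ratio {loss_12v / loss_48v:.1f}x")
```

Because the ratio works out to (48/12)² = 16 whatever load and resistance are chosen, the conclusion does not depend on the assumed values.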

The 6 or 8 V levels have the advantage of lower switching losses than 12 V. This enables higher switching frequencies, which in turn helps the converter cope with faster transient load requirements without resorting to large numbers of capacitors.
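One way to see the link between bus voltage and switching frequency is the common first-order estimate for hard-switching loss in a buck stage, P ≈ ½·V·I·(t_rise + t_fall)·f, which is linear in the input voltage. The sketch below is an illustration under that assumption, not Infineon's design method: it solves for the highest frequency that fits a fixed loss budget. The per-phase current, transition time and budget are hypothetical, and gate-drive, output-capacitance and reverse-recovery losses are ignored.

```python
def hard_switching_loss_w(v_in: float, i_load: float, t_transition_s: float, f_sw_hz: float) -> float:
    """First-order hard-switching loss: P = 0.5 * V * I * (t_rise + t_fall) * f.

    t_transition_s is the combined rise + fall time. Gate-drive,
    output-capacitance and reverse-recovery losses are ignored.
    """
    return 0.5 * v_in * i_load * t_transition_s * f_sw_hz


def max_frequency_hz(loss_budget_w: float, v_in: float, i_load: float, t_transition_s: float) -> float:
    """Highest switching frequency that keeps the estimate within loss_budget_w."""
    return loss_budget_w / (0.5 * v_in * i_load * t_transition_s)


# Hypothetical per-phase operating point, for illustration only.
I_PHASE_A = 50.0        # load current carried by one phase
T_TRANSITION_S = 4e-9   # combined rise + fall time
BUDGET_W = 2.0          # switching-loss budget per phase

for v_bus in (12.0, 8.0, 6.0):
    f_max = max_frequency_hz(BUDGET_W, v_bus, I_PHASE_A, T_TRANSITION_S)
    print(f"{v_bus:4.0f} V bus -> up to {f_max / 1e6:.2f} MHz within {BUDGET_W} W")
```

Under this model, halving the bus voltage from 12 V to 6 V doubles the frequency available for the same switching-loss budget.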

Furthermore, we need to cope with higher power levels in the server power supply units. Achieving the necessary power density (we have already reached more than 300 W/in³ for isolating DC/DC stages operating at 400 V) requires wide-bandgap devices: typically silicon carbide in the AC/DC stage and gallium nitride in the DC/DC conversion stages.
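To put 300 W/in³ in physical terms, the short calculation below converts it into the volume a DC/DC stage would occupy at the 3, 5 and 8 kW power supply ratings mentioned later in the interview. This is a simplification that assumes the DC/DC stage alone carries the full rating and ignores the AC/DC stage, cooling and enclosure.

```python
CUBIC_CM_PER_CUBIC_INCH = 2.54 ** 3  # 16.387064 cm^3 per in^3


def converter_volume_in3(power_w: float, density_w_per_in3: float) -> float:
    """Volume a converter stage occupies at a given power density."""
    return power_w / density_w_per_in3


DENSITY_W_PER_IN3 = 300.0  # figure quoted for isolating 400 V DC/DC stages

for rating_w in (3_000.0, 5_000.0, 8_000.0):
    vol_in3 = converter_volume_in3(rating_w, DENSITY_W_PER_IN3)
    print(f"{rating_w / 1e3:.0f} kW -> {vol_in3:.1f} in^3 ({vol_in3 * CUBIC_CM_PER_CUBIC_INCH:.0f} cm^3)")
```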

What do you see as the challenges and opportunities in meeting the power needs of the newest generations of graphics and tensor processing units?

Infineon’s approach to powering the processor itself combines power stages with inductors to create a voltage regulator module (VRM) that can be mounted on the rear of the processor, delivering the necessary power at the point of load. This vertical integration is important for minimizing the losses that lateral distribution would entail at these current levels, with transient loads of up to 1000 A. It also makes the best possible use of the constrained motherboard area.
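The sketch below shows why path length matters at these currents. It applies I²R to the 1000 A figure, treating the transient peak as if it were sustained (a worst-case simplification), with hypothetical lumped resistances for a long lateral path, a short lateral path and a vertical connection. None of the resistance values are measured data.

```python
def lateral_loss_w(current_a: float, path_resistance_ohm: float) -> float:
    """I^2 * R loss in the board path between regulator and processor."""
    return current_a ** 2 * path_resistance_ohm


CURRENT_A = 1_000.0  # peak transient load quoted above, treated as sustained (worst case)

# Hypothetical lumped path resistances, for illustration only.
PATHS = (
    ("long lateral path", 200e-6),
    ("short lateral path", 50e-6),
    ("vertical, under the die", 10e-6),
)

for label, r_ohm in PATHS:
    print(f"{label:24s} {r_ohm * 1e6:5.0f} uOhm -> {lateral_loss_w(CURRENT_A, r_ohm):6.1f} W")
```

At 1000 A every 10 µΩ of path resistance costs about 10 W, which is why shortening the path by mounting the VRM directly beneath the processor pays off.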

Because the real challenge is the current demand, trans-inductor voltage regulators (TLVRs) are a good path forward. They couple all of the output inductors of a multiphase buck converter into one system through a shared secondary loop. When one phase fires, it increases the current in all the phases connected to the loop, enabling the necessary fast transient response.
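The coupling effect can be illustrated with a deliberately idealised model, which is an assumption for this sketch rather than Infineon's design equations: lossless 1:1 windings with no leakage, each phase having magnetising inductance `l_m`, and all secondaries in series with a compensation inductor `l_c`. A voltage step `delta_v` on one phase then changes the loop current at `delta_v / l_c`, and that change shows up in every phase. All function names and component values below are hypothetical.

```python
def phase_slew_conventional(n_phases: int, fired: int, delta_v: float, l_phase: float) -> list[float]:
    """Extra di/dt (A/s) per phase when one phase's inductor voltage steps by delta_v.

    Uncoupled inductors: only the phase that fired responds.
    """
    return [delta_v / l_phase if k == fired else 0.0 for k in range(n_phases)]


def phase_slew_tlvr(n_phases: int, fired: int, delta_v: float, l_m: float, l_c: float) -> list[float]:
    """Same step in an idealised TLVR (lossless 1:1 coupling, no leakage).

    The secondary loop current changes at delta_v / l_c, and that change is
    reflected into every phase; the fired phase also sees delta_v / l_m.
    """
    loop_slew = delta_v / l_c
    return [loop_slew + (delta_v / l_m if k == fired else 0.0) for k in range(n_phases)]


N, FIRED, DELTA_V = 8, 0, 8.0   # 8 phases, phase 0 fires, 8 V step (hypothetical)
L_M, L_C = 150e-9, 200e-9       # hypothetical inductances in henries

conv = phase_slew_conventional(N, FIRED, DELTA_V, L_M)
tlvr = phase_slew_tlvr(N, FIRED, DELTA_V, L_M, L_C)

print(f"conventional: {sum(conv) / 1e6:.0f} A/us total, phases responding: {sum(s > 0 for s in conv)}")
print(f"TLVR:         {sum(tlvr) / 1e6:.0f} A/us total, phases responding: {sum(s > 0 for s in tlvr)}")
```

In this model a smaller `l_c` strengthens the coupling and speeds up the response; in practice that comes at the cost of more circulating ripple current, which is why the compensation inductor is a design trade-off.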

It’s worth noting that there has been a significant change in voltage levels here too. Most processors now have a silicon capacitor layer directly on the processor, which typically takes care of transient requirements above a bandwidth of about 5 to 8 MHz. This means the processor can operate in a lower voltage band: less voltage reserve is needed to avoid breaching the processor’s undervoltage lockout limit. Lowering this band has the benefit of reducing losses in the processor. Because these losses scale with the square of the voltage, going from, for example, 0.8 V to 0.7 V reduces them in the ratio 0.8² to 0.7² (0.64 to 0.49), a saving of roughly 23%, close to a quarter. This is a useful additional budget that can be redeployed to overclock the processor.
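The voltage-squared saving can be checked directly. The sketch assumes the relevant losses are dynamic switching power, P ∝ C·V²·f, with capacitance and frequency held constant; leakage power scales differently and is not modelled.

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new to that at v_old, assuming P ~ C * V^2 * f
    with capacitance and frequency held constant."""
    return (v_new / v_old) ** 2


ratio = relative_dynamic_power(0.7, 0.8)
print(f"power ratio {ratio:.4f} -> saving {1 - ratio:.1%}")
```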

What do you see as the next developments in the power supply of processors and data centers?

Of course, the new possibilities in powering processors have had to be reflected in processor and motherboard design. Instead of logic and power lines sitting on top of each other, the logic cells can now be placed on the top side of the processor, with the power lines on the underside, where they can connect to the VRM.

The next challenge is the increase in power ratings for power supply units, from 3 kW to 5 kW and 8 kW, and for racks, where 40 kW is already typical and can extend beyond 100 kW. This is driving the need to migrate from single-phase to three-phase power supply units. Further out still is distributing ±400 VDC instead of AC. That will take longer, as we are still working through some complications; for example, you need circuit breakers at several levels to be able to isolate single failures. These are interesting challenges, and I am sure we will arrive at interesting solutions.
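A quick current calculation shows why higher ratings push toward three-phase and DC distribution. The sketch assumes a European 230/400 V grid, unity power factor, a balanced three-phase load, and a ±400 VDC feed with the load connected pole to pole; all of these are assumptions chosen for illustration.

```python
import math


def single_phase_current_a(power_w: float, v_phase: float, pf: float = 1.0) -> float:
    """Line current of a single-phase load: I = P / (V * PF)."""
    return power_w / (v_phase * pf)


def three_phase_current_a(power_w: float, v_line_to_line: float, pf: float = 1.0) -> float:
    """Line current of a balanced three-phase load: I = P / (sqrt(3) * V_LL * PF)."""
    return power_w / (math.sqrt(3) * v_line_to_line * pf)


def dc_bipolar_current_a(power_w: float, v_pole: float) -> float:
    """Current in a +/- v_pole DC feed with the load connected pole to pole."""
    return power_w / (2 * v_pole)


PSU_W = 8_000.0
print(f"8 kW PSU, 230 V single-phase: {single_phase_current_a(PSU_W, 230.0):.1f} A")
print(f"8 kW PSU, 400 V three-phase:  {three_phase_current_a(PSU_W, 400.0):.1f} A per line")
print(f"100 kW rack, +/-400 VDC:      {dc_bipolar_current_a(100_000.0, 400.0):.0f} A")
```

Spreading the same 8 kW across three lines at 400 V cuts the current per conductor to roughly a third of the single-phase figure.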

Infineon dual-phase power modules, specifically designed to meet the needs of GPUs and TPUs: vertical integration at the point of load minimises distribution losses and provides a fast transient response.

The previous content is supported by our partners.