We already knew AMD would power what is expected to be the world’s fastest supercomputer: the US Department of Energy (DOE) El Capitan. Expected to be installed in 2023 at the Lawrence Livermore National Laboratory (LLNL), the HPE-built system was initially announced as leveraging AMD’s Zen 4 CPU cores and Instinct GPU accelerators, unlocking unheard-of performance above the 2 Exaflop mark. Yet there’s something that the initial announcement didn’t say: the system won’t be using discrete CPUs and GPU accelerators. Instead, confirming our speculation, El Capitan will be built on AMD’s recently announced MI300 Accelerated Processing Units (APUs). It marks the first time an APU provides a supercomputer’s central processing grunt, and at Exascale, no less.
“It’s the first time we’ve publicly stated this,” said Terri Quinn, associate director for HPC (High Performance Computing) at LLNL. Making the disclosure in a presentation delivered today to the 79th HPC User Forum at Oak Ridge National Laboratory (ORNL), Quinn added that the information came straight from the source: “I cut these words out of [AMD’s] investors document, and that’s what it says: it’s a 3D chiplet design with AMD CDNA3 GPUs, Zen 4 CPUs, cache memory and HBM chiplets.”
AMD’s MI300 APUs will feature CPU and GPU chiplets in the same 3D-enabled packaging with a coherent HBM3 memory architecture, powered by the company’s 4th generation Infinity Fabric and next generation Infinity Cache. Combining Zen 4 with the CDNA 3 graphics acceleration architecture, MI300 APUs will be built on TSMC’s 5nm process technology (likely N5 or N5P). However, the balance of CPU and GPU cores per APU is still a wild guess.
Because it is built on APUs, El Capitan will benefit from what’s likely to be the densest performance profile ever achieved in the world of supercomputing. Make no mistake: El Capitan will represent the pinnacle of semiconductor performance, design, and integration. It’s not hyperbolic to say that it’s likely to be one of humanity’s most technologically complex endeavors.
It is all thanks to tightly packaged AMD APUs, bundled into HPE Cray EX racks and tied together with Cray’s Slingshot-11 networking, powered by its 16 nanometer Rosetta controllers that can dish out 200 Gb/sec interconnects. The form factor and the number of accelerators per rack are still a question mark. El Capitan should also become one of the most energy-efficient systems (if not the most efficient), with operating power limited to 40 MW for an optimal performance/power balance. Workloads will run through El Capitan’s circuits starting from 2Q 2024, with the planned end of life set for 2030.
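For a rough sense of what those figures imply, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the system delivers exactly 2 exaflops while drawing the full 40 MW; neither number is a measured result, and the function name is our own.

```python
# Back-of-the-envelope efficiency estimate from the article's headline figures.
# Assumptions (not measured values): 2 exaflops is treated as delivered
# performance, and the 40 MW power limit is treated as the actual draw.

PEAK_FLOPS = 2e18    # "above the 2 Exaflop mark", taken here as exactly 2 EFLOPS
POWER_WATTS = 40e6   # 40 MW operating power limit


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return efficiency in GFLOPS per watt (hypothetical helper)."""
    return flops / watts / 1e9


if __name__ == "__main__":
    print(f"{gflops_per_watt(PEAK_FLOPS, POWER_WATTS):.0f} GFLOPS/W")
```

Under those assumptions, the estimate works out to roughly 50 GFLOPS per watt. The real figure will depend on measured performance and actual power draw once the system is running.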
AMD’s continued roll into the Top500 list of the world’s most powerful supercomputers keeps advancing at a breakneck pace. The company is steamrolling Intel’s previous dominance, already scoring five out of the world’s top ten supercomputers, including first place thanks to Frontier, against Intel’s single Xeon-based system powering China’s Tianhe-2A, currently ranking ninth. AMD has come a long way from its infamous and nearly company-breaking Steamroller architecture family.
Not all news is bad news for Intel, however, as the company has also earned an Exascale contract with the Argonne National Laboratory. The Aurora supercomputer will likewise be a 2-exaflops system, built by HPE and Intel, and it has undergone several revisions already. Aurora’s installation is already underway, although the exact date it enters operation is still unclear. Intel’s delays on its Sapphire Rapids CPUs have already pushed back the supercomputer’s installation, so it remains to be seen how long the execution will take.
Nvidia, too, has a relevant presence in the world’s top-performing systems, although it currently only operates in the GPU provider space, scoring three systems powered by its GPUs. But it recently won an important contract as a provider of both CPUs and GPUs for MareNostrum 5, to be installed in the Barcelona Supercomputing Center (BSC) in Spain. Operations could commence as early as 2023.
Sadly, Nvidia has already taken the “Superchip” nomenclature with its Arm-based Grace CPU product for High-Performance Computing (HPC) deployments. So perhaps AMD should be looking to claim an “Überchip” already?