Nvidia Arm Superchips to Power $160 Million Supercomputer in Barcelona

Nvidia’s Grace superchip made waves when it was introduced earlier this year, as the company promised a supercharged Arm-based product that could take on Intel and AMD’s x86 dominance in the High-Performance Computing (HPC) space. Now, as reported by HPCwire, the company has snagged a $160 million contract (~€151 million) to provide the brains and brawn for one of EuroHPC’s supercomputing projects. MareNostrum 5 (MareNostrum roughly translates to “our sea”) will be installed at the Barcelona Supercomputing Center (BSC) in Spain and could be operational as early as 2023.

MareNostrum 5 is being built as part of the EuroHPC Joint Undertaking (EuroHPC JU) and is expected to offer a peak of 314 petaflops of FP64 computing power across its CPU and GPU partitions, with 200 petabytes of storage for active workloads and a further 400 petabytes of cold storage. Following trends in HPC architecture design seen in other EuroHPC systems, the 200-petabyte tier is expected to sit on a fast, NAND-based storage subsystem, while the cold-storage tier (also referred to as an active archive, holding data that’s crucial but not frequently accessed) will likely make use of more cost-effective, conventional HDD arrays.
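To put those capacities in perspective, here’s a rough, illustrative calculation of how many devices each tier might translate into. The per-drive capacities below are assumptions chosen for the example (typical enterprise SSD and nearline HDD sizes), not figures published by BSC, EuroHPC or Nvidia.

```python
# Illustrative drive-count estimate for MareNostrum 5's two storage tiers.
# Per-device capacities are assumptions for the example, not published specs.

HOT_TIER_PB = 200        # fast, NAND-based tier (petabytes)
COLD_TIER_PB = 400       # HDD-based cold-storage tier (petabytes)

ASSUMED_SSD_TB = 15.36   # a common enterprise NVMe SSD capacity (assumed)
ASSUMED_HDD_TB = 18      # a common nearline HDD capacity (assumed)

ssd_count = HOT_TIER_PB * 1000 / ASSUMED_SSD_TB   # petabytes -> terabytes, then per-drive
hdd_count = COLD_TIER_PB * 1000 / ASSUMED_HDD_TB

print(f"Hot tier:  ~{ssd_count:,.0f} SSDs")   # on the order of 13,000 drives
print(f"Cold tier: ~{hdd_count:,.0f} HDDs")   # on the order of 22,000 drives
```

Numbers on that scale, before any redundancy or overprovisioning, help explain why the bulk tier gravitates toward hard drives while only the performance tier justifies flash.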

Nvidia’s dual-Grace superchip board design. Based on TSMC’s 5nm manufacturing process, Grace supports all of the latest connectivity tech, like PCIe Gen 5.0, DDR5, HBM3, CCIX 2.0 and CXL 2.0. (Image credit: Nvidia)

The system will employ Nvidia’s 144-core, Arm-based Grace “superchips” in dual-chip configurations, paired with the company’s H100 (Hopper) discrete GPU accelerators, each of which packs 80 billion transistors, 80 GB of HBM3 memory and 3.2 TB/s of memory bandwidth. As a result, MareNostrum 5 is projected to deliver more than 18 exaflops of AI acceleration (measured in FP8, 8-bit floating-point operations), making it the fastest AI supercomputer in the European Union. Beyond Nvidia’s chip tech, the system will tie everything together with the company’s Quantum-2 (aka NDR) InfiniBand software-defined networking and its BlueField data processing units (DPUs), keeping all components talking at low latency and a throughput of 400 Gb/s per port – not unlike the performance achieved by Cray’s Slingshot interconnect.
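For a sense of scale, the 18-exaflop AI figure can be sanity-checked with a quick back-of-the-envelope estimate. The sketch below assumes roughly 4 petaflops of FP8 per H100 SXM with sparsity enabled (Nvidia’s published peak, not a figure from this article) to gauge how many accelerators such a machine would imply.

```python
# Back-of-the-envelope estimate of the H100 count implied by 18 exaflops of FP8.
# Assumption (not from the article): one H100 SXM peaks at ~3.96 petaflops of FP8 with sparsity.

H100_FP8_PEAK_PFLOPS = 3.96          # assumed per-GPU FP8 peak, in petaflops
SYSTEM_AI_PEAK_EFLOPS = 18           # MareNostrum 5's projected AI performance, in exaflops

system_pflops = SYSTEM_AI_PEAK_EFLOPS * 1000          # exaflops -> petaflops
gpu_estimate = system_pflops / H100_FP8_PEAK_PFLOPS   # GPUs needed to hit that peak

print(f"Estimated H100 GPUs: ~{gpu_estimate:,.0f}")   # roughly 4,500 accelerators
```

That lands in the mid-four-thousands of GPUs, which gives a feel for the size of the accelerator partition, though the actual node and GPU counts haven’t been detailed here.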
