One of the big announcements at AMD's Data Center event a few weeks ago was its CDNA2-based compute accelerator, the Instinct MI250X. The MI250X uses two MI200 Graphics Compute Dies built on TSMC's N6 manufacturing node, along with four HBM2E modules per die, in a new '2.5D' packaging design that uses a bridge between the die and the substrate for high-performance, low-power connectivity. This is the GPU going into Frontier, one of the US exascale systems due to be powered on very shortly. At the Supercomputing conference this week, HPE, under the HPE Cray brand, had one of those blades on display, along with a full frontal die shot of the MI250X. Many thanks to Patrick Kennedy from ServeTheHome for sharing these images and giving us permission to republish them.

The MI250X chip is a shimmed package in an OAM form factor. OAM stands for OCP Accelerator Module, which was developed by the Open Compute Project (OCP), an industry standards body for servers and performance computing. This is the accelerator form factor standard the partners use, especially when you pack a lot of these into a system. Eight of them, to be exact.

This is a 1U half-blade, featuring two nodes. Each node is an AMD EPYC 'Trento' CPU (a custom IO version of Milan using the Infinity Fabric) paired with four MI250X accelerators. Everything is liquid cooled. AMD said that the MI250X can go up to 560 W per accelerator, so eight of those plus two CPUs could mean this unit requires 5 kilowatts of power and cooling. If this is only a half-blade, then we're talking about some serious compute and power density here.
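As a rough check on that 5 kW estimate, the arithmetic can be sketched as follows. The 560 W accelerator figure and the counts come from the text above; the per-CPU power is not stated anywhere for Trento, so the 280 W used here is purely an assumption (it matches the top TDP of standard Milan parts), and the function name is hypothetical.

```python
# Back-of-the-envelope power estimate for the half-blade described above.
ACCEL_MAX_W = 560        # per MI250X, as quoted by AMD
ACCELS_PER_NODE = 4      # four MI250X per node
NODES_PER_BLADE = 2      # two nodes per half-blade, one CPU each
CPU_ASSUMED_W = 280      # ASSUMPTION: Trento power is not given in the article


def blade_power_watts(cpu_w=CPU_ASSUMED_W):
    """Peak accelerator plus CPU power for one half-blade, in watts."""
    accel_total = ACCEL_MAX_W * ACCELS_PER_NODE * NODES_PER_BLADE
    cpu_total = cpu_w * NODES_PER_BLADE
    return accel_total + cpu_total


if __name__ == "__main__":
    total = blade_power_watts()
    print(f"Estimated peak: {total} W (~{total / 1000:.1f} kW)")
```

The eight accelerators alone account for 4,480 W, so any plausible CPU figure lands the total close to 5 kW. Memory, networking, and power conversion losses are left out of this sketch.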

Each node seems relatively self-contained. The CPU on the right here isn't upside down, given that the socket's rear pin-outs aren't visible, but it is liquid cooled as well. What looks like four copper heatpipes, two on each side of the CPU, is actually a full 8-channel memory configuration. These servers don't have their own power supplies; instead, they get power from a unified backplane in the rack.

The back connectors look something like this. Each rack of Frontier nodes will use HPE's Slingshot interconnect fabric to scale out across the whole supercomputer.

Systems like this are undoubtedly over-engineered for the sake of sustained reliability. That's why we have as much cooling as you can get and enough power phases for a 560 W accelerator, and even in this image you can see that the base motherboards the OAM connects into are easily 16 layers, if not 20 or 24. For reference, a budget consumer motherboard today might only have four layers, while enthusiast motherboards have 8 or 10, sometimes 12 for HEDT.

In the global press briefing, keynote chair and world-renowned HPC professor Jack Dongarra suggested that Frontier is very close to being powered up to be one of the first exascale systems in the US. He didn't outright say it would beat the Aurora supercomputer (Sapphire Rapids + Ponte Vecchio) to the title of first, as he doesn't have the same insight into that system, but he sounded hopeful that Frontier would submit a 1+ ExaFLOP score to the TOP500 list in June 2022.

Many thanks to Patrick Kennedy and ServeTheHome for permission to share his images.


