NVIDIA is at the forefront of pioneering transformations in the AI sector. In an online presentation last month, NVIDIA’s Chief Scientist, Bill Dally, shed light on the evolution of computer performance delivery, especially in a world moving past Moore’s law.
Unlike in the past, when consistent gains came mainly from shrinking transistors, today's progress demands continuous innovation and thorough validation of new components. This was the crux of Dally's keynote at Hot Chips, an esteemed event for chip and system engineers.
“That’s been setting the pace for us in the hardware industry because we feel we have to provide for this demand,” said Dally during the talk.
Huang’s Law
Guiding a team of over 300 at NVIDIA Research, Dally has been instrumental in achieving a 1,000x improvement in single-GPU performance for AI inference over the past ten years. This remarkable progress, termed “Huang’s Law” by IEEE Spectrum in honor of NVIDIA’s founder and CEO, Jensen Huang, was spotlighted further by a Wall Street Journal column. It has been NVIDIA’s answer to the skyrocketing prominence of vast generative AI language models, which grow roughly tenfold in size each year.
Delving into specifics, Dally highlighted several elements contributing to this exponential gain. Most notably, a 16-fold improvement came from innovating simpler numeric representations used in calculations. NVIDIA’s latest Hopper architecture, equipped with its Transformer Engine, employs a unique blend of eight- and 16-bit integer and floating-point math, optimized for the demands of contemporary generative AI. The gains in performance and energy efficiency from this innovation were significant.
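To illustrate why narrower number formats pay off, here is a minimal sketch of symmetric 8-bit quantization of FP32 weights. This is a hypothetical illustration, not NVIDIA's implementation: the Transformer Engine works with hardware FP8 formats and per-tensor scaling, while this example only shows how fewer bits shrink storage and bandwidth at a bounded accuracy cost.

```python
import numpy as np

# Hypothetical sketch: symmetric 8-bit quantization of FP32 weights.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)

scale = np.abs(weights).max() / 127.0               # map FP32 range onto int8
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
deq = q.astype(np.float32) * scale                  # dequantize for comparison

print(weights.nbytes // q.nbytes)                   # 4x smaller storage
print(float(np.abs(weights - deq).max()) < scale)   # error within one step
```

Moving from 32-bit to 8-bit values quarters the memory traffic per operand, which is where much of the energy saving comes from: on modern chips, moving data typically costs far more energy than the arithmetic itself.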
Furthermore, the team’s development of sophisticated GPU instructions led to a 12.5x improvement by streamlining tasks and energy use. With the introduction of the NVIDIA Ampere architecture, a novel technique called structural sparsity was added. This method, which refines AI model weights without hampering accuracy, provided a 2x performance surge and has potential for future enhancements.
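Structural sparsity on Ampere follows a 2:4 pattern: in every group of four weights, at most two are nonzero, and the sparse Tensor Cores skip the zeros. The sketch below, a simplified stand-in for the real fine-tuning-based pruning flow, keeps the two largest-magnitude weights in each group of four.

```python
import numpy as np

# Hypothetical sketch of 2:4 structured sparsity: in each group of four
# weights, keep the two with the largest magnitude and zero the other two.
def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    groups = weights.reshape(-1, 4)
    # indices of the two smallest-magnitude weights per group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01], dtype=np.float32)
print(prune_2_of_4(w))  # exactly half the weights become zero
```

Because the pattern is regular (two of every four), the hardware can index the surviving weights cheaply, which is what makes the 2x throughput gain practical, unlike unstructured sparsity, whose irregular zero placement is hard to exploit.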
As a result, computers can be “as efficient as dedicated accelerators, but retain all the programmability of GPUs,” said Dally.
Dally also touched upon the synergistic benefits of NVLink interconnects within GPU systems and NVIDIA’s system-wide networking, all amplifying the single GPU performance gains.
Interestingly, while NVIDIA migrated its GPUs from 28nm to 5nm semiconductor nodes over those ten years, that process transition accounted for only 2.5x of the overall improvement. Even so, Dally remains optimistic about the persistence of Huang’s Law as the benefits of Moore’s law wane.
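The individual factors Dally cited compound multiplicatively to the decade's overall figure, which a quick check confirms:

```python
# Factors from the talk: number representation (16x), complex instructions
# (12.5x), structural sparsity (2x), and the 28nm -> 5nm process move (2.5x).
factors = {"numerics": 16, "instructions": 12.5, "sparsity": 2, "process": 2.5}

total = 1.0
for gain in factors.values():
    total *= gain

print(total)  # 1000.0
```

Seen this way, process scaling is the smallest of the four contributors: most of the 1,000x came from architecture, numerics, and algorithms rather than from smaller transistors.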
He forecasted several future enhancement avenues, including refining numeric representations, augmenting sparsity in AI models, and improving memory and communication circuits. In Dally’s view, the modern era of computer design offers NVIDIA engineers unparalleled opportunities for collaboration, innovation, and impact.
Fun Time to be a Computer Engineer
Because each new chip and system generation demands new innovations, “it’s a fun time to be a computer engineer,” he said.
Such a scenario is a marked shift from the previous era governed by Moore’s law, which posited a doubling of transistor counts, and hence performance, every couple of years through chip miniaturization. That era was underpinned by Dennard scaling (a concept from a 1974 paper by IBM’s Robert Dennard), which held that transistors got faster and more power-efficient as they shrank; it eventually ran into inherent physical constraints, including heat-management challenges for ever-smaller devices.