Scientists and engineers across many fields can tackle challenging problems more efficiently with NVIDIA CUDA-X libraries powered by the NVIDIA GB200 and GH200 superchips. Announced today at the NVIDIA GTC global AI conference, CUDA-X now works with these latest superchip architectures to give developers tighter automatic integration and coordination between CPU and GPU resources, delivering up to 11x speedups for computational engineering tools and 5x larger calculations compared with traditional accelerated computing architectures.

This advancement significantly accelerates workflows in engineering simulation, design optimization, and more, helping scientists and researchers reach groundbreaking results faster. NVIDIA introduced CUDA in 2006, opening a world of applications to the power of accelerated computing. Since then, NVIDIA has built more than 900 domain-specific NVIDIA CUDA-X libraries and AI models, making accelerated computing easier to adopt and driving remarkable scientific breakthroughs. Now, CUDA-X brings accelerated computing to a broad range of scientific and engineering disciplines, including astronomy, particle physics, quantum physics, automotive, aerospace, and semiconductor design.

The NVIDIA Grace CPU architecture boosts memory bandwidth while reducing power consumption. In addition, the high-bandwidth NVIDIA NVLink-C2C interconnect lets the GPU and CPU share memory coherently, so developers can write less specialized code, handle larger problems, and improve application performance.
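On a coherent Grace-based system, that shared view of memory means a GPU kernel can operate directly on memory obtained from plain malloc, with no explicit staging copies. The sketch below illustrates the pattern, assuming a GH200/GB200-class system where system-allocated memory is GPU-accessible; on conventional discrete-GPU systems the same code would instead need cudaMallocManaged or explicit cudaMemcpy transfers.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Scale an array in place. On coherent Grace-based superchips, the pointer
// may refer to ordinary system-allocated (malloc'd) memory.
__global__ void scale(double* data, size_t n, double alpha) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;
}

int main() {
    const size_t n = 1ull << 28;  // ~2 GiB of doubles
    // Plain system allocation: NVLink-C2C keeps the CPU and GPU views of
    // this memory coherent, so the kernel below can use it directly.
    double* data = (double*)malloc(n * sizeof(double));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0;

    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // expect 2.0
    free(data);
    return 0;
}
```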

Accelerating Engineering Solvers With NVIDIA cuDSS

NVIDIA's superchip architectures let users extract maximum performance from the same GPU by putting CPU and GPU processing capabilities to work together efficiently.

The NVIDIA cuDSS library solves large engineering simulation problems involving sparse matrices, for applications such as design optimization and electromagnetic simulation workflows. cuDSS uses Grace CPU memory and the high-bandwidth NVLink-C2C interconnect to factorize and solve matrices too large to fit in device memory, letting users solve extremely large problems in a fraction of the time.
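At the API level, a cuDSS solve follows the classic direct-solver sequence: symbolic analysis, numerical factorization, and triangular solves. Below is a minimal sketch of that flow for a sparse system Ax = b in CSR format, based on the cuDSS C API (cudssCreate, cudssExecute, and related calls); the helper function and buffer names are placeholders, error checking is omitted, and exact descriptor arguments should be verified against the cuDSS version in use.

```cuda
#include <cudss.h>

// Sketch: solve A x = b for a sparse symmetric positive-definite matrix A in
// CSR form (d_rowPtr, d_colIdx, d_vals), with right-hand side d_b and
// solution d_x in device-accessible memory. n is the matrix dimension and
// nnz the number of nonzeros.
void solve_sparse_system(int n, int nnz,
                         int* d_rowPtr, int* d_colIdx, double* d_vals,
                         double* d_b, double* d_x) {
    cudssHandle_t handle;
    cudssCreate(&handle);

    cudssConfig_t config;
    cudssData_t data;
    cudssConfigCreate(&config);
    cudssDataCreate(handle, &data);

    // Describe the sparse matrix and the dense right-hand side / solution.
    cudssMatrix_t A, x, b;
    cudssMatrixCreateCsr(&A, n, n, nnz, d_rowPtr, NULL, d_colIdx, d_vals,
                         CUDA_R_32I, CUDA_R_64F, CUDSS_MTYPE_SPD,
                         CUDSS_MVIEW_FULL, CUDSS_BASE_ZERO);
    cudssMatrixCreateDn(&b, n, 1, n, d_b, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
    cudssMatrixCreateDn(&x, n, 1, n, d_x, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);

    // Reordering/symbolic analysis, numerical factorization, then solve.
    cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);

    cudssMatrixDestroy(A);
    cudssMatrixDestroy(b);
    cudssMatrixDestroy(x);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
}
```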

The memory shared between the GPU and the Grace CPU minimizes data movement, reducing overhead for large systems. By tapping into Grace CPU memory and the superchip architecture with cuDSS hybrid memory, users can accelerate the most intensive solution steps by up to 4x with the same GPU.
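In code, hybrid memory mode is a one-line solver-configuration switch set before the analysis phase, reusing the config handle from the sketch above. The parameter name below follows the cuDSS documentation at the time of writing and may differ across releases:

```cuda
// Allow cuDSS to spill part of the factors into host (Grace CPU) memory so
// factorizations larger than GPU memory can still proceed. Verify the
// CUDSS_CONFIG_HYBRID_MODE parameter name against your cuDSS release.
int hybrid = 1;
cudssConfigSet(config, CUDSS_CONFIG_HYBRID_MODE, &hybrid, sizeof(hybrid));
```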

Companies including Ansys and Altair have integrated cuDSS into their solvers, delivering significant performance improvements for electromagnetic simulation and for Altair OptiStruct finite element analysis workloads, respectively.

Scaling Up at Warp Speed With Superchip Memory

The GB200 and GH200 architectures' NVLink-C2C interconnects provide CPU and GPU memory coherency, enabling memory-limited applications to scale on a single GPU.

Engineers can implement out-of-core solvers that process more data than fits in GPU memory by seamlessly reading and writing between CPU and GPU memories. For instance, Autodesk ran simulations of up to 48 billion cells on eight GH200 nodes, more than 5x larger than the simulations possible on eight NVIDIA H100 nodes.
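For context, the traditional out-of-core pattern streams fixed-size tiles of a host-resident dataset through GPU memory, overlapping transfers with compute; the coherent superchip memory removes much of this hand-written staging. A sketch of the tiled version, with a placeholder kernel standing in for real solver work:

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Placeholder per-element work; a real out-of-core solver does far more.
__global__ void process(float* tile, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) tile[i] = tile[i] * 2.0f + 1.0f;
}

int main() {
    const size_t total = 1ull << 30;  // 4 GiB dataset, larger than the tile
    const size_t tileN = 1ull << 24;  // per-pass working set in GPU memory

    float* host;  // pinned host memory enables asynchronous copies
    cudaMallocHost(&host, total * sizeof(float));
    for (size_t i = 0; i < total; ++i) host[i] = 1.0f;

    float* dev;
    cudaMalloc(&dev, tileN * sizeof(float));
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Stream the dataset through GPU memory one tile at a time.
    for (size_t off = 0; off < total; off += tileN) {
        size_t n = (total - off < tileN) ? (total - off) : tileN;
        cudaMemcpyAsync(dev, host + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        process<<<(unsigned)((n + 255) / 256), 256, 0, stream>>>(dev, n);
        cudaMemcpyAsync(host + off, dev, n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
    }
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```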

Powering Quantum Computing Research With NVIDIA cuQuantum

Quantum computers have the potential to accelerate problems across many scientific and industrial disciplines, and simulating complex quantum systems is crucial for advancing quantum computing.

The NVIDIA cuQuantum library accelerates these quantum workloads and is integrated with leading quantum computing frameworks, so simulation performance improves without any code changes. The GB200 and GH200 architectures provide an ideal platform for scaling up quantum simulations, since they let workloads draw on large CPU memory without compromising performance.
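For developers working below the framework layer, cuQuantum's cuStateVec component exposes state-vector primitives directly. The sketch below applies a Hadamard gate to one qubit of a small state vector via custatevecApplyMatrix; the enum and argument names follow the cuStateVec C API documentation at the time of writing, and error checking is omitted.

```cuda
#include <custatevec.h>
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cmath>

int main() {
    const int nQubits = 3;
    const size_t dim = 1ull << nQubits;

    // Allocate the state vector on the GPU and initialize it to |000>.
    cuDoubleComplex* d_sv;
    cudaMalloc(&d_sv, dim * sizeof(cuDoubleComplex));
    cudaMemset(d_sv, 0, dim * sizeof(cuDoubleComplex));
    cuDoubleComplex one = make_cuDoubleComplex(1.0, 0.0);
    cudaMemcpy(d_sv, &one, sizeof(one), cudaMemcpyHostToDevice);

    // 2x2 Hadamard matrix, row-major.
    const double s = 1.0 / sqrt(2.0);
    cuDoubleComplex h[4] = {
        make_cuDoubleComplex(s, 0.0), make_cuDoubleComplex(s, 0.0),
        make_cuDoubleComplex(s, 0.0), make_cuDoubleComplex(-s, 0.0)};

    custatevecHandle_t handle;
    custatevecCreate(&handle);

    // Apply H to qubit 0, with no control qubits and no extra workspace.
    const int32_t targets[] = {0};
    custatevecApplyMatrix(handle, d_sv, CUDA_C_64F, nQubits,
                          h, CUDA_C_64F, CUSTATEVEC_MATRIX_LAYOUT_ROW,
                          /*adjoint=*/0, targets, 1,
                          /*controls=*/NULL, /*controlBitValues=*/NULL, 0,
                          CUSTATEVEC_COMPUTE_64F,
                          /*extraWorkspace=*/NULL, 0);

    custatevecDestroy(handle);
    cudaFree(d_sv);
    return 0;
}
```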

Overall, NVIDIA's advances in superchip architectures and CUDA-X libraries are transforming engineering and quantum computing research, enabling faster problem-solving and groundbreaking discoveries.