The Nvidia H200 Tensor Core GPU is a graphics processing unit developed by Nvidia that implements the Hopper architecture and succeeds the Nvidia H100 Tensor Core GPU. The first chip to use Nvidia's HBM3e memory technology, the H200 offers larger and faster memory to accelerate generative AI and large language models (LLMs), while advancing scientific computing for high-performance computing (HPC) workloads with better energy efficiency and lower total cost of ownership. With HBM3e, the H200 delivers 141 GB of memory at 4.8 terabytes per second, almost double the capacity and 2.4x the bandwidth of the Nvidia A100, which is based on Nvidia's previous-generation Ampere architecture.
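As a rough sanity check on those comparative figures, the short Python sketch below recovers the "almost double" capacity and 2.4x bandwidth ratios. The A100 numbers used here (roughly 80 GB and 2.0 TB/s for the SXM part) are assumptions based on commonly cited specs and do not appear in the text above.

```python
# Rough comparison of the H200's quoted memory specs with the A100 (80 GB).
# The A100 figures are assumptions based on commonly cited specs for the
# SXM part; they are not stated in the text above.
h200_capacity_gb = 141      # HBM3e capacity quoted for the H200
h200_bandwidth_tbs = 4.8    # HBM3e bandwidth quoted for the H200

a100_capacity_gb = 80       # assumed A100 80 GB capacity
a100_bandwidth_tbs = 2.0    # assumed A100 80 GB HBM2e bandwidth (~2.0 TB/s)

print(f"Capacity ratio:  {h200_capacity_gb / a100_capacity_gb:.2f}x")     # ~1.76x ("almost double")
print(f"Bandwidth ratio: {h200_bandwidth_tbs / a100_bandwidth_tbs:.1f}x")  # 2.4x
```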
Nvidia announced a new version of the GH200 "superchip", which pairs a Grace CPU with a Hopper architecture GPU equipped with HBM3e, on August 8, 2023. The H200 itself was unveiled on November 13, 2023, with Nvidia stating that the chip would be available from global system manufacturers and cloud service providers in the second quarter of 2024. Among the first cloud service providers to deploy the H200 will be Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, in addition to CoreWeave, Lambda, and Vultr. The H200 GPU is combined with the Grace CPU over an ultra-fast NVLink-C2C interconnect to produce Nvidia's GH200 chip. The first GH200 system to go live in the US will be the Venado supercomputer at Los Alamos National Laboratory. Another significant installation is the Jupiter supercomputer at the Jülich Supercomputing Centre, which will house almost 24,000 GH200 chips for a combined 93 exaflops of AI compute.
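For a sense of scale, a back-of-the-envelope check (assuming the 93 exaflops figure refers to low-precision AI throughput, e.g. sparse FP8, which is an interpretation rather than something stated above) works out to roughly 4 petaflops per GH200 chip:

```python
# Back-of-the-envelope check of the Jupiter figures quoted above.
# Assumes the 93 exaflops refers to low-precision "AI compute"
# (e.g. sparse FP8); that interpretation is an assumption.
total_ai_exaflops = 93
num_gh200_chips = 24_000

per_chip_petaflops = total_ai_exaflops * 1_000 / num_gh200_chips
print(f"~{per_chip_petaflops:.1f} PFLOPS of AI compute per GH200 chip")  # ~3.9 PFLOPS
```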
Nvidia states the H200 will deliver significant performance gains, including nearly double the inference speed on Llama 2, a 70-billion-parameter LLM, compared to the H100 chip. The company plans to continue improving performance through software enhancements, including the release of open-source libraries such as NVIDIA TensorRT-LLM. The H200 provides a total of 141 GB of HBM3e memory running at around 6.25 Gbps effective per pin, for 4.8 TB/s of total bandwidth per GPU across its six HBM3e stacks; the H100, by comparison, offers 80 GB of HBM3 and 3.35 TB/s of bandwidth.
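Those figures are internally consistent: assuming the standard 1024-bit interface per HBM stack (an assumption about the stack configuration, not a figure stated above), six stacks at 6.25 Gbps per pin work out to the quoted 4.8 TB/s:

```python
# Deriving the quoted 4.8 TB/s from the pin speed and stack count above.
# The 1024-bit interface per HBM stack is an assumption based on the
# HBM standard, not a figure stated in the text.
stacks = 6                    # HBM3e stacks per H200 GPU (quoted above)
bits_per_stack = 1024         # assumed interface width per stack
pin_speed_gbps = 6.25         # effective per-pin data rate (quoted above)

bus_width_bits = stacks * bits_per_stack                  # 6144-bit total bus
bandwidth_gb_per_s = bus_width_bits * pin_speed_gbps / 8  # bits -> bytes
print(f"{bandwidth_gb_per_s / 1000:.1f} TB/s")            # 4.8 TB/s
```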
The H200 will be available in NVIDIA HGX H200 server boards with four- and eight-way configurations, compatible with both the hardware and software of HGX H100 systems. These options allow the H200 to be deployed in different types of data centers, including on-premises, cloud, hybrid-cloud, and edge.
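As a rough illustration of what those configurations imply, simply scaling the per-GPU figures quoted above gives the aggregate HBM3e capacity and bandwidth of each board option:

```python
# Aggregate memory for the HGX H200 board options, obtained by simply
# scaling the per-GPU figures quoted above (no additional assumptions).
per_gpu_capacity_gb = 141
per_gpu_bandwidth_tbs = 4.8

for gpus in (4, 8):
    print(f"{gpus}-way HGX H200: {gpus * per_gpu_capacity_gb} GB HBM3e, "
          f"{gpus * per_gpu_bandwidth_tbs:.1f} TB/s aggregate bandwidth")
```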