NVIDIA and Oracle Collaborate to Revolutionize AI and Data for Enterprises
NVIDIA and Oracle Cloud Infrastructure unveiled the first zettascale OCI Supercluster at Oracle CloudWorld. Built on more than 100,000 NVIDIA GPUs, including the new Blackwell platform, it is designed to accelerate AI model training and deployment.
The NVIDIA Blackwell platform accelerates the first zettascale OCI Supercluster, which Oracle Cloud Infrastructure (OCI) announced at the Oracle CloudWorld conference today.
The cluster is designed to help enterprises train and deploy next-generation AI models using more than 100,000 of NVIDIA’s latest GPUs.
With OCI Superclusters, customers can choose from a broad selection of NVIDIA GPUs and deploy them where they need them: on premises, in the public cloud, or on sovereign cloud infrastructure.
The Blackwell-based systems are expected to be available in the first half of 2025. They scale to 131,072 Blackwell GPUs with NVIDIA ConnectX-7 NICs for RoCEv2 or with NVIDIA Quantum-2 InfiniBand networking, delivering an astonishing 2.4 zettaflops of peak AI compute to the cloud.
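A quick back-of-the-envelope check puts the headline figure in perspective; the sketch below only divides the two numbers quoted above, and the interpretation of the result as low-precision sparse throughput is an inference, not an official spec.

```python
# Back-of-the-envelope check of the headline figure: 2.4 zettaflops spread across
# 131,072 GPUs works out to roughly 18 petaflops per GPU, which suggests the peak
# number counts low-precision (sparse FP4-class) AI throughput rather than FP64.
peak_ai_flops = 2.4e21          # 2.4 zettaflops, from the announcement
gpu_count = 131_072             # maximum Blackwell GPUs per supercluster
per_gpu = peak_ai_flops / gpu_count
print(f"{per_gpu / 1e15:.1f} petaflops per GPU")  # ~18.3
```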
Also featured at the show were NVIDIA GB200 NVL72 liquid-cooled bare-metal instances to help power generative AI applications. Each instance provides a 72-GPU NVIDIA NVLink domain, letting it train and serve trillion-parameter models in real time at scale.
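To see why a 72-GPU NVLink domain matters for trillion-parameter models, consider the rough memory arithmetic below; the per-GPU HBM figure is an illustrative assumption, not an official specification.

```python
# Rough sketch of why a 72-GPU NVLink domain helps with trillion-parameter models.
# The per-GPU memory figure is assumed for illustration, not an official spec.
params = 1e12                    # one trillion parameters
bytes_per_param = 1              # FP8 weights, 1 byte each
weight_bytes = params * bytes_per_param          # ~1 TB of weights alone

hbm_per_gpu = 180e9              # assumed ~180 GB of HBM per GPU (illustrative)
domain_hbm = 72 * hbm_per_gpu    # ~13 TB across the whole NVLink domain
print(f"weights: {weight_bytes/1e12:.1f} TB, domain HBM: {domain_hbm/1e12:.1f} TB")
# The model weights (plus activations and KV cache) can stay inside a single
# NVLink domain, avoiding slower cross-node transfers during inference.
```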
The NVIDIA HGX H200, which uses NVLink and NVLink Switch to connect eight NVIDIA H200 Tensor Core GPUs in a single bare-metal instance, will be available from OCI this year. With NVIDIA ConnectX-7 NICs over RoCEv2 cluster networking, it scales to 65,536 H200 GPUs.
The instance targets customers who want to accelerate training workloads and deliver real-time inference at scale. OCI also announced that NVIDIA L40S GPU-accelerated instances for midrange AI workloads, NVIDIA Omniverse, and visualization are now generally available.
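For readers curious what requesting one of these GPU bare-metal instances looks like programmatically, here is a minimal sketch using the OCI Python SDK. The shape name, image, and OCIDs are placeholders chosen for illustration, not confirmed product identifiers.

```python
# Hypothetical sketch: launching a GPU bare-metal instance with the OCI Python SDK.
# Shape name, image, and OCIDs below are placeholders, not real identifiers.
import oci

config = oci.config.from_file()              # reads credentials from ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    compartment_id="ocid1.compartment.oc1..example",   # placeholder OCID
    availability_domain="Uocm:PHX-AD-1",                # placeholder availability domain
    shape="BM.GPU.H200.8",                              # hypothetical 8x H200 bare-metal shape
    display_name="ai-training-node-0",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example"             # placeholder GPU image OCID
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example"           # placeholder cluster-network subnet
    ),
)

response = compute.launch_instance(details)
print(response.data.lifecycle_state)                    # e.g. PROVISIONING
```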
NVIDIA GPUs Power Oracle’s Edge Solutions
NVIDIA GPUs also accelerate Oracle’s edge offerings, from single-node to multi-rack solutions, enabling scalable AI in disconnected and remote locations. For example, Oracle’s Roving Edge Device v2 now supports up to three NVIDIA L4 Tensor Core GPUs for smaller-scale deployments.
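On a device like this, edge software typically enumerates the local GPUs before distributing inference work across them. The sketch below uses NVIDIA's NVML Python bindings as one generic way to do that; it is not an Oracle-specific API.

```python
# Minimal sketch: enumerating the local GPUs on an edge node (e.g. up to three L4s
# on a Roving Edge Device v2) before assigning inference work to them.
# Requires NVIDIA's NVML bindings: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
for i in range(count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):          # older NVML bindings return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {name}, {mem.total / 1e9:.0f} GB")
pynvml.nvmlShutdown()
```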
Businesses are already using OCI clusters with NVIDIA technology to drive AI innovation. Reka, a foundation model startup, is using the clusters to develop advanced multimodal AI models for enterprise agents.
“Reka’s multimodal AI models, built using OCI and NVIDIA technology, empower next-generation enterprise agents to understand our complex world through reading, seeing, hearing, and speaking,” said Dani Yogatama, the co-founder and CEO of Reka.
He added, “With the NVIDIA GPU-accelerated infrastructure, we can effortlessly manage very large models and extensive contexts, all while facilitating the efficient scaling of dense and sparse training at the cluster level.”
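Cluster-level scaling of training on instances like these usually rests on some form of data parallelism over the NCCL-backed fabric. The sketch below is a generic PyTorch distributed data-parallel pattern launched with torchrun; it is not Reka's training stack, and the toy model and hyperparameters are placeholders.

```python
# Generic sketch of data-parallel training across GPU nodes, launched with torchrun.
# Not Reka's actual stack; the model and loop are stand-ins for illustration.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")         # NCCL runs over RoCEv2 or InfiniBand fabrics
    local_rank = int(os.environ["LOCAL_RANK"])      # set by torchrun on each process
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)   # toy stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                             # toy training loop
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                             # gradients are all-reduced across ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```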