Background
Jensen Huang, CEO and founder of NVIDIA, has long maintained that the data center is the new unit of computing. It’s a view that resonates in our modern software-defined-everything cloud-native world. Compute should happen where it makes the most sense, close to the data, and where the results can have the most immediate impact. Instantaneous time-to-value from our data is what we all want, a desire that has fueled an explosion of AI at the edge.
Even if we accept the data center as our new unit of computing, it’s a metaphor brought to life with combinations of physical servers, storage, and networking. NVIDIA has traditionally served this world by delivering piece parts. NVIDIA’s GPUs, networking adapters, and DPUs power the platforms from every OEM and nearly every cloud service provider (CSP). That’s not enough. NVIDIA is moving up the stack, now providing the platforms that underpin our AI-hungry enterprise needs.
NVIDIA is a platform company. It’s been putting together the piece parts for years. In just the past three years, NVIDIA acquired Mellanox for its interconnect technology, Bright Computing for its management software, Excelero for its software-defined storage stack, and Cumulus Networks for its networking software. It famously tried to buy Arm, but that acquisition was unfortunately abandoned over antitrust concerns. That hasn’t stopped NVIDIA from developing its own CPU: the company is now in production with its Arm-architecture-based Grace CPU.
While NVIDIA has a history of building platforms for the embedded market with its Jetson, Clara, and DRIVE AGX solutions, the company has primarily served the data center market at the add-in accelerator level. The shining exception is NVIDIA’s DGX deep-learning supercomputer, which it brought to market in 2016.
This week at NVIDIA’s GTC event, it became clear that NVIDIA strives to own the entire platform for AI-infused analytics. The time is right, as accelerated AI is fueling a shift in how enterprises derive value from their data and how businesses operate and engage with their customers. These new use cases require more than traditional servers can deliver.
And who better to pull the pieces together than NVIDIA, perhaps the only company on the planet that has everything needed to assemble a solution under its own roof? That’s Jensen’s thinking.
News: Growing Power in the Cloud
Consuming AI in the cloud makes sense. The machines required for deep learning are expensive. Unfortunately, they’re also often idle. The consumption-based pricing model public cloud providers offer aligns perfectly with accelerated-AI workloads.
DGX & DGX Cloud
NVIDIA introduced its turnkey DGX “deep learning supercomputer” in 2016 and has continuously updated it as new generations of accelerators are introduced. The latest generation, the NVIDIA DGX H100, is a powerful machine. Incorporating eight NVIDIA H100 GPUs with 640 GB of total GPU memory and two 56-core variants of the latest Intel Xeon processor, all tied together with NVIDIA’s NVLink interconnect, the machine can deliver 32 petaFLOPS of AI performance.
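As a quick sanity check on those headline numbers, here’s a back-of-the-envelope sketch; the per-GPU figures below are NVIDIA’s published H100 SXM specs, and the 32 petaFLOPS figure assumes FP8 precision with sparsity enabled.

```python
# Back-of-the-envelope check on the DGX H100's headline specs.
# Per-GPU figures are NVIDIA's published H100 SXM numbers; the FP8
# throughput assumes sparsity is enabled.
NUM_GPUS = 8
HBM_PER_GPU_GB = 80        # 80 GB HBM3 per H100 SXM GPU
FP8_PFLOPS_PER_GPU = 4.0   # ~3,958 TFLOPS FP8 with sparsity, rounded

total_gpu_memory_gb = NUM_GPUS * HBM_PER_GPU_GB    # 8 * 80 = 640 GB
total_fp8_pflops = NUM_GPUS * FP8_PFLOPS_PER_GPU   # 8 * 4 = 32 PFLOPS

print(f"{total_gpu_memory_gb} GB total GPU memory, "
      f"~{total_fp8_pflops:.0f} petaFLOPS FP8")
```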
This week at GTC, NVIDIA formally announced that it is making the DGX directly available through public cloud providers. NVIDIA’s Jensen said, “NVIDIA DGX is an AI supercomputer and a blueprint of AI factories being built around the world. AI supercomputers are complex and time-consuming to build, and today we’re announcing NVIDIA DGX Cloud, the fastest and easiest way to have your own DGX AI supercomputer. Just open your browser.”
NVIDIA’s DGX comes in a range of options. The offerings include the stand-alone NVIDIA DGX A100 and H100 systems, the NVIDIA DGX BasePOD reference architecture, and the NVIDIA DGX SuperPOD turnkey AI data center solution. Our current understanding of DGX Cloud is that the CSPs will offer the DGX H100, priced at roughly $37,000 per instance per month. DGX Cloud will be available on Oracle Cloud Infrastructure imminently, with Azure offering DGX Cloud sometime in Q2.
Azure & the Omniverse Cloud
NVIDIA’s Omniverse Cloud is a platform-as-a-service tuned for the full lifecycle of industrial Omniverse applications, from design and development through deployment and, ultimately, ongoing management. Omniverse Cloud is built on NVIDIA’s OVX reference architecture.
NVIDIA OVX is purpose-built for powering the creation and operation of NVIDIA Omniverse applications at data center scale. The machine contains four NVIDIA L40 GPUs, 192 GB of GPU memory, two 32-core processors, 1 TB of system memory, 8 TB of NVMe-attached flash storage, and a combination of NVIDIA BlueField DPUs and NVIDIA ConnectX-7 SmartNICs. It’s a beast of a computer.
This week, NVIDIA and Microsoft announced that Microsoft will offer NVIDIA Omniverse Cloud as a Microsoft Azure service. This will benefit Azure users, as it allows them to link real-time data from sensors in the physical world with the digital twins contained within the Omniverse. It’s a powerful capability, closing a gap in the cloud for the growing number of industries embracing digital twins.
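To make that idea concrete, here is a minimal, hypothetical Python sketch of streaming a sensor reading into a digital twin; the endpoint, payload schema, and sensor helper are illustrative placeholders of my own, not an actual Omniverse Cloud API.

```python
# Hypothetical sketch: push readings from a physical sensor into a
# digital-twin service so the twin's state tracks the real asset.
# The endpoint, payload schema, and sensor helper are illustrative
# placeholders, NOT an actual Omniverse Cloud API.
import time
import requests

TWIN_ENDPOINT = "https://twin.example.com/factory-line-7/telemetry"  # placeholder

def read_temperature_sensor() -> float:
    """Stand-in for a real sensor read (e.g., over OPC UA or MQTT)."""
    return 72.4

for _ in range(60):  # stream one reading per second for a minute
    payload = {
        "sensor_id": "temp-01",
        "timestamp": time.time(),
        "value_celsius": read_temperature_sensor(),
    }
    requests.post(TWIN_ENDPOINT, json=payload, timeout=5)
    time.sleep(1.0)
```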
News: New Inference Platforms
Generative AI has unleashed a frenzy of GPU-assisted inference needs. Over the past sixty days, I’ve been on multiple calls with Wall Street analysts, all wanting to understand the winners and losers, and whether this is a short-term bump or a long-term trend.
NVIDIA believes that the need for inference in the enterprise will only grow. This makes sense if you look across the landscape of inference use cases. Even before generative AI caught the public’s attention, image processing at the edge, recommender engines, and other inference applications were already riding an explosive growth curve.
This week, NVIDIA expanded its reach into enterprise inference workloads with a range of new inference platforms. These are turnkey solutions, quite different from supplying GPUs and other accelerators for integration into someone else’s end product. The new inference platforms all combine NVIDIA’s full stack of inference software with the latest NVIDIA Ada, Hopper, and Grace Hopper processors. NVIDIA tells us these platforms are ideal for AI-assisted video processing, image generation, large language model deployment, and recommender inference.
NVIDIA released four inference platforms, each targeting a different workload:
· NVIDIA L4 for Video offers enhanced video decoding and transcoding capabilities for video streaming, augmented reality, generative AI, and related applications.
· NVIDIA L40 for Image Generation is optimized for graphics and AI-enabled 2D, 3D, and video image generation. The L40 is also the platform underpinning NVIDIA Omniverse.
· NVIDIA H100 NVL for Large Language Model Deployment is targeted at large language models, such as ChatGPT, at scale.
· NVIDIA Grace Hopper for Recommendation Models is ideal for deploying graph recommendation models, vector databases, and graph neural networks.
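The software side of these platforms is NVIDIA’s inference stack, which includes its Triton Inference Server; from a client’s perspective, querying a model looks much the same regardless of which platform serves it. Here’s a minimal sketch using the tritonclient Python library; the server address, model name, and tensor names are hypothetical placeholders for whatever a real deployment exposes.

```python
# pip install "tritonclient[http]" numpy
# Minimal sketch of querying a model served by NVIDIA Triton Inference
# Server. The server URL, model name, and tensor names are hypothetical
# placeholders for whatever your deployment exposes.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical image-classification model with one FP32 input tensor.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Send the request and read back the output tensor by name.
response = client.infer(model_name="my_model", inputs=[infer_input])
scores = response.as_numpy("output")
print(scores.shape)
```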
These new inference platforms are available through NVIDIA’s usual channel partners. They’re also available in the public cloud, with Google’s GCP announcing it will be the first CSP to offer L4-based instances. GCP is offering the new instances in preview beginning this week.
Analysis
The most impactful piece of NVIDIA’s platform announcements is hidden in the details. The top public cloud providers don’t buy turnkey systems, or, if they do, they don’t expose them as such to their customers.
The CSPs usually act as their own OEM, building servers optimized for their environments. Some of the CSPs even produce their own silicon. They do this to optimize and own their supply chains, but there’s also a bottom-line bonus: manufacturing your own servers lets you reclaim the margin that would otherwise go to an OEM. Or to a chipmaker.
It’s significant that NVIDIA can break that tradition. DGX Cloud directly exposes NVIDIA’s turnkey system, placing it front-and-center for the CSP customer. DGX Cloud demonstrates that NVIDIA’s value trumps the CSPs’ desire to roll their own, as they do with more traditional servers. It also shows customer pull for NVIDIA, a dependence that works in NVIDIA’s favor and that will only deepen over time.
Jensen likes to say that AI is having its iPhone moment, which implies NVIDIA, much like Apple in 2007, is enjoying a moment of its own.