
Quick Take: VAST Data’s Nvidia DPU-Based AI Cloud Architecture

A version of this blog post was previously published on Forbes.

VAST Data recently introduced a new AI cloud architecture based on Nvidia’s BlueField-3 DPU technology. The architecture is designed to improve performance, security, and efficiency for AI data services. The approach seeks to enhance data center operations and introduce a secure, zero-trust environment by integrating storage and database processing into AI servers. 

Nvidia DPUs Remove the Bottleneck

VAST Data is leveraging Nvidia’s BlueField-3 DPU to innovate within its AI cloud solution. A DPU is a specialized processor designed to offload, accelerate, and isolate data center workloads, enabling higher performance, increased security, and more efficient data processing.

VAST disaggregates its data services and runs them on Nvidia BlueField-3 DPUs. The DPU takes over certain data processing tasks traditionally handled by the server, such as networking, security, and storage operations. Offloading these functions to the DPU reduces the load on the main CPU, allowing it to focus on AI and machine learning computations.

Here’s how it works: using the Nvidia BlueField-3 DPU, VAST creates a parallel system architecture where storage and database processing services are embedded directly into AI servers.

This setup provides a dedicated, stateless container for each GPU server running the VAST parallel services operating system. It promotes true linear scalability of data services across a vast number of GPUs without the bottlenecks typically introduced by traditional x86 hardware and networking layers.
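To make the scaling argument concrete, here is a minimal Python sketch of the idea. It is not VAST's implementation. It compares a fixed, shared pool of storage front-end nodes with one data-service container per GPU server. The function names and every number in it (pool size, bandwidth per node or container) are hypothetical and chosen only for illustration.

```python
def shared_pool_total(pool_nodes: int, bw_per_node: float) -> float:
    """Total data-service bandwidth of a fixed pool of front-end nodes.

    The total does not depend on how many GPU servers share the pool.
    """
    return pool_nodes * bw_per_node


def per_server_total(gpu_servers: int, bw_per_container: float) -> float:
    """Total bandwidth when every GPU server hosts its own container.

    Each added GPU server adds one more container, so the total grows linearly.
    """
    return gpu_servers * bw_per_container


def share_per_gpu_server(total_bw: float, gpu_servers: int) -> float:
    """Bandwidth available to each GPU server if the total is split evenly."""
    return total_bw / gpu_servers


if __name__ == "__main__":
    # Hypothetical figures: 8 pool nodes at 50 units each,
    # versus one container at 25 units per GPU server.
    for servers in (16, 64, 256):
        pooled = share_per_gpu_server(shared_pool_total(8, 50.0), servers)
        embedded = share_per_gpu_server(per_server_total(servers, 25.0), servers)
        print(f"{servers:>4} GPU servers: pooled {pooled:.2f}, embedded {embedded:.2f}")
```

In this toy model, the per-server share of the pooled design shrinks as the cluster grows, while the embedded design holds it constant. Real systems have other limits, such as network fabric and backend media, that the sketch ignores.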

By removing the dependency on multiple layers of traditional hardware and leveraging the processing power of the DPU, VAST’s network-attached Data Platform infrastructure becomes significantly more efficient. This efficiency translates into what VAST tells us is a 70% reduction in the power usage and data center footprint for VAST infrastructure, contributing to overall energy consumption savings.
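As a quick back-of-the-envelope check on what a 70% reduction means, the short Python sketch below applies that figure to a baseline. The 70% value is VAST's own claim and covers VAST infrastructure only, not the whole data center. The baseline numbers (100 kW, 40 rack units) and the function name are hypothetical.

```python
def after_reduction(baseline: float, reduction: float = 0.70) -> float:
    """Return what remains of a baseline after a fractional reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical baselines for the storage infrastructure alone.
    power_kw = after_reduction(100.0)   # assumed 100 kW before
    rack_units = after_reduction(40.0)  # assumed 40 rack units before
    print(f"Power after reduction: {power_kw:.1f} kW")
    print(f"Footprint after reduction: {rack_units:.1f} rack units")
```

Put differently, a 70% reduction leaves roughly three-tenths of the original power draw and footprint for the storage tier.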

The approach also benefits GPU cloud providers with multi-tenant environments. With VAST’s zero-trust security model, the DPU isolates data and data management from the host operating system. By hosting data services on the DPU and using standard client protocols, VAST reduces the number of potential attack vectors and helps keep data secure.

Analysis

When Nvidia launched its first BlueField DPU, based on technology from its acquisition of Mellanox, the industry saw it as just another intelligent network adapter. It could offload expensive storage and networking tasks such as deep packet inspection or compression. But Nvidia proved that the accelerator is capable of much more.

Not long after Nvidia launched BlueField, VMware (now called “VMware by Broadcom”) took things a step further. It demonstrated that properly designed infrastructure software could leverage an Nvidia BlueField DPU to significantly boost overall system performance. In its vSphere 8.0 release, VMware moved critical elements of its vSphere Distributed Switch and NSX networking and observability stack to Nvidia’s DPU. VAST Data is now taking a similar approach.

The move towards a disaggregated computing model facilitated by DPU technology is a marked departure from traditional, monolithic designs. By embedding the entirety of VAST’s operating system natively into an AI cluster, VAST capitalizes on the inherent strengths of Nvidia’s BlueField-3 DPUs and effectively transforms supercomputers into highly specialized AI data engines. This is a significant step towards removing storage bottlenecks in AI and similarly performance-sensitive environments.

Beyond offload, VAST’s zero-trust security model is a critical element. Today, AI training is often a “cloud first” activity, with organizations using GPU cloud providers to train models. VAST Data excels in this market, partnering with top-tier providers like Lambda, CoreWeave, and Core42. Multi-tenant environments like these require a robust, hardware-enforced security model, such as the one VAST Data delivers with its DPU-based architecture.

Large AI clusters are already moving away from traditional storage solutions that struggle to keep up with the increasing scale and performance required for AI workloads. In this market, VAST Data competes with companies like WEKA, which is also finding solid success in the GPU cloud market, and parallel file systems like IBM’s GPFS and the open-source Lustre.

The approach taken by VAST Data and Nvidia is a significant leap forward in optimizing data services for the unique demands of AI. Leveraging DPUs to further remove performance bottlenecks in the data path is a key differentiator for VAST Data in this hyper-competitive market. With this announcement, VAST delivers a compelling and possibly game-changing solution for high-performance data.

Disclosure: The author is an industry analyst, and NAND Research is an industry analyst firm that engages in, or has engaged in, research, analysis, and advisory services with many technology companies, which may include those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.