Dell AI

Research Note: Dell Adds 20+ Features to its AI Factory

Dell Technologies announced more than 20 updates to its AI Factory portfolio ahead of next week’s SC25 event, spanning compute, storage, networking, and cooling infrastructure. The announcements center on three primary themes: expanded support for NVIDIA Blackwell GPUs across multiple server platforms, introduction of AMD MI355X-based systems, and deeper integration of automation tools across the infrastructure stack.

The most significant technical development is the introduction of the Dell PowerEdge XE8712 rack-scale system, which Dell positions as the industry’s first purpose-built rack-scale platform for NVIDIA GB200 NVL4 configurations.

The company also expanded its networking capabilities with Enterprise SONiC support for NVIDIA Spectrum-X Ethernet and introduced new high-radix switches capable of 102.4 Tbps throughput.

Storage enhancements include PowerScale software-only licensing and AI-optimized features in ObjectScale.

Compute Infrastructure Advancements

Dell introduced three new server platforms while expanding GPU support across existing systems. These announcements show Dell taking a multi-vendor approach, providing options across different AI workload types and deployment models.

PowerEdge XE8712 Rack-Scale System

The PowerEdge XE8712 is Dell’s entry into purpose-built rack-scale infrastructure for NVIDIA’s GB200 NVL4 platform. Dell says this system achieves industry-leading GPU density at 144 NVIDIA Blackwell GPUs per IR7000 rack through a 36-node configuration.

The rack-scale architecture integrates:

  • Factory-tested IR7000 Integrated Racks with pre-configured GB200 Grace Blackwell Superchip nodes
  • Rack-level serviceability features with quick-disconnect manifolds for maintenance operations
  • PowerCool liquid cooling with the new Rack-mount Coolant Distribution Unit (RCDU) providing 160 kW cooling capacity
  • Integrated Rack Controller (IRC) for unified rack-scale management and leak detection capabilities

The system targets large-scale HPC simulations and AI training workloads where GPU density and cooling efficiency become critical constraints.

AMD-Based AI Infrastructure

Dell introduced its new PowerEdge XE9785 server in both air-cooled and liquid-cooled (XE9785L) variants, powered by AMD Instinct MI355X accelerators paired with AMD Pollara 400 AI NICs.

Dell claims performance improvements of:

  • Up to 2.7x faster MLPerf model training performance compared to previous generations
  • 50% increase in GPU memory capacity for handling larger model parameters
  • Up to 44% higher memory bandwidth for data-intensive operations

The liquid-cooled XE9785L variant employs direct-to-chip cooling that Dell says captures up to 80% of system heat, reducing datacenter cooling costs.

PowerEdge R770AP: Built on the Intel Xeon 6 Platform

The Dell PowerEdge R770AP with Intel Xeon 6 P-core processors targets HPC and latency-sensitive AI workloads. Dell claims 2.1x performance improvement for latency-sensitive applications such as high-frequency trading and real-time analytics.

This platform addresses workloads where CPU performance and memory capacity matter more than GPU acceleration, though specific benchmarks and workload characterizations would help organizations assess applicability to their use cases.

NVIDIA Blackwell Ecosystem Expansion

Dell expanded NVIDIA Blackwell support across multiple system types:

  • PowerEdge XE7740 and XE7745 servers now support NVIDIA Blackwell GPUs for enterprise AI deployments
  • NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs for agentic AI applications
  • AI PC ecosystem support for NVIDIA PRO Blackwell and RTX Ada GPUs

Storage Infrastructure

Dell’s storage announcements focus on AI workload optimization and deployment flexibility.

PowerScale Software Licensing and Features

Dell announced PowerScale availability as independent software licensing on qualified hardware, reducing vendor lock-in for organizations with existing storage infrastructure or those seeking hardware flexibility.

However, Dell has not yet disclosed which hardware qualifies for software-only licensing, compatibility requirements, or support implications when running PowerScale on non-Dell hardware.

Additional PowerScale updates include:

  • Parallel NFS (pNFS) support for improved concurrent access patterns in distributed AI training workloads
  • Integration with NVIDIA Inference Xfer Library (NIXL) from NVIDIA Dynamo to accelerate inference by offloading KV cache to PowerScale’s storage engine

The KV cache offloading capability addresses a common bottleneck in large language model inference where key-value cache data can quickly exhaust GPU memory. By offloading this data to high-performance storage, organizations may achieve better GPU utilization and support larger batch sizes.
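
As an illustration of the idea (not Dell’s or NVIDIA’s actual API), a toy two-tier cache sketches how completed KV blocks can be evicted from scarce GPU memory to a larger storage tier and promoted back on reuse. The class name, capacities, and eviction policy below are all hypothetical; real systems such as NVIDIA Dynamo with NIXL perform these moves as high-speed memory-to-storage transfers, not dictionary operations:

```python
# Toy illustration of KV-cache offloading: when the "GPU" tier fills,
# the least-recently-used KV block is moved to a larger "storage" tier
# and fetched back (promoted) when the inference engine needs it again.

class TieredKVCache:
    def __init__(self, gpu_capacity_blocks):
        self.gpu = {}          # block_id -> KV data (hot tier, scarce)
        self.storage = {}      # block_id -> KV data (offload tier, large)
        self.capacity = gpu_capacity_blocks
        self.lru = []          # GPU-resident block ids, coldest first

    def put(self, block_id, kv):
        if len(self.gpu) >= self.capacity:
            victim = self.lru.pop(0)             # evict the coldest block
            self.storage[victim] = self.gpu.pop(victim)
        self.gpu[block_id] = kv
        self.lru.append(block_id)

    def get(self, block_id):
        if block_id in self.gpu:                 # hit in GPU memory
            self.lru.remove(block_id)
            self.lru.append(block_id)            # mark as recently used
            return self.gpu[block_id]
        kv = self.storage.pop(block_id)          # miss: fetch from storage
        self.put(block_id, kv)                   # promote back to GPU tier
        return kv

cache = TieredKVCache(gpu_capacity_blocks=2)
for i in range(4):                               # 4 blocks, room for only 2
    cache.put(i, f"kv-block-{i}")
print(sorted(cache.storage))                     # blocks 0 and 1 were offloaded
```

The payoff is the same as in the real systems: the GPU tier only ever holds the hot working set, so larger batches fit in the same GPU memory.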

ObjectScale AI Optimization

Dell introduced AI-optimized search capabilities in ObjectScale with S3 Tables and S3 Vector support. These features target AI workloads that require efficient metadata querying and vector similarity search for RAG applications.

S3 Vector support enables organizations to store and query embedding vectors alongside object data, potentially simplifying architecture for RAG pipelines by consolidating storage tiers.
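
ObjectScale’s S3 Vector interface details have not been published, but the retrieval step it would serve in a RAG pipeline can be sketched generically: score stored embeddings against a query vector and return the closest objects. The index contents, vector values, and helper names below are invented for illustration:

```python
import math

# Minimal cosine-similarity search over stored embeddings -- the lookup a
# vector-enabled object store performs when a RAG pipeline retrieves the
# documents most relevant to a query embedding.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical object keys mapped to (tiny) embedding vectors.
index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.0],
    "doc-c": [0.7, 0.2, 0.1],
}

def top_k(query, k=2):
    ranked = sorted(index, key=lambda d: cosine(index[d], query), reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0, 0.0]))   # ['doc-a', 'doc-c']
```

Production systems replace the linear scan with approximate nearest-neighbor indexes, but the consolidation benefit is the one described above: embeddings and source objects live in one storage tier.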

Networking Infrastructure

Dell’s networking announcements center on high-bandwidth fabric capabilities and support for NVIDIA’s Spectrum-X Ethernet platform, addressing the critical challenge of keeping GPUs fed with data during distributed training operations.

High-Radix Switch Platform

Dell introduced the PowerSwitch Z9964F-ON and Z9964FL-ON switches with 102.4 Tbps switching capacity, supported by the latest Enterprise SONiC release. These switches enable what Dell describes as multi-plane two-tier network architectures scaling to over 100,000 GPUs.

Dell claims its approach reduces component requirements through:

  • Up to 67% fewer switches by eliminating traditional three-tier architectures
  • 40% fewer optical transceivers
  • Corresponding decreases in rack space, cabling complexity, power consumption, and cooling requirements
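
The scaling claim can be sanity-checked with back-of-envelope Clos arithmetic. The sketch below assumes the 102.4 Tbps capacity is exposed as 256 x 400 GbE ports and a four-plane (multi-rail) design; Dell’s actual port configurations and plane counts may differ:

```python
# Back-of-envelope sizing of a two-tier (leaf/spine) Clos fabric.
# Assumption: a 102.4 Tbps switch is configured as 256 x 400 GbE ports;
# other port speeds (e.g. 128 x 800 GbE) would change the radix.

def two_tier_endpoints(radix):
    hosts_per_leaf = radix // 2        # half the leaf ports face hosts
    max_leaves = radix                 # a spine with `radix` ports links to `radix` leaves
    return hosts_per_leaf * max_leaves # = radix**2 / 2 endpoints per plane

radix = 102_400 // 400                 # 256 ports at the assumed 400 GbE speed
per_plane = two_tier_endpoints(radix)
planes = 4                             # hypothetical 4-rail design, one NIC per plane per GPU
print(per_plane, per_plane * planes)   # 32768 131072 -> over 100,000 GPUs in two tiers
```

Under these assumptions, two tiers reach six figures of GPUs, which is exactly why a third switching tier, and the switches and optics it would require, can be dropped.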

The move to higher-radix switches and simplified network topologies is part of a broader industry-wide trend as GPU clusters scale beyond traditional datacenter network architectures.

Enterprise SONiC and Spectrum-X Integration

Dell expanded its Enterprise SONiC Distribution to support NVIDIA Spectrum-X Ethernet platforms, combining Spectrum-4 switches with BlueField-3 SuperNICs. The integration provides:

  • RoCEv2 (RDMA over Converged Ethernet) for low-latency data transfers
  • Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS) for congestion management
  • Dynamic load balancing and lossless Ethernet capabilities designed for multi-tenant GPU workloads

This move addresses a critical networking challenge for AI infrastructure, where traditional Ethernet often introduces variable latency and packet loss that can idle GPUs during distributed training. NVIDIA Spectrum-X is designed to deliver InfiniBand-like performance characteristics over Ethernet infrastructure, reducing networking costs while maintaining more deterministic performance.

Cooling and Rack Infrastructure

Dell’s cooling announcements focus on liquid cooling capabilities required for next-generation GPU densities.

PowerCool Liquid Cooling Technology

Dell’s new PowerCool RCDU provides 160 kW cooling capacity per rack with what Dell describes as “unmatched space efficiency.” The system includes:

  • Sub-millimeter leak detection capabilities through its Integrated Rack Controller
  • Advanced telemetry for monitoring coolant flow, temperature differentials, and system health
  • Integration with OpenManage Enterprise for unified infrastructure management

Liquid cooling enables higher facility water temperatures and reduces energy consumption compared to air cooling, potentially lowering operational costs. Enterprises, however, should evaluate total cost of ownership, including:

  • Initial capital expenditure for coolant distribution units, plumbing infrastructure, and facility modifications
  • Ongoing maintenance requirements for coolant systems including fluid replacement and leak monitoring
  • Operational expertise needed for liquid cooling management and troubleshooting
  • Risk mitigation strategies for coolant system failures that could impact multiple racks
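
A simple payback calculation shows the kind of TCO math this evaluation involves. Every figure below (IT load, PUE values, energy price, capital cost delta) is an illustrative assumption, not Dell or industry data:

```python
# Hypothetical payback math for liquid vs. air cooling on one GPU rack.
# All inputs are assumptions chosen only to illustrate the calculation.

RACK_IT_LOAD_KW = 120     # assumed IT load of a dense GPU rack
HOURS_PER_YEAR = 8760
POWER_COST = 0.10         # $/kWh, assumed

def annual_cooling_cost(pue):
    overhead_kw = RACK_IT_LOAD_KW * (pue - 1.0)   # non-IT (mostly cooling) power
    return overhead_kw * HOURS_PER_YEAR * POWER_COST

air = annual_cooling_cost(pue=1.5)      # assumed air-cooled facility
liquid = annual_cooling_cost(pue=1.15)  # assumed direct-to-chip facility
capex_delta = 150_000                   # assumed extra CDU/plumbing capital cost

savings_per_year = air - liquid
payback_years = capex_delta / savings_per_year
print(round(air), round(liquid), round(payback_years, 1))  # 52560 15768 4.1
```

With these invented inputs the liquid-cooling premium pays back in roughly four years; the point is that the answer swings heavily on facility PUE, energy price, and retrofit cost, which is why site-specific evaluation matters.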

Analysis

Dell’s SC25 announcements represent a comprehensive portfolio expansion across compute, storage, networking, and cooling infrastructure for AI and HPC workloads. The rapid evolution of its AI Factory portfolio shows Dell’s investment in addressing enterprise AI infrastructure requirements from multiple angles, including rack-scale systems, multi-vendor GPU support, advanced cooling, and simplified automation.

The Dell AI Factory announcements ultimately provide enterprises with expanded options for deploying GPU-accelerated infrastructure at scale, addressing genuine technical challenges around cooling, networking, and operational complexity. For enterprises with substantial AI infrastructure requirements and budgets to match, Dell’s comprehensive portfolio offers a viable path to deployment.

Competitive Outlook & Advice to IT Buyers

These sections are only available to NAND Research clients and IT Advisory Members. Please reach out to [email protected] to learn more.

Disclosure: The author is an industry analyst, and NAND Research is an industry analyst firm, that engages in, or has engaged in, research, analysis, and advisory services with many technology companies, which may include those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.