Revolutionizing Compute: Intel Arc, AMD, and Nvidia GPUs Team Up for Fluid Simulation
7/02/2025
In a groundbreaking demonstration of cross-vendor GPU compatibility and sheer computational might, Dr. Moritz Lehmann, known as ProjectPhysX on Reddit, successfully harnessed the combined VRAM of an Intel Arc B580, an AMD RX 7700 XT, and an Nvidia Titan Xp. This extraordinary setup powered a large-scale fluid dynamics simulation, showcasing the immense potential of heterogeneous GPGPU (General-Purpose computing on Graphics Processing Units) environments and offering a glimpse into a future of hardware flexibility.
Key Takeaway: A recent experiment led by FluidX3D developer Dr. Moritz Lehmann successfully pooled 36GB of VRAM from Intel Arc, AMD, and Nvidia GPUs to execute a complex fluid simulation. Leveraging OpenCL, this setup bypassed traditional vendor lock-ins, demonstrating that mixed-brand GPUs can effectively combine their resources for demanding computational tasks, opening new avenues for efficiency and hardware utilization in GPGPU applications.
The Heterogeneous Multi-GPU Experiment Unveiled
While the term "SLI" (Scalable Link Interface) is typically associated with Nvidia's proprietary multi-GPU technology for gaming, ProjectPhysX's configuration represents a true heterogeneous GPGPU setup. Here, multiple graphics cards from different manufacturers collaborate in parallel on a single, compute-intensive task. The specific cards involved were:
- Intel Arc B580: A modern GPU from Intel's nascent discrete graphics lineup.
- AMD RX 7700 XT: A current-generation card from AMD.
- Nvidia Titan Xp: A high-performance, though older-generation, card from Nvidia.
Crucially, each of these GPUs contributed its 12GB of VRAM, cumulatively offering a substantial 36GB of pooled memory accessible for the demanding computational workload.
FluidX3D: The Enabler of Cross-Vendor Harmony
The success of this complex, multi-vendor setup was primarily attributed to the FluidX3D CFD (Computational Fluid Dynamics) software, developed from scratch by Dr. Moritz Lehmann himself. FluidX3D is specifically designed to leverage the power of OpenCL (Open Computing Language), an open, royalty-free standard for parallel programming across heterogeneous platforms.
- OpenCL's Role: OpenCL's vendor-agnostic nature is key. It allows the FluidX3D software to communicate with and utilize the processing units of various GPUs, regardless of their manufacturer. The communication between GPUs in this setup occurs efficiently over the standard PCIe interface, rather than requiring proprietary interlinks like NVLink or CrossFire. Data is transferred from each GPU's VRAM to the CPU's RAM, where pointers are swapped, and then data is copied back to the respective GPUs' VRAM, allowing seamless inter-domain communication.
- Optimized Workload Distribution: For the simulation of a crow in flight, which involved an astounding 680 million grid cells, the workload was intelligently partitioned. The simulation box was split into three distinct domains, with each 12GB GPU handling one segment. This optimized distribution is crucial for balancing the load and maximizing the efficiency of the combined hardware.
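The partitioning and host-mediated communication described above can be sketched schematically. This is not FluidX3D's actual code, just a minimal plain-Python illustration of the idea: the grid is split as evenly as possible across domains, and each domain's boundary layer makes a round trip through host RAM to its neighbor's halo (the 1-cell halo width and list-based "domains" are assumptions for illustration).

```python
# Schematic sketch (not FluidX3D's implementation) of splitting a simulation
# box into per-GPU domains and exchanging boundary layers through host RAM.

def partition(n_cells, n_domains):
    """Split n_cells as evenly as possible across n_domains."""
    base, rem = divmod(n_cells, n_domains)
    return [base + (1 if i < rem else 0) for i in range(n_domains)]

def halo_exchange(domains):
    """Copy each domain's boundary cell to a host buffer, then copy it
    into the neighboring domain's halo cell; mimics the PCIe round trip
    (device -> host RAM -> device) described in the text.
    Each domain is [left halo, interior..., right halo]."""
    for left, right in zip(domains, domains[1:]):
        host_buf_a = left[-2]   # device -> host: left domain's last interior cell
        host_buf_b = right[1]   # device -> host: right domain's first interior cell
        right[0] = host_buf_a   # host -> device: fill right domain's left halo
        left[-1] = host_buf_b   # host -> device: fill left domain's right halo
    return domains

# The 680-million-cell crow grid split across the three 12GB cards:
print(partition(680_000_000, 3))  # -> [226666667, 226666667, 226666666]
```

With three roughly equal domains, each GPU holds about 227 million cells, which is what keeps the load balanced across the otherwise mismatched cards.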
Demonstrated Performance and Broader Implications
The crow simulation, consisting of 45,705 discrete time steps, was completed in a mere 2 hours and 11 minutes. This impressive result underscores the practical viability and efficiency of heterogeneous multi-GPU computing for specialized tasks.
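The figures above imply a concrete throughput. A back-of-the-envelope calculation, assuming every time step updates every one of the 680 million cells (the standard way lattice-Boltzmann throughput is counted, in lattice updates per second):

```python
# Throughput implied by the reported run:
# 680 million cells, 45,705 time steps, 2 h 11 min wall time.

cells = 680_000_000
steps = 45_705
seconds = 2 * 3600 + 11 * 60  # 7860 s

updates_per_second = cells * steps / seconds
print(f"{updates_per_second / 1e9:.2f} GLUPs")  # ~3.95 billion lattice updates/s
```

Roughly 4 billion cell updates per second across the three mixed-vendor cards.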
This experiment isn't an isolated case. Dr. Lehmann has previously showcased even more extreme setups, including one featuring 8 GPUs from three vendors pooling an astonishing 132GB of VRAM to tackle a 2.5 billion grid cell simulation of a Cessna-172 flight. This further solidifies the scalability and broad applicability of FluidX3D's approach.
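The 8-GPU figures also let us estimate the memory footprint per grid cell. Whether the reported "132GB" means decimal gigabytes or binary gibibytes is an assumption; both readings are shown:

```python
# Implied VRAM per cell in the 8-GPU run: 132GB pooled over 2.5 billion cells.

cells = 2_500_000_000
vram_decimal = 132e9           # 132 GB as 10^9 bytes
vram_binary = 132 * 1024**3    # 132 GiB as 2^30 bytes

print(vram_decimal / cells)    # 52.8 bytes per cell
print(vram_binary / cells)     # ~56.7 bytes per cell
```

Either way, the simulation fits in roughly 50-60 bytes per cell, an indication of how compactly FluidX3D stores its lattice state.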
- Not for Gaming (Yet): It's crucial to note that while highly effective for GPGPU compute, such heterogeneous setups are generally not beneficial for video gaming. Gaming workloads are highly latency-sensitive, and managing different GPU driver stacks and architectures for real-time frame rendering presents significant challenges that OpenCL or similar APIs currently don't seamlessly address for this use case.
- Ideal for Compute-Intensive Tasks: The true power of this approach lies in compute-intensive applications such as computational fluid dynamics, large-scale video processing, offline rendering, molecular dynamics (e.g., protein folding), and deep learning. For these workloads, pooling VRAM and processing power from multiple GPUs, regardless of vendor, offers significant advantages.
- Energy Efficiency: FluidX3D itself has been highlighted for its remarkable energy efficiency, reportedly consuming between 0.00001% and 0.1% of the energy of some commercial CFD solvers.
- Hardware Flexibility and Cost-Effectiveness: This type of heterogeneous computing frees users from vendor lock-in, allowing them to mix and match GPUs based on availability, price-to-performance, or specific VRAM requirements. This democratizes high-performance computing, enabling researchers and professionals to leverage diverse hardware to maximize their computational resources.
ProjectPhysX's continued work with FluidX3D, which is open-source and available on GitHub, is a testament to the evolving landscape of GPGPU. It demonstrates that with the right software layers, the sum of disparate hardware parts can indeed be greater than their individual capabilities, pushing the boundaries of what's possible in parallel computing.