Intel Arc Battlemage, Xe2 GPUs, and Changing Hyper-Threading

Introduction

On June 3, 2024, at Computex 2024, Intel shared details about its upcoming GPU and CPU microarchitectures. The spotlight was on the Battlemage GPU architecture and the Lunar Lake CPU lineup. Here are the key highlights and architectural changes.

Battlemage GPU Architecture

SIMD 16 and Improved Efficiency

Intel's Battlemage GPU architecture introduces a significant change: its execution hardware is natively SIMD16 (Single Instruction, Multiple Data, 16 lanes wide) instead of the previous SIMD8. Intel says the wider design brings efficiency benefits and better compatibility, so games are expected to run smoothly "right out of the box."
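To make the SIMD8 versus SIMD16 difference concrete, here is a minimal sketch of a toy model that counts how many instruction issues are needed to cover a batch of work items at each width. The function names are hypothetical, and the model ignores scheduling, latency hiding, and everything else a real GPU does; it only illustrates the lane arithmetic.

```python
import math

def issue_count(work_items: int, simd_width: int) -> int:
    """Instruction issues needed to run one instruction over all work items."""
    return math.ceil(work_items / simd_width)

def lane_utilization(work_items: int, simd_width: int) -> float:
    """Fraction of issued lanes that carry real work (1.0 means no idle lanes)."""
    issued_lanes = issue_count(work_items, simd_width) * simd_width
    return work_items / issued_lanes

# Toy comparison for a few batch sizes (illustrative numbers, not Intel data).
for items in (64, 24):
    for width in (8, 16):
        print(items, width, issue_count(items, width), lane_utilization(items, width))
```

In this toy model a 64-item batch needs half as many issues at width 16, while a 24-item batch leaves a quarter of the lanes idle. Wider SIMD pays off when there is enough parallel work to fill it.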

Xe2 Render Slice

Intel also unveiled the Xe2 Render Slice, a core building block of Battlemage. Intel says the new slice design improves performance and sets the stage for its next desktop GPUs.

Utilization and Waste Reduction

Intel focused on raising utilization and reducing wasted processing. Changes to Hi-Z (the hierarchical depth buffer) and culling help the GPU discard hidden work earlier. The pixel back-end now has 2x shading throughput, which benefits scenes with transparencies.
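The general idea behind hierarchical-Z culling can be sketched in a few lines. This is a generic illustration of the technique, not Intel's hardware implementation: the function names, the tile size, and the depth convention (smaller values are closer to the camera) are all assumptions made for the example.

```python
def build_hiz(depth, tile):
    """Build a coarse buffer holding the farthest (max) depth in each tile."""
    rows, cols = len(depth), len(depth[0])
    return [
        [
            max(
                depth[y][x]
                for y in range(ty, min(ty + tile, rows))
                for x in range(tx, min(tx + tile, cols))
            )
            for tx in range(0, cols, tile)
        ]
        for ty in range(0, rows, tile)
    ]

def tile_is_occluded(hiz, tile_y, tile_x, prim_min_depth):
    """True when the primitive's nearest point lies behind everything in the tile."""
    return prim_min_depth > hiz[tile_y][tile_x]

# Hypothetical 4x4 depth buffer, reduced to 2x2 tiles.
depth = [
    [0.20, 0.30, 0.90, 0.80],
    [0.10, 0.25, 0.70, 1.00],
    [0.50, 0.50, 0.40, 0.40],
    [0.50, 0.60, 0.40, 0.30],
]
hiz = build_hiz(depth, tile=2)
print(tile_is_occluded(hiz, 0, 0, 0.35))  # primitive hidden in the top-left tile
print(tile_is_occluded(hiz, 0, 1, 0.35))  # primitive may be visible in the top-right tile
```

One comparison against the coarse buffer can reject a primitive for a whole tile, so no per-pixel shading work is spent on geometry that could never be seen.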

Execute Indirect Support

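Xe2 adds native support for execute indirect, where the GPU reads draw and dispatch arguments from a buffer instead of receiving each call from the CPU. This matters for next-generation games, especially those built on engines like Unreal Engine 5. Intel's commitment to day-one support aims to address usability shortcomings seen in previous Arc GPUs.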

Compression Algorithm and Ray Tracing

The pixel back-end supports an 8:N compression algorithm, which improves performance. For ray tracing, each Xe2 RT unit now features 3 traversal pipelines and handles 18 box intersections and 2 triangle intersections.
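For readers unfamiliar with what a box intersection is, the sketch below shows the standard "slab" test for a ray against an axis-aligned bounding box, the kind of check a BVH traversal performs many times per ray. It is a software illustration only; the function names are hypothetical and it says nothing about how the Xe2 hardware implements the test.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does a ray (t >= 0) intersect an axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)  # latest entry across axes
        t_far = min(t_far, t1)    # earliest exit across axes
        if t_near > t_far:
            return False
    return True

def count_hits(origin, direction, boxes):
    """Test one ray against a batch of (box_min, box_max) pairs."""
    return sum(ray_hits_box(origin, direction, lo, hi) for lo, hi in boxes)

# Hypothetical batch: one box ahead of the ray, one off to the side, one behind.
boxes = [
    ((1, -1, -1), (2, 1, 1)),
    ((1, 2, 2), (2, 3, 3)),
    ((-3, -1, -1), (-2, 1, 1)),
]
print(count_hits((0, 0, 0), (1, 0, 0), boxes))
```

If the 18 box tests are split evenly across the 3 traversal pipelines, that works out to 6 per pipeline, though the even split is an assumption rather than something stated above.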

Lunar Lake P-cores ("Lion Cove")

No More Hyper-Threading

Intel's P-cores for Lunar Lake mobile CPUs undergo a microarchitectural overhaul. The most significant change? Hyper-Threading is gone. While Hyper-Threading boosts multi-threaded throughput (IPC) by roughly 30%, it also requires duplicating architectural elements for the second thread. Intel says the new P-cores deliver better performance per watt and per area without it.

Fine-Grain Clock Control

Intel introduces finer clock control, with frequency adjustable in 16.67 MHz steps. This lets a core run closer to its power or thermal limit, which matters most in power-constrained systems like laptops.
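The benefit of a smaller step is easy to see with a little arithmetic. The sketch below finds the highest frequency bin at or below a limit for two step sizes. The 16.67 MHz step comes from the text above; the 100 MHz comparison step and the 3085 MHz limit are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

FINE_STEP_MHZ = 100 / 6    # about 16.67 MHz, the step size quoted above
COARSE_STEP_MHZ = 100.0    # assumed coarser granularity, for comparison only

def highest_bin_at_or_below(limit_mhz: float, step_mhz: float) -> float:
    """Largest multiple of step_mhz that does not exceed limit_mhz."""
    # The small epsilon guards against floating-point error at exact multiples.
    return math.floor(limit_mhz / step_mhz + 1e-9) * step_mhz

limit = 3085.0  # hypothetical frequency the power budget would allow
for step in (COARSE_STEP_MHZ, FINE_STEP_MHZ):
    chosen = highest_bin_at_or_below(limit, step)
    print(f"step {step:6.2f} MHz -> run at {chosen:7.2f} MHz, leaving {limit - chosen:5.2f} MHz unused")
```

With these illustrative numbers the coarse step leaves 85 MHz of headroom unused, while the fine step leaves under 2 MHz.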

Memory Subsystem and Cache

The memory subsystem gains an intermediate L1 cache, giving each core a 3-level cache hierarchy. Total per-core cache capacity increases slightly.

Lunar Lake E-cores ("Skymont")

4 Cores per Cluster

Intel's E-cores for Lunar Lake increase from 2 to 4 cores per cluster. The extra E-cores help compensate for the multi-threaded throughput lost by removing Hyper-Threading from the P-cores.

Improved Performance

Intel claims single-threaded gains of 38% in integer workloads and 68% in floating point. Multi-threaded performance benefits from both the faster cores and the increased core count.
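As a rough illustration of how the two effects compound, the sketch below multiplies the quoted per-core gain by the core-count ratio of a cluster growing from 2 to 4 cores. This is an idealized upper bound that assumes perfect scaling across cores, which real workloads rarely achieve; the function name is hypothetical and the results are not Intel's published multi-threaded figures.

```python
def ideal_cluster_speedup(per_core_gain: float, old_cores: int, new_cores: int) -> float:
    """Upper bound: per-core speedup times core-count ratio, assuming perfect scaling."""
    return (1.0 + per_core_gain) * (new_cores / old_cores)

# Per-core gains quoted above; cluster size going from 2 to 4 cores.
int_bound = ideal_cluster_speedup(0.38, old_cores=2, new_cores=4)
fp_bound = ideal_cluster_speedup(0.68, old_cores=2, new_cores=4)
print(f"integer upper bound: {int_bound:.2f}x")
print(f"floating-point upper bound: {fp_bound:.2f}x")
```

Memory bandwidth, shared cache, and power limits all pull real results below this bound.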

Desktop Performance Comparison

Intel compares the new Skymont E-cores against the outgoing Raptor Cove P-cores. The comparison isn't entirely even because of scaling differences, so it is best read as a rough indication of where the E-cores now stand.

Conclusion

We await the arrival of these architectural changes on both mobile and desktop platforms. Stay tuned for thorough benchmarks once the updated P-cores hit the desktop scene!