Intel Arc Battlemage, Xe2 GPUs, and Changing Hyper-Threading
6/15/2024
Introduction
On June 3, 2024, Intel shared exciting details about its upcoming GPU and CPU microarchitectures at Computex 2024. The spotlight was on the Battlemage GPU architecture and the Lunar Lake CPU lineup. Let's dive into the key highlights and architectural changes.
Battlemage GPU Architecture
SIMD 16 and Improved Efficiency
Intel's Battlemage GPU architecture introduces a significant change: its execution hardware is natively SIMD16 (Single Instruction, Multiple Data, 16 lanes wide) instead of the previous SIMD8. Intel says the wider design improves efficiency and compatibility, so games are expected to run smoothly "right out of the box."
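To make the SIMD width change concrete, here is a toy sketch in Python. It only counts how many SIMD instructions are needed to cover a batch of work items at a given width, and what fraction of the issued lanes do useful work. The function names and the batch size of 40 are made up for illustration, and real GPU scheduling is far more involved than this.

```python
import math

def instructions_needed(work_items: int, simd_width: int) -> int:
    """SIMD instructions required to touch every work item once."""
    return math.ceil(work_items / simd_width)

def lane_utilization(work_items: int, simd_width: int) -> float:
    """Fraction of issued lanes that carry real work (1.0 means no idle lanes)."""
    issued_lanes = instructions_needed(work_items, simd_width) * simd_width
    return work_items / issued_lanes

if __name__ == "__main__":
    batch = 40  # hypothetical number of pixels or threads in a batch
    for width in (8, 16):
        print(
            f"SIMD{width}: {instructions_needed(batch, width)} instructions, "
            f"{lane_utilization(batch, width):.0%} lane utilization"
        )
```

The wider unit issues fewer instructions for the same batch, but it can leave lanes idle when the batch size is not a multiple of 16, which is one reason the utilization work described below matters.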
Xe2 Render Slice
Intel also unveiled the Xe2 Render Slice, which plays a crucial role in Battlemage. This new slice design promises improved performance and sets the stage for Intel's next desktop GPUs.
Utilization and Waste Reduction
Intel focused on enhancing utilization and reducing wasted processing. Changes to Hi-Z (the hierarchical depth buffer) and culling help the GPU discard hidden geometry earlier, before it consumes shading resources. The pixel back-end now boasts 2x shading throughput, benefiting scenes with transparencies.
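For readers unfamiliar with Hi-Z, the general idea can be sketched in a few lines of Python. This is a generic illustration of hierarchical depth culling, not Intel's implementation: the tile size, the smaller-is-closer depth convention, and all function names are assumptions chosen for the example.

```python
from typing import List

TILE = 4  # hypothetical tile edge length in pixels

def build_hiz(depth: List[List[float]], tile: int = TILE) -> List[List[float]]:
    """Build a coarse buffer holding the farthest (max) depth of each tile."""
    rows, cols = len(depth), len(depth[0])
    return [
        [
            max(
                depth[y][x]
                for y in range(ty, min(ty + tile, rows))
                for x in range(tx, min(tx + tile, cols))
            )
            for tx in range(0, cols, tile)
        ]
        for ty in range(0, rows, tile)
    ]

def tile_is_occluded(hiz: List[List[float]], tile_y: int, tile_x: int,
                     prim_min_depth: float) -> bool:
    """A primitive is hidden in a tile if even its nearest point lies
    behind the farthest depth already stored for that tile."""
    return prim_min_depth > hiz[tile_y][tile_x]

if __name__ == "__main__":
    # 8x8 depth buffer: a near wall (0.3) on the left, empty space (1.0) on the right.
    depth = [[0.3] * 4 + [1.0] * 4 for _ in range(8)]
    hiz = build_hiz(depth)
    print(tile_is_occluded(hiz, 0, 0, 0.5))  # behind the wall
    print(tile_is_occluded(hiz, 0, 1, 0.5))  # in front of empty space
```

One comparison against the coarse buffer can reject a whole tile of pixels, which is where the saved work comes from.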
Execute Indirect Support
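Native Execute Indirect support is a major addition. The feature lets the GPU read draw and dispatch arguments from a buffer instead of waiting for the CPU to issue each call, and it is essential for next-generation games, especially those built on engines like Unreal Engine 5. Intel's commitment to day-one support aims to address usability shortcomings seen in previous Arc GPUs.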
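As a rough picture of what indirect execution means, the Python sketch below packs draw arguments into a contiguous byte buffer and then reads them back, standing in for the GPU front end. The four unsigned 32-bit fields mirror the layout of Direct3D 12's D3D12_DRAW_ARGUMENTS structure. Everything else (the function names, the sample draws, using Python for both producer and consumer) is a simplification for illustration; in a real engine the buffer typically lives in GPU memory and is often written by a compute shader.

```python
import struct

# Fields, in order: VertexCountPerInstance, InstanceCount,
# StartVertexLocation, StartInstanceLocation (each an unsigned 32-bit int).
DRAW_ARGS = struct.Struct("<4I")

def write_draws(draws):
    """Pack a list of draw-argument tuples into one argument buffer."""
    return b"".join(DRAW_ARGS.pack(*draw) for draw in draws)

def consume_draws(buffer, count):
    """Stand-in for the GPU: read `count` draws straight from the buffer."""
    return [
        DRAW_ARGS.unpack_from(buffer, index * DRAW_ARGS.size)
        for index in range(count)
    ]

if __name__ == "__main__":
    # Two hypothetical draws: a cube once, and a quad instanced 100 times.
    argument_buffer = write_draws([(36, 1, 0, 0), (6, 100, 36, 0)])
    for args in consume_draws(argument_buffer, 2):
        print(args)
```

The point of the pattern is that the CPU submits one command referencing the buffer, rather than one call per draw.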
Compression Algorithm and Ray Tracing
Intel supports an 8:N compression algorithm on the pixel back-end, improving performance. In ray tracing, each Xe2 RT unit now features 3 traversal pipelines and can perform 18 box intersections and 2 triangle intersections.
Lunar Lake P-cores ("Lion Cove")
No More Hyper-Threading
Intel's P-cores for Lunar Lake mobile CPUs undergo a microarchitectural overhaul. The most significant change? Hyper-Threading is gone. While Hyper-Threading boosts multi-threaded throughput (IPC) by roughly 30%, it also requires duplicating architectural elements to track a second thread. The new P-cores deliver better performance per watt and per unit of die area without Hyper-Threading.
Fine-Grain Clock Control
Intel introduces finer clock control intervals: 16.67 MHz steps, down from 100 MHz. This lets the core run closer to its power or thermal limit instead of dropping to the next coarse step, which matters most in power-constrained scenarios like laptops.
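A quick worked example shows why the step size matters. The Python sketch below picks the highest clock at or below a limit for a given step size. The 3080 MHz limit is a made-up number, the 100 MHz coarse step is an assumption about earlier designs, and exact fractions are used so that 16.67 MHz (100/6 MHz) does not pick up rounding error.

```python
from fractions import Fraction

COARSE_STEP_MHZ = Fraction(100)     # assumed granularity of earlier designs
FINE_STEP_MHZ = Fraction(100, 6)    # about 16.67 MHz

def highest_allowed_clock(limit_mhz, step_mhz):
    """Largest multiple of `step_mhz` that does not exceed `limit_mhz`."""
    steps = Fraction(limit_mhz) // step_mhz  # whole number of steps that fit
    return steps * step_mhz

if __name__ == "__main__":
    limit = 3080  # hypothetical frequency the power budget would allow, in MHz
    for label, step in (("coarse", COARSE_STEP_MHZ), ("fine", FINE_STEP_MHZ)):
        clock = highest_allowed_clock(limit, step)
        print(f"{label}: {float(clock):.2f} MHz")
```

With these example numbers the coarse step settles at 3000 MHz while the fine step reaches roughly 3066.67 MHz, recovering most of the headroom the coarse step leaves unused.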
Memory Subsystem and Cache
Updates to the memory subsystem include a new intermediate L1 cache, which gives each P-core a 3-level cache hierarchy. Total per-core cache capacity sees slight improvements.
Lunar Lake E-cores ("Skymont")
4 Cores per Cluster
Intel's E-cores for Lunar Lake increase from 2 to 4 cores per cluster. These E-cores aim to compensate for the removal of Hyper-Threading on P-cores.
Improved Performance
Single-threaded performance shows generational gains of 38% in integer workloads and 68% in floating-point workloads. Multi-threaded performance benefits from both faster cores and the increased core count.
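To see how the two effects compound, here is a back-of-the-envelope calculation in Python. It assumes perfectly linear scaling with core count, equal clocks, and no power or memory bottlenecks, so the result is an idealized upper bound derived from the figures above, not a number Intel has published. The function name is made up for this example.

```python
def ideal_cluster_speedup(per_core_gain_pct: float,
                          old_cores: int,
                          new_cores: int) -> float:
    """Idealized cluster-level speedup: per-core gain times core-count ratio.

    Assumes perfect multi-threaded scaling, which real workloads rarely reach.
    """
    per_core_factor = 1 + per_core_gain_pct / 100
    return per_core_factor * new_cores / old_cores

if __name__ == "__main__":
    for label, gain in (("integer", 38), ("floating point", 68)):
        speedup = ideal_cluster_speedup(gain, old_cores=2, new_cores=4)
        print(f"{label}: up to {speedup:.2f}x per cluster (idealized)")
```

Real-world gains will land below this ceiling, since power limits and shared resources keep multi-threaded scaling from being perfectly linear.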
Desktop Performance Comparison
Intel compares the new Skymont E-cores against the outgoing Raptor Cove P-cores. Although the comparison isn't entirely even due to scaling differences, it provides interesting insights.
Conclusion
As Intel continues to innovate, we eagerly await the arrival of these architectural changes in both mobile and desktop platforms. Stay tuned for thorough benchmarks once the updated P-cores hit the desktop scene!