DEV Community

Hamza

Posted on • Originally published at tekmag.thsite.top

AMD and Intel Join Forces on ACE: 16x AI Boost for x86 CPUs

For the first time in decades, AMD and Intel are speaking the same instruction-set language. The two rival chipmakers, working through the x86 Ecosystem Advisory Group (EAG), have unveiled the AI Compute Extensions (ACE), a unified set of matrix instructions that promises up to 16x the AI compute density of today's AVX10 instructions.

ACE is the first major output of the EAG, an industry body the two companies co-founded in 2024 to combat ARM's growing momentum and eliminate the instruction-set fragmentation that has plagued x86 for years. The result is a standardized matrix-acceleration architecture that promises to run identical AI code on any x86 CPU — whether it comes from Intel or AMD — without recompilation.

What ACE Brings to the Table

At its core, ACE introduces eight new 2D tile registers, each capable of holding a 16x16 matrix of 32-bit values. These tile registers sit alongside the existing AVX10 vector registers, but where a vector instruction works on a one-dimensional row of values, an ACE instruction operates on a full 2D tile.
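To put those register sizes in perspective, here is a quick back-of-the-envelope calculation in Python. The tile count, tile shape, and element width come from the paragraph above; the 512-bit vector width used for comparison is an assumption added here, not a figure from the article.

```python
# Tile storage arithmetic using the figures quoted in the article.
NUM_TILES = 8        # eight 2D tile registers
TILE_ROWS = 16
TILE_COLS = 16
ELEMENT_BITS = 32    # 32-bit values

# Assumption (not from the article): a 512-bit vector register for comparison.
VECTOR_BITS = 512

elements_per_tile = TILE_ROWS * TILE_COLS                # values held by one tile
bytes_per_tile = elements_per_tile * ELEMENT_BITS // 8   # storage for one tile
total_tile_bytes = NUM_TILES * bytes_per_tile            # storage for all tiles
vectors_per_tile = bytes_per_tile * 8 // VECTOR_BITS     # 512-bit vectors per tile

print(f"elements per tile: {elements_per_tile}")
print(f"bytes per tile:    {bytes_per_tile}")
print(f"total tile bytes:  {total_tile_bytes}")
print(f"vectors per tile:  {vectors_per_tile}")
```

Under these assumptions, one tile holds as much data as 16 full-width vector registers, which gives a rough sense of why a tile-based instruction can do far more work per operation.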

The key innovation is an outer-product multiplication scheme. Each clock cycle, ACE takes two 16x4 input matrices and feeds them to a 16x16 grid of processing elements, where each element computes one 4-element inner product. That works out to 16 x 16 x 4 = 1,024 multiplications per cycle, compared with just 64 for an unoptimized AVX10 implementation running at the same precision. The 16x density figure is simply that ratio: 1,024 / 64 = 16.
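The arithmetic is easier to see in a small software model. The sketch below is plain Python, not ACE code: the function and variable names are hypothetical, the accumulate-into-a-tile behavior is an assumption, and only the shapes (16x4 inputs, 16x16 result) and the 64-multiplication AVX10 figure are taken from the article. It builds the result as a sum of four rank-1 outer products and checks that this matches the per-element inner-product view.

```python
# Illustrative model only. Shapes come from the article; names are hypothetical.
ROWS, COLS, DEPTH = 16, 16, 4

def tile_step(acc, a, b):
    """Accumulate a (16x4) times b-transposed (4x16) into acc (16x16).

    The work is organised as DEPTH rank-1 outer products.
    Returns the number of scalar multiplications performed.
    """
    mults = 0
    for k in range(DEPTH):            # one outer product per column k
        for i in range(ROWS):
            for j in range(COLS):
                acc[i][j] += a[i][k] * b[j][k]
                mults += 1
    return mults

def inner_product_reference(a, b):
    """Same result, computed one 4-element inner product per output entry."""
    return [[sum(a[i][k] * b[j][k] for k in range(DEPTH))
             for j in range(COLS)] for i in range(ROWS)]

# Two arbitrary 16x4 inputs.
a = [[i + k for k in range(DEPTH)] for i in range(ROWS)]
b = [[j * k + 1 for k in range(DEPTH)] for j in range(COLS)]

acc = [[0] * COLS for _ in range(ROWS)]
mults = tile_step(acc, a, b)

AVX10_MULTS_PER_CYCLE = 64            # figure quoted in the article
density_gain = mults // AVX10_MULTS_PER_CYCLE

print(mults, density_gain, acc == inner_product_reference(a, b))
```

The outer-product and inner-product descriptions are two ways of ordering the same 1,024 multiplications, which is why the article can use both terms for one operation.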

According to a technical whitepaper published by the x86 Ecosystem Advisory Group, ACE integrates seamlessly with AVX10, providing what it calls a “low-friction and ubiquitous matrix acceleration capability.” Software can use existing AVX10 vector instructions to pre-process and format data, then hand it off to ACE for the heavy matrix lifting.
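As a rough picture of that hand-off, the following Python sketch splits a larger multiplication into 16x4 panels (standing in for the vector pre-processing stage) and then accumulates one panel pair at a time into a 16x16 result (standing in for the matrix stage). Every name here is hypothetical, and the division of labor is an assumption for illustration; the whitepaper's actual instruction sequence is not described in this article.

```python
# Illustrative two-stage pipeline. All names are hypothetical.
TILE, DEPTH = 16, 4

def pack_panels(m, depth=DEPTH):
    """Stand-in for vector pre-processing: split a 16xK matrix into 16x4 panels."""
    k_total = len(m[0])
    assert k_total % depth == 0, "K must be a multiple of the panel depth"
    return [[row[k0:k0 + depth] for row in m]
            for k0 in range(0, k_total, depth)]

def tile_accumulate(acc, a_panel, b_panel):
    """Stand-in for the matrix stage: add one panel product into the 16x16 tile."""
    for i in range(TILE):
        for j in range(TILE):
            acc[i][j] += sum(x * y for x, y in zip(a_panel[i], b_panel[j]))

def matmul_16xk(a, b_t):
    """Compute A @ B for a 16xK matrix A, given B transposed as a 16xK matrix."""
    acc = [[0] * TILE for _ in range(TILE)]
    for a_panel, b_panel in zip(pack_panels(a), pack_panels(b_t)):
        tile_accumulate(acc, a_panel, b_panel)
    return acc

K = 32
a = [[(i * 3 + k) % 7 for k in range(K)] for i in range(TILE)]
b_t = [[(j + 2 * k) % 5 for k in range(K)] for j in range(TILE)]

result = matmul_16xk(a, b_t)
steps = K // DEPTH   # number of panel pairs fed to the matrix stage

print(steps, result[2][3])
```

With K = 32, the multiplication decomposes into eight panel steps, each of which corresponds to one 16x4 by 16x4 operation of the kind described earlier.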

Why This Matters

Matrix multiplication is the mathematical backbone of modern AI — every neural network, from small on-device models to massive language models, spends the majority of its compute cycles on matrix operations. Today, CPUs rely on general-purpose SIMD instructions like AVX10 for these workloads, which leaves massive performance on the table compared to GPU tensor cores or dedicated NPUs.

ACE doesn’t aim to replace GPUs. As analyst Jim McGregor of TIRIAS Research told Network World, “The CPU will never be more efficient than the GPU/AI accelerator.” What ACE does is allow CPUs to handle AI workloads efficiently in scenarios where GPUs aren’t practical — embedded systems, edge computing, thin-and-light laptops, or real-time inference tasks where GPU activation overhead would be wasteful.

For data centers, the energy efficiency gains could be significant. Many inference workloads currently run on CPUs because the latency cost of moving data to and from a GPU isn’t justified. ACE would make that CPU-based inference substantially more power-efficient.

The x86 EAG’s First Major Win

The x86 Ecosystem Advisory Group was formed in 2024 with a clear mandate: prevent ARM from eating x86’s lunch by ensuring Intel and AMD platforms remain compatible and competitive. Before the EAG, developers targeting x86 sometimes had to ship separate code paths for Intel and AMD CPUs — a fragmentation that ARM’s unified architecture never suffered from.

ACE builds on earlier joint work to standardize APX (Advanced Performance Extensions). Together, these initiatives represent the most significant cooperation between the two x86 giants since the original x86-64 specification was developed in the early 2000s.

“I’m pleased to see the partnership between the two companies finally paying off,” McGregor added. “As expected, changes to the instruction set can take a generation or two to filter through the product lines of both companies. However, working together is a huge advantage for the x86 architecture.”

When Will ACE Arrive?

No CPUs with native ACE support have been announced yet. The specification is complete and the whitepaper has been published, but hardware implementation typically lags instruction-set definition by two to three years. Industry speculation points to AMD’s Zen 7 architecture (expected around 2028) and to Intel’s Nova Lake or a later generation as likely candidates.

Software enablement is already underway. The x86 EAG has confirmed that work is in progress to add ACE support to major scientific computing libraries like NumPy and SciPy, as well as AI frameworks PyTorch and TensorFlow. This means the software stack should be ready by the time the first ACE-enabled hardware ships.

What Developers Are Saying

The move has been widely welcomed by the developer community. On Hacker News, where the specification was trending on June 18, developers praised the unified approach, noting that standardized matrix instructions could reduce the need for platform-specific optimizations in scientific computing and ML workloads.

“ACE offers a significant increase in matrix multiply performance, scalability, and energy efficiency,” the whitepaper states, framing it as a long-term investment in the x86 ecosystem’s future. “The widespread adoption and high performance of x86 make it an ideal choice for developers; the addition of ACE to the ISA further strengthens the future of the x86 ecosystem.”

The Bigger Picture

The ACE announcement comes at a pivotal moment for the chip industry. AI workloads are driving unprecedented demand for compute, and CPU makers are racing to add specialized AI hardware. Apple’s M-series chips already include a Neural Engine, and Qualcomm’s Snapdragon X Elite features a dedicated AI accelerator. For x86 to remain competitive in the AI era, standardized matrix instructions aren’t optional — they’re essential.

AMD and Intel’s collaboration on ACE signals that both companies recognize this reality. By agreeing on a common instruction set, they eliminate a key advantage ARM has enjoyed: unified software compatibility. If ACE delivers on its promise, the next generation of x86 laptops, servers, and edge devices will handle AI workloads significantly faster without needing a discrete GPU.

