SemiPulse | AI-Powered Semiconductor Supply Chain Intelligence & Market Signals

Semiconductor News & Analysis Feed

1 article

2026-05-06

Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung) - Semiconductor Engineering


Source: Semiconductor Engineering (semiengineering.com), 2026-05-06

A new technical paper, “AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving,” was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung.

Abstract

“All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA’s Rubin GPU-LPU heterogeneous platform. Ev…