semiengineering.com
2026-05-06
Semiconductor Engineering
A new technical paper, “AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving,” was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung.

Abstract
“All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA’s Rubin GPU-LPU heterogeneous platform. Ev