Nvidia's Rubin CPX plans clouded as Groq gains bigger inference role

Industry Analysis

Nvidia’s stalling Rubin CPX isn’t merely a product delay. It reflects the collision of U.S. export controls with a paradigm shift in AI inference architecture. Tightened restrictions on advanced packaging from Taiwan, China have inflated supply chain costs for custom ASICs, while hyperscalers are pivoting to Groq’s deterministic-latency LPU for production-grade inference. Together, these pressures undermine Nvidia’s 'universal GPU + CUDA' moat in inference workloads. Over the next 12–24 months, the market is likely to bifurcate: training will remain dominated by Hopper/Blackwell, while inference fragments across heterogeneous accelerators, including Groq, Cerebras, and Chinese NPUs. If Nvidia fails to deliver a cost-competitive dedicated inference chip by 2027, its end-to-end dominance in AI infrastructure will face its first structural crack.