Semiconductor News & Analysis Feed

24 articles
2026-06-16
developer.nvidia.com 2026-06-16 NVIDIA Developer
Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable substantially larger model capacity while activating only a subset of parameters for each token, offering an unparalleled approach for scaling performance within a practical compute budget. As model scales continue to grow, the optimization of
2026-06-13
developer.nvidia.com 2026-06-13 NVIDIA Developer
AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how inference systems perform under these conditions. Artificial Analysis AgentPerf (AA-AgentPerf) offers the industry’s first multi-vendor open benchmarks profiling trajectories that are representative of real-world AI agent coding tasks.  This post
2026-06-11
developer.nvidia.com 2026-06-11 NVIDIA Developer
Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This limits responsiveness, increases serving costs, and makes fluid, interactive experiences difficult to achieve.   DiffusionGemma, created by Google DeepMind and optimized to run efficiently across NVIDIA platforms, introduces a new approach to tex
2026-06-10
developer.nvidia.com 2026-06-10 NVIDIA Developer
AI factories are changing what data-center infrastructure must do. Unlike traditional data centers, AI factories are built to manufacture intelligence at scale. They run power-dense training and inference workloads, increasingly support agentic and reasoning models, and must deliver predictable performance even as compute demand shifts rapidly. In this environment, electrical infrastructure is no
2026-06-10
developer.nvidia.com 2026-06-10 NVIDIA Developer
Federated learning (FL) research often begins with a deceptively simple question: What should we try next? A new aggregation rule, a FedProx coefficient, a server optimizer setting, a SCAFFOLD variant, or a model architecture tweak may all look promising before an experiment starts.  After the run finishes, the harder questions begin: Did the change actually improve the metric? Was the comparison
2026-06-09
developer.nvidia.com 2026-06-09 NVIDIA Developer
Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step time can add up to days of training and substantial compute costs. Numerical precision is one of the highest-leverage knobs available, but low-bit mixed-precision pretraining is hard to get right. To address this, the NVFP4 training recipe
2026-06-04
developer.nvidia.com 2026-06-04 NVIDIA Developer
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows. However, these multi-agent workflows cause token counts to grow quickly. Agents plan, call tools, invoke sub-agents, receive information, and then pass history, outputs, and reasoning steps back into the model continuously. A
2026-06-02
developer.nvidia.com 2026-06-02 NVIDIA Developer
As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized memory and performance.  NVIDIA JetPack 7.2 directly supports one-command deployment of NVIDIA NemoClaw, an open source stack that adds privacy and security controls to OpenClaw. It introduces NVIDIA agent skills for Jetson—Jetson device-side
2026-06-01
developer.nvidia.com 2026-06-01 NVIDIA Developer
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what’s happening in their world, predict what’s likely to happen next, and generate actions for specific environments, embodiments, and tasks. NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning, world gener
2026-06-01
developer.nvidia.com 2026-06-01 NVIDIA Developer
The AI era is driving a new class of infrastructure: AI factories that transform data into intelligence for autonomous AI agents operating at unprecedented scale. Powered by accelerated computing, AI factories enable enterprises to train, fine-tune, and deploy AI with greater speed and efficiency.  This new class of infrastructure also introduces a fundamentally new attack surface spanning infras
2026-06-01
developer.nvidia.com 2026-06-01 NVIDIA Developer
2026-06-01
developer.nvidia.com 2026-06-01 NVIDIA Developer
AI is now essential infrastructure, powered by AI factories that generate intelligence in the form of tokens. As demand grows, these factories must scale faster, operate more efficiently, and lower the cost of intelligence across the five-layer stack: energy, chips, infrastructure, models, and applications. NVIDIA DSX platform provides the complete playbook for designing, simulating, building, an
2026-05-29
developer.nvidia.com 2026-05-29 NVIDIA Developer
AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and language in real time, turning fragmented information into actionable insights. Step 3.7 Flash, the latest from StepFun, brings these capabilities to production and enterprise scale, available on NVIDIA-accelerated infrastructure. It is a 198B-paramet
2026-05-28
developer.nvidia.com 2026-05-28 NVIDIA Developer
The cold-start problem: In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. However, cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. This delay increases the risk of service level agreement (SLA) violations during t
2026-05-27
developer.nvidia.com 2026-05-27 NVIDIA Developer
Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to generate actionable trading insights. These advanced AI systems can process financial news, social media sentiment, earnings reports, and market data to predict stock price movements and automate investment strategies with unprecedented accurac
2026-05-27
developer.nvidia.com 2026-05-27 NVIDIA Developer
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in C++ enables high-level, tile-based kernel development that automatically manages complex low-level GPU details for optimal performance and portability. Additionally, CUDA Tile programming is now supported on Compute Capability 9.0 (NVIDIA Ho
2026-05-27
developer.nvidia.com 2026-05-27 NVIDIA Developer
NVIDIA CompileIQ tackles one of the hardest problems in performance engineering: finding the compiler options that unlock the best performance for a specific workload. Consider a team that has spent weeks optimizing an LLM inference pipeline on GPUs, tuning batch sizes, quantizing to FP8, adopting flash attention, fusing every kernel they can. The profiler says there’s nothing left to squeeze. B
2026-05-21
developer.nvidia.com 2026-05-21 NVIDIA Developer
2026-05-20
developer.nvidia.com 2026-05-20 NVIDIA Developer
2026-05-15
developer.nvidia.com 2026-05-15 NVIDIA Developer