Nvidia is preparing to launch a new chip designed to speed up AI responses, breaking with its long-running practice of selling the same processor for every job.
Nvidia chief executive Jensen Huang is expected to unveil a chip focused on "inference", meaning running models rather than training them.
According to people familiar with the plans for GTC next week, the chip is the first new product to emerge from December's $20bn deal to hire the founders of Groq, a start-up building "language processing units" tuned for high-speed answers to complex AI queries.
Three months after that deal, Nvidia is expected to debut a Groq-based LPU to sit alongside its forthcoming flagship Vera Rubin graphics processing unit, as part of a product family meant to head off challengers and serve new kinds of AI applications.
The move comes as the world's most valuable company faces growing pressure from start-ups and from customers, such as Google, that are developing their own AI chips. This week, Meta announced a new family of four inference-focused processors.
One Silicon Valley venture investor said: "We are entering an interesting phase that is not 'Nvidia dominant'."
Nvidia's $4.5tn market capitalisation has been built over the past three years on its GPUs, which have become the backbone of generative AI, training models such as those behind OpenAI's ChatGPT.
Huang has insisted that a single system can handle training and then run the chatbots and coding tools built on top. Big Tech companies have spent hundreds of billions of dollars deploying these systems, even as they fund their own specialised silicon.
But the growing sophistication of AI tools, including "agentic" coding systems, is pushing Huang to ditch the mantra that one GPU fits every workload.
The Groq deal was worth about $20bn, according to people familiar with the transaction, making it one of the biggest deals in Nvidia’s 33-year history. It includes licensing and the hiring of key talent, including Groq founder and former Google chip executive Jonathan Ross.
Groq, which had been working with Samsung to manufacture its products, previously claimed that its LPUs were faster and more efficient than Nvidia's GPUs for inference. Nvidia clearly listened.
Nvidia's flagship Blackwell and Rubin systems rely on high-bandwidth memory to cope with the massive volumes of data that AI models move around. But HBM is expensive and in increasingly short supply as SK Hynix and Micron struggle to keep up with demand.
The Groq-style chip will use SRam rather than the dynamic Ram used in HBM, according to people familiar with Nvidia's plans, because SRam is more readily available and, as a faster type of memory, better suited to speeding up AI "reasoning" tasks.
Bank of America estimates that inference will account for 75 per cent of AI data centre spending by 2030, up from about 50 per cent last year, and it expects a "broadened AI portfolio" at GTC.