Nvidia is preparing to launch a new chip designed to speed up AI responses, breaking with its long-running habit of flogging the same processor for every job.
Nvidia chief executive Jensen Huang is expected to unveil a chip focused on "inference", meaning running models rather than training them.
According to people familiar with the plans for GTC next week, the chip is the first new product to emerge from December's $20bn deal to hire the founders of Groq, a start-up building "language processing units" tuned for high-speed answers to complex AI queries.
Three months after that deal, Nvidia is expected to debut a Groq-based LPU to sit alongside its forthcoming flagship Vera Rubin graphics processing unit. It is part of a product family meant to head off challengers and serve new kinds of AI applications.
The move lands as the world's most valuable company gets grief from start-ups and customers, such as Google, all busy cooking up their own AI chips. This week, Meta announced a new family of four inference-focused processors.
One Silicon Valley venture investor said: "We are entering an interesting phase that is not 'Nvidia dominant'."
For the past three years, Nvidia's $4.5tn market capitalisation has been built on its GPUs, which have become the backbone of generative AI. They train models such as the ones behind OpenAI's ChatGPT.
Huang has insisted that a single system can handle training and then run the chatbots and coding tools built on top. Big Tech has spent hundreds of billions deploying these boxes while funding their own specialised silicon.
But the growing sophistication of AI tools, including "agentic" coding systems, is pushing Huang to ditch the mantra that one GPU fits every workload.
The Groq deal was worth about $20bn, according to people familiar with the transaction, making it one of the biggest deals in Nvidia's 33-year history. It includes licensing and the hiring of key talent, including Groq founder and former Google chip executive Jonathan Ross.
Groq, which had been working with Samsung to manufacture its products, previously bragged that its LPUs were faster and more efficient than Nvidia's GPUs for inference. Nvidia clearly listened.
Nvidia's flagship Blackwell and Rubin systems lean on high-bandwidth memory to cope with the massive data loads that AI models fling around. But HBM is expensive and in increasingly short supply as SK Hynix and Micron struggle to keep up with demand.
The Groq-style chip will use SRAM rather than the dynamic RAM used in HBM, according to people familiar with Nvidia's plans, because SRAM is more readily available and better suited to speeding up AI "reasoning" tasks.
Bank of America reckons that by 2030, inference will account for 75 per cent of AI data centre spending, up from about 50 per cent last year, and it expects a "broadened AI portfolio" at GTC.
Nvidia admits "one GPU to rule them all" was a fairy tale.
The Hot Take: Nvidia is starting to feel the heat of competition and watch those dollars evaporate as customers try other vendors.