Nvidia has signed a non-exclusive licensing agreement with Groq to use its technology for accelerating artificial intelligence inference. The deal, valued at US$20 billion, underscores the growing importance of inference as AI systems move from training to large-scale deployment.
The agreement, finalized on December 24, 2025, gives Nvidia access to Groq’s low-latency inference technology designed to run trained AI models efficiently and at lower cost. Several Groq team members, including founder Jonathan Ross and president Sunny Madra, will join Nvidia to help integrate and scale the technology inside Nvidia’s platforms.
In a LinkedIn post, Groq founder Jonathan Ross, who will take on the role of Chief Software Architect at Nvidia, stated, “I’ll be joining Nvidia to help integrate the licensed technology. GroqCloud will continue to operate without interruption.”
Groq will continue to operate as an independent company. Simon Edwards has been appointed Chief Executive Officer, and GroqCloud services are expected to remain available without interruption. The licensing arrangement is focused on scaling the underlying technology rather than absorbing Groq’s operations or customer base.
Inference, the phase in which trained models generate responses to real-time requests, has become one of the most critical and cost-sensitive parts of the AI stack. While training large models requires enormous compute resources, inference determines whether those models can be deployed economically across products and enterprises. As AI adoption shifts from experimentation to production, demand for faster and more efficient inference has increased sharply.
Groq has concentrated on building processors optimized for deterministic, low-latency performance in real-time workloads. Nvidia, whose GPUs dominate AI training, has been working to strengthen its inference offering as customers prioritize reliability, throughput, and cost control at scale.
According to SiliconANGLE, Nvidia CEO Jensen Huang has told employees that Groq’s technology will be integrated into Nvidia’s AI factory architecture. The goal is to expand the platform’s ability to handle a wider range of inference-heavy and real-time applications, complementing Nvidia’s existing training-focused infrastructure.