Groq’s AI Chips Are Making NVIDIA’s GPU Pricing Look Less Untouchable

NVIDIA is still the boss fight in AI hardware, but the gap is starting to look less impossible for challengers.

According to comments from a Nebius expert in an AlphaSense interview reported by Wccftech, alternative AI chips are gaining more attention as the industry moves away from simply thinking in GPU rental hours and starts caring more about cost per million tokens.

That sounds very enterprise-core, but it matters more than you think. Tokens are basically the unit AI models process when they read, write, summarise, code, generate NPC dialogue, power chatbots, or run backend tools. If the industry prices AI by tokens instead of raw GPU time, suddenly the winner is not always the most powerful chip. Sometimes, it is the chip that can process enough output at the lowest cost.
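To see why the pricing unit changes the winner, here is a rough sketch of how a rental price turns into a per-token price. The two chips and their numbers are made up for illustration and do not come from the Nebius expert; `tokens_per_second` is assumed to mean total output across everything running on the chip at once.

```python
def cost_per_million_tokens(hourly_rate_usd, tokens_per_second):
    """Convert an hourly rental price into US$ per one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Hypothetical chips (illustrative numbers, not quoted figures).
big_chip = cost_per_million_tokens(hourly_rate_usd=6.00, tokens_per_second=5000)
small_chip = cost_per_million_tokens(hourly_rate_usd=2.00, tokens_per_second=2500)

print(f"Big chip:   US${big_chip:.2f} per million tokens")
print(f"Small chip: US${small_chip:.2f} per million tokens")
```

In this made-up case the big chip is twice as fast but three times the price, so the slower chip ends up cheaper per token.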

NVIDIA GPUs are still expensive, especially on demand

The Nebius expert said pricing depends on the GPU type and whether customers reserve capacity or rent it on demand.

For on-demand AI compute, the figures given were:

NVIDIA H100: US$2.95 per hour
NVIDIA H200: US$3.50 per hour
NVIDIA Blackwell B200: between US$4.90 and US$6.50 per hour

Reserved capacity brings the price down, but only if a company is playing at massive scale. For one-to-two-year contracts involving at least 10,000 GPUs, the quoted prices were US$1.50 per hour for H100, US$2.20 for H200, and at least US$3.50 for B200.
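For a sense of scale, the quick sketch below works out the reserved discount and what the minimum contract size would cost per year. It uses only the H100 and H200 figures quoted above (the B200 numbers are a range and a floor, so they are left out) and assumes every GPU is billed 24/7 for 365 days, which is a simplification.

```python
# Hourly prices in US$, as quoted by the Nebius expert.
ON_DEMAND = {"H100": 2.95, "H200": 3.50}
RESERVED = {"H100": 1.50, "H200": 2.20}

HOURS_PER_YEAR = 24 * 365  # assumption: billed around the clock
MIN_GPUS = 10_000          # minimum contract size quoted for reserved pricing

def discount_pct(on_demand, reserved):
    """Percentage saved by reserving instead of renting on demand."""
    return (on_demand - reserved) / on_demand * 100

def yearly_cost(hourly_rate, gpus, hours=HOURS_PER_YEAR):
    """Total yearly bill in US$ for a fleet of GPUs at a fixed hourly rate."""
    return hourly_rate * gpus * hours

for gpu in ON_DEMAND:
    saved = discount_pct(ON_DEMAND[gpu], RESERVED[gpu])
    bill = yearly_cost(RESERVED[gpu], MIN_GPUS)
    print(f"{gpu}: {saved:.1f}% cheaper reserved, US${bill:,.0f} per year for {MIN_GPUS:,} GPUs")
```

Under those assumptions, the H100 discount is roughly 49%, but the smallest qualifying contract comes to about US$131 million a year.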

That is hyperscaler money, not your average Malaysian startup money. For SEA companies building AI tools, game services, localisation systems, support bots, or creator platforms, these costs can quickly become brutal once usage scales.

The big shift: inference is now the main workload

The interesting bit is where demand is coming from. The Nebius expert estimated that inference now makes up around 90% to 95% of enterprise AI workloads.

In simple terms, fewer companies are training their own giant models from scratch. Most are using pre-trained models or APIs and spending money when those models generate answers. That is inference.

For gaming and esports-adjacent businesses, this is very relevant. Think AI moderation for toxic chat, automated highlight clipping, match summaries, customer support, content translation, NPC dialogue tools, and analytics dashboards. These are not always training-heavy tasks. They are usage-heavy tasks.

Groq’s pitch: cheaper tokens, faster output

This is where Groq enters the story. NVIDIA announced near the end of 2025 that it had signed a non-exclusive licensing agreement with Groq covering the startup’s AI inference technology. Wccftech notes that, according to the Nebius expert, Groq’s chips can cost around 5 to 10 cents per million tokens.

By comparison, NVIDIA’s B100, B200, or B300 GPUs were described as costing about 25 cents per million tokens. The same expert also said Groq’s chips can deliver up to 800 tokens per second, while the NVIDIA chips were pegged at around 450 tokens per second.

If those figures hold up across real workloads, that is not a small difference. That is the kind of pricing gap that can change which provider a company chooses, especially when AI features are being used thousands or millions of times a day.
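Here is a rough sketch of what that gap could mean on a bill. The per-million-token prices are the ones quoted by the Nebius expert; the workload of one billion tokens a day is a hypothetical number picked for easy maths, not something from the report.

```python
# US$ per million tokens, as quoted by the Nebius expert.
PRICE_PER_MILLION_USD = {
    "Groq (low end)": 0.05,
    "Groq (high end)": 0.10,
    "NVIDIA Blackwell": 0.25,
}

TOKENS_PER_DAY = 1_000_000_000  # hypothetical workload, for illustration only

def daily_cost(tokens_per_day, price_per_million_usd):
    """Daily spend in US$ for a given token volume and per-million price."""
    return tokens_per_day / 1_000_000 * price_per_million_usd

for name, price in PRICE_PER_MILLION_USD.items():
    per_day = daily_cost(TOKENS_PER_DAY, price)
    print(f"{name}: US${per_day:,.0f} per day, US${per_day * 30:,.0f} per 30 days")
```

At that assumed volume, the quoted prices work out to US$50 to US$100 a day on Groq versus US$250 a day on NVIDIA, which is a 2.5x to 5x difference.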

Why Malaysia and SEA should care

For Malaysian gamers, this will not instantly make your GPU cheaper or your ping better. But behind the scenes, cheaper AI inference could make a lot of digital services more affordable to run.

SEA studios and platforms usually cannot throw money around like US hyperscalers. If AI tools become cheaper per token, smaller teams may be able to add better support bots, multilingual features, creator tools, anti-cheat analysis, or community moderation without burning their whole budget.

It also matters for local content. Malaysia and SEA are multilingual by default. Covering English, Malay, Mandarin, Tamil, Indonesian, Thai, and Vietnamese properly is expensive. If inference costs drop, more apps and games could justify better translation and regional support instead of treating SEA like an afterthought.

NVIDIA is not going to be in trouble overnight. Its top-end AI GPUs remain the performance standard, and demand for compute is still extremely high, with providers able to run close to full utilisation. But the conversation is changing. If companies start optimising for cost per million tokens, then specialised inference chips like Groq’s become much harder to ignore.

For the AI hardware meta, this is basically the moment where the overpowered champion is still S-tier, but the counter-picks are finally starting to look real.

Source: Wccftech Gaming
