Google’s AI Hypercomputer Is Built for the Next Wave of AI Agents

Google is going big on the so-called agentic AI era, and this time it is not just talking about smarter chatbots. At Cloud Next 26, the company announced its AI Hypercomputer, a new Google Cloud infrastructure platform designed to bring together custom TPUs, Axion CPUs, NVIDIA GPUs, networking, storage and machine learning software into one massive AI system.

For readers in Malaysia and the rest of Southeast Asia (SEA), this is the kind of backend tech that sounds distant at first, but can quietly shape the apps, games, creator tools and AI services we use every day. Faster AI training and inference usually mean quicker feature rollouts, cheaper scaling for cloud platforms, and more ambitious AI tools for developers building games, esports analytics, virtual assistants or content pipelines.

What Google actually announced

The AI Hypercomputer is Google’s pitch for large-scale AI infrastructure beyond the traditional “supercomputer” idea. Instead of relying on one type of chip, Google is mixing multiple compute options into a single platform:

8th-gen Google TPUs
Google Axion Arm-based Cloud CPUs
NVIDIA Vera Rubin NVL72 GPUs
AI-focused networking
High-speed storage
Open software and ML frameworks

The headline hardware is Google’s new TPUv8 family, split into two chips: TPU 8t for training and TPU 8i for inference.

TPU 8t is for training massive models

The TPU 8t is built for training large frontier AI models. Google says a single TPU 8t superpod can scale up to 9,600 chips with 2PB of shared high-bandwidth memory. The company is claiming 121 exaflops of FP4 compute per pod, around 2.84x higher than its previous Ironwood generation.
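To put the pod-level numbers in perspective, here is a rough back-of-envelope sketch in Python that breaks them down per chip. It uses only the figures quoted above and assumes decimal units and an even split across all 9,600 chips. The per-chip values and the implied Ironwood baseline are derived estimates, not numbers Google has published.

```python
# Figures quoted in the article for one TPU 8t superpod.
CHIPS_PER_POD = 9_600
POD_HBM_PB = 2             # petabytes of shared high-bandwidth memory
POD_FP4_EXAFLOPS = 121     # exaflops of FP4 compute
SPEEDUP_VS_IRONWOOD = 2.84

# Assumption: decimal units (1 PB = 1,000,000 GB, 1 exaflop = 1,000 petaflops)
# and resources split evenly across every chip in the pod.
hbm_per_chip_gb = POD_HBM_PB * 1_000_000 / CHIPS_PER_POD
fp4_per_chip_pflops = POD_FP4_EXAFLOPS * 1_000 / CHIPS_PER_POD

# Pod-level baseline implied by the quoted 2.84x ratio (a derived estimate).
implied_ironwood_exaflops = POD_FP4_EXAFLOPS / SPEEDUP_VS_IRONWOOD

print(f"HBM per chip: ~{hbm_per_chip_gb:.0f} GB")
print(f"FP4 compute per chip: ~{fp4_per_chip_pflops:.1f} petaflops")
print(f"Implied Ironwood pod baseline: ~{implied_ironwood_exaflops:.1f} exaflops")
```

Under those assumptions, each chip works out to roughly 208GB of HBM and about 12.6 petaflops of FP4 compute, and the 2.84x claim implies a previous-generation pod at around 42.6 exaflops.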

Google also lists several improvements here: double the interchip bandwidth of the previous generation, 10x faster storage access, and TPUDirect, which feeds data directly into TPUs. The platform runs on Google’s Virgo Network with JAX and Pathways software, a combination Google says can support near-linear scaling up to one million chips in a single logical cluster.

The TPU 8t also introduces native FP4 support. Storing each parameter in fewer bits reduces memory bandwidth pressure, with the aim of keeping accuracy acceptable for large models.
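The "fewer bits per parameter" point is easy to see with a quick calculation. The Python sketch below compares how much memory a model's weights would need at different precisions. The one-trillion-parameter model is a hypothetical chosen for illustration, not a model named by Google, and the maths ignores the small scaling-factor overhead that real low-precision formats typically carry.

```python
# Bits used to store one parameter at each precision.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_footprint_gb(num_params: int, bits_per_param: int) -> float:
    """Return the weight storage in decimal gigabytes, ignoring format overhead."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# Assumption: a hypothetical one-trillion-parameter model, for illustration only.
HYPOTHETICAL_PARAMS = 1_000_000_000_000

for name, bits in BITS_PER_PARAM.items():
    size_gb = weight_footprint_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"{name}: ~{size_gb:,.0f} GB of weights")
```

Each halving of precision halves the data that has to move between memory and compute, which is why FP4 eases bandwidth pressure as long as accuracy holds up.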

TPU 8i is focused on inference

The TPU 8i is aimed at inference, which is the part where trained AI models actually respond to users. This matters because inference is where real-world costs can explode once millions of people start using an AI product.

Google says TPU 8i comes with 288GB of HBM and 384MB of on-chip SRAM, a 3x capacity increase over the previous generation. A TPU 8i pod is rated at 331.8 exaflops of FP8 compute, which Google says is 6.74x higher than Ironwood.

For modern Mixture of Experts models, Google doubled interchip interconnect (ICI) bandwidth to 19.2Tb/s. Its new Boardfly architecture reduces maximum network diameter by more than 50%, while a Collectives Acceleration Engine can reduce on-chip latency by up to 5x.
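Bandwidth figures quoted in terabits are easy to misread as terabytes, so here is a small Python sketch that converts the 19.2Tb/s number and works back to what "doubled" implies. Both results are derived from the quoted figure, not separately published values.

```python
# Figure quoted in the article for TPU 8i.
ICI_TBIT_PER_S = 19.2    # terabits per second
GENERATION_GAIN = 2      # "doubled" versus the previous generation

BITS_PER_BYTE = 8

# Convert terabits per second to terabytes per second.
ici_tbyte_per_s = ICI_TBIT_PER_S / BITS_PER_BYTE

# Previous-generation bandwidth implied by the doubling claim.
implied_previous_tbit_per_s = ICI_TBIT_PER_S / GENERATION_GAIN

print(f"ICI bandwidth: ~{ici_tbyte_per_s:.1f} TB/s")
print(f"Implied previous generation: ~{implied_previous_tbit_per_s:.1f} Tb/s")
```

That puts the quoted figure at roughly 2.4 terabytes per second, up from an implied 9.6Tb/s in the previous generation.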

NVIDIA Rubin is also part of the plan

Google is not only relying on its own silicon. The company says NVIDIA GPUs remain a core part of its AI accelerator lineup, and Google Cloud will be among the first to offer NVIDIA Vera Rubin NVL72 systems. These will sit alongside existing Hopper and Blackwell-based instances.

That combination is interesting because cloud customers will not be locked into a single hardware path. For studios, AI startups, enterprise teams and research groups in SEA, flexibility matters. Some workloads may fit Google TPUs better, while others may still prefer NVIDIA’s CUDA-heavy ecosystem.

Why this matters for SEA

No, your gaming PC in Malaysia is not suddenly getting a TPU 8t inside it. But cloud AI infrastructure affects the tools around gaming and entertainment: smarter game NPCs, faster localisation, automated video editing, AI moderation, esports data analysis, AI-generated assets and enterprise copilots.

Google also says customers using this infrastructure include the US Department of Energy (DOE), Boston Dynamics, Citadel Securities, Thinking Machines Lab and Axia Energy. That shows the target is serious large-scale AI work, not consumer gimmicks.

The big takeaway: Google wants its cloud to be one of the main homes for agentic AI, using its own TPUs and CPUs while still bringing in NVIDIA Rubin for customers who need that ecosystem. For Malaysia and SEA, the impact will likely arrive through the apps and services built on top of it, not through the hardware itself.

Source: Wccftech Gaming