Google Cloud has announced its eighth-generation Tensor Processing Units at Cloud Next 2026, and for the first time, the company is splitting its TPU strategy into two distinct architectures optimized for fundamentally different workloads. The result is TPU 8t for training and TPU 8i for inference — a bifurcation that reflects the industry’s recognition that the “agentic era” demands different silicon for different stages of the AI pipeline.
TPU 8t: The Training Powerhouse
TPU 8t is designed for one thing: training the largest frontier models at maximum throughput.
- Superpods scale to 9,600 chips, delivering 2 petabytes of shared HBM and 121 ExaFLOPs of compute
- Virgo Network fabric links more than 134,000 chips in a single fabric domain and over 1 million chips across clusters
- 3x higher compute performance vs. the previous Ironwood generation
- 2.7x better training price-performance
- Hosted on Google’s custom Axion ARM-based CPUs
For organizations training frontier-scale models, TPU 8t superpods represent the largest commercially available training clusters, rivaling anything from NVIDIA or custom internal infrastructure at hyperscalers.
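For a rough sense of what those aggregate figures imply per chip, the back-of-envelope arithmetic below simply divides the published superpod totals by the 9,600-chip count. Google has not broken out per-chip specifications for TPU 8t, so treat these as estimates; the totals may include interconnect, packaging, or redundancy overheads that this split ignores.

```python
# Back-of-envelope per-chip figures derived from the published superpod totals.
# Assumption: the totals are simple sums over all 9,600 chips (not confirmed by Google).

chips = 9_600
total_hbm_bytes = 2e15          # 2 petabytes of shared HBM
total_compute_flops = 121e18    # 121 ExaFLOPs across the superpod

hbm_per_chip_gb = total_hbm_bytes / chips / 1e9
flops_per_chip_pflops = total_compute_flops / chips / 1e15

print(f"HBM per chip:     ~{hbm_per_chip_gb:.0f} GB")        # ~208 GB
print(f"Compute per chip: ~{flops_per_chip_pflops:.1f} PFLOPS")  # ~12.6 PFLOPS
```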
TPU 8i: The Inference Specialist
TPU 8i is purpose-built for the workload that’s growing fastest: real-time inference for agentic AI systems.
- 3x more on-chip SRAM (384MB) than Ironwood
- 288GB of HBM to hold large KV caches entirely in on-package memory, critical for Mixture-of-Experts models
- Boardfly topology replaces the traditional 3D torus, reducing network hops and latency
- Dedicated Collectives Acceleration Engine (CAE) for faster distributed inference
- 80% better inference price-performance vs. previous generation
The KV cache optimization is particularly significant. As agentic systems run longer sessions with more complex reasoning chains, keeping the full cache resident in the accelerator's HBM, rather than spilling it to host memory, removes the memory bandwidth bottleneck that plagues GPU-based inference at scale.
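To see why that much HBM matters for long agentic sessions, the sketch below estimates KV cache size with the standard transformer accounting (two tensors, keys and values, per layer, per KV head, per token). The model shape, batch size, and context length are illustrative assumptions, not the specs of any announced model.

```python
# Rough KV cache sizing for a long agentic session.
# All model parameters below are illustrative assumptions, not a real model's specs.

def kv_cache_gb(layers, kv_heads, head_dim, tokens, batch, bytes_per_elem=2):
    """Key/value cache size: 2 tensors (K and V) per layer, bf16/fp16 by default."""
    return 2 * layers * kv_heads * head_dim * tokens * batch * bytes_per_elem / 1e9

# Hypothetical large MoE model: 64 layers, 16 KV heads of dim 128,
# serving 4 concurrent sessions with 128k tokens of accumulated context each.
size = kv_cache_gb(layers=64, kv_heads=16, head_dim=128, tokens=128_000, batch=4)
print(f"KV cache: ~{size:.0f} GB")   # ~268 GB
```

At roughly 268 GB, a hypothetical configuration like this just fits within a single TPU 8i's 288GB of HBM; on accelerators with less memory the same cache would have to spill to host memory or be sharded across devices.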
AI Hypercomputer Stack Updates
The new TPUs are integrated into Google’s broader AI Hypercomputer architecture, which also received major updates:
- Google Cloud Managed Lustre now delivers 10 TB/s of bandwidth
- TPUDirect RDMA and TPU Direct Storage bypass CPU bottlenecks for data loading
- Enhanced GKE capabilities for agent-native workload orchestration
- Native PyTorch support maintained across both TPU variants (see the sketch below)
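On the PyTorch point, the snippet below is a minimal sketch of the existing PyTorch/XLA workflow for putting a model on a TPU device; it uses today's torch_xla API and assumes nothing specific to the new TPU 8 variants.

```python
# Minimal PyTorch-on-TPU sketch using the current torch_xla API (not TPU 8-specific).
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                        # attach to the available TPU core
model = torch.nn.Linear(4096, 4096).to(device)  # move the model to the TPU
x = torch.randn(8, 4096, device=device)

y = model(x)      # operations are recorded lazily into an XLA graph
xm.mark_step()    # compile and execute the pending graph on the TPU
```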
The Competitive Landscape
Google’s bifurcated TPU strategy is a direct response to the market reality that training and inference are fundamentally different compute problems. NVIDIA’s Blackwell and upcoming Rubin architectures attempt to serve both with a single chip family; Google is betting that purpose-built silicon wins in an era where every watt and millisecond matters.
The 121 ExaFLOPs figure for a single superpod is particularly notable — it puts Google’s training infrastructure on par with anything available from any cloud provider.
Source: cloud.google.com, blog.google, tomshardware.com