Google Cloud has announced its eighth-generation Tensor Processing Units at Cloud Next 2026, and for the first time, the company is splitting its TPU strategy into two distinct architectures optimized for fundamentally different workloads. The result is TPU 8t for training and TPU 8i for inference — a bifurcation that reflects the industry’s recognition that the “agentic era” demands different silicon for different stages of the AI pipeline.
TPU 8t: The Training Powerhouse
TPU 8t is designed for one thing: training the largest frontier models at maximum throughput.
- Superpods scale to 9,600 chips, delivering 2 petabytes of shared HBM and 121 ExaFLOPs of compute
- Virgo Network fabric links more than 134,000 chips in a single fabric domain and over 1 million chips across clusters
- 3x higher compute performance vs. the previous Ironwood generation
- 2.7x better training price-performance
- Hosted on Google’s custom Axion ARM-based CPUs
For organizations training frontier-scale models, TPU 8t superpods represent the largest commercially available training clusters, rivaling anything from NVIDIA or custom internal infrastructure at hyperscalers.
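For a rough sense of what those aggregate figures imply per chip, the back-of-envelope arithmetic below simply divides the published superpod totals by the 9,600-chip count. Google has not broken out per-chip specifications for TPU 8t, so treat these as estimates; the totals may include interconnect, packaging, or redundancy overheads that this split ignores.

```python
# Back-of-envelope per-chip figures derived from the published superpod totals.
# Assumption: the totals are simple sums over all 9,600 chips (not confirmed by Google).

chips = 9_600
total_hbm_bytes = 2e15          # 2 petabytes of shared HBM
total_compute_flops = 121e18    # 121 ExaFLOPs across the superpod

hbm_per_chip_gb = total_hbm_bytes / chips / 1e9
flops_per_chip_pflops = total_compute_flops / chips / 1e15

print(f"HBM per chip:     ~{hbm_per_chip_gb:.0f} GB")        # ~208 GB
print(f"Compute per chip: ~{flops_per_chip_pflops:.1f} PFLOPS")  # ~12.6 PFLOPS
```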
TPU 8i: The Inference Specialist
TPU 8i is purpose-built for the workload that’s growing fastest: real-time inference for agentic AI systems.
- 3x more on-chip SRAM (384MB) than Ironwood
- 288GB of HBM to hold large KV caches entirely in on-package memory, critical for Mixture-of-Experts models
- Boardfly topology replaces the traditional 3D torus, reducing network hops and latency
- Dedicated Collectives Acceleration Engine (CAE) for faster distributed inference
- 80% better inference price-performance vs. previous generation
The KV cache optimization is particularly significant. As agentic systems run longer sessions with more complex reasoning chains, keeping the full cache resident in the accelerator's HBM, rather than spilling it to host memory, removes the memory bandwidth bottleneck that plagues GPU-based inference at scale.
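To see why that much HBM matters for long agentic sessions, the sketch below estimates KV cache size with the standard transformer accounting (two tensors, keys and values, per layer, per KV head, per token). The model shape, batch size, and context length are illustrative assumptions, not the specs of any announced model.

```python
# Rough KV cache sizing for a long agentic session.
# All model parameters below are illustrative assumptions, not a real model's specs.

def kv_cache_gb(layers, kv_heads, head_dim, tokens, batch, bytes_per_elem=2):
    """Key/value cache size: 2 tensors (K and V) per layer, bf16/fp16 by default."""
    return 2 * layers * kv_heads * head_dim * tokens * batch * bytes_per_elem / 1e9

# Hypothetical large MoE model: 64 layers, 16 KV heads of dim 128,
# serving 4 concurrent sessions with 128k tokens of accumulated context each.
size = kv_cache_gb(layers=64, kv_heads=16, head_dim=128, tokens=128_000, batch=4)
print(f"KV cache: ~{size:.0f} GB")   # ~268 GB
```

At roughly 268 GB, a hypothetical configuration like this just fits within a single TPU 8i's 288GB of HBM; on accelerators with less memory the same cache would have to spill to host memory or be sharded across devices.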
AI Hypercomputer Stack Updates
The new TPUs are integrated into Google’s broader AI Hypercomputer architecture, which also received major updates:
- Google Cloud Managed Lustre now delivers 10 TB/s of bandwidth
- TPUDirect RDMA and TPU Direct Storage bypass CPU bottlenecks for data loading
- Enhanced GKE capabilities for agent-native workload orchestration
- Native PyTorch support maintained across both TPU variants (see the sketch below)
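On the PyTorch point, the snippet below is a minimal sketch of the existing PyTorch/XLA workflow for putting a model on a TPU device; it uses today's torch_xla API and assumes nothing specific to the new TPU 8 variants.

```python
# Minimal PyTorch-on-TPU sketch using the current torch_xla API (not TPU 8-specific).
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                        # attach to the available TPU core
model = torch.nn.Linear(4096, 4096).to(device)  # move the model to the TPU
x = torch.randn(8, 4096, device=device)

y = model(x)      # operations are recorded lazily into an XLA graph
xm.mark_step()    # compile and execute the pending graph on the TPU
```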
The Competitive Landscape
Google’s bifurcated TPU strategy is a direct response to the market reality that training and inference are fundamentally different compute problems. NVIDIA’s Blackwell and upcoming Rubin architectures attempt to serve both with a single chip family; Google is betting that purpose-built silicon wins in an era where every watt and millisecond matters.
The 121 ExaFLOPs figure for a single superpod is particularly notable — it puts Google’s training infrastructure on par with anything available from any cloud provider.
Source: cloud.google.com, blog.google, tomshardware.com