At the Google Cloud Next 2026 conference, Google officially launched the eighth-generation Tensor Processing Unit (TPU). Jointly designed by Google Cloud and DeepMind, the product comes in two chips: TPU 8t for large-scale training and TPU 8i for low-latency inference.

Google stated that the goal of the eighth-generation TPU is to provide the computing power behind AI-agent workloads and to adapt, at scale, to evolving model architectures. This is the first time Google has split training and inference into separate chips, marking a significant shift in its AI hardware roadmap.
TPU 8t targets training acceleration, with the goal of shortening the development cycle of frontier models from months to weeks. Per-node performance has improved nearly threefold over the previous generation: a single TPU 8t node now scales to 9,600 chips with 2 PB of shared HBM and double the chip-to-chip bandwidth, and Google says one node can deliver up to 121 ExaFLOPS of compute. TPU 8t also sustains over 97% effective throughput through comprehensive RAS (reliability, availability, serviceability) features that reduce training stoppages caused by hardware failures.
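The node-level figures above imply rough per-chip numbers. A quick sanity check, where all inputs come from the article and the derived per-chip values are my own arithmetic rather than figures Google has published:

```python
# Back-of-the-envelope check of the reported TPU 8t node figures.
# Inputs are from the article; the per-chip numbers derived below
# are illustrative arithmetic, not published specifications.

chips_per_node = 9_600
node_exaflops = 121          # reported peak compute per node
node_hbm_pb = 2              # reported shared HBM per node, in petabytes

# Peak compute per chip, in petaflops (1 ExaFLOPS = 1,000 PFLOPS).
pflops_per_chip = node_exaflops * 1_000 / chips_per_node

# Shared HBM per chip, in gigabytes (1 PB = 1,000,000 GB, decimal units).
hbm_gb_per_chip = node_hbm_pb * 1_000_000 / chips_per_node

print(f"{pflops_per_chip:.1f} PFLOPS per chip")   # ~12.6
print(f"{hbm_gb_per_chip:.0f} GB HBM per chip")   # ~208
```

This puts each TPU 8t chip at roughly 12.6 PFLOPS with about 208 GB of its node's shared HBM, assuming the node totals are evenly divided across chips.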

TPU 8i focuses on inference, optimized for multi-agent collaboration and low-latency requirements. Each TPU 8i node scales to 1,152 chips, each equipped with 384 MB of on-chip SRAM and 288 GB of HBM, allowing a model's active working set to stay entirely on chip. Google has also doubled the number of physical CPU hosts per server and switched to its custom Axion CPUs, and the new Boardfly topology cuts the maximum network diameter by more than 50%, significantly reducing latency.
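Scaling the per-chip memory figures up to a full TPU 8i node gives a sense of the aggregate capacity. The totals below are my own arithmetic from the article's per-chip numbers, not published specifications:

```python
# Node-level memory totals for TPU 8i, derived from the per-chip
# figures in the article (384 MB SRAM, 288 GB HBM, 1,152 chips per
# node). These totals are illustrative, not published numbers.

chips_per_node = 1_152
sram_mb_per_chip = 384
hbm_gb_per_chip = 288

node_sram_gb = chips_per_node * sram_mb_per_chip / 1_000   # decimal units
node_hbm_tb = chips_per_node * hbm_gb_per_chip / 1_000

print(f"{node_sram_gb:.0f} GB SRAM per node")   # ~442
print(f"{node_hbm_tb:.0f} TB HBM per node")     # ~332
```

Roughly 442 GB of aggregate on-chip SRAM per node explains how an inference model's hot working set can be held on chip, with around 332 TB of HBM behind it.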
Both chips are slated for general availability later in 2026. Compared with the seventh-generation Ironwood TPU released last November, TPU 8t delivers 2.8x the performance at the same price and TPU 8i delivers an 80% improvement. Both chips have more than doubled performance per watt over the previous generation, with gains of 124% for TPU 8t and 117% for TPU 8i.
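The percentage gains are relative improvements, so they are consistent with the "more than doubled" claim once converted to multipliers. Simple arithmetic on the reported figures:

```python
# Convert the reported performance-per-watt gains over Ironwood
# (124% for TPU 8t, 117% for TPU 8i) into absolute multipliers.
# A gain of 124% means 1 + 1.24 = 2.24x the previous generation,
# which matches the "more than doubled" wording.

gains_pct = {"TPU 8t": 124, "TPU 8i": 117}

for chip, gain in gains_pct.items():
    multiplier = 1 + gain / 100
    print(f"{chip}: {multiplier:.2f}x performance per watt vs Ironwood")
```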
Google emphasizes that, for the first time, both chips run on hosts built around Google's own Arm-based Axion CPUs, allowing the entire system to be optimized for performance and energy efficiency.
On the software side, both chips support the frameworks developers already use, including JAX, MaxText, PyTorch, SGLang, and vLLM. Google also offers bare-metal access, letting customers work directly against the hardware.
Power efficiency has also been upgraded in the eighth-generation TPU. Google has optimized energy efficiency across the entire stack, and integrated power management can dynamically adjust power consumption to real-time demand, yielding up to twice the performance per watt of the previous-generation Ironwood. Both products also use fourth-generation liquid cooling, achieving a performance density that air cooling cannot match.
*Disclaimer: The above content is reproduced from the "Semiconductor Industry Circle" WeChat official account. It does not represent the views or positions of this company and is shared for exchange and learning only. If you have any questions or objections, please contact us.