Tensor Processing Unit
| Tensor Processing Unit 3.0 | |
|---|---|
| Designer | Google |
| Introduced | May 2016 |
| Type | Neural network, machine learning |
Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software.[1] Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.
Comparison to CPUs and GPUs
Compared to a graphics processing unit, TPUs are designed for a high volume of low precision computation (e.g. as little as 8-bit precision)[2] with more input/output operations per joule, without hardware for rasterisation/texture mapping.[3] The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Norman Jouppi.[4]
Different types of processors are suited to different types of machine learning models. TPUs are well suited to convolutional neural networks (CNNs), while GPUs have benefits for some fully connected neural networks, and CPUs can have advantages for recurrent neural networks (RNNs).[5]
History
The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year.[4][3] The chip has been specifically designed for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks.[6] However, as of 2017 Google still used CPUs and GPUs for other types of machine learning.[4] Other vendors are also developing AI accelerators, aimed at the embedded and robotics markets.
Google's TPUs are proprietary. Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service."[7] Google has said that they were used in the AlphaGo versus Lee Sedol series of man-machine Go games,[3] as well as in the AlphaZero system, which produced chess, shogi and Go-playing programs from the game rules alone and went on to beat the leading programs in those games.[8] Google has also used TPUs for Google Street View text processing and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day.[4] It is also used in RankBrain, which Google uses to provide search results.[9]
Google provides third parties access to TPUs through its Cloud TPU service as part of the Google Cloud Platform[10] and through its notebook-based services Kaggle and Colaboratory.[11][12]
Products
| | TPUv1 | TPUv2 | TPUv3 | TPUv4[14][16] | TPUv5[17] | Edge v1 |
|---|---|---|---|---|---|---|
| Date introduced | 2016 | 2017 | 2018 | 2021 | 2023 | 2018 |
| Process node | 28 nm | 16 nm | 16 nm | 7 nm | Unstated | |
| Die size (mm²) | 331 | < 625 | < 700 | < 400 | Unstated | |
| On-chip memory (MiB) | 28 | 32 | 32 | 32 | 48 | |
| Clock speed (MHz) | 700 | 700 | 940 | 1050 | Unstated | |
| Memory | 8 GiB DDR3 | 16 GiB HBM | 32 GiB HBM | 32 GiB HBM | 16 GB HBM | |
| Memory bandwidth | 34 GB/s | 600 GB/s | 900 GB/s | 1200 GB/s | 819 GB/s | |
| TDP (W) | 75 | 280 | 220 | 170 | Not listed | 2 |
| TOPS (tera-operations per second) | 92 | 45 | 123 | 275 | 393 | 4 |
| TOPS/W | 0.31 | 0.16 | 0.56 | 1.62 | Not listed | 2 |
First generation TPU
The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers.[18] Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth.[15] Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.[18]
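The core datapath described above — 8-bit multiplies summed into 32-bit accumulators — can be sketched as follows. This is an illustrative model of the arithmetic only, not Google's actual systolic-array implementation:

```python
import numpy as np

def systolic_matmul(a, b, acc_dtype=np.int32):
    """Illustrative 8-bit matrix multiply with wide accumulation.

    Mimics the key property of the TPUv1 datapath: 8-bit multiplies
    whose products are summed into 32-bit accumulators, so no
    intermediate result overflows.
    """
    assert a.dtype == np.int8 and b.dtype == np.int8
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    acc = np.zeros((m, n), dtype=acc_dtype)
    # Each step corresponds to one wavefront of partial products
    # flowing through the array; here it is unrolled as a loop over k.
    for t in range(k):
        acc += a[:, t:t + 1].astype(acc_dtype) * b[t:t + 1, :].astype(acc_dtype)
    return acc

a = np.array([[127, -128], [1, 2]], dtype=np.int8)
b = np.array([[2, 0], [0, 2]], dtype=np.int8)
print(systolic_matmul(a, b))  # [[254, -256], [2, 4]]
```

In a real systolic array the loop over `t` happens in hardware, with operands pumped through a grid of multiply-accumulate cells each cycle; the widened accumulator type is what lets extreme int8 values (such as 127 × 127 summed 256 times) stay exact.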
Second generation TPU
The second-generation TPU was announced in May 2017.[19] Google stated that the first-generation TPU design was limited by memory bandwidth, and that using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 teraFLOPS.[15] The TPUs are arranged into four-chip modules with a performance of 180 teraFLOPS.[19] Sixty-four of these modules are then assembled into 256-chip pods with 11.5 petaFLOPS of performance.[19] Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point, introducing the bfloat16 format invented by Google Brain. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google stated that these second-generation TPUs would be available on the Google Compute Engine for use in TensorFlow applications.[20]
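The bfloat16 format is essentially a float32 with the low 16 mantissa bits dropped: 1 sign bit, the same 8 exponent bits as float32, and only 7 mantissa bits. A minimal sketch of the conversion (plain truncation is shown for simplicity; hardware typically uses round-to-nearest-even):

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE float32 to bfloat16 by keeping the top 16 bits.

    bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits,
    trading precision for the full float32 dynamic range.
    """
    bits32 = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits32 >> 16

def bfloat16_bits_to_float(bits: int) -> float:
    """Expand 16 bfloat16 bits back to a float32 value."""
    return struct.unpack('>f', struct.pack('>I', bits << 16))[0]

x = 3.14159
print(bfloat16_bits_to_float(float_to_bfloat16_bits(x)))  # 3.140625
```

Because the exponent field is unchanged, any float32 can be cut down to bfloat16 without overflow or underflow — only precision is lost, which neural-network training tolerates well.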
Third generation TPU
The third-generation TPU was announced on May 8, 2018.[21] Google announced that the processors themselves were twice as powerful as the second-generation TPUs, and would be deployed in pods with four times as many chips as the preceding generation.[22][23] This results in an 8-fold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
Fourth generation TPU
On May 18, 2021, Google CEO Sundar Pichai spoke about TPU v4 Tensor Processing Units during his keynote at the Google I/O virtual conference. TPU v4 improved performance by more than 2x over TPU v3 chips. Pichai said: "A single v4 pod contains 4,096 v4 chips, and each pod has 10x the interconnect bandwidth per chip at scale, compared to any other networking technology."[24]
There is also an "inference" version, called v4i,[25] that does not require liquid cooling.[26]
Fifth generation TPU
In 2021, Google revealed that the physical layout of TPU v5 was being designed with a novel application of deep reinforcement learning.[27] Google sees TPU v5 as being of the same generation as Nvidia's H100 and, as of June 2023, expected to be able to benchmark such a comparison sometime during 2023.[28]
Similar to the v4i being a lighter-weight version of the v4, the fifth generation has a "cost-efficient"[29] version called v5e.[17]
Edge TPU
In July 2018, Google announced the Edge TPU. The Edge TPU is Google's purpose-built ASIC chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power compared to the TPUs hosted in Google datacenters (also known as Cloud TPUs[30]). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand. The Edge TPU is capable of 4 trillion operations per second with 2 W of electrical power.[31]
The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a mini PCI-e card, and an M.2 card. The SBC Coral Dev Board and Coral SoM both run Mendel Linux OS – a derivative of Debian.[32][33] The USB, PCI-e, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including Raspberry Pi).
The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite.[34] The Edge TPU only accelerates forward-pass operations, which means it is primarily useful for inference (although it is possible to perform lightweight transfer learning on the Edge TPU[35]). The Edge TPU also only supports 8-bit math, so for a network to be compatible with it, the network must either be trained using TensorFlow's quantization-aware training technique or, since late 2019, converted using post-training quantization.
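As an illustration of the 8-bit constraint, the following is a minimal sketch of affine (asymmetric) post-training quantization of the kind TensorFlow Lite applies when preparing Edge TPU-compatible models. The real converter also calibrates activation ranges, fuses operations, and handles per-channel scales; the function names here are illustrative, not part of any API:

```python
import numpy as np

def quantize(weights, num_bits=8):
    """Map float weights onto the int8 grid with a scale and zero point.

    q = round(w / scale) + zero_point, clipped to the int8 range,
    so that dequantize(q) approximates w to within about one scale step.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.75, 1.0], dtype=np.float32)
q, s, zp = quantize(w)
print(np.abs(dequantize(q, s, zp) - w).max())  # error stays below one scale step
```

Quantization-aware training differs in that it simulates this rounding during training, letting the network adapt its weights to the 8-bit grid instead of absorbing the error after the fact.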
On November 12, 2019, Asus announced a pair of single-board computers (SBCs) featuring the Edge TPU. The Asus Tinker Edge T and Tinker Edge R boards are designed for IoT and edge AI. The SBCs officially support Android and Debian operating systems.[36][37] ASUS has also demonstrated a mini PC called the Asus PN60T featuring the Edge TPU.[38]
On January 2, 2020, Google announced the Coral Accelerator Module and Coral Dev Board Mini, to be demonstrated at CES 2020 later the same month. The Coral Accelerator Module is a multi-chip module featuring the Edge TPU, PCIe and USB interfaces for easier integration. The Coral Dev Board Mini is a smaller SBC featuring the Coral Accelerator Module and MediaTek 8167s SoC.[39][40]
Pixel Neural Core
On October 15, 2019, Google announced the Pixel 4 smartphone, which contains an Edge TPU called the Pixel Neural Core. Google describes it as "customized to meet the requirements of key camera features in Pixel 4", using a neural architecture search that sacrifices some accuracy in favor of minimizing latency and power use.[41]
Google Tensor
Google followed the Pixel Neural Core by integrating an Edge TPU into a custom system-on-chip named Google Tensor, which was released in 2021 with the Pixel 6 line of smartphones.[42] The Google Tensor SoC demonstrated "extremely large performance advantages over the competition" in machine learning-focused benchmarks; although instantaneous power consumption also was relatively high, the improved performance meant less energy was consumed due to shorter periods requiring peak performance.[43]
Lawsuit
In 2019, Singular Computing, founded in 2009 by Joseph Bates, a visiting professor at MIT,[44] filed suit against Google alleging patent infringement in TPU chips.[45] By 2020, Google had successfully lowered the number of claims the court would consider to just two: claim 53 of US patent 8407273 filed in 2012 and claim 7 of US patent 9218156 filed in 2013, both of which claim a dynamic range of 10⁻⁶ to 10⁶ for floating-point numbers, which the standard float16 cannot represent (without resorting to subnormal numbers) as it has only five bits for the exponent. In a 2023 court filing, Singular Computing specifically called out Google's use of bfloat16, as that exceeds the dynamic range of float16.[46] Singular claims non-standard floating-point formats were non-obvious in 2009, but Google retorts that the VFLOAT[47] format, with a configurable number of exponent bits, existed as prior art in 2002.[48] As of January 2024, subsequent lawsuits by Singular have brought the number of patents being litigated up to eight.
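The range argument above comes down to exponent width: float16's five exponent bits give a normal range of roughly 6.1 × 10⁻⁵ to 6.5 × 10⁴, which misses both ends of the claimed 10⁻⁶ to 10⁶ range, while bfloat16's eight exponent bits give it float32's much wider range. A quick illustrative check (stock NumPy has no bfloat16 type, so float32 stands in for its shared exponent range):

```python
import numpy as np

# float16: 5 exponent bits -> smallest normal ~6.1e-5, largest ~6.55e4,
# so it cannot cover 1e-6 .. 1e6 without resorting to subnormals.
print(np.finfo(np.float16).max)   # 65504.0
print(np.finfo(np.float16).tiny)  # smallest normal, ~6.104e-05

# bfloat16 shares float32's 8 exponent bits, so its normal range
# matches float32's (~1.18e-38 .. ~3.4e38) and easily covers 1e-6 .. 1e6.
print(np.finfo(np.float32).max)   # ~3.4028e+38
print(np.finfo(np.float32).tiny)  # ~1.1755e-38
```

In other words, float16 falls short of the patents' claimed range at both extremes, which is why the filings single out Google's adoption of the wider-exponent bfloat16 format.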
See also
- Cognitive computer
- AI accelerator
- Structure tensor, a mathematical foundation for TPUs
- Tensor Core, a similar architecture by Nvidia
- TrueNorth, a similar device simulating spiking neurons instead of low-precision tensors
- Vision processing unit, a similar device specialised for vision processing
References
- ↑ "Cloud Tensor Processing Units (TPUs)". https://cloud.google.com/tpu/docs/tpus.
- ↑ Armasu, Lucian (2016-05-19). "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated)". http://www.tomshardware.com/news/google-tensor-processing-unit-machine-learning,31834.html.
- ↑ 3.0 3.1 3.2 Jouppi, Norm (May 18, 2016). "Google supercharges machine learning tasks with TPU custom chip" (in en-US). https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html.
- ↑ 4.0 4.1 4.2 4.3 "Google's Tensor Processing Unit explained: this is what the future of computing looks like" (in en). TechRadar. http://www.techradar.com/news/computing-components/processors/google-s-tensor-processing-unit-explained-this-is-what-the-future-of-computing-looks-like-1326915.
- ↑ Wang, Yu Emma; Wei, Gu-Yeon; Brooks, David (2019-07-01). "Benchmarking TPU, GPU, and CPU Platforms for Deep Learning". arXiv:1907.10701 [cs.LG].
- ↑ "TensorFlow: Open source machine learning". "It is machine learning software being used for various kinds of perceptual and language understanding tasks" — Jeffrey Dean, minute 0:47/2:17, from YouTube clip.
- ↑ "Google Makes Its Special A.I. Chips Available to Others" (in en). The New York Times. https://www.nytimes.com/2018/02/12/technology/google-artificial-intelligence-chips.html.
- ↑ McGourty, Colin (6 December 2017). "DeepMind's AlphaZero crushes chess" (in en). https://chess24.com/en/read/news/deepmind-s-alphazero-crushes-chess.
- ↑ "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future" (in en). PCWorld. http://www.pcworld.com/article/3072256/google-io/googles-tensor-processing-unit-said-to-advance-moores-law-seven-years-into-the-future.html.
- ↑ "Frequently Asked Questions | Cloud TPU" (in en). https://cloud.google.com/tpu/docs/faq.
- ↑ "Google Colaboratory" (in en). https://colab.research.google.com/notebooks/tpu.ipynb.
- ↑ "Use TPUs | TensorFlow Core" (in en). https://www.tensorflow.org/guide/tpu.
- ↑ Jouppi, Norman P.; Yoon, Doe Hyun; Ashcraft, Matthew; Gottscho, Mark (June 14, 2021). "Ten lessons from three generations that shaped Google's TPUv4i". International Symposium on Computer Architecture. Valencia, Spain. doi:10.1109/ISCA52012.2021.00010. ISBN 978-1-4503-9086-6. https://conferences.computer.org/iscapub/pdfs/ISCA2021-4ghucdBnCWYB7ES2Pe4YdT/333300a001/333300a001.pdf.
- ↑ 14.0 14.1 "System Architecture | Cloud TPU" (in en). https://cloud.google.com/tpu/docs/system-architecture-tpu-vm.
- ↑ 15.0 15.1 15.2 Kennedy, Patrick (22 August 2017). "Case Study on the Google TPU and GDDR5 from Hot Chips 29". Serve The Home. https://www.servethehome.com/case-study-google-tpu-gddr5-hot-chips-29/.
- ↑ Stay tuned, more information on TPU v4 is coming soon, retrieved 2020-08-06.
- ↑ 17.0 17.1 Cloud TPU v5e Inference Public Preview, retrieved 2023-11-06.
- ↑ 18.0 18.1 Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh et al. (June 26, 2017). "In-Datacenter Performance Analysis of a Tensor Processing Unit™". Toronto, Canada.
- ↑ 19.0 19.1 19.2 Bright, Peter (17 May 2017). "Google brings 45 teraflops tensor flow processors to its compute cloud". Ars Technica. https://arstechnica.com/information-technology/2017/05/google-brings-45-teraflops-tensor-flow-processors-to-its-compute-cloud/.
- ↑ Kennedy, Patrick (17 May 2017). "Google Cloud TPU Details Revealed". Serve The Home. https://www.servethehome.com/google-cloud-tpu-details-revealed/.
- ↑ Frumusanu, Andre (8 May 2018). "Google I/O Opening Keynote Live-Blog". https://www.anandtech.com/show/12726/google-io-keynote-liveblog-10am-pt.
- ↑ Feldman, Michael (11 May 2018). "Google Offers Glimpse of Third-Generation TPU Processor". Top 500. https://www.top500.org/news/google-offers-glimpse-of-third-generation-tpu-processor/.
- ↑ Teich, Paul (10 May 2018). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/.
- ↑ "Google Launches TPU v4 AI Chips". 20 May 2021. https://www.hpcwire.com/2021/05/20/google-launches-tpu-v4-ai-chips/.
- ↑ Kennedy, Patrick (2023-08-29). "Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network". https://www.servethehome.com/google-details-tpuv4-and-its-crazy-optically-reconfigurable-ai-network/.
- ↑ "Why did Google develop its own TPU chip? In-depth disclosure of team members". 2021-10-20. https://www.censtry.com/blog/why-did-google-develop-its-own-tpu-chip-in-depth-disclosure-of-team-members.html.
- ↑ Mirhoseini, Azalia; Goldie, Anna (2021-06-01). "A graph placement methodology for fast chip design". Nature 594 (7962): 207–212. doi:10.1038/s41586-022-04657-6. PMID 35361999. http://176.9.41.242/doc/reinforcement-learning/model/2021-mirhoseini.pdf. Retrieved 2023-06-04.
- ↑ Jouppi, Norman (2023). "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings". arXiv:2304.01433 [cs.AR]. "The appropriate H100 match would be a successor to TPU v4 widely deployed in a similar time frame and technology (e.g., in 2023 and 4nm)."
- ↑ "Expanding our AI-optimized infrastructure portfolio: Introducing Cloud TPU v5e and announcing A3 GA". Google. 2023-08-29. https://cloud.google.com/blog/products/compute/announcing-cloud-tpu-v5e-and-a3-gpus-in-ga.
- ↑ "Cloud TPU" (in en). https://cloud.google.com/tpu.
- ↑ "Edge TPU performance benchmarks" (in en-us). https://coral.ai/docs/edgetpu/benchmarks/.
- ↑ "Dev Board" (in en-us). https://coral.ai/products/dev-board.
- ↑ "System-on-Module (SoM)" (in en-us). https://coral.ai/products/som.
- ↑ "Bringing intelligence to the edge with Cloud IoT" (in en-US). 2018-07-25. https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/.
- ↑ "Retrain an image classification model on-device". https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/.
- ↑ "組込み総合技術展&IoT総合技術展「ET & IoT Technology 2019」に出展することを発表" [Announcing exhibition at the combined embedded and IoT technology trade show "ET & IoT Technology 2019"] (in ja-JP). https://www.asus.com/jp/News/jr4skqts65jsuggg.
- ↑ Shilov, Anton. "ASUS & Google Team Up for 'Tinker Board' AI-Focused Credit-Card Sized Computers". https://www.anandtech.com/show/15095/asus-google-team-up-for-tinker-board-aifocused-creditcard-sized-computers.
- ↑ Aufranc, Jean-Luc (2019-05-29). "ASUS Tinker Edge T & CR1S-CM-A SBC to Feature Google Coral Edge TPU & NXP i.MX 8M Processor" (in en-US). https://www.cnx-software.com/2019/05/29/asus-tinker-edge-t-cr1s-cm-a-sbc-google-coral-edge-tpu-nxp-i-mx-8m-processor.
- ↑ "New Coral products for 2020" (in en). https://developers.googleblog.com/2020/01/new-coral-products-for-2020.html.
- ↑ "Accelerator Module" (in en-us). https://coral.ai/products/accelerator-module.
- ↑ "Introducing the Next Generation of On-Device Vision Models: MobileNetV3 and MobileNetEdgeTPU" (in en). http://ai.googleblog.com/2019/11/introducing-next-generation-on-device.html.
- ↑ "Improved On-Device ML on Pixel 6, with Neural Architecture Search". November 8, 2021. https://ai.googleblog.com/2021/11/improved-on-device-ml-on-pixel-6-with.html.
- ↑ Frumusanu, Andrei (November 2, 2021). "Google's Tensor inside of Pixel 6, Pixel 6 Pro: A Look into Performance & Efficiency | Google's IP: Tensor TPU/NPU". AnandTech. https://www.anandtech.com/show/17032/tensor-soc-performance-efficiency/5.
- ↑ Hardesty, Larry (2011-01-03). "The surprising usefulness of sloppy arithmetic". MIT. https://news.mit.edu/2010/fuzzy-logic-0103.
- ↑ Bray, Hiawatha (2024-01-10). "Local inventor challenges Google in billion-dollar patent fight". Boston Globe (Boston). https://www.bostonglobe.com/2024/01/10/business/local-inventor-challenges-google-billion-dollar-patent-fight/.
- ↑ "SINGULAR COMPUTING LLC, Plaintiff, v. GOOGLE LLC, Defendant: Amended Complaint for Patent Infringement". RPX Corporation. 2020-03-20. https://www.pacermonitor.com/public/filings/DCYQIDDI/Singular_Computing_LLC_v_Google_LLC__madce-24-10008__0001.0.pdf.
- ↑ Wang, Xiaojun; Leeser, Miriam (2010-09-01). "VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware". ACM Transactions on Reconfigurable Technology and Systems 3 (3): 1-34. doi:10.1145/1839480.1839486. https://dl.acm.org/doi/abs/10.1145/1839480.1839486. Retrieved 2024-01-10.
- ↑ "Singular Computing LLC v. Google LLC". 2023-04-06. https://casetext.com/case/singular-computing-llc-v-google-llc-1.
External links
- Cloud Tensor Processing Units (TPUs) (Documentation from Google Cloud)
- Photo of Google's TPU chip and board
- Photo of Google's TPU v2 board
- Photo of Google's TPU v3 board
- Photo of Google's TPU v2 pod
Original source: https://en.wikipedia.org/wiki/Tensor_Processing_Unit