Engineering:Teraflops Research Chip

From HandWiki
Revision as of 16:45, 4 February 2024 by NBrushPhys (talk | contribs) (update)
Teraflops Research Chip
General Info
Launched: 2006
Designed by: Intel Tera-Scale Computing Research Program
Performance
Max. CPU clock rate: 5.67 GHz
Data width: 38-bit
Architecture and classification
Instruction set: 96-bit VLIW
Physical specifications
Transistors: 100,000,000
Cores: 80
Socket(s): custom 1248-pin LGA (343 signal pins)
History
Successor: Xeon Phi

The Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.[2][3][4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.[4]

The processor was initially presented at the Intel Developer Forum on September 26, 2006[5] and officially announced on February 11, 2007.[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.[2]

Architecture

The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz.[nb 1] Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces; the router has a bandwidth of 80 GB/s and a latency of 1.25 ns at 4 GHz.[2] The processing engine in each tile contains two independent, 9-stage pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.[3] Each FPMAC unit can perform 2 single-precision floating-point operations per cycle, so each tile has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.[3] The custom instruction set includes instructions to send and receive packets to and from the chip's network, as well as instructions for putting a particular tile to sleep and waking it.[4]

Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D stacked, bringing memory nearer to the processor and increasing overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB.[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.[8]
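
The per-tile and whole-chip peak figures quoted above follow from simple arithmetic, which can be sketched as below. This is only an illustrative calculation; the function and constant names are made up for this example and do not come from Intel's documentation.

```python
# Illustrative arithmetic only; names are hypothetical, figures are from the text.
TILES = 80                 # 10 x 8 mesh of tiles
FPMACS_PER_TILE = 2        # two FPMAC units per processing engine
FLOPS_PER_FPMAC_CYCLE = 2  # one multiply plus one accumulate per cycle
SRAM_KB_PER_TILE = 256     # stacked SRAM module under each tile


def tile_peak_gflops(freq_ghz: float) -> float:
    """Peak single-precision GFLOPS of one tile at the given clock."""
    return freq_ghz * FPMACS_PER_TILE * FLOPS_PER_FPMAC_CYCLE


def chip_peak_tflops(freq_ghz: float, active_tiles: int = TILES) -> float:
    """Peak TFLOPS of the chip with `active_tiles` tiles running."""
    return tile_peak_gflops(freq_ghz) * active_tiles / 1000.0


def stacked_sram_mb() -> float:
    """Total capacity of the stacked SRAM across all tiles, in MB."""
    return TILES * SRAM_KB_PER_TILE / 1024.0


if __name__ == "__main__":
    print(tile_peak_gflops(4.0))   # 16.0 GFLOPS per tile at 4 GHz
    print(chip_peak_tflops(4.0))   # 1.28 TFLOPS for all 80 tiles at 4 GHz
    print(stacked_sram_mb())       # 20.0 MB of stacked SRAM
```

The same formula gives roughly 1.0 TFLOPS at 3.16 GHz, which is consistent with the design goal stated in the introduction.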

Teraflops Research Chip's tile diagram.

Other prominent features of the Teraflops Research chip include its fine-grained power management with 21 independent sleep regions on a tile and dynamic tile sleep, and very high energy efficiency with 27 GFLOPS/W theoretical peak at 0.6 V and 19.4 GFLOPS/W actual for stencil at 0.75 V.[4][9]

Instruction types and their latency[4]
Instruction type    Latency (cycles)
FPMAC               9
LOAD/STORE          2
SEND/RECEIVE        2
JUMP/BRANCH         1
STALL/WFD           ?
SLEEP/WAKE          6
Application performance of Teraflops Research Chip[nb 2][4]
Application                     FLOP count   Average TFLOPS   % of peak TFLOPS   Active tiles
Stencil                         358K         1.00             73.3%              80
SGEMM (matrix multiplication)   2.63M        0.51             37.5%              80
Spreadsheet                     64.2K        0.45             33.2%              80
2D FFT                          196K         0.02             2.73%              64
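
The percent-of-peak column can be approximately recomputed from the average throughput and the 4.27 GHz operating point given in note 2. The sketch below assumes that the peak is taken over the active tiles only, which the table does not state; the function name is hypothetical. Because the published averages are rounded to two decimals, the recomputed values differ slightly from the table, and the 2D FFT row (0.02 TFLOPS) is too coarsely rounded to reproduce.

```python
# Rough re-derivation of the "% of peak" column; an approximation, not Intel's method.
def percent_of_peak(avg_tflops: float,
                    freq_ghz: float = 4.27,
                    active_tiles: int = 80) -> float:
    """Average throughput as a percentage of the peak of the active tiles.

    Peak assumes 2 FPMAC units per tile and 2 FLOP per FPMAC per cycle.
    """
    peak_tflops = freq_ghz * active_tiles * 2 * 2 / 1000.0
    return 100.0 * avg_tflops / peak_tflops


if __name__ == "__main__":
    # (application, average TFLOPS) for the rows that use all 80 tiles
    for name, avg in [("Stencil", 1.00), ("SGEMM", 0.51), ("Spreadsheet", 0.45)]:
        print(f"{name}: {percent_of_peak(avg):.1f}% of peak")
```
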
Experimental results of the Teraflops Research Chip[nb 3]
V_CC      f_max[nb 4]   Peak TFLOPS[nb 5]   Power[nb 6]   T        Source
0.60 V    1.0 GHz       0.32 TFLOPS         11 W          110 °C   [2]
0.675 V   1.0 GHz       0.32 TFLOPS         15.6 W        80 °C    [4]
0.70 V    1.5 GHz       0.48 TFLOPS         25 W          110 °C   [2]
0.70 V    1.35 GHz      0.43 TFLOPS         18 W          80 °C    [4]
0.75 V    1.6 GHz       0.51 TFLOPS         21 W          80 °C    [4]
0.80 V    2.1 GHz       0.67 TFLOPS         42 W          110 °C   [2]
0.80 V    2.0 GHz       0.64 TFLOPS         26 W          80 °C    [4]
0.85 V    2.4 GHz       0.77 TFLOPS         32 W          80 °C    [4]
0.90 V    2.6 GHz       0.83 TFLOPS         70 W          110 °C   [2]
0.90 V    2.85 GHz      0.91 TFLOPS         45 W          80 °C    [4]
0.95 V    3.16 GHz      1.0 TFLOPS          62 W          80 °C    [4]
1.00 V    3.13 GHz      1.0 TFLOPS          98 W          110 °C   [2]
1.00 V    3.8 GHz       1.22 TFLOPS         78 W          80 °C    [4]
1.05 V    4.2 GHz       1.34 TFLOPS         82 W          80 °C    [4]
1.10 V    3.5 GHz       1.12 TFLOPS         135 W         110 °C   [2]
1.10 V    4.5 GHz       1.44 TFLOPS         105 W         80 °C    [4]
1.15 V    4.8 GHz       1.54 TFLOPS         128 W         80 °C    [4]
1.20 V    4.0 GHz       1.28 TFLOPS         181 W         110 °C   [2]
1.20 V    5.1 GHz       1.63 TFLOPS         152 W         80 °C    [4]
1.25 V    5.3 GHz       1.70 TFLOPS         165 W         80 °C    [4]
1.30 V    4.4 GHz       1.39 TFLOPS         ?             110 °C   [2]
1.30 V    5.5 GHz       1.76 TFLOPS         210 W         80 °C    [4]
1.35 V    5.67 GHz      1.81 TFLOPS         230 W         80 °C    [4]
1.40 V    4.8 GHz       1.52 TFLOPS         ?             110 °C   [2]
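
The table above can be used to estimate how peak energy efficiency varies with supply voltage. The sketch below divides peak throughput by power for a handful of the 80 °C rows; it is a derived, peak-based estimate (several inputs were read from plots), not a measured workload efficiency, and the helper names are hypothetical.

```python
# (V_CC in volts, peak TFLOPS, power in watts), copied from selected 80 °C rows above.
ROWS = [
    (0.675, 0.32, 15.6),
    (0.75, 0.51, 21.0),
    (0.95, 1.00, 62.0),
    (1.20, 1.63, 152.0),
    (1.35, 1.81, 230.0),
]


def peak_gflops_per_watt(peak_tflops: float, power_w: float) -> float:
    """Peak throughput divided by power, in GFLOPS per watt."""
    return peak_tflops * 1000.0 / power_w


def most_efficient(rows):
    """Return the row with the highest peak GFLOPS/W."""
    return max(rows, key=lambda r: peak_gflops_per_watt(r[1], r[2]))


if __name__ == "__main__":
    for vcc, tflops, watts in ROWS:
        eff = peak_gflops_per_watt(tflops, watts)
        print(f"{vcc:.3f} V: {eff:5.1f} GFLOPS/W")
```

Among these rows, efficiency is highest at the low-voltage end and falls steadily as voltage and frequency rise, which matches the emphasis on low-voltage operation in the efficiency figures quoted earlier.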

Issues

To help software development for the exotic new architecture, Intel created a programming model called Ct specifically for the chip. The model never gained the following Intel had hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.

Notes

  1. Though the chip was later shown by Intel to run as high as 5.67 GHz.
  2. At 1.07 V and 4.27 GHz.
  3. All measurements present performance with all 80 cores active.
  4. Substantially higher frequencies at the same voltages (compared to the initial ISSCC report) were attained in 2008 with use of a custom cooling solution.
  5. Values in italic were extrapolated by $\text{FLOPS}_{peak} = f_{max} \cdot 80\ \text{tiles} \cdot 2\ \tfrac{\text{FPMAC}}{\text{tile}} \cdot 2\ \tfrac{\text{FLOP}}{\text{FPMAC}\cdot\text{cycle}}$, where the maximal frequency was manually extracted from plots and is thus only approximate.
  6. Values in italic were manually extracted from plots and are thus only approximate in their nature.

References

  1. Intel Corporation. "Teraflops Research Chip". http://techresearch.intel.com/articles/Tera-Scale/1449.htm. 
  2. Vangal, Sriram; Howard, Jason; Ruhl, Gregory; Dighe, Saurabh; Wilson, Howard; Tschanz, James; Finan, David; Iyer, Priya et al. (2007). "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS". pp. 98–589. doi:10.1109/ISSCC.2007.373606. ISBN 978-1-4244-0852-8. https://ieeexplore.ieee.org/document/4242283. 
  3. Peh, Li-Shiuan; Keckler, Stephen W.; Vangal, Sriram (2009), Keckler, Stephen W.; Olukotun, Kunle; Hofstee, H. Peter, eds., "On-Chip Networks for Multicore Systems", Multicore Processors and Systems (Springer US): pp. 35–71, doi:10.1007/978-1-4419-0263-4_2, ISBN 978-1-4419-0262-7, Bibcode: 2009mps..book...35P, http://link.springer.com/10.1007/978-1-4419-0263-4_2, retrieved 2020-05-14 
  4. Vangal, S.R.; Howard, J.; Ruhl, G.; Dighe, S.; Wilson, H.; Tschanz, J.; Finan, D.; Singh, A. et al. (2008). "An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS". IEEE Journal of Solid-State Circuits 43 (1): 29–41. doi:10.1109/JSSC.2007.910957. ISSN 0018-9200. Bibcode: 2008IJSSC..43...29V. https://ieeexplore.ieee.org/document/4443212. 
  5. "Intel Develops Tera-Scale Research Chips". 2006. https://www.intel.com/pressroom/archive/releases/2006/20060926corp_b.htm. 
  6. Intel Corporation (February 11, 2007). "Intel Research Advances 'Era Of Tera'". http://www.intel.com/pressroom/archive/releases/20070204comp.htm. 
  7. Bautista, Jerry (2008). "Tera-scale computing and interconnect challenges - 3D stacking considerations". 2008 IEEE Hot Chips 20 Symposium (HCS). Stanford, CA, USA: IEEE. pp. 1–34. doi:10.1109/HOTCHIPS.2008.7476514. ISBN 978-1-4673-8871-9. https://ieeexplore.ieee.org/document/7476514. 
  8. Intel's Teraflops Research Chip. Intel Corporation. 2007. http://download.intel.com/pressroom/kits/Teraflops/Teraflops_Research_Chip_Overview.pdf. 
  9. Fossum, Tryggve (2007). "High End MPSOC - The Personal Super Computer". MPSoC Conference 2007. p. 6. https://en.wikichip.org/w/images/0/0b/intel_mpsoc_2007.pdf.