Comparison of CPU microarchitectures

From HandWiki

The following is a comparison of CPU microarchitectures.

Microarchitecture Year Pipeline stages Misc Process node & fabs Out-of-order execution & ROB size Superscalar processor
MCST Elbrus-8S 2014 ??? VLIW, Elbrus (proprietary, closed) version 5, 64-bit 28 nm TSMC ??? ???
MCST Elbrus-8SV 2018 ??? VLIW, Elbrus (proprietary, closed) version 5, 64-bit 28 nm TSMC ??? ???
MCST Elbrus-16S ??? ??? ??? ??? ??? ???
AMD K5 1996 5 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[lower-alpha 1] Yes 16 Yes
AMD K6 1997 6 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[lower-alpha 2] Yes 24 Yes
AMD K6-III 1999 Branch prediction, speculative execution, out-of-order execution[1] Yes ??? Yes
AMD K7 1999 Out-of-order execution, branch prediction, Harvard architecture Yes 72 Yes
AMD K8 2003 64-bit, integrated memory controller, 16 byte instruction prefetching Yes 72 Yes
AMD K10 2007 Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching Yes 72 Yes
ARM1 1985[2] ??? First implementation of the ARM architecture (designed by Acorn Computers), 32-bit, about 25,000 transistors[3][4], ARMv1 instruction set architecture 3000 nm[5] No No (8 MHz clock)
ARM2 ARMv2 instruction set architecture
ARM7TDMI (-S) 2001 3
ARM7EJ-S 2001 5
ARM810 5 static branch prediction, double-bandwidth memory
ARM9TDMI 1998 5
ARM1020E 6
XScale PXA210/PXA250 2002 7
ARM1136J(F)-S 8
ARM1156T2(F)-S 9
ARM Cortex-M3 2004 3
ARM Cortex-M1 2007 3
ARM Cortex-M0 2009 3
ARM Cortex-M4(F) 2010
ARM Cortex-M0+ 2012
ARM Cortex-M7(F) 2014
ARM Cortex-M23 (Grebe) 2016
ARM Cortex-A5 2009 8 Multi-core, single issue, in-order No Yes
ARM Cortex-A7 MPCore 2011 8 Partial dual-issue, in-order, 2-way set associative level 1 instruction cache No Yes
ARM Cortex-A8 2005 13 Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode 65/55/45 nm No Yes
ARM Cortex-A9 MPCore 2007 8–11 Out-of-order, speculative issue, superscalar 65/45/40/32/28 nm
ARM Cortex-A12 2014 8–11 28 nm
ARM Cortex-A15 MPCore 2010 15/17-25 Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar 32/28/20 nm Yes 60 Yes
ARM Cortex-A17 MPCore 2014 11+ 28 nm
ARM Cortex-A32 2014 8 28 nm No Yes
ARM Cortex-A34 2019 8 No Yes
ARM Cortex-A35 2017 8 28 / 16 / 14 / 10 nm No Yes
ARM Cortex-A53 2012 Partial dual-issue, in-order 28 / 20 / 16 / 14 / 12 / 10 / 4 nm & 6nm TSMC No Yes
ARM Cortex-A55 2017 8 in-order, speculative execution 6nm TSMC & 4nm Samsung 4LPX & 8nm Samsung 8LPP & 10nm Samsung 10LPP No Yes
ARM Cortex-A510 2021 in-order, speculative execution No Yes
ARM Cortex-A520 2023 in-order, speculative execution No Yes
ARM Cortex-A320 2025 in-order, speculative execution No Yes
ARM Lumex C1-Nano (A530) 2025 in-order, speculative execution No Yes
ARM Cortex-A57 2012 Deeply out-of-order, wide multi-issue, 3-way superscalar Yes 128 Yes
ARM Cortex-A72 2015 Out-of-order superscalar Yes 128 Yes
ARM Cortex-A73 2016 Out-of-order superscalar Yes 128 Yes
ARM Cortex-A75 2017 11–13 Out-of-order superscalar, speculative execution, register renaming, 3-way Samsung 10LPP 10nm Yes 128 Yes
ARM Cortex-A76 2018 13 Out-of-order superscalar, 4-way pipeline decode Yes 128 Yes
ARM Cortex-A77 2019 13 Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache Yes 160 Yes
ARM Cortex-A78 2020 14 Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instructions per cycle, branch prediction, L3 cache 6 nm TSMC & Samsung 4LPX 4 nm Yes 160 Yes
ARM Cortex-A710 2021 10 Yes 160 Yes
ARM Cortex-A715 2022 Yes 192 Yes
ARM Cortex-A720 2023 Yes 192 Yes
ARM Cortex-A725 2024 Yes 224 Yes
ARM C1-Pro (A730) 2025 Yes 220 Yes
ARM C1-Premium 2025 Yes
ARM Cortex-X1 2020 13 5-wide decode out-of-order superscalar, L3 cache Yes 224 Yes
ARM Cortex-X2 2021 10 Yes 288 Yes
ARM Cortex-X3 2022 9 Yes 320 Yes
ARM Cortex-X4 2023 10 Yes 384 Yes
ARM Cortex-X925 2024 Yes 768 Yes
ARM C1-Ultra (X930) 2025 Yes 960 Yes
Cavium ThunderX 2014
Cavium ThunderX2 2018
Apple A6 Swift 2012 12 32nm Yes 45 Yes
Apple Cyclone 2013 28nm Yes 192 Yes
Apple Typhoon 2014 20nm Yes ??? Yes
Apple Twister 2015 16 / 14 nm Yes ??? Yes
Apple Hurricane 2016 16 / 10 nm Yes ??? Yes
Apple Zephyr 2016 16 / 10 nm Yes ??? Yes
Apple Monsoon 2017 10 nm Yes ??? Yes
Apple Mistral 2017 10 nm Yes ??? Yes
Apple Vortex 2018 7 nm Yes ??? Yes
Apple Tempest 2018 7 nm Yes ??? Yes
Apple Lightning 2019 7 nm Yes 560 ? Yes
Apple Thunder 2019 7 nm Yes ??? Yes
Apple Firestorm 2020 Yes 330 Coalesced retire queue Yes
Apple Icestorm 2020 Yes 60 Coalesced retire queue Yes
Apple Avalanche 2021 Yes 293 Coalesced retire queue Yes
Apple Blizzard 2021 Yes 78 Coalesced retire queue Yes
Apple Everest 2022 Yes 270 Coalesced retire queue Yes
Apple Sawtooth 2022 Yes 108 Coalesced retire queue Yes
Apple A17 P 2023 Yes 321 Coalesced retire queue Yes
Apple A17 E 2023 Yes 122 Coalesced retire queue Yes
Apple A18 P 2024 Yes 361 Coalesced retire queue Yes
Apple A19 P 2025 Yes 430 Coalesced retire queue Yes
Apple A19 E 2025 Yes 157 Coalesced retire queue Yes
Qualcomm Scorpion 2008 10 65/45 nm
Qualcomm Krait 2012 11 28 nm Yes
Qualcomm Oryon 2024 Yes 650 Yes
Qualcomm OryonV3 2025 3nm TSMC Yes 650 Yes
AVR32 AP7 7
AVR32 UC3 3 Harvard architecture
AMD Bobcat 2011 Out-of-order execution Yes 56 Yes
Bulldozer 2011 20 Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, Virtualization, Turbo Core, FlexFPU which uses simultaneous multithreading[6] Yes 128 Yes
Piledriver 2012 Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, Virtualization, FlexFPU which uses simultaneous multithreading,[6] up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core
AMD Jaguar 2013 Multi-core, branch prediction Yes 64 Yes
AMD Steamroller 2014 Multi-core, branch prediction, superscalar, out-of-order execution Yes 108 Yes
AMD Excavator 2015 20 Multi-core
AMD Zen 2017 19 Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache Yes 192 Yes
AMD Zen+ 2018 19 Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
AMD Zen 2 2019 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache 7 nm TSMC Yes 224 Yes
AMD Zen 3 2020 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache Yes 256 Yes
AMD Zen 4 2022 ??? Multi-chip module, multi-core, superscalar, L3 cache Yes 320 Yes
AMD Zen 5 2024 ??? Multi-chip module, multi-core, superscalar, L3 cache Yes 448 Yes
AMD Zen 6 2026–2027 (expected) ??? Superscalar, out-of-order execution 2 nm TSMC Yes ??? Yes
Ampere Ampere 1 2022 Superscalar, server CPU Yes 174 Yes
Crusoe 2000 In-order execution, 128-bit VLIW, integrated memory controller
Efficeon 2004 In-order execution, 256-bit VLIW, fully integrated memory controller
Cyrix Cx5x86 1995 6[7] Branch prediction
Cyrix 6x86 1996 Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution
DLX 5
eSi-3200 5 In-order, speculative issue
eSi-3250 5 In-order, speculative issue
EV4 (Alpha 21064) Superscalar
EV7 (Alpha 21364) Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller
EV8 (Alpha 21464) Superscalar design with out-of-order execution
65k Ultra-low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds
P5 (Pentium) 1993 5 Superscalar Maybe? Yes
P6 (Pentium Pro) 1995 14 Speculative execution, register renaming, superscalar design with out-of-order execution
P6 (Pentium II) 1997 14[8] Branch prediction
P6 (Pentium III) 1999 14[8]
Intel Itanium "Merced" 2001 Single core, L3 cache
Intel Itanium 2 "McKinley" 2002 11[9] Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology
Intel NetBurst (Willamette) 2000 20 2-way simultaneous multithreading (Hyper-threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order
NetBurst (Northwood) 2002 20 2-way simultaneous multithreading
NetBurst (Prescott) 2004 31 2-way simultaneous multithreading
NetBurst (Cedar Mill) 2006 31 2-way simultaneous multithreading
Intel Core 2006 12 Multi-core, out-of-order, 4-way superscalar Yes ??? Yes
Intel Atom 16 2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming No Yes?
Intel Atom Oak Trail 2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache
Intel Atom Bonnell 2008 SMT No Yes 2 IPC
Intel Atom Silvermont 2013 Out-of-order execution 22 nm Yes 32 Yes
Intel Atom Goldmont 2016 Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache Yes 78 Yes
Intel Atom Goldmont Plus 2017 Multi-core Yes 92 Yes
Intel Atom Tremont 2019 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Intel Atom Gracemont 2021 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming Yes 256 Yes
Intel Atom Crestmont 2023 Multi-core Yes 256 Yes
Intel Atom Skymont 2024 Multi-core
Intel Atom Darkmont 2025 Multi-core
Nehalem 2008 14 2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Sandy Bridge 2011 14 2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, Turbo Boost
Intel Haswell 2013 14–19 SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme)
Broadwell 2014 14–19 Multi-core, multithreading
Skylake 2015 14–19 Multi-core, L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models. On-package PCH on U, Y, m3, m5 and m7 models. 5 wide superscalar/5 issues.
Kaby Lake 2016 14–19 Multi-core, L4 cache on certain low and ultra low power models (Kaby Lake-U and Kaby Lake-Y)
Intel Sunny Cove 2019 14–20 Multicore, 2-way simultaneous multithreading, large out-of-order engine, 5-wide superscalar/5-issue
Intel Cypress Cove 2021 12–14 Multicore, 5-wide superscalar/6-issue, large out-of-order engine, big-core design
Intel Willow Cove 2020 Multicore, SMT
Intel Golden Cove 2021 12–14 Multicore, SMT, 6-wide superscalar, large out-of-order engine, big-core design Yes 512 Yes
Intel Redwood Cove 2023 Multicore, SMT
Intel Lion Cove 2024 12 Multicore, without SMT, 8 wide decoder, big core.
Intel Cougar Cove 2025
Intel Xeon Phi 7120x 2013 7-stage integer, 6-stage vector Multi-core, multithreading, 4 hardware-based simultaneous threads per core which, unlike regular Hyper-Threading, cannot be disabled, time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, floating-point accelerator, 512-bit wide vector FPU
LatticeMico32 2006 6 Harvard architecture
Nvidia Denver 2014 Multicore, superscalar, 2-way decode, L2 Maybe? Yes
Nvidia Denver 2 2016
Nvidia Carmel 2018 Multicore, 10-way superscalar, L3 Maybe? Yes
POWER1 1990 Superscalar, out-of-order execution Yes ??? Yes
POWER3 1998 Superscalar, out-of-order execution Yes ??? Yes
POWER4 2001 Superscalar, speculative execution, out-of-order execution Yes ??? Yes
POWER5 2004 2-way simultaneous multithreading, out-of-order execution, integrated memory controller Yes ??? Yes
IBM POWER6 2007 2-way simultaneous multithreading, in-order execution, up to 5 GHz No Yes
IBM POWER7+ Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit Yes ??? Yes
IBM POWER8 2013 15–23 Superscalar, L4 cache Maybe Yes
IBM POWER9 2017 12–16 Superscalar, out-of-order execution, L4 cache
IBM Power10 2021 Superscalar ??? Yes
IBM Power Processing Unit 2005 Superscalar, in-order execution No Yes
IBM Cell 2006 Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution No Yes
IBM Cyclops64 Multi-core, multithreading, 2 threads per core, in-order No Yes
IBM zEnterprise zEC12 2012 15/16/17 Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache Yes ??? Yes
IBM A2 15 Multi-core, 4-way simultaneous multithreading
Sony Emotion Engine 2000 6 Designed for the PlayStation 2, superscalar, in-order execution 250 nm No Yes
PowerPC 401 1996 3
PowerPC 405 1998 5
PowerPC 440 1999 7
PowerPC 470 2009 9 Symmetric multiprocessing (SMP)
PowerPC e300 4 Superscalar, branch prediction Maybe Yes
PowerPC e500 Dual 7 stage Multi-core
PowerPC e600 3-issue 7 stage Superscalar out-of-order execution, branch prediction Yes ??? Yes
PowerPC e5500 2010 4-issue 7 stage Out-of-order, multi-core Yes ??? Yes
PowerPC e6500 2012 Multi-core
PowerPC 603 4 5 execution units, branch prediction, no SMP
PowerPC 603q 1996 5 In-order No Maybe
PowerPC 604 1994 6 Superscalar, out-of-order execution, 6 execution units, SMP support Yes ??? Yes
PowerPC 620 1997 5 Out-of-order execution, SMP support Yes ??? Yes
PWRficient PA6T 2007 Superscalar, out-of-order execution, 6 execution units Yes ??? Yes
R4000 1991 8 Scalar
StrongARM SA-110 1996 5 Scalar, in-order No No?
SuperH SH2 5
SuperH SH2A 2006 5 Superscalar, Harvard architecture Maybe? Yes
SPARC Superscalar ??? Yes
hyperSPARC 1993 Superscalar ??? Yes
SuperSPARC 1992 Superscalar, in-order No Yes
SPARC64 VI/VII/VII+ 2007 Superscalar, out-of-order[10] Yes ??? Yes
UltraSPARC 1995 9
UltraSPARC T1 2005 6 Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU No Maybe?
UltraSPARC T2 2007 8 Open source, multithreading, multi-core, 8 threads per core
SPARC T3 2010 8 Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator
Oracle SPARC T4 2011 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order execution, 4 MB L3 cache, hardware random number generator Yes ??? Yes
Oracle SPARC T5 2013 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, RAS features, 16 cryptography units per chip, hardware random number generator Yes ??? Yes
Oracle SPARC M5 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order execution, 48 MB L3 cache, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator Yes ??? Yes
Fujitsu SPARC64 X Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 24 MB L2 cache, RAS features Yes ??? Yes
Imagination Technologies MIPS Warrior
VIA C7 2005 In-order execution No Maybe?
VIA Nano (Isaiah) 2008 Superscalar out-of-order execution, branch prediction, 7 execution units Yes ??? Yes
WinChip 1997 4 In-order execution No Maybe?

Notes

  1. According to AMD's K5 data sheet. The design incorporates many ideas and functional parts from AMD's Am29000 32-bit RISC microprocessor design.
  2. According to AMD's K6 data sheet. The design is based on NexGen's Nx686 and is therefore not a direct successor to the K5.

References

  1. "Products We Design". amd.com. https://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_1260_1288%5E1295,00.html. Retrieved 19 January 2014. 
  2. "Almost every smartphone uses a processor based on the ARM1 chip created in 1985." & "The Berkeley and Stanford research papers on RISC inspired the ARM designers to choose a RISC design." around 1980
  3. "The ARM1 is much smaller and contained 25,000 transistors compared to 275,000 in the 386."
  4. "The chip has 26 address lines, allowing it to access 64MB of memory, and has 32 data lines, allowing it to read or write 32 bits at a time."
  5. "The 386 was higher density, with a 1.5 micron process compared to 3 micron for the ARM1."
  6. "wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer". cdn3.wccftech.com. http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg. Retrieved 19 January 2014.
  7. Kozierok, Charles M. (17 April 2001). "Cyrix 5x86 ("M1sc")". http://www.pcguide.com/ref/cpu/fam/g4C5x86-c.html. Retrieved 19 January 2014. 
  8. "Computer Science 246: Computer Architecture". Harvard University. http://www.eecs.harvard.edu/cs246/lectures/cs246-MOBROBP6R10K.pdf. Retrieved 23 December 2013. "P6 pipeline"
  9. Intel Itanium 2 Processor Hardware Developer's Manual. p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf (2002) Retrieved 28 November 2011
  10. "Multi Core Processor SPARC64 Series : Fujitsu Global". fujitsu.com. http://www.fujitsu.com/global/services/computing/server/sparcenterprise/technology/performance/processor.html. Retrieved 19 January 2014.