Comparison of CPU microarchitectures

From HandWiki

The following is a comparison of CPU microarchitectures.

Microarchitecture Year Pipeline stages Misc
Elbrus-8S 2014 VLIW, Elbrus (proprietary, closed) version 5, 64-bit
AMD K5 1996 5 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[note 1]
AMD K6 1997 6 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[note 2]
AMD K6-III 1999 Branch prediction, speculative execution, out-of-order execution[1]
AMD K7 1999 Out-of-order execution, branch prediction, Harvard architecture
AMD K8 2003 64-bit, integrated memory controller, 16 byte instruction prefetching
AMD K10 2007 Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching
ARM7TDMI (-S) 2001 3
ARM7EJ-S 2001 5
ARM810 5 static branch prediction, double-bandwidth memory
ARM9TDMI 1998 5
ARM1020E 6
XScale PXA210/PXA250 2002 7
ARM1136J(F)-S 8
ARM1156T2(F)-S 9
ARM Cortex-A5 8 Multi-core, single issue, in-order
ARM Cortex-A7 MPCore 8 Partial dual-issue, in-order, 2-way set associative level 1 instruction cache
ARM Cortex-A8 2005 13 Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode
ARM Cortex-A9 MPCore 2007 8–11 Out-of-order, speculative issue, superscalar
ARM Cortex-A15 MPCore 2010 15 Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar
ARM Cortex-A53 2012 Partial dual-issue, in-order
ARM Cortex-A55 2017 8 in-order, speculative execution
ARM Cortex-A57 2012 Deeply out-of-order, wide multi-issue, 3-way superscalar
ARM Cortex-A72 2015
ARM Cortex-A73 2016 Out-of-order superscalar
ARM Cortex-A75 2017 11–13 Out-of-order superscalar, speculative execution, register renaming, 3-way
ARM Cortex-A76 2018 13 Out-of-order superscalar, 4-way pipeline decode
ARM Cortex-A77 2019 13 Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache
ARM Cortex-A78 2020 13 Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instruction per cycle, branch prediction, L3 cache
ARM Cortex-A710 2021 10
ARM Cortex-X1 2020 13 5-wide decode out-of-order superscalar, L3 cache
ARM Cortex-X2 2021 10
ARM Cortex-X3 2022 9
ARM Cortex-X4 2023 10
AVR32 AP7 7
AVR32 UC3 3 Harvard architecture
Bobcat 2011 Out-of-order execution
Bulldozer 2011 20 Shared multithreaded L2 cache, multithreading, multi-core, roughly 20-stage pipeline, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, virtualization, Turbo Core, FlexFPU which uses simultaneous multithreading[2]
Piledriver 2012 Shared multithreaded L2 cache, multithreading, multi-core, roughly 20-stage pipeline, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, virtualization, FlexFPU which uses simultaneous multithreading,[2] up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core
Steamroller 2014 Multi-core, branch prediction
Excavator 2015 20 Multi-core
Zen 2017 19 Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache
Zen+ 2018 19 Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 2 2019 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 3 2020 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache
Zen 4 2022 Multi-chip module, multi-core, superscalar, L3 cache
Crusoe 2000 In-order execution, 128-bit VLIW, integrated memory controller
Efficeon 2004 In-order execution, 256-bit VLIW, fully integrated memory controller
Cyrix Cx5x86 1995 6[3] Branch prediction
Cyrix 6x86 1996 Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution
DLX 5
eSi-3200 5 In-order, speculative issue
eSi-3250 5 In-order, speculative issue
EV4 (Alpha 21064) Superscalar
EV7 (Alpha 21364) Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller
EV8 (Alpha 21464) Superscalar design with out-of-order execution
65k Ultra-low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds
P5 (Pentium) 1993 5 Superscalar
P6 (Pentium Pro) 14 Speculative execution, register renaming, superscalar design with out-of-order execution
P6 (Pentium II) 14[4] Branch prediction
P6 (Pentium III) 1999 14[4]
Intel Itanium "Merced" 2001 Single core, L3 cache
Intel Itanium 2 "McKinley" 2002 11[5] Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology
Intel NetBurst (Willamette) 2000 20 2-way simultaneous multithreading (Hyper-Threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order
NetBurst (Northwood) 2002 20 2-way simultaneous multithreading
NetBurst (Prescott) 2004 31 2-way simultaneous multithreading
NetBurst (Cedar Mill) 2006 31 2-way simultaneous multithreading
Intel Core 2006 12 Multi-core, out-of-order, 4-way superscalar
Intel Atom 16 2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming
Intel Atom Oak Trail 2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache
Intel Atom Bonnell 2008 SMT
Intel Atom Silvermont 2013 Out-of-order execution
Intel Atom Goldmont 2016 Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache
Intel Atom Goldmont Plus 2017 Multi-core
Intel Atom Tremont 2019 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Intel Atom Gracemont 2021 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Nehalem 2008 14 2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Sandy Bridge 2011 14 2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, 2 threads per core, Turbo Boost
Intel Haswell 2013 14–19 SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme)
Broadwell 2014 14–19 Multi-core, multithreading
Skylake 2015 14–19 Multi-core, L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models, on-package PCH on U, Y, m3, m5 and m7 models, 5-wide superscalar/5-issue
Kaby Lake 2016 14–19 Multi-core, L4 cache on certain low and ultra-low power models (Kaby Lake-U and Kaby Lake-Y)
Intel Sunny Cove 2019 14–20 Multi-core, 2-way multithreading, large out-of-order execution engine, 5-wide superscalar/5-issue
Intel Cypress Cove 2021 14 Multi-core, 5-wide superscalar/6-issue, large out-of-order execution engine, big-core design
Intel Willow Cove 2020 Multicore
Intel Golden Cove 2021 Multicore
Intel Xeon Phi 7120x 2013 7-stage integer, 6-stage vector Multi-core, multithreading, 4 hardware-based simultaneous threads per core which, unlike regular Hyper-Threading, cannot be disabled, time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, floating-point accelerator, 512-bit wide vector FPU
LatticeMico32 2006 6 Harvard architecture
Nvidia Denver 2014 Multicore, superscalar, 2-way decode, L2
Nvidia Carmel 2018 Multicore, 10-way superscalar, L3
POWER1 1990 Superscalar, out-of-order execution
POWER3 1998 Superscalar, out-of-order execution
POWER4 2001 Superscalar, speculative execution, out-of-order execution
POWER5 2004 2-way simultaneous multithreading, out-of-order execution, integrated memory controller
IBM POWER6 2007 2-way simultaneous multithreading, in-order execution, up to 5 GHz
IBM POWER7+ Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit
IBM POWER8 2013 15–23 Superscalar, L4 cache
IBM POWER9 2017 12–16 Superscalar, out-of-order execution, L4 cache
IBM Power10 2021 Superscalar
IBM Cell 2006 Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution
IBM Cyclops64 Multi-core, multithreading, 2 threads per core, in-order
IBM zEnterprise zEC12 2012 15/16/17 Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache
IBM A2 15 multicore, 4-way simultaneous multithreaded
PowerPC 401 1996 3
PowerPC 405 1998 5
PowerPC 440 1999 7
PowerPC 470 2009 9 Symmetric multiprocessing (SMP)
PowerPC e300 4 Superscalar, branch prediction
PowerPC e500 Dual 7 stage Multi-core
PowerPC e600 3-issue 7 stage Superscalar out-of-order execution, branch prediction
PowerPC e5500 2010 4-issue 7 stage Out-of-order, multi-core
PowerPC e6500 2012 Multi-core
PowerPC 603 4 5 execution units, branch prediction, no SMP
PowerPC 603q 1996 5 In-order
PowerPC 604 1994 6 Superscalar, out-of-order execution, 6 execution units, SMP support
PowerPC 620 1997 5 Out-of-order execution, SMP support
PWRficient PA6T 2007 Superscalar, out-of-order execution, 6 execution units
R4000 1991 8 Scalar
StrongARM SA-110 1996 5 Scalar, in-order
SuperH SH2 5
SuperH SH2A 2006 5 Superscalar, Harvard architecture
SPARC Superscalar
hyperSPARC 1993 Superscalar
SuperSPARC 1992 Superscalar, in-order
SPARC64 VI/VII/VII+ 2007 Superscalar, out-of-order[6]
UltraSPARC 1995 9
UltraSPARC T1 2005 6 Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU
UltraSPARC T2 2007 8 Open source, multithreading, multi-core, 8 threads per core
SPARC T3 2010 8 Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator
Oracle SPARC T4 2011 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order, 4 MB L3 cache, hardware random number generator
Oracle SPARC T5 2013 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, RAS features, 16 cryptography units per chip, hardware random number generator
Oracle SPARC M5 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order, 48 MB L3 cache, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator
Fujitsu SPARC64 X Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 24 MB L2 cache, RAS features
Imagination Technologies MIPS Warrior
VIA C7 2005 In-order execution
VIA Nano (Isaiah) 2008 Superscalar out-of-order execution, branch prediction, 7 execution units
WinChip 1997 4 In-order execution
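
Each row of the table above carries up to four fields: microarchitecture, year, pipeline stages, and a list of features. As a minimal sketch of how such rows could be held in a program and queried, the Python below copies five rows from the table into records. The names `Microarchitecture`, `ROWS`, `stage_range`, and `deeper_than` are hypothetical helpers introduced only for illustration; they are not part of any existing library. The stage count is kept as text because the table mixes single values such as "5" with ranges such as "8–11".

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Microarchitecture:
    """One table row; the field names are illustrative assumptions."""
    name: str
    year: Optional[int]            # None where the table leaves the year blank
    stages: Optional[str]          # raw text, e.g. "5", "8–11", or None
    features: Tuple[str, ...] = ()


# A handful of rows copied from the table above.
ROWS = [
    Microarchitecture("P5 (Pentium)", 1993, "5", ("Superscalar",)),
    Microarchitecture("ARM Cortex-A9 MPCore", 2007, "8–11",
                      ("Out-of-order", "speculative issue", "superscalar")),
    Microarchitecture("NetBurst (Prescott)", 2004, "31",
                      ("2-way simultaneous multithreading",)),
    Microarchitecture("Zen", 2017, "19",
                      ("Multi-core", "superscalar",
                       "2-way simultaneous multithreading", "4-way decode",
                       "out-of-order execution", "L3 cache")),
    Microarchitecture("POWER4", 2001, None,
                      ("Superscalar", "speculative execution",
                       "out-of-order execution")),
]


def stage_range(stages: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (min, max) of all integers found in the stage text, or None."""
    if stages is None:
        return None
    numbers = [int(n) for n in re.findall(r"\d+", stages)]
    if not numbers:
        return None
    return min(numbers), max(numbers)


def deeper_than(rows: List[Microarchitecture], depth: int) -> List[str]:
    """Names of rows whose largest listed stage count exceeds `depth`."""
    names = []
    for row in rows:
        bounds = stage_range(row.stages)
        if bounds is not None and bounds[1] > depth:
            names.append(row.name)
    return names


if __name__ == "__main__":
    print(deeper_than(ROWS, 12))
```

Rows with no stage count are skipped rather than treated as zero, so a blank cell never counts as a shallow pipeline. Entries that list several pipelines in one cell, such as "7-stage integer, 6-stage vector", collapse to the smallest and largest number found, which is a simplification.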

Notes

  1. According to AMD's K5 data sheet. The design incorporates many ideas and functional parts from AMD's Am29000 32-bit RISC microprocessor design.
  2. According to AMD's K6 data sheet. The design is based on NexGen's Nx686 and is therefore not a direct successor to the K5.

References