Comparison of ARM cores

This is a comparison of microarchitectures that implement the ARM family of instruction sets, designed either by ARM Holdings or by third parties. Entries are sorted by ARM instruction set version, then by release, then by name.

ARM cores

Designed by ARM

| Family | Architecture | Core | Decode width | Execution ports | Pipeline depth | Out-of-order execution | FPU | Pipelined VFP | FPU registers | NEON (SIMD) | Process technology | L0 cache | L1 cache, I-cache + D-cache (in KiB) | L2 cache | L3 cache | Core configurations | Speed per core (DMIPS/MHz) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARM11 | ARMv6 | ARM1136J(F)-S | single-issue | ? | 8 stages | No | VFPv2 | Yes | (8 or 32) × 32-bit | No | 90/65/45 nm | ? | Varying, typically 16 KB + 16 KB | Varying, typically none | N/A | 1-4 | 1.25 |
| Cortex-A | ARMv7-A | ARM Cortex-A5 | 1 | ? | 8 | No | VFPv4 (optional) | ? | 16 × 64-bit | 64-bit wide (optional) | ? | ? | ? | 4-64 KB / core | ? | 1, 2, 4 | 1.57 |
| Cortex-A | ARMv7-A | ARM Cortex-A7 | 2 | ? | 8 | No | VFPv4 | Yes | 16 × 64-bit | 64-bit wide | 40/28 nm | ? | 8-64 KB / core | up to 1 MB (optional) | ? | 1, 2, 4, 8 | 1.9 |
| Cortex-A | ARMv7-A | ARM Cortex-A8 | 2 | ? | 13 | No | VFPv3 | No | 32 × 64-bit | 64-bit wide | 65/55/45 nm | ? | 32 KB + 32 KB | 256 or 512 (typical) KB | ? | 1, 4 | 2.0 |
| Cortex-A | ARMv7-A | ARM Cortex-A9 | 2 | 3 | 8 | Yes | VFPv3 (optional) | Yes | (16 or 32) × 64-bit | 64-bit wide (optional) | 65/45/40/32/28 nm | ? | 32 KB + 32 KB | 1 MB | ? | 1, 2, 4 | 2.5 |
| Cortex-A | ARMv7-A | ARM Cortex-A12 | 3 | ? | 11 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | ? | ? | 32-64 KB + 32 KB | 256 KB to 8 MB | ? | 1, 2, 4 | 3.0 |
| Cortex-A | ARMv7-A | ARM Cortex-A15 | 3 | 7 | 15/17-25 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | 32/28 nm | ? | 32 KB + 32 KB per core | up to 4 MB per cluster, up to 8 MB per chip | ? | 2, 4, 8 (4×2) | 3.5 to 4.01 |
| Cortex-A | ARMv7-A | ARM Cortex-A17 | 2 | ? | 11+ | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | ? | ? | 32 KB + 32 KB per core | 256 KB up to 8 MB | ? | up to 4 | ? |
| Cortex-A[1] | ARMv8-A | ARM Cortex-A53 | 2-wide | ? | 8 stages | No | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | 28/20 nm | ? | 8–64 + 8–64 | 128 KiB–2 MiB | ? | 1–4+ | 2.3 |
| Cortex-A[1] | ARMv8-A | ARM Cortex-A57 | 3-wide | ? | ? | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | 28/20 nm | ? | 48 + 32 | 0.5–2 MiB | ? | 1–4+ | 4.1 to 4.76 |
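The "Speed per core (DMIPS/MHz)" column is a per-clock figure, so comparing actual parts means scaling it by a clock frequency and a core count. The following is a minimal sketch of that arithmetic. The function name `estimated_dmips` and the 1000 MHz clock are hypothetical and chosen only for illustration; the DMIPS/MHz ratings are copied from the table, using the lower bound where the table gives a range. Multiplying by the core count assumes perfect scaling, which real workloads rarely achieve.

```python
def estimated_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Scale a per-clock Dhrystone rating to an aggregate DMIPS estimate.

    Hypothetical helper: assumes throughput scales linearly with both
    clock frequency and core count, which is an idealisation.
    """
    if dmips_per_mhz < 0 or clock_mhz < 0 or cores < 1:
        raise ValueError("ratings and clock must be non-negative, cores >= 1")
    return dmips_per_mhz * clock_mhz * cores


# DMIPS/MHz ratings taken from the table (Cortex-A15 uses the lower bound 3.5).
DMIPS_PER_MHZ = {
    "ARM Cortex-A7": 1.9,
    "ARM Cortex-A9": 2.5,
    "ARM Cortex-A15": 3.5,
}

# Assumed clock, not a figure from the table.
HYPOTHETICAL_CLOCK_MHZ = 1000.0

if __name__ == "__main__":
    for core, rating in DMIPS_PER_MHZ.items():
        per_core = estimated_dmips(rating, HYPOTHETICAL_CLOCK_MHZ)
        print(f"{core}: about {per_core:.0f} DMIPS per core "
              f"at {HYPOTHETICAL_CLOCK_MHZ:.0f} MHz")
```

Because Dhrystone is a small synthetic benchmark, such estimates are best read as a rough ordering of cores rather than a prediction of application performance.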

Designed by third parties

These cores implement the ARM instruction set and were developed independently by companies holding an architectural license from ARM.

| Core | Decode width | Execution ports | Pipeline depth | Out-of-order execution | FPU | Pipelined VFP | FPU registers | NEON (SIMD) | Process technology | L0 cache | L1 cache, I-cache + D-cache (in KiB) | L2 cache | L3 cache | Core configurations | Speed per core (DMIPS/MHz) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qualcomm Scorpion | 2 | ? | 10 | non-speculative[2] | VFPv3 | Yes | ? | 128-bit wide | 65/45 nm | ? | 32 KB + 32 KB | 256 KB (single-core), 512 KB (dual-core) | ? | 1, 2 | 2.1 |
| Qualcomm Krait[3] | 3 | 7 | 11 | Yes | VFPv4[4] | Yes | ? | 128-bit wide | 28 nm | 4 KB + 4 KB direct mapped | 16 KB + 16 KB 4-way set associative | 1 MB 8-way set associative (dual-core)/2 MB (quad-core) | ? | 2, 4 | 3.3 (Krait), 3.1 (Krait 200), 3.4 (Krait 300)[5], 3.6 (Krait 400) |
| Apple Swift | 3 | 5 | 12 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | 32 nm | ? | 32 KB + 32 KB | 1 MB | ? | 2 | 3.5 |
| Apple Cyclone | 6 | 9 | 15 | Yes | VFPv4 | Yes | ? | 128-bit wide | 28 nm | ? | 64 + 64 | 1 MiB | 4 MiB | 2 | ? |
| Nvidia Denver | 7 | ? | ? | Yes | VFPv4 | Yes | ? | ? | 28 nm | ? | ? | ? | ? | 2 | ? |
| Cavium ThunderX | 2[6] | 4? | ? | ? | ? | ? | ? | ? | 28 nm | ? | ? | ? | ? | 8–16, 24–48 | ? |
| AppliedMicro X-Gene | 4 | 8 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| Broadcom Vulcan | 4 | 6 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |

References