Memory-level parallelism

In computer architecture, memory-level parallelism (MLP) is the ability to have multiple memory operations, in particular cache misses or translation lookaside buffer (TLB) misses, pending at the same time.
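Whether several misses can be pending at once depends on whether their addresses are independent. The Python sketch below is a back-of-the-envelope timing model, not a description of any real processor. It assumes that every miss costs the same fixed latency, that misses issue instantly, and that the hardware can keep `max_pending` misses outstanding; `MissModel`, `cycles_for_misses` and `average_mlp` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MissModel:
    latency: int      # cycles to service one cache miss (assumed constant)
    max_pending: int  # misses the hardware can keep pending at once (assumed)


def cycles_for_misses(model: MissModel, n: int, dependent: bool) -> int:
    """Cycles spent servicing n cache misses, ignoring all non-memory work."""
    if n <= 0:
        return 0
    if dependent:
        # Pointer chasing: each address comes from the previous load's data,
        # so a miss cannot start until the one before it completes.
        return n * model.latency
    # Independent addresses: up to max_pending misses overlap, so the misses
    # complete in ceil(n / max_pending) back-to-back groups.
    groups = -(-n // model.max_pending)  # ceiling division
    return groups * model.latency


def average_mlp(model: MissModel, n: int, dependent: bool) -> float:
    """Average number of misses pending per cycle over the whole run."""
    total = cycles_for_misses(model, n, dependent)
    return n * model.latency / total if total else 0.0


m = MissModel(latency=100, max_pending=4)
print(cycles_for_misses(m, 20, dependent=True))   # 2000
print(cycles_for_misses(m, 20, dependent=False))  # 500
print(average_mlp(m, 20, dependent=False))        # 4.0
```

With these assumed numbers, twenty dependent misses take four times as long as twenty independent ones, because the independent misses overlap four at a time while the dependent chain never has more than one miss pending.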

In a single processor, MLP may be considered a form of instruction-level parallelism (ILP). However, ILP is often conflated with superscalar execution, the ability to execute more than one instruction at the same time. The two are distinct quantities: the Intel Pentium Pro, for example, is five-way superscalar, able to start executing five different microinstructions in a given cycle, but it can handle four different cache misses for up to 20 different load microinstructions at any time.
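One reading of the Pentium Pro figures is that they describe two separate limits: how many distinct cache lines can be missing at once, and how many load microinstructions can be waiting on those misses, with several loads to the same missing line sharing one miss. The Python sketch below models that bookkeeping under stated assumptions; `MissTracker` is a hypothetical structure, the 32-byte line size is an assumed parameter, and none of it describes the Pentium Pro's actual circuitry.

```python
class MissTracker:
    """Hypothetical bookkeeping for outstanding misses: one entry per missing
    cache line (capped at max_lines), with every waiting load attached to the
    entry for its line (capped at max_loads in total)."""

    def __init__(self, max_lines: int = 4, max_loads: int = 20, line_size: int = 32):
        self.max_lines = max_lines
        self.max_loads = max_loads
        self.line_size = line_size  # bytes per cache line (assumed)
        self.pending: dict[int, list[int]] = {}  # line number -> waiting load ids
        self.num_loads = 0

    def try_issue(self, load_id: int, addr: int) -> bool:
        """Record a load that misses; return False if it must stall instead."""
        if self.num_loads >= self.max_loads:
            return False  # no room to track another waiting load
        line = addr // self.line_size
        if line not in self.pending:
            if len(self.pending) >= self.max_lines:
                return False  # every miss entry is busy with another line
            self.pending[line] = []  # start a new miss for this line
        self.pending[line].append(load_id)  # share the miss if already pending
        self.num_loads += 1
        return True


tracker = MissTracker()
# 24 loads spread over four cache lines (addresses 0, 32, 64, 96, 0, ...).
accepted = [tracker.try_issue(i, (i % 4) * 32) for i in range(24)]
print(sum(accepted), len(tracker.pending))  # 20 4
```

In this model the memory system has only four misses pending, yet twenty loads are in progress, and neither number is related to how many microinstructions the core can start per cycle.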

It is possible to have a machine that is not superscalar but which nevertheless has high MLP.

Arguably, a machine that has no ILP (one that is not superscalar and executes one instruction at a time in a non-pipelined manner) but that performs hardware prefetching (as opposed to software, instruction-level prefetching) exhibits MLP, because multiple prefetches can be outstanding, but not ILP. There are multiple memory operations outstanding, but not multiple instructions; the distinction is easily lost because instructions are often conflated with operations.
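The prefetching argument can be illustrated with a toy timing model in Python. It assumes a blocking core that runs one load instruction at a time, waits for that load's data, and then spends one cycle executing it; a fixed memory latency; and a hypothetical next-line prefetcher that requests the following `degree` cache lines on every access. The function `simulate` and its parameters are illustrative names, not an existing tool.

```python
def simulate(num_lines: int, latency: int, degree: int) -> tuple[int, int]:
    """Toy model: a non-pipelined core with no ILP walks num_lines consecutive
    cache lines with one blocking load each. A hypothetical next-line
    prefetcher requests the next `degree` lines on every access.
    Returns (total cycles, peak number of outstanding memory operations)."""
    ready_at = {}  # line number -> cycle at which its data arrives
    t = 0
    peak = 0
    for line in range(num_lines):
        if line not in ready_at:
            ready_at[line] = t + latency  # demand miss
        # Prefetcher: request upcoming lines that are not already requested.
        for nxt in range(line + 1, min(line + 1 + degree, num_lines)):
            if nxt not in ready_at:
                ready_at[nxt] = t + latency
        # Memory operations still in flight at this cycle.
        outstanding = sum(1 for r in ready_at.values() if r > t)
        peak = max(peak, outstanding)
        # Only one instruction at a time: wait for its data, then execute it
        # in one cycle before the next instruction starts.
        t = max(t, ready_at[line]) + 1
    return t, peak


print(simulate(8, latency=100, degree=0))  # (808, 1)
print(simulate(8, latency=100, degree=3))  # (205, 4)
```

In both runs exactly one instruction is in progress at any time, so the model has no ILP; with the prefetcher enabled, up to four memory operations are outstanding at once, which is MLP, and the walk finishes in roughly a quarter of the time.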

Furthermore, multiprocessor and multithreaded computer systems may be said to exhibit MLP and ILP through parallelism across processors or threads, but not intra-thread, single-process ILP and MLP. Often, however, the terms MLP and ILP are restricted to extracting such parallelism from what appears to be non-parallel, single-threaded code.

Original source: https://en.wikipedia.org/wiki/Memory-level parallelism.