Victim cache

From HandWiki

A victim cache is a small, usually fully associative cache placed in the refill path of a CPU cache that stores all the blocks evicted from that level of cache, originally proposed in 1990. In modern architectures, this function is typically performed by Level 3 or Level 4 caches.

Overview

Victim caching is a hardware technique to improve performance of caches proposed by Norman Jouppi. As mentioned in his paper:[1] 

Miss caching places a fully-associative cache between cache and its re-fill path. Misses in the cache that hit in the miss cache have a one cycle penalty, as opposed to a many cycle miss penalty without the miss cache. Victim Caching is an improvement to miss caching that loads the small fully-associative cache with victim of a miss and not the requested cache line. [1]

A victim cache is a hardware cache designed to decrease conflict misses and improve hit latency for direct-mapped caches. It is employed at the refill path of a Level 1 cache, such that any cache-line which gets evicted from the cache is cached in the victim cache. Thus, the victim cache gets populated only when data is thrown out of Level 1 cache. In case of a miss in Level 1, the missed entry is looked up in the victim cache. If the resulting access is a hit, the contents of the Level 1 cache-line and the matching victim cache line are swapped.

Though initially proposed by Jouppi to improve cache performance of a direct-mapped cache Level 1, modern day microprocessors with multi-level cache hierarchy employ Level 3 or Level 4 cache to act as victim cache for the cache lying above it in the memory hierarchy. Intel's Crystal Well[2] of its Haswell processors introduced an on-package Level 4 cache which serves as a victim cache to processor's Level 3 cache.[3] A 4–12 MB Level 3 cache is used as a victim cache in POWER5 (IBM) microprocessors.

Background

As hardware architecture and technology advanced, processor performance and frequency grew much faster than memory cycle times, leading to a large gap in performance. The problem of increasing memory latency, relative to processor speed, has been dealt with by adding high speed cache memory.

Direct-mapped caches have faster access time than set-associative caches. However, when multiple cache blocks in memory map to the same cache-line in a direct-mapped cache, they end up evicting each other when anyone of them is accessed. This is known as the cache-conflict problem. This problem is resolved by increasing the associativity of the cache. But there is a limit to which associativity can be increased owing to the complexity in its implementation. Thus, for solving the cache conflict problem for a cache with limited associativity, a victim cache is employed.

Implementation

The behavior of a victim cache in its respective interaction with the corresponding level cache is given below:

Cache Hit: No action

Cache Miss, Victim Hit: The block is in the victim cache and the one in the cache are replaced with each other. This new entry in victim cache becomes the most recently used block.

Implementation Example

Cache Miss, Victim Miss: The block is brought to cache from next level. The block evicted from the cache gets stored in Victim cache.

Example:

Suppose a direct-mapped L1 cache with blocks A, B pointing to the same set. It is linked to a 2 entry fully associative victim cache with blocks C, D in it.

The trace to be followed: A, B, A, B…

From the diagram, we can see that, in case of victim cache (VC) hit, blocks A and B are swapped. The least recently used block of VC remains as it is. Hence, it gives an illusion of associativity to the direct-mapped L1 cache, in turn reducing the conflict misses.

In case of two caches, L1 and L2 with exclusive cache policy (L2 does not cache same the memory locations as L1), L2 acts as the victim cache for L1.

Performance implication

While measuring performance improvement by using victim cache, Jouppi[1] assumed a Level-1 direct-mapped cache augmented with a fully associative cache. For the test suite used by him, on an average 39% of the Level-1 data cache misses are found to be conflict misses, while on an average 29% of the Level-1 instruction misses are found to be conflict misses.[1] Since conflict misses amount to large percentage of total misses, therefore providing additional associativity by augmenting the Level 1 cache with a victim cache is bound to improve total miss rate significantly.

[4] Experimental results are deduced by considering a 32-Kb Direct-Mapped, 2-way and fully associative cache augmented with a 256 block (8 KB) victim cache and running on it 8 randomly selected SPEC95 Benchmarks. While the results cannot be generalized for all benchmarks, adding a victim cache provides a miss rate reduction ranging from 10% to 100% for all the cache configuration. The returns although seem to level off beyond victim cache size of 50 blocks, thus proving Jouppi's[1] observation that victim cache benefits reach a plateau after the first few victim blocks.

Miss rate reduction for a 64 KB cache size are found to be significantly lower, proving that victim caching is not indefinitely scalable.[4]

While comparing various cache configuration it was found that in certain cases adding a small victim cache can give performance benefit equivalent to that observed by multiplying the cache size by 2.[4]

References

  1. 1.0 1.1 1.2 1.3 1.4 Jouppi, N. P. (1990-05-01). "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers". 17th Annual International Symposium on Computer Architecture, 1990. Proceedings. pp. 364–373. doi:10.1109/ISCA.1990.134547. ISBN 0-8186-2047-1. 
  2. "Products (Formerly Crystal Well)". Intel® ARK (Product Specs). http://ark.intel.com/products/codename/51802/Crystal-Well#@All. 
  3. Shimpi, Anand Lal. "Intel Iris Pro 5200 Graphics Review: Core i7-4950HQ Tested". http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3. 
  4. 4.0 4.1 4.2 Victim-Caching for Large Caches and Modern Workloads. 1996.