Weissman score

From HandWiki
Short description: Performance metric for lossless compression

The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of producers for HBO's television series Silicon Valley, a television show about a fictional tech start-up working on a data compression algorithm.[1][2][3][4] It compares both required time and compression ratio of measured applications, with those of a de facto standard according to the data type.

The formula is the following; where r is the compression ratio, T is the time required to compress, the overlined ones are the same metrics for a standard compressor, and alpha is a scaling constant.[1] [math]\displaystyle{ W = \alpha {r \over \overline{r}} {\log{\overline{T}} \over \log{T}} }[/math]

Weissman score has been used by Daniel Reiter Horn and Mehant Baid of Dropbox to explain real-world work on lossless compression. According to the authors it "favors compression speed over ratio in most cases."[5]

Example

This example shows the score for the data of the Hutter Prize,[6] using the paq8f as a standard and 1 as the scaling constant.

Application Compression ratio Compression time [min] Weissman score
paq8f 5.467600 300 1.000000
raq8g 5.514990 420 0.720477
paq8hkcc 5.682593 300 1.039321
paq8hp1 5.692566 300 1.041145
paq8hp2 5.750279 300 1.051701
paq8hp3 5.800033 300 1.060801
paq8hp4 5.868829 300 1.073826
paq8hp5 5.917719 300 1.082325
paq8hp6 5.976643 300 1.093102
paq8hp12 6.104276 540 0.620247
decomp8 6.261574 540 0.63623
decomp8 6.276295 540 0.637726

Limitations

Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This is a consequence of the requirement that the argument of the logarithmic function must be dimensionless. The multiplier also can't have a numeric value of 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in scores of value 0 (even with changes), undefined, or negative (even if better than positive).

Examples

# Standard compressor Scored compressor Weissman score Observations
Compression ratio Compression time Log (compression time) Compression ratio Compression time Log (compression time)
1 2.1 2 min 0.30103 3.4 3 min 0.477121 1×(3.4/2.1)×(0.30103/0.477121)=1.021506 Change in unit or scale, changes the result.
2 2.1 120 sec 2.079181 3.4 180 sec 2.255273 1×(3.4/2.1)×(2.079181/2.255273)=1.492632
3 2.2 1 min 0 3.3 1.5 min 0.176091 1×(3.3/2.2)×(0/0.176091)=0 If time is 1, its log is 0; then the score can be 0 or infinity.
4 2.2 0.667 min −0.176091 3.3 1 min 0 1×(3.3/2.2)×(−0.176091/0)=infinity
5 1.6 0.5 h −0.30103 2.9 1.1 h 0.041393 1×(2.9/1.6)×(−0.30103/0.041393)=−13.18138 If time is less than 1, its log is negative; then the score can be negative.
6 1.6 1.1 h 0.041393 1.6 0.9 h −0.045757 1×(1.6/1.6)×(0.041393/−0.045757)=−0.904627

See also

References