Rabin fingerprint
The Rabin fingerprinting scheme is a method for implementing fingerprints using polynomials over a finite field. It was proposed by Michael O. Rabin.[1]
Scheme
Given an n-bit message m0,...,mn-1, we view it as a polynomial of degree n-1 over the finite field GF(2).
- [math]\displaystyle{ f(x) = m_0 + m_1 x + \ldots + m_{n-1} x^{n-1} }[/math]
We then pick a random irreducible polynomial [math]\displaystyle{ p(x) }[/math] of degree k over GF(2), and we define the fingerprint of the message m to be the remainder [math]\displaystyle{ r(x) }[/math] after division of [math]\displaystyle{ f(x) }[/math] by [math]\displaystyle{ p(x) }[/math] over GF(2) which can be viewed as a polynomial of degree k − 1 or as a k-bit number.
Applications
Many implementations of the Rabin–Karp algorithm internally use Rabin fingerprints.
The Low Bandwidth Network Filesystem (LBFS) from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.[2] The basic idea is that the filesystem computes the cryptographic hash of each block in a file. To save on transfers between the client and server, they compare their checksums and only transfer blocks whose checksums differ. But one problem with this scheme is that a single insertion at the beginning of the file will cause every checksum to change if fixed-sized (e.g. 4 KB) blocks are used. So the idea is to select blocks not based on a specific offset but rather by some property of the block contents. LBFS does this by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one. Since the output of Rabin fingerprints are pseudo-random the probability of any given 48 bytes being a breakpoint is [math]\displaystyle{ 2^{-13} }[/math] (1 in 8192). This has the effect of shift-resistant variable size blocks. Any hash function could be used to divide a long file into blocks (as long as a cryptographic hash function is then used to find the checksum of each block): but the Rabin fingerprint is an efficient rolling hash, since the computation of the Rabin fingerprint of region B can reuse some of the computation of the Rabin fingerprint of region A when regions A and B overlap.
Note that this is a problem similar to that faced by rsync.[example needed]
See also
References
- ↑ Michael O. Rabin (1981). "Fingerprinting by Random Polynomials". Center for Research in Computing Technology, Harvard University. http://www.xmailserver.org/rabin.pdf. Retrieved 2007-03-22.
- ↑ Athicha Muthitacharoen, Benjie Chen, and David Mazières "A Low-bandwidth Network File System"
External links
- Andrei Z. Broder (1993). "Some applications of Rabin's fingerprinting method". pp. 143–152. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6172. Retrieved 2011-09-12.
- David Andersen (2007). "Exploiting Similarity for Multi-Source Downloads using File Handprints". https://www.cs.cmu.edu/~dga/papers/nsdi2007-set-abstract.html. Retrieved 2007-04-12.
- Ross N. Williams (1993). "A painless guide to CRC Error detection algorithms".
Original source: https://en.wikipedia.org/wiki/Rabin fingerprint.
Read more |