Earth:MAC address anonymization

From HandWiki
Short description: MAC address obfuscation methods, generally for public reporting

MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google,[1] Apple[2] and iInside[3] - which track users movements via computer hardware to simultaneously preserve the identities of the people they are tracking, as well as the hardware itself.

Examples

An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits).[4]

An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e,[5] an entirely different hash due to the avalanche effect.

Why this does not work in practice

Tracking companies rely on the assumption that address anonymization is akin to encryption. Given a message, and an encryption method that is well known to both the encoder and potential decryptor, modern encryption methods (such as Advanced Encryption Standard (AES) or RSA) will yield a result that is unbreakable in practice.

The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.

For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.

One approach to mitigate this attack would be to use a deliberately slow one-way function for MAC addresses, such as a slow Key derivation function (KDF). For instance, if the KDF were tuned to require 0.1 seconds per MAC address anonymization operation (on a typical consumer CPU), generating a rainbow table would require 892,000 CPU-years.

Truncating

Where data protection law requires anonymization, the method used should exclude any possibility of the original MAC address to be identified. Some companies truncate IPv4 addresses by removing the final octet, thus in effect retaining information about the user's ISP or subnet, but not directly identifying the individual. The activity could then originate from any of 254 IP addresses. This may not always be enough to guarantee anonymization.[6]

References