Non-malleable codes

From HandWiki

The notion of non-malleable codes was introduced in 2010 by Dziembowski, Pietrzak, and Wichs,[1] for relaxing the notion of error-correction and error-detection. Informally, a code is non-malleable if the message contained in a modified code-word is either the original message, or a completely unrelated value. Non-malleable codes provide a useful and meaningful security guarantee in situations where traditional error-correction and error-detection is impossible; for example, when the attacker can completely overwrite the encoded message. Although such codes do not exist if the family of "tampering functions" F is completely unrestricted, they are known to exist for many broad tampering families F.

Background Knowledge

Tampering Experiment

To know the operation schema of non-malleable code, we have to have a knowledge of the basic experiment it based on. The following is the three step method of tampering experiment.

  1. A source message [math]\displaystyle{ s }[/math] is encoded via a (possibly randomized) procedure [math]\displaystyle{ Enc }[/math], yielding a code-word [math]\displaystyle{ c }[/math] = [math]\displaystyle{ Enc(s) }[/math].
  2. The code-word is modified under some tampering-function [math]\displaystyle{ f\in F }[/math] to an erroneous-code-word [math]\displaystyle{ c^* }[/math]=[math]\displaystyle{ f(c) }[/math].
  3. The erroneous-code-word [math]\displaystyle{ c^* }[/math] is decoded using a procedure [math]\displaystyle{ Dec }[/math], resulting in a decoded-message [math]\displaystyle{ s^* }[/math]= [math]\displaystyle{ Dec(c^*) }[/math].

The tampering experiment can be used to model several interesting real-world settings, such as data transmitted over a noisy channel, or adversarial tampering of data stored in the memory of a physical device. Having this experimental base, we would like to build special encoding/decoding procedures [math]\displaystyle{ (Enc,Dec) }[/math], which give us some meaningful guarantees about the results of the above tampering experiment, for large and interesting families [math]\displaystyle{ F }[/math] of tampering functions. The following are several possibilities for the type of guarantees that we may hope for.[2]

Error correction

One very natural guarantee, called error-correction, would be to require that for any tampering function and any source-message s, the tampering experiment always produces the correct decoded message [math]\displaystyle{ s^* = s }[/math].[3]

Error detection

A weaker guarantee, called error-detection, requires that the tampering-experiment always results in either the correct value [math]\displaystyle{ s^* = s }[/math] or a special symbol [math]\displaystyle{ s^* = \perp }[/math] indicating that tampering has been detected. This notion of error-detection is a weaker guarantee than error-correction, and achievable for larger F of tampering functions.

Algorithm description

A non-malleable code ensures that either the tampering experiment results in a correct decoded-message [math]\displaystyle{ s^* = s }[/math], or the decoded-message [math]\displaystyle{ s^* }[/math] is completely independent of and unrelated to the source-message [math]\displaystyle{ s }[/math]. In other word, the notion of non-malleability for codes is similar, in spirit, to notions of non-malleability for cryptographic primitives (such as encryption2, commitments and zero-knowledge proofs), introduced by the seminal work of Dolev, Dwork and Naor.[4]

Compared to error correction or error detection, the "right" formalization of non-malleable codes is somewhat harder to define. Let [math]\displaystyle{ Tamper^f_s }[/math] be a random variable for the value of the decoded-message, which results when we run the tampering experiment with source-message [math]\displaystyle{ s }[/math] and tampering-function [math]\displaystyle{ f }[/math], over the randomness of the encoding procedure. Intuitively, we wish to say that the distribution of [math]\displaystyle{ Tamper^f_s }[/math] is independent of the encoded message [math]\displaystyle{ s }[/math]. Of course, we also want to allow for the case where the tampering experiment results in [math]\displaystyle{ s^* = s }[/math] (for example, if the tampering function is identity), which clearly depends on [math]\displaystyle{ s }[/math].

Thus, we require that for every tampering-function [math]\displaystyle{ f\in F }[/math], there exists a distribution [math]\displaystyle{ D_f }[/math] which outputs either concrete values [math]\displaystyle{ s^* }[/math] or a special same [math]\displaystyle{ * }[/math] symbol, and faithfully models the distribution of [math]\displaystyle{ Tamper^f_s }[/math] for all [math]\displaystyle{ s }[/math] in the following sense: for every source message [math]\displaystyle{ s }[/math], the distributions of [math]\displaystyle{ Tamper^f_s }[/math] and [math]\displaystyle{ D_f }[/math] are statistically close when the [math]\displaystyle{ * }[/math] symbol is interpreted as [math]\displaystyle{ s }[/math]. That is, [math]\displaystyle{ D_f }[/math] correctly simulates the "outcome" of the tampering-experiment with a function [math]\displaystyle{ f\in F }[/math] without knowing the source-messages [math]\displaystyle{ s }[/math], but it is allowed some ambiguity by outputting a same [math]\displaystyle{ * }[/math] symbol to indicate that the decoded-message should be the same as the source-message, without specifying what the exact value is. The fact that [math]\displaystyle{ D_f }[/math] depends on only [math]\displaystyle{ f }[/math] and not on [math]\displaystyle{ s }[/math], shows that the outcome of [math]\displaystyle{ Tamper^f_s }[/math] is independent of [math]\displaystyle{ s }[/math], exempting equality.

Relation to error correction/detection

Notice that non-malleability is a weaker guarantee than error correction/detection; the latter ensure that any change in the code-word can be corrected or at least detected by the decoding procedure, whereas the former does allow the message to be modified, but only to an unrelated value. However, when studying error correction/detection we usually restrict ourselves to limited forms of tampering which preserve some notion of distance (e.g., usually hamming distance) between the original and tampered code-word. For example, it is already impossible to achieve error correction/detection for the simple family of functions [math]\displaystyle{ F_{const} }[/math] which, for every constant [math]\displaystyle{ c^* }[/math], includes a "constant" function [math]\displaystyle{ f_{c^*} }[/math] that maps all inputs to [math]\displaystyle{ c^* }[/math]. There is always some function in [math]\displaystyle{ F_{const} }[/math] that maps everything to a valid code-word [math]\displaystyle{ c^* }[/math]. In contrast, it is trivial to construct codes that are non-malleable w.r.t [math]\displaystyle{ F_{const} }[/math], as the output of a constant function is clearly independent of its input. The prior works on non-malleable codes show that one can construct non-malleable codes for highly complex tampering function families [math]\displaystyle{ F }[/math] for which error correction/detection can not be achievable.[5]

Application over tampering functions

Bit-wise independent tampering

As one very concrete example, we study non-malleability with respect to the family of functions [math]\displaystyle{ f }[/math] which specify, for each bit of the code-word [math]\displaystyle{ c }[/math], whether to keep it as is, flip it, set it to 0, set it to 1. That is, each bit of the code-word is modified arbitrarily but independently of the value of the other bits of the code-word. We call this the “bit-wise independent tampering” family [math]\displaystyle{ F_{BIT} }[/math]. Note that this family contains constant functions [math]\displaystyle{ F_{const} }[/math] and constant-error functions [math]\displaystyle{ F_{err} }[/math] as subsets. Therefore, as we have mentioned, error-correction and error-detection cannot be achieved w.r.t. this family. Nevertheless, the following can show an efficient non-malleable code for this powerful family.

With [math]\displaystyle{ F_{BIT} }[/math] we denote the family which contains all tampering functions that tamper every bit independently. Formally, this family contains all functions [math]\displaystyle{ f_i: \left\{{0},{1}\right\}^n \to \left\{{0},{1}\right\}^n }[/math] that are defined by n functions[math]\displaystyle{ f_i: \left\{{0},{1}\right\} \to \left\{{0},{1}\right\} }[/math] (for i=1...n) as [math]\displaystyle{ f(c_1..c_n)=f_1(c_1)..f_n(c_n) }[/math]. Note that there are only 4 possible choices for each [math]\displaystyle{ f_i }[/math] (i.e. how to modify a particular bit) and we name these “set to 0”, “set to 1”, “flip”, “keep” where the meanings should be intuitive. We call the above family the bit-wise independent tampering family.

All families of bounded size

  • Probabilistic Method Approach

For any "small enough" function family [math]\displaystyle{ F }[/math], there exists a (possibly inefficient) coding scheme which is non-malleable w.r.t. F. Moreover, for a fixed "small enough" function family [math]\displaystyle{ F }[/math], a random coding scheme is likely to be non-malleable w.r.t. F with overwhelming probability. Unfortunately, random coding schemes cannot be efficiently represented, nor is the encoding/decoding function likely to be efficient. Therefore, this result should merely be thought of as showing "possibility" and providing a target that we should then strive to match constructively. Moreover, this result also highlights the difference between "error-correction/detection" and "non-malleability" since a result of this form could not be true for the former notions.

It is not clear what the bound from the theorem[4] of this type actually implies. For example, it does tell us that non-malleable codes exist with respect to all efficient functions, but this is misleading as we know that efficient non-malleable codes (and ultimately we are only interested in such) cannot be non-malleable w.r.t. this class. Nevertheless, the result by the probabilistic method does give us codes which are non-malleable w.r.t. very general classes of functions in the random oracle model.

Model of tamper-resilient security

In this model, we consider two ways of interacting with the system:

Execute([math]\displaystyle{ x }[/math]): A user can provide the system with Execute(x) queries, for [math]\displaystyle{ x\in \left\{{0},{1}\right\}^u }[/math], in which case the system computes [math]\displaystyle{ (y,s^') \gets G(x,s) }[/math], updates the state of the system to [math]\displaystyle{ s :=s^' }[/math] and outputs [math]\displaystyle{ y }[/math].

Tamper([math]\displaystyle{ f }[/math]): We also consider tampering attacks against the system, modeled by Tamper([math]\displaystyle{ f }[/math]) commands, for functions [math]\displaystyle{ f: \left\{{0},{1}\right\}^n \to \left\{{0},{1}\right\}^n }[/math]. Upon receiving such command, the system state is set to [math]\displaystyle{ s := f(s) }[/math].

An attacker that can also interact with the system via Tamper queries can potentially learn significantly more about the secret state, even recover it entirely. Therefore, we would like to have a general method for securing systems against tampering attacks, so that the ability to issue Tamper queries (at least for functions f in some large family [math]\displaystyle{ F }[/math]) cannot provide the attacker with additional information. By using non-malleable code for this purpose we have the conclusion: Let [math]\displaystyle{ (Enc,Dec) }[/math] be any coding scheme which is non-malleable w.r.t [math]\displaystyle{ F }[/math], then [math]\displaystyle{ (Enc,Dec) }[/math] can also be tamper-simulate w.r.t. [math]\displaystyle{ F }[/math].

Capacity of non-malleable codes

  1. For every family [math]\displaystyle{ F }[/math] with [math]\displaystyle{ |F|\leq 2^{2^{\alpha n}} }[/math], there exist non-malleable codes against [math]\displaystyle{ F }[/math] with rate arbitrarily close to 1 − [math]\displaystyle{ \alpha }[/math] (this is achieved w.h.p. by a randomized construction).[6]
  2. For families of size [math]\displaystyle{ exp( n^{O(1)}2^{\alpha n}) }[/math] against which there is no non-malleable code of rate 1 − [math]\displaystyle{ \alpha }[/math] (in fact this is the case w.h.p for a random family of this size).
  3. 1 − [math]\displaystyle{ \alpha }[/math] is the best achievable rate for the family of functions which are only allowed to tamper the first [math]\displaystyle{ \alpha n }[/math] bits of the code-word, which is of special interest.

References

  1. Dziembowski, Stefan; Pietrzak, Krzysztof; Wichs, Daniel (2009-01-01). Non-Malleable Codes. https://eprint.iacr.org/2009/608. 
  2. Faust, Sebastian; Mukherjee, Pratyay; Venturi, Daniele; Wichs, Daniel. "Efficient Non-Malleable Codes and Key-Derivation for Poly-Size Tampering Circuits". Advances in Cryptology – EUROCRYPT 2014 8441: 111–128. http://infoscience.epfl.ch/record/198993/files/Efficient_Non-Malleable_Codes.pdf. 
  3. E. Shannon, Claude (1949). "Communication theory of secrecy systems". Bell System Technical Journal. 
  4. 4.0 4.1 Dolev, Danny; Dwork, Cynthia; Moni, Naor (Mar 24, 2000). "Non-Malleable Cryptography". SIAM Journal on Computing: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9A853A59C3A45DD1B67690F10232D635?doi=10.1.1.26.8267&rep=rep1&type=pdf.+doi:10.1137/s0097539795291562. 
  5. Dziembowski, Stefan; Pietrzak, Krzysztof; Wichs, Daniel. "Non-Malleable Codes". ICS, 2010. http://eprint.iacr.org/2009/608.pdf.. 
  6. Cheraghchi, Mahdi; Guruswami, Venkatesan (2013-09-02). "Capacity of Non-Malleable Codes". arXiv:1309.0458.