Small-bias sample space

In theoretical computer science, a small-bias sample space (also known as [math]\displaystyle{ \epsilon }[/math]-biased sample space, [math]\displaystyle{ \epsilon }[/math]-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish a small-bias sample space from the uniform distribution with significant advantage, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong since [math]\displaystyle{ \epsilon }[/math]-biased sample spaces are equivalent to [math]\displaystyle{ \epsilon }[/math]-balanced error-correcting codes.

Definition

Bias

Let [math]\displaystyle{ X }[/math] be a probability distribution over [math]\displaystyle{ \{0,1\}^n }[/math]. The bias of [math]\displaystyle{ X }[/math] with respect to a set of indices [math]\displaystyle{ I \subseteq \{1,\dots,n\} }[/math] is defined as[1]

[math]\displaystyle{ \text{bias}_I(X) = \left| \Pr_{x\sim X} \left(\sum_{i\in I} x_i = 0\right) - \Pr_{x\sim X} \left(\sum_{i\in I} x_i = 1\right) \right| = \left| 2 \cdot \Pr_{x\sim X} \left(\sum_{i\in I} x_i = 0\right) -1 \right| \,, }[/math]

where the sum is taken over [math]\displaystyle{ \mathbb F_2 }[/math], the finite field with two elements. In other words, the sum [math]\displaystyle{ \sum_{i\in I} x_i }[/math] equals [math]\displaystyle{ 0 }[/math] if the number of ones in the sample [math]\displaystyle{ x\in\{0,1\}^n }[/math] at the positions defined by [math]\displaystyle{ I }[/math] is even, and otherwise, the sum equals [math]\displaystyle{ 1 }[/math]. For [math]\displaystyle{ I=\emptyset }[/math], the empty sum is defined to be zero, and hence [math]\displaystyle{ \text{bias}_{\emptyset} (X) = 1 }[/math].
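
As a concrete illustration, here is a minimal Python sketch (our own notation, not from the cited papers) of the bias of the uniform distribution over a multiset of samples; note that indices are 0-based in the code while the text numbers positions [math]\displaystyle{ 1,\dots,n }[/math]:

```python
from itertools import product

def bias(samples, I):
    """Empirical bias of the uniform distribution over a multiset of
    n-bit samples (0/1-tuples) with respect to an index set I."""
    # Fraction of samples whose parity over I is 0; for I = [] the
    # empty sum is 0, so p0 = 1 and the bias is 1, as in the text.
    p0 = sum(1 for x in samples if sum(x[i] for i in I) % 2 == 0) / len(samples)
    return abs(2 * p0 - 1)

# The uniform distribution over {0,1}^3 has bias 0 for every non-empty I.
cube = list(product([0, 1], repeat=3))
assert bias(cube, []) == 1.0
assert bias(cube, [0, 2]) == 0.0
```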

ϵ-biased sample space

A probability distribution [math]\displaystyle{ X }[/math] over [math]\displaystyle{ \{0,1\}^n }[/math] is called an [math]\displaystyle{ \epsilon }[/math]-biased sample space if [math]\displaystyle{ \text{bias}_I(X) \leq \epsilon }[/math] holds for all non-empty subsets [math]\displaystyle{ I \subseteq \{1,2,\ldots,n\} }[/math].

ϵ-biased set

An [math]\displaystyle{ \epsilon }[/math]-biased sample space [math]\displaystyle{ X }[/math] that is generated by picking a uniform element from a multiset [math]\displaystyle{ X\subseteq \{0,1\}^n }[/math] is called an [math]\displaystyle{ \epsilon }[/math]-biased set. The size [math]\displaystyle{ s }[/math] of an [math]\displaystyle{ \epsilon }[/math]-biased set [math]\displaystyle{ X }[/math] is the size of the multiset that generates the sample space.
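
Deciding whether a given multiset is an [math]\displaystyle{ \epsilon }[/math]-biased set amounts to bounding the bias over all [math]\displaystyle{ 2^n-1 }[/math] non-empty index sets, which is feasible by brute force only for small [math]\displaystyle{ n }[/math]; a self-contained sketch:

```python
from itertools import chain, combinations, product

def max_bias(samples):
    """Maximum bias of the uniform distribution over `samples` (a list
    of 0/1-tuples) across all non-empty index sets I."""
    n = len(samples[0])
    worst = 0.0
    for I in chain.from_iterable(
            combinations(range(n), r) for r in range(1, n + 1)):
        p0 = sum(1 for x in samples
                 if sum(x[i] for i in I) % 2 == 0) / len(samples)
        worst = max(worst, abs(2 * p0 - 1))
    return worst

# The full cube {0,1}^3 is 0-biased; a single point is 1-biased.
assert max_bias(list(product([0, 1], repeat=3))) == 0.0
assert max_bias([(0, 1, 1)]) == 1.0
```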

ϵ-biased generator

An [math]\displaystyle{ \epsilon }[/math]-biased generator [math]\displaystyle{ G:\{0,1\}^\ell \to \{0,1\}^n }[/math] is a function that maps strings of length [math]\displaystyle{ \ell }[/math] to strings of length [math]\displaystyle{ n }[/math] such that the multiset [math]\displaystyle{ X_G=\{G(y) \;\vert\; y\in\{0,1\}^\ell \} }[/math] is an [math]\displaystyle{ \epsilon }[/math]-biased set. The seed length of the generator is the number [math]\displaystyle{ \ell }[/math] and is related to the size of the [math]\displaystyle{ \epsilon }[/math]-biased set [math]\displaystyle{ X_G }[/math] via the equation [math]\displaystyle{ s=2^\ell }[/math].
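
Continuing in the same illustrative vein, enumerating all [math]\displaystyle{ 2^\ell }[/math] seeds of a generator recovers the multiset [math]\displaystyle{ X_G }[/math] and makes the relation [math]\displaystyle{ s=2^\ell }[/math] concrete; the toy generator below is deliberately poor and only shows the shapes involved:

```python
from itertools import product

def generated_multiset(G, seed_len):
    """The multiset X_G = [G(y) for y in {0,1}^seed_len], of size 2**seed_len."""
    return [G(y) for y in product([0, 1], repeat=seed_len)]

# Toy generator for n = 3; its outputs always have even parity, so the
# bias with respect to I = {1,2,3} is 1 -- it fools no such parity.
G = lambda y: (y[0], y[1], y[0] ^ y[1])
X_G = generated_multiset(G, 2)
assert len(X_G) == 2 ** 2  # s = 2^l
```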

Connection with epsilon-balanced error-correcting codes

There is a close connection between [math]\displaystyle{ \epsilon }[/math]-biased sets and [math]\displaystyle{ \epsilon }[/math]-balanced linear error-correcting codes. A linear code [math]\displaystyle{ C:\{0,1\}^n\to\{0,1\}^s }[/math] of message length [math]\displaystyle{ n }[/math] and block length [math]\displaystyle{ s }[/math] is [math]\displaystyle{ \epsilon }[/math]-balanced if the Hamming weight of every nonzero codeword [math]\displaystyle{ C(x) }[/math] is between [math]\displaystyle{ \frac{1-\epsilon}{2}s }[/math] and [math]\displaystyle{ \frac{1+\epsilon}{2}s }[/math]; equivalently, the bias [math]\displaystyle{ |1-2\,\mathrm{wt}(C(x))/s| }[/math] of every nonzero codeword is at most [math]\displaystyle{ \epsilon }[/math]. Since [math]\displaystyle{ C }[/math] is a linear code, its generator matrix is an [math]\displaystyle{ (n\times s) }[/math]-matrix [math]\displaystyle{ A }[/math] over [math]\displaystyle{ \mathbb F_2 }[/math] with [math]\displaystyle{ C(x)=x \cdot A }[/math].

Then it holds that a multiset [math]\displaystyle{ X\subseteq\{0,1\}^{n} }[/math] is [math]\displaystyle{ \epsilon }[/math]-biased if and only if the linear code [math]\displaystyle{ C_X }[/math], whose generator matrix has the elements of [math]\displaystyle{ X }[/math] as its columns, is [math]\displaystyle{ \epsilon }[/math]-balanced.[2]
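
For small parameters the equivalence can be checked numerically. The sketch below (again with our own helper names) treats the elements of [math]\displaystyle{ X }[/math] as the columns of the generator matrix [math]\displaystyle{ A }[/math] and tests the balance condition in its bias form; since a nonzero message [math]\displaystyle{ x }[/math] is exactly the indicator vector of a non-empty index set [math]\displaystyle{ I }[/math], the predicate returns True exactly when max_bias(X) from the earlier sketch is at most [math]\displaystyle{ \epsilon }[/math].

```python
from itertools import product

def is_eps_balanced(X, eps):
    """X: list of s vectors in {0,1}^n, read as the s columns of the
    generator matrix A of the code C_X(x) = x . A over F_2."""
    n, s = len(X[0]), len(X)
    for x in product([0, 1], repeat=n):
        if not any(x):
            continue  # the zero message gives the zero codeword
        # Hamming weight of the codeword: one bit <x, col> per column.
        w = sum(sum(a * b for a, b in zip(x, col)) % 2 for col in X)
        if abs(1 - 2 * w / s) > eps:  # codeword bias exceeds eps
            return False
    return True
```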

Constructions of small epsilon-biased sets

Usually the goal is to find [math]\displaystyle{ \epsilon }[/math]-biased sets that have a small size [math]\displaystyle{ s }[/math] relative to the parameters [math]\displaystyle{ n }[/math] and [math]\displaystyle{ \epsilon }[/math]. This is because a smaller size [math]\displaystyle{ s }[/math] means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

Theoretical bounds

The probabilistic method gives a non-explicit construction that achieves size [math]\displaystyle{ s=O(n/\epsilon^2) }[/math].[2] The construction is non-explicit in the sense that finding the [math]\displaystyle{ \epsilon }[/math]-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that such small [math]\displaystyle{ \epsilon }[/math]-biased sets exist. On the other hand, the best known lower bound for the size of [math]\displaystyle{ \epsilon }[/math]-biased sets is [math]\displaystyle{ s=\Omega(n/ (\epsilon^2 \log (1/\epsilon))) }[/math]; that is, every [math]\displaystyle{ \epsilon }[/math]-biased set must have at least this size.[2]
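
The existence half is easy to simulate: sampling [math]\displaystyle{ s= c \cdot n/\epsilon^2 }[/math] uniform strings and measuring max_bias (from the sketch above, hence only for small [math]\displaystyle{ n }[/math]) typically lands below [math]\displaystyle{ \epsilon }[/math]. The constant c = 4 below is an arbitrary choice for the demonstration, not the constant from the proof.

```python
import random

def random_candidate(n, eps, c=4, seed=0):
    """s = c*n/eps^2 uniform strings; by the probabilistic method such a
    multiset is eps-biased with high probability over the sampling."""
    rng = random.Random(seed)
    s = int(c * n / eps ** 2)
    return [tuple(rng.randrange(2) for _ in range(n)) for _ in range(s)]

# Typically max_bias(random_candidate(8, 0.25)) < 0.25 (512 samples, n = 8).
```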

Explicit constructions

There are many explicit, i.e., deterministic constructions of [math]\displaystyle{ \epsilon }[/math]-biased sets with various parameter settings:

  • (Naor Naor) achieve [math]\displaystyle{ \displaystyle s= \frac{n}{\text{poly}(\epsilon)} }[/math]. The construction makes use of Justesen codes (a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
  • (Alon Goldreich) achieve [math]\displaystyle{ \displaystyle s= O\left(\frac{n}{\epsilon \log (n/\epsilon)}\right)^2 }[/math]. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an [math]\displaystyle{ \epsilon }[/math]-balanced code, which gives rise to an [math]\displaystyle{ \epsilon }[/math]-biased sample space via the connection mentioned above (a code sketch of this idea follows below).
  • Concatenating algebraic geometric codes with the Hadamard code gives an [math]\displaystyle{ \epsilon }[/math]-balanced code with [math]\displaystyle{ \displaystyle s= O\left(\frac{n}{\epsilon^3 \log (1/\epsilon)}\right) }[/math].[2]
  • (Ben-Aroya Ta-Shma) achieve [math]\displaystyle{ \displaystyle s= O\left(\frac{n}{\epsilon^2 \log (1/\epsilon)}\right)^{5/4} }[/math].
  • (Ta-Shma 2017) achieves [math]\displaystyle{ \displaystyle s= O\left(\frac{n}{\epsilon^{2+o(1)}}\right) }[/math] which is almost optimal because of the lower bound.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest [math]\displaystyle{ \epsilon }[/math]-biased sets for all settings of [math]\displaystyle{ \epsilon }[/math] and [math]\displaystyle{ n }[/math].
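
To give a taste of such constructions, here is a minimal sketch of the "powering" construction from the (Alon Goldreich) paper, one way to realize the Reed–Solomon/Hadamard idea: a seed is a pair [math]\displaystyle{ (x,y) }[/math] of elements of [math]\displaystyle{ \mathbb F_{2^m} }[/math], and the [math]\displaystyle{ i }[/math]-th output bit is the [math]\displaystyle{ \mathbb F_2 }[/math] inner product of the bit representations of [math]\displaystyle{ x^i }[/math] and [math]\displaystyle{ y }[/math]. A non-empty parity over [math]\displaystyle{ I }[/math] then equals [math]\displaystyle{ \langle p(x),y\rangle }[/math] for the nonzero polynomial [math]\displaystyle{ p(z)=\sum_{i\in I} z^i }[/math], so its bias is at most the probability that [math]\displaystyle{ p(x)=0 }[/math], which is at most [math]\displaystyle{ n/2^m }[/math]; choosing [math]\displaystyle{ 2^m \approx n/\epsilon }[/math] gives size [math]\displaystyle{ s=2^{2m}\approx (n/\epsilon)^2 }[/math], in the ballpark of the bound quoted above. The field arithmetic below assumes m = 8 with the (irreducible) AES polynomial.

```python
M, IRRED = 8, 0x11B  # F_{2^8}, modulo the irreducible AES polynomial

def gf_mul(a, b):
    """Carry-less multiplication in F_{2^M}, reduced modulo IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= IRRED
        b >>= 1
    return r

def powering_sample(x, y, n):
    """Bits <x^i, y> over F_2 for i = 1..n; any non-empty parity has
    bias at most n / 2**M (here n = 16 gives bias <= 1/16)."""
    bits, p = [], 1
    for _ in range(n):
        p = gf_mul(p, x)                        # p = x^i
        bits.append(bin(p & y).count("1") % 2)  # <bits(p), bits(y)> over F_2
    return tuple(bits)

# One sample per seed (x, y): an eps-biased set of size s = 2**(2*M).
samples = [powering_sample(x, y, 16)
           for x in range(2 ** M) for y in range(2 ** M)]
```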

Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

k-wise independent spaces

A random variable [math]\displaystyle{ Y }[/math] over [math]\displaystyle{ \{0,1\}^n }[/math] is a k-wise independent space if, for all index sets [math]\displaystyle{ I\subseteq\{1,\dots,n\} }[/math] of size [math]\displaystyle{ k }[/math], the marginal distribution [math]\displaystyle{ Y|_I }[/math] is exactly equal to the uniform distribution over [math]\displaystyle{ \{0,1\}^k }[/math]. That is, for all such [math]\displaystyle{ I }[/math] and all strings [math]\displaystyle{ z\in\{0,1\}^k }[/math], the distribution [math]\displaystyle{ Y }[/math] satisfies [math]\displaystyle{ \Pr_Y (Y|_I = z) = 2^{-k} }[/math].
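
For tiny parameters, exact k-wise independence of the uniform distribution over a multiset can be verified by brute force; the helper below (our own illustration) checks every size-k marginal against the uniform distribution on [math]\displaystyle{ \{0,1\}^k }[/math]:

```python
from itertools import combinations, product

def is_k_wise_independent(samples, k):
    """True iff every size-k marginal of the uniform distribution over
    `samples` (equal-weight 0/1-tuples) is exactly uniform."""
    n, s = len(samples[0]), len(samples)
    for I in combinations(range(n), k):
        for z in product([0, 1], repeat=k):
            hits = sum(1 for x in samples
                       if all(x[i] == zi for i, zi in zip(I, z)))
            if hits * 2 ** k != s:  # Pr(Y|_I = z) must be exactly 2^-k
                return False
    return True

# The full cube is trivially k-wise independent for every k <= n.
assert is_k_wise_independent(list(product([0, 1], repeat=3)), 2)
```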

Constructions and bounds

k-wise independent spaces are fairly well understood.

  • A simple construction by (Joffe 1974) achieves size [math]\displaystyle{ n^k }[/math].
  • (Alon Babai) construct a k-wise independent space whose size is [math]\displaystyle{ n^{k/2} }[/math].
  • (Chor Goldreich) prove that no k-wise independent space can be significantly smaller than [math]\displaystyle{ n^{k/2} }[/math].

Joffe's construction

(Joffe 1974) constructs a [math]\displaystyle{ k }[/math]-wise independent space [math]\displaystyle{ Y }[/math] over the finite field [math]\displaystyle{ \mathbb F_n }[/math] with a prime number [math]\displaystyle{ n\gt k }[/math] of elements; that is, [math]\displaystyle{ Y }[/math] is a distribution over [math]\displaystyle{ \mathbb F_n^n }[/math]. The first [math]\displaystyle{ k }[/math] coordinates of [math]\displaystyle{ Y }[/math] are drawn independently and uniformly at random:

[math]\displaystyle{ (Y_0,\dots,Y_{k-1}) \sim\mathbb F_n^k }[/math].

For each [math]\displaystyle{ i }[/math] with [math]\displaystyle{ k \leq i \lt n }[/math], the coordinate [math]\displaystyle{ Y_i }[/math] is then defined deterministically in terms of the first [math]\displaystyle{ k }[/math] coordinates as

[math]\displaystyle{ Y_i=Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \dots + Y_{k-1} \cdot i^{k-1}\,, }[/math]

where the calculation is done in [math]\displaystyle{ \mathbb F_n }[/math]. (Joffe 1974) proves that the distribution [math]\displaystyle{ Y }[/math] constructed in this way is [math]\displaystyle{ k }[/math]-wise independent as a distribution over [math]\displaystyle{ \mathbb F_n^n }[/math]. The distribution [math]\displaystyle{ Y }[/math] is uniform on its support, and hence, the support of [math]\displaystyle{ Y }[/math] forms a [math]\displaystyle{ k }[/math]-wise independent set. It consists of the [math]\displaystyle{ n^k }[/math] strings obtained by extending each prefix in [math]\displaystyle{ \mathbb F_n^k }[/math] to a string of length [math]\displaystyle{ n }[/math] using the deterministic rule above.
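
Joffe's rule translates directly into code; the sketch below (0-based indices, as before) draws one point of the space:

```python
import random

def joffe_sample(n, k, rng=random):
    """One draw from Joffe's k-wise independent space over F_n^n,
    where n must be a prime larger than k."""
    y = [rng.randrange(n) for _ in range(k)]  # Y_0, ..., Y_{k-1} uniform
    for i in range(k, n):
        # Y_i = Y_0 + Y_1*i + Y_2*i^2 + ... + Y_{k-1}*i^(k-1)  (mod n)
        y.append(sum(y[j] * pow(i, j, n) for j in range(k)) % n)
    return tuple(y)

# joffe_sample(7, 3) is one of the 7**3 = 343 points of a 3-wise
# independent space over F_7^7.
```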

Almost k-wise independent spaces

A random variable [math]\displaystyle{ Y }[/math] over [math]\displaystyle{ \{0,1\}^n }[/math] is a [math]\displaystyle{ \delta }[/math]-almost k-wise independent space if, for all index sets [math]\displaystyle{ I\subseteq\{1,\dots,n\} }[/math] of size [math]\displaystyle{ k }[/math], the restricted distribution [math]\displaystyle{ Y|_I }[/math] and the uniform distribution [math]\displaystyle{ U_k }[/math] on [math]\displaystyle{ \{0,1\}^k }[/math] are [math]\displaystyle{ \delta }[/math]-close in 1-norm, i.e., [math]\displaystyle{ \Big\|Y|_I - U_k\Big\|_1 \leq \delta }[/math].
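
The smallest [math]\displaystyle{ \delta }[/math] for which a given multiset qualifies can again be computed by brute force for tiny parameters, by maximizing the 1-norm distance over all size-k marginals:

```python
from itertools import combinations, product

def almost_k_wise_delta(samples, k):
    """Largest 1-norm distance between a size-k marginal of the uniform
    distribution over `samples` and the uniform distribution on {0,1}^k;
    the space is delta-almost k-wise independent iff this is <= delta."""
    n, s = len(samples[0]), len(samples)
    worst = 0.0
    for I in combinations(range(n), k):
        dist = sum(abs(sum(1 for x in samples
                           if all(x[i] == zi for i, zi in zip(I, z))) / s
                       - 2 ** -k)
                   for z in product([0, 1], repeat=k))
        worst = max(worst, dist)
    return worst
```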

Constructions

(Naor Naor) give a general framework for combining small k-wise independent spaces with small [math]\displaystyle{ \epsilon }[/math]-biased spaces to obtain [math]\displaystyle{ \delta }[/math]-almost k-wise independent spaces of even smaller size. In particular, let [math]\displaystyle{ G_1:\{0,1\}^h\to\{0,1\}^n }[/math] be a linear mapping that generates a k-wise independent space and let [math]\displaystyle{ G_2:\{0,1\}^\ell \to \{0,1\}^h }[/math] be a generator of an [math]\displaystyle{ \epsilon }[/math]-biased set over [math]\displaystyle{ \{0,1\}^h }[/math]. That is, when given a uniformly random input, the output of [math]\displaystyle{ G_1 }[/math] is a k-wise independent space, and the output of [math]\displaystyle{ G_2 }[/math] is [math]\displaystyle{ \epsilon }[/math]-biased. Then [math]\displaystyle{ G : \{0,1\}^\ell \to \{0,1\}^n }[/math] with [math]\displaystyle{ G(x) = G_1(G_2(x)) }[/math] is a generator of a [math]\displaystyle{ \delta }[/math]-almost [math]\displaystyle{ k }[/math]-wise independent space, where [math]\displaystyle{ \delta=2^{k/2} \epsilon }[/math].[3]

As mentioned above, (Alon Babai) construct a generator [math]\displaystyle{ G_1 }[/math] with [math]\displaystyle{ h=\tfrac{k}{2} \log n }[/math], and (Naor Naor) construct a generator [math]\displaystyle{ G_2 }[/math] with [math]\displaystyle{ \ell=\log s=\log h + O(\log(\epsilon^{-1})) }[/math]. Hence, the concatenation [math]\displaystyle{ G }[/math] of [math]\displaystyle{ G_1 }[/math] and [math]\displaystyle{ G_2 }[/math] has seed length [math]\displaystyle{ \ell = \log k + \log \log n + O(\log(\epsilon^{-1})) }[/math]. In order for [math]\displaystyle{ G }[/math] to yield a [math]\displaystyle{ \delta }[/math]-almost k-wise independent space, we need to set [math]\displaystyle{ \epsilon = \delta 2^{-k/2} }[/math], which leads to a seed length of [math]\displaystyle{ \ell = \log \log n + O(k+\log(\delta^{-1})) }[/math] and a sample space of total size [math]\displaystyle{ 2^\ell \leq \log n \cdot \text{poly}(2^k \cdot\delta^{-1}) }[/math].
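
As a toy end-to-end illustration (with parameters nowhere near the ones just cited), take for [math]\displaystyle{ G_1 }[/math] the classic linear pairwise independent generator whose output has one bit per non-empty subset [math]\displaystyle{ S }[/math] of the seed positions, namely the parity of the seed over [math]\displaystyle{ S }[/math], and reuse powering_sample from the sketch above as [math]\displaystyle{ G_2 }[/math]. For k = 2 the composition lemma then promises a [math]\displaystyle{ \delta }[/math]-almost pairwise independent space with [math]\displaystyle{ \delta = 2\epsilon }[/math].

```python
from itertools import chain, combinations

def G1(y):
    """Linear 2-wise independent generator: for each non-empty subset S
    of seed positions, output the parity of y over S (n = 2**h - 1 bits)."""
    h = len(y)
    subsets = chain.from_iterable(
        combinations(range(h), r) for r in range(1, h + 1))
    return tuple(sum(y[i] for i in S) % 2 for S in subsets)

def G(x, y):
    """Composition G = G1 o G2 with G2 = powering_sample from above:
    16 eps-biased bits (eps <= 1/16) expand to 2**16 - 1 bits that are
    delta-almost pairwise independent with delta = 2*eps <= 1/8."""
    return G1(powering_sample(x, y, 16))
```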

Notes

  1. cf., e.g., (Goldreich 2001)
  2. cf., e.g., p. 2 of (Ben-Aroya Ta-Shma)
  3. Section 4 in (Naor Naor)

References