Expander walk sampling

In the mathematical discipline of graph theory, the expander walk sampling theorem intuitively states that sampling vertices in an expander graph by doing a relatively short random walk can simulate sampling the vertices independently from a uniform distribution. The earliest version of this theorem is due to (Ajtai, Komlós & Szemerédi 1987), and the more general version is typically attributed to (Gillman 1998).

Statement

Let [math]\displaystyle{ G=(V,E) }[/math] be an n-vertex expander graph with positively weighted edges, and let [math]\displaystyle{ A\subset V }[/math]. Let [math]\displaystyle{ P }[/math] denote the stochastic matrix of the graph, and let [math]\displaystyle{ \lambda_2 }[/math] be the second largest eigenvalue of [math]\displaystyle{ P }[/math]. Let [math]\displaystyle{ y_0, y_1, \ldots, y_{k-1} }[/math] denote the vertices encountered in a [math]\displaystyle{ (k-1) }[/math]-step random walk on [math]\displaystyle{ G }[/math] starting at vertex [math]\displaystyle{ y_0 }[/math], and let [math]\displaystyle{ \pi (A):= \lim_{k\rightarrow\infty} \frac{1}{k} \sum_{i = 0}^{k-1} \mathbf{1}_A(y_i) }[/math], where [math]\displaystyle{ \mathbf{1}_A(y)=\begin{cases} 1, & \text{if }y \in A \\ 0, & \text{otherwise.}\end{cases} }[/math]

(It is well known[1] that for almost all trajectories [math]\displaystyle{ y_0, y_1, \ldots }[/math], the average [math]\displaystyle{ \frac{1}{k} \sum_{i = 0}^{k-1} \mathbf{1}_A(y_i) }[/math] converges to the limit [math]\displaystyle{ \pi (A) }[/math] as [math]\displaystyle{ k \rightarrow \infty }[/math].)

The theorem states that for a weighted graph [math]\displaystyle{ G=(V,E) }[/math] and a random walk [math]\displaystyle{ y_0, y_1, \ldots, y_{k-1} }[/math] where [math]\displaystyle{ y_0 }[/math] is chosen according to an initial distribution [math]\displaystyle{ \mathbf{q} }[/math], the following bound holds for all [math]\displaystyle{ \gamma \gt 0 }[/math]:

[math]\displaystyle{ \Pr\left[\bigg| \frac{1}{k} \sum_{i=0}^{k-1} \mathbf{1}_A(y_i) - \pi(A)\bigg| \geq \gamma\right] \leq C e^{-\frac{1}{20} (\gamma^2 (1-\lambda_2) k)}. }[/math]

Here [math]\displaystyle{ C }[/math] depends on [math]\displaystyle{ \mathbf{q}, G }[/math] and [math]\displaystyle{ A }[/math].

The theorem bounds the rate of convergence to [math]\displaystyle{ \pi(A) }[/math] in terms of the length of the random walk, and hence gives a more efficient method of estimating [math]\displaystyle{ \pi(A) }[/math] than independently sampling the vertices of [math]\displaystyle{ G }[/math].
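As an illustration of the statement, the following minimal sketch estimates [math]\displaystyle{ \pi(A) }[/math] from a single random walk. It rests on assumptions that are not part of the theorem: the graph is an unweighted regular multigraph built from random permutations (such graphs are expanders with high probability, which the code does not verify), the start vertex is uniform, and the names `random_regular_multigraph` and `walk_estimate` are hypothetical helpers.

```python
import random

def random_regular_multigraph(n, num_perms, rng):
    """Adjacency lists of a (2 * num_perms)-regular multigraph from random permutations."""
    adj = [[] for _ in range(n)]
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)
        for u, v in enumerate(perm):
            adj[u].append(v)  # edge from u to perm(u)
            adj[v].append(u)  # and its reverse, so the graph is undirected
    return adj

def walk_estimate(adj, in_A, k, rng):
    """Fraction of the k walk vertices y_0, ..., y_{k-1} that lie in A."""
    y = rng.randrange(len(adj))  # y_0 drawn uniformly (the initial distribution q)
    hits = 0
    for _ in range(k):
        hits += 1 if in_A(y) else 0
        y = rng.choice(adj[y])   # one step of the walk along a uniformly chosen edge
    return hits / k

rng = random.Random(0)
n, k = 1000, 20000
adj = random_regular_multigraph(n, 3, rng)

# A is the first quarter of the vertices; the graph is regular, so pi(A) = 0.25.
estimate = walk_estimate(adj, lambda v: v < n // 4, k, rng)
print(estimate)
```

Because every vertex has the same degree, the stationary distribution is uniform and the printed estimate can be compared directly with [math]\displaystyle{ |A|/n }[/math].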

Proof

In order to prove the theorem, we provide a few definitions followed by three lemmas.

Let [math]\displaystyle{ \it{w}_{xy} }[/math] be the weight of the edge [math]\displaystyle{ xy\in E(G) }[/math] and let [math]\displaystyle{ \it{w}_x=\sum_{y:xy\in E(G)}\it{w}_{xy}. }[/math] Denote by [math]\displaystyle{ \pi(x):=\it{w}_x/\sum_{y\in V} \it{w}_y }[/math]. Let [math]\displaystyle{ \frac{\mathbf{q}}{\sqrt\pi} }[/math] be the vector with entries [math]\displaystyle{ \frac{\mathbf{q}(x)}{\sqrt{\pi(x)}} }[/math], and let [math]\displaystyle{ N_{\pi,\mathbf{q}}=||\frac{\mathbf{q}}{\sqrt\pi}||_{2} }[/math].

Let [math]\displaystyle{ D=\text{diag}(1/\it{w}_i ) }[/math] and [math]\displaystyle{ M=(\it{w}_{ij}) }[/math]. Let [math]\displaystyle{ P(r)=PE_r }[/math] where [math]\displaystyle{ P }[/math] is the stochastic matrix, [math]\displaystyle{ E_r=\text{diag}(e^{r\mathbf{1}_A}) }[/math] and [math]\displaystyle{ r \ge 0 }[/math]. Then:

[math]\displaystyle{ P = \sqrt{D}S\sqrt{D^{-1}} \qquad \text{and} \qquad P(r) = \sqrt{DE_r^{-1}}S(r)\sqrt{E_rD^{-1}} }[/math]

where [math]\displaystyle{ S:=\sqrt{D}M\sqrt{D} \text{ and } S(r) := \sqrt{DE_r}M\sqrt{DE_r} }[/math]. As [math]\displaystyle{ S }[/math] and [math]\displaystyle{ S(r) }[/math] are symmetric, they have real eigenvalues. Since [math]\displaystyle{ P(r) }[/math] is similar to [math]\displaystyle{ S(r) }[/math], the two matrices have the same eigenvalues, so the eigenvalues of [math]\displaystyle{ P(r) }[/math] are real. Let [math]\displaystyle{ \lambda(r) }[/math] and [math]\displaystyle{ \lambda_2(r) }[/math] be the largest and second largest eigenvalues of [math]\displaystyle{ P(r) }[/math] respectively.
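The two factorizations can be checked numerically on a small example. The sketch below uses a hypothetical 3-vertex weight matrix, subset [math]\displaystyle{ A }[/math] and value of [math]\displaystyle{ r }[/math] chosen purely for illustration; it builds [math]\displaystyle{ P }[/math], [math]\displaystyle{ P(r) }[/math], [math]\displaystyle{ S }[/math] and [math]\displaystyle{ S(r) }[/math] from the definitions and reports how far the rebuilt matrices are from the originals.

```python
import math

# Hypothetical symmetric weight matrix M = (w_ij) of a 3-vertex graph
# (the diagonal entry is a weighted self-loop); A and r are illustrative.
M = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 3.0],
     [1.0, 3.0, 1.0]]
A = {0}
r = 0.3
n = len(M)

def matmul(X, Y):
    """Product of two n-by-n matrices stored as nested lists."""
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_abs_diff(X, Y):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(n) for j in range(n))

d = [1.0 / sum(row) for row in M]                       # diagonal of D
e = [math.exp(r) if i in A else 1.0 for i in range(n)]  # diagonal of E_r

P = matmul(diag(d), M)    # stochastic matrix P = D M
P_r = matmul(P, diag(e))  # P(r) = P E_r

sqrt_D = diag([math.sqrt(x) for x in d])
sqrt_DE = diag([math.sqrt(x * y) for x, y in zip(d, e)])
S = matmul(matmul(sqrt_D, M), sqrt_D)
S_r = matmul(matmul(sqrt_DE, M), sqrt_DE)

# Rebuild P and P(r) from their symmetric counterparts.
P_rebuilt = matmul(matmul(sqrt_D, S), diag([1.0 / math.sqrt(x) for x in d]))
P_r_rebuilt = matmul(
    matmul(diag([math.sqrt(x / y) for x, y in zip(d, e)]), S_r),
    diag([math.sqrt(y / x) for x, y in zip(d, e)]),
)

S_transposed = [list(column) for column in zip(*S)]
err_symmetry = max_abs_diff(S, S_transposed)
err_P = max_abs_diff(P, P_rebuilt)
err_P_r = max_abs_diff(P_r, P_r_rebuilt)
print(err_symmetry, err_P, err_P_r)  # all three should be at rounding-error level
```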

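For convenience of notation, let [math]\displaystyle{ t_k=\frac{1}{k} \sum_{i=0}^{k-1} \mathbf{1}_A(y_i) }[/math], [math]\displaystyle{ \epsilon=\lambda-\lambda_2 }[/math] (where [math]\displaystyle{ \lambda=\lambda(0)=1 }[/math] and [math]\displaystyle{ \lambda_2=\lambda_2(0) }[/math], so that [math]\displaystyle{ \epsilon=1-\lambda_2 }[/math]), [math]\displaystyle{ \epsilon_r=\lambda(r)-\lambda_2(r) }[/math], and let [math]\displaystyle{ \mathbf{1} }[/math] be the all-1 vector.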

Lemma 1

[math]\displaystyle{ \Pr\left[t_k- \pi(A) \ge \gamma\right] \leq e^{-rk(\pi(A)+\gamma)+k\log\lambda(r)}(\mathbf{q}P(r)^k\mathbf{1})/\lambda(r)^k }[/math]

Proof:

By Markov's inequality,

[math]\displaystyle{ \Pr\left[t_k \ge \pi(A) +\gamma\right] =\Pr\left[e^{rkt_k}\ge e^{rk(\pi(A)+\gamma)}\right]\leq e^{-rk(\pi(A)+\gamma)}E_\mathbf{q}e^{rkt_k}, }[/math]

where [math]\displaystyle{ E_\mathbf{q} }[/math] denotes expectation when the initial vertex is chosen according to the probability distribution [math]\displaystyle{ \mathbf{q} }[/math]. This expectation can be computed by summing over all possible trajectories [math]\displaystyle{ x_0,x_1,\ldots,x_k }[/math]:

[math]\displaystyle{ E_{\mathbf{q}}e^{rkt_k}=\sum_{x_0,x_1,\ldots,x_k}e^{rkt_k}\mathbf{q}(x_0)\prod_{i=1}^kp_{x_{i-1}x_i}=\mathbf{q}P(r)^k\mathbf{1}. }[/math]
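The identity between the sum over trajectories and the matrix expression can be verified by brute force on a small chain. The sketch below is an illustration under stated assumptions: the weight matrix, initial distribution, set [math]\displaystyle{ A }[/math], [math]\displaystyle{ r }[/math] and [math]\displaystyle{ k }[/math] are hypothetical, and the visits to [math]\displaystyle{ A }[/math] are counted at positions [math]\displaystyle{ x_1,\ldots,x_k }[/math], since that is where the factor [math]\displaystyle{ E_r }[/math] in [math]\displaystyle{ P(r)=PE_r }[/math] attaches its weight.

```python
import itertools
import math

# Hypothetical weights, initial distribution, subset A, and parameters r, k.
M = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 3.0],
     [1.0, 3.0, 0.0]]
P = [[w / sum(row) for w in row] for row in M]  # p_xy = w_xy / w_x
q = [0.5, 0.3, 0.2]
A = {0}
r, k = 0.4, 4
n = len(P)

def brute_force():
    """Sum of exp(r * visits) * probability over all trajectories x_0, ..., x_k."""
    total = 0.0
    for x in itertools.product(range(n), repeat=k + 1):
        prob = q[x[0]]
        for i in range(1, k + 1):
            prob *= P[x[i - 1]][x[i]]
        visits = sum(1 for i in range(1, k + 1) if x[i] in A)
        total += prob * math.exp(r * visits)
    return total

def matrix_form():
    """Evaluate q P(r)^k 1 by applying P E_r to a row vector k times."""
    v = q[:]
    for _ in range(k):
        v = [sum(v[i] * P[i][j] for i in range(n)) * (math.exp(r) if j in A else 1.0)
             for j in range(n)]
    return sum(v)

lhs, rhs = brute_force(), matrix_form()
print(lhs, rhs)  # the two values should agree up to rounding
```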

Combining the two results proves the lemma.

Lemma 2

For [math]\displaystyle{ 0\le r \le 1 }[/math],

[math]\displaystyle{ (\mathbf{q}P(r)^k\mathbf{1})/\lambda(r)^k\le (1+r)N_{\pi,\mathbf{q}} }[/math]

Proof:

Using the factorization of [math]\displaystyle{ P(r) }[/math] above and the fact that the eigenvalues of [math]\displaystyle{ P(r) }[/math] and [math]\displaystyle{ S(r) }[/math] are equal,

[math]\displaystyle{ \begin{align} (\mathbf{q}P(r)^k\mathbf{1})/\lambda(r)^k&= (\mathbf{q}\sqrt{DE_r^{-1}}S(r)^k \sqrt{D^{-1}E_r}\mathbf{1})/\lambda(r)^k\\ &\le e^{r/2}||\frac{\mathbf{q}}{\sqrt{\pi}}||_2||S(r)^k||_2||\sqrt{\pi}||_2/\lambda(r)^k\\ &\le e^{r/2}N_{\pi,\mathbf{q}}\\ &\le (1+r)N_{\pi,\mathbf{q}}\qquad \square \end{align} }[/math]

Lemma 3

If [math]\displaystyle{ r }[/math] is a real number such that [math]\displaystyle{ 0\le e^r-1\le \epsilon/4 }[/math],

[math]\displaystyle{ \log\lambda(r)\le r\pi(A)+5r^2/\epsilon }[/math]

Proof summary:

We Taylor expand [math]\displaystyle{ \log \lambda(r) }[/math] about the point [math]\displaystyle{ r=z }[/math] to get:

[math]\displaystyle{ \log\lambda(r)= \log\lambda(z)+m_z(r-z)+(r-z)^2\int_0^1 (1-t)V_{z+(r-z)t}dt }[/math]

where [math]\displaystyle{ m_x \text{ and } V_x }[/math] are the first and second derivatives of [math]\displaystyle{ \log \lambda(r) }[/math] at [math]\displaystyle{ r=x }[/math]. We show that [math]\displaystyle{ m_0=\lim_{k \to \infty}t_k=\pi(A). }[/math] We then prove (i) [math]\displaystyle{ \epsilon_r\ge 3\epsilon/4 }[/math] by matrix manipulation, and (ii) [math]\displaystyle{ V_r\le 10/\epsilon }[/math] using (i) and Cauchy's estimate from complex analysis.

The results combine to show that

[math]\displaystyle{ \begin{align} \log\lambda(r)= \log\lambda(0)+m_0r+r^2\int_0^1 (1-t)V_{rt}dt \le r\pi(A)+5r^2/\epsilon \end{align} }[/math]

A line-by-line proof can be found in Gillman (1998).
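The statement of Lemma 3 can be sanity-checked on a chain small enough that [math]\displaystyle{ \lambda(r) }[/math] has a closed form. The sketch below assumes a hypothetical 2-vertex weighted graph with self-loops and [math]\displaystyle{ A=\{0\} }[/math]; it evaluates the largest eigenvalue of the 2-by-2 matrix [math]\displaystyle{ P(r) }[/math] from its trace and determinant, tests the inequality on a grid of admissible [math]\displaystyle{ r }[/math], and estimates [math]\displaystyle{ m_0 }[/math] by a central difference. It is a numerical check on one example, not a proof.

```python
import math

# Hypothetical 2-vertex weighted graph with self-loops: w_00 = 3, w_01 = 1, w_11 = 1,
# so w_0 = 4, w_1 = 2 and the transition probabilities are as below.
P = [[0.75, 0.25],
     [0.50, 0.50]]
pi_A = 4 / 6      # A = {0}, pi(0) = w_0 / (w_0 + w_1)
eps = 1.0 - 0.25  # eigenvalues of P are 1 and trace(P) - 1 = 0.25

def top_eigenvalue(r):
    """Largest eigenvalue of P(r) = P E_r, from the 2x2 characteristic polynomial."""
    a, b = P[0][0] * math.exp(r), P[0][1]
    c, d = P[1][0] * math.exp(r), P[1][1]
    trace, det = a + d, a * d - b * c
    return (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0

def log_lambda(r):
    return math.log(top_eigenvalue(r))

# Admissible range of the lemma: 0 <= e^r - 1 <= eps / 4.
r_max = math.log(1.0 + eps / 4.0)
grid = [r_max * i / 50 for i in range(51)]
lemma_holds = all(
    log_lambda(r) <= r * pi_A + 5.0 * r * r / eps + 1e-12 for r in grid
)

# Central-difference estimate of m_0, the derivative of log(lambda(r)) at r = 0.
h = 1e-5
m_0 = (log_lambda(h) - log_lambda(-h)) / (2.0 * h)

print(lemma_holds, m_0, pi_A)
```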

Proof of theorem

Combining Lemmas 1, 2 and 3, we get that

[math]\displaystyle{ \Pr[t_k-\pi(A)\ge \gamma]\le(1+r)N_{\pi,\mathbf{q}}e^{-k(r\gamma-5r^2/\epsilon)} }[/math]

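Viewing the exponent on the right-hand side of the inequality as a quadratic in [math]\displaystyle{ r }[/math], the expression is minimised at [math]\displaystyle{ r=\gamma\epsilon/10 }[/math], where [math]\displaystyle{ r\gamma-5r^2/\epsilon=\gamma^2\epsilon/20 }[/math]. Hence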

[math]\displaystyle{ \Pr[t_k-\pi(A)\ge \gamma]\le(1+\gamma\epsilon/10)N_{\pi,\mathbf{q}}e^{-k\gamma^2\epsilon/20} }[/math]

A similar bound

[math]\displaystyle{ \Pr[t_k-\pi(A)\le - \gamma]\le (1+\gamma\epsilon/10)N_{\pi,\mathbf{q}}e^{-k\gamma^2\epsilon/20} }[/math]

holds, hence setting [math]\displaystyle{ C=2(1+\gamma\epsilon/10)N_{\pi,\mathbf{q}} }[/math] gives the desired result.
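To see what the bound implies in practice, the following sketch evaluates the right-hand side with [math]\displaystyle{ C=2(1+\gamma\epsilon/10)N_{\pi,\mathbf{q}} }[/math] and inverts it to find a walk length that pushes the bound below a target failure probability. The function names and the parameter `delta` are hypothetical; the default [math]\displaystyle{ N_{\pi,\mathbf{q}}=1 }[/math] corresponds to starting from the stationary distribution, [math]\displaystyle{ \mathbf{q}=\pi }[/math].

```python
import math

def tail_bound(gamma, lambda_2, k, n_pi_q=1.0):
    """Right-hand side C * exp(-k * gamma^2 * eps / 20) of the two-sided bound."""
    eps = 1.0 - lambda_2
    c = 2.0 * (1.0 + gamma * eps / 10.0) * n_pi_q
    return c * math.exp(-k * gamma ** 2 * eps / 20.0)

def steps_needed(gamma, lambda_2, delta, n_pi_q=1.0):
    """Smallest k for which tail_bound(gamma, lambda_2, k) is at most delta."""
    eps = 1.0 - lambda_2
    c = 2.0 * (1.0 + gamma * eps / 10.0) * n_pi_q
    return max(0, math.ceil(20.0 * math.log(c / delta) / (gamma ** 2 * eps)))

# Illustrative parameters: accuracy 0.1, second eigenvalue 0.5, failure probability 0.01.
k = steps_needed(gamma=0.1, lambda_2=0.5, delta=0.01)
print(k, tail_bound(0.1, 0.5, k))
```

The required length grows like [math]\displaystyle{ 1/(\gamma^2\epsilon) }[/math] and only logarithmically in [math]\displaystyle{ 1/\delta }[/math], which is the dependence visible in the exponent of the bound.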

Uses

This theorem is useful for randomness reduction in the study of derandomization. Sampling from an expander walk is an example of a randomness-efficient sampler. Drawing [math]\displaystyle{ k }[/math] independent uniform samples from a set of [math]\displaystyle{ n }[/math] vertices uses [math]\displaystyle{ k \log n }[/math] random bits, whereas a walk visiting [math]\displaystyle{ k }[/math] vertices on a graph from an infinite family of constant-degree expanders uses only [math]\displaystyle{ \log n + O(k) }[/math] bits. Such families exist and are efficiently constructible, e.g. the Ramanujan graphs of Lubotzky-Phillips-Sarnak.
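The saving in random bits can be made concrete with a short calculation. The sketch below assumes a [math]\displaystyle{ d }[/math]-regular expander in which each step is chosen by drawing one of [math]\displaystyle{ d }[/math] neighbours uniformly; the function names and the parameter values are hypothetical.

```python
import math

def bits_independent(n, k):
    """Random bits for k independent uniform vertices out of n."""
    return k * math.ceil(math.log2(n))

def bits_expander_walk(n, k, d):
    """Random bits for a uniform start vertex plus k - 1 steps on a d-regular graph."""
    return math.ceil(math.log2(n)) + (k - 1) * math.ceil(math.log2(d))

n, k, d = 2 ** 20, 100, 8
print(bits_independent(n, k))       # 2000
print(bits_expander_walk(n, k, d))  # 317
```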

References

  1. Doob, J.L. (1953). Stochastic Processes. Wiley. Theorem 6.1.
  • Ajtai, M.; Komlós, J.; Szemerédi, E. (1987). "Deterministic simulation in LOGSPACE". STOC '87. pp. 132–140. doi:10.1145/28395.28410. ISBN 0897912217. 
  • Gillman, D. (1998). "A Chernoff Bound for Random Walks on Expander Graphs". SIAM Journal on Computing (Society for Industrial and Applied Mathematics) 27 (4): 1203–1220. doi:10.1137/S0097539794268765. 