Arbitrarily varying channel

From HandWiki
Short description: Communication channel with unknown parameters that can change over time

An arbitrarily varying channel (AVC) is a communication channel model used in coding theory, first introduced by Blackwell, Breiman, and Thomasian. The channel has unknown parameters that can change over time, and these changes need not follow any uniform pattern during the transmission of a codeword. [math]\displaystyle{ \textstyle n }[/math] uses of this channel are described by a stochastic matrix [math]\displaystyle{ \textstyle W^n: X^n \times S^n \rightarrow Y^n }[/math], where [math]\displaystyle{ \textstyle X }[/math] is the input alphabet, [math]\displaystyle{ \textstyle Y }[/math] is the output alphabet, [math]\displaystyle{ \textstyle S }[/math] is the set of states, and [math]\displaystyle{ \textstyle W^n (y | x, s) }[/math] is the probability that the transmitted input [math]\displaystyle{ \textstyle x = (x_1, \ldots, x_n) }[/math] leads to the received output [math]\displaystyle{ \textstyle y = (y_1, \ldots, y_n) }[/math] when the state sequence is [math]\displaystyle{ \textstyle s = (s_1, \ldots, s_n) \in S^n }[/math]. The state [math]\displaystyle{ \textstyle s_i }[/math] in [math]\displaystyle{ \textstyle S }[/math] can vary arbitrarily at each time unit [math]\displaystyle{ \textstyle i }[/math]. This channel was developed as an alternative to Shannon's binary symmetric channel (BSC), whose entire nature is known, in order to model actual network channel situations more realistically.
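In the standard AVC model the channel is memoryless once the state sequence is fixed, so [math]\displaystyle{ \textstyle W^n(y|x,s) = \prod_{i=1}^n W(y_i|x_i,s_i) }[/math] for a single-letter channel [math]\displaystyle{ \textstyle W }[/math]. The sketch below illustrates this product form; the particular single-letter channel (a bit passed unchanged in state 0, flipped with probability 0.3 in state 1) is a hypothetical example, not taken from the literature.

```python
from itertools import product
from math import prod

# Hypothetical single-letter AVC with X = S = Y = {0, 1}:
# in state 0 the bit passes unchanged, in state 1 it is flipped with probability 0.3.
# W[(x, s)][y] = W(y | x, s)
W = {
    (0, 0): {0: 1.0, 1: 0.0},
    (1, 0): {0: 0.0, 1: 1.0},
    (0, 1): {0: 0.7, 1: 0.3},
    (1, 1): {0: 0.3, 1: 0.7},
}


def W_n(y, x, s):
    """W^n(y|x,s): product of the per-letter probabilities W(y_i | x_i, s_i)."""
    assert len(x) == len(y) == len(s)
    return prod(W[(xi, si)][yi] for xi, si, yi in zip(x, s, y))


x = (0, 1, 1)
y = (0, 0, 1)
print(W_n(y, x, (0, 1, 0)))  # 0.3: only the middle symbol is flipped, in state 1
print(W_n(y, x, (0, 0, 0)))  # 0.0: state 0 never flips a bit
# For fixed x and s, the probabilities over all outputs y sum to 1.
print(sum(W_n(yy, x, (1, 0, 1)) for yy in product((0, 1), repeat=3)))
```

The state sequence is an argument of the function rather than a random draw, which reflects that in an AVC nothing is assumed about how the states are chosen.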

Capacities and associated proofs

Capacity of deterministic AVCs

An AVC's capacity depends on certain parameters, such as the kind of code used and any constraints placed on the inputs and states.

[math]\displaystyle{ \textstyle R }[/math] is an achievable rate for an AVC with deterministic codes if it is larger than [math]\displaystyle{ \textstyle 0 }[/math] and if, for every positive [math]\displaystyle{ \textstyle \varepsilon }[/math] and [math]\displaystyle{ \textstyle \delta }[/math] and every sufficiently large [math]\displaystyle{ \textstyle n }[/math], there exist length-[math]\displaystyle{ \textstyle n }[/math] block codes that satisfy [math]\displaystyle{ \textstyle \frac{1}{n}\log N \gt R - \delta }[/math] and [math]\displaystyle{ \displaystyle \max_{s \in S^n} \bar{e}(s) \leq \varepsilon }[/math], where [math]\displaystyle{ \textstyle N }[/math] is the number of codewords (messages) and [math]\displaystyle{ \textstyle \bar{e}(s) }[/math] is the probability of error, averaged over the [math]\displaystyle{ \textstyle N }[/math] messages, for the state sequence [math]\displaystyle{ \textstyle s }[/math]. The supremum of the achievable rates is the capacity of the AVC, denoted by [math]\displaystyle{ \textstyle c }[/math].
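To make the definition concrete, the sketch below computes the rate [math]\displaystyle{ \textstyle \frac{1}{n}\log N }[/math] (in bits) and the worst-case average error [math]\displaystyle{ \textstyle \max_{s} \bar{e}(s) }[/math] by brute force for a tiny code. The channel (a binary symmetric channel whose crossover probability is 0.1 in state 0 and 0.2 in state 1), the length-3 repetition code and the majority decoder are all hypothetical choices for illustration; enumerating all of [math]\displaystyle{ \textstyle S^n }[/math] is only feasible for very small [math]\displaystyle{ \textstyle n }[/math].

```python
from itertools import product
from math import log2, prod

# Hypothetical AVC: the state selects the crossover probability of a BSC.
CROSSOVER = {0: 0.1, 1: 0.2}


def W(y, x, s):
    """Single-letter W(y|x,s) for binary x, y and state s."""
    p = CROSSOVER[s]
    return p if y != x else 1 - p


def W_n(y, x, s):
    """Memoryless extension: product of W(y_i|x_i,s_i)."""
    return prod(W(yi, xi, si) for yi, xi, si in zip(y, x, s))


# Illustrative code: N = 2 messages, length-3 repetition codewords.
n = 3
codebook = [(0, 0, 0), (1, 1, 1)]


def decode(y):
    """Majority decoder: message index 1 if at least two ones were received."""
    return 1 if sum(y) >= 2 else 0


def avg_error(s):
    """e-bar(s): error probability averaged over the N messages, for state sequence s."""
    total = 0.0
    for m, x in enumerate(codebook):
        total += sum(W_n(y, x, s) for y in product((0, 1), repeat=n) if decode(y) != m)
    return total / len(codebook)


rate = log2(len(codebook)) / n                                 # (1/n) log N
worst = max(avg_error(s) for s in product((0, 1), repeat=n))   # max over S^n
print(f"rate = {rate:.3f} bits/use, worst-case average error = {worst:.3f}")
```

Here the worst state sequence is the one that uses the noisier state at every position, so the worst-case error equals that of the repetition code on a BSC with crossover 0.2.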

The interesting case is therefore [math]\displaystyle{ \textstyle c \gt 0 }[/math]: only then can the channel carry data reliably, that is, at any rate below [math]\displaystyle{ \textstyle c }[/math] with arbitrarily small average probability of error, whatever the state sequence. So we start with a theorem that shows when [math]\displaystyle{ \textstyle c }[/math] is positive in an AVC, and the theorems discussed afterward narrow down the value of [math]\displaystyle{ \textstyle c }[/math] in different circumstances.

Before stating Theorem 1, a few definitions need to be addressed:

  • An AVC is symmetric if there exists a channel [math]\displaystyle{ \textstyle U(s|x) }[/math], that is, a stochastic matrix [math]\displaystyle{ \textstyle U: X \rightarrow S }[/math], such that [math]\displaystyle{ \displaystyle \sum_{s \in S}W(y|x, s)U(s|x') = \sum_{s \in S}W(y|x', s)U(s|x) }[/math] for every [math]\displaystyle{ \textstyle x,x'\in X }[/math] and [math]\displaystyle{ \textstyle y \in Y }[/math]. (In the literature this property is also called symmetrizability.)
  • [math]\displaystyle{ \textstyle X_r }[/math], [math]\displaystyle{ \textstyle S_r }[/math], and [math]\displaystyle{ \textstyle Y_r }[/math] are random variables taking values in [math]\displaystyle{ \textstyle X }[/math], [math]\displaystyle{ \textstyle S }[/math], and [math]\displaystyle{ \textstyle Y }[/math], respectively.
  • [math]\displaystyle{ \textstyle P_{X_r}(x) }[/math] is equal to the probability that the random variable [math]\displaystyle{ \textstyle X_r }[/math] is equal to [math]\displaystyle{ \textstyle x }[/math].
  • [math]\displaystyle{ \textstyle P_{S_r}(s) }[/math] is equal to the probability that the random variable [math]\displaystyle{ \textstyle S_r }[/math] is equal to [math]\displaystyle{ \textstyle s }[/math].
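  • [math]\displaystyle{ \textstyle P_{X_{r}S_{r}Y_{r}} }[/math] is the joint probability mass function (pmf) of [math]\displaystyle{ \textstyle X_r }[/math], [math]\displaystyle{ \textstyle S_r }[/math], and [math]\displaystyle{ \textstyle Y_r }[/math] obtained when [math]\displaystyle{ \textstyle X_r }[/math] and [math]\displaystyle{ \textstyle S_r }[/math] are independent with pmfs [math]\displaystyle{ \textstyle P_{X_r} }[/math] and [math]\displaystyle{ \textstyle P_{S_r} }[/math], and [math]\displaystyle{ \textstyle Y_r }[/math] is produced from them by [math]\displaystyle{ \textstyle W(y|x,s) }[/math]. Formally, [math]\displaystyle{ \textstyle P_{X_{r}S_{r}Y_{r}}(x,s,y) = P_{X_r}(x)P_{S_r}(s)W(y|x,s) }[/math].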
  • [math]\displaystyle{ \textstyle H(X_r) }[/math] is the entropy of [math]\displaystyle{ \textstyle X_r }[/math].
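  • [math]\displaystyle{ \textstyle H(X_r|Y_r) }[/math] is the conditional entropy of [math]\displaystyle{ \textstyle X_r }[/math] given [math]\displaystyle{ \textstyle Y_r }[/math]: the average uncertainty about [math]\displaystyle{ \textstyle X_r }[/math] that remains once the value of [math]\displaystyle{ \textstyle Y_r }[/math] is known, [math]\displaystyle{ \textstyle H(X_r|Y_r) = -\sum_{x,y} P_{X_r Y_r}(x,y) \log P_{X_r|Y_r}(x|y) }[/math].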
  • [math]\displaystyle{ \textstyle I(X_r \land Y_r) }[/math] is the mutual information of [math]\displaystyle{ \textstyle X_r }[/math] and [math]\displaystyle{ \textstyle Y_r }[/math], and is equal to [math]\displaystyle{ \textstyle H(X_r) - H(X_r|Y_r) }[/math].
  • [math]\displaystyle{ \displaystyle I(P) = \min_{Y_r} I(X_r \land Y_r) }[/math], where [math]\displaystyle{ \textstyle P = P_{X_r} }[/math] is fixed and the minimum is over all random variables [math]\displaystyle{ \textstyle Y_r }[/math] such that [math]\displaystyle{ \textstyle X_r }[/math], [math]\displaystyle{ \textstyle S_r }[/math], and [math]\displaystyle{ \textstyle Y_r }[/math] have a joint pmf of the form [math]\displaystyle{ \textstyle P_{X_{r}S_{r}Y_{r}} }[/math] for some state pmf [math]\displaystyle{ \textstyle P_{S_r} }[/math]; in other words, the minimum is over all state distributions.
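The symmetry condition can be checked mechanically for a candidate [math]\displaystyle{ \textstyle U }[/math]. Deciding whether any suitable [math]\displaystyle{ \textstyle U }[/math] exists means solving linear equations and inequalities in the unknowns [math]\displaystyle{ \textstyle U(s|x) }[/math]; the sketch below only tests one given candidate, so a False result for that [math]\displaystyle{ \textstyle U }[/math] does not by itself show that the AVC is not symmetric. The example is the binary adder AVC ([math]\displaystyle{ \textstyle X = S = \{0,1\} }[/math], [math]\displaystyle{ \textstyle Y = \{0,1,2\} }[/math], output [math]\displaystyle{ \textstyle y = x + s }[/math]), used as a hypothetical illustration. With a [math]\displaystyle{ \textstyle U }[/math] that copies the input into the state, both sides of the condition equal 1 exactly when [math]\displaystyle{ \textstyle y = x + x' }[/math], so this AVC is symmetric and, by Theorem 1, has [math]\displaystyle{ \textstyle c = 0 }[/math] for deterministic codes.

```python
from itertools import product

X, S, Y = (0, 1), (0, 1), (0, 1, 2)


def W(y, x, s):
    """Binary adder AVC (example): the output is the integer sum x + s."""
    return 1.0 if y == x + s else 0.0


def U(s, x):
    """Candidate channel U: X -> S that copies its input into the state."""
    return 1.0 if s == x else 0.0


def satisfies_symmetry(W, U, X, S, Y, tol=1e-12):
    """Check sum_s W(y|x,s)U(s|x') == sum_s W(y|x',s)U(s|x) for all x, x', y."""
    for x, x2, y in product(X, X, Y):
        lhs = sum(W(y, x, s) * U(s, x2) for s in S)
        rhs = sum(W(y, x2, s) * U(s, x) for s in S)
        if abs(lhs - rhs) > tol:
            return False
    return True


print(satisfies_symmetry(W, U, X, S, Y))  # True: this U symmetrizes the adder AVC
# A U that always outputs state 0 does not work (e.g. x=0, x'=1, y=0 gives 1 vs 0).
print(satisfies_symmetry(W, lambda s, x: 1.0 if s == 0 else 0.0, X, S, Y))  # False
```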

Theorem 1: [math]\displaystyle{ \textstyle c \gt 0 }[/math] if and only if the AVC is not symmetric. If [math]\displaystyle{ \textstyle c \gt 0 }[/math], then [math]\displaystyle{ \displaystyle c = \max_P I(P) }[/math].
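For small alphabets, [math]\displaystyle{ \textstyle I(P) }[/math] and [math]\displaystyle{ \textstyle \max_P I(P) }[/math] can be approximated numerically. The sketch below assumes a hypothetical binary AVC whose state selects the crossover probability of a binary symmetric channel (0.1 in state 0, 0.2 in state 1) and replaces both the minimum over state pmfs and the maximum over input pmfs with a grid search in steps of 0.01, so it is an approximation rather than an exact optimizer. This AVC is not symmetric (for [math]\displaystyle{ \textstyle x=0 }[/math], [math]\displaystyle{ \textstyle x'=1 }[/math], [math]\displaystyle{ \textstyle y=0 }[/math] the left side of the symmetry condition is at least 0.8 and the right side at most 0.2), so by Theorem 1 the value computed is its capacity: about 0.278 bits, i.e. one minus the binary entropy of 0.2.

```python
from math import log2

X, S, Y = (0, 1), (0, 1), (0, 1)
CROSSOVER = {0: 0.1, 1: 0.2}  # hypothetical per-state crossover probabilities


def W(y, x, s):
    p = CROSSOVER[s]
    return p if y != x else 1 - p


def mutual_information(P, q):
    """I(X_r ^ Y_r) in bits when P_{X_r} = P, P_{S_r} = q, X_r and S_r independent."""
    # Joint pmf of (X_r, Y_r): sum over s of P(x) q(s) W(y|x,s).
    joint = {(x, y): sum(P[x] * q[s] * W(y, x, s) for s in S) for x in X for y in Y}
    p_y = {y: sum(joint[(x, y)] for x in X) for y in Y}
    return sum(pxy * log2(pxy / (P[x] * p_y[y]))
               for (x, y), pxy in joint.items() if pxy > 0)


GRID = [i / 100 for i in range(101)]


def I_of_P(P):
    """I(P): minimum over state pmfs q = (1 - t, t), approximated on a grid."""
    return min(mutual_information(P, {0: 1 - t, 1: t}) for t in GRID)


# max_P I(P), approximated over input pmfs P = (1 - a, a) on the same grid.
c_estimate = max(I_of_P({0: 1 - a, 1: a}) for a in GRID)
print(f"max_P I(P) ~ {c_estimate:.4f} bits")
```

For this channel the minimizing state pmf puts all its weight on the noisier state, so the grid happens to contain the exact optimum; for general AVCs a convex optimization routine would be a more reliable choice than a grid.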

Proof of first part (symmetry): If we can prove that [math]\displaystyle{ \textstyle I(P) }[/math] is positive when the AVC is not symmetric, and then prove that [math]\displaystyle{ \textstyle c = \max_P I(P) }[/math], we will have proved Theorem 1. Fix an input distribution [math]\displaystyle{ \textstyle P = P_{X_r} }[/math] with [math]\displaystyle{ \textstyle P(x) \gt 0 }[/math] for every [math]\displaystyle{ \textstyle x \in X }[/math], and assume [math]\displaystyle{ \textstyle I(P) }[/math] were equal to [math]\displaystyle{ \textstyle 0 }[/math]. From the definition of [math]\displaystyle{ \textstyle I(P) }[/math], this means [math]\displaystyle{ \textstyle I(X_r \land Y_r) = 0 }[/math] for some state pmf [math]\displaystyle{ \textstyle P_{S_r} }[/math], so [math]\displaystyle{ \textstyle X_r }[/math] and [math]\displaystyle{ \textstyle Y_r }[/math] are independent random variables (mutual information is zero exactly when the two variables are independent). Because [math]\displaystyle{ \textstyle X_r }[/math] and [math]\displaystyle{ \textstyle S_r }[/math] are independent under [math]\displaystyle{ \textstyle P_{X_{r}S_{r}Y_{r}} }[/math], the conditional distribution of [math]\displaystyle{ \textstyle Y_r }[/math] given [math]\displaystyle{ \textstyle X_r = x }[/math] is

[math]\displaystyle{ \displaystyle P_{Y_r|X_r}(y|x) = \sum_{s\in S} P_{S_r}(s)W(y|x,s) }[/math]

Since [math]\displaystyle{ \textstyle X_r }[/math] and [math]\displaystyle{ \textstyle Y_r }[/math] are independent and every [math]\displaystyle{ \textstyle x }[/math] has [math]\displaystyle{ \textstyle P(x) \gt 0 }[/math], this conditional distribution is the same for every [math]\displaystyle{ \textstyle x }[/math] and equals the output distribution:

[math]\displaystyle{ \displaystyle \sum_{s\in S} P_{S_r}(s)W(y|x,s) = P_{Y_r}(y) \quad \text{for every } x \in X,\ y \in Y }[/math]

Now choose [math]\displaystyle{ \textstyle U(s|x) = P_{S_r}(s) }[/math] for every [math]\displaystyle{ \textstyle x }[/math], a channel that ignores its input. With this [math]\displaystyle{ \textstyle U }[/math], the definition of a symmetric AVC becomes [math]\displaystyle{ \displaystyle \sum_{s \in S}W(y|x,s)P_{S_r}(s) = \sum_{s \in S}W(y|x',s)P_{S_r}(s) }[/math], and both sides are equal to the [math]\displaystyle{ \textstyle P_{Y_r}(y) }[/math] calculated above. So the AVC is symmetric whenever [math]\displaystyle{ \textstyle I(P) }[/math] is equal to [math]\displaystyle{ \textstyle 0 }[/math] for such a [math]\displaystyle{ \textstyle P }[/math]. Therefore, if the AVC is not symmetric, [math]\displaystyle{ \textstyle I(P) }[/math] is positive for every [math]\displaystyle{ \textstyle P }[/math] with [math]\displaystyle{ \textstyle P(x) \gt 0 }[/math] for all [math]\displaystyle{ \textstyle x }[/math], and in particular [math]\displaystyle{ \textstyle \max_P I(P) \gt 0 }[/math].

Proof of second part (capacity): See the paper "The capacity of the arbitrarily varying channel revisited: positivity, constraints," referenced below, for the full proof.

Capacity of AVCs with input and state constraints

The next theorem deals with the capacity of AVCs with input and/or state constraints. These constraints restrict which codewords the sender may use and which state sequences the channel may take, narrowing the very large range of possibilities for transmission and error on an AVC and making its behavior easier to analyze.

Before we go on to Theorem 2, we need a few definitions and a lemma.

For such AVCs, the following quantities are used:

- An input constraint [math]\displaystyle{ \textstyle \Gamma }[/math], stated in terms of the average input cost [math]\displaystyle{ \displaystyle g(x) = \frac{1}{n}\sum_{i=1}^n g(x_i) }[/math], where [math]\displaystyle{ \textstyle x = (x_1,\dots,x_n) \in X^n }[/math].
- A state constraint [math]\displaystyle{ \textstyle \Lambda }[/math], stated in terms of the average state cost [math]\displaystyle{ \displaystyle l(s) = \frac{1}{n}\sum_{i=1}^n l(s_i) }[/math], where [math]\displaystyle{ \textstyle s = (s_1,\dots,s_n) \in S^n }[/math].
- [math]\displaystyle{ \displaystyle \Lambda_0(P) = \min_{U} \sum_{x \in X, s \in S}P(x)U(s|x)l(s) }[/math], where the minimum is over all channels [math]\displaystyle{ \textstyle U: X \rightarrow S }[/math] that satisfy the symmetry condition defined above ([math]\displaystyle{ \textstyle \Lambda_0(P) = \infty }[/math] if the AVC is not symmetric).
- [math]\displaystyle{ \textstyle I(P, \Lambda) }[/math] is very similar to the [math]\displaystyle{ \textstyle I(P) }[/math] defined previously, [math]\displaystyle{ \displaystyle I(P, \Lambda) = \min_{Y_r} I(X_r \land Y_r) }[/math], but now the state pmf [math]\displaystyle{ \textstyle P_{S_r} }[/math] in the minimization must satisfy the state constraint on average, [math]\displaystyle{ \textstyle \sum_{s \in S} P_{S_r}(s)l(s) \leq \Lambda }[/math].
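The sum inside [math]\displaystyle{ \textstyle \Lambda_0(P) }[/math] is easy to evaluate for a given symmetrizing channel. The sketch below uses the hypothetical binary adder AVC (output [math]\displaystyle{ \textstyle y = x + s }[/math]) with the hypothetical state cost [math]\displaystyle{ \textstyle l(s) = s }[/math]. Solving the symmetry equations for this channel shows that its only symmetrizing channel is [math]\displaystyle{ \textstyle U(s|x) = 1 }[/math] if [math]\displaystyle{ \textstyle s = x }[/math] and 0 otherwise, so here the minimum reduces to a single evaluation, which equals [math]\displaystyle{ \textstyle P(1) }[/math]. The code does not search over [math]\displaystyle{ \textstyle U }[/math]; for a general AVC that minimization is a linear program.

```python
X, S = (0, 1), (0, 1)


def l(s):
    """Hypothetical per-letter state cost: state 1 costs 1, state 0 is free."""
    return float(s)


def U(s, x):
    """Symmetrizing channel of the binary adder AVC: the state copies the input."""
    return 1.0 if s == x else 0.0


def lambda0_objective(P, U):
    """sum over x, s of P(x) U(s|x) l(s) for a given symmetrizing channel U."""
    return sum(P[x] * U(s, x) * l(s) for x in X for s in S)


P = {0: 0.6, 1: 0.4}
print(lambda0_objective(P, U))  # 0.4, i.e. Lambda_0(P) = P(1) for this channel
```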

Assume [math]\displaystyle{ \textstyle g(x) }[/math] is a given non-negative function on [math]\displaystyle{ \textstyle X }[/math] and [math]\displaystyle{ \textstyle l(s) }[/math] is a given non-negative function on [math]\displaystyle{ \textstyle S }[/math], and that the minimum value of each is [math]\displaystyle{ \textstyle 0 }[/math]. The per-letter functions [math]\displaystyle{ \textstyle g(x_i) }[/math] and [math]\displaystyle{ \textstyle l(s_i) }[/math] are left general rather than given a specific form, and the input constraint [math]\displaystyle{ \textstyle \Gamma }[/math] and state constraint [math]\displaystyle{ \textstyle \Lambda }[/math] are defined in terms of them.

For AVCs with input and/or state constraints, the definition of an achievable rate [math]\displaystyle{ \textstyle R }[/math] is restricted to codes whose codewords [math]\displaystyle{ \textstyle x_1,\dots,x_N }[/math] all satisfy [math]\displaystyle{ \textstyle g(x_i) \leq \Gamma }[/math], and the maximum error probability is taken only over state sequences [math]\displaystyle{ \textstyle s }[/math] that satisfy [math]\displaystyle{ \textstyle l(s) \leq \Lambda }[/math]. The largest achievable rate is still the capacity of the AVC, now denoted [math]\displaystyle{ \textstyle c(\Gamma, \Lambda) }[/math].
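The restrictions can be applied by brute force for short blocks. The sketch below uses hypothetical binary alphabets, block length [math]\displaystyle{ \textstyle n = 3 }[/math], hypothetical costs [math]\displaystyle{ \textstyle g(x) = x }[/math] and [math]\displaystyle{ \textstyle l(s) = s }[/math], and hypothetical limits [math]\displaystyle{ \textstyle \Gamma = 0.5 }[/math], [math]\displaystyle{ \textstyle \Lambda = 0.4 }[/math]. It lists the sequences a code may use as codewords and the state sequences over which the maximum error is taken.

```python
from itertools import product


def avg_cost(seq, cost):
    """(1/n) * sum_i cost(seq_i), the averaged cost used for g(x) and l(s)."""
    return sum(cost(a) for a in seq) / len(seq)


def g(x):
    """Hypothetical input cost: sending a 1 costs 1."""
    return float(x)


def l(s):
    """Hypothetical state cost: state 1 costs 1."""
    return float(s)


Gamma, Lam = 0.5, 0.4
n = 3

# Sequences allowed as codewords by the input constraint g(x) <= Gamma.
allowed_codewords = [x for x in product((0, 1), repeat=n) if avg_cost(x, g) <= Gamma]
# State sequences the error criterion must cover: l(s) <= Lambda.
admissible_states = [s for s in product((0, 1), repeat=n) if avg_cost(s, l) <= Lam]
print(allowed_codewords)
print(admissible_states)
```

With these costs both lists consist of the sequences containing at most one 1.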

Lemma 1: Codes whose codewords all have the same type [math]\displaystyle{ \textstyle P }[/math] (empirical distribution) cannot be considered "good" when [math]\displaystyle{ \textstyle \Lambda }[/math] is greater than [math]\displaystyle{ \textstyle \Lambda_0(P) }[/math], because for such codes the maximum, over state sequences with [math]\displaystyle{ \textstyle l(s) \leq \Lambda }[/math], of the average probability of error is greater than or equal to [math]\displaystyle{ \textstyle \frac{N-1}{2N} - \frac{1}{n}\frac{l_{max}^2}{n(\Lambda - \Lambda_0(P))^2} }[/math], where [math]\displaystyle{ \textstyle l_{max} }[/math] is the maximum value of [math]\displaystyle{ \textstyle l(s) }[/math]. This lower bound is large: [math]\displaystyle{ \textstyle \frac{N-1}{2N} }[/math] is close to [math]\displaystyle{ \textstyle \frac{1}{2} }[/math] when [math]\displaystyle{ \textstyle N }[/math] is large, and the subtracted term shrinks toward [math]\displaystyle{ \textstyle 0 }[/math] as the block length [math]\displaystyle{ \textstyle n }[/math] grows, because [math]\displaystyle{ \textstyle \Lambda - \Lambda_0(P) }[/math] is a fixed positive number. Therefore such codes cannot make the error probability small, no matter how long the block length. This is why Theorem 2 requires [math]\displaystyle{ \textstyle \Lambda_0(P) }[/math] to exceed [math]\displaystyle{ \textstyle \Lambda }[/math].
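A quick numerical check shows how the bound in Lemma 1 behaves as [math]\displaystyle{ \textstyle n }[/math] grows. The function simply evaluates the bound as stated above; the values of [math]\displaystyle{ \textstyle N }[/math], [math]\displaystyle{ \textstyle l_{max} }[/math], [math]\displaystyle{ \textstyle \Lambda }[/math] and [math]\displaystyle{ \textstyle \Lambda_0(P) }[/math] are hypothetical.

```python
def lemma1_lower_bound(N, n, l_max, Lam, Lam0):
    """Lemma 1 lower bound on max_{l(s) <= Lam} e-bar(s); requires Lam > Lam0."""
    assert Lam > Lam0, "the lemma applies only when Lambda exceeds Lambda_0(P)"
    return (N - 1) / (2 * N) - (1 / n) * l_max ** 2 / (n * (Lam - Lam0) ** 2)


# Hypothetical parameters: 1024 messages, l_max = 1, Lambda = 0.5, Lambda_0(P) = 0.4.
for n in (100, 1000, 10000):
    print(n, round(lemma1_lower_bound(1024, n, 1.0, 0.5, 0.4), 5))
```

The printed bounds increase toward [math]\displaystyle{ \textstyle \frac{N-1}{2N} \approx 0.4995 }[/math] as [math]\displaystyle{ \textstyle n }[/math] grows.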

Theorem 2: Given a positive [math]\displaystyle{ \textstyle \Lambda }[/math] and arbitrarily small [math]\displaystyle{ \textstyle \alpha \gt 0 }[/math], [math]\displaystyle{ \textstyle \beta \gt 0 }[/math], [math]\displaystyle{ \textstyle \delta \gt 0 }[/math], for any block length [math]\displaystyle{ \textstyle n \geq n_0 }[/math] and any type [math]\displaystyle{ \textstyle P }[/math] (empirical distribution of a length-[math]\displaystyle{ \textstyle n }[/math] sequence) with [math]\displaystyle{ \textstyle \Lambda_0(P) \geq \Lambda + \alpha }[/math] and [math]\displaystyle{ \displaystyle \min_{x \in X}P(x) \geq \beta }[/math], and where [math]\displaystyle{ \textstyle P_{X_r} = P }[/math], there exists a code with codewords [math]\displaystyle{ \textstyle x_1,\dots,x_N }[/math], each of type [math]\displaystyle{ \textstyle P }[/math], such that [math]\displaystyle{ \textstyle \frac{1}{n}\log N \gt I(P,\Lambda) - \delta }[/math] and [math]\displaystyle{ \displaystyle \max_{l(s) \leq \Lambda} \bar{e}(s) \leq \exp(-n\gamma) }[/math], where the positive constants [math]\displaystyle{ \textstyle n_0 }[/math] and [math]\displaystyle{ \textstyle \gamma }[/math] depend only on [math]\displaystyle{ \textstyle \alpha }[/math], [math]\displaystyle{ \textstyle \beta }[/math], [math]\displaystyle{ \textstyle \delta }[/math], and the given AVC.
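Theorem 2 is stated in terms of types. The type of a sequence [math]\displaystyle{ \textstyle x \in X^n }[/math] is its empirical distribution, [math]\displaystyle{ \textstyle P(a) = |\{i : x_i = a\}|/n }[/math]. The sketch below computes the type of a hypothetical codeword with exact fractions and checks the theorem's condition [math]\displaystyle{ \textstyle \min_{x} P(x) \geq \beta }[/math] for a hypothetical [math]\displaystyle{ \textstyle \beta }[/math].

```python
from collections import Counter
from fractions import Fraction


def type_of(x, alphabet):
    """Empirical distribution (type) of sequence x over the given alphabet."""
    counts = Counter(x)
    return {a: Fraction(counts[a], len(x)) for a in alphabet}


x = (0, 1, 1, 0, 1, 1)          # hypothetical codeword of length n = 6
P = type_of(x, (0, 1))
print(P)                         # {0: Fraction(1, 3), 1: Fraction(2, 3)}

beta = Fraction(1, 4)            # hypothetical beta
print(min(P.values()) >= beta)   # True: every input symbol occurs often enough
```

Exact fractions avoid rounding issues when comparing types, since every type of a length-[math]\displaystyle{ \textstyle n }[/math] sequence has probabilities that are multiples of [math]\displaystyle{ \textstyle 1/n }[/math].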

Proof of Theorem 2: See the paper "The capacity of the arbitrarily varying channel revisited: positivity, constraints," referenced below, for the full proof.

Capacity of randomized AVCs

The next theorem is for AVCs with randomized codes. For such AVCs the code is a random variable taking values in a family of length-n block codes, and the state sequence is not allowed to depend on the actual value of the code, that is, on which code is selected. Because of this randomness, the maximum and the average probability of error criteria lead to the same capacity. Randomized codes also make certain properties of the AVC clearer.

Before we go on to Theorem 3, we need to define a couple of important terms:

[math]\displaystyle{ \displaystyle W_{\zeta}(y|x) = \sum_{s \in S} W(y|x, s)P_{S_r}(s) }[/math], the channel obtained by averaging [math]\displaystyle{ \textstyle W }[/math] over a state pmf [math]\displaystyle{ \textstyle \zeta = P_{S_r} }[/math].
[math]\displaystyle{ \textstyle I(P, \zeta) }[/math] is very similar to the [math]\displaystyle{ \textstyle I(P) }[/math] defined previously, [math]\displaystyle{ \displaystyle I(P, \zeta) = \min_{Y_r} I(X_r \land Y_r) }[/math], but now [math]\displaystyle{ \textstyle Y_r }[/math] is the output of the averaged channel: [math]\displaystyle{ \textstyle W_{\zeta}(y|x) }[/math] replaces [math]\displaystyle{ \textstyle W(y|x, s) }[/math] in the joint pmf [math]\displaystyle{ \textstyle P_{X_{r}S_{r}Y_{r}} }[/math], and the minimum is over all state pmfs [math]\displaystyle{ \textstyle \zeta = P_{S_r} }[/math].
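The averaged channel [math]\displaystyle{ \textstyle W_{\zeta} }[/math] is an ordinary channel without states. The sketch below builds it for a hypothetical binary AVC whose state selects the crossover probability of a binary symmetric channel (0.1 in state 0, 0.2 in state 1); averaging with [math]\displaystyle{ \textstyle \zeta = (1/2, 1/2) }[/math] gives a binary symmetric channel with crossover 0.15.

```python
X, S, Y = (0, 1), (0, 1), (0, 1)
CROSSOVER = {0: 0.1, 1: 0.2}  # hypothetical per-state crossover probabilities


def W(y, x, s):
    p = CROSSOVER[s]
    return p if y != x else 1 - p


def averaged_channel(zeta):
    """W_zeta(y|x) = sum_s W(y|x,s) zeta(s), returned as a dict keyed by (x, y)."""
    return {(x, y): sum(W(y, x, s) * zeta[s] for s in S) for x in X for y in Y}


W_half = averaged_channel({0: 0.5, 1: 0.5})
print(W_half[(0, 1)])  # approximately 0.15 (= 0.5 * 0.1 + 0.5 * 0.2)
```

Each row of [math]\displaystyle{ \textstyle W_{\zeta} }[/math] is again a probability distribution over [math]\displaystyle{ \textstyle Y }[/math], so standard single-channel quantities such as mutual information apply to it directly.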

Theorem 3: The capacity for randomized codes of the AVC is [math]\displaystyle{ \displaystyle c = \max_P I(P, \zeta) }[/math].

Proof of Theorem 3: See the paper "The Capacities of Certain Channel Classes Under Random Coding," referenced below, for the full proof.

See also

References