Control variates

Short description: Technique for increasing the precision of estimates in Monte Carlo experiments

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity.[1][2][3]

Underlying principle

Let the unknown parameter of interest be [math]\displaystyle{ \mu }[/math], and assume we have a statistic [math]\displaystyle{ m }[/math] such that the expected value of m is μ: [math]\displaystyle{ \mathbb{E}\left[m\right]=\mu }[/math], i.e. m is an unbiased estimator for μ. Suppose we calculate another statistic [math]\displaystyle{ t }[/math] such that [math]\displaystyle{ \mathbb{E}\left[t\right]=\tau }[/math] is a known value. Then

[math]\displaystyle{ m^\star = m + c\left(t-\tau\right) \, }[/math]

is also an unbiased estimator for [math]\displaystyle{ \mu }[/math] for any choice of the coefficient [math]\displaystyle{ c }[/math]. The variance of the resulting estimator [math]\displaystyle{ m^{\star} }[/math] is

[math]\displaystyle{ \textrm{Var}\left(m^{\star}\right)=\textrm{Var}\left(m\right) + c^2\,\textrm{Var}\left(t\right) + 2c\,\textrm{Cov}\left(m,t\right). }[/math]
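Both statements can be checked directly. The following short derivation is a sketch that uses only linearity of expectation and the fact that the constant [math]\displaystyle{ c\tau }[/math] does not contribute to the variance:

[math]\displaystyle{ \mathbb{E}\left[m^\star\right] = \mathbb{E}\left[m\right] + c\left(\mathbb{E}\left[t\right]-\tau\right) = \mu + c\left(\tau-\tau\right) = \mu, }[/math]

[math]\displaystyle{ \textrm{Var}\left(m^{\star}\right) = \textrm{Var}\left(m + ct - c\tau\right) = \textrm{Var}\left(m + ct\right) = \textrm{Var}\left(m\right) + c^2\,\textrm{Var}\left(t\right) + 2c\,\textrm{Cov}\left(m,t\right). }[/math]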

By differentiating the above expression with respect to [math]\displaystyle{ c }[/math], it can be shown that choosing the optimal coefficient

[math]\displaystyle{ c^\star = - \frac{\textrm{Cov}\left(m,t\right)}{\textrm{Var}\left(t\right)} }[/math]

minimizes the variance of [math]\displaystyle{ m^{\star} }[/math]. (Note that this coefficient is the same as the coefficient obtained from a linear regression.) With this choice,

[math]\displaystyle{ \begin{align} \textrm{Var}\left(m^{\star}\right) & =\textrm{Var}\left(m\right) - \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} \\ & = \left(1-\rho_{m,t}^2\right)\textrm{Var}\left(m\right) \end{align} }[/math]

where

[math]\displaystyle{ \rho_{m,t}=\textrm{Corr}\left(m,t\right) \, }[/math]

is the correlation coefficient of [math]\displaystyle{ m }[/math] and [math]\displaystyle{ t }[/math]. The greater the value of [math]\displaystyle{ \vert\rho_{m,t}\vert }[/math], the greater the variance reduction achieved.
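The steps behind the optimal coefficient and the reduced variance can be written out as follows (a sketch, assuming [math]\displaystyle{ \textrm{Var}\left(t\right) \gt 0 }[/math] and [math]\displaystyle{ \textrm{Var}\left(m\right) \gt 0 }[/math]). Setting the derivative of the variance with respect to [math]\displaystyle{ c }[/math] to zero gives

[math]\displaystyle{ \frac{\mathrm{d}}{\mathrm{d}c}\textrm{Var}\left(m^{\star}\right) = 2c\,\textrm{Var}\left(t\right) + 2\,\textrm{Cov}\left(m,t\right) = 0 \quad\Longrightarrow\quad c = - \frac{\textrm{Cov}\left(m,t\right)}{\textrm{Var}\left(t\right)}, }[/math]

and the second derivative, [math]\displaystyle{ 2\,\textrm{Var}\left(t\right) }[/math], is positive, so this stationary point is a minimum. Substituting [math]\displaystyle{ c^\star }[/math] back into the variance formula yields

[math]\displaystyle{ \textrm{Var}\left(m\right) + \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} - \frac{2\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} = \textrm{Var}\left(m\right) - \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)}, }[/math]

and the form involving the correlation coefficient follows from [math]\displaystyle{ \rho_{m,t}^2 = \left[\textrm{Cov}\left(m,t\right)\right]^2 / \left(\textrm{Var}\left(m\right)\textrm{Var}\left(t\right)\right) }[/math].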

In the case that [math]\displaystyle{ \textrm{Cov}\left(m,t\right) }[/math], [math]\displaystyle{ \textrm{Var}\left(t\right) }[/math], and/or [math]\displaystyle{ \rho_{m,t}\; }[/math] are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.
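As an illustration of estimating the coefficient across replicates, the following minimal Python sketch computes the plug-in coefficient from the sample covariance and sample variance, which is the negative of the ordinary least squares slope of m on t. The function names and the toy model (t uniform on [0, 1], m = 2t plus Gaussian noise, so that μ = 1, τ = 0.5 and the optimal coefficient is −2) are assumptions made for this example and do not come from the cited sources.

```python
import random


def control_variate_estimate(m_samples, t_samples, tau):
    """Estimate mu from paired replicates (m_i, t_i) with known E[t] = tau.

    Returns (plain_mean, cv_mean, c_hat). Illustrative helper, not a library API.
    """
    n = len(m_samples)
    m_bar = sum(m_samples) / n
    t_bar = sum(t_samples) / n
    # Sample covariance of (m, t) and sample variance of t across replicates.
    cov_mt = sum((m - m_bar) * (t - t_bar)
                 for m, t in zip(m_samples, t_samples)) / (n - 1)
    var_t = sum((t - t_bar) ** 2 for t in t_samples) / (n - 1)
    # Plug-in version of c* = -Cov(m, t) / Var(t).
    c_hat = -cov_mt / var_t
    cv_mean = m_bar + c_hat * (t_bar - tau)
    return m_bar, cv_mean, c_hat


def toy_replicates(n, seed):
    """Hypothetical toy model: t ~ U(0, 1), m = 2 t + small Gaussian noise."""
    rng = random.Random(seed)
    t = [rng.random() for _ in range(n)]
    m = [2.0 * ti + rng.gauss(0.0, 0.1) for ti in t]
    return m, t


m, t = toy_replicates(n=2000, seed=1)
plain, adjusted, c_hat = control_variate_estimate(m, t, tau=0.5)
print("plain mean:", plain)
print("control-variate mean:", adjusted)
print("estimated coefficient:", c_hat)
```

Estimating the coefficient from the same replicates that enter the final estimate generally introduces a small bias, which becomes negligible as the number of replicates grows; a separate pilot sample can be used for the coefficient if strict unbiasedness matters.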

When the expectation of the control variable, [math]\displaystyle{ \mathbb{E}\left[t\right]=\tau }[/math], is not known analytically, it is still possible to increase the precision in estimating [math]\displaystyle{ \mu }[/math] (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating [math]\displaystyle{ t }[/math] is significantly cheaper than computing [math]\displaystyle{ m }[/math]; 2) the magnitude of the correlation coefficient [math]\displaystyle{ |\rho_{m,t}| }[/math] is close to unity.[3]

Example

We would like to estimate

[math]\displaystyle{ I = \int_0^1 \frac{1}{1+x} \, \mathrm{d}x }[/math]

using Monte Carlo integration. This integral is the expected value of [math]\displaystyle{ f(U) }[/math], where

[math]\displaystyle{ f(U) = \frac{1}{1+U} }[/math]

and U follows a uniform distribution on [0, 1]. Given a sample of size n, denote the points in the sample by [math]\displaystyle{ u_1, \ldots, u_n }[/math]. Then the estimate is given by

[math]\displaystyle{ I \approx \frac{1}{n} \sum_i f(u_i). }[/math]

Now we introduce [math]\displaystyle{ g(U) = 1+U }[/math] as a control variate with a known expected value [math]\displaystyle{ \mathbb{E}\left[g\left(U\right)\right]=\int_0^1 (1+x) \, \mathrm{d}x=\tfrac{3}{2} }[/math] and combine the two into a new estimate

[math]\displaystyle{ I \approx \frac{1}{n} \sum_i f(u_i)+c\left(\frac{1}{n}\sum_i g(u_i) -3/2\right). }[/math]

Using [math]\displaystyle{ n=1500 }[/math] realizations and an estimated optimal coefficient [math]\displaystyle{ c^\star \approx 0.4773 }[/math] we obtain the following results

                     Estimate   Variance
Classical estimate   0.69475    0.01947
Control variates     0.69295    0.00060

The control variates technique reduced the variance by a factor of about 32 (from 0.01947 to 0.00060). (The exact result is [math]\displaystyle{ I=\ln 2 \approx 0.69314718 }[/math].)
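The example can be reproduced with a short Python sketch using only the standard library. The function name `run_example`, the seed, and the use of the same sample both to estimate the coefficient and to form the estimate are assumptions of this sketch. The exact numbers depend on the random seed and will not match the table digit for digit.

```python
import math
import random


def f(u):
    """Integrand 1 / (1 + u)."""
    return 1.0 / (1.0 + u)


def g(u):
    """Control variate 1 + u, with known mean 3/2 for u ~ U(0, 1)."""
    return 1.0 + u


def run_example(n=1500, seed=0):
    """Return classical and control-variate estimates with per-sample variances."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n)]
    fs = [f(u) for u in us]
    gs = [g(u) for u in us]
    f_bar = sum(fs) / n
    g_bar = sum(gs) / n
    # Estimate c* = -Cov(f, g) / Var(g) from the sample.
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gs) / (n - 1)
    c_hat = -cov_fg / var_g
    # Adjusted samples f(u_i) + c (g(u_i) - 3/2); their mean is the new estimate.
    adj = [a + c_hat * (b - 1.5) for a, b in zip(fs, gs)]
    cv_bar = sum(adj) / n
    var_f = sum((a - f_bar) ** 2 for a in fs) / (n - 1)
    var_cv = sum((a - cv_bar) ** 2 for a in adj) / (n - 1)
    return {"classical": f_bar, "control": cv_bar, "c_hat": c_hat,
            "var_classical": var_f, "var_control": var_cv}


result = run_example()
for name, value in result.items():
    print(name, value)
print("exact", math.log(2.0))
```

For comparison, the exact quantities in this example are [math]\displaystyle{ \textrm{Cov}\left(f(U),g(U)\right) = 1 - \tfrac{3}{2}\ln 2 }[/math] and [math]\displaystyle{ \textrm{Var}\left(g(U)\right) = \tfrac{1}{12} }[/math], giving [math]\displaystyle{ c^\star = 12\left(\tfrac{3}{2}\ln 2 - 1\right) \approx 0.4766 }[/math]. The exact per-sample variance [math]\displaystyle{ \textrm{Var}\left(f(U)\right) = \tfrac{1}{2} - (\ln 2)^2 \approx 0.0195 }[/math] suggests that the variances in the table are variances of individual sample values rather than of the n-sample mean; the variance of each mean is smaller by a factor of n.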

Notes

  1. Lemieux, C. (2017). "Control Variates". Wiley StatsRef: Statistics Reference Online: 1–8. doi:10.1002/9781118445112.stat07947. ISBN 9781118445112.
  2. Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. New York: Springer. p. 185. ISBN 0-387-00451-3.
  3. Botev, Z.; Ridder, A. (2017). "Variance Reduction". Wiley StatsRef: Statistics Reference Online: 1–6. doi:10.1002/9781118445112.stat07975. ISBN 9781118445112.

References

  • Ross, Sheldon M. (2002). Simulation, 3rd edition. ISBN 978-0-12-598053-1.
  • Law, Averill M.; Kelton, W. David (2000). Simulation Modeling and Analysis, 3rd edition. ISBN 0-07-116537-1.
  • Meyn, S. P. (2007). Control Techniques for Complex Networks. Cambridge University Press. ISBN 978-0-521-88441-9. (Section 11.4: Control variates and shadow functions)