# Control variates

Short description: Technique for increasing the precision of estimates in Monte Carlo experiments

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity. 

## Underlying principle

Let the unknown parameter of interest be $\displaystyle{ \mu }$, and assume we have a statistic $\displaystyle{ m }$ such that the expected value of m is μ: $\displaystyle{ \mathbb{E}\left[m\right]=\mu }$, i.e. m is an unbiased estimator for μ. Suppose we calculate another statistic $\displaystyle{ t }$ such that $\displaystyle{ \mathbb{E}\left[t\right]=\tau }$ is a known value. Then

$\displaystyle{ m^\star = m + c\left(t-\tau\right) \, }$

is also an unbiased estimator for $\displaystyle{ \mu }$ for any choice of the coefficient $\displaystyle{ c }$. The variance of the resulting estimator $\displaystyle{ m^{\star} }$ is

$\displaystyle{ \textrm{Var}\left(m^{\star}\right)=\textrm{Var}\left(m\right) + c^2\,\textrm{Var}\left(t\right) + 2c\,\textrm{Cov}\left(m,t\right). }$

By differentiating the above expression with respect to $\displaystyle{ c }$, it can be shown that choosing the optimal coefficient

$\displaystyle{ c^\star = - \frac{\textrm{Cov}\left(m,t\right)}{\textrm{Var}\left(t\right)} }$

minimizes the variance of $\displaystyle{ m^{\star} }$, and that with this choice,

\displaystyle{ \begin{align} \textrm{Var}\left(m^{\star}\right) & =\textrm{Var}\left(m\right) - \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} \\ & = \left(1-\rho_{m,t}^2\right)\textrm{Var}\left(m\right) \end{align} }

where

$\displaystyle{ \rho_{m,t}=\textrm{Corr}\left(m,t\right) \, }$

is the correlation coefficient of $\displaystyle{ m }$ and $\displaystyle{ t }$. The greater the value of $\displaystyle{ \vert\rho_{m,t}\vert }$, the greater the variance reduction achieved.

In the case that $\displaystyle{ \textrm{Cov}\left(m,t\right) }$, $\displaystyle{ \textrm{Var}\left(t\right) }$, and/or $\displaystyle{ \rho_{m,t}\; }$ are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.

When the expectation of the control variable, $\displaystyle{ \mathbb{E}\left[t\right]=\tau }$, is not known analytically, it is still possible to increase the precision in estimating $\displaystyle{ \mu }$ (for a given fixed simulation budget), provided that the two conditions are met: 1) evaluating $\displaystyle{ t }$ is significantly cheaper than computing $\displaystyle{ m }$; 2) the magnitude of the correlation coefficient $\displaystyle{ |\rho_{m,t}| }$ is close to unity. 

## Example

We would like to estimate

$\displaystyle{ I = \int_0^1 \frac{1}{1+x} \, \mathrm{d}x }$

using Monte Carlo integration. This integral is the expected value of $\displaystyle{ f(U) }$, where

$\displaystyle{ f(U) = \frac{1}{1+U} }$

and U follows a uniform distribution [0, 1]. Using a sample of size n denote the points in the sample as $\displaystyle{ u_1, \cdots, u_n }$. Then the estimate is given by

$\displaystyle{ I \approx \frac{1}{n} \sum_i f(u_i). }$

Now we introduce $\displaystyle{ g(U) = 1+U }$ as a control variate with a known expected value $\displaystyle{ \mathbb{E}\left[g\left(U\right)\right]=\int_0^1 (1+x) \, \mathrm{d}x=\tfrac{3}{2} }$ and combine the two into a new estimate

$\displaystyle{ I \approx \frac{1}{n} \sum_i f(u_i)+c\left(\frac{1}{n}\sum_i g(u_i) -3/2\right). }$

Using $\displaystyle{ n=1500 }$ realizations and an estimated optimal coefficient $\displaystyle{ c^\star \approx 0.4773 }$ we obtain the following results

 Estimate Variance Classical estimate 0.69475 0.01947 Control variates 0.69295 0.00060

The variance was significantly reduced after using the control variates technique. (The exact result is $\displaystyle{ I=\ln 2 \approx 0.69314718 }$.)