Mean and predicted response

In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different.

Background

In straight line fitting, the model is

[math]\displaystyle{ y_i=\alpha+\beta x_i +\varepsilon_i\, }[/math]

where [math]\displaystyle{ y_i }[/math] is the response variable, [math]\displaystyle{ x_i }[/math] is the explanatory variable, [math]\displaystyle{ \varepsilon_i }[/math] is the random error, and [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ \beta }[/math] are parameters. Both the mean response and the predicted response for a given explanatory value [math]\displaystyle{ x_d }[/math] are given by

[math]\displaystyle{ \hat{y}_d=\hat\alpha+\hat\beta x_d , }[/math]

while the actual response would be

[math]\displaystyle{ y_d=\alpha+\beta x_d +\varepsilon_d \, }[/math]

Expressions for the values and variances of [math]\displaystyle{ \hat\alpha }[/math] and [math]\displaystyle{ \hat\beta }[/math] are given in linear regression.
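As a concrete illustration (a minimal sketch in Python/NumPy, not part of the article's derivation), the following code fits a straight line by least squares and evaluates the fitted response at a new point; the data values and the point [math]\displaystyle{ x_d }[/math] are made up for the example.

```python
import numpy as np

# Illustrative data, made up for this example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m = len(x)                         # number of data points
x_bar, y_bar = x.mean(), y.mean()

# Least-squares estimates of the slope and intercept
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# Mean (and predicted) response at a chosen explanatory value x_d
x_d = 3.5
y_d_hat = alpha_hat + beta_hat * x_d
```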

Mean response

Since the data in this context are defined to be (x, y) pairs for every observation, the mean response at a given value of x, say [math]\displaystyle{ x_d }[/math], is an estimate of the mean of the y values in the population at that value of x, that is [math]\displaystyle{ \hat{E}(y \mid x_d) \equiv\hat{y}_d\! }[/math]. The variance of the mean response is given by

[math]\displaystyle{ \operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) = \operatorname{Var}\left(\hat{\alpha}\right) + \left(\operatorname{Var} \hat{\beta}\right)x_d^2 + 2 x_d \operatorname{Cov} \left(\hat{\alpha}, \hat{\beta} \right) . }[/math]

This expression can be simplified to

[math]\displaystyle{ \operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) =\sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right), }[/math]

where m is the number of data points.

To demonstrate this simplification, substitute the standard simple-regression expressions [math]\displaystyle{ \operatorname{Var}(\hat\beta)=\sigma^2\big/\sum (x_i - \bar{x})^2 }[/math], [math]\displaystyle{ \operatorname{Var}(\hat\alpha)=\sigma^2\sum x_i^2\big/\left(m\sum (x_i - \bar{x})^2\right) }[/math] and [math]\displaystyle{ \operatorname{Cov}(\hat\alpha,\hat\beta)=-\sigma^2\bar{x}\big/\sum (x_i - \bar{x})^2 }[/math], and make use of the identity

[math]\displaystyle{ \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac 1 m \left(\sum x_i\right)^2 . }[/math]
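Continuing the sketch above, one can check the simplification numerically from these standard expressions; the value of [math]\displaystyle{ \sigma^2 }[/math] below is an arbitrary placeholder.

```python
# Continuing the sketch above; sigma2 is an arbitrary placeholder value
sigma2 = 1.0
S_xx = np.sum((x - x_bar) ** 2)

# Standard OLS variances and covariance of the estimators
var_beta = sigma2 / S_xx
var_alpha = sigma2 * np.sum(x ** 2) / (m * S_xx)
cov_ab = -sigma2 * x_bar / S_xx

lhs = var_alpha + var_beta * x_d ** 2 + 2 * x_d * cov_ab
rhs = sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / S_xx)
assert np.isclose(lhs, rhs)  # the two expressions agree
```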

Predicted response

The predicted response distribution is the predicted distribution of the residual [math]\displaystyle{ y_d - \hat{y}_d }[/math] at the given point [math]\displaystyle{ x_d }[/math], so the variance is given by

[math]\displaystyle{ \begin{align} \operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d \right] \right) &= \operatorname{Var} (y_d) + \operatorname{Var} \left(\hat{\alpha} + \hat{\beta}x_d\right) - 2\operatorname{Cov}\left(y_d,\left[\hat{\alpha} + \hat{\beta} x_d \right]\right)\\ &= \operatorname{Var} (y_d) + \operatorname{Var} \left(\hat{\alpha} + \hat{\beta}x_d\right). \end{align} }[/math]

The second line follows from the fact that [math]\displaystyle{ \operatorname{Cov}\left(y_d,\left[\hat{\alpha} + \hat{\beta} x_d \right]\right) }[/math] is zero because the new prediction point is independent of the data used to fit the model. Additionally, the term [math]\displaystyle{ \operatorname{Var} \left(\hat{\alpha} + \hat{\beta}x_d\right) }[/math] was calculated earlier for the mean response.

Since [math]\displaystyle{ \operatorname{Var}(y_d)=\sigma^2 }[/math] (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

[math]\displaystyle{ \begin{align} \operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d \right] \right) & = \sigma^2 + \sigma^2\left(\frac 1 m + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right)\\[4pt] & = \sigma^2\left(1 + \frac 1 m + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right). \end{align} }[/math]
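In the running sketch, the variance of the predicted response simply adds the extra [math]\displaystyle{ \sigma^2 }[/math] term, so it always exceeds the variance of the mean response:

```python
var_mean = sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / S_xx)
var_pred = sigma2 * (1.0 + 1.0 / m + (x_d - x_bar) ** 2 / S_xx)
assert var_pred > var_mean  # the extra sigma2 term from the new observation
```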

Confidence intervals

The [math]\displaystyle{ 100(1-\alpha)\% }[/math] confidence intervals are computed as [math]\displaystyle{ \hat{y}_d \pm t_{\frac{\alpha }{2},m - n - 1} \sqrt{\operatorname{Var}} }[/math], where [math]\displaystyle{ n }[/math] is the number of explanatory variables; for straight-line fitting ([math]\displaystyle{ n=1 }[/math]) this gives [math]\displaystyle{ m-2 }[/math] degrees of freedom. Thus, the confidence interval for the predicted response is wider than the interval for the mean response. This is expected intuitively: the variance of the population of [math]\displaystyle{ y }[/math] values does not shrink when one samples from it, because the error term [math]\displaystyle{ \varepsilon_d }[/math] retains its variance [math]\displaystyle{ \sigma^2 }[/math]; but the variance of the mean response does shrink with increased sampling, because the variances of [math]\displaystyle{ \hat \alpha }[/math] and [math]\displaystyle{ \hat \beta }[/math] decrease, so the mean response (predicted response value) becomes closer to [math]\displaystyle{ \alpha + \beta x_d }[/math].

This is analogous to the difference between the variance of a population and the variance of the sample mean: the variance of a population is a fixed parameter, while the variance of the sample mean decreases as the sample size increases.
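A sketch of both intervals for the straight-line fit above, estimating [math]\displaystyle{ \sigma^2 }[/math] from the residuals with [math]\displaystyle{ m-2 }[/math] degrees of freedom and using SciPy's Student-t quantile; the 95% level is chosen arbitrarily.

```python
from scipy import stats

# Estimate sigma^2 from the residuals (m - 2 degrees of freedom for a line)
resid = y - (alpha_hat + beta_hat * x)
s2 = np.sum(resid ** 2) / (m - 2)

level = 0.95
t_crit = stats.t.ppf(1 - (1 - level) / 2, df=m - 2)

half_mean = t_crit * np.sqrt(s2 * (1 / m + (x_d - x_bar) ** 2 / S_xx))
half_pred = t_crit * np.sqrt(s2 * (1 + 1 / m + (x_d - x_bar) ** 2 / S_xx))

print(f"mean response:      {y_d_hat:.3f} +/- {half_mean:.3f}")
print(f"predicted response: {y_d_hat:.3f} +/- {half_pred:.3f}")
```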

General linear regression

The general linear model can be written as

[math]\displaystyle{ y_i=\sum_{j=1}^n X_{ij}\beta_j + \varepsilon_i\, }[/math]

Therefore, since [math]\displaystyle{ \hat{y}_d=\sum_{j=1}^n X_{dj}\hat\beta_j }[/math], the general expression for the variance of the mean response is

[math]\displaystyle{ \operatorname{Var}\left(\sum_{j=1}^n X_{dj}\hat\beta_j\right)= \sum_{i=1}^n \sum_{j=1}^n X_{di}S_{ij}X_{dj}, }[/math]

where S is the covariance matrix of the parameters, given by

[math]\displaystyle{ \mathbf{S}=\sigma^2\left(\mathbf{X^{\mathsf{T}}X}\right)^{-1}. }[/math]
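In matrix form this is [math]\displaystyle{ \operatorname{Var}(\hat{y}_d)=\mathbf{x}_d^{\mathsf{T}}\mathbf{S}\,\mathbf{x}_d }[/math], where [math]\displaystyle{ \mathbf{x}_d }[/math] is the design-matrix row at the new point. A minimal sketch, reusing the straight-line data above with an explicit intercept column, confirms that this reproduces the simple-regression formula:

```python
# Design matrix for the straight-line model: columns [1, x]
X = np.column_stack([np.ones(m), x])
S = sigma2 * np.linalg.inv(X.T @ X)   # covariance matrix of the parameters

x_row = np.array([1.0, x_d])          # design row corresponding to x_d
var_mean_general = x_row @ S @ x_row
assert np.isclose(var_mean_general, sigma2 * (1 / m + (x_d - x_bar) ** 2 / S_xx))
```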
