Pseudolikelihood

In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. In practice, it serves as an approximation to the likelihood function of a set of observed data, which can make estimation computationally simpler or make it possible to obtain explicit estimates of model parameters.

The pseudolikelihood approach was introduced by Julian Besag[1] in the context of analysing data having spatial dependence.

Definition

Given a set of random variables [math]\displaystyle{ X = (X_1, X_2, \ldots, X_n) }[/math], the pseudolikelihood of [math]\displaystyle{ X = x = (x_1,x_2, \ldots, x_n) }[/math] is

[math]\displaystyle{ L(\theta) := \prod_i \mathrm{Pr}_\theta(X_i = x_i\mid X_j = x_j \text{ for } j \neq i)=\prod_i \mathrm {Pr}_\theta (X_i = x_i \mid X_{-i}=x_{-i}) }[/math]

in the discrete case and

[math]\displaystyle{ L(\theta) := \prod_i p_\theta(x_i \mid x_j \text{ for } j \neq i)=\prod_i p _\theta (x_i \mid x_{-i})=\prod _i p_\theta (x_i \mid x_1,\ldots, \hat x_i, \ldots, x_n) }[/math]

in the continuous case. Here [math]\displaystyle{ X }[/math] is a vector of variables, [math]\displaystyle{ x }[/math] is a vector of values, [math]\displaystyle{ p_\theta(\cdot \mid \cdot) }[/math] is a conditional density and [math]\displaystyle{ \theta =(\theta_1, \ldots, \theta_p) }[/math] is the vector of parameters to be estimated. The expression [math]\displaystyle{ X = x }[/math] above means that each variable [math]\displaystyle{ X_i }[/math] in the vector [math]\displaystyle{ X }[/math] takes the corresponding value [math]\displaystyle{ x_i }[/math] in the vector [math]\displaystyle{ x }[/math], and [math]\displaystyle{ x_{-i}=(x_1, \ldots,\hat x_i, \ldots, x_n) }[/math] denotes the vector [math]\displaystyle{ x }[/math] with the coordinate [math]\displaystyle{ x_i }[/math] omitted (the hat marks the omitted entry). The expression [math]\displaystyle{ \mathrm {Pr}_\theta(X = x) }[/math] is the probability that the vector of variables [math]\displaystyle{ X }[/math] takes the values in the vector [math]\displaystyle{ x }[/math]; this probability depends on the unknown parameter [math]\displaystyle{ \theta }[/math]. Because many situations can be described using state variables ranging over a set of possible values, the expression [math]\displaystyle{ \mathrm {Pr}_\theta(X = x) }[/math] can represent the probability of a particular state among all the states allowed by the state variables.

The pseudo-log-likelihood is the logarithm of the above expression, namely (in the discrete case)

[math]\displaystyle{ l(\theta):=\log L(\theta) = \sum_i \log \mathrm{Pr}_\theta(X_i = x_i\mid X_j = x_j \text{ for } j \neq i). }[/math]

One use of the pseudolikelihood is as an approximation for inference about a Markov or Bayesian network: the pseudolikelihood of an assignment to [math]\displaystyle{ X_i }[/math] can often be computed more efficiently than the likelihood, particularly when the latter requires marginalization over a large number of variables.
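
As an illustration, the following Python sketch compares the pseudo-log-likelihood with the exact log-likelihood for a small Ising-type Markov network. The model (four variables taking values ±1 on a cycle, with a single coupling parameter theta), the edge list and all function names are assumptions made for this example and do not come from the references. Each conditional factor needs only the neighbours of one variable, whereas the exact likelihood needs a normalising constant summed over all 2^n configurations.

```python
import itertools
import math

# Hypothetical toy model: 4 spins on a cycle, each x_i in {-1, +1}, with
# Pr_theta(x) proportional to exp(theta * sum over edges (i, j) of x_i * x_j).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
N = 4


def neighbours(i):
    """Indices of the sites that share an edge with site i."""
    return [b if a == i else a for a, b in EDGES if i in (a, b)]


def log_pseudolikelihood(theta, x):
    """Sum over i of log Pr_theta(X_i = x_i | X_{-i} = x_{-i})."""
    total = 0.0
    for i in range(N):
        field = sum(x[j] for j in neighbours(i))
        # For this model the conditional is logistic:
        # Pr(X_i = x_i | x_{-i}) = 1 / (1 + exp(-2 * theta * x_i * field)).
        total += -math.log1p(math.exp(-2.0 * theta * x[i] * field))
    return total


def log_likelihood(theta, x):
    """Exact log Pr_theta(X = x), by brute-force summation over 2**N states."""
    def energy(y):
        return theta * sum(y[a] * y[b] for a, b in EDGES)

    log_z = math.log(
        sum(math.exp(energy(y)) for y in itertools.product((-1, 1), repeat=N))
    )
    return energy(x) - log_z


x = (1, 1, -1, 1)
print(log_pseudolikelihood(0.5, x), log_likelihood(0.5, x))
```

At theta = 0 the variables are independent and the two quantities agree; for other values of theta they generally differ. The brute-force normalisation is feasible only because this toy model has 16 states.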

Properties

Use of the pseudolikelihood in place of the true likelihood function in a maximum likelihood analysis can lead to good estimates, but a straightforward application of the usual likelihood techniques to derive information about estimation uncertainty, or for significance testing, would in general be incorrect.[2]
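
The first part of this statement can be illustrated on a toy model. The sketch below is a hypothetical example, not a method taken from the cited references. It assumes a four-variable Ising-type model on a cycle, with values ±1 and probability proportional to exp(theta times the sum of x_i x_j over edges). It averages the pseudo-log-likelihood over the exact distribution at an assumed true value THETA_TRUE (an idealised infinite-sample limit) and maximises over theta with a ternary search, which is valid here because the objective is concave in theta. The names EDGES, field, log_pl and maximise_concave are illustrative.

```python
import itertools
import math

EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle
N = 4
STATES = list(itertools.product((-1, 1), repeat=N))


def field(x, i):
    """Sum of the values at the neighbours of site i in configuration x."""
    return sum(x[b] if a == i else x[a] for a, b in EDGES if i in (a, b))


def log_pl(theta, x):
    """Pseudo-log-likelihood of a single configuration x."""
    return sum(
        -math.log1p(math.exp(-2.0 * theta * x[i] * field(x, i)))
        for i in range(N)
    )


def state_probs(theta):
    """Exact probabilities of all 2**N states (feasible only for tiny N)."""
    weights = [math.exp(theta * sum(x[a] * x[b] for a, b in EDGES)) for x in STATES]
    z = sum(weights)
    return [w / z for w in weights]


def maximise_concave(f, lo=-2.0, hi=2.0, iters=100):
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


THETA_TRUE = 0.3  # assumed "true" parameter for the illustration
probs = state_probs(THETA_TRUE)


def expected_log_pl(theta):
    """Pseudo-log-likelihood averaged over the true distribution."""
    return sum(p * log_pl(theta, x) for p, x in zip(probs, STATES))


theta_hat = maximise_concave(expected_log_pl)
print(theta_hat)
```

In this idealised setting the maximiser coincides with THETA_TRUE up to the tolerance of the search. With finite data the exact weights would be replaced by empirical frequencies of the observed configurations, and the curvature of the pseudo-log-likelihood at its maximum would still not, in general, give valid standard errors.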

References

  1. Besag, J. (1975), "Statistical Analysis of Non-Lattice Data", The Statistician 24 (3): 179–195, doi:10.2307/2987782 
  2. Dodge, Y. (2003), The Oxford Dictionary of Statistical Terms, Oxford University Press, ISBN 0-19-920613-9