Scoring algorithm

Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically; it is named after Ronald Fisher.

Sketch of derivation

Let [math]\displaystyle{ Y_1,\ldots,Y_n }[/math] be independent and identically distributed random variables with twice-differentiable p.d.f. [math]\displaystyle{ f(y; \theta) }[/math], and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) [math]\displaystyle{ \theta^* }[/math] of [math]\displaystyle{ \theta }[/math]. First, suppose we have a starting point for our algorithm [math]\displaystyle{ \theta_0 }[/math], and consider a Taylor expansion of the score function, [math]\displaystyle{ V(\theta) }[/math], about [math]\displaystyle{ \theta_0 }[/math]:

[math]\displaystyle{ V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0), \, }[/math]

where

[math]\displaystyle{ \mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta) }[/math]

is the observed information matrix at [math]\displaystyle{ \theta_0 }[/math]. Now, setting [math]\displaystyle{ \theta = \theta^* }[/math], using the fact that [math]\displaystyle{ V(\theta^*) = 0 }[/math] (the M.L.E. is a root of the score function), and rearranging gives us:

[math]\displaystyle{ \theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}). \, }[/math]

We therefore use the algorithm

[math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}), \, }[/math]

and under certain regularity conditions, it can be shown that [math]\displaystyle{ \theta_m \rightarrow \theta^* }[/math].
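
As a concrete illustration (a minimal sketch, not part of the original derivation), the iteration above can be applied to the location parameter of a Cauchy(θ, 1) sample, a standard case in which the likelihood equation has no closed-form solution. The score and observed information below follow from differentiating log f(y; θ) = −log π − log(1 + (y − θ)²); the function name, sample, and starting point are illustrative assumptions.

```python
import numpy as np

def newton_scoring_cauchy(y, theta0, n_iter=50, tol=1e-10):
    """Newton iteration theta <- theta + J(theta)^{-1} V(theta) for the
    location parameter of a Cauchy(theta, 1) sample (illustrative sketch;
    the model and names are assumptions, not from the article)."""
    theta = theta0
    for _ in range(n_iter):
        r = y - theta
        score = np.sum(2.0 * r / (1.0 + r**2))                    # V(theta)
        obs_info = np.sum(2.0 * (1.0 - r**2) / (1.0 + r**2)**2)   # J(theta)
        step = score / obs_info
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
sample = rng.standard_cauchy(500) + 3.0          # true location is 3
print(newton_scoring_cauchy(sample, theta0=np.median(sample)))
```

Starting from the sample median keeps the iteration in a region where [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is positive; far from the M.L.E. the observed information for this model can become negative, which is one practical motivation for the replacement discussed in the next section.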

Fisher scoring

In practice, [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is usually replaced by [math]\displaystyle{ \mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)] }[/math], the Fisher information, thus giving us the Fisher scoring algorithm:

[math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}). \, }[/math]

Under some regularity conditions, if [math]\displaystyle{ \theta_m }[/math] is a consistent estimator, then [math]\displaystyle{ \theta_{m+1} }[/math] (the estimate after a single scoring step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
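
Continuing the hypothetical Cauchy(θ, 1) example from above, the expected information for a sample of size n is [math]\displaystyle{ \mathcal{I}(\theta) = n/2 }[/math], so the Fisher scoring update needs no second derivatives at all, and starting from the sample median (a consistent estimator) means even the first corrected value is already asymptotically efficient, as noted above. This is a sketch under those assumptions, not code from the cited sources.

```python
import numpy as np

def fisher_scoring_cauchy(y, n_iter=50, tol=1e-10):
    """Fisher scoring theta <- theta + I(theta)^{-1} V(theta) for the
    Cauchy(theta, 1) location parameter, where I(theta) = n/2
    (illustrative sketch; the model choice is an assumption)."""
    n = len(y)
    theta = np.median(y)                         # consistent starting value
    for _ in range(n_iter):
        r = y - theta
        score = np.sum(2.0 * r / (1.0 + r**2))   # V(theta)
        step = score / (n / 2.0)                 # I^{-1}(theta) V(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
sample = rng.standard_cauchy(500) - 1.0          # true location is -1
print(fisher_scoring_cauchy(sample))
```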

References

  1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika 74 (4): 817–827. doi:10.1093/biomet/74.4.817. 
  2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.
