Scoring algorithm

Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically; it is named after Ronald Fisher.

Sketch of derivation

Let [math]\displaystyle{ Y_1,\ldots,Y_n }[/math] be independent and identically distributed random variables with twice-differentiable p.d.f. [math]\displaystyle{ f(y; \theta) }[/math], and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) [math]\displaystyle{ \theta^* }[/math] of [math]\displaystyle{ \theta }[/math]. First, suppose we have a starting point for our algorithm [math]\displaystyle{ \theta_0 }[/math], and consider a Taylor expansion of the score function, [math]\displaystyle{ V(\theta) }[/math], about [math]\displaystyle{ \theta_0 }[/math]:

[math]\displaystyle{ V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0), \, }[/math]

where

[math]\displaystyle{ \mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta) }[/math]

is the observed information matrix at [math]\displaystyle{ \theta_0 }[/math]. Now, setting [math]\displaystyle{ \theta = \theta^* }[/math], using the fact that [math]\displaystyle{ V(\theta^*) = 0 }[/math] (the M.L.E. is a root of the score function), and rearranging gives us:

[math]\displaystyle{ \theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}). \, }[/math]

We therefore use the algorithm

[math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}), \, }[/math]

and under certain regularity conditions, it can be shown that [math]\displaystyle{ \theta_m \rightarrow \theta^* }[/math].
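
As a concrete illustration (a minimal sketch, not part of the original derivation), the iteration above can be applied to the location parameter of a Cauchy(θ, 1) sample, a standard case in which the likelihood equation has no closed-form solution. The score and observed information below follow from differentiating log f(y; θ) = −log π − log(1 + (y − θ)²); the function name, sample, and starting point are illustrative assumptions.

```python
import numpy as np

def newton_scoring_cauchy(y, theta0, n_iter=50, tol=1e-10):
    """Newton iteration theta <- theta + J(theta)^{-1} V(theta) for the
    location parameter of a Cauchy(theta, 1) sample (illustrative sketch;
    the model and names are assumptions, not from the article)."""
    theta = theta0
    for _ in range(n_iter):
        r = y - theta
        score = np.sum(2.0 * r / (1.0 + r**2))                    # V(theta)
        obs_info = np.sum(2.0 * (1.0 - r**2) / (1.0 + r**2)**2)   # J(theta)
        step = score / obs_info
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
sample = rng.standard_cauchy(500) + 3.0          # true location is 3
print(newton_scoring_cauchy(sample, theta0=np.median(sample)))
```

Starting from the sample median keeps the iteration in a region where [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is positive; far from the M.L.E. the observed information for this model can become negative, which is one practical motivation for the replacement discussed in the next section.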

Fisher scoring

In practice, [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is usually replaced by [math]\displaystyle{ \mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)] }[/math], the Fisher information, thus giving us the Fisher scoring algorithm:

[math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}). \, }[/math]

Under some regularity conditions, if [math]\displaystyle{ \theta_m }[/math] is a consistent estimator, then [math]\displaystyle{ \theta_{m+1} }[/math] (the estimate after a single scoring step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
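
Continuing the hypothetical Cauchy(θ, 1) example from above, the expected information for a sample of size n is [math]\displaystyle{ \mathcal{I}(\theta) = n/2 }[/math], so the Fisher scoring update needs no second derivatives at all, and starting from the sample median (a consistent estimator) means even the first corrected value is already asymptotically efficient, as noted above. This is a sketch under those assumptions, not code from the cited sources.

```python
import numpy as np

def fisher_scoring_cauchy(y, n_iter=50, tol=1e-10):
    """Fisher scoring theta <- theta + I(theta)^{-1} V(theta) for the
    Cauchy(theta, 1) location parameter, where I(theta) = n/2
    (illustrative sketch; the model choice is an assumption)."""
    n = len(y)
    theta = np.median(y)                         # consistent starting value
    for _ in range(n_iter):
        r = y - theta
        score = np.sum(2.0 * r / (1.0 + r**2))   # V(theta)
        step = score / (n / 2.0)                 # I^{-1}(theta) V(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
sample = rng.standard_cauchy(500) - 1.0          # true location is -1
print(fisher_scoring_cauchy(sample))
```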

References

  1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika 74 (4): 817–827. doi:10.1093/biomet/74.4.817. 
  2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.
