Scoring algorithm
In statistics, the scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used to solve maximum likelihood equations numerically. It is named after Ronald Fisher.
Sketch of derivation
Let [math]\displaystyle{ Y_1,\ldots,Y_n }[/math] be independent and identically distributed random variables with twice-differentiable p.d.f. [math]\displaystyle{ f(y; \theta) }[/math], and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) [math]\displaystyle{ \theta^* }[/math] of [math]\displaystyle{ \theta }[/math]. First, suppose we have a starting point for our algorithm [math]\displaystyle{ \theta_0 }[/math], and consider a Taylor expansion of the score function, [math]\displaystyle{ V(\theta) }[/math], about [math]\displaystyle{ \theta_0 }[/math]:
- [math]\displaystyle{ V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0), \, }[/math]
where
- [math]\displaystyle{ \mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta) }[/math]
is the observed information matrix at [math]\displaystyle{ \theta_0 }[/math]. Now, setting [math]\displaystyle{ \theta = \theta^* }[/math], using that [math]\displaystyle{ V(\theta^*) = 0 }[/math] and rearranging gives us:
- [math]\displaystyle{ \theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}). \, }[/math]
We therefore use the algorithm
- [math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}), \, }[/math]
and under certain regularity conditions, it can be shown that [math]\displaystyle{ \theta_m \rightarrow \theta^* }[/math].
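As a concrete illustration, here is a minimal Python sketch of this iteration applied to the location parameter of a Cauchy sample with unit scale, a standard case in which the likelihood equation has no closed-form solution. The function names, the simulated data, and the choice of the sample median as [math]\displaystyle{ \theta_0 }[/math] are assumptions made for this example only, not part of the algorithm itself.

```python
import numpy as np

def cauchy_score(theta, y):
    # Score V(theta) = sum_i 2(y_i - theta) / (1 + (y_i - theta)^2)
    u = y - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def cauchy_observed_info(theta, y):
    # Observed information J(theta) = -V'(theta)
    #   = sum_i 2(1 - u_i^2) / (1 + u_i^2)^2,  with u_i = y_i - theta
    u = y - theta
    return np.sum(2.0 * (1.0 - u**2) / (1.0 + u**2) ** 2)

def newton_scoring(y, theta0, tol=1e-10, max_iter=100):
    # Iterate theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m)
    theta = theta0
    for _ in range(max_iter):
        step = cauchy_score(theta, y) / cauchy_observed_info(theta, y)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(500) + 3.0             # true location is 3
print(newton_scoring(y, theta0=np.median(y)))  # converges near 3
```

Starting from the sample median keeps the iteration in a region where [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is positive; far from the optimum the observed information for this model can be negative, in which case Newton steps are not guaranteed to increase the likelihood.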
Fisher scoring
In practice, [math]\displaystyle{ \mathcal{J}(\theta) }[/math] is usually replaced by [math]\displaystyle{ \mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)] }[/math], the Fisher information, thus giving us the Fisher Scoring Algorithm:
- [math]\displaystyle{ \theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}) }[/math].
Under some regularity conditions, if [math]\displaystyle{ \theta_m }[/math] is a consistent estimator of [math]\displaystyle{ \theta }[/math], then [math]\displaystyle{ \theta_{m+1} }[/math] (the estimator after a single correction step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
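Continuing the hypothetical Cauchy example above, the expected information for the location parameter works out to the constant [math]\displaystyle{ \mathcal{I}(\theta) = n/2 }[/math], so the Fisher scoring step needs no second derivatives at all. The following sketch makes the same assumptions as the earlier block and repeats the score function so that it runs on its own.

```python
import numpy as np

def cauchy_score(theta, y):
    # Score V(theta) for the Cauchy location model with unit scale
    u = y - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def fisher_scoring(y, theta0, tol=1e-10, max_iter=200):
    # Iterate theta_{m+1} = theta_m + I(theta_m)^{-1} V(theta_m),
    # where the expected information is the constant I(theta) = n/2
    # for this model, so no per-iteration Hessian is required.
    n = len(y)
    theta = theta0
    for _ in range(max_iter):
        step = cauchy_score(theta, y) / (n / 2.0)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(500) + 3.0
print(fisher_scoring(y, theta0=np.median(y)))  # matches the Newton version
```

Because [math]\displaystyle{ \mathcal{I}(\theta) }[/math] is fixed here, each iteration costs only a score evaluation; the usual price is locally linear rather than quadratic convergence, the typical trade-off when the expected information replaces the observed information.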
References
- ↑ Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika 74 (4): 817–827. doi:10.1093/biomet/74.4.817.
- ↑ Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.
Further reading
- Jennrich, R. I.; Sampson, P. F. (1976). "Newton-Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation". Technometrics 18 (1): 11–17. doi:10.1080/00401706.1976.10489395.
Original source: https://en.wikipedia.org/wiki/Scoring_algorithm