Scoring algorithm

Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically. It is named after Ronald Fisher.

Sketch of derivation

Let Y_1, …, Y_n be independent and identically distributed random variables with a twice-differentiable probability density function f(y; θ), and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) θ* of θ. First, suppose we have a starting point θ_0 for the algorithm, and consider a Taylor expansion of the score function, V(θ), about θ_0:

V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),

where

\mathcal{J}(\theta_0) = -\sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta)

is the observed information matrix at θ_0. Now, setting θ = θ*, using that V(θ*) = 0, and rearranging gives us:

\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0) V(\theta_0).

We therefore use the algorithm

\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m) V(\theta_m),

and under certain regularity conditions, it can be shown that θ_m → θ*.
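
As a concrete illustration, the following is a minimal Python sketch of this iteration for a hypothetical i.i.d. Poisson(λ) sample, for which the score is V(λ) = Σ y_i/λ − n and the observed information is 𝒥(λ) = Σ y_i/λ². The function and variable names are illustrative only, not from any particular library.

    import numpy as np

    def newton_scoring(score, obs_info, theta0, tol=1e-10, max_iter=100):
        # Iterate theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m)
        # (scalar-parameter case, so the inverse is a division).
        theta = theta0
        for _ in range(max_iter):
            step = score(theta) / obs_info(theta)
            theta += step
            if abs(step) < tol:
                break
        return theta

    # Hypothetical data: an i.i.d. Poisson(lam) sample.
    y = np.array([2.0, 4.0, 3.0, 5.0, 1.0, 3.0])
    n, s = len(y), y.sum()

    lam_hat = newton_scoring(score=lambda lam: s / lam - n,    # V(lam)
                             obs_info=lambda lam: s / lam**2,  # J(lam)
                             theta0=1.0)
    print(lam_hat)  # converges to the MLE, the sample mean 3.0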

Fisher scoring

In practice, 𝒥(θ) is usually replaced by ℐ(θ) = E[𝒥(θ)], the Fisher information, thus giving us the Fisher scoring algorithm:

\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m) V(\theta_m).

Under some regularity conditions, if θ_m is a consistent estimator, then θ_{m+1} (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
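
To make the replacement of 𝒥 by ℐ concrete: for the hypothetical Poisson(λ) sample used above, ℐ(λ) = E[𝒥(λ)] = n/λ, and a single Fisher-scoring step from any starting value λ_0 > 0 lands exactly on the MLE ȳ, a toy instance of the one-step behaviour just described. A minimal sketch, again with illustrative names only:

    import numpy as np

    def fisher_scoring_step(lam, y):
        # One Fisher-scoring step for an i.i.d. Poisson(lam) sample:
        #   score:              V(lam) = sum(y)/lam - n
        #   Fisher information: I(lam) = E[J(lam)] = n/lam
        n = len(y)
        score = y.sum() / lam - n
        fisher_info = n / lam
        return lam + score / fisher_info

    y = np.array([2.0, 4.0, 3.0, 5.0, 1.0, 3.0])
    # lam + (lam/n) * (sum(y)/lam - n) simplifies to mean(y):
    # the step reaches the MLE exactly, from any starting point.
    for lam0 in (0.5, 1.0, 10.0):
        print(fisher_scoring_step(lam0, y))  # 3.0 each time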

References

  1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika 74 (4): 817–827. doi:10.1093/biomet/74.4.817. 
  2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.

Further reading

  Jennrich, R. I.; Sampson, P. F. (1976). "Newton-Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation". Technometrics 18 (1): 11–17. doi:10.1080/00401706.1976.10489395. JSTOR 1267911. https://www.tandfonline.com/doi/abs/10.1080/00401706.1976.10489395