Kramers–Moyal expansion

From HandWiki

In stochastic processes, the Kramers–Moyal expansion refers to a Taylor series expansion of the master equation, named after Hans Kramers and José Enrique Moyal.[1][2][3] In many textbooks, the expansion is used only to derive the Fokker–Planck equation and is never used again. In general, stochastic processes with continuous sample paths are well described by their first two Kramers–Moyal coefficients, so Fokker–Planck equations are sufficient for studying them. The higher-order terms of the expansion come into play only when the process has jumps, which usually means it is Poisson-like.[4][5] For a real stochastic process, one can estimate its central moment functions from experimental data, compute its Kramers–Moyal coefficients from them, and thus empirically determine its Kolmogorov forward and backward equations. This procedure is implemented as a Python package.[6]

Statement

Start with the integral master equation

[math]\displaystyle{ p(x,t) =\int p(x,t|x_0, t_0)\,p(x_0, t_0)\, dx_0, }[/math]

where [math]\displaystyle{ p(x, t|x_0, t_0) }[/math] is the transition probability density, and [math]\displaystyle{ p(x,t) }[/math] is the probability density at time [math]\displaystyle{ t }[/math]. The Kramers–Moyal expansion transforms the above into an infinite-order partial differential equation[7][8][9]

[math]\displaystyle{ \partial_t p(x,t) = \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) p(x,t)] }[/math]

and likewise for the transition density itself,

[math]\displaystyle{ \partial_t p(x, t|x_0, t_0) = \sum_{n=1}^\infty (-\partial_x)^n [D_n(x, t) p(x, t|x_0, t_0) ], }[/math]

where [math]\displaystyle{ D_n(x, t) }[/math] are the Kramers–Moyal coefficients, defined by

[math]\displaystyle{ D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau), }[/math]

and [math]\displaystyle{ \mu_n }[/math] are the central moment functions, defined by

[math]\displaystyle{ \mu_n(t' | x, t) = \int_{-\infty}^\infty (x'-x)^n p(x', t'\mid x, t) \ dx'. }[/math]

The Fokker–Planck equation is obtained by keeping only the first two terms of the series in which [math]\displaystyle{ D_1 }[/math] is the drift and [math]\displaystyle{ D_2 }[/math] is the diffusion coefficient.[10]
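
For a jump process, by contrast, all coefficients survive. As a simple illustration (a standard example, with the jump rate [math]\displaystyle{ \lambda }[/math] introduced here only for this purpose), consider a Poisson counting process with rate [math]\displaystyle{ \lambda }[/math] and unit jumps. Over a short interval [math]\displaystyle{ \tau }[/math] the increment equals 1 with probability [math]\displaystyle{ \lambda\tau + o(\tau) }[/math] and 0 otherwise, so [math]\displaystyle{ \mu_n(t|x, t-\tau) = \lambda\tau + o(\tau) }[/math] for every [math]\displaystyle{ n \geq 1 }[/math], and hence

[math]\displaystyle{ D_n(x,t) = \frac{\lambda}{n!}, \qquad n \geq 1. }[/math]

Every Kramers–Moyal coefficient is nonzero, so no finite truncation beyond the second term is exact; this is the jump-process situation mentioned in the introduction.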

Also, the moments, assuming they exist, evolve as[11]

[math]\displaystyle{ \frac{\partial}{\partial t}\left\langle x^n\right\rangle=\sum_{k=1}^n \frac{n !}{(n-k) !}\left\langle x^{n-k} D_k(x, t)\right\rangle, }[/math]

where the angled brackets denote the expectation [math]\displaystyle{ \left\langle f\right\rangle = \int f(x) p(x, t)\,dx }[/math].
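
For example, for a diffusion with drift [math]\displaystyle{ b }[/math] and noise amplitude [math]\displaystyle{ \sigma }[/math] (so that [math]\displaystyle{ D_1 = b }[/math] and [math]\displaystyle{ D_2 = \tfrac12\sigma^2 }[/math], as derived below), the first two moment equations read

[math]\displaystyle{ \frac{\partial}{\partial t}\langle x\rangle = \langle b(x,t)\rangle, \qquad \frac{\partial}{\partial t}\langle x^2\rangle = 2\langle x\, b(x,t)\rangle + \langle \sigma^2(x,t)\rangle. }[/math]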

n-dimensional version

The above is the one-dimensional version of the expansion; it generalizes to [math]\displaystyle{ N }[/math] dimensions (see Section 4.7 of Risken[9]), as sketched below.
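
A sketch of the multidimensional form, following Risken's conventions (the dimension is written [math]\displaystyle{ N }[/math] here to avoid clashing with the expansion order [math]\displaystyle{ n }[/math]): for a vector process [math]\displaystyle{ \mathbf X(t) \in \mathbb R^N }[/math],

[math]\displaystyle{ \partial_t p(\mathbf x, t) = \sum_{n=1}^\infty (-1)^n \sum_{i_1,\dots,i_n=1}^{N} \partial_{i_1}\cdots\partial_{i_n} \left[ D^{(n)}_{i_1\cdots i_n}(\mathbf x, t)\, p(\mathbf x, t) \right], }[/math]

with coefficient tensors

[math]\displaystyle{ D^{(n)}_{i_1\cdots i_n}(\mathbf x, t) = \frac{1}{n!} \lim_{\tau\to 0}\frac{1}{\tau} \left\langle \left(X_{i_1}(t)-x_{i_1}\right)\cdots\left(X_{i_n}(t)-x_{i_n}\right) \right\rangle_{\mathbf X(t-\tau) = \mathbf x}. }[/math]

Truncation at [math]\displaystyle{ n=2 }[/math] gives the multidimensional Fokker–Planck equation with drift vector [math]\displaystyle{ D^{(1)} }[/math] and diffusion matrix [math]\displaystyle{ D^{(2)} }[/math].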

Proof

For an ordinary (time-independent) probability density, the moments determine the density itself via a Fourier transform (details may be found at the characteristic function page):

[math]\displaystyle{ \tilde p(k) = \int e^{ikx} p(x) dx = \sum_{n=0}^\infty\frac{(ik)^n}{n!} \mu_n, \qquad p(x) = \frac{1}{2\pi} \int e^{-ikx}\tilde p(k)dk = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x)\mu_n. }[/math]

Similarly,

[math]\displaystyle{ p(x, t| x_0, t_0 ) = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x-x_0) \mu_n(t|x_0, t_0). }[/math]

Now we need to integrate away the Dirac delta functions. Fixing a small [math]\displaystyle{ \tau \gt 0 }[/math], we have by the Chapman–Kolmogorov equation

[math]\displaystyle{ \begin{align} p(x, t) &= \int p(x,t|x', t-\tau) p(x', t-\tau) dx' \\ &= \sum_{n=0}^\infty \frac{(-1)^n}{n!}\int p(x', t-\tau) \delta^{(n)}(x-x') \mu_n(t|x', t-\tau) dx' \\ &= \sum_{n=0}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau)). \end{align} }[/math]

The [math]\displaystyle{ n=0 }[/math] term is just [math]\displaystyle{ p(x, t-\tau) }[/math], so moving it to the left-hand side, dividing by [math]\displaystyle{ \tau }[/math], and letting [math]\displaystyle{ \tau \to 0^+ }[/math] gives

[math]\displaystyle{ \partial_t p(x, t) = \lim_{\tau \to 0^+}\frac 1\tau \sum_{n=1}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau)) = \sum_{n=1}^\infty (-\partial_x)^n (p(x, t) D_n(x, t)). }[/math]

The same computation with [math]\displaystyle{ p(x, t|x_0, t_0) }[/math] gives the other equation.
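
The delta-function expansion can be checked numerically in its weak form: tested against a polynomial [math]\displaystyle{ f }[/math], it states [math]\displaystyle{ \int f(x)\, p(x, t|x_0, t_0)\, dx = \sum_{n} \mu_n(t|x_0,t_0)\, f^{(n)}(x_0)/n! }[/math], and for a polynomial the sum terminates and holds exactly. A minimal sketch, assuming NumPy; the Gaussian transition kernel (drift b, noise sigma, step tau) and the test polynomial are arbitrary choices for illustration:

<syntaxhighlight lang="python">
import math
import numpy as np

# Weak-form check of the delta-function expansion: for a polynomial test
# function f,  E[f(X_t)] = sum_n mu_n * f^(n)(x0) / n!,  where mu_n are the
# moments of X_t about the starting point x0.
rng = np.random.default_rng(0)

x0, b, sigma, tau = 0.7, 1.5, 0.8, 1e-2            # illustrative choices
samples = x0 + b * tau + sigma * np.sqrt(tau) * rng.normal(size=1_000_000)

f = np.poly1d([0.5, -1.0, 3.0, 2.0])               # f(x) = 0.5x^3 - x^2 + 3x + 2
direct = np.mean(f(samples))                       # E[f(X_t)] over the samples

expansion = sum(
    np.mean((samples - x0) ** n) * f.deriv(n)(x0) / math.factorial(n)
    for n in range(f.order + 1)                    # sum terminates for a polynomial
)

# The two numbers agree to floating-point precision, because for a
# polynomial the identity holds sample by sample (Taylor's theorem).
print(direct, expansion)
</syntaxhighlight>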

Forward and backward equations

The equation can be recast into a linear operator form, using the idea of an infinitesimal generator. Define the linear operator

[math]\displaystyle{ \mathcal A f := \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) f(x,t)]; }[/math]

then the equations above state

[math]\displaystyle{ \begin{align} \partial_t p(x, t) &= \mathcal{A} p(x, t), \\ \partial_t p(x, t|x_0, t_0) &= \mathcal{A} p(x, t|x_0, t_0). \end{align} }[/math]

In this form, the equations are precisely in the form of a general Kolmogorov forward equation. The backward equation then states that

[math]\displaystyle{ \partial_t p(x_1, t_1|x, t) = -\mathcal{A}^\dagger p(x_1, t_1|x, t), }[/math]

where

[math]\displaystyle{ \mathcal A^\dagger f := \sum_{n=1}^\infty D_n(x,t) \partial_x^n[f(x,t)] }[/math]

is the Hermitian adjoint of [math]\displaystyle{ \mathcal A }[/math]; in the backward equation the derivatives act on the initial variables [math]\displaystyle{ (x, t) }[/math].
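
The adjoint relationship can be verified numerically for the expansion truncated at two terms: for smooth, rapidly decaying test functions [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math], integration by parts gives [math]\displaystyle{ \int (\mathcal A f)\, g\, dx = \int f\, (\mathcal A^\dagger g)\, dx }[/math]. A minimal sketch, assuming NumPy; the coefficient functions, test functions, grid, and truncation at [math]\displaystyle{ n=2 }[/math] are illustrative choices:

<syntaxhighlight lang="python">
import numpy as np

# Check  ∫ (A f) g dx = ∫ f (A† g) dx  for the expansion truncated at n = 2:
#   A f  = -d/dx [D1 f] + d²/dx² [D2 f]     (forward form)
#   A† g =  D1 dg/dx + D2 d²g/dx²           (adjoint / backward form)
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

D1 = np.sin(x)                       # illustrative drift coefficient D1(x)
D2 = 1.0 + 0.5 * np.exp(-x**2)       # illustrative diffusion coefficient D2(x) > 0
f = np.exp(-(x - 1.0) ** 2)          # rapidly decaying test function
g = np.exp(-(x + 0.5) ** 2)          # rapidly decaying test function


def d(y):
    """Finite-difference derivative on the uniform grid."""
    return np.gradient(y, dx)


A_f = -d(D1 * f) + d(d(D2 * f))      # forward operator acting on f
Adag_g = D1 * d(g) + D2 * d(d(g))    # adjoint operator acting on g

# The two integrals agree up to finite-difference error.
print(np.sum(A_f * g) * dx, np.sum(f * Adag_g) * dx)
</syntaxhighlight>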

Computing the Kramers–Moyal coefficients

By definition,

[math]\displaystyle{ D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau). }[/math]

This definition works because [math]\displaystyle{ \mu_n(t|x, t) = 0 }[/math] for [math]\displaystyle{ n \geq 1 }[/math], as those are the central moments of the Dirac delta function. Since the even central moments are nonnegative, we have [math]\displaystyle{ D_{2n} \geq 0 }[/math] for all [math]\displaystyle{ n\geq 1 }[/math]. When the stochastic process is the Itô diffusion (a Markov process) [math]\displaystyle{ dX_t = b\,dt + \sigma\, dW_t }[/math], the transition density [math]\displaystyle{ p(x', t|x, t-\tau) }[/math] is, for small [math]\displaystyle{ \tau }[/math], approximately a normal distribution in [math]\displaystyle{ x' }[/math] with mean [math]\displaystyle{ x + b(x)\tau }[/math] and variance [math]\displaystyle{ \sigma(x)^2\tau }[/math]. This allows us to compute the central moments, and so

[math]\displaystyle{ D_1 = b, \quad D_2 = \frac 12 \sigma^2, \quad D_3=D_4=\cdots = 0. }[/math]

This gives the one-dimensional Fokker–Planck equation:

[math]\displaystyle{ \partial_t p = -\partial_x(bp) + \frac 12 \partial_x^2(\sigma^2 p). }[/math]
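
The limit in the definition can be turned into a practical estimator: simulate (or measure) a long trajectory sampled at step [math]\displaystyle{ \tau }[/math], bin the data by the current state [math]\displaystyle{ x }[/math], and approximate [math]\displaystyle{ D_n(x) \approx \frac{1}{n!\,\tau}\langle (X_{t+\tau}-X_t)^n \mid X_t \approx x\rangle }[/math]. The sketch below does this for an Ornstein–Uhlenbeck process [math]\displaystyle{ dX_t = -\theta X_t\, dt + \sigma\, dW_t }[/math]; the process, parameters, and binning are illustrative choices, and the kramersmoyal package[6] cited above provides a ready-made implementation of this kind of estimator.

<syntaxhighlight lang="python">
import numpy as np

# Estimate D1 and D2 for an Ornstein–Uhlenbeck process
#   dX = -theta * X dt + sigma dW
# from a simulated trajectory, using binned conditional moments of the
# increments.  Expected:  D1(x) = -theta * x,  D2(x) = sigma**2 / 2.
rng = np.random.default_rng(1)
theta, sigma = 1.0, 0.5
dt, n_steps = 1e-3, 2_000_000

x = np.empty(n_steps)
x[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=n_steps - 1)
for i in range(n_steps - 1):                     # Euler–Maruyama simulation
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]

increments = np.diff(x)
bins = np.linspace(-0.8, 0.8, 17)                # bins over the current state
idx = np.digitize(x[:-1], bins)

for k in range(1, len(bins)):
    mask = idx == k
    if mask.sum() < 1000:                        # skip sparsely populated bins
        continue
    center = 0.5 * (bins[k - 1] + bins[k])
    D1 = np.mean(increments[mask]) / dt          # drift estimate (noisy at the edges)
    D2 = np.mean(increments[mask] ** 2) / (2 * dt)   # diffusion estimate
    print(f"x≈{center:+.2f}  D1≈{D1:+.3f} (exact {-theta * center:+.3f})"
          f"  D2≈{D2:.3f} (exact {0.5 * sigma**2:.3f})")
</syntaxhighlight>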

Pawula theorem

The Pawula theorem states that either the sequence [math]\displaystyle{ D_1, D_2, D_3, \ldots }[/math] terminates after the second term (that is, [math]\displaystyle{ D_n = 0 }[/math] for all [math]\displaystyle{ n \geq 3 }[/math]), or all of its even-order terms are strictly positive.[12][13]

Proof

By the Cauchy–Schwarz inequality, the central moment functions satisfy [math]\displaystyle{ \mu_{n+m}^2 \leq \mu_{2n}\mu_{2m} }[/math]. So, taking the limit, we have [math]\displaystyle{ D_{n+m}^2 \leq \frac{(2n)!(2m)!}{(n+m)!^2}D_{2n}D_{2m} }[/math]. If [math]\displaystyle{ D_{2+n} \neq 0 }[/math] for some [math]\displaystyle{ n \geq 1 }[/math], then applying the inequality with indices [math]\displaystyle{ 1 }[/math] and [math]\displaystyle{ 1+n }[/math] gives [math]\displaystyle{ D_2 D_{2+2n}\gt 0 }[/math], so both [math]\displaystyle{ D_2 \gt 0 }[/math] and [math]\displaystyle{ D_{2+2n} \gt 0 }[/math]. Iterating, [math]\displaystyle{ D_{2+2n}, D_{2+4n}, D_{2+8n}, \ldots \gt 0 }[/math]. So the existence of any nonzero coefficient of order [math]\displaystyle{ \geq 3 }[/math] implies the existence of nonzero coefficients of arbitrarily large order. Also, if [math]\displaystyle{ D_n \neq 0 }[/math], then [math]\displaystyle{ D_2D_{2n-2} \gt 0, D_4D_{2n-4} \gt 0, \ldots }[/math]. So the existence of any nonzero coefficient of order [math]\displaystyle{ n \geq 3 }[/math] implies that all coefficients of order [math]\displaystyle{ 2, 4, \ldots, 2n-2 }[/math] are positive. Combining the two statements, all even-order coefficients are positive, which proves the theorem.
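
For instance, taking [math]\displaystyle{ n=1, m=2 }[/math] in the limiting inequality gives

[math]\displaystyle{ D_3^2 \leq \frac{2!\,4!}{(3!)^2} D_2 D_4 = \frac{4}{3} D_2 D_4, }[/math]

so a nonzero third-order coefficient forces both [math]\displaystyle{ D_2 }[/math] and [math]\displaystyle{ D_4 }[/math] to be strictly positive.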

Interpretation

Let the operator [math]\displaystyle{ \mathcal A_m }[/math] be defined by [math]\displaystyle{ \mathcal A_m f := \sum_{n=1}^m (-\partial_x)^n[D_n(x,t) f(x,t)] }[/math]. The probability density then evolves approximately as [math]\displaystyle{ \partial_t\rho \approx \mathcal A_m \rho }[/math]. Different orders of [math]\displaystyle{ m }[/math] give different levels of approximation; the first two nontrivial truncations are written out explicitly after the list below.

  • [math]\displaystyle{ m = 0 }[/math]: the probability density does not evolve.
  • [math]\displaystyle{ m=1 }[/math]: it evolves by deterministic drift only.
  • [math]\displaystyle{ m=2 }[/math]: it evolves by drift and Brownian motion (the Fokker–Planck equation).
  • [math]\displaystyle{ m=\infty }[/math]: the fully exact equation.
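
Written out explicitly, the first two nontrivial truncations are

[math]\displaystyle{ \mathcal A_1 \rho = -\partial_x\left[D_1(x,t)\rho\right], \qquad \mathcal A_2 \rho = -\partial_x\left[D_1(x,t)\rho\right] + \partial_x^2\left[D_2(x,t)\rho\right], }[/math]

i.e. a Liouville-type transport equation for pure drift, and the Fokker–Planck equation, respectively.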

The Pawula theorem implies that if truncating at the second term is not exact, that is, if [math]\displaystyle{ \mathcal A_2 \neq \mathcal A }[/math], then truncating at any finite [math]\displaystyle{ m }[/math] is still not exact. Usually, this means that for any truncation [math]\displaystyle{ \mathcal A_m }[/math] with [math]\displaystyle{ 3 \leq m \lt \infty }[/math], there exists a probability density function [math]\displaystyle{ \rho }[/math] that becomes negative during its evolution [math]\displaystyle{ \partial_t\rho \approx\mathcal A_m \rho }[/math] (and thus fails to be a probability density function). However, this does not mean that Kramers–Moyal expansions truncated at other choices of [math]\displaystyle{ m }[/math] are useless. Though the solution must have negative values, at least for sufficiently small times, the resulting approximate density may still be better than the [math]\displaystyle{ m=2 }[/math] approximation.

References

  1. Kramers, H. A. (1940). "Brownian motion in a field of force and the diffusion model of chemical reactions". Physica 7 (4): 284–304. doi:10.1016/S0031-8914(40)90098-2. Bibcode: 1940Phy.....7..284K. 
  2. Moyal, J. E. (1949). "Stochastic processes and statistical physics". Journal of the Royal Statistical Society. Series B (Methodological) 11 (2): 150–210. 
  3. Risken, Hannes (6 December 2012). The Fokker-Planck Equation: Methods of Solution and Applications. ISBN 9783642968075. https://books.google.com/books?id=dXvpCAAAQBAJ&q=Pawula-Theorem&pg=PA70. 
  4. Tabar, M. Reza Rahimi (2019), Rahimi Tabar, M. Reza, ed., "Stochastic Processes with Jumps and Non-vanishing Higher-Order Kramers–Moyal Coefficients" (in en), Analysis and Data-Based Reconstruction of Complex Nonlinear Dynamical Systems: Using the Methods of Stochastic Processes, Understanding Complex Systems (Cham: Springer International Publishing): pp. 99–110, doi:10.1007/978-3-030-18472-8_11, ISBN 978-3-030-18472-8, https://doi.org/10.1007/978-3-030-18472-8_11, retrieved 2023-06-09 
  5. Spinney, Richard E.; Ford, Ian J. (2012-01-01). Fluctuation relations: a pedagogical overview. https://ui.adsabs.harvard.edu/abs/2012arXiv1201.6381S. 
  6. Rydin Gorjão, L.; Meirinhos, F. (2019). "kramersmoyal: Kramers–Moyal coefficients for stochastic processes". Journal of Open Source Software 4 (44): 1693. doi:10.21105/joss.01693. Bibcode: 2019JOSS....4.1693G. 
  7. Gardiner, C. (2009). Stochastic Methods (4th ed.). Berlin: Springer. ISBN 978-3-642-08962-6. 
  8. Van Kampen, N. G. (1992). Stochastic Processes in Physics and Chemistry. Elsevier. ISBN 0-444-89349-0. 
  9. Risken, H. (1996). The Fokker–Planck Equation. Berlin, Heidelberg: Springer. pp. 63–95. ISBN 3-540-61530-X. 
  10. Paul, Wolfgang; Baschnagel, Jörg (2013). "A Brief Survey of the Mathematics of Probability Theory". Stochastic Processes. Springer. pp. 17–61 [esp. 33–35]. doi:10.1007/978-3-319-00327-6_2. 
  11. Tabar, M. Reza Rahimi (2019), Rahimi Tabar, M. Reza, ed., "Kramers–Moyal Expansion and Fokker–Planck Equation" (in en), Analysis and Data-Based Reconstruction of Complex Nonlinear Dynamical Systems: Using the Methods of Stochastic Processes, Understanding Complex Systems (Cham: Springer International Publishing): pp. 19–29, doi:10.1007/978-3-030-18472-8_3, ISBN 978-3-030-18472-8, https://doi.org/10.1007/978-3-030-18472-8_3, retrieved 2023-06-09 
  12. Pawula, R. F. (1967). "Generalizations and extensions of the Fokker–Planck–Kolmogorov equations". IEEE Transactions on Information Theory 13 (1): 33–41. doi:10.1109/TIT.1967.1053955. https://thesis.library.caltech.edu/8789/2/Pawula_rf_1965.pdf. 
  13. Pawula, R. F. (1967). "Approximation of the linear Boltzmann equation by the Fokker–Planck equation". Physical Review 162 (1): 186–188. doi:10.1103/PhysRev.162.186. Bibcode: 1967PhRv..162..186P.