Sparse identification of non-linear dynamics

Short description: Data-driven algorithm


Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for obtaining dynamical systems from data.[1] Given a series of snapshots of a dynamical system and their corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO or sparse Bayesian inference[2]) on a library of nonlinear candidate functions of the snapshots against the derivatives to find the governing equations. This procedure relies on the assumption that most physical systems have only a few dominant terms that dictate the dynamics, given an appropriately selected coordinate system and high-quality training data.[3] It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks.[4]

Mathematical Overview

First, consider a dynamical system of the form

$$ \dot{\mathbf{x}} = \frac{d}{dt}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)), $$

where $\mathbf{x}(t) \in \mathbb{R}^n$ is a state vector (snapshot) of the system at time $t$ and the function $\mathbf{f}(\mathbf{x}(t))$ defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots.

With $\mathbf{x}$ and $\dot{\mathbf{x}}$ sampled at $m$ equidistant points in time ($t_1, t_2, \ldots, t_m$), these can be arranged into matrices of the form

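$$ \mathbf{X} = \begin{bmatrix} \mathbf{x}^T(t_1) \\ \mathbf{x}^T(t_2) \\ \vdots \\ \mathbf{x}^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix}, $$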

and similarly for $\dot{\mathbf{X}}$.
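As an illustration of the data-assembly step, the following minimal sketch stacks snapshots row-wise into a matrix and approximates the derivative matrix with finite differences. It assumes equidistant sampling with step `dt`; the helper name `snapshot_matrices` and the sine/cosine test signal are hypothetical and chosen only for this example. Finite differences are just one option, and noisy data usually call for a smoother differentiation scheme.

```python
import numpy as np

def snapshot_matrices(trajectory, dt):
    """Return (X, X_dot) for a trajectory of shape (m, n) sampled every dt.

    Row i of X is the snapshot x(t_i)^T; row i of X_dot approximates its
    time derivative.
    """
    X = np.asarray(trajectory, dtype=float)
    # Second-order finite differences along the time axis:
    # central in the interior, one-sided at the two end points.
    X_dot = np.gradient(X, dt, axis=0, edge_order=2)
    return X, X_dot

# Hypothetical test signal (an assumption for this sketch):
# x1(t) = sin(t), x2(t) = cos(t), so that x1' = x2 and x2' = -x1.
t = np.linspace(0.0, 2.0 * np.pi, 2001)
dt = t[1] - t[0]
X, X_dot = snapshot_matrices(np.column_stack([np.sin(t), np.cos(t)]), dt)

print(X.shape, X_dot.shape)
print("max derivative error:", np.max(np.abs(X_dot[:, 0] - np.cos(t))))
```

If the derivatives are measured directly rather than estimated, they can be stacked in the same row-wise layout and the finite-difference step skipped.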

Next, a library $\Theta(\mathbf{X})$ of nonlinear candidate functions of the columns of $\mathbf{X}$ is constructed. The candidates may be constant, polynomial, or more exotic functions, such as trigonometric or rational terms:

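$$ \Theta(\mathbf{X}) = \begin{bmatrix} \vert & \vert & \vert & \vert & & \vert & \vert & \\ 1 & \mathbf{X} & \mathbf{X}^2 & \mathbf{X}^3 & \cdots & \sin(\mathbf{X}) & \cos(\mathbf{X}) & \cdots \\ \vert & \vert & \vert & \vert & & \vert & \vert & \end{bmatrix} $$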
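A library of this kind can be sketched in a few lines. The version below is one possible construction, under the assumption that the polynomial blocks contain every monomial of the state variables (including cross terms) up to a chosen total degree, optionally followed by the sine and cosine of each variable. The function name `candidate_library` and its parameters are illustrative only.

```python
import numpy as np
from itertools import combinations_with_replacement

def candidate_library(X, degree=3, trig=True):
    """Return (Theta, names) for snapshots X of shape (m, n).

    Columns: a constant, all monomials of the state variables up to
    `degree`, and (optionally) sin and cos of each state variable.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    columns, names = [np.ones(m)], ["1"]
    for d in range(1, degree + 1):
        # Each index tuple (i, j, ...) stands for the monomial x_i * x_j * ...
        for idx in combinations_with_replacement(range(n), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
            names.append("*".join(f"x{i + 1}" for i in idx))
    if trig:
        for i in range(n):
            columns += [np.sin(X[:, i]), np.cos(X[:, i])]
            names += [f"sin(x{i + 1})", f"cos(x{i + 1})"]
    return np.column_stack(columns), names

# Two snapshots of a two-dimensional state, quadratic library without trig terms.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Theta, names = candidate_library(X, degree=2, trig=False)
print(names)
print(Theta)
```

Each row of `Theta` holds the candidate functions evaluated at one snapshot, so `Theta` has as many rows as `X` and one column per candidate term.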

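The number of possible model structures from this library is combinatorially large. $\mathbf{f}(\mathbf{x}(t))$ is then replaced by the product of $\Theta(\mathbf{X})$ and a matrix of coefficients $\boldsymbol{\Xi} = [\boldsymbol{\xi}_1 \; \boldsymbol{\xi}_2 \; \cdots \; \boldsymbol{\xi}_n]$, whose $k$-th column $\boldsymbol{\xi}_k$ determines the active terms in the $k$-th component of $\mathbf{f}(\mathbf{x}(t))$: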

$$ \dot{\mathbf{X}} = \Theta(\mathbf{X}) \boldsymbol{\Xi} $$
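To give a sense of how quickly the number of model structures grows, the sketch below counts library columns and possible on/off patterns of the coefficients. It assumes a purely polynomial library containing all monomials up to a given total degree; the function names are illustrative.

```python
from math import comb

def library_size(n, degree):
    """Number of monomials in n variables with total degree <= degree."""
    return comb(n + degree, degree)

def support_count(n, degree):
    """Number of on/off patterns for the coefficients of one state equation."""
    return 2 ** library_size(n, degree)

n, degree = 3, 3
p = library_size(n, degree)
print("library columns:", p)
print("patterns for one equation:", support_count(n, degree))
print("patterns for all n equations:", support_count(n, degree) ** n)
```

Even for three state variables and cubic polynomials, the library has 20 columns and each equation admits 2^20 = 1,048,576 candidate sparsity patterns, which is why an exhaustive search over model structures is impractical and a sparsity-promoting regression is used instead.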

Because only a few terms are expected to be active, $\mathbf{f}(\mathbf{x}(t))$ is assumed to admit a sparse representation in $\Theta(\mathbf{X})$. Identifying the model then becomes an optimization problem: find a sparse $\boldsymbol{\Xi}$ that best reproduces $\dot{\mathbf{X}}$. In other words, a parsimonious model is obtained by performing least squares regression on the system $\dot{\mathbf{X}} = \Theta(\mathbf{X}) \boldsymbol{\Xi}$ with sparsity-promoting (L1) regularization

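$$ \boldsymbol{\xi}_k = \underset{\boldsymbol{\xi}'_k}{\arg\min} \left\| \dot{\mathbf{X}}_k - \Theta(\mathbf{X}) \boldsymbol{\xi}'_k \right\|_2 + \lambda \left\| \boldsymbol{\xi}'_k \right\|_1, $$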

where $\dot{\mathbf{X}}_k$ is the $k$-th column of $\dot{\mathbf{X}}$ and $\lambda$ is a regularization parameter. Finally, the sparse set of $\boldsymbol{\xi}_k$ can be used to reconstruct the dynamical system:

$$ \dot{x}_k = \Theta(\mathbf{x}) \boldsymbol{\xi}_k $$
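The regression and reconstruction steps can be sketched as follows. This is a minimal illustration, not the reference implementation. It assumes the usual LASSO form with a squared residual, (1/2m)||Ẋ − Θ(X)Ξ||² + λ||Ξ||₁, which differs slightly from the unsquared norm written above, and it solves the problem with the iterative soft-thresholding algorithm (ISTA), one of several possible solvers. The test system, the randomly sampled states with exact derivatives, and all function names are assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(Theta, X_dot, lam=5e-3, n_iter=5000):
    """Minimise (1/2m)||X_dot - Theta Xi||_F^2 + lam * ||Xi||_1 by ISTA.

    The problem separates over the columns xi_k of Xi; since they all
    share Theta, they are updated together.
    """
    m, p = Theta.shape
    Xi = np.zeros((p, X_dot.shape[1]))
    # Step size 1/L, with L the Lipschitz constant of the smooth term's gradient.
    step = m / np.linalg.norm(Theta, 2) ** 2
    for _ in range(n_iter):
        grad = Theta.T @ (Theta @ Xi - X_dot) / m
        Xi = soft_threshold(Xi - step * grad, step * lam)
    return Xi

# Hypothetical test system (not taken from the article):
#   x1' = -2 x1,   x2' = x1 x2 - x2
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
x1, x2 = X[:, 0], X[:, 1]
X_dot = np.column_stack([-2.0 * x1, x1 * x2 - x2])  # exact derivatives

# Quadratic polynomial library, written out explicitly.
names = ["1", "x1", "x2", "x1^2", "x1*x2", "x2^2"]
Theta = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

Xi = lasso_ista(Theta, X_dot)

# Reconstruct each equation x_k' = Theta(x) xi_k from the nonzero coefficients.
for k in range(Xi.shape[1]):
    terms = [f"{Xi[j, k]:+.3f} {names[j]}"
             for j in range(len(names)) if abs(Xi[j, k]) > 1e-6]
    print(f"x{k + 1}' =", " ".join(terms))
```

The L1 penalty shrinks the surviving coefficients slightly toward zero, so the recovered values are close to, but not exactly, the true ones. A common remedy is to refit an ordinary least squares model on the selected terms only.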

References

  1. Brunton, Steven L.; Kutz, J. Nathan (2022-05-05). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. doi:10.1017/9781009089517. ISBN 9781009089517. https://www.cambridge.org/highereducation/books/data-driven-science-and-engineering/6F9A730B7A9A9F43F68CF21A24BEC339. Retrieved 2022-10-25.
  2. Huang, Yunfei (2022). "Sparse inference and active learning of stochastic differential equations from data". Scientific Reports 12 (1). doi:10.1038/s41598-022-25638-9. PMID 36522347. Bibcode: 2022NatSR..1221691H.
  3. Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan (2016-04-12). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". Proceedings of the National Academy of Sciences 113 (15): 3932–3937. doi:10.1073/pnas.1517384113. ISSN 0027-8424. PMID 27035946. Bibcode: 2016PNAS..113.3932B.
  4. Mangan, Niall M.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan (2016-05-26). "Inferring biological networks by sparse identification of nonlinear dynamics". arXiv:1605.08368 [math.DS].