Sinusoidal model

From HandWiki
Short description: Sine wave used to approximate data


In statistics, signal processing, and time series analysis, a sinusoidal model is used to approximate a sequence [math]\displaystyle{ Y_i }[/math] by a sine function:

[math]\displaystyle{ Y_i = C + \alpha\sin(2\pi\omega T_i + \phi) + E_i }[/math]

where C is a constant defining the mean level, α is the amplitude of the sine, ω is the frequency (in cycles per unit of [math]\displaystyle{ T_i }[/math], so that 2πω is the angular frequency), [math]\displaystyle{ T_i }[/math] is the time variable, φ is the phase shift, and [math]\displaystyle{ E_i }[/math] is the error sequence.

This sinusoidal model can be fit using nonlinear least squares; to obtain a good fit, routines may require good starting values for the unknown parameters. Fitting a model with a single sinusoid is a special case of spectral density estimation and least-squares spectral analysis.
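A minimal sketch of such a fit is shown below. It uses the parametrization sin(2πωT_i + φ) of the trended models in this article, with ω in cycles per unit time, and runs a plain Gauss-Newton iteration written with NumPy. The function names `sinusoid` and `fit_sinusoid` and the tolerance settings are illustrative assumptions, not part of any standard routine. Plain Gauss-Newton has no step control, so it can diverge from poor starting values, which is exactly why the starting values discussed below matter.

```python
import numpy as np

def sinusoid(t, c, alpha, omega, phi):
    """Model Y = C + alpha*sin(2*pi*omega*t + phi); omega in cycles per unit of t."""
    return c + alpha * np.sin(2 * np.pi * omega * t + phi)

def fit_sinusoid(t, y, p0, n_iter=50, tol=1e-10):
    """Gauss-Newton nonlinear least squares from starting values p0 = (C, alpha, omega, phi)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        c, alpha, omega, phi = p
        arg = 2 * np.pi * omega * t + phi
        r = y - sinusoid(t, *p)  # residuals at the current parameters
        # Jacobian of the model with respect to (C, alpha, omega, phi)
        J = np.column_stack([
            np.ones_like(t),
            np.sin(arg),
            alpha * np.cos(arg) * 2 * np.pi * t,
            alpha * np.cos(arg),
        ])
        # Linearized least-squares step: J @ step ~= r
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        p = p + step
        if np.linalg.norm(step) < tol * (1 + np.linalg.norm(p)):
            break
    return p
```

In practice a library routine with step control, such as `scipy.optimize.curve_fit` with its `p0` argument, is usually preferable; the hand-written loop only makes the role of the starting values explicit.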

Good starting values

Good starting value for the mean

A good starting value for C can be obtained by calculating the mean of the data. If the data show a trend, i.e., the assumption of constant location is violated, one can replace C with a linear or quadratic least squares fit. That is, the model becomes

[math]\displaystyle{ Y_i = (B_0 + B_1T_i) + \alpha\sin(2\pi\omega T_i + \phi) + E_i }[/math]

or

[math]\displaystyle{ Y_i = (B_0 + B_1T_i+B_2T_i^2) + \alpha\sin(2\pi\omega T_i + \phi) + E_i }[/math]
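The starting values for the location part can be sketched with NumPy as below: the sample mean for C, or the coefficients of a least-squares polynomial fit for B_0, B_1 (and B_2). The helper names `location_start` and `detrend` are hypothetical. The detrended series is what the amplitude and frequency estimates work on.

```python
import numpy as np

def location_start(t, y, degree=0):
    """Starting values for the location part of the model.

    degree=0 returns the sample mean (a starting value for C);
    degree=1 or 2 returns least-squares coefficients (B0, B1[, B2]) of a
    linear or quadratic trend, in increasing order of power.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if degree == 0:
        return np.array([y.mean()])
    # np.polyfit returns the highest power first; reverse to get B0, B1, ...
    return np.polyfit(t, y, degree)[::-1]

def detrend(t, y, degree=0):
    """Subtract the fitted location, leaving roughly alpha*sin(...) + E."""
    t = np.asarray(t, dtype=float)
    coeffs = location_start(t, y, degree)
    return np.asarray(y, dtype=float) - np.polynomial.polynomial.polyval(t, coeffs)
```

A sinusoid that does not span a whole number of cycles biases the trend coefficients slightly, so these are starting values only; the nonlinear fit estimates all parameters jointly.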

Good starting value for frequency

The starting value for the frequency can be obtained from the dominant frequency in a periodogram. A complex demodulation phase plot can be used to refine this initial estimate for the frequency.
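A minimal periodogram-peak sketch follows, assuming equally spaced times and using NumPy's real FFT; the function name `dominant_frequency` is hypothetical. The estimate is only as fine as the Fourier frequency spacing 1/(NΔt), which is why a refinement step is useful afterwards.

```python
import numpy as np

def dominant_frequency(t, y):
    """Frequency (cycles per unit of t) at the largest periodogram ordinate.

    Assumes equally spaced t. The series is mean-centred first so the
    zero-frequency term does not dominate.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    dt = t[1] - t[0]
    power = np.abs(np.fft.rfft(y)) ** 2    # raw periodogram, up to a constant factor
    freqs = np.fft.rfftfreq(y.size, d=dt)  # Fourier frequencies in cycles per unit of t
    k = np.argmax(power[1:]) + 1           # skip the zero-frequency ordinate
    return freqs[k]
```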

Good starting values for amplitude

The root mean square of the detrended data, multiplied by the square root of two, gives an estimate of the sinusoid amplitude. A complex demodulation amplitude plot can also be used to find a good starting value for the amplitude. In addition, this plot can indicate whether the amplitude is constant over the entire range of the data or whether it varies. If the plot is essentially flat, i.e., has zero slope, then it is reasonable to assume a constant amplitude in the nonlinear model. However, if the plot shows a trend over the range of the data, one may need to adjust the model to be:

[math]\displaystyle{ Y_i = C + (B_0 + B_1 T_i)\sin(2\pi\omega T_i + \phi) + E_i }[/math]

That is, one may replace α with a function of time. A linear fit is specified in the model above, but this can be replaced with a more elaborate function if needed.
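Both amplitude estimates can be sketched as follows, assuming equally spaced T_i, already detrended data, and a trial frequency `omega0` such as a periodogram peak. The names `rms_amplitude` and `demodulate` are hypothetical, and the one-period moving average used as the low-pass filter is a choice made for this sketch. Demodulating α sin(2πωt + φ) at ω0 and smoothing leaves approximately (α/2)·exp(i(2π(ω − ω0)t + φ − π/2)), so twice its modulus tracks the amplitude and the slope of its unwrapped phase, divided by 2π, estimates the frequency correction ω − ω0.

```python
import numpy as np

def rms_amplitude(y_detrended):
    """sqrt(2) * RMS: over whole cycles, alpha*sin(...) has RMS alpha/sqrt(2)."""
    y = np.asarray(y_detrended, dtype=float)
    return np.sqrt(2.0) * np.sqrt(np.mean(y ** 2))

def demodulate(t, y_detrended, omega0):
    """Complex demodulation at trial frequency omega0 (cycles per unit of t).

    Multiplies the detrended series by exp(-i*2*pi*omega0*t) and smooths it with
    a moving average spanning about one period. Returns (amplitude, phase):
    amplitude ~ alpha, and the unwrapped phase has slope ~ 2*pi*(omega - omega0).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y_detrended, dtype=float)
    z = y * np.exp(-2j * np.pi * omega0 * t)
    width = max(1, int(round(1.0 / (omega0 * (t[1] - t[0])))))  # samples per period
    z_smooth = np.convolve(z, np.ones(width) / width, mode="valid")
    amplitude = 2.0 * np.abs(z_smooth)
    phase = np.unwrap(np.angle(z_smooth))
    return amplitude, phase
```

Plotting `amplitude` against time gives the amplitude plot described above: a flat curve supports a constant α, while a trend suggests a time-varying amplitude such as B_0 + B_1T_i.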

Model validation

As with any statistical model, the fit should be subjected to graphical and quantitative techniques of model validation. For example, a run sequence plot can be used to check for significant shifts in location or scale, start-up effects, and outliers. A lag plot can be used to verify that the residuals are independent; outliers also appear in the lag plot. A histogram and a normal probability plot can be used to check for skewness or other non-normality in the residuals.
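The plots themselves are the primary tools, but two numeric summaries can accompany them. The sketch below, with the hypothetical helper `residual_diagnostics`, computes the lag-1 autocorrelation (the correlation visible in a lag plot, near zero for independent residuals) and the sample skewness (near zero for symmetric residuals such as normal errors). It is an assumption of this sketch that these two statistics are the summaries of interest; they do not replace the plots.

```python
import numpy as np

def residual_diagnostics(residuals):
    """Numeric companions to the lag plot and histogram / normal probability plot."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    # Lag-1 autocorrelation: summarizes the (r[i-1], r[i]) pairs of a lag plot
    lag1 = np.sum(r[:-1] * r[1:]) / np.sum(r ** 2)
    # Sample skewness: third standardized moment
    skew = np.mean(r ** 3) / np.mean(r ** 2) ** 1.5
    return {"lag1_autocorrelation": lag1, "skewness": skew}
```

A lag-1 autocorrelation close to 1, for instance, usually means that structure such as an unmodelled sinusoid or trend remains in the residuals.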

Extensions

A different approach transforms the nonlinear regression into a linear regression by means of a suitable integral equation. This requires neither initial guesses nor an iterative process: the fit is obtained directly.[1]
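The sketch below illustrates the integral-equation idea for the basic model; it is not a reproduction of the cited chapter's algorithm. Writing the sinusoid as a + b sin(wx) + c cos(wx) with angular frequency w = 2πω, the function satisfies y'' = −w²(y − a). Integrating twice from the first abscissa gives

y(x) = −w² SS(x) + (quadratic polynomial in x),

where SS is the double integral of y. A linear regression of y on [SS, x², x, 1] therefore estimates w², and a second linear regression on [1, sin(wx), cos(wx)] gives the remaining coefficients. The names `cumtrapz0` and `sinusoid_linear_fit` are hypothetical, and the trapezoidal rule for the integrals is an assumption of this sketch.

```python
import numpy as np

def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out

def sinusoid_linear_fit(x, y):
    """Non-iterative fit of y = C + alpha*sin(2*pi*omega*x + phi) via an integral equation.

    Returns (C, alpha, omega, phi) with omega in cycles per unit of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)                  # the integrals need increasing x
    x, y = x[order], y[order]
    ss = cumtrapz0(cumtrapz0(y, x), x)     # double integral SS(x)
    # Step 1: y ~ -w**2 * SS + quadratic in x, linear in the unknowns
    A = np.column_stack([ss, x ** 2, x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = np.sqrt(abs(coef[0]))              # coef[0] estimates -w**2; abs guards against noise
    # Step 2: with w fixed, the model is linear in (C, b, c)
    B = np.column_stack([np.ones_like(x), np.sin(w * x), np.cos(w * x)])
    (c0, b, c), *_ = np.linalg.lstsq(B, y, rcond=None)
    # b*sin + c*cos = alpha*sin(w*x + phi) with b = alpha*cos(phi), c = alpha*sin(phi)
    return c0, np.hypot(b, c), w / (2 * np.pi), np.arctan2(c, b)
```

The result can also serve as a starting point for the nonlinear least-squares fit when noisy data make the direct estimate less precise.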

References

  1. The method is explained in the chapter "Generalized sinusoidal regression", pp. 54-63, in the paper: [1]

 This article incorporates public domain material from the National Institute of Standards and Technology website https://www.nist.gov.