Approximate entropy

From HandWiki

In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data.[1]

For example, consider two series of data:

series 1: (10,20,10,20,10,20,10,20,10,20,10,20...), which alternates 10 and 20.
series 2: (10,10,20,10,20,20,20,10,10,20,10,20,20...), which has either a value of 10 or 20, chosen randomly, each with probability 1/2.

Moment statistics, such as mean and variance, will not distinguish between these two series. Nor will rank order statistics distinguish between these series. Yet series 1 is "perfectly regular"; knowing one term has the value of 20 enables one to predict with certainty that the next term will have the value of 10. Series 2 is randomly valued; knowing one term has the value of 20 gives no insight into what value the next term will have.
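The point can be checked numerically with a small sketch. It is illustrative only: series 2 is generated here as a random shuffle of series 1 (an assumption made so that both series contain exactly the same values), and the helper name `next_value_counts` is hypothetical.

```python
import numpy as np
from collections import Counter

def next_value_counts(series):
    """Count how often each (current, next) pair of values occurs."""
    return Counter(zip(series[:-1], series[1:]))

# Series 1: strict alternation of 10 and 20.
series1 = [10, 20] * 50

# Series 2 (assumption): a random shuffle of the same 100 values,
# so both series contain exactly fifty 10s and fifty 20s.
rng = np.random.default_rng(0)
series2 = [int(v) for v in rng.permutation(series1)]

# Moment statistics are identical.
print(np.mean(series1), np.mean(series2))
print(np.var(series1), np.var(series2))

# Rank-order statistics are identical too.
print(sorted(series1) == sorted(series2))

# The transitions differ: in series 1 a 20 is always followed by a 10,
# whereas the shuffled series also contains repeated values.
print(next_value_counts(series1))
print(next_value_counts(series2))
```

Only the last two lines of output separate the series, which is the kind of ordering information that a regularity statistic is meant to capture.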

Regularity was originally measured by exact regularity statistics, which have mainly centered on various entropy measures.[1] However, accurate entropy calculation requires vast amounts of data, and the results are greatly influenced by system noise,[2] so it is not practical to apply these methods to experimental data. ApEn was developed by Steve M. Pincus to handle these limitations by modifying an exact regularity statistic, Kolmogorov–Sinai entropy. ApEn was initially developed to analyze medical data, such as heart rate,[1] and its applications later spread to finance,[3] physiology,[4] human factors engineering,[5] and climate sciences.[6]

The algorithm

A comprehensive step-by-step tutorial with an explanation of the theoretical foundations of Approximate Entropy is available.[7] The algorithm is:

Step 1
Form a time series of data [math]\displaystyle{ \ u(1), u(2),\ldots, u(N) }[/math]. These are N raw data values from measurements equally spaced in time.
Step 2
Fix m, a positive integer, and r, a positive real number. The value of m represents the length of each compared run of data (essentially a window), and r specifies a filtering level.
Step 3
Form a sequence of vectors [math]\displaystyle{ \mathbf{x}(1),\mathbf{x}(2),\ldots,\mathbf{x}(N-m+1) }[/math] in real [math]\displaystyle{ \ m }[/math]-dimensional space [math]\displaystyle{ \mathbf{R}^{m} }[/math], defined by [math]\displaystyle{ \mathbf{x}(i) = [u(i),u(i+1),\ldots,u(i+m-1)] }[/math].
Step 4
Use the sequence [math]\displaystyle{ \mathbf{x}(1) }[/math],[math]\displaystyle{ \mathbf{x}(2),\ldots,\mathbf{x}(N-m+1) }[/math] to construct, for each i, [math]\displaystyle{ 1 \le i \le N-m+1 }[/math]
[math]\displaystyle{ C_i^m (r)=(\text{number of } x(j) \text { such that } d[x(i),x(j)] \leq r)/(N-m+1) }[/math]
in which [math]\displaystyle{ \ d[x, x^*] }[/math] is defined as
[math]\displaystyle{ d[x,x^* ]=\max_a |u(a)-u^*(a)| \, }[/math]
The [math]\displaystyle{ u(a) }[/math] are the m scalar components of [math]\displaystyle{ \mathbf{x} }[/math]. d represents the distance between the vectors [math]\displaystyle{ \mathbf{x}(i) }[/math] and [math]\displaystyle{ \mathbf{x}(j) }[/math], given by the maximum difference in their respective scalar components. Note that [math]\displaystyle{ j }[/math] takes on all values, so the match provided when [math]\displaystyle{ i=j }[/math] will be counted (the subsequence is matched against itself).
Step 5
Define [math]\displaystyle{ \Phi ^m (r) = (N-m+1)^{-1} \sum_{i=1}^{N-m+1}\log(C_i^m (r)) }[/math].
Step 6
Define approximate entropy [math]\displaystyle{ \ (\mathrm{ApEn}) }[/math] as
[math]\displaystyle{ \mathrm{ApEn} = \Phi ^m (r) - \Phi^{m+1} (r). }[/math]
where [math]\displaystyle{ \log }[/math] is the natural logarithm, for m and r fixed as in Step 2.
Parameter selection
Typically, [math]\displaystyle{ m=2 }[/math] or [math]\displaystyle{ m=3 }[/math] is chosen, whereas the appropriate r depends greatly on the application.
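Steps 3 to 5 can be sketched on a toy series small enough to check by hand. The helper names `embed` and `match_fractions`, and the toy series itself, are illustrative assumptions rather than part of the algorithm description.

```python
import numpy as np

def embed(u, m):
    """Step 3: return the N - m + 1 overlapping windows of length m as rows."""
    u = np.asarray(u, dtype=float)
    return np.array([u[i:i + m] for i in range(len(u) - m + 1)])

def match_fractions(u, m, r):
    """Step 4: C_i^m(r) for every i, with self-matches counted."""
    x = embed(u, m)
    # Pairwise maximum-component distances between all windows.
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    return np.sum(dist <= r, axis=1) / len(x)

u = [1, 2, 1, 2, 1]            # toy series, N = 5
c = match_fractions(u, m=2, r=0.5)
print(embed(u, 2))             # four windows: (1, 2), (2, 1), (1, 2), (2, 1)
print(c)                       # [0.5 0.5 0.5 0.5]
print(np.mean(np.log(c)))      # Step 5: Phi^2(0.5), equal to log(0.5)
```

Each window matches itself and one other window out of four, so every match fraction is 1/2.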

An implementation on Physionet,[8] which is based on Pincus,[2] uses [math]\displaystyle{ d[x(i),x(j)] \lt r }[/math], whereas the original article uses [math]\displaystyle{ d[x(i),x(j)] \le r }[/math] in Step 4. While the difference is a concern for artificially constructed examples, it is usually not a concern in practice.
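The effect of the two conventions can be seen on a constructed integer-valued series in which some distances equal r exactly. This is a minimal sketch: the function `match_counts` and its `strict` flag are hypothetical names, not taken from either implementation.

```python
import numpy as np

def match_counts(u, m, r, strict=False):
    """Number of windows within r of each window (self-matches included)."""
    u = np.asarray(u, dtype=float)
    x = np.array([u[i:i + m] for i in range(len(u) - m + 1)])
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    within = dist < r if strict else dist <= r
    return within.sum(axis=1)

u = [85, 80, 89] * 17          # integer data, so distances can equal r exactly

# With r = 5, the windows (85, 80) and (89, 85) are at distance exactly 5.
print(match_counts(u, 2, 5)[0])               # 33
print(match_counts(u, 2, 5, strict=True)[0])  # 17

# With r = 3 no distance equals r, so the two conventions agree.
print(match_counts(u, 2, 3)[0], match_counts(u, 2, 3, strict=True)[0])  # 17 17
```

For real-valued measurements an exact tie between a distance and r is unlikely, which is consistent with the remark that the choice rarely matters in practice.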

The interpretation

The presence of repetitive patterns of fluctuation in a time series renders it more predictable than a time series in which such patterns are absent. ApEn reflects the likelihood that similar patterns of observations will not be followed by additional similar observations.[9] A time series containing many repetitive patterns has a relatively small ApEn; a less predictable process has a higher ApEn.

One example

Illustration of the Heart Rate Sequence

Suppose [math]\displaystyle{ \ N=51 }[/math], and the sequence consists of 51 samples of heart rate equally spaced in time:

[math]\displaystyle{ \ S_N = \{85, 80, 89, 85, 80, 89, \ldots\} }[/math]

(i.e., the sequence is periodic with a period of 3). Let's choose [math]\displaystyle{ \ m=2 }[/math] and [math]\displaystyle{ \ r=3 }[/math] (these values are chosen for illustration; in general, the result depends on the choice of [math]\displaystyle{ \ m }[/math] and [math]\displaystyle{ \ r }[/math]).

Form a sequence of vectors:

[math]\displaystyle{ \mathbf{ x}(1) = [u(1) \,u(2)]=[85\, 80] }[/math]
[math]\displaystyle{ \mathbf{ x}(2) = [u(2)\, u(3)]=[80\, 89] }[/math]
[math]\displaystyle{ \mathbf{ x}(3) = [u(3)\, u(4)]=[89\, 85] }[/math]
[math]\displaystyle{ \mathbf{ x}(4) = [u(4)\, u(5)]=[85\, 80] }[/math]

Distance is calculated as follows:

[math]\displaystyle{ \ d[\mathbf{x}(1), \mathbf{x}(1)]=\max_a |u(a)-u^*(a)|=0\lt r=3 }[/math]

Note [math]\displaystyle{ \ |u(2)-u(3) |\gt |u(1)-u(2) | }[/math], so

[math]\displaystyle{ \ d[\mathbf{x}(1), \mathbf{x}(2)]=\max_a |u(a)-u^*(a)|=|u(2)-u(3)|=9\gt r=3 }[/math]


[math]\displaystyle{ \ d[\mathbf{x}(1), \mathbf{x}(3)]=|u(2)-u(4) |=5\gt r }[/math]
[math]\displaystyle{ \ d[\mathbf{x}(1), \mathbf{x}(4)]=|u(1)-u(4) |=|u(2)-u(5) |=0\lt r }[/math]
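The four distances above can be reproduced with a few lines of plain Python. This is only a sketch; the helper name `chebyshev` is an assumption, and the list index is shifted by one because Python counts from 0.

```python
def chebyshev(x, y):
    """Maximum absolute componentwise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(x, y))

u = [85, 80, 89] * 17          # N = 51 heart-rate samples
m, r = 2, 3

# x[0] corresponds to x(1) in the text.
x = [u[i:i + m] for i in range(len(u) - m + 1)]

for j in range(4):
    d = chebyshev(x[0], x[j])
    print(f"d[x(1), x({j + 1})] = {d}, within r: {d <= r}")
# d[x(1), x(1)] = 0, within r: True
# d[x(1), x(2)] = 9, within r: False
# d[x(1), x(3)] = 5, within r: False
# d[x(1), x(4)] = 0, within r: True
```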

Therefore, [math]\displaystyle{ \mathbf{ x}(j)\text{s} }[/math] such that [math]\displaystyle{ \ d[\mathbf{x}(1), \mathbf{x}(j)]\le r }[/math] include [math]\displaystyle{ \mathbf{x}(1), \mathbf{x}(4), \mathbf{x}(7),\ldots,\mathbf{x}(49) }[/math], and the total number is 17.

[math]\displaystyle{ \ C_1^2 (3)=\frac{17}{50} }[/math]
[math]\displaystyle{ \ C_2^2 (3)=\frac{17}{50} }[/math]
[math]\displaystyle{ \ C_3^2 (3)=\frac{16}{50} }[/math]
[math]\displaystyle{ \ C_4^2 (3)=\frac{17}{50}\ \ldots }[/math]

Note that in Step 4 the index of [math]\displaystyle{ \mathbf{x}(i) }[/math] runs over [math]\displaystyle{ \ 1 \le i \le N-m+1 }[/math], so only 50 vectors exist. The [math]\displaystyle{ \mathbf{x}(j)\text{s} }[/math] such that [math]\displaystyle{ \ d[\mathbf{x}(3), \mathbf{x}(j)] \le r }[/math] therefore include [math]\displaystyle{ \mathbf{x}(3), \mathbf{x}(6), \mathbf{x}(9),\ldots,\mathbf{x}(48) }[/math], and the total number is 16.

[math]\displaystyle{ \Phi^2 (3)=(50)^{-1} \sum_{i=1}^{50}\log(C_i^2(3))\approx-1.0982 }[/math]
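The match counts and the value of Φ for m = 2 can be verified directly. The sketch below uses only the standard library; the helper name `count_matches` is an assumption.

```python
import math

u = [85, 80, 89] * 17
m, r = 2, 3
x = [u[i:i + m] for i in range(len(u) - m + 1)]   # 50 vectors

def count_matches(x_i):
    """Number of vectors x_j within distance r of x_i (including x_i itself)."""
    return sum(
        1 for x_j in x
        if max(abs(a - b) for a, b in zip(x_i, x_j)) <= r
    )

counts = [count_matches(x_i) for x_i in x]
print(counts[:4])              # [17, 17, 16, 17]

# Average of the natural logarithms of the match fractions.
phi2 = sum(math.log(c / len(x)) for c in counts) / len(x)
print(round(phi2, 4))          # -1.0982
```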

Then we repeat the above steps for m=3. First form a sequence of vectors:

[math]\displaystyle{ \mathbf{ x}(1) = [u(1)\, u(2)\, u(3)]=[85\, 80\, 89] }[/math]
[math]\displaystyle{ \mathbf{ x}(2) = [u(2)\, u(3)\, u(4)]=[80\, 89\, 85] }[/math]
[math]\displaystyle{ \mathbf{ x}(3) = [u(3)\, u(4)\, u(5)]=[89\, 85\, 80] }[/math]
[math]\displaystyle{ \mathbf{ x}(4) = [u(4)\, u(5)\, u(6)]=[85\, 80\, 89] }[/math]

By calculating the distances between the vectors [math]\displaystyle{ \mathbf{x}(i) }[/math] and [math]\displaystyle{ \mathbf{x}(j) }[/math], [math]\displaystyle{ 1 \le i, j \le 49 }[/math], we find that the vectors satisfying the filtering level have the following characteristic:

[math]\displaystyle{ \ d[\mathbf{x}(i), \mathbf{x}(i+3)]=0\lt r }[/math]


[math]\displaystyle{ \ C_1^3 (3)=\frac{17}{49} }[/math]
[math]\displaystyle{ \ C_2^3 (3)=\frac{16}{49} }[/math]
[math]\displaystyle{ \ C_3^3 (3)=\frac{16}{49} }[/math]
[math]\displaystyle{ \ C_4^3 (3)=\frac{17}{49}\ \ldots }[/math]
[math]\displaystyle{ \Phi^3 (3)=(49)^{-1} \sum_{i=1}^{49}\log(C_i^3(3))\approx-1.0982 }[/math]


[math]\displaystyle{ \mathrm{ ApEn}=\Phi^2 (3)-\Phi^3 (3)\approx-0.000010997 }[/math]

The magnitude of this value is very small, which implies that the sequence is regular and predictable, consistent with the observation.
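The whole worked example can be re-evaluated with more digits than the rounded values shown above. This is a sketch using only the standard library, with a hypothetical helper `phi`. Evaluated in the Step 6 order, the difference has a magnitude of about 1.1e-05 and a negative sign, which appears to be an artifact of the finite record length.

```python
import math

def phi(u, m, r):
    """Average log match fraction over all windows of length m."""
    x = [u[i:i + m] for i in range(len(u) - m + 1)]
    total = 0.0
    for x_i in x:
        matches = sum(
            1 for x_j in x
            if max(abs(a - b) for a, b in zip(x_i, x_j)) <= r
        )
        total += math.log(matches / len(x))
    return total / len(x)

u = [85, 80, 89] * 17
phi2 = phi(u, 2, 3)
phi3 = phi(u, 3, 3)
apen = phi2 - phi3             # Step 6 ordering: Phi^m - Phi^(m+1)

print(f"{phi2:.6f} {phi3:.6f} {apen:.3e}")
# -1.098210 -1.098199 -1.100e-05
```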

Python implementation

import numpy as np

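def ApEn(U, m, r) -> float:
    """Approximate entropy of series U for window length m and filtering level r."""

    def _maxdist(x_i, x_j):
        # Step 4 distance: largest componentwise difference
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # Step 3: the N - m + 1 overlapping windows of length m
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        # Step 4: fraction of windows within r of each x_i (self-match included)
        C = [
            len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0)
            for x_i in x
        ]
        # Step 5: average of the natural logarithms
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)

    # Step 6
    return _phi(m) - _phi(m + 1)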

Usage example:

>>> U = np.array([85, 80, 89] * 17)
>>> print(ApEn(U, 2, 3))
>>> randU = np.random.choice([85, 80, 89], size=17*3)
>>> print(ApEn(randU, 2, 3))


Advantages

The advantages of ApEn include:[2]

  • Lower computational demand. ApEn can be designed to work for small data samples (n < 50 points) and can be applied in real time.
  • Less effect from noise. If data is noisy, the ApEn measure can be compared to the noise level in the data to determine what quality of true information may be present in the data.


Applications

ApEn has been applied to classify EEG recordings in conditions such as schizophrenia,[10] epilepsy,[11] and addiction.[12]


Limitations

The ApEn algorithm counts each sequence as matching itself to avoid the occurrence of ln(0) in the calculations. This self-matching can bias ApEn, and the bias gives ApEn two poor properties in practice:[13]

  1. ApEn is heavily dependent on the record length and is uniformly lower than expected for short records.
  2. It lacks relative consistency. That is, if ApEn of one data set is higher than that of another, it should, but does not, remain higher for all conditions tested.
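The dependence on record length can be illustrated with a small experiment on synthetic data. Everything in this sketch is an assumption made for illustration: the Gaussian white-noise input, the fixed seed, the record lengths, and the choice of r as 0.2 times the standard deviation.

```python
import numpy as np

def approximate_entropy(u, m, r):
    """ApEn = Phi^m(r) - Phi^(m+1)(r), with self-matches counted."""
    u = np.asarray(u, dtype=float)

    def phi(k):
        x = np.array([u[i:i + k] for i in range(len(u) - k + 1)])
        # Pairwise maximum-component distances between all windows.
        dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)   # never 0, thanks to the self-match
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)        # same process, different lengths
r = 0.2 * np.std(noise)

for n in (50, 200, 1000):
    print(n, approximate_entropy(noise[:n], 2, r))
```

Although every prefix comes from the same random process, the shortest record yields a markedly lower estimate than the longest, in line with the first property listed above.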

References


  1. Pincus, S. M.; Gladstone, I. M.; Ehrenkranz, R. A. (1991). "A regularity statistic for medical data analysis". Journal of Clinical Monitoring and Computing 7 (4): 335–345. doi:10.1007/BF01619355. PMID 1744678.
  2. Pincus, S. M. (1991). "Approximate entropy as a measure of system complexity". Proceedings of the National Academy of Sciences 88 (6): 2297–2301. doi:10.1073/pnas.88.6.2297. PMID 11607165. Bibcode 1991PNAS...88.2297P.
  3. Pincus, S. M.; Kalman, E. K. (2004). "Irregularity, volatility, risk, and financial market time series". Proceedings of the National Academy of Sciences 101 (38): 13709–13714. doi:10.1073/pnas.0405168101. PMID 15358860. Bibcode 2004PNAS..10113709P.
  4. Pincus, S. M.; Goldberger, A. L. (1994). "Physiological time-series analysis: what does regularity quantify?". The American Journal of Physiology 266 (4): 1643–1656. doi:10.1152/ajpheart.1994.266.4.H1643. PMID 8184944.
  5. McKinley, R. A.; McIntire, L. K.; Schmidt, R.; Repperger, D. W.; Caldwell, J. A. (2011). "Evaluation of Eye Metrics as a Detector of Fatigue". Human Factors 53 (4): 403–414. doi:10.1177/0018720811411297. PMID 21901937.
  6. Delgado-Bonal, Alfonso; Marshak, Alexander; Yang, Yuekui; Holdaway, Daniel (2020-01-22). "Analyzing changes in the complexity of climate in the last four decades using MERRA-2 radiation data". Scientific Reports 10 (1): 922. doi:10.1038/s41598-020-57917-8. ISSN 2045-2322. PMID 31969616. Bibcode 2020NatSR..10..922D.
  7. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy 21 (6): 541. doi:10.3390/e21060541. PMID 33267255. Bibcode 2019Entrp..21..541D.
  9. Ho, K. K.; Moody, G. B.; Peng, C. K.; Mietus, J. E.; Larson, M. G.; Levy, D.; Goldberger, A. L. (1997). "Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics". Circulation 96 (3): 842–848. doi:10.1161/01.cir.96.3.842. PMID 9264491.
  10. Sabeti, Malihe (2009). "Entropy and complexity measures for EEG signal classification of schizophrenic and control participants". Artificial Intelligence in Medicine 47 (3): 263–274. doi:10.1016/j.artmed.2009.03.003. PMID 19403281.
  11. Yuan, Qi (2011). "Epileptic EEG classification based on extreme learning machine and nonlinear features". Epilepsy Research 96 (1–2): 29–38. doi:10.1016/j.eplepsyres.2011.04.013. PMID 21616643.
  12. Yun, Kyongsik (2012). "Decreased cortical complexity in methamphetamine abusers". Psychiatry Research: Neuroimaging 201 (3): 226–232. doi:10.1016/j.pscychresns.2011.07.009. PMID 22445216.
  13. Richman, J. S.; Moorman, J. R. (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology 278 (6): 2039–2049. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.