Sample entropy

From HandWiki

Sample entropy (SampEn) is a modification of approximate entropy (ApEn) used to assess the complexity of physiological time-series signals and to diagnose disease states.[1] SampEn has two advantages over ApEn: independence of data length and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors also includes a comparison with itself. This guarantees that the probabilities [math]\displaystyle{ C_{i}'^{m}(r) }[/math] are never zero, so the logarithm of a probability can always be taken. However, because these self-matches lower the ApEn value, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. On the other hand, since SampEn makes direct use of the correlation integrals, it is not a true measure of information but an approximation. The foundations of SampEn, its differences from ApEn, and a step-by-step tutorial for its application are given in.[2] A multiscale version of SampEn, suggested by Costa and others, also exists.[3] SampEn is used in biomedical and biomechanical research, for example to evaluate postural control.[4][5]


Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity,[1] but unlike ApEn it does not count self-matches. For a given embedding dimension [math]\displaystyle{ m }[/math], tolerance [math]\displaystyle{ r }[/math] and number of data points [math]\displaystyle{ N }[/math], SampEn is the negative natural logarithm of the probability that if two sequences of [math]\displaystyle{ m }[/math] simultaneous data points have distance [math]\displaystyle{ \lt r }[/math], then the corresponding sequences of [math]\displaystyle{ m+1 }[/math] data points also have distance [math]\displaystyle{ \lt r }[/math]. It is denoted by [math]\displaystyle{ SampEn(m,r,N) }[/math] (or by [math]\displaystyle{ SampEn(m,r,\tau,N) }[/math] when the sampling time [math]\displaystyle{ \tau }[/math] is included).

Now assume we have a time series of length [math]\displaystyle{ N }[/math], [math]\displaystyle{ \{ x_1 , x_2 , x_3 , \ldots , x_N \} }[/math], with a constant time interval [math]\displaystyle{ \tau }[/math]. We define a template vector of length [math]\displaystyle{ m }[/math] as [math]\displaystyle{ X_m (i)={ \{ x_i , x_{i+1} , x_{i+2} , \ldots , x_{i+m-1} \} } }[/math], and take the distance function [math]\displaystyle{ d[X_m(i),X_m(j)] }[/math] ([math]\displaystyle{ i \neq j }[/math]) to be the Chebyshev distance (although any distance function, including the Euclidean distance, could be used). We define the sample entropy to be

[math]\displaystyle{ SampEn=-\ln {A \over B} }[/math]


where

[math]\displaystyle{ A }[/math] = number of template vector pairs having [math]\displaystyle{ d[X_{m+1}(i),X_{m+1}(j)] \lt r }[/math]

[math]\displaystyle{ B }[/math] = number of template vector pairs having [math]\displaystyle{ d[X_m(i),X_m(j)] \lt r }[/math]

It is clear from the definition that [math]\displaystyle{ A }[/math] is always smaller than or equal to [math]\displaystyle{ B }[/math]. Therefore, [math]\displaystyle{ SampEn(m,r,\tau) }[/math] is always either zero or positive. A smaller value of [math]\displaystyle{ SampEn }[/math] indicates more self-similarity in the data set, or less noise.
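As a sketch of how the counts in this definition work, consider the toy series [1, 2, 1, 2, 1] with m = 2 and r = 0.5 (hypothetical values chosen so that the matching pairs can be counted by hand):

```python
from itertools import combinations
from math import log

def count_matches(x, m, r):
    # Count pairs of length-m template vectors whose
    # Chebyshev distance is smaller than r.
    templates = [x[i:i + m] for i in range(len(x) - m + 1)]
    return sum(
        1 for t1, t2 in combinations(templates, 2)
        if max(abs(a - b) for a, b in zip(t1, t2)) < r
    )

x = [1, 2, 1, 2, 1]           # toy series
B = count_matches(x, 2, 0.5)  # pairs (X_2(1), X_2(3)) and (X_2(2), X_2(4)) match: B = 2
A = count_matches(x, 3, 0.5)  # only the pair (X_3(1), X_3(3)) matches: A = 1
print(-log(A / B))            # -ln(1/2) = ln 2, approximately 0.693
```

As expected, A ≤ B, and the result is positive.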

Generally, the value of [math]\displaystyle{ m }[/math] is taken to be [math]\displaystyle{ 2 }[/math] and the value of [math]\displaystyle{ r }[/math] to be [math]\displaystyle{ 0.2 \times std }[/math], where std is the standard deviation, which should be estimated over a very large dataset. For instance, an r value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to [math]\displaystyle{ 0.2 \times std }[/math] for a very large population.
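This heuristic can be sketched as a small helper; the function name `default_tolerance` and the adjustable `factor` parameter are assumptions for illustration, not part of any standard API:

```python
import statistics

def default_tolerance(timeseries: list, factor: float = 0.2) -> float:
    # Common heuristic: r = 0.2 times the standard deviation of the data.
    # `factor` exposes the 0.2 multiplier so it can be adjusted.
    return factor * statistics.pstdev(timeseries)
```

In practice the standard deviation should come from a dataset large enough to be representative of the signal class, not from a short recording.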

Multiscale SampEn

The definition above is a special case of multiscale SampEn with [math]\displaystyle{ \delta=1 }[/math], where [math]\displaystyle{ \delta }[/math] is called the skipping parameter. In multiscale SampEn, the template vectors are defined with a given interval between their elements, specified by the value of [math]\displaystyle{ \delta }[/math]. The modified template vector is defined as [math]\displaystyle{ X_{m,\delta}(i)=\{x_i,x_{i+\delta},x_{i+2\delta},\ldots,x_{i+(m-1)\delta}\} }[/math], and SampEn can be written as [math]\displaystyle{ SampEn \left ( m,r,\delta \right )=-\ln { A_\delta \over B_\delta }, }[/math] where [math]\displaystyle{ A_\delta }[/math] and [math]\displaystyle{ B_\delta }[/math] are calculated as before.


Sample entropy can be implemented easily in many programming languages. An example written in Python is shown below.

from itertools import combinations
from math import log

def construct_templates(timeseries_data: list, m: int = 2):
    """Build all template vectors of length m from the time series."""
    num_windows = len(timeseries_data) - m + 1
    return [timeseries_data[x:x + m] for x in range(0, num_windows)]

def get_matches(templates: list, r: float):
    """Count the matching pairs among all combinations of templates."""
    return len(list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates, 2))))

def is_match(template_1: list, template_2: list, r: float):
    """Two templates match if their Chebyshev distance is less than r."""
    return all(abs(x - y) < r for (x, y) in zip(template_1, template_2))

def sample_entropy(timeseries_data: list, window_size: int, r: float):
    """Return SampEn(m, r, N) with m = window_size.

    Note: the result is undefined (log of zero, or division by zero)
    when no template pairs match, i.e. when A or B is zero.
    """
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -log(A / B)


References


  1. Richman, JS; Moorman, JR (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology 278 (6): H2039–49. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.
  2. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy 21 (6): 541. doi:10.3390/e21060541. PMID 33267255. Bibcode: 2019Entrp..21..541D.
  3. Costa, Madalena; Goldberger, Ary; Peng, C.-K. (2005). "Multiscale entropy analysis of biological signals". Physical Review E 71 (2): 021906. doi:10.1103/PhysRevE.71.021906. PMID 15783351. Bibcode: 2005PhRvE..71b1906C.
  4. Błażkiewicz, Michalina; Kędziorek, Justyna; Hadamus, Anna (March 2021). "The Impact of Visual Input and Support Area Manipulation on Postural Control in Subjects after Osteoporotic Vertebral Fracture". Entropy 23 (3): 375. doi:10.3390/e23030375. PMID 33804770. Bibcode: 2021Entrp..23..375B.
  5. Hadamus, Anna; Białoszewski, Dariusz; Błażkiewicz, Michalina; Kowalska, Aleksandra J.; Urbaniak, Edyta; Wydra, Kamil T.; Wiaderna, Karolina; Boratyński, Rafał et al. (February 2021). "Assessment of the Effectiveness of Rehabilitation after Total Knee Replacement Surgery Using Sample Entropy and Classical Measures of Body Balance". Entropy 23 (2): 164. doi:10.3390/e23020164. PMID 33573057. Bibcode: 2021Entrp..23..164H.