# Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn) used for assessing the complexity of physiological time-series signals and diagnosing diseased states.[1] SampEn has two advantages over ApEn: data-length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that the probabilities $\displaystyle{ C_{i}'^{m}(r) }$ are never zero, so it is always possible to take their logarithm. However, because these self-matches lower ApEn values, the signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. On the other hand, since SampEn makes direct use of the correlation integrals, it is not a true measure of information but an approximation. The foundations of SampEn, its differences from ApEn, and a step-by-step tutorial for its application are given in.[2] A multiscale version of SampEn has also been proposed by Costa and others.[3] SampEn is used in biomedical and biomechanical research, for example to evaluate postural control.[4][5]

## Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity,[1] but unlike ApEn it does not count self-matches. For a given embedding dimension $\displaystyle{ m }$, tolerance $\displaystyle{ r }$ and number of data points $\displaystyle{ N }$, SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length $\displaystyle{ m }$ have distance $\displaystyle{ \lt r }$, then two sets of simultaneous data points of length $\displaystyle{ m+1 }$ also have distance $\displaystyle{ \lt r }$. It is denoted by $\displaystyle{ SampEn(m,r,N) }$ (or by $\displaystyle{ SampEn(m,r,\tau,N) }$ when the sampling time $\displaystyle{ \tau }$ is included).

Now assume we have a time-series data set of length $\displaystyle{ N = { \{ x_1 , x_2 , x_3 , . . . , x_N \} } }$ with a constant time interval $\displaystyle{ \tau }$. We define a template vector of length $\displaystyle{ m }$ as $\displaystyle{ X_m (i)={ \{ x_i , x_{i+1} , x_{i+2} , . . . , x_{i+m-1} \} } }$, and the distance function $\displaystyle{ d[X_m(i),X_m(j)] }$ (i≠j) is taken to be the Chebyshev distance (though it could be any distance function, including the Euclidean distance). We define the sample entropy to be
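Since the definition leaves the distance function open, it may help to see the Chebyshev distance spelled out. The sketch below (the name `chebyshev` is ours, not from any library) shows the maximum-componentwise-difference form used here:

```python
def chebyshev(u: list, v: list) -> float:
    # Chebyshev distance: the largest absolute componentwise difference
    return max(abs(a - b) for a, b in zip(u, v))

# e.g. chebyshev([1, 2, 3], [2, 4, 3]) compares |1-2|, |2-4|, |3-3| and keeps the maximum
```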

$\displaystyle{ SampEn=-\ln {A \over B} }$

where

$\displaystyle{ A }$ = number of template vector pairs having $\displaystyle{ d[X_{m+1}(i),X_{m+1}(j)] \lt r }$

$\displaystyle{ B }$ = number of template vector pairs having $\displaystyle{ d[X_m(i),X_m(j)] \lt r }$

It is clear from the definition that $\displaystyle{ A }$ will always have a value smaller than or equal to $\displaystyle{ B }$. Therefore, $\displaystyle{ SampEn(m,r,\tau) }$ will always be either zero or a positive value. A smaller value of $\displaystyle{ SampEn }$ indicates more self-similarity in the data set, or less noise.

Generally, we take the value of $\displaystyle{ m }$ to be $\displaystyle{ 2 }$ and the value of $\displaystyle{ r }$ to be $\displaystyle{ 0.2 \times std }$, where std denotes the standard deviation, which should be taken over a very large dataset. For instance, an r value of 6 ms is appropriate for sample entropy calculations of heart-rate intervals, since this corresponds to $\displaystyle{ 0.2 \times std }$ for a very large population.
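As an illustration of this convention, the helper below (its name is our own, not a standard function) derives $\displaystyle{ r }$ from a data sample using the population standard deviation:

```python
import statistics


def tolerance_from_data(data: list, factor: float = 0.2) -> float:
    # common heuristic: r = 0.2 * standard deviation of the series
    return factor * statistics.pstdev(data)
```

In practice, std would be estimated over a very large dataset rather than the short series being analyzed, as noted above.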

## Multiscale SampEn

The definition above is a special case of multiscale SampEn with $\displaystyle{ \delta=1 }$, where $\displaystyle{ \delta }$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of $\displaystyle{ \delta }$. The modified template vector is defined as $\displaystyle{ X_{m,\delta}(i)=\{x_i,x_{i+\delta},x_{i+2\times\delta},...,x_{i+(m-1)\times\delta}\} }$, and SampEn can be written as $\displaystyle{ SampEn \left ( m,r,\delta \right )=-\ln { A_\delta \over B_\delta } }$, with $\displaystyle{ A_\delta }$ and $\displaystyle{ B_\delta }$ calculated as before.
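A minimal sketch of the $\displaystyle{ \delta }$-spaced template construction (the function name is hypothetical); with $\displaystyle{ \delta=1 }$ it reduces to the ordinary template vectors:

```python
def construct_templates_delta(data: list, m: int = 2, delta: int = 1):
    # X_{m,delta}(i) = (x_i, x_{i+delta}, ..., x_{i+(m-1)*delta})
    num_windows = len(data) - (m - 1) * delta
    return [data[i : i + (m - 1) * delta + 1 : delta] for i in range(num_windows)]
```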

## Implementation

Sample entropy can be implemented in many programming languages. Below is an example written in Python.

```python
from itertools import combinations
from math import log


def construct_templates(timeseries_data: list, m: int = 2):
    """Build all overlapping template vectors of length m."""
    num_windows = len(timeseries_data) - m + 1
    return [timeseries_data[x : x + m] for x in range(0, num_windows)]


def get_matches(templates: list, r: float):
    """Count template pairs whose Chebyshev distance is below r."""
    return len(
        list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates, 2)))
    )


def is_match(template_1: list, template_2: list, r: float):
    """Two templates match if every componentwise difference is below r."""
    return all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])


def sample_entropy(timeseries_data: list, window_size: int, r: float):
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -log(A / B)
```
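As a quick usage sketch (the input series and tolerance are arbitrary illustrative choices, and the helper functions restate the implementation above compactly so the snippet runs standalone), the matching template pairs for a short monotone series can be counted by hand to check the result:

```python
from itertools import combinations
from math import log


def _templates(data, m):
    # all overlapping windows of length m
    return [data[i : i + m] for i in range(len(data) - m + 1)]


def _matches(templates, r):
    # count pairs whose componentwise differences all stay below r
    return sum(
        all(abs(x - y) < r for x, y in zip(t1, t2))
        for t1, t2 in combinations(templates, 2)
    )


series = list(range(1, 11))  # 1, 2, ..., 10
m, r = 2, 3
B = _matches(_templates(series, m), r)      # 15 pairs of length-2 templates within r
A = _matches(_templates(series, m + 1), r)  # 13 pairs of length-3 templates within r
print(-log(A / B))                          # ≈ 0.143
```

Here every template of a linearly increasing series is a shifted copy of its neighbors, so most pairs match and the resulting SampEn is small, consistent with the regularity interpretation above.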
