From HandWiki
Short description: Property of uniformly space-filling movement

In mathematics, ergodicity expresses the idea that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" point. Equivalently, a sufficiently large collection of random samples from a process can represent the average statistical properties of the entire process. Ergodicity is a property of the system; it is a statement that the system cannot be reduced or factored into smaller components. Ergodic theory is the study of systems possessing ergodicity.

Ergodic systems occur in a broad range of settings in physics and in geometry. This can be roughly understood to be due to a common phenomenon: the motion of particles, that is, geodesics on a hyperbolic manifold, is divergent; when that manifold is compact, that is, of finite size, those orbits return to the same general area, eventually filling the entire space.

Ergodic systems capture the common-sense, everyday notions of randomness, such as the idea that smoke might come to fill all of a smoke-filled room, that a block of metal might eventually come to have the same temperature throughout, or that flips of a fair coin may come up heads half the time and tails half the time. A stronger concept than ergodicity is that of mixing, which aims to mathematically describe the common-sense notions of mixing, such as mixing drinks or mixing cooking ingredients.

The proper mathematical formulation of ergodicity is founded on the formal definitions of measure theory and dynamical systems, and rather specifically on the notion of a measure-preserving dynamical system. The origins of ergodicity lie in statistical physics, where Ludwig Boltzmann formulated the ergodic hypothesis.

Informal explanation

Ergodicity occurs in broad settings in physics and mathematics. All of these settings are unified by a common mathematical description, that of the measure-preserving dynamical system. An informal description of this, and a definition of ergodicity with respect to it, is given immediately below. This is followed by a description of ergodicity in stochastic processes. They are one and the same, despite using dramatically different notation and language.

Measure-preserving dynamical systems

The mathematical definition of ergodicity aims to capture ordinary every-day ideas about randomness. This includes ideas about systems that move in such a way as to (eventually) fill up all of space, such as diffusion and Brownian motion, as well as common-sense notions of mixing, such as mixing paints, drinks, cooking ingredients, industrial process mixing, smoke in a smoke-filled room, the dust in Saturn's rings and so on. To provide a solid mathematical footing, descriptions of ergodic systems begin with the definition of a measure-preserving dynamical system. This is written as [math]\displaystyle{ (X, \mathcal{A}, \mu, T). }[/math]

The set [math]\displaystyle{ X }[/math] is understood to be the total space to be filled: the mixing bowl, the smoke-filled room, etc. The measure [math]\displaystyle{ \mu }[/math] is understood to define the natural volume of the space [math]\displaystyle{ X }[/math] and of its subspaces. The collection of subspaces is denoted by [math]\displaystyle{ \mathcal{A} }[/math], and the size of any given subset [math]\displaystyle{ A\subset X }[/math] is [math]\displaystyle{ \mu(A) }[/math]; the size is its volume. Naively, one could imagine [math]\displaystyle{ \mathcal{A} }[/math] to be the power set of [math]\displaystyle{ X }[/math]; this doesn't quite work, as not all subsets of a space have a volume (famously, the Banach–Tarski paradox). Thus, conventionally, [math]\displaystyle{ \mathcal{A} }[/math] consists of the measurable subsets, that is, the subsets that do have a volume. It is conventionally taken to be the Borel σ-algebra: the collection of subsets that can be constructed by taking countable intersections, countable unions and set complements of open sets; these can always be taken to be measurable.

The time evolution of the system is described by a map [math]\displaystyle{ T:X\to X }[/math]. Given some subset [math]\displaystyle{ A\subset X }[/math], its map [math]\displaystyle{ T(A) }[/math] will in general be a deformed version of [math]\displaystyle{ A }[/math] – it is squashed or stretched, folded or cut into pieces. Mathematical examples include the baker's map and the horseshoe map, both inspired by bread-making. The set [math]\displaystyle{ T(A) }[/math] must have the same volume as [math]\displaystyle{ A }[/math]; the squashing/stretching does not alter the volume of the space, only its distribution. Such a system is "measure-preserving" (area-preserving, volume-preserving).

A formal difficulty arises when one tries to reconcile the volume of sets with the need to preserve their size under a map. The problem arises because, in general, several different points in the domain of a function can map to the same point in its range; that is, there may be [math]\displaystyle{ x \ne y }[/math] with [math]\displaystyle{ T(x) = T(y) }[/math]. Worse, a single point [math]\displaystyle{ x \in X }[/math] has no size. These difficulties can be avoided by working with the inverse map [math]\displaystyle{ T^{-1}: \mathcal{A}\to\mathcal{A} }[/math]; it will map any given subset [math]\displaystyle{ A \subset X }[/math] to the parts that were assembled to make it: these parts are [math]\displaystyle{ T^{-1}(A)\in\mathcal{A} }[/math]. It has the important property of not losing track of where things came from. More strongly, it has the important property that any (measure-preserving) map [math]\displaystyle{ \mathcal{A}\to\mathcal{A} }[/math] is the inverse of some map [math]\displaystyle{ X\to X }[/math]. The proper definition of a volume-preserving map is one for which [math]\displaystyle{ \mu(A) = \mu\mathord\left(T^{-1}(A)\right) }[/math] because [math]\displaystyle{ T^{-1}(A) }[/math] describes all the pieces-parts that [math]\displaystyle{ A }[/math] came from.

One is now interested in studying the time evolution of the system. If a set [math]\displaystyle{ A\in\mathcal{A} }[/math] eventually comes to fill all of [math]\displaystyle{ X }[/math] over a long period of time (that is, if [math]\displaystyle{ T^n(A) }[/math] approaches all of [math]\displaystyle{ X }[/math] for large [math]\displaystyle{ n }[/math]), the system is said to be ergodic. If every set [math]\displaystyle{ A }[/math] behaves in this way, the system is a conservative system, placed in contrast to a dissipative system, where some subsets [math]\displaystyle{ A }[/math] wander away, never to be returned to. An example would be water running downhill: once it's run down, it will never come back up again. The lake that forms at the bottom of this river can, however, become well-mixed. The ergodic decomposition theorem states that every ergodic system can be split into two parts: the conservative part, and the dissipative part.

Mixing is a stronger statement than ergodicity. Mixing asks for this ergodic property to hold between any two sets [math]\displaystyle{ A, B }[/math], and not just between some set [math]\displaystyle{ A }[/math] and [math]\displaystyle{ X }[/math]. That is, a system is said to be (topologically) mixing if, for any two sets [math]\displaystyle{ A, B\in\mathcal{A} }[/math], there is an integer [math]\displaystyle{ N }[/math] such that, for all [math]\displaystyle{ n\gt N }[/math], one has that [math]\displaystyle{ T^n(A) \cap B \ne \varnothing }[/math]. Here, [math]\displaystyle{ \cap }[/math] denotes set intersection and [math]\displaystyle{ \varnothing }[/math] is the empty set. Other notions of mixing include strong and weak mixing, which describe the notion that the mixed substances intermingle everywhere, in equal proportion. This can be non-trivial, as practical experience of trying to mix sticky, gooey substances shows.


The above discussion appeals to a physical sense of a volume. The volume does not have to literally be some portion of 3D space; it can be some abstract volume. This is generally the case in statistical systems, where the volume (the measure) is given by the probability. The total volume corresponds to probability one. This correspondence works because the axioms of probability theory are identical to those of measure theory; these are the Kolmogorov axioms.

The idea of a volume can be very abstract. Consider, for example, the set of all possible coin-flips: the set of infinite sequences of heads and tails. Assigning the volume of 1 to this space, it is clear that half of all such sequences start with heads, and half start with tails. One can slice up this volume in other ways: one can say "I don't care about the first [math]\displaystyle{ n - 1 }[/math] coin-flips; but I want the [math]\displaystyle{ n }[/math]'th of them to be heads, and then I don't care about what comes after that". This can be written as the set [math]\displaystyle{ (*, \cdots, *, h, *, \cdots) }[/math] where [math]\displaystyle{ * }[/math] is "don't care" and [math]\displaystyle{ h }[/math] is "heads". The volume of this space is again (obviously!) one-half.

The above is enough to build up a measure-preserving dynamical system, in its entirety. The sets of [math]\displaystyle{ h }[/math] or [math]\displaystyle{ t }[/math] occurring in the [math]\displaystyle{ n }[/math]'th place are called cylinder sets. The set of all possible intersections, unions and complements of the cylinder sets then forms the Borel σ-algebra [math]\displaystyle{ \mathcal{A} }[/math] defined above. In formal terms, the cylinder sets form the base for a topology on the space [math]\displaystyle{ X }[/math] of all possible infinite-length coin-flips. The measure [math]\displaystyle{ \mu }[/math] has all of the common-sense properties one might hope for: the measure of a cylinder set with [math]\displaystyle{ h }[/math] in the [math]\displaystyle{ m }[/math]'th position, and [math]\displaystyle{ t }[/math] in the [math]\displaystyle{ k }[/math]'th position (with [math]\displaystyle{ m \ne k }[/math]) is obviously 1/4, and so on. These common-sense properties persist for set-complement and set-union: everything except for [math]\displaystyle{ h }[/math] and [math]\displaystyle{ t }[/math] in locations [math]\displaystyle{ m }[/math] and [math]\displaystyle{ k }[/math] obviously has the volume of 3/4. All together, these form the axioms of a sigma-additive measure; measure-preserving dynamical systems always use sigma-additive measures. For coin flips, this measure is called the Bernoulli measure.
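The values 1/2, 1/4 and 3/4 quoted above can be checked by brute force. The following is a minimal sketch, assuming a fair coin and events that depend only on a finite prefix of the sequence; the helper names (measure, h_at_m, t_at_k) and the positions m = 2, k = 5 are illustrative choices, not part of any standard library.

```python
from fractions import Fraction
from itertools import product

def measure(event, length):
    """Fair-coin Bernoulli measure of the set of sequences whose first
    `length` flips satisfy `event`, computed by exact enumeration."""
    hits = sum(1 for prefix in product("ht", repeat=length) if event(prefix))
    return Fraction(hits, 2 ** length)

m, k = 2, 5  # two distinct 1-based positions (arbitrary illustrative choices)

def h_at_m(prefix):
    return prefix[m - 1] == "h"

def t_at_k(prefix):
    return prefix[k - 1] == "t"

def both(prefix):
    return h_at_m(prefix) and t_at_k(prefix)

def not_both(prefix):
    return not both(prefix)

print(measure(h_at_m, 5))    # 1/2
print(measure(both, 5))      # 1/4
print(measure(not_both, 5))  # 3/4
```

Enumerating prefixes only works because each event here ignores all flips after position 5; it is not a general construction of the measure.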

For the coin-flip process, the time-evolution operator [math]\displaystyle{ T }[/math] is the shift operator that says "throw away the first coin-flip, and keep the rest". Formally, if [math]\displaystyle{ (x_1, x_2, \cdots) }[/math] is a sequence of coin-flips, then [math]\displaystyle{ T(x_1, x_2, \cdots) = (x_2, x_3, \cdots) }[/math]. The measure is obviously shift-invariant: as long as we are talking about some set [math]\displaystyle{ A\in\mathcal{A} }[/math] where the first coin-flip [math]\displaystyle{ x_1 = * }[/math] is the "don't care" value, then the volume [math]\displaystyle{ \mu(A) }[/math] does not change: [math]\displaystyle{ \mu(A) = \mu(T(A)) }[/math]. In order to avoid talking about the first coin-flip, it is easier to define [math]\displaystyle{ T^{-1} }[/math] as inserting a "don't care" value into the first position: [math]\displaystyle{ T^{-1}(x_1, x_2, \cdots) = (*, x_1, x_2, \cdots) }[/math]. With this definition, one obviously has that [math]\displaystyle{ \mu\mathord\left(T^{-1}(A)\right) = \mu(A) }[/math] with no constraints on [math]\displaystyle{ A }[/math]. This is again an example of why [math]\displaystyle{ T^{-1} }[/math] is used in the formal definitions.
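The difference between the image and the preimage of a set under the shift can be made concrete on cylinder sets. The sketch below assumes a fair coin and represents a cylinder set as a dictionary from 1-based positions to required symbols (unlisted positions are "don't care"); this representation and the function names are illustrative assumptions.

```python
from fractions import Fraction

def measure(cyl):
    """Fair-coin measure of a cylinder set given as {position: 'h' or 't'}."""
    return Fraction(1, 2 ** len(cyl))

def preimage(cyl):
    """T^{-1}(A): insert a "don't care" in front, so every constraint moves one place right."""
    return {pos + 1: sym for pos, sym in cyl.items()}

def image(cyl):
    """T(A): drop the first flip and move every other constraint one place left."""
    return {pos - 1: sym for pos, sym in cyl.items() if pos > 1}

A = {1: "h", 3: "t"}   # first flip heads, third flip tails
B = {2: "h", 3: "t"}   # first flip is "don't care"

print(measure(A), measure(preimage(A)), measure(image(A)))  # 1/4 1/4 1/2
print(measure(B), measure(preimage(B)), measure(image(B)))  # 1/4 1/4 1/4
```

The set A constrains the first flip, so its forward image is larger than A, while its preimage always has the same measure.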

The above development takes a random process, the Bernoulli process, and converts it to a measure-preserving dynamical system [math]\displaystyle{ (X, \mathcal{A}, \mu, T). }[/math] The same conversion (equivalence, isomorphism) can be applied to any stochastic process. Thus, an informal definition of ergodicity is that a sequence is ergodic if it visits all of [math]\displaystyle{ X }[/math]; such sequences are "typical" for the process. Another is that its statistical properties can be deduced from a single, sufficiently long, random sample of the process (thus uniformly sampling all of [math]\displaystyle{ X }[/math]), or that any collection of random samples from a process must represent the average statistical properties of the entire process (that is, samples drawn uniformly from [math]\displaystyle{ X }[/math] are representative of [math]\displaystyle{ X }[/math] as a whole.) In the present example, a sequence of coin flips, where half are heads, and half are tails, is a "typical" sequence.

There are several important points to be made about the Bernoulli process. If one writes 0 for tails and 1 for heads, one gets the set of all infinite strings of binary digits. These correspond to the base-two expansion of real numbers. Explicitly, given a sequence [math]\displaystyle{ (x_1, x_2, \cdots) }[/math], the corresponding real number is

[math]\displaystyle{ y=\sum_{n=1}^\infty \frac{x_n}{2^n} }[/math]
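As a small illustration of this correspondence, the following sketch evaluates the sum exactly for a finite prefix of a sequence. The function names are hypothetical; the remark that the shift acts on the resulting number as doubling modulo 1 is a standard observation that is checked here only for this one example.

```python
from fractions import Fraction

def to_real(bits):
    """Exact value of sum_{n>=1} x_n / 2^n for a finite prefix x_1, ..., x_N."""
    return sum(Fraction(x, 2 ** n) for n, x in enumerate(bits, start=1))

def shift(bits):
    """The shift T: throw away the first coin-flip and keep the rest."""
    return bits[1:]

bits = [1, 0, 1, 1]            # heads, tails, heads, heads
y = to_real(bits)
print(y)                       # 11/16
print(to_real(shift(bits)))    # 3/8, which equals (2 * y) mod 1
```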

The statement that the Bernoulli process is ergodic is equivalent to the statement that the real numbers are uniformly distributed. The set of all such strings can be written in a variety of ways: [math]\displaystyle{ \{h, t\}^\infty = \{h, t\}^\omega = \{0, 1\}^\omega = 2^\omega = 2^\mathbb{N}. }[/math] This set is the Cantor set, sometimes called the Cantor space to avoid confusion with the Cantor function

[math]\displaystyle{ C(x) = \sum_{n=1}^\infty \frac{x_n}{3^n} }[/math]

In the end, these are all "the same thing".

The Cantor set plays key roles in many branches of mathematics. In recreational mathematics, it underpins the period-doubling fractals; in analysis, it appears in a vast variety of theorems. A key one for stochastic processes is the Wold decomposition, which states that any stationary process can be decomposed into a pair of uncorrelated processes, one deterministic, and the other being a moving average process.

The Ornstein isomorphism theorem states that every stationary stochastic process is equivalent to a Bernoulli scheme (a Bernoulli process with an N-sided (and possibly unfair) gaming die). Other results include that every non-dissipative ergodic system is equivalent to the Markov odometer, sometimes called an "adding machine" because it looks like elementary-school addition, that is, taking a base-N digit sequence, adding one, and propagating the carry bits. The proof of equivalence is very abstract; understanding the result is not: by adding one at each time step, every possible state of the odometer is visited, until it rolls over, and starts again. Likewise, ergodic systems visit each state, uniformly, moving on to the next, until they have all been visited.
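The "adding machine" picture can be sketched for a finite number of digits. This is only a toy truncation under stated assumptions: the Markov odometer itself acts on infinite digit sequences, and the little-endian digit list and the function names below are illustrative choices.

```python
def increment(digits, base):
    """Add one to a little-endian base-`base` digit list, propagating the carry.
    On overflow the odometer rolls over to all zeros."""
    result = list(digits)
    for i in range(len(result)):
        result[i] += 1
        if result[i] < base:
            return result
        result[i] = 0  # carry into the next digit
    return result

def orbit(start, base):
    """States visited from `start` until the odometer returns to it."""
    seen = [start]
    state = increment(start, base)
    while state != start:
        seen.append(state)
        state = increment(state, base)
    return seen

states = orbit([0, 0, 0], base=2)
print(len(states))   # 8: every 3-digit binary state is visited once
```

Starting from any state, repeated increments pass through all of the base^d states of a d-digit odometer before rolling over.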

Systems that generate (infinite) sequences of N letters are studied by means of symbolic dynamics. Important special cases include subshifts of finite type and sofic systems.

History and etymology

The term ergodic is commonly thought to derive from the Greek words ἔργον (ergon: "work") and ὁδός (hodos: "path", "way"), as chosen by Ludwig Boltzmann while he was working on a problem in statistical mechanics.[1] At the same time it is also claimed to be a derivation of ergomonode, coined by Boltzmann in a relatively obscure paper from 1884. The etymology appears to be contested in other ways as well.[2]

The idea of ergodicity was born in the field of thermodynamics, where it was necessary to relate the individual states of gas molecules to the temperature of a gas as a whole and to its time evolution. In order to do this, it was necessary to state what exactly it means for gases to mix well together, so that thermodynamic equilibrium could be defined with mathematical rigor. Once the theory was well developed in physics, it was rapidly formalized and extended, so that ergodic theory has long been an independent area of mathematics in itself. As part of that progression, more than one slightly different definition of ergodicity and multitudes of interpretations of the concept in different fields coexist.

For example, in classical physics the term implies that a system satisfies the ergodic hypothesis of thermodynamics,[3] the relevant state space being position and momentum space.

In dynamical systems theory the state space is usually taken to be a more general phase space. On the other hand, in coding theory the state space is often discrete in both time and state, with less concomitant structure. In all those fields the ideas of time average and ensemble average can also carry extra baggage, as is the case with the many possible thermodynamically relevant partition functions used to define ensemble averages in physics. As such, the measure-theoretic formalization of the concept also serves as a unifying discipline. In 1913 Michel Plancherel proved the strict impossibility of ergodicity for a purely mechanical system.


A review of ergodicity in physics, and in geometry follows. In all cases, the notion of ergodicity is exactly the same as that for dynamical systems; there is no difference, except for outlook, notation, style of thinking and the journals where results are published.

In physics

Physical systems can be split into three categories: classical mechanics, which describes machines with a finite number of moving parts, quantum mechanics, which describes the structure of atoms, and statistical mechanics, which describes gases, liquids, solids; this includes condensed matter physics.

The case of classical mechanics is discussed in the next section, on ergodicity in geometry. As to quantum mechanics, there is no universal quantum definition of ergodicity or even chaos (see quantum chaos).[4] However, there is a quantum ergodicity theorem stating that the expectation value of an operator converges to the corresponding microcanonical classical average in the semiclassical limit [math]\displaystyle{ \hbar \rightarrow 0 }[/math]. Nevertheless, the theorem does not imply that all eigenstates of the Hamiltonian whose classical counterpart is chaotic are featureless and random. For example, the quantum ergodicity theorem does not exclude the existence of non-ergodic states such as quantum scars. In addition to the conventional scarring,[5][6][7][8] there are two other types of quantum scarring, which further illustrate the weak-ergodicity breaking in quantum chaotic systems: perturbation-induced[9][10][11][12][13] and many-body quantum scars.[14]

This section reviews ergodicity in statistical mechanics. The above abstract definition of a volume is required as the appropriate setting for definitions of ergodicity in physics. Consider a container of liquid, or gas, or plasma, or other collection of atoms or particles. Each and every particle [math]\displaystyle{ x_i }[/math] has a 3D position, and a 3D velocity, and is thus described by six numbers: a point in six-dimensional space [math]\displaystyle{ \mathbb{R}^6. }[/math] If there are [math]\displaystyle{ N }[/math] of these particles in the system, a complete description requires [math]\displaystyle{ 6N }[/math] numbers. Any one system is just a single point in [math]\displaystyle{ \mathbb{R}^{6N}. }[/math] The physical system is not all of [math]\displaystyle{ \mathbb{R}^{6N} }[/math], of course; if it's a box of width, height and length [math]\displaystyle{ W\times H\times L }[/math] then a point is in [math]\displaystyle{ \left(W \times H \times L \times \mathbb{R}^3\right)^N. }[/math] Nor can velocities be infinite: they are scaled by some probability measure, for example the Boltzmann–Gibbs measure for a gas. Nonetheless, for [math]\displaystyle{ N }[/math] close to Avogadro's number, this is obviously a very large space. This space is called the canonical ensemble.

A physical system is said to be ergodic if any representative point of the system eventually comes to visit the entire volume of the system. For the above example, this implies that any given atom not only visits every part of the box [math]\displaystyle{ W \times H \times L }[/math] with uniform probability, but it does so with every possible velocity, with probability given by the Boltzmann distribution for that velocity (so, uniform with respect to that measure). The ergodic hypothesis states that physical systems actually are ergodic. Multiple time scales are at work: gases and liquids appear to be ergodic over short time scales. Ergodicity in a solid can be viewed in terms of the vibrational modes or phonons, as obviously the atoms in a solid do not exchange locations. Glasses present a challenge to the ergodic hypothesis; time scales are assumed to be in the millions of years, but results are contentious. Spin glasses present particular difficulties.

Formal mathematical proofs of ergodicity in statistical physics are hard to come by; most high-dimensional many-body systems are assumed to be ergodic, without mathematical proof. Exceptions include the dynamical billiards, which model billiard ball-type collisions of atoms in an ideal gas or plasma. The first hard-sphere ergodicity theorem was for Sinai's billiards, which considers two balls, one of them taken as being stationary, at the origin. As the second ball collides, it moves away; applying periodic boundary conditions, it then returns to collide again. By appeal to homogeneity, this return of the "second" ball can instead be taken to be "just some other atom" that has come into range, and is moving to collide with the atom at the origin (which can be taken to be just "any other atom".) This is one of the few formal proofs that exist; there are no equivalent statements e.g. for atoms in a liquid, interacting via van der Waals forces, even if it would be common sense to believe that such systems are ergodic (and mixing). More precise physical arguments can be made, though.

In geometry

Ergodicity is a widespread phenomenon in the study of Riemannian manifolds. A quick sequence of examples, from simple to complicated, illustrates this point.

The geodesic flow of a flat torus following any irrational direction is ergodic; informally this means that when drawing a straight line in a square starting at any point, and with an irrational angle with respect to the sides, if every time one meets a side one starts over on the opposite side with the same angle, the line will eventually meet every subset of positive measure. More generally on any flat surface there are many ergodic directions for the geodesic flow.

There are similar results for negatively curved compact Riemann surfaces; note that in this case the definition of geodesic flow is much more involved since there is no notion of constant direction on a non-flat surface. More generally, the geodesic flow on a negatively curved compact Riemannian manifold is ergodic; in fact, it satisfies the stronger property of being an Anosov flow.

In finance

Models used in finance and investment assume ergodicity, explicitly or implicitly. The ergodic assumption is prevalent in modern portfolio theory, discounted cash flow (DCF) models, and aggregate indicator models that infuse macroeconomics, among others.

The situations modeled by these theories can be useful. But often they are only useful during much, but not all, of any particular time period under study. They can therefore miss some of the largest deviations from the standard model, such as financial crises, debt crises and systemic risk in the banking system that occur only infrequently.

Nassim Nicholas Taleb has argued that a very important part of empirical reality in finance and investment is non-ergodic. An even statistical distribution of probabilities, where the system returns to every possible state an infinite number of times, is simply not what we observe in situations where "absorbing states" are reached, that is, states of ruin. The death of an individual, or total loss of everything, or the devolution or dismemberment of a nation state and the legal regime that accompanied it, are all absorbing states. Thus, in finance, path dependence matters. A path where an individual, firm or country hits a "stop" will be non-ergodic; such a stop is an absorbing barrier, "anything that prevents people with skin in the game from emerging from it, and to which the system will invariably tend. Let us call these situations ruin, as the entity cannot emerge from the condition. The central problem is that if there is a possibility of ruin, cost benefit analyses are no longer possible."[15] All traditional models based on standard probabilistic statistics break down in these extreme situations.

In social sciences

In the social sciences the ergodicity concept surfaces because group-level data often give a poor indication of individual-level variation,[16][17] as individual standard deviations (SDs) tend to be almost eight times larger than group-level SDs of the same people.[17] Consequently, a third of the individual observations fall outside a 99.9% confidence interval of group-level data.

Definition for discrete-time systems

Formal definition

Let [math]\displaystyle{ (X, \mathcal B) }[/math] be a measurable space. If [math]\displaystyle{ T }[/math] is a measurable function from [math]\displaystyle{ X }[/math] to itself and [math]\displaystyle{ \mu }[/math] a probability measure on [math]\displaystyle{ (X, \mathcal B) }[/math] then we say that [math]\displaystyle{ T }[/math] is [math]\displaystyle{ \mu }[/math]-ergodic or [math]\displaystyle{ \mu }[/math] is an ergodic measure for [math]\displaystyle{ T }[/math] if [math]\displaystyle{ T }[/math] preserves [math]\displaystyle{ \mu }[/math] and the following condition holds:

For any [math]\displaystyle{ A \in \mathcal B }[/math] such that [math]\displaystyle{ T^{-1}(A) = A }[/math] either [math]\displaystyle{ \mu(A) = 0 }[/math] or [math]\displaystyle{ \mu(A) = 1 }[/math].

In other words there are no [math]\displaystyle{ T }[/math]-invariant subsets up to measure 0 (with respect to [math]\displaystyle{ \mu }[/math]). Recall that [math]\displaystyle{ T }[/math] preserving [math]\displaystyle{ \mu }[/math] (or [math]\displaystyle{ \mu }[/math] being [math]\displaystyle{ T }[/math]-invariant) means that [math]\displaystyle{ \mu\mathord\left(T^{-1}(A)\right) = \mu(A) }[/math] for all [math]\displaystyle{ A \in \mathcal B }[/math] (see also measure-preserving dynamical system).

Note that some authors (e.g., "An introduction to infinite ergodic theory" by Aaronson, p. 21) relax the requirement that [math]\displaystyle{ \mu }[/math] is [math]\displaystyle{ T }[/math]-invariant to the requirement that pullbacks of measure-zero sets are measure-zero, i.e., the pushforward measure [math]\displaystyle{ T_*\mu }[/math] is absolutely continuous with respect to [math]\displaystyle{ \mu }[/math].


The simplest example is when [math]\displaystyle{ X }[/math] is a finite set and [math]\displaystyle{ \mu }[/math] the counting measure. Then a self-map of [math]\displaystyle{ X }[/math] preserves [math]\displaystyle{ \mu }[/math] if and only if it is a bijection, and it is ergodic if and only if [math]\displaystyle{ T }[/math] has only one orbit (that is, for every [math]\displaystyle{ x, y \in X }[/math] there exists [math]\displaystyle{ k \in \mathbb N }[/math] such that [math]\displaystyle{ y = T^k(x) }[/math]). For example, if [math]\displaystyle{ X = \{1, 2, \ldots, n\} }[/math] then the cycle [math]\displaystyle{ (1\, 2\, \cdots \, n) }[/math] is ergodic, but the permutation [math]\displaystyle{ (1\, 2)(3\, 4\, \cdots\, n) }[/math] is not (it has the two invariant subsets [math]\displaystyle{ \{1, 2\} }[/math] and [math]\displaystyle{ \{3, 4, \ldots, n\} }[/math]).
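The finite example can be checked directly. The sketch below assumes the map is given as a Python dictionary that is a bijection of a finite set, and tests the single-orbit criterion stated above; the function names and the choice n = 6 are illustrative.

```python
def orbit(perm, x):
    """Orbit of x under the bijection `perm`, given as a dict x -> T(x)."""
    seen = [x]
    y = perm[x]
    while y != x:
        seen.append(y)
        y = perm[y]
    return seen

def is_ergodic(perm):
    """For the counting measure on a finite set, T is ergodic iff it has one orbit."""
    start = next(iter(perm))
    return len(orbit(perm, start)) == len(perm)

n = 6
cycle = {i: i % n + 1 for i in range(1, n + 1)}                 # (1 2 ... n)
split = {1: 2, 2: 1, **{i: i + 1 for i in range(3, n)}, n: 3}   # (1 2)(3 4 ... n)

print(is_ergodic(cycle))  # True
print(is_ergodic(split))  # False
```

For the second permutation the orbit of 1 is the invariant subset {1, 2}, which is neither empty nor the whole set.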

Equivalent formulations

The definition given above admits the following immediate reformulations:

  • for every [math]\displaystyle{ A \in \mathcal B }[/math] with [math]\displaystyle{ \mu\mathord\left(T^{-1}(A) \bigtriangleup A\right) = 0 }[/math] we have [math]\displaystyle{ \mu(A) = 0 }[/math] or [math]\displaystyle{ \mu(A) = 1\, }[/math] (where [math]\displaystyle{ \bigtriangleup }[/math] denotes the symmetric difference);
  • for every [math]\displaystyle{ A \in \mathcal B }[/math] with positive measure we have [math]\displaystyle{ \mu\mathord\left(\bigcup_{n=1}^\infty T^{-n}(A)\right) = 1 }[/math];
  • for every two sets [math]\displaystyle{ A, B \in \mathcal B }[/math] of positive measure, there exists [math]\displaystyle{ n \gt 0 }[/math] such that [math]\displaystyle{ \mu\mathord\left(\left(T^{-n}(A)\right) \cap B\right) \gt 0 }[/math];
  • Every measurable function [math]\displaystyle{ f: X\to\mathbb{R} }[/math] with [math]\displaystyle{ f \circ T = f }[/math] is constant on a subset of full measure.

Importantly for applications, the condition in the last characterisation can be restricted to square-integrable functions only:

  • If [math]\displaystyle{ f \in L^2(X, \mu) }[/math] and [math]\displaystyle{ f \circ T = f }[/math] then [math]\displaystyle{ f }[/math] is constant almost everywhere.

Further examples

Bernoulli shifts and subshifts

Let [math]\displaystyle{ S }[/math] be a finite set and [math]\displaystyle{ X = S^\mathbb{Z} }[/math] with [math]\displaystyle{ \mu }[/math] the product measure (each factor [math]\displaystyle{ S }[/math] being endowed with its counting measure). Then the shift operator [math]\displaystyle{ T }[/math] defined by [math]\displaystyle{ T\left((s_k)_{k \in \mathbb Z}\right) = (s_{k+1})_{k \in \mathbb Z} }[/math] is [math]\displaystyle{ \mu }[/math]-ergodic.[18]

There are many more ergodic measures for the shift map [math]\displaystyle{ T }[/math] on [math]\displaystyle{ X }[/math]. Periodic sequences give finitely supported measures. More interestingly, there are infinitely-supported ones which are subshifts of finite type.

Irrational rotations

Let [math]\displaystyle{ X }[/math] be the unit circle [math]\displaystyle{ \{z \in \mathbb C,\, |z| = 1\} }[/math], with its Lebesgue measure [math]\displaystyle{ \mu }[/math]. For any [math]\displaystyle{ \theta \in \mathbb R }[/math] the rotation of [math]\displaystyle{ X }[/math] of angle [math]\displaystyle{ \theta }[/math] is given by [math]\displaystyle{ T_\theta(z) = e^{2i\pi\theta}z }[/math]. If [math]\displaystyle{ \theta \in \mathbb Q }[/math] then [math]\displaystyle{ T_\theta }[/math] is not ergodic for the Lebesgue measure as it has infinitely many finite orbits. On the other hand, if [math]\displaystyle{ \theta }[/math] is irrational then [math]\displaystyle{ T_\theta }[/math] is ergodic.[19]
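A numerical sketch can illustrate the contrast between rational and irrational angles. It assumes the circle is parametrised by [math]\displaystyle{ x \in [0, 1) }[/math] with [math]\displaystyle{ z = e^{2i\pi x} }[/math], so that the rotation becomes [math]\displaystyle{ x \mapsto x + \theta \bmod 1 }[/math]; the arc, the starting point and the number of steps are arbitrary illustrative choices, and floating-point arithmetic only approximates an irrational angle.

```python
import math

def visit_fraction(theta, a, b, steps, x0=0.0):
    """Fraction of the first `steps` orbit points x0 + n*theta (mod 1) lying in [a, b)."""
    hits = 0
    for n in range(steps):
        x = (x0 + n * theta) % 1.0
        if a <= x < b:
            hits += 1
    return hits / steps

arc = (0.0, 0.1)   # an arc of normalised Lebesgue measure 0.1
irrational = visit_fraction(math.sqrt(2), *arc, steps=100_000)
rational = visit_fraction(0.25, *arc, steps=100_000)
print(irrational)  # close to 0.1
print(rational)    # 0.25: the orbit of 0 is just {0, 1/4, 1/2, 3/4}
```

For the irrational angle the visit frequency approaches the measure of the arc, whereas the rational rotation is stuck on a four-point orbit.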

Arnold's cat map

Let [math]\displaystyle{ X = \mathbb{R}^2/\mathbb{Z}^2 }[/math] be the 2-torus. Then any element [math]\displaystyle{ g \in \mathrm{SL}_2(\mathbb Z) }[/math] defines a self-map of [math]\displaystyle{ X }[/math] since [math]\displaystyle{ g\left(\mathbb{Z}^2\right) = \mathbb{Z}^2 }[/math]. When [math]\displaystyle{ g = \left(\begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array}\right) }[/math] one obtains the so-called Arnold's cat map, which is ergodic for the Lebesgue measure on the torus.
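The measure-preserving property of this map can be illustrated on a finite grid. The sketch below restricts the cat map to the points of [math]\displaystyle{ (\mathbb{Z}/n\mathbb{Z})^2 }[/math], an assumption made purely for illustration: it shows that the map is a bijection there, but says nothing about ergodicity for the Lebesgue measure, since every grid point is periodic.

```python
def cat_map(x, y, n):
    """Arnold's cat map g = [[2, 1], [1, 1]] acting on the grid (Z/nZ)^2."""
    return (2 * x + y) % n, (x + y) % n

n = 7  # grid size (arbitrary illustrative choice)
points = [(x, y) for x in range(n) for y in range(n)]
images = {cat_map(x, y, n) for x, y in points}

det = 2 * 1 - 1 * 1
print(det)                         # 1, so g lies in SL_2(Z) and preserves area
print(len(images) == len(points))  # True: the map permutes the grid points
```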

Ergodic theorems

If [math]\displaystyle{ \mu }[/math] is a probability measure on a space [math]\displaystyle{ X }[/math] which is ergodic for a transformation [math]\displaystyle{ T }[/math], the pointwise ergodic theorem of G. Birkhoff states that for every integrable function [math]\displaystyle{ f: X \to \mathbb R }[/math] and for [math]\displaystyle{ \mu }[/math]-almost every point [math]\displaystyle{ x \in X }[/math] the time average on the orbit of [math]\displaystyle{ x }[/math] converges to the space average of [math]\displaystyle{ f }[/math]. Formally this means that [math]\displaystyle{ \lim_{k \to +\infty} \left( \frac 1{k+1} \sum_{i=0}^k f\left(T^i(x)\right) \right) = \int_X fd\mu. }[/math]
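The time average in Birkhoff's theorem can be computed numerically for a simple case. The sketch below is an illustration under assumptions, not a proof: it uses an irrational rotation of the circle written as [math]\displaystyle{ [0, 1) }[/math], the observable [math]\displaystyle{ f(x) = \cos^2(2\pi x) }[/math] (whose space average is 1/2), and an arbitrary starting point and orbit length.

```python
import math

def time_average(f, T, x, k):
    """Birkhoff average (1/(k+1)) * sum_{i=0}^{k} f(T^i(x))."""
    total = 0.0
    for _ in range(k + 1):
        total += f(x)
        x = T(x)
    return total / (k + 1)

THETA = (math.sqrt(5) - 1) / 2   # an irrational rotation angle (assumed choice)

def rotate(x):
    """Rotation by THETA, with the circle written as [0, 1)."""
    return (x + THETA) % 1.0

def f(x):
    """Observable whose integral over [0, 1) is 1/2."""
    return math.cos(2 * math.pi * x) ** 2

avg = time_average(f, rotate, x=0.3, k=50_000)
print(abs(avg - 0.5) < 1e-3)   # True: the time average is close to the space average
```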

The mean ergodic theorem of J. von Neumann is a similar, weaker statement about averaged translates of square-integrable functions.

Related properties

Dense orbits

An immediate consequence of the definition of ergodicity is the following: if [math]\displaystyle{ X }[/math] is a topological space, [math]\displaystyle{ \mathcal B }[/math] is the σ-algebra of Borel sets and [math]\displaystyle{ T }[/math] is [math]\displaystyle{ \mu }[/math]-ergodic, then [math]\displaystyle{ \mu }[/math]-almost every orbit of [math]\displaystyle{ T }[/math] is dense in the support of [math]\displaystyle{ \mu }[/math].

This is not an equivalence: for a transformation which is not uniquely ergodic but which has an ergodic measure [math]\displaystyle{ \mu_0 }[/math] with full support, and for any other ergodic measure [math]\displaystyle{ \mu_1 }[/math], the measure [math]\displaystyle{ \frac{1}{2}(\mu_0 + \mu_1) }[/math] is not ergodic for [math]\displaystyle{ T }[/math] but its orbits are dense in the support. Explicit examples can be constructed with shift-invariant measures.[20]


Mixing

Main page: Mixing (mathematics)

A transformation [math]\displaystyle{ T }[/math] of a probability measure space [math]\displaystyle{ (X, \mu) }[/math] is said to be mixing for the measure [math]\displaystyle{ \mu }[/math] if for any measurable sets [math]\displaystyle{ A, B \subset X }[/math] the following holds: [math]\displaystyle{ \lim_{n \to +\infty} \mu\left(T^{-n}A \cap B\right) = \mu(A)\mu(B) }[/math]

It is immediate that a mixing transformation is also ergodic (taking [math]\displaystyle{ A }[/math] to be a [math]\displaystyle{ T }[/math]-stable subset and [math]\displaystyle{ B }[/math] its complement). The converse is not true, for example a rotation with irrational angle on the circle (which is ergodic per the examples above) is not mixing (for a sufficiently small interval its successive images will not intersect itself most of the time). Bernoulli shifts are mixing, and so is Arnold's cat map.

This notion of mixing is sometimes called strong mixing, as opposed to weak mixing, which means that [math]\displaystyle{ \lim_{n \to +\infty} \frac 1 n \sum_{k=1}^n \left|\mu(T^{-k}A \cap B) - \mu(A)\mu(B) \right| = 0 }[/math]
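The statement that an irrational rotation is ergodic but not mixing can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a proof: the circle is identified with [math]\displaystyle{ [0, 1) }[/math], the sets are [math]\displaystyle{ A = B = [0, \ell) }[/math] with [math]\displaystyle{ \ell \le 1/2 }[/math], and the overlap [math]\displaystyle{ \mu(T^{-k}A \cap A) }[/math] is computed exactly as the length of the intersection of two arcs. The name `self_overlap` and the parameter values are hypothetical choices.

```python
import math

def self_overlap(theta, length, k):
    """mu(T^{-k}A ∩ A) for the arc A = [0, length), with length <= 1/2,
    under the rotation x -> x + theta (mod 1)."""
    shift = (k * theta) % 1.0
    distance = min(shift, 1.0 - shift)  # circular distance between A and T^{-k}A
    return max(0.0, length - distance)

theta = math.sqrt(2) - 1
length = 0.1
steps = 100_000
overlaps = [self_overlap(theta, length, k) for k in range(1, steps + 1)]

# Averaged overlaps approach mu(A)^2, as ergodicity requires.
cesaro_mean = sum(overlaps) / steps

# Averaged absolute deviations do not tend to 0, so weak mixing fails as well.
weak_mixing_mean = sum(abs(v - length ** 2) for v in overlaps) / steps
```

The individual overlaps keep oscillating between 0 and values close to [math]\displaystyle{ \ell }[/math], so they have no limit, while their running average settles near [math]\displaystyle{ \ell^2 = \mu(A)\mu(B) }[/math].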

Proper ergodicity

The transformation [math]\displaystyle{ T }[/math] is said to be properly ergodic if it does not have an orbit of full measure. In the discrete case this means that the measure [math]\displaystyle{ \mu }[/math] is not supported on a finite orbit of [math]\displaystyle{ T }[/math].

Definition for continuous-time dynamical systems

The definition is essentially the same for continuous-time dynamical systems as for a single transformation. Let [math]\displaystyle{ (X, \mathcal B) }[/math] be a measurable space. Such a system is given by a family [math]\displaystyle{ T_t }[/math], indexed by [math]\displaystyle{ t \in \mathbb R_+ }[/math], of measurable functions from [math]\displaystyle{ X }[/math] to itself, so that for any [math]\displaystyle{ t, s \in \mathbb R_+ }[/math] the relation [math]\displaystyle{ T_{s+t} = T_s \circ T_t }[/math] holds (usually it is also required that the orbit map from [math]\displaystyle{ \mathbb R_+ \times X \to X }[/math] be measurable). If [math]\displaystyle{ \mu }[/math] is a probability measure on [math]\displaystyle{ (X, \mathcal B) }[/math] then we say that [math]\displaystyle{ T_t }[/math] is [math]\displaystyle{ \mu }[/math]-ergodic, or that [math]\displaystyle{ \mu }[/math] is an ergodic measure for [math]\displaystyle{ T }[/math], if each [math]\displaystyle{ T_t }[/math] preserves [math]\displaystyle{ \mu }[/math] and the following condition holds:

For any [math]\displaystyle{ A \in \mathcal B }[/math], if for all [math]\displaystyle{ t \in \mathbb R_+ }[/math] we have [math]\displaystyle{ T_t^{-1}(A) \subset A }[/math] then either [math]\displaystyle{ \mu(A) = 0 }[/math] or [math]\displaystyle{ \mu(A) = 1 }[/math].


As in the discrete case the simplest example is that of a transitive action, for instance the action on the circle given by [math]\displaystyle{ T_t(z) = e^{2i\pi t}z }[/math] is ergodic for Lebesgue measure.

An example with infinitely many orbits is given by the flow along an irrational slope on the torus: let [math]\displaystyle{ X = \mathbb S^1 \times \mathbb S^1 }[/math] and [math]\displaystyle{ \alpha \in \mathbb R }[/math]. Let [math]\displaystyle{ T_t(z_1, z_2) = \left(e^{2i\pi t}z_1, e^{2\alpha i\pi t}z_2\right) }[/math]; then if [math]\displaystyle{ \alpha \not\in \mathbb Q }[/math] this is ergodic for the Lebesgue measure.

Ergodic flows

Further examples of ergodic flows are:

  • billiards in certain convex Euclidean domains;
  • the geodesic flow of a negatively curved Riemannian manifold of finite volume (for the normalised volume measure);
  • the horocycle flow on a hyperbolic manifold of finite volume (for the normalised volume measure).

Ergodicity in compact metric spaces

If [math]\displaystyle{ X }[/math] is a compact metric space it is naturally endowed with the σ-algebra of Borel sets. The additional structure coming from the topology then allows a much more detailed theory for ergodic transformations and measures on [math]\displaystyle{ X }[/math].

Functional analysis interpretation

A very powerful alternate definition of ergodic measures can be given using the theory of Banach spaces. Radon measures on [math]\displaystyle{ X }[/math] form a Banach space of which the set [math]\displaystyle{ \mathcal P(X) }[/math] of probability measures on [math]\displaystyle{ X }[/math] is a convex subset. Given a continuous transformation [math]\displaystyle{ T }[/math] of [math]\displaystyle{ X }[/math] the subset [math]\displaystyle{ \mathcal P(X)^T }[/math] of [math]\displaystyle{ T }[/math]-invariant measures is a closed convex subset, and a measure is ergodic for [math]\displaystyle{ T }[/math] if and only if it is an extreme point of this convex set.[21]

Existence of ergodic measures

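In the setting above it follows from the Banach–Alaoglu theorem, which makes [math]\displaystyle{ \mathcal P(X)^T }[/math] compact in the weak-* topology, together with the Krein–Milman theorem, that there always exist extreme points in [math]\displaystyle{ \mathcal P(X)^T }[/math]. Hence a continuous transformation of a compact metric space always admits ergodic measures.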

Ergodic decomposition

In general an invariant measure need not be ergodic, but as a consequence of Choquet theory it can always be expressed as the barycenter of a probability measure on the set of ergodic measures. This is referred to as the ergodic decomposition of the measure.[22]


In the case of [math]\displaystyle{ X = \{1, \ldots, n\} }[/math] and [math]\displaystyle{ T = (1\, 2)(3\, 4\, \cdots\, n) }[/math] the normalised counting measure is not ergodic. The ergodic measures for [math]\displaystyle{ T }[/math] are the uniform measures [math]\displaystyle{ \mu_1, \mu_2 }[/math] supported on the subsets [math]\displaystyle{ \{1, 2\} }[/math] and [math]\displaystyle{ \{3, \ldots, n\} }[/math], and every [math]\displaystyle{ T }[/math]-invariant probability measure can be written in the form [math]\displaystyle{ t\mu_1 + (1 - t)\mu_2 }[/math] for some [math]\displaystyle{ t \in [0, 1] }[/math]. In particular [math]\displaystyle{ \frac{2}{n}\mu_1 + \frac{n - 2}{n}\mu_2 }[/math] is the ergodic decomposition of the normalised counting measure.
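For a permutation of a finite set the ergodic decomposition of the uniform probability measure can be read off from the cycle structure: each cycle carries one ergodic measure, weighted by its share of the points. A minimal sketch, assuming the permutation is given as a dictionary [math]\displaystyle{ x \mapsto T(x) }[/math]; the helper names `cycles` and `ergodic_decomposition` are hypothetical, and the case [math]\displaystyle{ n = 6 }[/math] is chosen only as an example.

```python
from fractions import Fraction

def cycles(perm):
    """Cycles of a permutation given as a dict x -> T(x)."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(cycle)
    return result

def ergodic_decomposition(perm):
    """Pairs (weight, cycle): the uniform measure on each cycle is ergodic,
    and the weights express the uniform measure on the whole set."""
    n = len(perm)
    return [(Fraction(len(c), n), c) for c in cycles(perm)]

# T = (1 2)(3 4 5 6), the case n = 6 of the example above.
T = {1: 2, 2: 1, 3: 4, 4: 5, 5: 6, 6: 3}
decomposition = ergodic_decomposition(T)
```

Here the weights are 2/6 and 4/6, matching the coefficients [math]\displaystyle{ \frac{2}{n} }[/math] and [math]\displaystyle{ \frac{n-2}{n} }[/math] in the text.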

Continuous systems

Everything in this section transfers verbatim to continuous actions of [math]\displaystyle{ \mathbb R }[/math] or [math]\displaystyle{ \mathbb R_+ }[/math] on compact metric spaces.

Unique ergodicity

The transformation [math]\displaystyle{ T }[/math] is said to be uniquely ergodic if there is a unique Borel probability measure [math]\displaystyle{ \mu }[/math] on [math]\displaystyle{ X }[/math] which is ergodic for [math]\displaystyle{ T }[/math].

In the examples considered above, irrational rotations of the circle are uniquely ergodic;[23] shift maps are not.

Probabilistic interpretation: ergodic processes

Main page: Ergodic process

If [math]\displaystyle{ \left(X_n\right)_{n \ge 1} }[/math] is a discrete-time stochastic process on a space [math]\displaystyle{ \Omega }[/math], it is said to be ergodic if the joint distribution of the variables on [math]\displaystyle{ \Omega^\mathbb{N} }[/math] is invariant and ergodic under the shift map [math]\displaystyle{ \left(x_n\right)_{n \ge 1} \mapsto \left(x_{n+1}\right)_{n \ge 1} }[/math]. This is a particular case of the notions discussed above.

The simplest case is that of an independent and identically distributed process which corresponds to the shift map described above. Another important case is that of a Markov chain which is discussed in detail below.

A similar interpretation holds for continuous-time stochastic processes though the construction of the measurable structure of the action is more complicated.

Ergodicity of Markov chains

The dynamical system associated with a Markov chain

Let [math]\displaystyle{ S }[/math] be a finite set. A Markov chain on [math]\displaystyle{ S }[/math] is defined by a matrix [math]\displaystyle{ P \in [0, 1]^{S \times S} }[/math], where [math]\displaystyle{ P(s_1, s_2) }[/math] is the transition probability from [math]\displaystyle{ s_1 }[/math] to [math]\displaystyle{ s_2 }[/math], so for every [math]\displaystyle{ s \in S }[/math] we have [math]\displaystyle{ \sum_{s' \in S} P(s, s') = 1 }[/math]. A stationary measure for [math]\displaystyle{ P }[/math] is a probability measure [math]\displaystyle{ \nu }[/math] on [math]\displaystyle{ S }[/math] such that [math]\displaystyle{ \nu P = \nu }[/math]; that is, [math]\displaystyle{ \sum_{s' \in S} \nu(s') P(s', s) = \nu(s) }[/math] for all [math]\displaystyle{ s \in S }[/math].

Using this data we can define a probability measure [math]\displaystyle{ \mu_\nu }[/math] on the set [math]\displaystyle{ X = S^\mathbb{Z} }[/math] with its product σ-algebra by giving the measures of the cylinders as follows: [math]\displaystyle{ \mu_\nu(\cdots \times S \times \{(s_n, \ldots, s_m)\} \times S \times \cdots) = \nu(s_n) P(s_n, s_{n+1}) \cdots P(s_{m-1}, s_m). }[/math]

Stationarity of [math]\displaystyle{ \nu }[/math] then means that the measure [math]\displaystyle{ \mu_\nu }[/math] is invariant under the shift map [math]\displaystyle{ T\left(\left(s_k\right)_{k \in \mathbb Z}\right) = \left(s_{k+1}\right)_{k \in \mathbb Z} }[/math].
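The cylinder formula and the role of stationarity can be made concrete with a small computation. The following is a sketch under stated assumptions: the two-state chain, its transition probabilities and the function name `cylinder_measure` are hypothetical and serve only as an illustration. It checks the consistency that stationarity provides, namely that summing over one extra free symbol on the left leaves the measure of a cylinder unchanged.

```python
from fractions import Fraction

def cylinder_measure(nu, P, word):
    """nu(s_n) * P(s_n, s_{n+1}) * ... * P(s_{m-1}, s_m) for word = (s_n, ..., s_m)."""
    measure = nu[word[0]]
    for s, t in zip(word, word[1:]):
        measure *= P[s][t]
    return measure

# Hypothetical two-state chain, chosen only for illustration.
P = {"a": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
     "b": {"a": Fraction(1, 4), "b": Fraction(3, 4)}}
nu = {"a": Fraction(1, 3), "b": Fraction(2, 3)}  # satisfies nu P = nu

word = ("a", "b", "b")
direct = cylinder_measure(nu, P, word)

# Same pattern preceded by one unconstrained symbol: sum over that symbol.
extended = sum(cylinder_measure(nu, P, (s,) + word) for s in P)
```

With a non-stationary initial distribution in place of `nu` the two values would in general differ, and the prescription would not define a shift-invariant measure.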

Criterion for ergodicity

The measure [math]\displaystyle{ \mu_\nu }[/math] is always ergodic for the shift map if the associated Markov chain is irreducible (any state can be reached with positive probability from any other state in a finite number of steps).[24]

The hypotheses above imply that there is a unique stationary measure for the Markov chain. In terms of the matrix [math]\displaystyle{ P }[/math] a sufficient condition for this is that 1 be a simple eigenvalue of the matrix [math]\displaystyle{ P }[/math] and all other eigenvalues of [math]\displaystyle{ P }[/math] (in [math]\displaystyle{ \mathbb C }[/math]) are of modulus <1.

Note that in probability theory the Markov chain is called ergodic if in addition each state is aperiodic (the times where the return probability is positive are not multiples of a single integer >1). This is not necessary for the invariant measure to be ergodic; hence the notions of "ergodicity" for a Markov chain and the associated shift-invariant measure are different (the one for the chain is strictly stronger).[25]

Moreover the criterion is an "if and only if" if all communicating classes in the chain are recurrent and we consider all stationary measures.
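Irreducibility depends only on which entries of [math]\displaystyle{ P }[/math] are positive, so it can be tested by a graph search. A minimal sketch, assuming the states are indexed [math]\displaystyle{ 0, \ldots, n-1 }[/math] and [math]\displaystyle{ P }[/math] is given as a list of rows; the helper names `reachable` and `is_irreducible` are hypothetical.

```python
def reachable(P, start):
    """States that can be reached from `start` along transitions of
    positive probability (including `start` itself)."""
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for t in range(len(P)):
            if P[s][t] > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def is_irreducible(P):
    """True if every state can be reached from every other state."""
    n = len(P)
    return all(len(reachable(P, s)) == n for s in range(n))

swap = [[0, 1], [1, 0]]              # irreducible, although periodic
identity = [[1, 0], [0, 1]]          # two closed classes, not irreducible
uniform = [[0.5, 0.5], [0.5, 0.5]]   # irreducible and aperiodic

results = [is_irreducible(swap), is_irreducible(identity), is_irreducible(uniform)]
```

The check deliberately ignores aperiodicity, in line with the distinction drawn above between ergodicity of the shift-invariant measure and ergodicity of the chain in the probabilistic sense.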


Counting measure

If [math]\displaystyle{ P(s, s') = 1/|S| }[/math] for all [math]\displaystyle{ s, s' \in S }[/math] then the stationary measure is the counting measure, and the measure [math]\displaystyle{ \mu_P }[/math] is the product of counting measures. The Markov chain is ergodic, so the shift example from above is a special case of the criterion.

Non-ergodic Markov chains

Markov chains with recurrent communicating classes which are not irreducible are not ergodic, and this can be seen immediately as follows. If [math]\displaystyle{ S_1, S_2 \subsetneq S }[/math] are two distinct recurrent communicating classes, there are nonzero stationary measures [math]\displaystyle{ \nu_1, \nu_2 }[/math] supported on [math]\displaystyle{ S_1, S_2 }[/math] respectively, and the subsets [math]\displaystyle{ S_1^\mathbb{Z} }[/math] and [math]\displaystyle{ S_2^\mathbb{Z} }[/math] are both shift-invariant and of measure 1/2 for the invariant probability measure associated with [math]\displaystyle{ \frac{1}{2}(\nu_1 + \nu_2) }[/math]. A very simple example of that is the chain on [math]\displaystyle{ S = \{1, 2\} }[/math] given by the matrix [math]\displaystyle{ \left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right) }[/math] (both states are stationary).

A periodic chain

The Markov chain on [math]\displaystyle{ S = \{1, 2\} }[/math] given by the matrix [math]\displaystyle{ \left(\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right) }[/math] is irreducible but periodic. Thus it is not ergodic in the sense of Markov chains, though the associated measure [math]\displaystyle{ \mu }[/math] on [math]\displaystyle{ \{1, 2\}^{\mathbb Z} }[/math] is ergodic for the shift map. However, the shift is not mixing for this measure, as for the sets [math]\displaystyle{ A = \cdots \times \{1, 2\} \times 1 \times \{1, 2\} \times 1 \times \{1, 2\} \cdots }[/math]

and [math]\displaystyle{ B = \cdots \times \{1, 2\} \times 2 \times \{1, 2\} \times 2 \times \{1, 2\} \cdots }[/math]

we have [math]\displaystyle{ \mu(A) = \frac{1}{2} = \mu(B) }[/math] but [math]\displaystyle{ \mu\left(T^{-n}A \cap B\right) = \begin{cases} \frac{1}{2} \text{ if } n \text{ is odd} \\ 0 \text{ if } n \text{ is even.} \end{cases} }[/math]
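The alternating values can be reproduced from the transition matrix. The sketch below rests on an interpretation that is an assumption here: [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are read as the cylinders fixing the coordinate at position 0 to be 1 and 2 respectively, so that [math]\displaystyle{ \mu\left(T^{-n}A \cap B\right) = \nu(2) P^n(2, 1) }[/math]. States 1 and 2 are stored at indices 0 and 1, and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-th power of a square matrix, by repeated multiplication."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

def correlation(nu, P, a, b, n):
    """mu(T^{-n}A ∩ B) for A = {s_0 = a} and B = {s_0 = b},
    which equals nu(b) * P^n(b, a)."""
    return nu[b] * mat_pow(P, n)[b][a]

P = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]
nu = [Fraction(1, 2), Fraction(1, 2)]  # the unique stationary measure

values = [correlation(nu, P, 0, 1, n) for n in range(1, 7)]
```

The list alternates between 1/2 and 0 and therefore does not converge to [math]\displaystyle{ \mu(A)\mu(B) = \frac{1}{4} }[/math], although its running averages do.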


The definition of ergodicity also makes sense for group actions. The classical theory (for invertible transformations) corresponds to actions of [math]\displaystyle{ \mathbb Z }[/math] or [math]\displaystyle{ \mathbb R }[/math].

For non-abelian groups there might not be invariant measures even on compact metric spaces. However the definition of ergodicity carries over unchanged if one replaces invariant measures by quasi-invariant measures.

Important examples are the action of a semisimple Lie group (or a lattice therein) on its Furstenberg boundary.

A measurable equivalence relation is said to be ergodic if all saturated subsets are either null or conull.


  1. Walters 1982, §0.1, p. 2
  2. Gallavotti, Giovanni (1995). "Ergodicity, ensembles, irreversibility in Boltzmann and beyond". Journal of Statistical Physics 78 (5–6): 1571–1589. doi:10.1007/BF02180143. Bibcode: 1995JSP....78.1571G.
  3. Feller, William (1 August 2008). An Introduction to Probability Theory and Its Applications (2nd ed.). Wiley India Pvt. Limited. p. 271. ISBN 978-81-265-1806-7. 
  4. Stöckmann, Hans-Jürgen (1999). Quantum Chaos: An Introduction. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511524622. ISBN 978-0-521-02715-1. 
  5. Heller, Eric J. (1984-10-15). "Bound-State Eigenfunctions of Classically Chaotic Hamiltonian Systems: Scars of Periodic Orbits". Physical Review Letters 53 (16): 1515–1518. doi:10.1103/PhysRevLett.53.1515. 
  6. Kaplan, L (1999-03-01). "Scars in quantum chaotic wavefunctions". Nonlinearity 12 (2): R1–R40. doi:10.1088/0951-7715/12/2/009. ISSN 0951-7715. 
  7. Kaplan, L.; Heller, E.J. (April 1998). "Linear and Nonlinear Theory of Eigenfunction Scars" (in en). Annals of Physics 264 (2): 171–206. doi:10.1006/aphy.1997.5773. 
  8. Heller, Eric Johnson (2018) (in English). The semiclassical way to dynamics and spectroscopy. Princeton: Princeton University Press. ISBN 978-1-4008-9029-3. OCLC 1034625177. 
  9. Keski-Rahkonen, J.; Ruhanen, A.; Heller, E. J.; Räsänen, E. (2019-11-21). "Quantum Lissajous Scars". Physical Review Letters 123 (21): 214101. doi:10.1103/PhysRevLett.123.214101. PMID 31809168. 
  10. Luukko, Perttu J. J.; Drury, Byron; Klales, Anna; Kaplan, Lev; Heller, Eric J.; Räsänen, Esa (2016-11-28). "Strong quantum scarring by local impurities" (in en). Scientific Reports 6 (1): 37656. doi:10.1038/srep37656. ISSN 2045-2322. PMID 27892510. 
  11. Keski-Rahkonen, J.; Luukko, P. J. J.; Kaplan, L.; Heller, E. J.; Räsänen, E. (2017-09-20). "Controllable quantum scars in semiconductor quantum dots". Physical Review B 96 (9): 094204. doi:10.1103/PhysRevB.96.094204. 
  12. Keski-Rahkonen, J; Luukko, P J J; Åberg, S; Räsänen, E (2019-01-21). "Effects of scarring on quantum chaos in disordered quantum wells" (in en). Journal of Physics: Condensed Matter 31 (10): 105301. doi:10.1088/1361-648x/aaf9fb. ISSN 0953-8984. PMID 30566927. 
  13. Keski-Rahkonen, Joonas (2020) (in en). Quantum Chaos in Disordered Two-Dimensional Nanostructures. Tampere University. ISBN 978-952-03-1699-0. 
  14. Turner, C. J.; Michailidis, A. A.; Abanin, D. A.; Serbyn, M.; Papić, Z. (July 2018). "Weak ergodicity breaking from quantum many-body scars" (in en). Nature Physics 14 (7): 745–749. doi:10.1038/s41567-018-0137-5. ISSN 1745-2481. 
  15. Taleb, Nassim Nicholas (2019), "Probability, Risk, and Extremes", in Needham, Duncan, Extremes, Cambridge University Press, pp. 46–66.
  16. Molenaar, P.C. (2004). "A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever". Measurement 2 (4): 201–218. doi:10.1207/s15366359mea0204_1. 
  17. Fisher, A.J. (2018). "Lack of group-to-individual generalizability is a threat to human subjects research". PNAS 115 (27): 6106–6115. doi:10.1073/pnas.1711978115. PMID 29915059.
  18. Walters 1982, p. 32.
  19. Walters 1982, p. 29.
  20. "Example of a measure-preserving system with dense orbits that is not ergodic". September 1, 2011. 
  21. Walters 1982, p. 152.
  22. Walters 1982, p. 153.
  23. Walters 1982, p. 159.
  24. Walters 1982, p. 42.
  25. "Different uses of the word "ergodic"". September 4, 2011. 


External links