Hadamard transform

From HandWiki
Short description: Involutive change of basis in linear algebra
The product of a Boolean function and a Hadamard matrix is its Walsh spectrum:[1]
(1, 0, 1, 0, 0, 1, 1, 0) × H(8) = (4, 2, 0, −2, 0, 2, 0, 2)
Fast Walsh–Hadamard transform, a faster way to calculate the Walsh spectrum of (1, 0, 1, 0, 0, 1, 1, 0).
The original function can be expressed by means of its Walsh spectrum as an arithmetical polynomial.

The Hadamard transform (also known as the Walsh–Hadamard transform, Hadamard–Rademacher–Walsh transform, Walsh transform, or Walsh–Fourier transform) is an example of a generalized class of Fourier transforms. It performs an orthogonal, symmetric, involutive, linear operation on 2m real numbers (or complex, or hypercomplex numbers, although the Hadamard matrices themselves are purely real).

The Hadamard transform can be regarded as being built out of size-2 discrete Fourier transforms (DFTs), and is in fact equivalent to a multidimensional DFT of size 2 × 2 × ⋯ × 2 × 2.[2] It decomposes an arbitrary input vector into a superposition of Walsh functions.

The transform is named for the France mathematician Jacques Hadamard (French: [adamaʁ]), the German-American mathematician Hans Rademacher, and the American mathematician Joseph L. Walsh.

Definition

The Hadamard transform Hm is a 2m × 2m matrix, the Hadamard matrix (scaled by a normalization factor), that transforms 2m real numbers xn into 2m real numbers Xk. The Hadamard transform can be defined in two ways: recursively, or by using the binary (base-2) representation of the indices n and k.

Recursively, we define the 1 × 1 Hadamard transform H0 by the identity H0 = 1, and then define Hm for m > 0 by: [math]\displaystyle{ H_m = \frac{1}{\sqrt2} \begin{pmatrix} H_{m-1} & H_{m-1} \\ H_{m-1} & -H_{m-1} \end{pmatrix} }[/math] where the 1/2 is a normalization that is sometimes omitted.

For m > 1, we can also define Hm by: [math]\displaystyle{ H_m = H_{1} \otimes H_{m-1} }[/math] where [math]\displaystyle{ \otimes }[/math] represents the Kronecker product. Thus, other than this normalization factor, the Hadamard matrices are made up entirely of 1 and −1.

Equivalently, we can define the Hadamard matrix by its (kn)-th entry by writing [math]\displaystyle{ \begin{align} k &= \sum^{m-1}_{i=0} {k_i 2^i} = k_{m-1} 2^{m-1} + k_{m-2} 2^{m-2} + \dots + k_1 2 + k_0 \\ n &= \sum^{m-1}_{i=0} {n_i 2^i} = n_{m-1} 2^{m-1} + n_{m-2} 2^{m-2} + \dots + n_1 2 + n_0 \end{align} }[/math]

where the kj and nj are the bit elements (0 or 1) of k and n, respectively. Note that for the element in the top left corner, we define: [math]\displaystyle{ k = n = 0 }[/math]. In this case, we have: [math]\displaystyle{ (H_m)_{k,n} = \frac{1}{2^{m/2}} (-1)^{\sum_j k_j n_j} }[/math]

This is exactly the multidimensional [math]\displaystyle{ 2 \times 2 \times \cdots \times 2 \times 2 }[/math] DFT, normalized to be unitary, if the inputs and outputs are regarded as multidimensional arrays indexed by the nj and kj, respectively.

Some examples of the Hadamard matrices follow. [math]\displaystyle{ \begin{align} H_0 & = +\begin{pmatrix}1\end{pmatrix}\\[5pt] H_1 & = \frac{1}{\sqrt2} \left(\begin{array}{rr} 1 & 1\\ 1 & -1 \end{array}\right)\\[5pt] H_2 & = \frac{1}{2} \left(\begin{array}{rrrr} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1 \end{array}\right)\\[5pt] H_3 & = \frac{1}{2^{3/2}} \left(\begin{array}{rrrrrrrr} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1\\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \end{array}\right)\\[5pt] (H_n)_{i,j} & = \frac{1}{2^{n/2}} (-1)^{i \cdot j} \end{align} }[/math] where [math]\displaystyle{ i \cdot j }[/math] is the bitwise dot product of the binary representations of the numbers i and j. For example, if [math]\displaystyle{ n \;\geq\; 2 }[/math], then [math]\displaystyle{ (H_n)_{3,2} \;=\; (-1)^{3 \cdot 2} \;=\; (-1)^{(1,1) \cdot (1,0)} \;=\; (-1)^{1+0} \;=\; (-1)^1 \;=\; -1 }[/math]agreeing with the above (ignoring the overall constant). Note that the first row, first column element of the matrix is denoted by [math]\displaystyle{ (H_n)_{0,0} }[/math].

H1 is precisely the size-2 DFT. It can also be regarded as the Fourier transform on the two-element additive group of Z/(2).

The rows of the Hadamard matrices are the Walsh functions.

Advantages of the Walsh–Hadamard transform

Real

According to the above definition of matrix H, here we let H = H[m,n] [math]\displaystyle{ H[m,n]=\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} }[/math]

In the Walsh transform, only 1 and −1 will appear in the matrix. The numbers 1 and −1 are real numbers so there is no need to perform a complex number calculation.

No multiplication is required

The DFT needs irrational multiplication, while the Hadamard transform does not. Even rational multiplication is not needed, since sign flips is all it takes.

Some properties are similar to those of the DFT

In the Walsh transform matrix, all entries in the first row (and column) are equal to 1. [math]\displaystyle{ H[m,n] = \left(\begin{array}{rrrrrrrr} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1\\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1\\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \end{array}\right) }[/math]

Discrete Fourier transform: [math]\displaystyle{ e^{-j 2\pi mn/N} }[/math]

In discrete Fourier transform, when m equal to zeros (mean first row), the result of DFT also is 1. At the second row, although it is different from the first row we can observe a characteristic of the matrix that the signal in the first raw matrix is low frequency and it will increase the frequency at second row, increase more frequency until the last row.

If we calculate zero crossing:

First row  = 0 zero crossing
Second row = 1 zero crossing
Third row  = 2 zero crossings
           ⋮
Eight row  = 7 zero crossings

Relation to Fourier transform

The Hadamard transform is in fact equivalent to a multidimensional DFT of size 2 × 2 × ⋯ × 2 × 2.[2]

Another approach is to view the Hadamard transform as a Fourier transform on the Boolean group [math]\displaystyle{ (\Z / 2\Z)^n }[/math].[3] [4] Using the Fourier transform on finite (abelian) groups, the Fourier transform of a function [math]\displaystyle{ f \colon (\Z/2\Z)^n \to \Complex }[/math] is the function [math]\displaystyle{ \widehat f }[/math] defined by [math]\displaystyle{ \widehat{f}(\chi) = \sum_{a \in (\Z/2\Z)^n} f(a) \bar{\chi}(a) }[/math] where [math]\displaystyle{ \chi }[/math] is a character of [math]\displaystyle{ (\Z/2\Z)^n }[/math]. Each character has the form [math]\displaystyle{ \chi_r(a) = (-1)^{a \cdot r} }[/math] for some [math]\displaystyle{ r \in (\Z/2\Z)^n }[/math], where the multiplication is the boolean dot product on bit strings, so we can identify the input to [math]\displaystyle{ \widehat{f} }[/math] with [math]\displaystyle{ r \in (\Z/2\Z)^n }[/math] (Pontryagin duality) and define [math]\displaystyle{ \widehat f \colon (\Z/2\Z)^n \to \Complex }[/math] by [math]\displaystyle{ \widehat{f}(r) = \sum_{a \in (\Z/2\Z)^n} f(a) (-1)^{r \cdot a} }[/math]

This is the Hadamard transform of [math]\displaystyle{ f }[/math], considering the input to [math]\displaystyle{ f }[/math] and [math]\displaystyle{ \widehat{f} }[/math] as boolean strings.

In terms of the above formulation where the Hadamard transform multiplies a vector of [math]\displaystyle{ 2^n }[/math] complex numbers [math]\displaystyle{ v }[/math] on the left by the Hadamard matrix [math]\displaystyle{ H_n }[/math] the equivalence is seen by taking [math]\displaystyle{ f }[/math] to take as input the bit string corresponding to the index of an element of [math]\displaystyle{ v }[/math], and having [math]\displaystyle{ f }[/math] output the corresponding element of [math]\displaystyle{ v }[/math].

Compare this to the usual discrete Fourier transform which when applied to a vector [math]\displaystyle{ v }[/math] of [math]\displaystyle{ 2^n }[/math] complex numbers instead uses characters of the cyclic group [math]\displaystyle{ \Z / 2^n \Z }[/math].

Computational complexity

In the classical domain, the Hadamard transform can be computed in [math]\displaystyle{ n \log n }[/math] operations ([math]\displaystyle{ n = 2^m }[/math]), using the fast Hadamard transform algorithm.

In the quantum domain, the Hadamard transform can be computed in [math]\displaystyle{ O(1) }[/math] time, as it is a quantum logic gate that can be parallelized.

Quantum computing applications

The Hadamard transform is used extensively in quantum computing. The 2 × 2 Hadamard transform [math]\displaystyle{ H_1 }[/math] is the quantum logic gate known as the Hadamard gate, and the application of a Hadamard gate to each qubit of an [math]\displaystyle{ n }[/math]-qubit register in parallel is equivalent to the Hadamard transform [math]\displaystyle{ H_n }[/math].

Hadamard gate

In quantum computing, the Hadamard gate is a one-qubit rotation, mapping the qubit-basis states [math]\displaystyle{ |0 \rangle }[/math] and [math]\displaystyle{ |1 \rangle }[/math] to two superposition states with equal weight of the computational basis states [math]\displaystyle{ |0 \rangle }[/math] and [math]\displaystyle{ |1 \rangle }[/math]. Usually the phases are chosen so that [math]\displaystyle{ H=\frac{|0\rangle+|1\rangle}{\sqrt{2}}\langle0|+\frac{|0\rangle-|1\rangle}{\sqrt{2}}\langle1| }[/math]

in Dirac notation. This corresponds to the transformation matrix [math]\displaystyle{ H_1=\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} }[/math] in the [math]\displaystyle{ |0 \rangle , |1 \rangle }[/math] basis, also known as the computational basis. The states [math]\displaystyle{ \frac{\left|0\right\rangle + \left|1\right\rangle}{\sqrt{2}} }[/math] and [math]\displaystyle{ \frac{\left|0\right\rangle - \left|1\right\rangle}{\sqrt{2}} }[/math] are known as [math]\displaystyle{ \left|\boldsymbol{+}\right\rangle }[/math] and [math]\displaystyle{ \left|\boldsymbol{-}\right\rangle }[/math] respectively, and together constitute the polar basis in quantum computing.

Hadamard gate operations

[math]\displaystyle{ \begin{align} H(|0\rangle) &= \frac{1}{\sqrt 2}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle =: |+\rangle\\ H(|1\rangle) &= \frac{1}{\sqrt 2}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle =: |-\rangle\\ H(|+\rangle) &= H\left( \frac{1}{\sqrt 2}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle \right) = \frac{1}{2}\Big( |0\rangle + |1\rangle\Big) + \frac{1}{2}\Big(|0\rangle - |1\rangle\Big) = |0\rangle\\ H(|-\rangle) &= H\left( \frac{1}{\sqrt 2}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle \right) = \frac{1}{2}\Big( |0\rangle + |1\rangle\Big) - \frac{1}{2}\Big(|0\rangle - |1\rangle\Big) = |1\rangle \end{align} }[/math]

One application of the Hadamard gate to either a 0 or 1 qubit will produce a quantum state that, if observed, will be a 0 or 1 with equal probability (as seen in the first two operations). This is exactly like flipping a fair coin in the standard probabilistic model of computation. However, if the Hadamard gate is applied twice in succession (as is effectively being done in the last two operations), then the final state is always the same as the initial state.

Hadamard transform in quantum algorithms

Computing the quantum Hadamard transform is simply the application of a Hadamard gate to each qubit individually because of the tensor product structure of the Hadamard transform. This simple result means the quantum Hadamard transform requires [math]\displaystyle{ \lg n }[/math] operations, compared to the classical case of [math]\displaystyle{ n\lg n }[/math] operations.

Many quantum algorithms use the Hadamard transform as an initial step, since it maps m qubits initialized with [math]\displaystyle{ |0 \rangle }[/math] to a superposition of all 2m orthogonal states in the [math]\displaystyle{ |0 \rangle , |1 \rangle }[/math] basis with equal weight. For example, this is used in the Deutsch–Jozsa algorithm, Simon's algorithm, the Bernstein–Vazirani algorithm, and in Grover's algorithm. Note that Shor's algorithm uses both an initial Hadamard transform, as well as the quantum Fourier transform, which are both types of Fourier transforms on finite groups; the first on [math]\displaystyle{ (\Z / 2 \Z)^n }[/math] and the second on [math]\displaystyle{ \Z /2^n \Z }[/math].

Molecular phylogenetics (evolutionary biology) applications

The Hadamard transform can be used to estimate phylogenetic trees from molecular data.[5][6][7] Phylogenetics is the subfield of evolutionary biology focused on understanding the relationships among organisms. A Hadamard transform applied to a vector (or matrix) of site pattern frequencies obtained from a DNA multiple sequence alignment can be used to generate another vector that carries information about the tree topology. The invertible nature of the phylogenetic Hadamard transform also allows the calculation of site likelihoods from a tree topology vector, allowing one to use the Hadamard transform for maximum likelihood estimation of phylogenetic trees. However, the latter application is less useful than the transformation from the site pattern vector to the tree vector because there are other ways to calculate site likelihoods[8][9] that are much more efficient. However, the invertible nature of the phylogenetic Hadamard transform does provide an elegant tool for mathematic phylogenetics.[10][11]

The mechanics of the phylogenetic Hadamard transform involve the calculation of a vector [math]\displaystyle{ \gamma(T) }[/math] that provides information about the topology and branch lengths for tree [math]\displaystyle{ T }[/math] using the site pattern vector or matrix [math]\displaystyle{ s(T) }[/math].

[math]\displaystyle{ \gamma(T) = H^{-1}(\ln(Hs(T))) }[/math] where [math]\displaystyle{ H }[/math] is the Hadamard matrix of the appropriate size. This equation can be rewritten as a series of three equations to simplify its interpretation:

[math]\displaystyle{ \begin{align} r &= H s(T) \\ \rho &= \ln r \\ \gamma(T) &= H^{-1}\rho \end{align} }[/math]

The invertible nature of this equation allows one to calculate an expected site pattern vector (or matrix) as follows:

[math]\displaystyle{ s(T)=H^{-1}(\exp(H\gamma(T))) }[/math]

We can use the Cavender–Farris–Neyman (CFN) two-state substitution model for DNA by encoding the nucleotides as binary characters (the purines A and G are encoded as R and the pyrimidines C and T are encoded as Y). This makes it possible to encode the multiple sequence alignment as the site pattern vector [math]\displaystyle{ s(T) }[/math] that can be converted to a tree vector [math]\displaystyle{ \gamma(T) }[/math], as shown in the following example:

Example showing the Hadamard transform for a specific tree (values for worked example adapted from Waddell et al. 1997[12])
Index Binary pattern Alignment patterns [math]\displaystyle{ \gamma(T) }[/math] [math]\displaystyle{ \rho = H^{-1}\gamma(T) }[/math] [math]\displaystyle{ r = \exp(\rho) }[/math] [math]\displaystyle{ s(T) = H^{-1}\rho }[/math]
0 0000 RRRR and YYYY −0.475 0 1 0.6479
1 0001 RRRY and YYYR 0.2 −0.5 0.6065 0.1283
2 0010 RRYR and YYRY 0.025 −0.15 0.8607 0.02
3* 0011 RRYY and YYRR 0.025 −0.45 0.6376 0.0226
4 0100 RYRR and YRYY 0.2 −0.45 0.6376 0.1283
5* 0101 RYRY and YRYR 0 −0.85 0.4274 0.0258
6* 0110 RYYR and YRRY 0 −0.5 0.6065 0.0070
7 0111 RYYY and YRRR 0.025 −0.9 0.4066 0.02

The example shown in this table uses the simplified three equation scheme and it is for a four taxon tree that can be written as ((A,B),(C,D)); in newick format. The site patterns are written in the order ABCD. This particular tree has two long terminal branches (0.2 transversion substitutions per site), two short terminal branches (0.025 transversion substitutions per site), and a short internal branch (0.025 transversion substitutions per site); thus, it would be written as ((A:0.025,B:0.2):0.025,(C:0.025,D:0.2)); in newick format. This tree will exhibit long branch attraction if the data are analyzed using the maximum parsimony criterion (assuming the sequence analyzed is long enough for the observed site pattern frequencies to be close to the expected frequencies shown in the [math]\displaystyle{ s(T)=H^{-1}\rho }[/math] column). The long branch attraction reflects the fact that the expected number of site patterns with index 6 -- which support the tree ((A,C),(B,D)); -- exceed the expected number of site patterns that support the true tree (index 4). Obviously, the invertible nature of the phylogenetic Hadamard transform means that the tree vector means that the tree vector [math]\displaystyle{ \gamma(T) }[/math] corresponds to the correct tree. Parsimony analysis after the transformation is therefore statistically consistent,[13] as would be a standard maximum likelihood analysis using the correct model (in this case the CFN model).

Note that the site pattern with 0 corresponds to the sites that have not changed (after encoding the nucleotides as purines or pyrimidines). The indices with asterisks (3, 5, and 6) are "parsimony-informative", and. the remaining indices represent site patterns where a single taxon differs from the other three taxa (so they are the equivalent of terminal branch lengths in a standard maximum likelihood phylogenetic tree).

If one wishes to use nucleotide data without recoding as R and Y (and ultimately as 0 and 1) it is possible to encode the site patterns as a matrix. If we consider a four-taxon tree there are a total of 256 site patterns (four nucleotides to the 4th power). However, symmetries of the Kimura three-parameter (or K81) model allow us to reduce the 256 possible site patterns for DNA to 64 patterns, making it possible to encode the nucleotide data for a four-taxon tree as an 8 × 8 matrix[14] in a manner similar to the vector of 8 elements used above for transversion (RY) site patterns. This is accomplished by recoding the data using the Klein four-group:

Klein four-group coding for phylogenetic Hadamard transform
Nucleotide 1 Nucleotide 2 Nucleotide 3 Nucleotide 4
A (0,0) G (1,0) C (0,1) T (1,1)
C (0,0) T (1,0) A (0,1) G (1,1)
G (0,0) A (1,0) T (0,1) C (1,1)
T (0,0) C (1,0) G (0,1) A (1,1)

As with RY data, site patterns are indexed relative to the base in the arbitrarily chosen first taxon with the bases in the subsequent taxa encoded relative to that first base. Thus, the first taxon receives the bit pair (0,0). Using those bit pairs one can produce two vectors similar to the RY vectors and then populate the matrix using those vectors. This can be illustrated using the example from Hendy et al. (1994),[14] which is based on a multiple sequence alignment of four primate hemoglobin pseudogenes:

Example of encoded sequence alignment (from Hendy et al. 1994[14]) (values are counts out of 9879 sites)
0 8 16 24 32 40 48 56
0 8988 9 10 12 24 90
1 41 9 **
2 45 13
3 54* 14 3
4 94 20
5 1
6 2 2
7 356 1 1 75

The much larger number of site patterns in column 0 reflects the fact that column 0 corresponds to transition differences, which accumulate more rapidly than transversion differences in virtually all comparisons of genomic regions (and definitely accumulate more rapidly in the hemoglobin pseudogenes used for this worked example[15]). If we consider the site pattern AAGG it would to binary pattern 0000 for the second element of the Klein group bit pair and 0011 for the first element. in this case binary pattern based on the first element first element corresponds to index 3 (so row 3 in column 0; indicated with a single asterisk in the table). The site patterns GGAA, CCTT, and TTCC would be encoded in the exact same way. The site pattern AACT would be encoded with binary pattern 0011 based on the second element and 0001 based on the first element; this yields index 1 for the first element and index 3 for the second. The index based on the second Klein group bit pair is multiplied by 8 to yield the column index (in this case it would be column 24) The cell that would include the count of AACT site patterns is indicated with two asterisks; however, the absence of a number in the example indicates that the sequence alignment include no AACT site patterns (likewise, CCAG, GGTC, and TTGA site patterns, which would be encoded in the same way, are absent).

Other applications

The Hadamard transform is also used in data encryption, as well as many signal processing and data compression algorithms, such as JPEG XR and MPEG-4 AVC. In video compression applications, it is usually used in the form of the sum of absolute transformed differences. It is also a crucial part of significant number of algorithms in quantum computing. The Hadamard transform is also applied in experimental techniques such as NMR, mass spectrometry and crystallography. It is additionally used in some versions of locality-sensitive hashing, to obtain pseudo-random matrix rotations.

See also

External links

References

  1. Compare Figure 1 in Townsend, W. J.; Thornton, M. A.. Walsh Spectrum Computations Using Cayley Graphs. 
  2. 2.0 2.1 Kunz, H.O. (1979). "On the Equivalence Between One-Dimensional Discrete Walsh–Hadamard and Multidimensional Discrete Fourier Transforms". IEEE Transactions on Computers 28 (3): 267–8. doi:10.1109/TC.1979.1675334. 
  3. Fourier Analysis of Boolean Maps– A Tutorial –, pp. 12–13
  4. Lecture 5: Basic quantum algorithms, Rajat Mittal, pp. 4–5
  5. Hendy, Michael D.; Penny, David (December 1989). "A Framework for the Quantitative Study of Evolutionary Trees". Systematic Zoology 38 (4): 297. doi:10.2307/2992396. https://academic.oup.com/sysbio/article-lookup/doi/10.2307/2992396. 
  6. Hendy, Michael D.; Penny, David (January 1993). "Spectral analysis of phylogenetic data" (in en). Journal of Classification 10 (1): 5–24. doi:10.1007/BF02638451. ISSN 0176-4268. http://link.springer.com/10.1007/BF02638451. 
  7. Székely, L. A., Erdős, P. L., Steel, M. A., & Penny, D. (1993). A Fourier inversion formula for evolutionary trees. Applied mathematics letters, 6(2), 13–16.
  8. Felsenstein, Joseph (November 1981). "Evolutionary trees from DNA sequences: A maximum likelihood approach" (in en). Journal of Molecular Evolution 17 (6): 368–376. doi:10.1007/BF01734359. ISSN 0022-2844. PMID 7288891. Bibcode1981JMolE..17..368F. http://link.springer.com/10.1007/BF01734359. 
  9. Stamatakis, Alexandros (2019), Warnow, Tandy, ed., "A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations" (in en), Bioinformatics and Phylogenetics, Computational Biology (Cham: Springer International Publishing) 29: pp. 1–19, doi:10.1007/978-3-030-10837-3_1, ISBN 978-3-030-10836-6, http://link.springer.com/10.1007/978-3-030-10837-3_1, retrieved 2020-10-10 
  10. Chor, Benny; Hendy, Michael D.; Holland, Barbara R.; Penny, David (2000-10-01). "Multiple Maxima of Likelihood in Phylogenetic Trees: An Analytic Approach" (in en). Molecular Biology and Evolution 17 (10): 1529–1541. doi:10.1093/oxfordjournals.molbev.a026252. ISSN 1537-1719. PMID 11018159. http://academic.oup.com/mbe/article/17/10/1529/956181. 
  11. Matsen, Frederick A.; Steel, Mike (2007-10-01). Ané, Cécile; Sullivan, Jack. eds. "Phylogenetic Mixtures on a Single Tree Can Mimic a Tree of Another Topology" (in en). Systematic Biology 56 (5): 767–775. doi:10.1080/10635150701627304. ISSN 1076-836X. PMID 17886146. 
  12. Waddell, Peter J; Steel, M.A (December 1997). "General Time-Reversible Distances with Unequal Rates across Sites: Mixing Γ and Inverse Gaussian Distributions with Invariant Sites" (in en). Molecular Phylogenetics and Evolution 8 (3): 398–414. doi:10.1006/mpev.1997.0452. PMID 9417897. 
  13. Steel, M. A.; Hendy, M. D.; Penny, D. (1993-12-01). "Parsimony Can Be Consistent!" (in en). Systematic Biology 42 (4): 581–587. doi:10.1093/sysbio/42.4.581. ISSN 1063-5157. https://academic.oup.com/sysbio/article-lookup/doi/10.1093/sysbio/42.4.581. 
  14. 14.0 14.1 14.2 Hendy, M. D.; Penny, D.; Steel, M. A. (1994-04-12). "A discrete Fourier analysis for evolutionary trees." (in en). Proceedings of the National Academy of Sciences 91 (8): 3339–3343. doi:10.1073/pnas.91.8.3339. ISSN 0027-8424. PMID 8159749. Bibcode1994PNAS...91.3339H. 
  15. Miyamoto, M. M.; Koop, B. F.; Slightom, J. L.; Goodman, M.; Tennant, M. R. (1988-10-01). "Molecular systematics of higher primates: genealogical relations and classification." (in en). Proceedings of the National Academy of Sciences 85 (20): 7627–7631. doi:10.1073/pnas.85.20.7627. ISSN 0027-8424. PMID 3174657. Bibcode1988PNAS...85.7627M.