Confusion matrix

Short description: Table layout for visualizing performance; also called an error matrix

In machine learning, a confusion matrix, also known as an error matrix,^[1] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. In unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification.

Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature.^[2] The diagonal of the matrix therefore represents all instances that are correctly predicted.^[3] The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli.^[4]

It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).

Example

Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:

Individual number	1	2	3	4	5	6	7	8	9	10	11	12
Actual classification	1	1	1	1	1	1	1	1	0	0	0	0

Assume that we have a classifier that distinguishes between individuals with and without cancer in some way. We can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and 3 errors: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted to have cancer (sample 9).

Individual number	1	2	3	4	5	6	7	8	9	10	11	12
Actual classification	1	1	1	1	1	1	1	1	0	0	0	0
Predicted classification	0	0	1	1	1	1	1	1	1	0	0	0

Notice that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.

We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable.

Individual number	1	2	3	4	5	6	7	8	9	10	11	12
Actual classification	1	1	1	1	1	1	1	1	0	0	0	0
Predicted classification	0	0	1	1	1	1	1	1	1	0	0	0
Result	FN	FN	TP	TP	TP	TP	TP	TP	FP	TN	TN	TN
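The labelling in the Result row can be reproduced mechanically. The following is a minimal Python sketch, assuming the two rows above are stored as plain lists; the helper name `outcome` is introduced here purely for illustration.

```python
# Actual and predicted labels copied from the table above
# (1 = cancer / positive, 0 = cancer-free / negative).
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a, p):
    """Name the outcome for one (actual, predicted) pair.

    Illustrative helper, not part of any library.
    """
    if a == 1:
        return "TP" if p == 1 else "FN"
    return "FP" if p == 1 else "TN"

# One outcome label per individual, in the same order as the table.
results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(results)
```

The printed list should match the Result row entry by entry.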

The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:

		Predicted condition
	Total population = P + N	Positive (PP)	Negative (PN)
Actual condition	Positive (P)	True positive (TP)	False negative (FN)
Actual condition	Negative (N)	False positive (FP)	True negative (TN)
Sources:^[5]^[6]^[2]^[7]^[8]^[9]^[10]

The color convention of the three data tables above was chosen to match this confusion matrix, so that the data can be easily differentiated.

Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:

		Predicted condition
	Total 8 + 4 = 12	Cancer 7	Non-cancer 5
Actual condition	Cancer 8	6	2
Actual condition	Non-cancer 4	1	3

In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located on the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as these are the values off the diagonal. By summing each of the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. $P = TP + FN$ and $N = FP + TN$.
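The totals in this matrix can be checked with a short Python sketch. It assumes the same two label lists as the example tables and uses the row = actual, column = predicted convention with the positive class first; the variable names are illustrative only.

```python
from collections import Counter

# Labels from the worked example (1 = cancer, 0 = cancer-free).
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Count how often each (actual, predicted) pair occurs.
counts = Counter(zip(actual, predicted))
tp, fn = counts[(1, 1)], counts[(1, 0)]
fp, tn = counts[(0, 1)], counts[(0, 0)]

# Rows are actual classes, columns are predicted classes.
matrix = [[tp, fn],
          [fp, tn]]

# Row sums recover the class totals; column sums give the predicted totals.
p, n = tp + fn, fp + tn
pp, pn = tp + fp, fn + tn

print(matrix)          # 2x2 confusion matrix
print(p, n, pp, pn)    # P, N, PP, PN
```

The row sums give P = 8 and N = 4, and the column sums give the predicted totals 7 and 5 shown in the table header.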

Table of confusion

In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.

For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class and a 0% recognition rate for the non-cancer class. The F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer).
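The figures in this example can be reproduced with a few lines of Python. This is a sketch under the usual textbook definitions, with informedness taken as sensitivity + specificity − 1; the four counts are read off the scenario described above.

```python
# Scenario above: 95 cancer samples, 5 non-cancer samples,
# and a classifier that labels every sample as cancer.
tp, fn = 95, 0   # all cancer samples are predicted as cancer
fp, tn = 5, 0    # all non-cancer samples are also predicted as cancer

accuracy = (tp + tn) / (tp + fn + fp + tn)
sensitivity = tp / (tp + fn)          # recognition rate for the cancer class
specificity = tn / (tn + fp)          # recognition rate for the non-cancer class
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
informedness = sensitivity + specificity - 1

print(f"accuracy     = {accuracy:.3f}")
print(f"sensitivity  = {sensitivity:.3f}")
print(f"specificity  = {specificity:.3f}")
print(f"F1           = {f1:.4f}")
print(f"informedness = {informedness:.3f}")
```

Accuracy and F1 both look strong here, while informedness is exactly 0, which is the contrast the paragraph describes.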

According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).^[11]
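For reference, the MCC can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning `None` when the denominator is zero is a choice made for this illustration, not something prescribed by the cited source.

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns None when any marginal total is zero, because the formula
    is then 0/0 (an illustrative convention for this sketch).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return None
    return (tp * tn - fp * fn) / denominator

# 12-person cancer example from the Example section: TP=6, FN=2, FP=1, TN=3.
print(mcc(6, 2, 1, 3))

# 95/5 example where everything is predicted as cancer: no negative predictions.
print(mcc(95, 0, 5, 0))
```

For the 12-person example the value is roughly 0.48. For the always-predict-cancer classifier the formula is undefined because no sample is predicted negative.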

Other metrics can be included in a confusion matrix, each with its own significance and use.


Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability.^[12]

Confusion matrices with more than two categories

The confusion matrices discussed above have only two conditions, positive and negative, but a confusion matrix is not limited to binary classification and can be used with multi-class classifiers as well. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity.^[13]

Vowel produced \ Perceived vowel	i	a	o	u
i	15	1
e	1	1
a		79	5
o		4	15	3
u			2	2
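A multi-class confusion matrix is built the same way as the binary one, with one row and one column per class. The following Python sketch uses made-up labels (`cat`, `dog`, `bird`) purely for illustration; they are not the whistled-language data, and the function name `confusion_matrix` is a local helper defined here rather than a library call.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a square count matrix: rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical toy data, invented for this example.
classes   = ["cat", "dog", "bird"]
actual    = ["cat", "cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "cat", "dog", "dog", "bird", "dog", "bird"]

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)

# Correct predictions lie on the diagonal, as in the binary case.
correct = sum(matrix[i][i] for i in range(len(classes)))
print("correct:", correct, "of", len(actual))
```

Off-diagonal cells show which pairs of classes are confused, for instance how often an actual `cat` was predicted as `dog`.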

Confusion matrices in multi-label and soft-label classification

Confusion matrices are not limited to single-label classification (where only one class is present) or hard-label settings (where classes are either fully present, 1, or absent, 0). They can also be extended to multi-label classification (where multiple classes can be predicted at once) and soft-label classification (where classes can be partially present).

One such extension is the Transport-based Confusion Matrix (TCM),^[14] which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix.

TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated.^[14]

TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.^[14]

References

[1] Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment 62 (1): 77–89. doi:10.1016/S0034-4257(97)00083-7. Bibcode: 1997RSEnv..62...77S.
[2] Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies 2 (1): 37–63. https://www.researchgate.net/publication/228529307.
[3] Opitz, Juri (2024). "A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice". Transactions of the Association for Computational Linguistics 12: 820–836. doi:10.1162/tacl_a_00675. https://doi.org/10.1162/tacl_a_00675.
[4] Dobson, James (2024). "On the Confusion Matrix". Configurations 32 (4): 331–350. doi:10.1353/con.2024.a942087. https://doi.org/10.1353/con.2024.a942087.
[5] Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (1. ed., 2. release ed.). Beijing Köln: O'Reilly. ISBN 978-1-4493-6132-7.
[6] Fawcett, Tom (2006). "An Introduction to ROC Analysis". Pattern Recognition Letters 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. Bibcode: 2006PaReL..27..861F. http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf.
[7] Ting, Kai Ming (2011). Sammut, Claude. ed. Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
[8] Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". World Meteorological Organisation. https://www.cawcr.gov.au/projects/verification/.
[9] Chicco, Davide; Jurman, Giuseppe (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMID 31898477.
[10] Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics 17: 168–192. doi:10.1016/j.aci.2018.08.003.
[11] Chicco, Davide; Jurman, Giuseppe (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMID 31898477.
[12] van der Linde, Ian (2025). "Why the confusion matrix fails as a model of knowledge". AI & Society. doi:10.1007/s00146-025-02456-x.
[13] Rialland, Annie (August 2005). "Phonological and phonetic aspects of whistled languages". Phonology 22 (2): 237–271. doi:10.1017/S0952675705000552.
[14] Erbani, Johan; Portier, Pierre-Édouard; Egyed-Zsigmond, Előd; Nurbakova, Diana (2024). "Confusion Matrices: A Unified Theory". IEEE Access (IEEE) 12: 181372–181419. doi:10.1109/ACCESS.2024.3507199. ISSN 2169-3536. Bibcode: 2024IEEEA..12r1372E.


Original source: https://en.wikipedia.org/wiki/Confusion matrix.
