Nearest centroid classifier

Short description: A classification model in machine learning based on centroids

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to each observation the label of the class whose training samples have the mean (centroid) closest to that observation. When applied to text classification using word vectors containing tf-idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.^[1]

An extended version of the nearest centroid classifier, based on shrunken centroids, has found applications in the medical domain, specifically in the classification of tumors.^[2]

Algorithm

Training

Given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_{\ell} = \frac{1}{|C_{\ell}|} \sum_{i \in C_{\ell}} \vec{x}_i$, where $C_{\ell}$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
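The training step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function name `fit_centroids` and the toy data are assumptions made for the example, and samples are assumed to be equal-length lists of numbers.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return a dict mapping each class label to its centroid (mean vector).

    Hypothetical helper for illustration: `samples` is a list of
    equal-length numeric lists, `labels` the matching class labels.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    sums = {}                  # per-class running sum of feature vectors
    counts = defaultdict(int)  # per-class sample count, i.e. |C_l|
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1

    # Divide each class sum by its count to obtain the class mean.
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


# Toy data (made up for this example): two classes in two dimensions.
samples = [[1.0, 1.0], [3.0, 1.0], [8.0, 9.0], [10.0, 11.0]]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(samples, labels)
```

With this toy data, class "a" gets the centroid (2, 1) and class "b" the centroid (9, 10). Training requires a single pass over the data and stores only one vector per class.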

Prediction

The class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_{\ell} - \vec{x}\|$.
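The prediction rule can be sketched as follows. The example assumes the Euclidean norm, which the formula above does not fix, and takes the centroids as a plain dictionary from label to mean vector; the function name `predict` and the centroid values are hypothetical and chosen only for illustration.

```python
import math


def predict(centroids, x):
    """Return the label whose centroid is nearest to x.

    Assumes Euclidean distance. On ties, the label that comes first
    in the dictionary's iteration order is returned.
    """
    def distance(label):
        mu = centroids[label]
        return math.sqrt(sum((m - v) ** 2 for m, v in zip(mu, x)))

    return min(centroids, key=distance)


# Example centroids (assumed, e.g. the output of a training step).
centroids = {"a": [2.0, 1.0], "b": [9.0, 10.0]}
prediction = predict(centroids, [3.0, 2.0])
```

Here the observation (3, 2) lies much closer to the centroid of class "a" than to that of class "b", so `prediction` is "a". Because the square root is monotonic, comparing squared distances would select the same class.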

References

[1] Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press. http://nlp.stanford.edu/IR-book/html/htmledition/rocchio-classification-1.html.
[2] Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences 99 (10): 6567–6572. doi:10.1073/pnas.082099299. PMID 12011421.

Original source: https://en.wikipedia.org/wiki/Nearest centroid classifier.
