DMelt:Statistics/5 Statistical classification


Statistical classification

Statistical classification is a technique for predicting group membership of data instances: it assigns a given input object to one of a fixed number of categories. DMelt includes several libraries for data classification. Supported methods include K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Fisher's Linear Discriminant (FLD), Quadratic Discriminant Analysis (QDA), Regularized Discriminant Analysis (RDA), Logistic Regression (LR), the Maximum Entropy Classifier, Multilayer Perceptron Neural Networks, Radial Basis Function Networks and many others. Many of these algorithms are provided via the Smile Java project (http://haifengl.github.io/). Below are several examples using a Python-like scripting approach, starting with a small K-Nearest Neighbor sketch.
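As a quick illustration of a Smile-backed classifier, the sketch below trains a K-Nearest Neighbor model on a toy two-class dataset. It is a minimal sketch rather than a DataMelt-specific recipe: it assumes the Smile 1.x static KNN.learn API (newer Smile releases use KNN.fit instead) and relies on Jython's automatic conversion of Python lists to Java arrays; the toy numbers are purely illustrative.

from smile.classification import KNN

# Toy training set: two numeric features per instance, two classes (0 and 1).
# The values are illustrative only.
x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # class 0
     [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]   # class 1
y = [0, 0, 0, 1, 1, 1]

knn = KNN.learn(x, y, 3)            # build a 3-nearest-neighbor classifier
print(knn.predict([0.15, 0.10]))    # expected class: 0
print(knn.predict([1.05, 0.95]))    # expected class: 1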


Bayesian classification

The naive Bayes classifier is a technique for constructing models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from a finite set.

This method of classification is implemented in the third-party jsat.classifiers.bayesian.NaiveBayes Java class. Let us consider an example that classifies the Iris data [1]. The script reads the data and then attempts to predict the correct label of each record:

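The members-only script itself is not shown here. The following is a minimal sketch along the lines of the standard JSAT usage pattern; it assumes a local copy of the Iris data in ARFF format under the illustrative file name iris.arff, and it evaluates the classifier on the same records it was trained on.

from java.io import File
from jsat import ARFFLoader
from jsat.classifiers import ClassificationDataSet
from jsat.classifiers.bayesian import NaiveBayes

raw = ARFFLoader.loadArffFile(File("iris.arff"))  # load the Iris records
data = ClassificationDataSet(raw, 0)              # categorical attribute 0 holds the class label

nb = NaiveBayes()
nb.trainC(data)                                   # train the naive Bayes model

errors = 0
for i in range(data.getSampleSize()):
    truth = data.getDataPointCategory(i)
    predicted = nb.classify(data.getDataPoint(i)).mostLikely()
    if predicted != truth:
        errors += 1
print("Misclassified %d out of %d instances" % (errors, data.getSampleSize()))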

When you run this kind of classification on the Iris data, the correct category is predicted with an error of about 4%.


References

  1. Fisher, R.A., "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950)