List of text mining methods

Text mining is the process of extracting information from unstructured text and finding patterns or relations in it. Text mining methods are the techniques used to do this, and the choice among them depends on their suitability for a given data set. Below is a list of text mining methods.

Centroid-based Clustering: An unsupervised learning method. Each cluster is represented by a central point (centroid), and data points are assigned to the cluster whose centroid is nearest.^[1]
- Fast Global K-Means: A variant designed to accelerate Global K-Means.^[2]
- Global K-Means: Begins with a single cluster and then increases the number of clusters step by step until the required number is reached.^[2]
- K-Means: An algorithm that takes two inputs: K, the number of clusters, and a data set.^[2]
- FW-K-Means: Used with the vector space model. Applies feature weighting to reduce noise.^[2]
- Two-Level-K-Means: The regular K-Means algorithm runs first. Clusters that do not reach a threshold are then selected for subdivision into subclasses.^[2]
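
The basic K-Means entry above can be illustrated with a minimal sketch. It assumes the standard assign-then-update iteration (often called Lloyd's algorithm), squared Euclidean distance, and random initial centroids drawn from the data; the function name `k_means` and its `iterations` and `seed` parameters are hypothetical and not taken from the cited sources.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into `k` groups.

    Returns the final centroids and one cluster label per point.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # No label changed, so the clustering has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, k=2)
print(labels)
```

In a text mining setting the tuples would typically be document vectors, for example term weights in a vector space model, rather than two-dimensional points.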
Cluster Algorithm
- Hierarchical Clustering
  - Agglomerative Clustering: Bottom-up approach. Each data point starts as its own cluster, and clusters are then merged to form larger clusters.^[3]
  - Divisive Clustering: Top-down approach. Large clusters are split into smaller clusters.^[3]
- Density-based Clustering: Clusters are determined by the density of data points.^[4]
  - DBSCAN
- Distribution-based Clustering: Clusters are formed by fitting probability distributions to the data.^[1]
  - Expectation-maximization algorithm
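
The bottom-up approach of agglomerative clustering listed above can be sketched as follows. This is a minimal illustration that assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; the list does not specify a linkage rule, and the function name `agglomerative` and the `n_clusters` stopping rule are hypothetical.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until `n_clusters` remain."""
    # Bottom-up start: every point is its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Assumption: single linkage, i.e. the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # Safe because j > i.
    return clusters


data = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(data, n_clusters=3))
```

Divisive clustering runs in the opposite direction: it would start from one cluster containing all points and split it repeatedly.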
Collocation
Stemming Algorithm
- Truncating Methods: Remove the suffix or prefix of a word.
  - Lovins Stemmer: Removes the longest suffix from a word.
  - Porter Stemmer: Removes suffixes by applying a sequence of rewrite rules in steps. The related Snowball framework allows programmers to write their own stemmers.
- Statistical Methods: Rely on a statistical procedure, which typically results in affixes being removed.
  - N-Gram Stemmer: Groups related words by the n-grams they share, where an n-gram is a set of n consecutive characters taken from a word.
  - Hidden Markov Model (HMM) Stemmer: Transitions between states are governed by probability functions.
  - Yet Another Suffix Stripper (YASS) Stemmer: Uses a hierarchical approach to create clusters. Each cluster is then treated as a class of equivalent words, and its centroid is the stem.
- Inflectional & Derivational Methods
  - Krovetz Stemmer: Changes words to word stems that are valid English words.
  - Xerox Stemmer: Removes prefixes.^[5]
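
The n-gram idea from the stemming list above can be illustrated with a short sketch. It assumes character bigrams and the Dice coefficient as the similarity measure, which is one common choice and is not stated in the list itself; the function names `char_ngrams` and `dice_similarity` are hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the set of unique runs of n consecutive characters in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the unique character n-grams of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # A word shorter than n characters has no n-grams.
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# "statistics" has 7 unique bigrams, "statistical" has 8, and they share 6.
print(dice_similarity("statistics", "statistical"))  # 0.8
```

Words whose similarity exceeds a chosen threshold could then be grouped together, for example with one of the clustering methods listed earlier, so that each group plays the role of a stem class.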
Term Frequency
- Term Frequency Inverse Document Frequency
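
As a rough illustration of term frequency–inverse document frequency, the sketch below weights each term by its relative frequency within a document multiplied by the natural logarithm of the number of documents divided by the number of documents containing the term. Several variants of the formula exist, so this particular weighting is an assumption, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights.

    `documents` is a list of token lists. Returns one {term: weight}
    dictionary per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency (relative) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["text", "mining", "text"], ["data", "mining"]]
for weights in tf_idf(docs):
    print(weights)
```

A term that appears in every document, such as "mining" in this toy corpus, receives a weight of zero, while terms concentrated in few documents are weighted more heavily.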
Topic Modeling
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
- Non-Negative Matrix Factorization (NMF)
- Bidirectional Encoder Representations from Transformers (BERT)
Wordscores: First, scores for word types are estimated from reference texts. Next, these word scores are applied to a text that is not a reference text to obtain a document score. Lastly, the scores of the non-reference documents are rescaled so that they can be compared with those of the reference texts.^[6]
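
The first two Wordscores steps described above can be sketched as follows. The sketch assumes a common formulation in which a word's score is the average of the reference scores, weighted by the word's relative frequency in each reference text, and a document's score is the mean word score over its scored tokens. The final rescaling step is omitted. The function names and the toy texts and scores are hypothetical.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in a tokenized text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_scores):
    """Step 1: estimate a score for every word seen in the reference texts."""
    freqs = [relative_freqs(tokens) for tokens in reference_texts]
    vocabulary = set().union(*freqs)
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs)
        # Weight each reference score by the share of the word's relative
        # frequency that comes from that reference text.
        scores[word] = sum(
            f.get(word, 0.0) / total * score
            for f, score in zip(freqs, reference_scores)
        )
    return scores


def document_score(tokens, scores):
    """Step 2: score a non-reference text as the mean score of its scored words."""
    scored = [word for word in tokens if word in scores]
    if not scored:
        return None  # No overlap with the reference vocabulary.
    return sum(scores[word] for word in scored) / len(scored)


# Hypothetical reference texts placed at -1 and +1 on some scale.
references = [["tax", "cut", "cut"], ["tax", "spend", "spend"]]
scores = word_scores(references, [-1.0, 1.0])
print(document_score(["cut", "spend", "spend", "unknown"], scores))
```

Words that never occur in a reference text, such as "unknown" here, carry no score and are ignored when the document is scored.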

References

[1] "Different Types of Clustering Algorithm". 2018-01-15. https://www.geeksforgeeks.org/different-types-clustering-algorithm/.
[2] Jalil, Abdennour Mohamed; Hafidi, Imad; Alami, Lamiae; Khouribga, Ensa (2016). "Comparative Study of Clustering Algorithms in Text Mining Context". International Journal of Interactive Multimedia and Artificial Intelligence 3 (7): 42. doi:10.9781/ijimai.2016.376. ISSN 1989-1660. https://reunir.unir.net/bitstream/123456789/11227/1/ijimai20163_7_6_pdf_27159.pdf.
[3] "Agglomerative Methods in Machine Learning". 2021-02-01. https://www.geeksforgeeks.org/agglomerative-methods-in-machine-learning/.
[4] Hahsler, Michael. "dbscan: Fast Density-based Clustering with R". https://cran.r-project.org/web/packages/dbscan/vignettes/dbscan.pdf.
[5] Ganesh Jivani, Anjali. "A Comparative Study of Stemming Algorithms". https://kenbenoit.net/assets/courses/tcd2014qta/readings/Jivani_ijcta2011020632.pdf.
[6] Lowe, Will (2008). "Understanding Wordscores". Methods and Data Institute, School of Politics and International Relations, University of Nottingham, Nottingham. doi:10.2139/ssrn.1095280. ISSN 1556-5068. https://faculty.washington.edu/jwilker/559/Lowe.pdf.

Original source: https://en.wikipedia.org/wiki/List of text mining methods.
