Enhanced Text Clustering Approach using Hierarchical Agglomerative Clustering with Principal Components Analysis to Design Document Recommendation System
Keywords: Data Mining, Hierarchical Agglomerative Clustering, Principal Components Analysis, Text Clustering
Electronic data is used ever more widely, and a substantial part of it is textual, so systematic methods are needed to extract knowledge from large document collections to support strategic decision making in a target domain. The purpose of this study is to develop a common platform on which similar text from multiple source documents on the internet can be fetched and grouped using text mining and document clustering techniques. This chapter describes hierarchical agglomerative clustering as a way to identify groups of similar documents, and explains how Principal Components Analysis is applied to text data. It then proposes combining the two methods to find suitable clusters in text data; the results obtained show better-quality clusters. Plot summaries of movies from Wikipedia serve as the document corpus for the experiments. Various document pre-processing techniques are also explained and applied to the documents. The resulting clusters of similar movies can be used to recommend movies to users. The algorithms are implemented and the results visualized in R.
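The pipeline summarized in the abstract (pre-processing, Principal Components Analysis, then hierarchical agglomerative clustering) can be illustrated with a minimal sketch. This is not the chapter's R implementation. It is an independent Python illustration under several assumptions that the abstract does not state: TF-IDF weighting, a tiny hand-picked stop-word list, PCA by power iteration, Euclidean distance with average linkage, and four invented one-sentence "plot summaries" standing in for the Wikipedia corpus. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the chapter.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to"}

def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_matrix(docs):
    """Return one TF-IDF row per document (assumed weighting scheme)."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    df = Counter(w for t in tokens for w in set(t))
    n = len(docs)
    rows = []
    for t in tokens:
        tf = Counter(t)
        rows.append([tf[w] / len(t) * math.log(n / df[w]) for w in vocab])
    return rows

def pca(rows, k, iters=200):
    """Project centered rows onto the top-k principal components.

    Components are found by power iteration on X^T X, with each new
    component kept orthogonal to the ones already found.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    components = []
    for _ in range(k):
        v = [float(j + 1) for j in range(d)]  # arbitrary deterministic start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            s = [sum(x[j] * v[j] for j in range(d)) for x in X]           # X v
            w = [sum(X[i][j] * s[i] for i in range(n)) for j in range(d)]  # X^T (X v)
            for c in components:  # remove directions already extracted
                dot = sum(w[j] * c[j] for j in range(d))
                w = [w[j] - dot * c[j] for j in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        components.append(v)
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]

def hac(points, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented toy summaries (not taken from the chapter's corpus).
docs = [
    "A detective and the police hunt the killer in the city",
    "The police and a detective hunt the killer of a banker",
    "The crew of a spaceship meets aliens on a distant planet",
    "A spaceship crew meets aliens on the red planet",
]

reduced = pca(tfidf_matrix(docs), k=2)
clusters = hac(reduced, k=2)
print(clusters)
```

On this toy input the two crime summaries and the two science-fiction summaries should end up in separate clusters. In a recommendation setting, the other members of a movie's cluster would be the candidates to suggest; how the chapter ranks or selects among them is not stated in the abstract.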
Copyright (c) 2021 Grinrey Publications
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.