Enhanced Text Clustering Approach using Hierarchical Agglomerative Clustering with Principal Components Analysis to Design Document Recommendation System

Authors

  • Gauri Chaudhary, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur, India
  • Manali Kshirsagar, Rajiv Gandhi College of Engineering and Research, Wanadongri, Nagpur, India

Keywords:

Data Mining, Hierarchical Agglomerative Clustering, Principal Components Analysis, Text Clustering

Abstract

Given today's increasing use of, and dependence on, electronic data, a substantial part of which is textual, it becomes necessary to devise scientific methods to infer and extract knowledge from these abundant electronic documents for strategic decision making in any target domain. The purpose of this study is to develop a common platform where similar text from multiple source documents on the internet can be fetched and grouped using text mining and document clustering techniques. This chapter elaborates the hierarchical agglomerative text clustering approach for identifying groups of similar documents. It also describes how Principal Components Analysis is applied to text data. A combination of the two methods is then proposed to find suitable clusters in text data, and the results obtained show better-quality clusters. For the experiments, plot summaries of movies from Wikipedia are used as the source document corpus. Various document pre-processing techniques are also explained and applied to the documents. The clusters of similar movies produced by the proposed method can be used to make recommendations to users. The algorithms are implemented, and the results visualized, in R.
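The pipeline described in the abstract (pre-processing, a term-weighted document matrix, Principal Components Analysis, then hierarchical agglomerative clustering, with clusters used for recommendation) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' R implementation. The stop-word list, TF-IDF weighting, Euclidean distance, average linkage, number of components, the helper names (`preprocess`, `tfidf_matrix`, `pca_scores`, `hac_average`, `recommend`), and the four one-line "plot summaries" are all hypothetical choices made for the example, not the study's settings or data.

```python
import re
import numpy as np

# Hypothetical minimal stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "at", "after", "for", "is", "on"}

def preprocess(text):
    """Lower-case, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_matrix(docs):
    """Build a documents x terms matrix weighted by raw tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for t in toks:
            tf[i, index[t]] += 1
    df = (tf > 0).sum(axis=0)           # number of documents containing each term
    idf = np.log(len(docs) / df)
    return tf * idf, vocab

def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (SVD of centred X)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def hac_average(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters with
    the smallest average pairwise Euclidean distance (average linkage)."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                  # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def recommend(movie_index, labels):
    """Recommend the other documents that share a cluster with the given one."""
    return [j for j, lab in enumerate(labels)
            if lab == labels[movie_index] and j != movie_index]

# Hypothetical toy "plot summaries" standing in for the Wikipedia corpus.
docs = [
    "A detective hunts a killer after a murder in the city.",
    "A police detective hunts the killer behind a murder.",
    "A young wizard learns dark magic at a magic school.",
    "A wizard fights dark magic to protect the school.",
]

X, vocab = tfidf_matrix(docs)            # 4 documents x 15 terms
scores = pca_scores(X, n_components=3)   # 15 term dimensions -> 3 components
labels = hac_average(scores, n_clusters=2)
print(labels)
print(recommend(0, labels))
```

On this toy input the two crime plots and the two wizard plots fall into separate clusters, so `recommend(0, labels)` suggests only the second crime plot. With only four documents, three components already capture all of the variance; on a real corpus one would typically keep far fewer components than there are terms, which is where PCA removes noise and reduces the cost of the pairwise distance computation.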

Published

2021-03-30

How to Cite

Chaudhary, G., & Kshirsagar, M. (2021). Enhanced Text Clustering Approach using Hierarchical Agglomerative Clustering with Principal Components Analysis to Design Document Recommendation System. Research Transcripts in Computer, Electrical and Electronics Engineering, 2, 1–18. Retrieved from https://grinrey.com/journals/index.php/rtceee/article/view/9