The growing volume of customer support tickets in the e-commerce industry makes it difficult to manage unstructured text data efficiently. Traditional manual categorization is neither efficient nor scalable as the data grows. This study proposes a text mining framework that integrates Natural Language Processing (NLP) techniques with Agglomerative Hierarchical Clustering (AHC) to automatically group customer support tickets by the similarity of their textual content. The framework comprises preprocessing (cleaning, tokenization, stopword removal, and lemmatization), feature extraction using Term Frequency–Inverse Document Frequency (TF-IDF), and dimensionality reduction using Principal Component Analysis (PCA). The clustered data are then visualized through dendrograms and evaluated using silhouette scores to determine the optimal number of clusters. Using a real-world dataset of 8,469 support tickets, the framework identified an optimal two-cluster configuration that distinguishes general inquiries from specific error-related complaints. Unlike previous studies that use K-Means or DBSCAN, this framework leverages the hierarchical structure to capture nuanced textual similarities without requiring the number of clusters to be specified in advance. It also introduces an integrated evaluation and visualization pipeline tailored for operational customer support use. However, because AHC has high computational complexity, the approach is better suited to daily batch clustering than to real-time processing. Alternatives such as Mini-Batch K-Means should also be considered for more efficient implementation. This study contributes to the development of automated triage systems and strategies for improving customer experience on digital platforms.
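The pipeline described above (TF-IDF, PCA, AHC, silhouette-based selection of the cluster count) can be illustrated with a minimal sketch. This is not the study's implementation: it assumes scikit-learn, a handful of hypothetical toy tickets in place of the real dataset, Ward linkage, and two PCA components, none of which are specified in the abstract. Lemmatization and dendrogram plotting are left out for brevity.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy tickets; the study's dataset is not reproduced here.
tickets = [
    "How do I change my delivery address?",
    "What are your delivery times for international orders?",
    "Can I change the delivery date of my order?",
    "Where can I find my order invoice?",
    "Error 500 when I try to pay with my credit card",
    "Payment page shows error after I enter credit card details",
    "App crashes with error code when I open the payment page",
    "Credit card payment fails with an error message",
]

# TF-IDF features with lowercasing and built-in English stop-word removal.
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")
X = tfidf.fit_transform(tickets).toarray()  # PCA expects a dense array

# Dimensionality reduction; 2 components is an arbitrary choice for toy data.
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X)

# Fit AHC for several candidate cluster counts and score each partition.
# Ward linkage is an assumption; the abstract does not name the linkage.
scores = {}
labels_by_k = {}
for k in range(2, 5):
    model = AgglomerativeClustering(n_clusters=k, linkage="ward")
    labels_by_k[k] = model.fit_predict(X_reduced)
    scores[k] = silhouette_score(X_reduced, labels_by_k[k])

# Keep the cluster count with the highest silhouette score.
best_k = max(scores, key=scores.get)
labels = labels_by_k[best_k]
print("best k:", best_k)
print("labels:", labels.tolist())
```

On a corpus of several thousand tickets, the dense TF-IDF matrix and the pairwise distances used by AHC grow quickly, which is consistent with the abstract's remark that the approach fits batch processing better than real-time use.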
Copyrights © 2025