Effective segmentation of travel data is crucial for deriving actionable insights in the tourism and hospitality sectors. This study evaluates four clustering algorithms (Agglomerative Clustering, DBSCAN, Gaussian Mixture Models (GMM), and KMeans) on a travel dataset using three widely used internal validation metrics: Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Score. The dataset was standardized and then reduced in dimensionality with Principal Component Analysis (PCA) to facilitate visualization and keep computation efficient. The results show significant differences in performance among the algorithms. Agglomerative Clustering achieved the highest Silhouette Score, indicating superior cluster cohesion and separation, while KMeans recorded the highest Calinski-Harabasz Score, indicating high between-cluster dispersion relative to within-cluster dispersion. In contrast, DBSCAN performed poorly, scoring worst on all three metrics, which is attributed to its sensitivity to parameter selection and to density irregularities in the dataset. GMM exhibited moderate performance but struggled with overlapping clusters because of its limitations in modeling non-Gaussian data distributions. Visualization of the clustering results confirmed these findings, revealing compact clusters for Agglomerative Clustering and KMeans and less clearly defined structures for DBSCAN and GMM. This study underscores the importance of selecting clustering algorithms based on dataset characteristics and analysis objectives.
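The evaluation pipeline described above (standardization, PCA, four clustering algorithms, three metrics) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data is a synthetic stand-in generated with `make_blobs`, and the cluster count, `eps`, `min_samples`, and the number of PCA components are illustrative guesses. Excluding DBSCAN noise points before scoring is also an assumption; the abstract does not say how noise was handled.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the travel dataset (assumption: 300 rows, 6 features).
X_raw, _ = make_blobs(n_samples=300, n_features=6, centers=4, random_state=0)

# Preprocessing as described: standardize, then project onto 2 principal components.
X_scaled = StandardScaler().fit_transform(X_raw)
X = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Hyperparameters below are illustrative, not the study's settings.
models = {
    "Agglomerative": AgglomerativeClustering(n_clusters=4),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "GMM": GaussianMixture(n_components=4, random_state=0),
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
}


def evaluate(data, labels):
    """Return the three metrics, or None if fewer than two clusters remain."""
    labels = np.asarray(labels)
    mask = labels != -1  # DBSCAN marks noise as -1; drop it (assumption)
    if len(set(labels[mask])) < 2:
        return None  # the metrics are undefined for a single cluster
    return {
        "silhouette": silhouette_score(data[mask], labels[mask]),
        "davies_bouldin": davies_bouldin_score(data[mask], labels[mask]),
        "calinski_harabasz": calinski_harabasz_score(data[mask], labels[mask]),
    }


# All four estimators expose fit_predict, so they can share one loop.
results = {name: evaluate(X, model.fit_predict(X)) for name, model in models.items()}

for name, scores in results.items():
    print(name, scores)
```

When reading the output, note that higher is better for the Silhouette Score and the Calinski-Harabasz Score, while lower is better for the Davies-Bouldin Index, so the three metrics need not agree on a single best algorithm.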
Copyrights © 2025