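Performance Evaluation of Gaussian Mixture Model and K-Means under Data Imbalance in Clustering (original title: Evaluasi Performa Gaussian Mixture Model dan K-Means terhadap Ketidakseimbangan Data pada Clustering)
Yuri, Muhammad Farrel Evan; Pasaribu, Farzad Sahnadi; Subuh, Arung Buana; Naibaho, Muhammad Hafif; Piliang, Arnita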
Jurnal Ilmu Komputer dan Informatika | E-ISSN: 3063-9026 | Vol. 2 No. 4 (2026): April - June
Publisher: GLOBAL SCIENTS PUBLISHER

Abstract

Data imbalance is a primary challenge in clustering analysis, particularly in datasets with highly disproportionate class distributions such as the Credit Card Fraud Detection dataset from Kaggle. This study evaluates and compares the performance of the Gaussian Mixture Model (GMM) and K-Means algorithms under such conditions through a systematic literature review of nine prior studies. Clustering quality is assessed with three internal validation metrics: Silhouette Score, Davies-Bouldin Index (DBI), and Calinski-Harabasz Index (CHI). The findings indicate that GMM consistently produces more stable and flexible clusters when distributions overlap, because its probabilistic approach, fitted with the Expectation-Maximization (EM) algorithm, assigns each data point a membership probability for every cluster rather than a single hard label. In contrast, K-Means produces sharper cluster boundaries at lower computational cost, yet remains sensitive to outliers and to its assumption of spherical clusters, which imbalanced data frequently violate. The dominant majority class can distort the K-Means centroids, leading to suboptimal detection of fraudulent transactions, whereas GMM adapts better to this scenario despite its higher computational cost.
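
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure (the study is a literature review) and it does not use the Kaggle dataset. It assumes scikit-learn and NumPy are available, and it substitutes a small synthetic two-cluster dataset with a 50:1 imbalance. The function names `make_imbalanced_data`, `internal_metrics`, and `compare` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def make_imbalanced_data(n_major=2000, n_minor=40, seed=0):
    """Synthetic stand-in (assumption): an elongated majority cluster
    plus a small, compact minority cluster."""
    rng = np.random.default_rng(seed)
    major = rng.multivariate_normal([0.0, 0.0], [[6.0, 0.0], [0.0, 0.3]], size=n_major)
    minor = rng.multivariate_normal([0.0, 3.0], [[0.2, 0.0], [0.0, 0.2]], size=n_minor)
    X = np.vstack([major, minor])
    y = np.r_[np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)]
    return X, y


def internal_metrics(X, labels):
    """The three internal validation metrics named in the abstract."""
    return {
        "silhouette": silhouette_score(X, labels),   # higher is better, in [-1, 1]
        "dbi": davies_bouldin_score(X, labels),      # lower is better, >= 0
        "chi": calinski_harabasz_score(X, labels),   # higher is better
    }


def compare(X, n_clusters=2, seed=0):
    """Fit K-Means and a full-covariance GMM, then score both labelings."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="full",
        n_init=5,
        random_state=seed,
    ).fit(X)
    scores = {
        "kmeans": internal_metrics(X, km.labels_),
        "gmm": internal_metrics(X, gmm.predict(X)),
    }
    # Soft assignments: one membership probability per point per component.
    membership = gmm.predict_proba(X)
    return scores, membership


if __name__ == "__main__":
    X, _ = make_imbalanced_data()
    scores, membership = compare(X)
    for model, values in scores.items():
        print(model, {name: round(float(v), 3) for name, v in values.items()})
    print("membership probabilities of the first point:", membership[0])
```

All three metrics reward compact, well-separated groups, so they can favor the K-Means partition even when the GMM partition matches the underlying imbalanced structure more closely. Any comparison of this kind is best read alongside the soft membership probabilities and, where labels exist, an external check.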