Class imbalance is a significant challenge in machine learning, often degrading model performance. The issue is common in real-world data, where the majority class far outnumbers the minority class. One common remedy is oversampling, which balances the class distribution by adding synthetic samples to the minority class. The most popular oversampling technique is the Synthetic Minority Oversampling Technique (SMOTE), but it has drawbacks: the synthetic data it produces can lack diversity, and it can generate outliers. As an alternative, this study proposes combining the Latin Hypercube Sampling (LHS) method with k-Nearest Neighbor (k-NN) to improve classification performance on imbalanced datasets. The combination of LHS and k-NN is expected to produce higher-quality synthetic data, thereby improving the performance of classification models as measured with the confusion matrix. The data used in this study come from online repositories such as KEEL, Kaggle, and UCI, as well as a student-specialization dataset of vocational high school (SMK) students in Pekanbaru.
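The abstract does not specify implementation details, so the following is only a minimal sketch of the general idea: like SMOTE, each synthetic sample is an interpolation between a minority point and one of its k nearest minority neighbors, but the interpolation weights are drawn with Latin Hypercube Sampling (stratified over [0, 1)) rather than plain uniform noise. The function name `lhs_knn_oversample` and all parameter choices are hypothetical, not taken from the study; the sketch assumes `scipy` (for `scipy.stats.qmc`) and `scikit-learn` are available.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.neighbors import NearestNeighbors

def lhs_knn_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch).

    For each new sample: pick a random minority point, pick one of its
    k nearest minority neighbors, and interpolate between them using a
    weight drawn by Latin Hypercube Sampling instead of uniform noise.
    """
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is each point itself
    # LHS: one stratified interpolation weight in [0, 1) per new sample
    weights = qmc.LatinHypercube(d=1, seed=seed).random(n_new).ravel()
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    return X_min[base] + weights[:, None] * (X_min[neigh] - X_min[base])

# Usage: oversample a toy 20-sample minority class with 30 synthetic points
X_min = np.random.default_rng(1).normal(size=(20, 3))
X_syn = lhs_knn_oversample(X_min, n_new=30, k=3)
print(X_syn.shape)  # (30, 3)
```

Because LHS stratifies the weights across [0, 1), the synthetic points spread more evenly along the line segments between neighbors, which is one plausible way to address the diversity concern raised about SMOTE.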
Copyright © 2024