Evaluating the Impact of Data Balancing Techniques on the k-Nearest Neighbors Algorithm for Microarray Data Classification
Febi Nur Salisah; Inggih Permana; Sanusi; Shir Li Wang
Jurnal Inotera Vol. 10 No. 2 (2025): July - December 2025
Publisher : LPPM Politeknik Aceh Selatan

DOI: 10.31572/inotera.Vol10.Iss2.2025.ID497

Abstract

Microarray data classification poses significant challenges in bioinformatics because the data combine a very large number of features with a limited number of samples and an imbalanced class distribution. These conditions can degrade the performance of classification models, including k-Nearest Neighbors (kNN). This study evaluates the performance of the kNN algorithm on imbalanced and balanced data. The balancing techniques used are Random Undersampling (RUS), Random Oversampling (ROS), and Synthetic Minority Over-sampling Technique (SMOTE). The experiments use three leukemia datasets with different class structures: two, three, and four classes. The results show that ROS and SMOTE consistently improve the performance of kNN, with the best accuracy exceeding 97%. On the two-class dataset, ROS gave the best performance (99.4%), while on the three-class dataset SMOTE gave the best results (98.5%). On the four-class dataset, balancing improved performance substantially: SMOTE and ROS raised accuracy from 89.7% (without balancing) to 99.0% and 98.8%, respectively. Although RUS recorded a perfect accuracy of 100%, this result was anomalous and inconsistent; RUS showed less stable performance and was often worse than no balancing, especially on the four-class dataset. Overall, SMOTE proved to be the most stable and effective technique across the class structures tested. This study shows the importance of balancing strategies in the classification of complex and imbalanced microarray data.
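
The three balancing techniques named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `smote_like`), the neighbour count `k`, and the seeds are assumptions made for illustration, and `smote_like` is a simplified SMOTE-style interpolation rather than a reference implementation.

```python
import random
from collections import Counter, defaultdict


def group_by_class(X, y):
    """Collect feature rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_oversample(X, y, seed=0):
    """ROS: duplicate random samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like(X, y, k=2, seed=0):
    """SMOTE-style: add synthetic points interpolated between a sample
    and one of its k nearest neighbours from the same class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = list(X), list(y)
    for label, rows in groups.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two samples; class is left as-is
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            neighbours = sorted(others, key=lambda r: sq_dist(base, r))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from base to other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out
```

Each function takes a list of feature rows and a parallel list of labels and returns a rebalanced pair. The sketch makes one contrast in the abstract concrete: RUS discards samples, which is costly when the sample count is already small, while ROS and the SMOTE-style variant only add samples.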

Sensitivity Analysis of Parameter Control in Leukemia Classification Using Variable-Length Particle Swarm Optimization
Ramadhani, Siti; Handayani, Lestari Handayani; Muhammad Fikri; Theam Foo Ng; Sumayyah Dzulkifly; Roziana Ariffin; Shir Li Wang
Digital Zone: Jurnal Teknologi Informasi dan Komunikasi Vol. 16 No. 2 (2025)
Publisher : Fakultas Ilmu Komputer, Universitas Lancang Kuning

DOI: 10.31849/digitalzone.v16i2.27473

Abstract

Machine learning has the potential to support hematologists in classifying leukemia by identifying abnormal chromosomes and specific gene markers. One effective technique for feature selection is Variable-Length Particle Swarm Optimization (VLPSO), whose performance depends heavily on parameter control, specifically the inertia weight (w) and the acceleration factors (c), which regulate the search process. In earlier VLPSO work, static parameter control was applied to one of these factors and time-varying control to the other. Although that configuration performed well, training data and test data were not treated separately, leaving a gap in understanding its impact in real-world applications. This study explores how different parameter control strategies (static, time-varying, and adaptive) affect the performance of VLPSO, and compares two adaptive approaches within the VLPSO framework, Adaptive 1 and Adaptive 2, each designed to adjust w and c dynamically in a different way. Under 10-fold cross-validation, VLPSO with the Adaptive 1 parameter setting achieves better generalization, with small train-test differences, especially for the Decision Tree and Naïve Bayes classifiers, though with higher variability. The Adaptive 2 parameter setting offers more consistent results, with narrower variability across different settings. Static methods are the least reliable, while time-varying controls show moderate but unstable performance. Adaptive parameter tuning is recommended to improve VLPSO's stability, flexibility, and classification accuracy in biomedical applications. The results provide recommendations for parameter settings using an adaptive approach, which was found to enhance the performance of VLPSO.
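
To make the roles of w and c concrete, the following minimal sketch contrasts static and time-varying control of the inertia weight in the standard PSO velocity update. It is an illustration under stated assumptions, not the VLPSO method from the paper: the function names, the linear decrease from 0.9 to 0.4, and the default values are illustrative choices that do not come from the abstract, and the Adaptive 1 and Adaptive 2 rules are not shown because the abstract does not define them.

```python
import random


def static_inertia(iteration, max_iter, w=0.7):
    """Static control: w stays constant for the whole run (0.7 is an assumed value)."""
    return w


def linear_decreasing_inertia(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Time-varying control: w falls linearly from w_max to w_min
    (the 0.9 to 0.4 range is an assumed, commonly used setting)."""
    frac = iteration / max(1, max_iter - 1)
    return w_max - (w_max - w_min) * frac


def update_velocity(v, x, pbest, gbest, w, c1, c2, rng):
    """Standard PSO velocity update for one particle:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x), with r1, r2 in [0, 1)."""
    return [
        w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        for vi, xi, pi, gi in zip(v, x, pbest, gbest)
    ]


def run_schedule(schedule, max_iter=10):
    """Return the inertia weight used at each iteration under a given schedule."""
    return [schedule(t, max_iter) for t in range(max_iter)]
```

A larger w keeps more of the previous velocity and favours exploration, while a smaller w lets the pull toward the personal and global bests dominate. A time-varying schedule therefore moves the search from exploration toward exploitation as iterations progress, whereas a static schedule keeps the balance fixed.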