Simangunsong, Daisy Sere Damara
Unknown Affiliation

Published: 1 document


Enhancing Single Nucleotide Polymorphisms Detection from Imbalanced Data: A Study of Resampling Techniques in Machine Learning Algorithms
Authors: Nurhasanah, Rossy; Arisandi, Dedy; Purnamasari, Fanindia; Hayatunnufus, Hayatunnufus; Simangunsong, Daisy Sere Damara; Pulungan, Aflah Mutsanni
Indonesian Journal of Artificial Intelligence and Data Mining Vol 8, No 1 (2025): March 2025
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v8i1.32942

Abstract

Identifying true Single Nucleotide Polymorphisms (SNPs) from Next Generation Sequencing (NGS) data gives rise to a class imbalance problem because of the inherently high error rate of NGS technology. Class imbalance degrades machine learning algorithms: it produces biased models that perform poorly, particularly at detecting true SNPs, which belong to the underrepresented class. This study evaluates how effectively several resampling techniques, namely Borderline-SMOTE, Random Undersampling, and Tomek-Link, improve the performance of two machine learning algorithms, Random Forest (RF) and Artificial Neural Networks (ANN), and compares the techniques to determine the most effective approach. Our results indicate that Borderline-SMOTE raises the F-Measure of RF from 69.72 to 91.52 (a 31.2% relative increase) and of ANN from 79.75 to 91.32 (a 14.5% relative increase), outperforming the other resampling methods. These findings highlight the crucial role of resampling techniques, and of careful algorithm selection, in improving classification accuracy for imbalanced datasets.
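
To make the abstract's terms concrete, the following is a minimal sketch, not the authors' implementation. It shows why plain accuracy is misleading on imbalanced labels while the F-Measure is not, how random undersampling (the simplest of the three resampling techniques named above) balances the classes, and how the relative increases quoted in the abstract follow from the reported F-Measure scores. The toy labels and the helper names `f_measure`, `random_undersample`, and `relative_gain` are assumptions made for illustration; Borderline-SMOTE and Tomek-Link are not implemented here.

```python
import random
from collections import Counter


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_undersample(X, y, seed=0):
    """Randomly drop samples until every class matches the minority class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(indices, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def relative_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Hypothetical toy data: 95 negatives (sequencing errors), 5 positives (true SNPs).
y = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]

# A degenerate model that always predicts the majority class.
all_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(y, all_negative)) / len(y)
f1 = f_measure(y, all_negative)

# Balance the classes before training (training itself is omitted here).
X_bal, y_bal = random_undersample(X, y)

# Relative increases computed from the F-Measure scores reported in the abstract.
rf_gain = relative_gain(69.72, 91.52)
ann_gain = relative_gain(79.75, 91.32)
```

On this toy data the always-negative model reaches high accuracy yet an F-Measure of zero, which is the failure mode that resampling is meant to counter. In practice the resampling step would be applied to the training split only, so that the test set keeps the original class distribution.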