Jurnal INFOTEL
Vol 17 No 4 (2025): November

Machine Learning-Based Malware Detection: A Critical Comparative Analysis of Random Forest, Naive Bayes, and Neural Network on Imbalanced Datasets

Muhamad Hanif Rafiq Sulaeman (Telkom University, Indonesia)
Rakhmad Maulidi (Telkom University, Indonesia)



Article Info

Publish Date
18 Dec 2025

Abstract

Malware detection remains a major challenge in cybersecurity as threats become increasingly complex. This study critically compares three machine learning algorithms Random Forest, Naive Bayes, and Neural Network for automated malware detection using a large, imbalanced dataset (131,574 samples, 57 features). Class imbalance is addressed with SMOTE (Synthetic Minority Oversampling Technique), and preprocessing includes feature selection (SelectKBest), normalization (StandardScaler), and outlier handling. Evaluation metrics include accuracy, Precision, recall, F1-score, and AUC-ROC, using 5-fold cross-validation. Results show Random Forest achieves the highest accuracy (98%, AUC-ROC 0.998), followed by Neural Network (95%, AUC-ROC 0.95), and Naive Bayes (93%, minority class recall 0.80). Feature analysis identifies ImageBase and ResourcesMinSize as key contributors. This study highlights the effectiveness of ensemble methods and the critical importance of addressing class imbalance for robust malware detection. Limitations and implications for real-world deployment are discussed.

Copyrights © 2025






Journal Info

Abbrev

infotel

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

Jurnal INFOTEL is a scientific journal published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) of Institut Teknologi Telkom Purwokerto, Indonesia. Jurnal INFOTEL covers the field of informatics, telecommunication, and electronics. First published in 2009 for a printed version and published ...