Muhamad Hanif Rafiq Sulaeman
Telkom University, Indonesia

Articles


Machine Learning-Based Malware Detection: A Critical Comparative Analysis of Random Forest, Naive Bayes, and Neural Network on Imbalanced Datasets
Muhamad Hanif Rafiq Sulaeman; Rakhmad Maulidi
JURNAL INFOTEL Vol 17 No 4 (2025): November
Publisher : LPPM INSTITUT TEKNOLOGI TELKOM PURWOKERTO

DOI: 10.20895/infotel.v17i4.1413

Abstract

Malware detection remains a major challenge in cybersecurity as threats become increasingly complex. This study critically compares three machine learning algorithms (Random Forest, Naive Bayes, and Neural Network) for automated malware detection on a large, imbalanced dataset (131,574 samples, 57 features). Class imbalance is addressed with SMOTE (Synthetic Minority Oversampling Technique), and preprocessing includes feature selection (SelectKBest), normalization (StandardScaler), and outlier handling. Models are evaluated with 5-fold cross-validation on accuracy, precision, recall, F1-score, and AUC-ROC. Random Forest achieves the highest accuracy (98%, AUC-ROC 0.998), followed by Neural Network (95%, AUC-ROC 0.95) and Naive Bayes (93%, minority-class recall 0.80). Feature analysis identifies ImageBase and ResourcesMinSize as key contributors. The study highlights the effectiveness of ensemble methods and the critical importance of addressing class imbalance for robust malware detection. Limitations and implications for real-world deployment are discussed.
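
The abstract names SMOTE as the remedy for class imbalance but does not say which implementation or parameters were used. As a rough illustration of the underlying idea only, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration, not the paper's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Brute-force k nearest neighbours of `base` among the other samples.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four 2-D points (made-up data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies. When oversampling is combined with cross-validation, it is usually applied to the training folds only, so that no synthetic sample derived from held-out data influences the score.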