Malware detection remains a major challenge in cybersecurity as threats become increasingly complex. This study compares three machine learning algorithms (Random Forest, Naive Bayes, and a Neural Network) for automated malware detection on a large, imbalanced dataset (131,574 samples, 57 features). Class imbalance is addressed with SMOTE (Synthetic Minority Oversampling Technique), and preprocessing includes feature selection (SelectKBest), normalization (StandardScaler), and outlier handling. Models are evaluated with 5-fold cross-validation on accuracy, precision, recall, F1-score, and AUC-ROC. Random Forest achieves the highest accuracy (98%; AUC-ROC 0.998), followed by the Neural Network (95%; AUC-ROC 0.95) and Naive Bayes (93%; minority-class recall 0.80). Feature analysis identifies ImageBase and ResourcesMinSize as key contributors. The results highlight the effectiveness of ensemble methods and the importance of addressing class imbalance for robust malware detection. Limitations and implications for real-world deployment are discussed.
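To make the class-imbalance step concrete, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is not the study's code. The function name `smote`, its parameters (`minority`, `n_synthetic`, `k`, `seed`), and the toy data are assumptions made for illustration; a real experiment would more likely rely on a maintained library implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE sketch: interpolate between minority samples
    and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For each minority sample, find the indices of its k nearest
    # minority neighbours (Euclidean distance, excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a base sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy data, made up for illustration (not taken from the study's dataset).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, it stays inside the region already occupied by the minority class. In a cross-validated setup, oversampling is typically applied to the training folds only, so that synthetic points do not leak into the evaluation folds.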
Copyright © 2025