Malware detection remains a significant challenge in cybersecurity because threats are complex and continually evolving. This study evaluates the effectiveness of machine learning algorithms, specifically XGBoost and LightGBM, in detecting malware. The approach comprises data cleaning, normalization, and feature selection based on the Interquartile Range (IQR) technique. The initial dataset contained 21,752 files, evenly split between malicious and benign. After data cleaning, 19,256 samples remained, and applying IQR reduced their large feature set. Results show that XGBoost outperforms the other algorithms evaluated, achieving 99.20% accuracy with IQR-based feature selection, compared with 98.99% without it. The IQR technique improves data quality by filtering features according to how strongly they differ between malware and benign files, which in turn improves model performance. Reducing the feature set also helps prevent overfitting and strengthens the model's generalization ability. The study concludes that machine learning, particularly with XGBoost and LightGBM, can detect malware effectively, and that IQR-based feature selection improves the models' predictive power, reducing false positives and increasing detection efficiency. Future work will explore additional feature selection methods to further improve malware detection accuracy.
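The abstract does not spell out the exact IQR criterion, so the following is only a minimal sketch of one plausible reading: a feature is kept when the interquartile ranges of the malware and benign classes do not overlap. The function names (`iqr_bounds`, `select_features_by_iqr`), the non-overlap rule, and the toy data are all illustrative assumptions, not the study's implementation.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (Q1, Q3), the lower and upper quartiles of a list of numbers."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def select_features_by_iqr(rows, labels):
    """Return indices of features whose class-wise IQRs do not overlap.

    ASSUMPTION: the non-overlap rule is a hypothetical selection criterion
    chosen for illustration; the study may apply IQR differently.
    rows   -- list of equal-length feature vectors
    labels -- 1 for malware, 0 for benign
    """
    selected = []
    for j in range(len(rows[0])):
        malware = [r[j] for r, y in zip(rows, labels) if y == 1]
        benign = [r[j] for r, y in zip(rows, labels) if y == 0]
        m_q1, m_q3 = iqr_bounds(malware)
        b_q1, b_q3 = iqr_bounds(benign)
        # Keep the feature only if one class's IQR lies entirely above the other's.
        if m_q1 > b_q3 or b_q1 > m_q3:
            selected.append(j)
    return selected


# Toy data (invented): feature 0 separates the classes, feature 1 does not.
rows = [
    [10.0, 1.0], [11.0, 5.0], [12.0, 3.0], [13.0, 2.0],  # malware
    [1.0, 2.0], [2.0, 4.0], [3.0, 1.0], [4.0, 5.0],      # benign
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_features_by_iqr(rows, labels)
print(selected)  # [0]
```

In a fuller pipeline, the retained columns would then be passed to a classifier such as XGBoost or LightGBM; that step is omitted here to keep the sketch dependency-free.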
Copyrights © 2026