Malware poses a significant threat to cybersecurity, particularly for Android users. Malware samples are grouped into categories and families, each exhibiting distinct malicious capabilities. Accurately identifying these categories and families is crucial for developing effective prevention and mitigation strategies, so that threats can be contained before they escalate. Over the years, numerous techniques have been proposed for detecting malware families, with system calls emerging as a vital feature. Collected through dynamic analysis, system calls offer in-depth insight into the activities executed by malware, making them a powerful basis for classification. This study aims to enhance the detection of Android malware families and categories by analyzing system calls with a feature selection method. Using the Gain Ratio algorithm, significant system calls are identified to improve detection accuracy and reduce the complexity of the feature set. The study assesses machine learning algorithms, specifically Random Forest, J48, Naïve Bayes, and Decision Table. The findings show that Random Forest consistently outperforms the other algorithms, achieving an accuracy of 88.01% for malware family detection and 89.65% for category detection, with high precision and recall in most cases. Applying Gain Ratio feature selection reduced the feature set by 68.83% and improved model-building speed by 50.26%. This integration of feature selection and machine learning provides a more effective approach to detecting malware families and categories, thus contributing to enhanced Android security.
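
To make the Gain Ratio step concrete, the following is a minimal sketch of how system-call features could be scored and ranked. It is not the study's implementation: the function names, the system-call names, the labels, and the binary "call observed or not" encoding are all illustrative assumptions. Gain Ratio is computed here in its standard form, information gain divided by the split information of the feature.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain Ratio = information gain / split information for one discrete feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information; score it 0.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(features, labels):
    """Return (name, score) pairs sorted by descending Gain Ratio."""
    scores = {name: gain_ratio(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 means the system call was observed for that sample.
labels = ["ransomware", "ransomware", "adware", "adware"]
features = {
    "sendto": [1, 1, 0, 0],  # perfectly separates the two classes
    "openat": [1, 0, 1, 0],  # unrelated to the class
    "read":   [1, 1, 1, 1],  # constant, hence uninformative
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a pipeline of this kind, only the top-ranked features would be passed to the classifier, which is how a smaller feature set and a shorter model-building time can come about. The cut-off used to select features is a design choice that this sketch does not prescribe.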
Copyrights © 2024