Danendra, Ardian
Unknown Affiliation

Articles

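Deteksi Malware Android Berbasis Ensemble Soft Voting LightGBM, Logistic Regression dan CatBoost (Android Malware Detection Based on a Soft Voting Ensemble of LightGBM, Logistic Regression, and CatBoost)
Danendra, Ardian; Pramudya, Elkaf Rahmawan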
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8865

Abstract

The Android operating system faces a serious challenge from malware that is becoming increasingly complex and diverse. This research proposes an Android malware detection system based on a soft voting ensemble that integrates three algorithms (LightGBM, Logistic Regression, and CatBoost) to improve detection accuracy while maintaining computational efficiency. The dataset used is CCCS-CIC-AndMal-2020, which is highly imbalanced and contains over 400,000 Android application samples. The proposed model uses hybrid features that combine static information (permissions, intents, API calls from the AndroidManifest) with dynamic behavior (memory activity, runtime API calls, logcat, and network traffic in an emulated environment), balancing low extraction cost against improved robustness to obfuscation. The methodology includes multi-stage preprocessing (IQR capping 40×, StandardScaler, RFE to 150 features, SMOTE 30%) to improve data quality and reduce dimensionality by 56% without losing important information. The ensemble is trained with F1-Macro-based weights (33.46% LightGBM, 30.99% Logistic Regression, 35.55% CatBoost), approximating a 1:1:1 proportion. Evaluation on the testing set shows very high performance: Accuracy 95.58%, Balanced Accuracy 92.21%, F1-Macro 0.9208, True Positive Rate 100%, and False Alarm Rate 0.00%. Taken together, these metrics indicate that on this testing set the model detected every malware sample with no false positives on benign applications, which the authors consider suitable for production deployment. This research contributes by demonstrating the effectiveness of an efficient soft voting ensemble (only 3 models) for Android malware detection, evaluated with multi-dimensional metrics that are representative of imbalanced data.
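The weighted soft voting step described in the abstract can be illustrated with a minimal sketch. The three weights are the ones reported above; everything else is an assumption made for illustration: the function names, the rule of normalizing F1-Macro scores by their sum, the two-class [benign, malware] layout, and the per-model probabilities, which are invented and are not results from the paper.

```python
# Minimal sketch of weighted soft voting (pure Python, no dependencies).
# Only WEIGHTS comes from the abstract; all other numbers are hypothetical.


def f1_macro_weights(f1_scores):
    """Turn per-model F1-Macro scores into weights that sum to 1.

    Assumption: weights are the scores divided by their total. The abstract
    says only that the weights are "F1-Macro-based".
    """
    total = sum(f1_scores.values())
    return {name: score / total for name, score in f1_scores.items()}


def soft_vote(probas, weights):
    """Weighted average of each model's class probabilities for one sample."""
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, class_probs in probas.items():
        for k in range(n_classes):
            combined[k] += weights[name] * class_probs[k]
    return combined


def predict(probas, weights):
    """Return the index of the class with the highest combined probability."""
    combined = soft_vote(probas, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Weights reported in the abstract (33.46%, 30.99%, 35.55%).
WEIGHTS = {"lightgbm": 0.3346, "logreg": 0.3099, "catboost": 0.3555}

# Hypothetical probabilities [benign, malware] for a single application.
sample = {
    "lightgbm": [0.40, 0.60],
    "logreg": [0.70, 0.30],
    "catboost": [0.35, 0.65],
}

combined = soft_vote(sample, WEIGHTS)  # roughly [0.475, 0.525]
label = predict(sample, WEIGHTS)       # 1, i.e. "malware" in this layout
```

In this invented example Logistic Regression leans toward benign, but the two boosted models outweigh it, so the ensemble labels the sample as malware. Because the reported weights are close to 1:1:1, the result here would be the same under a plain unweighted average.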