Optimizing Malware Detection and Prevention on Proxy Servers Through Random Forest and Lexical Feature Analysis
Andalas Saputra, Meitro Hartanto; Pebrianti, Dwi; Bayuaji, Luhur; Rusdah
Indonesian Journal of Computing, Engineering, and Design (IJoCED) Vol. 7 No. 1 (2025): IJoCED
Publisher : Faculty of Engineering and Technology, Sampoerna University

DOI: 10.35806/ijoced.v7i1.485

Abstract

Malware has become a significant concern as the number of malicious websites hosting spam, phishing, malware, and other threats grows. This research aims to predict malware URLs using lexical features for feature extraction and a random forest for classification. The dataset, sourced from kaggle.com, includes benign, phishing, spam, malware, and defacement URLs. To address class imbalance, random oversampling was applied to balance the training data. Recursive feature elimination was used to optimize the lexical feature set: subsets of 10, 15, 19, 23, 29, and 35 features were compared on classification accuracy, and the 23-feature subset achieved 98% accuracy. Validation tests with actual university network data confirmed the model's effectiveness, classifying 11,566 samples in 9 minutes. URL filtering relied on log analyzer tools that captured internet traffic during working hours over one month. The results suggest that this approach can efficiently classify malicious URLs and could be implemented for real-time detection in proxy server logs, helping IT departments prevent malware from spreading via web traffic.
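
The abstract names two preprocessing steps, lexical feature extraction and random oversampling, without listing the features themselves. The sketch below illustrates both under stated assumptions: the eight features and the helper names `lexical_features` and `random_oversample` are hypothetical choices for illustration, not the paper's 23-feature set or its code.

```python
import random
import re
from collections import Counter
from urllib.parse import urlparse

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from the URL string alone.

    These features are assumptions; the abstract does not list the real ones.
    """
    # urlparse only fills in the host when a scheme is present,
    # so prepend a placeholder scheme for bare "host/path" strings.
    parsed = urlparse(url if _SCHEME.match(url) else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels
```

In a fuller pipeline, the feature dictionaries would be turned into numeric rows, oversampling would be applied to the training split only, and the balanced rows would then go to feature selection and a classifier, for example scikit-learn's `RFE` wrapped around a `RandomForestClassifier`. That pairing is one plausible reading of the abstract, not a confirmed detail of the authors' implementation.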