Indonesian Journal of Computing, Engineering, and Design
Vol. 7 No. 1 (2025): IJoCED

Optimizing Malware Detection and Prevention on Proxy Servers Through Random Forest and Lexical Feature Analysis

Andalas Saputra, Meitro Hartanto
Pebrianti, Dwi
Bayuaji, Luhur
Rusdah



Article Info

Publish Date
14 Apr 2025

Abstract

Malware has become a significant concern as the number of malicious websites hosting spam, phishing, malware, and other threats grows. This research aims to predict malware URLs using lexical features for feature extraction and a random forest for classification. The dataset, sourced from kaggle.com, includes benign, phishing, spam, malware, and defacement URLs. To address class imbalance, random oversampling was applied so that the model was trained on balanced data. Recursive feature elimination was used to optimize the lexical feature set: subsets of 10, 15, 19, 23, 29, and 35 features were tested for classification accuracy, and the 23-feature subset achieved 98% accuracy. Validation tests on actual university network data confirmed the model's effectiveness, classifying malicious URLs in a set of 11,566 samples in 9 minutes. The URLs for this validation were filtered with log analyzer tools that captured internet traffic during working hours over one month. The results suggest that this approach can classify malicious URLs efficiently and could be implemented for real-time detection in proxy server logs, helping IT departments prevent malware from spreading via web traffic.
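The first two steps named in the abstract, lexical feature extraction and random oversampling, can be illustrated with a minimal standard-library sketch. The abstract does not list the 23 selected features or the tooling the authors used, so every feature name below (`url_length`, `num_dots`, `host_is_ip`, and so on) and both function names are hypothetical choices made for illustration, not the paper's feature set or implementation.

```python
import math
import random
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    Assumption: these ten features are examples only; the paper's
    actual feature set is not given in the abstract.
    """
    # Add a scheme when it is missing so urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    counts = Counter(url)
    # Shannon entropy of the character distribution of the whole URL.
    entropy = 0.0
    if url:
        entropy = -sum(
            (n / len(url)) * math.log2(n / len(url)) for n in counts.values()
        )
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        # Crude dotted-quad check: four numeric groups separated by dots.
        "host_is_ip": int(host.count(".") == 3 and host.replace(".", "").isdigit()),
        "entropy": round(entropy, 4),
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login-update.php?id=42"))
    rows, labels = random_oversample(
        [[1], [2], [3], [4]], ["benign", "benign", "benign", "malware"]
    )
    print(Counter(labels))
```

In a full pipeline, the balanced feature vectors would then go through recursive feature elimination and a random forest classifier. Oversampling only the training split, after the train/test division, keeps duplicated rows out of the evaluation data.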

Copyrights © 2025






Journal Info

Abbrev

IJOCED

Publisher

Subject

Arts; Chemical Engineering, Chemistry & Bioengineering; Computer Science & IT; Engineering; Industrial & Manufacturing Engineering; Materials Science & Nanotechnology; Mechanical Engineering

Description

Indonesian Journal of Computing, Engineering and Design (IJoCED) is an international, open-access, peer-reviewed journal published by the Faculty of Engineering and Technology, Sampoerna University. IJoCED publishes original research papers, state-of-the-art reviews, and innovative projects on topics ...