Ngah, Syahrulanuar
Unknown Affiliation

A Heterogeneous Dataset–Driven Ensemble Learning Framework for Malicious URL Detection
Sukarno, Parman; Ngah, Syahrulanuar
International Journal of Advances in Data and Information Systems, Vol. 7 No. 1 (2026): April 2026
Publisher : Indonesian Scientific Journal

Abstract

Modern cyberattacks are increasingly associated with phishing campaigns, malware distribution, and website defacement, which are often delivered through malicious Uniform Resource Locators (URLs) originating from diverse sources. This paper examines malicious URL detection using an ensemble learning framework evaluated on a large-scale heterogeneous dataset composed of URLs aggregated from multiple public threat intelligence sources. The dataset includes benign, phishing, malware, and defacement URLs, thereby reflecting real-world variability in attack patterns and data distribution. Three ensemble-based classifiers, namely Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB), are evaluated with respect to detection accuracy and computational efficiency. In addition to classification performance, this study presents a detailed analysis of training and detection time in order to identify the most suitable model for practical deployment. Experimental results indicate that the DT model achieves a training time of 4.14 seconds with macro and weighted accuracies of 94.11% and 91.71%, respectively, and a per-category detection time of 0.2162 seconds. The RF model attains macro and weighted accuracies of 93.64% and 90.94%, with training and detection times of 9.73 seconds and 0.2420 seconds, respectively. Although the GB model exhibits the longest training time of 45.38 seconds, it achieves the fastest per-category detection time of 0.2151 seconds. Despite its comparatively lower accuracy of 92.48% for macro averaging and 89.42% for weighted averaging, the rapid inference capability of GB makes it a strong candidate for real-time malicious URL detection in heterogeneous cybersecurity environments.
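
The abstract reports both macro and weighted accuracies but does not say how they are computed. The sketch below shows one common reading, assumed here rather than taken from the paper: a per-class score (the fraction of each class's URLs labeled correctly) is averaged either with equal weight per class (macro) or in proportion to class size (weighted). The function names and the toy labels are hypothetical and are not the paper's data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return ({label: fraction of that label's samples predicted correctly}, support)."""
    support = Counter(y_true)  # number of true samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recall = {label: correct[label] / n for label, n in support.items()}
    return recall, support


def macro_and_weighted(y_true, y_pred):
    """Average the per-class scores two ways: unweighted and support-weighted."""
    recall, support = per_class_recall(y_true, y_pred)
    total = sum(support.values())
    macro = sum(recall.values()) / len(recall)  # every class counts equally
    weighted = sum(recall[c] * support[c] for c in recall) / total  # big classes dominate
    return macro, weighted


# Hypothetical toy labels using the four categories named in the abstract.
y_true = ["benign"] * 6 + ["phishing"] * 2 + ["malware"] + ["defacement"]
y_pred = ["benign"] * 4 + ["phishing"] * 4 + ["malware"] + ["defacement"]

macro, weighted = macro_and_weighted(y_true, y_pred)
print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

In this toy data the largest class (benign) is the one classified least accurately, so the macro average comes out higher than the weighted average. Under this definition, a macro figure above the weighted figure, as in the reported results, would indicate that the larger classes are, on balance, classified less accurately than the smaller ones.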