Garuda - Garba Rujukan Digital

International Journal of Engineering, Technology and Natural Sciences (IJETS)

Vol 7 No 2 (2025): International Journal of Engineering, Technology and Natural Sciences

Arta, Yudhi
Samuri, Suzani Mohamad
Syafitri, Nesi
Hanafiah, Anggi
Oktaria, Wina
Maripati, Maripati

Publish Date
31 Dec 2025

The exponential growth of malicious web content has created an urgent demand for intelligent systems that can accurately classify cyber threats from URL patterns. This study investigates the effectiveness of two widely used supervised learning algorithms, Random Forest and Naïve Bayes, in probabilistic classification tasks on multiclass URL data. A synthetic dataset simulating 547,775 URLs was constructed to reflect a realistic threat distribution: benign (65.74%), phishing (14.46%), defacement (14.81%), and malware (4.99%). Each instance was characterized by basic structural features: URL length, dot count, HTTPS presence, and keyword indicators. To ensure a fair comparison, both models were evaluated on identical stratified train-test splits across varying sample sizes, including focused experiments on 15,000 and 100,000 entries. Results consistently showed that both models achieved high recall and precision only for the benign class and failed entirely to detect the minority classes. For Random Forest, precision and recall reached 1.00 for benign URLs but dropped to 0.00 for phishing, defacement, and malware across all test sets. Naïve Bayes showed similar degradation, highlighting the severe impact of class imbalance and limited feature expressiveness. These findings underline the inadequacy of conventional classifiers in highly skewed, security-sensitive environments when no preprocessing interventions are applied. The study concludes that although Random Forest and Naïve Bayes offer computational simplicity, their default behavior is biased toward the majority class, which makes them unsuitable for detecting cyber threats unless they are combined with resampling techniques (e.g., SMOTE), cost-sensitive learning, or feature augmentation strategies. Future work will explore adaptive hybrid models with contextual features and deep learning frameworks to improve multiclass detection in real-world cybersecurity applications.
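
The structural features and per-class metrics described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the keyword list, and the 10,000-label sample are assumptions made for illustration, and a trivial "always predict benign" baseline stands in for a trained classifier to show how a collapse onto the majority class appears in per-class precision and recall. The sample only borrows the class proportions reported in the abstract.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the abstract does not say which keywords were used.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure")


def extract_features(url):
    """Basic structural features of the kind the abstract lists."""
    parsed = urlparse(url)
    lowered = url.lower()
    return {
        "length": len(url),
        "dot_count": url.count("."),
        "uses_https": int(parsed.scheme == "https"),
        "has_keyword": int(any(k in lowered for k in SUSPICIOUS_KEYWORDS)),
    }


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)}; 0.0 when a denominator is zero."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Illustrative sample of 10,000 labels in the proportions given in the abstract
# (65.74% benign, 14.46% phishing, 14.81% defacement, 4.99% malware).
counts = {"benign": 6574, "phishing": 1446, "defacement": 1481, "malware": 499}
y_true = [label for label, n in counts.items() for _ in range(n)]

# Stand-in baseline (an assumption, not the study's models): always predict
# the majority class.
y_pred = ["benign"] * len(y_true)

report = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(f"{label:<11} precision={precision:.2f} recall={recall:.2f}")
```

For this baseline, benign recall is 1.00 while precision and recall for phishing, defacement, and malware are all 0.00, the same qualitative pattern the abstract reports for the minority classes. The figures are not meant to reproduce the study's results, only to show why overall accuracy can look acceptable while every threat class is missed.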

Journal Info

International Journal of Engineering, Technology and Natural Sciences (IJETS)

Abbrev

IJETS

Publisher

Universitas Teknologi Yogyakarta

Subject

Civil Engineering, Building, Construction & Architecture; Computer Science & IT; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Industrial & Manufacturing Engineering

Description

IJETS publishes original research articles, review articles from contributors, and current issues related to engineering, technology, and natural sciences. The main objective of IJETS is to provide a platform for international scholars, academicians and ...

Article Info

Application of Machine Learning for Classifying and Identifying Security Threats Using a Supervised Learning Algorithm Approach