Building of Informatics, Technology and Science
Vol 7 No 1 (2025): June (2025)

Phishing URL Detection Using Natural Language Processing and Support Vector Machine Based on Machine Learning

Nabila, Nabila
Hesti, Emilia
Aryanti, Aryanti

Article Info

Publish Date
23 Jun 2025

Abstract

Phishing is a significant cybersecurity threat in which malicious URLs mislead users into revealing sensitive information. This study develops a machine-learning model for phishing URL detection that combines structural URL feature extraction, Natural Language Processing (NLP) techniques, and a Support Vector Machine (SVM) classifier. Structural indicators of phishing are derived from features such as URL length and the number of dots and slashes, while the URL text is converted into numerical vectors using Term Frequency-Inverse Document Frequency (TF-IDF). Both feature sets are then combined as input to an SVM with a linear kernel. In the classification report, the combination of TF-IDF and the linear-kernel SVM gives the best performance, with 90% accuracy, 92% precision, 89% recall, and a 90% F1-score. The confusion matrix reports the same metrics to two decimal places: 90.29% accuracy, 91.66% precision, 88.62% recall, and a 90.12% F1-score. The main contribution of this study is the integration of NLP and SVM into a single adaptive phishing detection model that combines the structural and textual aspects of URLs. This approach improves phishing detection relative to techniques that rely only on manually engineered features. Unlike previous studies that focused on specific cases or omitted NLP, the model is designed to detect many categories of phishing URLs, which makes it more relevant to the evolving nature of attacks.
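
The structural features and evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `structural_features` and `metrics_from_confusion`, the example URL, and the confusion-matrix counts are all hypothetical, and the TF-IDF vectorization and SVM training steps are left out because they would normally be delegated to a machine-learning library.

```python
from typing import Dict


def structural_features(url: str) -> Dict[str, int]:
    """Count the three structural indicators mentioned in the abstract."""
    return {
        "length": len(url),          # total number of characters in the URL
        "dots": url.count("."),      # number of '.' characters
        "slashes": url.count("/"),   # number of '/' characters
    }


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "phishing" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical URL, used only to show the feature counts.
    print(structural_features("http://example.com/login/verify.php"))
    # Hypothetical counts, NOT the confusion matrix reported in the paper.
    print(metrics_from_confusion(tp=8, fp=2, fn=2, tn=8))
```

Because all four metrics follow from the same four counts, a classification report that rounds to whole percentages and a confusion matrix evaluated to two decimal places would be expected to agree up to rounding, as the two sets of figures in the abstract do.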

Copyrights © 2025

Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting research results in information technology and computing. Papers submitted to this journal are checked for plagiarism and peer-reviewed to maintain quality. ...