Scientific Journal of Informatics
Vol. 11 No. 2: May 2024

Performance Analysis of Support Vector Classification and Random Forest in Phishing Email Classification

Chaerul Umam (Universitas Dian Nuswantoro)
Lekso Budi Handoko (Universitas Dian Nuswantoro)
Folasade Olubusola Isinkaye (Ekiti State University)



Article Info

Publish Date
22 May 2024

Abstract

Purpose: This study analyzes the performance of a phishing email classification system built on two machine learning algorithms, Random Forest and Support Vector Classification (SVC).

Methods/Study design/approach: The system was implemented in Python within the Jupyter Notebook environment. The dataset, sourced from kaggle.com, comprised 18,650 email samples categorized as either secure or phishing. Before model training, the dataset was divided into training and testing sets using three split ratios: 60:40, 70:30, and 80:20. Parameters for both the Random Forest and SVC models were then selected to optimize performance. The TF-IDF Vectorizer was used to convert the email text into numeric vectors that the models can process.

Result/Findings: Both models achieved notable accuracy across all split ratios, with SVC consistently outperforming Random Forest. SVC attained its highest accuracy, 97.52%, at the 70:30 split, followed closely by 97.37% at the 60:40 split.

Novelty/Originality/Value: Comparisons with previous studies underscored the superiority of the SVC model. This research therefore contributes insights into the effectiveness of these machine learning algorithms for phishing email classification and highlights their potential to strengthen cybersecurity measures.
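The workflow described in the abstract (TF-IDF vectorization, three data splits, and a comparison of SVC with Random Forest) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the toy corpus, the `compare_models` helper, the linear kernel, the number of trees, the random seed, and the reading of each ratio as train:test are all assumptions, since the abstract does not state the selected parameters.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the Kaggle dataset (not the real data).
SAFE = [
    "Meeting moved to 3pm, see the updated agenda attached",
    "Thanks for the report, I will review it tomorrow",
    "Lunch on Friday? The usual place works for me",
    "Please find the quarterly figures in the shared folder",
    "Reminder: team retrospective at 10am on Monday",
    "Your library book is due next week",
    "Can you send me the slides from yesterday's talk",
    "The build passed, merging the branch this afternoon",
    "Happy birthday! Hope you have a great day",
    "Draft of the paper is ready for your comments",
]
PHISHING = [
    "Your account is suspended, verify your password now",
    "Congratulations, you won a prize, click here to claim",
    "Urgent: confirm your bank details to avoid closure",
    "Security alert, login immediately to restore access",
    "You have an unpaid invoice, open the attachment now",
    "Claim your free gift card by entering your card number",
    "Final warning: update your payment information today",
    "Your mailbox is full, click the link to upgrade",
    "Unusual sign-in detected, verify your identity here",
    "Act now to receive your tax refund, submit your details",
]


def compare_models(texts, labels, test_sizes=(0.4, 0.3, 0.2), seed=42):
    """Return accuracy per (model name, test fraction).

    The test fractions 0.4, 0.3, 0.2 correspond to the 60:40, 70:30 and
    80:20 splits, assuming the ratios are given as train:test.
    """
    results = {}
    for test_size in test_sizes:
        X_train, X_test, y_train, y_test = train_test_split(
            texts, labels, test_size=test_size, stratify=labels, random_state=seed
        )
        # Assumed hyperparameters; the abstract does not list the chosen values.
        models = {
            "SVC": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
            "RandomForest": make_pipeline(
                TfidfVectorizer(),
                RandomForestClassifier(n_estimators=100, random_state=seed),
            ),
        }
        for name, model in models.items():
            # The pipeline fits TF-IDF on the training text only, then the classifier.
            model.fit(X_train, y_train)
            results[(name, test_size)] = accuracy_score(y_test, model.predict(X_test))
    return results


texts = SAFE + PHISHING
labels = [0] * len(SAFE) + [1] * len(PHISHING)  # 0 = secure, 1 = phishing
results = compare_models(texts, labels)
for (name, test_size), acc in sorted(results.items()):
    print(f"{name:12s} test_size={test_size:.1f} accuracy={acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary from being fitted on test emails. Accuracies on a toy corpus this small say nothing about the figures reported in the study.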

Copyrights © 2024






Journal Info

Abbrev

sji

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Scientific Journal of Informatics (p-ISSN 2407-7658 | e-ISSN 2460-0040), published by the Department of Computer Science, Universitas Negeri Semarang, is a scientific journal of Information Systems and Information Technology that includes scholarly writings on pure research and applied research in the ...