Articles

Enhancing Spam Detection Using Hybrid of Harris Hawks and Firefly Optimization Algorithms
Abualhaj, Mosleh M.; Shambour, Qusai Y.; Alsaaidah, Adeeb; Abu-Shareha, Ahmad; Al-Khatib, Sumaya; Hiari, Mohammad O.
Journal of Applied Data Sciences Vol 5, No 3: SEPTEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i3.279

Abstract

The emergence of the modern Internet has presented numerous opportunities for attackers to profit illegally by distributing spam mail. Spam refers to irrelevant or inappropriate messages sent over the Internet to large numbers of recipients. Many researchers have applied machine learning classification methods to filter spam messages. However, more research is still needed on the use of metaheuristic optimization algorithms for feature selection in spam email classification. In this paper, we propose fighting spam emails by employing a union of the Firefly Optimization Algorithm (FOA) and Harris Hawks Optimization (HHO) algorithms, together with one of the best-known and most efficient classifiers in this area, the Random Forest (RF) classifier. Experimental studies on the ISCX-URL2016 spam dataset yield promising results: the union of HHO and FOA with an RF classifier achieved an accuracy of 99.83% in detecting spam emails.
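The abstract does not spell out how the two optimizers are combined. One plausible reading, assumed here and not confirmed by the source, is that each optimizer outputs a binary feature mask (1 = keep the feature) and the final subset is the set union of the two masks, which is then used to train the RF classifier. A minimal sketch of that combination step, using hypothetical masks and hypothetical helper names (`union_masks`, `selected_indices`):

```python
from typing import List, Sequence


def union_masks(hho_mask: Sequence[int], foa_mask: Sequence[int]) -> List[int]:
    """Keep a feature if either optimizer selected it (assumed combination rule)."""
    if len(hho_mask) != len(foa_mask):
        raise ValueError("masks must cover the same feature set")
    return [1 if a or b else 0 for a, b in zip(hho_mask, foa_mask)]


def selected_indices(mask: Sequence[int]) -> List[int]:
    """Column indices to keep before fitting the classifier."""
    return [i for i, bit in enumerate(mask) if bit]


# Hypothetical 8-feature masks; the real dataset has many more features.
hho = [1, 0, 0, 1, 0, 0, 1, 0]
foa = [0, 0, 1, 1, 0, 0, 0, 1]
combined = union_masks(hho, foa)
print(combined)                    # [1, 0, 1, 1, 0, 0, 1, 1]
print(selected_indices(combined))  # [0, 2, 3, 6, 7]
```

The resulting indices would pick the columns passed to the RF classifier. An intersection rule would instead keep only the features both optimizers agree on, giving a smaller subset.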

Spam Feature Selection Using Firefly Metaheuristic Algorithm
Abualhaj, Mosleh M.; Hiari, Mohammad O.; Alsaaidah, Adeeb; Al-Zyoud, Mahran; Al-Khatib, Sumaya
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.336

Abstract

This paper presents a novel method for improving spam detection by using the Firefly Algorithm (FA) for feature selection. FA, a bio-inspired metaheuristic optimization algorithm, is applied to identify the most relevant features from the ISCX-URL2016 dataset, which contains 72 features. By balancing exploration (searching for new solutions) and exploitation (focusing on the best solutions), FA reduces the feature space from 72 to 31 features. This reduction improves model efficiency without sacrificing performance, as only the most impactful features are retained for the classification task. The selected features were then used to train three machine learning classifiers: Decision Tree (DT), Gradient Boost Tree (GBT), and Naive Bayes (NB). Each classifier's performance was evaluated based on accuracy: DT achieved the highest accuracy of 99.81%, followed by GBT at 99.70% and NB at 90.33%. The superior performance of DT is attributed to its ability to handle non-linear relationships and high-dimensional data, making it particularly well-suited to the FA-selected features. This combination of FA for feature selection and DT for classification demonstrates significant improvements in spam detection performance, highlighting the importance of selecting the most relevant features. The results show that by reducing the dimensionality of the dataset, FA not only accelerates the classification process but also enhances detection accuracy.
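To illustrate how FA can drive feature selection, the following is a generic binary firefly search in plain Python. It is a sketch, not the authors' implementation, and several choices are assumptions not taken from the paper: the sigmoid transfer function that turns continuous firefly positions into 0/1 feature masks, the parameter values (`beta0`, `gamma`, `alpha`, population size, iteration count), and the toy fitness function standing in for classifier accuracy. All function names (`to_mask`, `binary_firefly_select`, `toy_fitness`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Mask = List[int]


def to_mask(position: Sequence[float], rng: random.Random) -> Mask:
    # Sigmoid transfer function (an assumed binarization): a coordinate's
    # sigmoid value is the probability that the feature is kept.
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]


def binary_firefly_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    n_fireflies: int = 15,
    n_iter: int = 40,
    beta0: float = 1.0,
    gamma: Optional[float] = None,
    alpha: float = 0.5,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Return the brightest feature mask found and its fitness (1 = keep)."""
    rng = random.Random(seed)
    if gamma is None:
        gamma = 1.0 / n_features  # keeps attraction non-negligible in high dimensions
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_fireflies)]
    masks = [to_mask(p, rng) for p in pos]
    light = [fitness(m) for m in masks]  # brightness = fitness of the subset
    k = max(range(n_fireflies), key=lambda i: light[i])
    best_mask, best_fit = masks[k][:], light[k]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue
                # Exploitation: move i toward the brighter firefly j; the pull
                # beta0 * exp(-gamma * r^2) weakens with squared distance.
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = beta0 * math.exp(-gamma * r2)
                # Exploration: alpha scales a uniform random step.
                pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                          for a, b in zip(pos[i], pos[j])]
                masks[i] = to_mask(pos[i], rng)
                light[i] = fitness(masks[i])
                if light[i] > best_fit:
                    best_mask, best_fit = masks[i][:], light[i]
        alpha *= 0.97  # shrink random steps: shift from exploration to exploitation
    return best_mask, best_fit


# Hypothetical toy fitness: 4 "informative" features out of 20, minus a small
# penalty per kept feature. In practice this would be a classifier's
# validation accuracy on the candidate feature subset.
RELEVANT = {0, 3, 7, 11}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in RELEVANT)
    return hits / len(RELEVANT) - 0.01 * sum(mask)


mask, fit = binary_firefly_select(20, toy_fitness)
print(sum(mask), "features kept, fitness", round(fit, 3))
```

To move closer to the setting described above, one would set `n_features=72` and make the fitness a DT's validation accuracy on ISCX-URL2016 restricted to the masked columns. The abstract does not give the paper's parameter values or its exact fitness function, so those remain open choices. The size penalty in the toy fitness is one common way to favor smaller subsets such as the reported 31 of 72.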