Ranah Research : Journal of Multidisciplinary Research and Development
Vol. 6 No. 4 (2024): Ranah Research : Journal Of Multidisciplinary Research and Development (Mei 202

Spam Email Detection with Continuous Bag-of-Words and Random Forest

Michiavelly Rustam (Master of Science and Information Technology, President University 17550, Indonesia)
Agung Brotokuncoro (Master of Science and Information Technology, President University 17550, Indonesia)
Rusdianto Roestam (Doctor of Philosophy, President University 17550, Indonesia)



Article Info

Publish Date
05 Jun 2024

Abstract

Spam email poses a significant cyber threat: scammers use a range of tactics to trick recipients into divulging sensitive information or downloading harmful content. In June 2023, for instance, Indonesia encountered approximately 6.51 thousand spam attacks, underscoring how widespread the problem is. These attacks frequently rely on deceptive strategies, such as impersonation or false promises of rewards, to ensnare unsuspecting victims, and falling for them can result in financial losses and other grave repercussions. This research addresses the problem by classifying email content to detect phishing attempts. The proposed solution runs on runtime platforms such as Google Colab and combines Continuous Bag of Words (CBOW) analysis with the Random Forest method. CBOW is selected for its effectiveness in capturing semantic relationships between words, which allows the model to extract meaningful features from the email content. Random Forest is chosen for its ability to handle the imbalanced datasets commonly encountered in email classification tasks, ensuring fair representation of both spam and ham emails during model training. By combining these two techniques, we aim to develop a robust classification model that accurately distinguishes phishing (spam) from legitimate (ham) emails, thus enhancing email security measures. We apply this approach to classify the SpamAssassin dataset into ham and spam categories, with an anticipated precision of 0.98, which would demonstrate the model's effectiveness in identifying phishing emails.
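As a rough illustration of two ideas the abstract relies on, the sketch below shows how CBOW frames its training examples (the surrounding context words are used to predict the word in the middle) and how a precision figure such as the anticipated 0.98 is computed for the spam class. This is not the authors' implementation: the function names `cbow_pairs` and `precision`, the window size, and the toy inputs are assumptions made for illustration only, and the embedding training and Random Forest steps themselves are omitted.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context_words, target_word) pairs in the way CBOW frames them.

    For each position, the words up to `window` places to the left and right
    form the context, and the word at that position is the prediction target.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word message has no context to learn from
            pairs.append((context, target))
    return pairs


def precision(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Precision for the positive (spam) class: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    # With no positive predictions the ratio is undefined; report 0.0 here.
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Hypothetical tokenized email body (illustrative only).
    toy_email = "claim your free prize now".split()
    for context, target in cbow_pairs(toy_email, window=1):
        print(context, "->", target)

    # Hypothetical labels: 1 = spam, 0 = ham.
    print(precision([1, 0, 1, 1], [1, 1, 1, 0]))
```

In a full pipeline of the kind the abstract describes, the vectors learned from such pairs would typically be aggregated per email (for example, by averaging) and passed to the classifier. That aggregation step is an assumption here, since the abstract does not specify it.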

Copyrights © 2024






Journal Info

Abbrev

R2J

Publisher

Subject

Chemical Engineering, Chemistry & Bioengineering; Civil Engineering, Building, Construction & Architecture; Economics, Econometrics & Finance; Law, Crime, Criminology & Criminal Justice; Public Health; Social Sciences; Transportation; Other

Description

Ranah Research : Journal of Multidisciplinary Research and Development is a multidisciplinary scientific journal published by Dinasti Research under the auspices of Yayasan Dharma Indonesia Tercinta (DINASTI). The journal is published four times a year, in November, February, May, and August. Ruang ...