Jurnal Infra
Vol 8, No 2 (2020)

Applying Random Forest to Email Filtering for Spam Detection

Billy Christanto (Informatics Study Program)
Djoni Haryadi Setiabudi (Informatics Study Program)



Article Info

Publish Date
03 Oct 2020

Abstract

Email has become an integral part of the internet experience. As the number of users has grown, marketing via email has also become more popular. These emails often annoy users, hence the name “spam”. Because of their excessive volume, there is a need to separate important messages from unimportant ones. So far, there is no optimal solution to this problem. Among the methods in use, machine learning based solutions show the most promising results. The method tested here is Random Forest, which is often regarded as superior to Naive Bayes, a popular algorithm for email filtering. Both algorithms are tested and compared on accuracy, recall and precision. The effects of pre-processing and stemming the dataset are also tested. This research shows that both models produce similar accuracy, recall and precision, reaching 96% in each category. Tests also show that Random Forest needs around 80 times more time to train its model than Naive Bayes, which makes it unsuitable for email filtering purposes.
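
As an illustration of the three measures compared above, the following minimal sketch computes accuracy, precision and recall from true and predicted labels. The function names, the "spam"/"ham" label strings and the sample data are hypothetical and are not taken from the paper; the paper's own evaluation code and dataset are not shown here.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif guess == positive:
            fp += 1  # legitimate mail wrongly flagged
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate mail correctly passed
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision and recall as a dict of floats."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels, for illustration only (not results from the paper).
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
scores = evaluate(y_true, y_pred)
print(scores)
```

In a spam filter, precision reflects how often flagged mail really is spam, while recall reflects how much of the spam is caught, so reporting both alongside accuracy gives a fuller picture than accuracy alone.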

Copyright © 2020