Jurnal Infra
Vol 10, No 2 (2022)

The Effect of Feature Selection on the Performance of C5.0, XGBoost, and Random Forest in Classifying Phishing Websites

Michael Jonathan (Program Studi Teknik Informatika, Universitas Kristen Petra Surabaya)
Silvia Rostianingsih (Program Studi Teknik Informatika, Universitas Kristen Petra Surabaya)
Henry Novianus Palit (Program Studi Teknik Informatika, Universitas Kristen Petra Surabaya)



Article Info

Publish Date
29 Aug 2022

Abstract

The growing number of internet users, and of website users in particular, gives phishing actors more opportunities to obtain or steal users' personal information. Every website carries a large amount of information that can serve as features for classifying phishing websites. These features fall into three groups: URL features, content features, and external features. This study uses three methods, namely C5.0, XGBoost, and Random Forest, and tests their performance to find the best one for classifying phishing websites. It also applies feature selection to remove features that have no effect, so that training time can be shortened. The test results show that the accuracy, precision, recall, and F1-score values average 93.5% for C5.0, 96.6% for XGBoost, and 95.7% for Random Forest. Feature selection also shortens training across the three algorithms, making it about 3.53 times faster on average when only the 15 most important features are used. With feature selection, the accuracy, precision, recall, and F1-score values decreased slightly, but the decrease was not significant and had no major impact on the classification process.
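The quantities reported in the abstract can be illustrated with a small sketch. It assumes a binary phishing/legitimate setting and shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts, how the four values can be averaged into one score, how the top 15 features could be picked from importance scores, and how a training speedup ratio is computed. The function names, feature names, and all numbers below are hypothetical and are not taken from the study; the abstract does not say how the averages or importance scores were actually computed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def average_score(metrics):
    """Average the four metric values into a single number."""
    return sum(metrics.values()) / len(metrics)


def top_k_features(importances, k=15):
    """Return the names of the k features with the highest importance score."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def speedup(time_all_features, time_selected_features):
    """Ratio of training time with all features to time with selected features."""
    return time_all_features / time_selected_features


if __name__ == "__main__":
    # Hypothetical counts: 1000 phishing and 1000 legitimate test websites.
    metrics = classification_metrics(tp=960, fp=35, fn=40, tn=965)
    print(metrics, average_score(metrics))

    # Hypothetical importance scores for 20 made-up features.
    importances = {f"feature_{i}": 1.0 / (i + 1) for i in range(20)}
    print(top_k_features(importances, k=15))

    # Hypothetical training times in seconds.
    print(speedup(time_all_features=12.0, time_selected_features=3.4))
```

In practice the importance scores would come from the trained model itself, and the selected feature list would be used to retrain each classifier before timing it again.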

Copyrights © 2022