The growing number of internet users, and of website users in particular, gives phishing actors more opportunities to steal personal information. Each website exposes a large amount of information that can serve as features for classifying phishing websites. In this study, the features are divided into three groups: URL features, content features, and external features. Three methods are used, namely C5.0, XGBoost, and Random Forest, and their performance is compared to find the best method for classifying phishing websites. In addition, this research applies feature selection to remove features that have no effect, so that training time can be shortened. The test results show that, averaged over accuracy, precision, recall, and F1-score, C5.0 achieves 93.5%, XGBoost 96.6%, and Random Forest 95.7%. Across the three algorithms, feature selection makes training about 3.53 times faster on average when only the 15 most important features are used. With feature selection, the accuracy, precision, recall, and F1-score values decrease slightly, but the decrease is not significant and has no major impact on the classification process.
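As a rough illustration of the evaluation described above (a minimal sketch, not the authors' code), the snippet below computes accuracy, precision, recall, and F1-score for a binary phishing classifier, averages the four values, and keeps the k features with the highest importance scores. The function names, the example labels, and the feature names and importance values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def average_score(metrics):
    """Average the four metric values into a single summary number."""
    return mean(metrics.values())


def top_k_features(importances, k=15):
    """Keep the names of the k features with the highest importance."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# Hypothetical labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
summary = average_score(metrics)

# Hypothetical importance scores, e.g. as reported by a trained tree ensemble.
importances = {"url_length": 0.30, "nb_dots": 0.05,
               "page_rank": 0.40, "nb_hyperlinks": 0.25}
selected = top_k_features(importances, k=2)
```

In a setup like the one the abstract describes, the importance scores would come from a trained model, and the classifier would then be retrained on the selected columns only, so that training time and the averaged score can be compared before and after selection.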
Copyrights © 2022