Applying the ANOVA F-test, a P-value-based feature selection method, to phishing detection with the Random Forest algorithm shows that a 25-feature configuration yields the fastest inference time, making it suitable for scenarios that demand high computational efficiency and responsiveness. However, if the primary priority is the highest detection accuracy, the 29-feature configuration is more suitable because it achieves higher accuracy and more stable predictions. Consequently, neither configuration is definitively better; the choice between 25 and 29 features is a trade-off that can be tailored to the application's requirements. This approach lets users balance detection performance against inference time in a phishing detection system, depending on the implementation context and operational priorities. This study shows that a simple statistical approach such as the P-value is not only competitive but also provides superior results compared to more complex methods, offering a practical and efficient solution for real-world implementation.
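To make the selection step concrete, the following is a minimal sketch of ANOVA F-test feature ranking, written in plain Python under stated assumptions. The function names `anova_f_score` and `select_top_k` and the toy data are hypothetical and are not taken from the study. Because every feature shares the same degrees of freedom, ranking by descending F-statistic gives the same order as ranking by ascending P-value, so the sketch ranks on F directly. The Random Forest training and inference-time measurement are not reproduced here.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class groups.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest F-statistic."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data (hypothetical): 4 samples, 3 features, binary label.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 4.0, 1.1],
    [0.9, 5.0, 0.9],
    [1.0, 4.0, 1.0],
]
y = [0, 0, 1, 1]

selected, scores = select_top_k(X, y, k=2)
print(selected)  # feature 1 has identical class means, so it is dropped
```

In a setting like the one described, `k` would be set to 25 or 29, and the reduced feature matrix would then be passed to the classifier so that accuracy and inference time can be compared for each configuration.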
Copyright © 2025