The growing use of Virtual Private Networks (VPNs) complicates network monitoring and traffic management, because VPN and non-VPN traffic must be distinguished both accurately and efficiently. This study analyzes how effectively the SelectKBest feature selection method improves VPN traffic classification with the Random Forest and Support Vector Machine (SVM) algorithms. The dataset is the CIC VPN-NonVPN Traffic Dataset provided by the Canadian Institute for Cybersecurity (CIC), a widely used benchmark in network security research. Feature selection was performed using SelectKBest with the ANOVA (f_classif) scoring function, reducing the original feature set to the 15 highest-scoring features. Experimental results show that the Random Forest classifier achieved a test accuracy of 84.94%, along with high F1-score and ROC-AUC values, and an average cross-validation accuracy of 95.18% with low variance. In contrast, the SVM model performed markedly worse, with a test accuracy of approximately 62%, which suggests that it struggles to capture the complex patterns in network traffic data. Further analysis using ROC curves, Precision–Recall curves, confusion matrices, and learning curves confirms that Random Forest generalizes better than SVM. These findings indicate that combining SelectKBest with Random Forest delivers high classification performance while reducing computational cost through lower feature dimensionality, making the approach suitable for large-scale VPN traffic classification.
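To make the selection step concrete, the following is a minimal sketch of ANOVA-based top-k feature selection in plain Python. It is not the study's code. scikit-learn's f_classif computes a one-way ANOVA F-value per feature, and the sketch reproduces that idea by hand. The helper names (anova_f_score, select_k_best), the three toy feature columns, and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the class groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples, n_classes = len(values), len(groups)
    grand_mean = fmean(values)

    ss_between = 0.0  # spread of the class means around the overall mean
    ss_within = 0.0   # spread of the samples around their own class mean
    for members in groups.values():
        group_mean = fmean(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)

    if ss_between == 0.0:
        return 0.0           # class means coincide: feature carries no signal
    if ss_within == 0.0:
        return float("inf")  # classes perfectly separated with zero noise
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def select_k_best(rows, labels, k):
    """Return the sorted column indices of the k features with the highest F score."""
    n_features = len(rows[0])
    scores = [
        anova_f_score([row[j] for row in rows], labels) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy flows: [duration, mean inter-arrival time, unrelated noise]
X = [
    [1.0, 10.0, 5.0],
    [1.2, 11.0, 3.0],
    [0.9, 9.5, 4.0],
    [3.0, 10.5, 4.5],
    [3.2, 9.0, 3.5],
    [2.9, 10.8, 5.2],
]
y = ["non-vpn", "non-vpn", "non-vpn", "vpn", "vpn", "vpn"]

print(select_k_best(X, y, k=2))
```

On this toy data the first column separates the two classes far more strongly than the others, so it receives the highest score. In the study's setting the same ranking would be applied with k = 15 before the retained columns are passed to the classifiers.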
Copyrights © 2026