The Internet of Things (IoT) is a complex network of embedded devices that exchange data over heterogeneous communication technologies, which leaves these devices increasingly exposed to sophisticated cyber attacks. This paper presents a hybrid Intrusion Detection System (HIDS) that combines Extra Trees (ExtraTreesClassifier) feature selection with four ensemble classifiers: XGBoost, CatBoost, AdaBoost, and Gradient Boosting. Our approach fits supervised feature selection exclusively on the training data to prevent information leakage, applies class balancing to imbalanced datasets, and evaluates each hybrid model with ROC-AUC, PR-AUC, false positive and false negative rates, and the Matthews Correlation Coefficient. We validate the methodology on three benchmark datasets with contrasting characteristics: UNSW-NB15 (real-world network traffic, 175K samples), IoTNet24 (laboratory-controlled traffic, 23K samples), and BoTNeTIoT-L01 (large-scale laboratory traffic, 2.4M samples). On UNSW-NB15, our best model (EXT-GB) achieves 87.73% accuracy, an F1-score of 0.90, a ROC-AUC of 0.98, and 98.58% recall (equivalently, a 1.42% false negative rate), which we consider realistic performance for a production IDS. On the laboratory datasets, once class imbalance is addressed, the models achieve near-perfect performance (IoTNet24: 99.96%; BoTNeTIoT-L01: 99.99%). The roughly 12-percentage-point gap between real-world and laboratory results highlights a critical finding: controlled laboratory datasets substantially overestimate real-world IDS capability, so evaluation on realistic traffic captures is essential for assessing production deployment readiness.
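The leakage-free selection step and the reported metrics can be illustrated with a minimal scikit-learn sketch of one hybrid (Extra Trees selection followed by Gradient Boosting). This is not the paper's implementation: the synthetic data, the median importance threshold, the use of balanced sample weights as the class-balancing method, and the function name `run_ext_gb_sketch` are all assumptions made for illustration. Placing the selector inside a `Pipeline` ensures it is fitted on the training split only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.utils.class_weight import compute_sample_weight


def run_ext_gb_sketch(seed=0):
    """Hypothetical helper: train one Extra Trees + Gradient Boosting hybrid."""
    # Synthetic, imbalanced stand-in for an IDS dataset (label 1 = attack).
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                               weights=[0.8, 0.2], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # The selector lives inside the pipeline, so it only ever sees X_train.
    # The "median" threshold is an assumed choice, not taken from the paper.
    model = Pipeline([
        ("select", SelectFromModel(
            ExtraTreesClassifier(n_estimators=100, random_state=seed),
            threshold="median")),
        ("clf", GradientBoostingClassifier(random_state=seed)),
    ])

    # Assumed balancing strategy: inverse-frequency sample weights.
    weights = compute_sample_weight("balanced", y_train)
    model.fit(X_train, y_train, clf__sample_weight=weights)

    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

    return {
        "n_selected": int(model.named_steps["select"].get_support().sum()),
        "roc_auc": float(roc_auc_score(y_test, y_score)),
        "pr_auc": float(average_precision_score(y_test, y_score)),
        "mcc": float(matthews_corrcoef(y_test, y_pred)),
        "recall": float(tp / (tp + fn)),
        "fpr": float(fp / (fp + tn)),   # false positive rate
        "fnr": float(fn / (fn + tp)),   # false negative rate = 1 - recall
    }


if __name__ == "__main__":
    for name, value in run_ext_gb_sketch().items():
        print(f"{name}: {value}")
```

The false negative rate is the complement of recall, which is why a recall of 98.58% corresponds to a false negative rate of 1.42%. Swapping the `clf` step for another boosting classifier would give the other hybrids, assuming the replacement accepts `sample_weight` in `fit`.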
Copyright © 2026