Traditional IoT anomaly detection systems struggle with growing dimensionality, the constraints of big-data processing, and feature extraction that yields non-interpretable features. This article describes a complete pipeline that integrates Apache Spark data preparation, PCA for dimensionality reduction (from 744 to 12 components that retain 92.7% of the variance), and CatBoost gradient boosting for classification. A thorough benchmark of six algorithms on the Intel Berkeley Research Lab dataset (n = 30,221 instances) identifies CatBoost as the best method, with F1-score = 0.97, precision = 0.97, and accuracy = 98.7%, a margin of 3-8% over the XGBoost, LightGBM, Random Forest, and SVM methods. Temperature changes (PC1: 0.37 factor) and humidity variations (PC2: 0.29) emerged as the major indicators of anomalies. Training finished in 45.2 seconds and prediction took under 35 seconds per batch on consumer Intel i7/16 GB hardware, which confirms computational feasibility at production level for environmental monitoring and industrial IoT applications.
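The two quantities the abstract leans on, the number of principal components needed to retain a target share of variance and the F1-score, can be illustrated with a minimal sketch. It is not the article's implementation: the helper names `components_for_variance` and `f1_score` and the variance ratios below are hypothetical, and recall is assumed equal to precision because the abstract does not report it separately.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold):
    """Return the smallest k such that the first k components
    together explain at least `threshold` of the total variance.

    `explained_variance_ratio` is assumed to be sorted in descending
    order, as PCA implementations conventionally report it.
    """
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return k
    raise ValueError("threshold is not reached by the given components")


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical variance ratios for illustration only (not the article's data).
ratios = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]
k = components_for_variance(ratios, 0.85)
print(f"components needed for 85% variance: {k}")

# Assumption: recall equals the reported precision of 0.97.
print(f"F1 = {f1_score(0.97, 0.97):.2f}")
```

Under the same logic, a threshold near 0.927 applied to the article's own PCA spectrum would correspond to the 12 retained components it reports; the spectrum itself is not given in the abstract, so it is not reproduced here.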
Copyrights © 2026