Air pollution, particularly from benzene (C6H6), is a serious urban environmental problem with significant public health impacts. Benzene is a carcinogenic compound emitted mainly by motor vehicles and industrial processes. This study develops a model that predicts benzene concentration from PT08.S1 (CO) and PT08.S2 (NMHC) gas sensor readings together with meteorological variables (temperature, relative humidity, and absolute humidity). The data were obtained from the UCI Machine Learning Repository and comprise 9,357 samples collected by five metal oxide sensors deployed in an urban area. During preprocessing, records containing the value -200, which the dataset uses to encode missing measurements, were removed, leaving 8,779 valid samples. Two methods were compared: Multiple Linear Regression and a Random Forest Regressor. Random Forest performed better, with an MAE of 0.0155, an RMSE of 0.1311, and an R² of 0.9997, whereas Linear Regression yielded an MAE of 0.9966, an RMSE of 1.3864, and an R² of 0.9666. Feature importance analysis shows that absolute humidity (AH) is the dominant predictor, with an importance of 0.9049, followed by PT08.S2 (NMHC) at 0.0276. These results indicate that gas sensor data can be used reliably to estimate benzene concentration and that Random Forest is more accurate than linear regression, owing to its ability to capture non-linear relationships among the variables.
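The preprocessing step (dropping -200 sentinel values) and the three evaluation metrics (MAE, RMSE, R²) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the column names `PT08.S1(CO)`, `PT08.S2(NMHC)`, `T`, `RH`, `AH`, and `C6H6(GT)` are assumed to follow the UCI Air Quality headers; rows are assumed to be dicts of floats; and a row is assumed to be discarded when any of the columns used by the model holds -200. The helper names `drop_missing`, `mae`, `rmse`, and `r2` are hypothetical, and the regression models themselves are not reproduced.

```python
import math
from typing import Dict, List, Sequence

# Sentinel that the UCI Air Quality data uses for missing measurements.
MISSING = -200.0

# Assumed column names (modelled on the UCI Air Quality headers).
FEATURES = ["PT08.S1(CO)", "PT08.S2(NMHC)", "T", "RH", "AH"]
TARGET = "C6H6(GT)"


def drop_missing(rows: List[Dict[str, float]], columns: Sequence[str]) -> List[Dict[str, float]]:
    """Keep only rows in which none of the given columns holds the -200 sentinel."""
    return [r for r in rows if all(r[c] != MISSING for c in columns)]


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Metrics are only defined for non-empty, equal-length sequences.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt of the average of (y - y_hat)^2."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up toy rows for illustration only; the second row contains a -200 sentinel.
    rows = [
        {"PT08.S1(CO)": 1360.0, "PT08.S2(NMHC)": 1046.0, "T": 13.6, "RH": 48.9, "AH": 0.76, "C6H6(GT)": 11.9},
        {"PT08.S1(CO)": -200.0, "PT08.S2(NMHC)": 955.0, "T": 13.3, "RH": 47.7, "AH": 0.73, "C6H6(GT)": 9.4},
    ]
    clean = drop_missing(rows, FEATURES + [TARGET])
    print(len(clean))  # 1

    # Toy targets and predictions to exercise the metrics.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print(mae(y_true, y_pred))  # 0.25
    print(rmse(y_true, y_pred))  # 0.5
    print(round(r2(y_true, y_pred), 4))  # 0.8
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit, which is the basis for comparing the two models above.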
Copyright © 2026