Fine particulate matter (PM₂.₅) is one of the most dangerous air pollutants because it can penetrate deep into the respiratory system and cause serious health problems. Because real-time, comprehensive air quality monitoring remains limited, a data-driven predictive approach that can accurately estimate PM₂.₅ concentrations is needed. This study develops a PM₂.₅ concentration prediction model using the Random Forest Regressor (RFR) algorithm, optimised through a series of data pre-processing techniques: outlier detection with four methods (Isolation Forest, Autoencoder ANN, OCSVM, and IQR) and missing-value handling with three approaches (cubic spline interpolation, nearest-point interpolation, and data removal). The daily data cover 12 environmental variables (including rainfall, temperature, relative humidity, AOD, and NDVI) from March 2022 to March 2023, with PM₂.₅ as the target. The RFR model was built with 100 decision trees and assessed with 10-fold cross-validation to obtain a more reliable estimate of its accuracy. The combination of IQR outlier detection and data removal for missing values produced the best performance, with an RMSE of 0.082, an MAE of 0.027, and an R² of 0.886. The most influential variables were temperature (TEMP), relative humidity (RHU), and evapotranspiration (ET). This research contributes an accurate air quality prediction model that can support efforts to mitigate the public health impacts of PM₂.₅ pollution.
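The best-performing pipeline (IQR outlier detection, then data removal, then a 100-tree RFR scored by 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses scikit-learn, the helper names `iqr_to_nan`, `drop_missing_rows`, and `evaluate_rfr` are hypothetical, the IQR fence multiplier `k=1.5` is assumed, flagged outliers are assumed to be set to missing and then removed together with the original gaps, the folds are assumed to be shuffled `KFold` splits, and "influence" is read from impurity-based `feature_importances_`. The data are synthetic random numbers standing in for the 12 predictors and the PM₂.₅ target, so the printed values say nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate


def iqr_to_nan(data, k=1.5):
    """Set values outside [Q1 - k*IQR, Q3 + k*IQR] (per column) to NaN (assumed k)."""
    q1 = np.nanpercentile(data, 25, axis=0)
    q3 = np.nanpercentile(data, 75, axis=0)
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):  # comparisons with existing NaNs yield False
        outliers = (data < q1 - k * iqr) | (data > q3 + k * iqr)
    cleaned = data.copy()
    cleaned[outliers] = np.nan
    return cleaned


def drop_missing_rows(data):
    """'Data removal': keep only rows with no NaN in any column."""
    return data[~np.isnan(data).any(axis=1)]


def evaluate_rfr(data, n_trees=100, n_folds=10, seed=0):
    """IQR -> row deletion -> RFR scored by k-fold CV; the last column is the target."""
    cleaned = drop_missing_rows(iqr_to_nan(data))
    X, y = cleaned[:, :-1], cleaned[:, -1]
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=seed)  # assumed shuffled folds
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring=("neg_root_mean_squared_error", "neg_mean_absolute_error", "r2"),
    )
    metrics = {
        "rmse": -scores["test_neg_root_mean_squared_error"].mean(),
        "mae": -scores["test_neg_mean_absolute_error"].mean(),
        "r2": scores["test_r2"].mean(),
    }
    model.fit(X, y)  # refit on all cleaned rows to read feature importances
    return metrics, model, len(cleaned)


# Synthetic stand-in for one year of daily data: 12 predictors + a PM2.5-like target.
rng = np.random.default_rng(42)
n_days, n_features = 366, 12
X_raw = rng.normal(size=(n_days, n_features))
y_raw = 0.6 * X_raw[:, 0] - 0.4 * X_raw[:, 1] + 0.2 * X_raw[:, 2] + rng.normal(scale=0.1, size=n_days)
data = np.column_stack([X_raw, y_raw])
data[rng.integers(0, n_days, 10), rng.integers(0, n_features, 10)] = np.nan  # simulated gaps
data[rng.integers(0, n_days, 5), 0] = 8.0  # simulated sensor spikes

metrics, model, n_kept = evaluate_rfr(data)
print(metrics, n_kept)
print(np.argsort(model.feature_importances_)[::-1][:3])  # indices of the top-3 predictors
```

Deleting whole rows is the simplest of the three missing-value strategies; swapping `drop_missing_rows` for an interpolation step (for example, cubic spline or nearest-point filling along the time axis) would keep more days but is not shown here.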