Air Pollution Forecasting in Almaty using Ensemble Machine Learning Models
Naizabayeva, Lyazat; Sembina, Gulbakyt; Aliman, Alibek; Satymbekov, Maxatbek; Barlykbay, Nazym; Seilova, Nurgul
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.821

Abstract

This study develops a forecasting methodology for air pollution levels in Almaty, Kazakhstan, focusing on fine particulate matter (PM2.5) and carbon monoxide concentrations. Air pollution poses significant risks to public health, and Almaty’s basin location exacerbates the problem. To address the limitations of traditional statistical forecasting methods, we propose an ensemble machine learning approach that integrates Seasonal-Trend decomposition using Loess (STL) with gradient boosting algorithms to capture complex temporal and nonlinear patterns. The objective is to develop and validate a methodology for forecasting atmospheric air pollution in Almaty using STL decomposition, XGBoost and LightGBM models, and their ensemble combination; the novelty lies in combining STL decomposition with an ensemble of gradient boosting models for the complex urban environment of Almaty. The dataset includes hourly measurements from over 20 monitoring stations, enabling seasonal and spatial analysis. Preprocessing included outlier removal, normalization, and time series decomposition into seasonal, trend, and residual components. Two gradient boosting models, XGBoost and LightGBM, were trained separately and combined into a weighted ensemble, with optimal weights determined through cross-validation. Figures and tables illustrate the data preprocessing flow, model architectures, feature importance analysis, and evaluation of predictive performance. The ensemble outperformed the individual models, with coefficient of determination values exceeding 0.98 for PM2.5 and 0.83 for carbon monoxide. The findings demonstrate that integrating STL decomposition with ensemble learning provides a robust and effective approach to forecasting air pollution in complex urban environments. The methodology shows strong potential for practical application in real-time air quality monitoring and warning systems, aiding policymakers and public health authorities. Future research will expand the dataset by incorporating additional factors such as traffic flow, industrial emissions, and satellite remote sensing data to enhance predictive accuracy and model interpretability.
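
The abstract does not say how the ensemble weights were searched, so the following is only a minimal sketch of one plausible reading: two sets of out-of-fold predictions (standing in for the XGBoost and LightGBM outputs) are blended with a single weight `w`, and `w` is picked from a grid by minimizing the mean squared error averaged over contiguous folds. The function names, the fold scheme, the grid resolution, and the toy numbers are all assumptions for illustration and are not taken from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def blend(pred_a, pred_b, w):
    """Weighted average of two prediction vectors (w applies to pred_a)."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def choose_weight(y_true, pred_a, pred_b, n_folds=3, steps=20):
    """Grid-search w in [0, 1], scoring each candidate by mean per-fold MSE.

    Assumption: pred_a and pred_b are out-of-fold predictions, and folds are
    contiguous slices so the time order of the series is preserved.
    """
    n = len(y_true)
    bounds = [(f * n // n_folds, (f + 1) * n // n_folds) for f in range(n_folds)]
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        fold_errs = [
            mse(y_true[lo:hi], blend(pred_a[lo:hi], pred_b[lo:hi], w))
            for lo, hi in bounds
        ]
        err = sum(fold_errs) / n_folds
        if err < best_err:
            best_w, best_err = w, err
    return best_w


# Hypothetical toy values, not data from the study.
observed = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
model_a = [v + 1.0 for v in observed]   # stand-in model that overestimates
model_b = [v - 3.0 for v in observed]   # stand-in model that underestimates

w = choose_weight(observed, model_a, model_b)
print("chosen weight for model A:", w)
print("R2 model A:", r2_score(observed, model_a))
print("R2 ensemble:", r2_score(observed, blend(model_a, model_b, w)))
```

In this toy case the two stand-in models err in opposite directions, so a blend can cancel their biases; with real model outputs the gain from blending would typically be smaller.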

Clustering Player Performance in Pokémon TCG Tournaments: A K-means Approach to Identifying Performance Groups Based on Wins, Losses, and Tournament Statistics
Sembina, Gulbakyt; Naizabayeva, Lyazat
International Journal Research on Metaverse Vol. 2 No. 4 (2025): Regular Issue December 2025
Publisher : Bright Publisher

DOI: 10.47738/ijrm.v2i4.38

Abstract

This study applies K-means clustering to analyze player performance in competitive Pokémon TCG tournaments, categorizing players into four distinct performance groups based on metrics such as wins, losses, and ties. Using a dataset comprising over 186,000 players, the study identifies key clusters representing varying levels of success in the game. The data was preprocessed by handling missing values and standardizing features to ensure uniform contribution across metrics. MiniBatchKMeans was employed to optimize clustering for large datasets, resulting in a model that groups players into low, moderate, and high-performance categories. The clustering results provide valuable insights into the distribution of player performance and help identify trends in competitive dynamics. A Silhouette Score of 0.4582 indicates that the clustering is moderately effective, with some overlap between clusters, suggesting that further refinement may be needed. Visualizations, including scatter plots, box plots, and heatmaps, were used to interpret the cluster characteristics, showing that top-performing players cluster into smaller groups, while a large majority of players exhibit moderate performance. The findings offer important implications for both players and tournament organizers: players can refine strategies based on their cluster profiles, while organizers can use clustering insights to design more balanced and engaging tournament formats. Future research could explore alternative clustering methods and incorporate additional performance features to further optimize player segmentation and enhance tournament design.
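
The pipeline described in the abstract (standardize features, cluster, then score with the silhouette coefficient) can be sketched in a few lines. This is a hedged illustration only: it uses plain Lloyd's K-means with a deterministic farthest-first initialization instead of the MiniBatchKMeans variant named in the abstract, and the function names and the six toy players (wins, losses, ties) are hypothetical, not the study's data.

```python
import math


def standardize(rows):
    """Z-score each column; a constant column keeps a divisor of 1.0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def farthest_first_centers(points, k):
    """Start at the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen center (an assumed init)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm: alternate assignment and center update."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                [sum(c) / len(members) for c in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; singletons contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical players as (wins, losses, ties).
players = [[1, 6, 1], [2, 5, 1], [1, 5, 1], [8, 1, 1], [9, 2, 1], [8, 2, 1]]
scaled = standardize(players)
labels, _ = kmeans(scaled, k=2)
print("labels:", labels)
print("silhouette:", silhouette_score(scaled, labels))
```

A silhouette value near 1 indicates tight, well-separated clusters, while values closer to 0, such as the 0.4582 reported above, point to overlap between neighbouring groups. For a dataset on the scale of the one described, a mini-batch variant would usually be preferred, since this sketch recomputes all pairwise distances.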