This study presents a systematic review of spatial air temperature modeling based on remote sensing data and machine learning approaches over the period 2016–2025. Following the PRISMA framework, we conducted literature searches in Google Scholar (998 articles) and Scopus (489 articles). After merging the datasets, removing duplicates, and applying inclusion–exclusion criteria, 12 articles were retained for in-depth analysis. The findings indicate a marked increase in publications since 2021, reflecting growing global interest in integrating remote sensing and machine learning for air temperature estimation. Ensemble algorithms such as Random Forest and XGBoost dominate because they balance accuracy and computational efficiency, while temporal deep learning approaches such as LSTM and TCN are emerging as powerful tools for capturing complex atmospheric dynamics. Among remote sensing predictors, Land Surface Temperature (LST) is the most frequently used, often complemented by NDVI, albedo, and elevation to improve spatial accuracy. Geographical context strongly influences methodological performance: XGBoost proves effective in heterogeneous urban areas, Random Forest performs well in mountainous regions, and artificial neural networks demonstrate higher adaptability in extreme environments such as the Greenland ice sheet. Nonetheless, sparse station networks and the resulting scarcity of ground-based observations remain key challenges, particularly across tropical and archipelagic regions. This review identifies three major directions for future research: (1) expanding studies to underrepresented tropical regions, (2) leveraging temporal deep learning methods for detecting extreme events, and (3) integrating multisensor data with innovative validation strategies to enhance the robustness and reliability of air temperature modeling.