Time series analysis has evolved to include forecasting and anomaly detection, which can be applied in various fields. Machine learning methods, such as long short-term memory (LSTM) and extreme gradient boosting (XGBoost), are widely used because they are considered superior to conventional methods. Both detect anomalies through a forecasting approach. However, how factors such as data length, labeling method, and number of anomalies limit both methods in anomaly detection has not been explored. Therefore, this study aims to identify the factors that affect the performance of LSTM and XGBoost in forecasting and anomaly detection across various scenarios and to compare their evaluation metrics. The study uses Jakarta's air quality index data for 2018–2023, which was preprocessed and augmented for simulation purposes. The results show that LSTM outperforms XGBoost, with lower MAPE (14.7024%), lower RMSE (13.9909), and higher balanced accuracy (0.9935). A significant Mann-Whitney test between the two methods reinforces these results, indicating that their accuracy differs. In addition, the Kruskal-Wallis test for each combination of method and treatment showed significant results, indicating that data length, labeling method, and number of anomalies affect the accuracy of both methods.
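As an illustration of the evaluation metrics named above, the following minimal Python sketch computes MAPE, RMSE, and balanced accuracy for a forecast-based anomaly detector. The function names, the fixed error threshold used to flag anomalies, and the sample values are assumptions made for illustration only; they are not the study's data, labeling method, or implementation.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between observed and forecast values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def flag_anomalies(actual, predicted, threshold):
    """Assumed rule: flag a point when its absolute forecast error exceeds the threshold."""
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of sensitivity and specificity; both classes must be present in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up index values; the third observation is an injected anomaly.
actual = [100.0, 110.0, 250.0, 90.0]
predicted = [95.0, 112.0, 120.0, 93.0]
true_labels = [False, False, True, False]

flags = flag_anomalies(actual, predicted, threshold=50.0)
print(mape(actual, predicted), rmse(actual, predicted))
print(balanced_accuracy(true_labels, flags))
```

Balanced accuracy is a natural choice when anomalies are rare, because it weights the detection rate on anomalous points and on normal points equally rather than rewarding a detector that labels everything as normal.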
Copyrights © 2025