Indonesia, one of the most seismically active countries in the world, needs accurate earthquake forecasting, and the BMKG earthquake catalogue is a key data source for it. This research implements an ETL (Extract, Transform, Load) data pipeline that converts 92,887 earthquake catalogue entries from 2008 to 2023 into a ready-to-use daily time series for an LSTM seismicity forecasting model. The ETL process has five steps: raw data extraction; removal of focal mechanism parameter columns with 97% missing values; datetime conversion; daily resampling, which yields 5,200 entries with three features (earthquake count, total magnitude, and average magnitude); and Min-Max normalization for LSTM compatibility. The data were processed in Google Colab and used to train a stacked LSTM that predicts the daily earthquake count. The model has two layers of 50 and 25 units, a dropout rate of 0.2, the Adam optimizer, and a 30-day input window. Trained for 100 epochs, the model showed a consistent decrease in MSE loss and captured stable trends in seismic activity. However, its predictions deviated during extreme spikes caused by aftershock sequences. The ETL pipeline proved crucial for ensuring temporal consistency, 100% data completeness, and a physically meaningful feature representation, resulting in a reproducible end-to-end framework for disaster mitigation.
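The transformation steps described above (dropping sparse columns, datetime conversion, daily resampling into count/total/mean magnitude, Min-Max scaling, and 30-day windowing) can be illustrated with a minimal sketch. It assumes a pandas DataFrame with hypothetical column names `datetime` and `magnitude`, and a hypothetical missing-value threshold of 0.9. Filling the mean magnitude of event-free days with 0 is also an assumption. The sketch does not reproduce the authors' code. `min_max_scale` mirrors the default [0, 1] behaviour of scikit-learn's `MinMaxScaler` using plain NumPy.

```python
import numpy as np
import pandas as pd


def build_daily_series(catalog: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
    """Clean a raw catalogue and aggregate it to one row per calendar day.

    Hypothetical assumptions: columns 'datetime' and 'magnitude' exist, and
    columns with a missing fraction >= max_missing (e.g. focal-mechanism
    parameters with ~97% missing values) are dropped.
    """
    # Drop sparse columns; .copy() avoids modifying a view of the input.
    df = catalog.loc[:, catalog.isna().mean() < max_missing].copy()
    # Convert timestamps and use them as a sorted index for resampling.
    df["datetime"] = pd.to_datetime(df["datetime"])
    df = df.set_index("datetime").sort_index()
    # Daily resampling: number of events, summed and mean magnitude per day.
    daily = df["magnitude"].resample("D").agg(["count", "sum", "mean"])
    daily.columns = ["eq_count", "mag_total", "mag_mean"]
    # Event-free days have an undefined mean; fill with 0 (an assumption)
    # so that every day in the range has a complete record.
    return daily.fillna(0.0)


def min_max_scale(values: np.ndarray):
    """Scale each column to [0, 1]; return the scaled array and the min/max used."""
    vmin = values.min(axis=0)
    vmax = values.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(vmax > vmin, vmax - vmin, 1.0)
    return (values - vmin) / span, vmin, vmax


def make_windows(features: np.ndarray, target: np.ndarray, window: int = 30):
    """Build (samples, window, n_features) inputs; each sample predicts target[t]
    from the `window` days strictly before day t."""
    X, y = [], []
    for t in range(window, len(features)):
        X.append(features[t - window:t])
        y.append(target[t])
    return np.asarray(X), np.asarray(y)
```

Each window uses only days before the predicted day, so no future information leaks into an input sample. In a real train/test split, the minimum and maximum would normally be computed on the training portion only and then reused for the test portion.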
Copyright © 2026