Blumea balsamifera (sembung) is a medicinal plant with well-documented antibacterial, anti-inflammatory, and analgesic properties. However, systematic identification of its bioactive compounds remains difficult because LC–MS (Liquid Chromatography–Mass Spectrometry) data are complex and high-dimensional. This study aims to develop a robust computational framework for automated compound identification using a hybrid modeling approach.

A hybrid model integrating Long Short-Term Memory (LSTM) and Extreme Gradient Boosting (XGBoost) was used to improve feature extraction and classification performance. The LSTM component captures sequential dependencies in the spectral data, while XGBoost performs the classification through gradient boosting. This combination allows the model to handle complex spectral patterns efficiently and improves predictive accuracy.

The proposed model achieved an accuracy of 91% in classifying and identifying bioactive compounds. Feature importance analysis identified several key compounds contributing to the model predictions, including Luteolin-7-methyl-ether, Umbelliferone, Blumeatin, Dihydroquercetin-7,4′-dimethylether, Chrysosplenol C, Blumealactone B, and Blumeaene E. These compounds are associated with known pharmacological activities, supporting the therapeutic relevance of B. balsamifera.

The proposed hybrid LSTM–XGBoost framework provides an effective and scalable approach to LC–MS-based compound identification. The method reduces analytical complexity, improves classification reliability, and offers a data-driven strategy for accelerating phytochemical research and bioactive compound validation.
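The two-stage data flow described above (a recurrent encoder that summarizes each spectrum, followed by a boosted tree classifier) can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation, and every name in it (`TinyLSTM`, `synthetic_spectrum`, `fit_stump`, `boost`, `predict`) is hypothetical. It makes several assumptions: the LSTM weights are fixed random values rather than trained ones; the final hidden state is taken as the feature vector, which the abstract does not specify; plain gradient boosting on squared error over decision stumps stands in for XGBoost; and the "spectra" are synthetic single-peak sequences, not LC–MS data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single-layer LSTM over scalar inputs with fixed random (untrained) weights."""

    def __init__(self, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Per gate and per unit, one row: [w_input, w_hidden..., bias].
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 2)]
                      for _ in range(hidden)] for g in "ifoc"}

    def _gate(self, g, unit, x, h):
        row = self.w[g][unit]
        z = row[0] * x + sum(w * hj for w, hj in zip(row[1:-1], h)) + row[-1]
        return math.tanh(z) if g == "c" else sigmoid(z)

    def encode(self, seq):
        """Run the sequence through the cell and return the final hidden state."""
        units = range(self.hidden)
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in seq:
            i = [self._gate("i", u, x, h) for u in units]  # input gate
            f = [self._gate("f", u, x, h) for u in units]  # forget gate
            o = [self._gate("o", u, x, h) for u in units]  # output gate
            g = [self._gate("c", u, x, h) for u in units]  # candidate state
            c = [f[u] * c[u] + i[u] * g[u] for u in units]
            h = [o[u] * math.tanh(c[u]) for u in units]
        return h


def synthetic_spectrum(label, rng, length=20):
    """Toy intensity trace: one Gaussian peak whose position depends on the class."""
    center = 5 if label == 0 else 14
    return [math.exp(-((t - center) ** 2) / 4.0) + rng.uniform(0, 0.05)
            for t in range(length)]


def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return None if best is None else best[1:]


def boost(X, y, rounds=30, lr=0.3):
    """Gradient boosting on squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, [yi - pi for yi, pi in zip(y, pred)])
        if stump is None:
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= thr else rm) for row, p in zip(X, pred)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
    return 1 if score > 0.5 else 0


rng = random.Random(42)
labels = [k % 2 for k in range(60)]
spectra = [synthetic_spectrum(lab, rng) for lab in labels]

encoder = TinyLSTM(hidden=4)
features = [encoder.encode(s) for s in spectra]   # stage 1: sequence -> feature vector

model = boost(features[:40], labels[:40])         # stage 2: boosted classifier
predictions = [predict(model, row) for row in features[40:]]
accuracy = sum(p == t for p, t in zip(predictions, labels[40:])) / len(predictions)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The printed accuracy describes only this toy data and says nothing about the 91% reported for the study. In practice the encoder would be a trained LSTM from a deep-learning library and the second stage an actual XGBoost classifier, whose built-in feature importance scores would support the kind of analysis the abstract mentions.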
Copyright © 2026