Sinkron : Jurnal dan Penelitian Teknik Informatika
Vol. 10 No. 1 (2026): Article Research January 2026

Comparing XGBoost and LightGBM for Optimizing Health Content Categories

Oktaviana, Nanda
Andrianingsih, Andrianingsih



Article Info

Publish Date
03 Jan 2026

Abstract

Indonesia’s social media platforms contain large amounts of unverified health information. Research on Indonesian health-text mining rarely addresses disease-based classification; most existing studies cover only sentiment or general topic categorization. This study proposes a multi-class classification approach that combines IndoBERT embeddings with gradient-boosting classifiers (XGBoost and LightGBM) to assign tweets to one of three categories: diabetes, hypertension, or heart disease. The dataset comprises 4,075 tweets collected from platform X (Twitter). Preprocessing involves text cleaning, anonymization, normalization, and the extraction of 768-dimensional IndoBERT embeddings. Experiments are conducted in Google Colab (Intel Xeon CPU, 13 GB RAM, optional NVIDIA T4 GPU) using stratified five-fold cross-validation. The best results are obtained by the IndoBERT × LightGBM pipeline, which achieves an accuracy of 0.8526 and a macro-averaged F1-score of 0.8527, outperforming the IndoBERT × XGBoost model (accuracy 0.8325, macro-averaged F1-score 0.8326). Feature-importance analysis shows that contextual terms related to blood sugar, the heart, and blood pressure strongly influence the predictions. Overall, the proposed method provides an effective baseline for monitoring health-related text and supporting disease-oriented analytics in Indonesian-language social media.
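
The abstract does not include the authors' implementation. As a minimal illustration of two evaluation ideas it names, stratified five-fold splitting and the macro-averaged F1-score, the following plain-Python sketch may help. The function names (`stratified_folds`, `accuracy`, `macro_f1`) and the toy labels are assumptions made for this example; they are not taken from the paper, and the IndoBERT embedding and gradient-boosting steps are deliberately left out.

```python
import random
from collections import defaultdict

# The three disease categories named in the abstract.
LABELS = ["diabetes", "hypertension", "heart disease"]


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps roughly
    the same class proportions as the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage: 10 hypothetical samples per class, split into 5 folds.
toy_labels = [c for c in LABELS for _ in range(10)]
for fold_id, test_idx in enumerate(stratified_folds(toy_labels, k=5, seed=42)):
    counts = {c: sum(toy_labels[i] == c for i in test_idx) for c in LABELS}
    print(fold_id, counts)
```

Because the macro average weights every class equally, it penalizes a classifier that does well on a frequent class but poorly on a rare one. When the classes are roughly balanced, accuracy and macro F1 tend to land close together, as they do in the figures reported above.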

Copyrights © 2026






Journal Info

Abbrev

sinkron

Subject

Computer Science & IT

Description

Scope of SinkrOn's Scientific Discussion:
1. Machine Learning
2. Cryptography
3. Steganography
4. Digital Image Processing
5. Networking
6. Security
7. Algorithm and Programming
8. Computer Vision
9. Troubleshooting
10. Internet and E-Commerce
11. Artificial Intelligence
12. Data Mining
13. Artificial ...