In the digital era, the overwhelming volume of online news far exceeds readers’ ability to filter information manually, necessitating automated text classification. However, achieving high classification accuracy remains challenging, especially in low-resource languages such as Indonesian. The C5.0 decision tree and K-Nearest Neighbors (KNN) offer complementary strengths but have not yet been combined for Indonesian news classification; this study therefore proposes a hybrid C5.0–KNN model designed to improve news classification performance. A dataset of 1,700 articles was collected from four Indonesian online news portals (CNN Indonesia, Okezone, Tribun Jakarta, and Tribun Jabar), covering five topical categories: economy/ekonomi, technology/teknologi, sport/olahraga, entertainment/hiburan, and lifestyle/gaya hidup. The data underwent preprocessing and TF-IDF weighting before classification with the hybrid model. In this approach, C5.0 first generates interpretable decision rules, and KNN then refines borderline cases, combining rule-based and instance-based methods. The hybrid model achieved the highest accuracy, 0.8847 (with 25% of the data used for testing and k = 5), outperforming standalone C5.0 (0.7426) and KNN (0.8735). Notably, it attained 100% recall for “sport/olahraga” and an F1-score of 0.89 for “entertainment/hiburan”. These results demonstrate the model’s novelty, efficiency, and strong potential for real-world news classification in low-resource language contexts, offering practical value for journalists, analysts, and media monitoring systems.
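The hybrid idea (TF-IDF vectors, a rule-generating tree, and KNN for borderline cases) can be illustrated with a minimal, standard-library-only sketch. It rests on several assumptions that do not come from the study. The tree is a simple entropy/information-gain tree that splits on whether a term is present, standing in for C5.0 (it has none of C5.0's gain ratio, pruning, rule simplification, or boosting). A case counts as "borderline" when the majority-class share in its leaf falls below a hypothetical `threshold` of 0.8. KNN uses cosine similarity. IDF is taken as log(N/df), which is only one of several TF-IDF variants. All function names and the toy tokenized documents are hypothetical and are not drawn from the study's dataset.

```python
import math
from collections import Counter

def fit_idf(docs):
    # Assumed IDF variant: idf(t) = log(N / df(t)).
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(len(docs) / c) for t, c in df.items()}

def vectorize(tokens, idf):
    # Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    # Stand-in for C5.0: greedy split on "term weight > 0" by information gain.
    counts = Counter(y)
    if depth >= max_depth or len(counts) == 1:
        return {"leaf": counts}
    base, best, best_gain = entropy(y), None, 0.0
    for term in sorted({t for x in X for t in x}):
        has = [x.get(term, 0.0) > 0 for x in X]
        yes = [lab for lab, h in zip(y, has) if h]
        no = [lab for lab, h in zip(y, has) if not h]
        if not yes or not no:
            continue
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(y)
        if gain > best_gain:
            best, best_gain = term, gain
    if best is None:
        return {"leaf": counts}
    has = [x.get(best, 0.0) > 0 for x in X]

    def subset(keep):
        return ([x for x, h in zip(X, has) if h == keep],
                [lab for lab, h in zip(y, has) if h == keep])

    return {"term": best,
            "yes": build_tree(*subset(True), depth + 1, max_depth),
            "no": build_tree(*subset(False), depth + 1, max_depth)}

def tree_predict(tree, x):
    # Follow the rule path; confidence is the majority-class share in the leaf.
    while "leaf" not in tree:
        tree = tree["yes"] if x.get(tree["term"], 0.0) > 0 else tree["no"]
    label, count = tree["leaf"].most_common(1)[0]
    return label, count / sum(tree["leaf"].values())

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(X, y, x, k=5):
    # Majority vote among the k training vectors most cosine-similar to x.
    ranked = sorted(range(len(X)), key=lambda i: cosine(X[i], x), reverse=True)
    return Counter(y[i] for i in ranked[:k]).most_common(1)[0][0]

def hybrid_predict(tree, X, y, x, k=5, threshold=0.8):
    # Hypothetical routing rule: keep the tree's answer when its leaf is
    # confident, otherwise treat the article as borderline and defer to KNN.
    label, conf = tree_predict(tree, x)
    if conf >= threshold:
        return label, "rule"
    return knn_predict(X, y, x, k), "knn"

# Toy, made-up training data (already tokenized and lowercased).
train = [(["saham", "bank", "rupiah"], "ekonomi"),
         (["saham", "inflasi", "rupiah"], "ekonomi"),
         (["gol", "liga", "pemain"], "olahraga"),
         (["gol", "timnas", "pemain"], "olahraga"),
         (["aplikasi", "ponsel", "internet"], "teknologi"),
         (["aplikasi", "internet", "startup"], "teknologi")]
docs, y = [d for d, _ in train], [lab for _, lab in train]
idf = fit_idf(docs)
X = [vectorize(d, idf) for d in docs]
tree = build_tree(X, y, max_depth=1)  # deliberately shallow so one leaf is mixed
print(hybrid_predict(tree, X, y, vectorize(["aplikasi", "ponsel", "baru"], idf), k=3))
# ('teknologi', 'rule')
print(hybrid_predict(tree, X, y, vectorize(["saham", "rupiah", "naik"], idf), k=3))
# ('ekonomi', 'knn')
```

The tree is kept to depth 1 only so that the toy example produces one mixed leaf (economy and sport together). Articles reaching that leaf are handed to KNN, while articles reaching the pure technology leaf keep the rule's answer. `k=3` is used here because the toy set has only six documents. The study reports k = 5 on its full dataset.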
Copyrights © 2025