Kalempouw, Miracle
Unknown Affiliation

Published: 1 document

Journal: Journal of Applied Informatics and Computing

Improving News Text Classification Using a Hybrid C5.0-KNN Model
Wikarsa, Liza; Ngenget, Algy; Tumewu, Andrew; Kalempouw, Miracle; Oley, Edgard
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11478

Abstract

In the digital era, the overwhelming volume of online news far exceeds readers’ ability to filter information manually, which makes automated text classification necessary. However, achieving high classification accuracy remains challenging, especially in low-resource languages such as Indonesian. The C5.0 decision tree and K-Nearest Neighbors (KNN) offer complementary strengths but have not yet been used together for Indonesian news classification. This study therefore proposes a hybrid C5.0–KNN model designed to improve news classification performance. A dataset of 1,700 articles was collected from four Indonesian online news portals (CNN Indonesia, Okezone, Tribun Jakarta, and Tribun Jabar), covering five topical categories: economy/ekonomi, technology/teknologi, sport/olahraga, entertainment/hiburan, and lifestyle/gaya hidup. The data underwent preprocessing and TF-IDF weighting before classification with the hybrid model. In this approach, C5.0 first generates interpretable decision rules, and KNN then refines borderline cases, combining rule-based and instance-based methods. The hybrid model achieved its highest accuracy of 0.8847 (with 25% of the data held out for testing and k = 5), outperforming standalone C5.0 (0.7426) and standalone KNN (0.8735). Notably, it attained 100% recall for “sport/olahraga” and an F1-score of 0.89 for “entertainment/hiburan”. These results demonstrate the model’s novelty, efficiency, and strong potential for real-world news classification in low-resource language contexts, offering practical value for journalists, analysts, and media monitoring systems.
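The abstract describes the pipeline (TF-IDF weighting, then C5.0 rules, then KNN for borderline cases) but does not say how a case is judged "borderline." The pure-Python sketch below is only an illustration under stated assumptions, not the authors' implementation. First, a small tree that splits on "term present?" tests by gain ratio stands in for C5.0; real C5.0 also prunes and offers boosting and winnowing. Second, a case counts as borderline when its leaf purity falls below a hypothetical `threshold`. Third, KNN uses cosine similarity over TF-IDF vectors with an unweighted vote. The names `fit_hybrid`, `hybrid_predict`, `threshold` and `max_depth`, the whitespace tokenizer, and the toy snippets are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Placeholder preprocessing (assumption): lowercase + whitespace split only.
    return text.lower().split()

def fit_idf(docs):
    # idf(t) = log(N / df(t)), where df(t) counts training documents containing t.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / d) for t, d in df.items()}

def tfidf(doc, idf):
    # Term frequency x IDF, L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values())
    vec = {t: c / total * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_term(rows, labels, terms):
    # Pick the binary test "term present?" with the highest gain ratio
    # (C4.5/C5.0-style criterion); None if no test has positive gain.
    base, n = entropy(labels), len(labels)
    best, best_ratio = None, 0.0
    for t in terms:
        yes = [y for r, y in zip(rows, labels) if t in r]
        no = [y for r, y in zip(rows, labels) if t not in r]
        if not yes or not no:
            continue
        p = len(yes) / n
        gain = base - p * entropy(yes) - (1 - p) * entropy(no)
        split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        if gain / split_info > best_ratio:
            best, best_ratio = t, gain / split_info
    return best

def build_tree(rows, labels, terms, depth=0, max_depth=3):
    label, count = Counter(labels).most_common(1)[0]
    leaf = ("leaf", label, count / len(labels))  # leaf purity serves as confidence
    if depth >= max_depth or count == len(labels):
        return leaf
    term = best_term(rows, labels, terms)
    if term is None:
        return leaf
    branches = []
    for present in (True, False):
        idx = [i for i, r in enumerate(rows) if (term in r) == present]
        branches.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   terms, depth + 1, max_depth))
    return ("split", term, *branches)

def tree_predict(node, vec):
    # Follow the rules from the root to a leaf; return (label, leaf purity).
    while node[0] == "split":
        _, term, yes, no = node
        node = yes if term in vec else no
    return node[1], node[2]

def knn_predict(vec, train_vecs, train_labels, k):
    # Unit-length vectors, so the dot product is the cosine similarity.
    sims = [sum(w * v.get(t, 0.0) for t, w in vec.items()) for v in train_vecs]
    ranked = sorted(zip(sims, train_labels), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]

def fit_hybrid(texts, labels, max_depth=3):
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    tree = build_tree(vecs, list(labels), sorted(idf), max_depth=max_depth)
    return {"idf": idf, "vecs": vecs, "labels": list(labels), "tree": tree}

def hybrid_predict(model, text, k=5, threshold=0.8):
    vec = tfidf(tokenize(text), model["idf"])
    label, confidence = tree_predict(model["tree"], vec)  # stage 1: tree rules
    if confidence >= threshold:
        return label
    # Stage 2 (assumption): a low-purity leaf marks a borderline case for KNN.
    return knn_predict(vec, model["vecs"], model["labels"], k)

# Invented toy data, not taken from the study's dataset.
train_texts = ["saham bank rupiah naik", "inflasi rupiah bank sentral",
               "ponsel baru aplikasi android", "aplikasi internet ponsel cepat",
               "timnas menang gol babak kedua", "gol striker timnas liga"]
train_labels = ["ekonomi", "ekonomi", "teknologi", "teknologi", "olahraga", "olahraga"]
model = fit_hybrid(train_texts, train_labels)
print(hybrid_predict(model, "gol timnas", k=3))  # prints "olahraga"
```

With the default depth, the tree's rules alone place "gol timnas" in a pure sport leaf. If the model is refit with `max_depth=1`, the tree cannot separate economy from sport. The query then lands in a 50%-pure leaf, and the decision falls through to KNN, which still returns "olahraga" because the two sport documents are its only neighbours with non-zero similarity.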