This study develops an automatic news text classification system using the K-Nearest Neighbor (KNN) algorithm with hyperparameter tuning. Manual classification by editors is inefficient, so an accurate and lightweight automated approach is needed. The news dataset was obtained by web scraping bbc.com and covers five main categories: business, technology, entertainment, science, and health. The research follows the CRISP-DM methodology, which consists of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Preprocessing includes stopword removal and pattern-based noise cleaning, and features are represented using TF-IDF. Two experimental scenarios were run: the first used the complete data without balancing; the second used a more balanced dataset obtained through undersampling. Hyperparameter tuning varied the value of k from 1 to 50, and each setting was validated with 5-fold cross-validation. The model trained on the balanced data with k=11 achieved 95% accuracy, precision, recall, and F1-score. The system was also deployed as a Flask-based web application that news editors can use for real-time text classification. This study emphasizes the importance of parameter optimization and preprocessing in text classification and shows that simple algorithms such as KNN remain competitive when supported by careful data preparation.
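The modeling steps summarized above (noise cleaning, stopword removal, TF-IDF features, undersampling, and a cross-validated search over k) can be illustrated with a minimal sketch. The abstract does not name the library used, so scikit-learn is an assumption here, as are the helper names `clean_text`, `make_toy_corpus`, `undersample`, and `tune_knn`, the specific regex patterns, and the tiny invented corpus, which stands in for the scraped BBC articles. Because the toy corpus is small, the demo searches only a few values of k; the study itself searched k from 1 to 50.

```python
import random
import re
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


def clean_text(text):
    """Pattern-based noise cleaning (assumed patterns: URLs, non-letters)."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def make_toy_corpus(seed=0):
    """Build an invented, imbalanced corpus; NOT the study's data."""
    topic_words = {
        "business": ["market", "shares", "profit", "bank", "investors", "revenue"],
        "technology": ["software", "chip", "smartphone", "internet", "startup", "robot"],
        "entertainment": ["film", "music", "actor", "concert", "album", "festival"],
        "science": ["planet", "fossil", "telescope", "physics", "species", "experiment"],
        "health": ["vaccine", "hospital", "patients", "virus", "diet", "surgery"],
    }
    sizes = {"business": 9, "technology": 7, "entertainment": 6,
             "science": 8, "health": 6}
    rng = random.Random(seed)
    texts, labels = [], []
    for label, words in topic_words.items():
        for _ in range(sizes[label]):
            texts.append(" ".join(rng.sample(words, 4)))
            labels.append(label)
    return texts, labels


def undersample(texts, labels, seed=0):
    """Randomly cut every class down to the size of the smallest class."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = min(len(docs) for docs in by_label.values())
    rng = random.Random(seed)
    out_texts, out_labels = [], []
    for label, docs in sorted(by_label.items()):
        for text in rng.sample(docs, target):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def tune_knn(texts, labels, k_values, n_splits=5):
    """Grid-search n_neighbors for a TF-IDF + KNN pipeline with k-fold CV."""
    pipeline = Pipeline([
        # A custom preprocessor replaces the default lowercasing step,
        # so clean_text lowercases the text itself.
        ("tfidf", TfidfVectorizer(preprocessor=clean_text, stop_words="english")),
        ("knn", KNeighborsClassifier()),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    search = GridSearchCV(pipeline, {"knn__n_neighbors": list(k_values)},
                          cv=cv, scoring="accuracy")
    search.fit(texts, labels)
    return search


raw_texts, raw_labels = make_toy_corpus()
balanced_texts, balanced_labels = undersample(raw_texts, raw_labels)
# Every k must stay below the size of a training fold (24 documents here).
search = tune_knn(balanced_texts, balanced_labels, k_values=[1, 3, 5, 7])
print("best k:", search.best_params_["knn__n_neighbors"])
print("mean CV accuracy:", round(search.best_score_, 3))
```

On a real dataset, `k_values=range(1, 51)` would mirror the search range reported above, provided each training fold contains at least 50 documents. Wrapping the vectorizer and classifier in a single `Pipeline` keeps the TF-IDF vocabulary fitted only on the training folds, which avoids leaking information from the validation fold into the features.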
Copyrights © 2025