Garuda - Garba Rujukan Digital

Jurnal Informasi dan Teknologi

2025, Vol. 7, No. 2

Muhamad Nur Gunawan (Unknown)
Nuryasin (Unknown)
Syopiansyah Jaya Putra (Unknown)
Sarah Arhami (Unknown)

Publish Date
17 Jul 2025

This study develops an automatic news text classification system using the K-Nearest Neighbor (KNN) algorithm with hyperparameter tuning. Manual classification by editors is considered inefficient, so an accurate and lightweight automated approach is needed. The news dataset was obtained by web scraping bbc.com and covers five main categories: business, technology, entertainment, science, and health. The research follows the CRISP-DM methodology, which consists of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Features are represented with TF-IDF, and preprocessing includes stopword removal and pattern-based noise cleaning. Two experimental scenarios were run: the first used the complete data without balancing, and the second used data balanced by undersampling. Hyperparameter tuning varied k from 1 to 50 and was validated with 5-fold cross-validation. The model trained on balanced data with k=11 achieved 95% on accuracy, precision, recall, and F1-score. The system was also deployed as a Flask-based web application that news editors can use for real-time text classification. The study emphasizes the importance of parameter optimization and preprocessing in text classification and shows that simple algorithms such as KNN remain competitive when supported by good data processing.
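The pipeline summarized in the abstract (undersampling, TF-IDF features, KNN, and a search over k scored by k-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the distance metric, the exact IDF formula, or how folds were drawn, so cosine similarity, a smoothed IDF, and a simple index-based fold split are assumptions here. The stopword list and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; the study's list is not given).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def undersample(docs, labels, seed=0):
    """Randomly trim every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_label[label].append(doc)
    n = min(len(ds) for ds in by_label.values())
    pairs = [(d, lab) for lab, ds in by_label.items() for d in rng.sample(ds, n)]
    rng.shuffle(pairs)
    return [d for d, _ in pairs], [lab for _, lab in pairs]

def fit_idf(docs):
    """Smoothed inverse document frequency, learned from training docs only."""
    df = Counter(t for d in docs for t in set(tokenize(d)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(train_vecs, train_labels, vec, k):
    """Majority vote among the k most similar training vectors."""
    sims = [(cosine(tv, vec), label) for tv, label in zip(train_vecs, train_labels)]
    sims.sort(key=lambda p: p[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(docs, labels, k, folds=5):
    """Accuracy pooled over all held-out folds (fold of item i is i % folds)."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(docs)) if i % folds != f]
        test = [i for i in range(len(docs)) if i % folds == f]
        idf = fit_idf([docs[i] for i in train])
        train_vecs = [tfidf(docs[i], idf) for i in train]
        train_labels = [labels[i] for i in train]
        for i in test:
            pred = knn_predict(train_vecs, train_labels, tfidf(docs[i], idf), k)
            correct += pred == labels[i]
    return correct / len(docs)

def tune_k(docs, labels, k_values, folds=5):
    """Score every candidate k; ties go to the smallest k."""
    scores = {k: cv_accuracy(docs, labels, k, folds) for k in k_values}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores
```

With a labeled corpus in `docs` and `labels`, a call such as `tune_k(*undersample(docs, labels), range(1, 51))` would mirror the balanced scenario and the k range reported above. In practice a library such as scikit-learn would typically replace these hand-written pieces.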

Journal Info

Jurnal Informasi dan Teknologi

Abbrev

jidt

Publisher

Universitas Putra Indonesia YPTK Padang

Subject

Computer Science & IT

Description

Jurnal Informasi & Teknologi is a medium for scholarly discussion of research results, ideas, and critical-analytical studies in Systems Engineering, Informatics Engineering/Information Technology, Informatics Management, and Information Systems. As part of the spirit of disseminating knowledge resulting from ...

Article Info

Abstract

Machine Learning-Based News Classification: Comparison of KNN Accuracy with Hyperparameter Tuning
