News published on internet-based news portals is generally organized into categories such as politics, sports, economy, entertainment, technology, and health. Currently, this categorization is performed manually, which requires reading each article in full. To address this inefficiency, an automatic classification system is needed that assigns Indonesian news articles to predetermined categories. This research takes a Natural Language Processing (NLP) approach and implements a Long Short-Term Memory (LSTM) architecture. Three testing scenarios were examined: (1) learning rates of 0.01 and 0.001, (2) preprocessing with and without stemming, and (3) dataset split ratios of 60:40, 70:30, 80:20, and 90:10. The evaluation used a dataset of 10,000 articles across 5 categories and was measured with accuracy, precision, recall, and F-measure. The three scenarios yielded seven trained models. The second model (learning rate 0.001, no stemming, 90:10 dataset ratio) achieved the highest accuracy, 90.7%, with average precision, recall, and F-measure scores of 91%. The third and fourth models, which applied stemming, showed no performance improvement, each reaching an accuracy of 89%. The fifth model, with a 60:40 dataset ratio, reached an accuracy of 90%, while the sixth and seventh models, with 70:30 and 80:20 ratios, reached accuracies of 79% and 88%, respectively.
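As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy and averaged precision, recall, and F-measure might be computed for a multi-class classifier. It assumes macro-averaging (an unweighted mean over categories), which the abstract does not specify; the function name `macro_metrics` and the sample labels are hypothetical and are not taken from the study's dataset or code.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F-measure.

    Assumption: macro-averaging is used; the study may have averaged differently.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this category
        else:
            fp[pred] += 1    # predicted category was wrong
            fn[truth] += 1   # true category was missed

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Hypothetical labels for six articles, for illustration only.
y_true = ["politics", "sports", "economy", "sports", "politics", "economy"]
y_pred = ["politics", "sports", "sports", "sports", "politics", "economy"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Macro-averaging treats every category equally regardless of how many articles it contains, which is one reasonable choice when categories are roughly balanced; a weighted average would be an alternative for imbalanced data.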
Copyright © 2025