Engineering Science Letter
Vol. 5 No. 02 (2026): In Press - Engineering Science Letter

Subject Area Classification of Journal Articles Based on Metadata Using Bag of Words and Naïve Bayes

Ainunna’imah (Universitas Ahmad Dahlan)
Herman Yuliansyah (Indonesia)
Imam Riadi (Universitas Ahmad Dahlan)

Article Info

Publish Date
13 Jun 2026

Abstract

The rapid growth of scientific publications makes it difficult to group journal articles by subject area, especially when relying on metadata such as titles, abstracts, and keywords. Differences in feature representation and classification algorithm often lead to varying performance, so comparative studies are needed to identify the best model combination. This study compares four model combinations for subject area classification: TF-IDF + Naïve Bayes, TF-IDF + Support Vector Machine, Bag-of-Words + Support Vector Machine, and Bag-of-Words + Naïve Bayes. The research process comprised text preprocessing, feature extraction, and testing with an 80% training and 20% testing data split across five scenarios. Evaluation used confusion matrices, accuracy, precision, recall, and F1-score. The experimental results showed performance differences between models, with average F1-scores of 0.8103 for TF-IDF + Naïve Bayes, 0.8494 for TF-IDF + Support Vector Machine, 0.8297 for Bag-of-Words + Support Vector Machine, and 0.8335 for Bag-of-Words + Naïve Bayes, which the study reports as the best performance. These findings indicate that a word frequency-based approach combined with Naïve Bayes is effective for classifying journal article subject areas from metadata, although challenges remain for subject areas that are semantically close.
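The Bag-of-Words + Naïve Bayes combination and the F1-score evaluation described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the tokenizer, the class name BagOfWordsNaiveBayes, the macro_f1 helper, the Laplace smoothing choice, and the ten toy records (a fixed 8/2 split standing in for the 80%/20% scheme) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over raw word counts, with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # per-class word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_docs[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        both = precision + recall
        scores.append(2 * precision * recall / both if both else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy metadata (titles only); 8 training and 2 test records.
train = [
    ("deep learning for image classification", "Computer Science"),
    ("neural network text classification model", "Computer Science"),
    ("software testing with machine learning", "Computer Science"),
    ("graph algorithm for network routing", "Computer Science"),
    ("tensile strength of steel alloy", "Materials Science"),
    ("nanoparticle coating corrosion resistance", "Materials Science"),
    ("polymer composite thermal conductivity", "Materials Science"),
    ("alloy microstructure and hardness", "Materials Science"),
]
test = [
    ("learning model for text routing", "Computer Science"),
    ("corrosion of steel alloy coating", "Materials Science"),
]

model = BagOfWordsNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
y_true = [y for _, y in test]
preds = [model.predict(t) for t, _ in test]
print(preds, macro_f1(y_true, preds))
```

On real metadata the title, abstract, and keywords would typically be concatenated before tokenization, and the split would be repeated over several scenarios as the abstract describes. The toy classes here are deliberately far apart, so the sketch does not reproduce the difficulty the study notes for semantically close subject areas.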

Copyrights © 2026

Journal Info

Abbrev

ESL

Subject

Computer Science & IT; Control & Systems Engineering; Engineering; Industrial & Manufacturing Engineering; Materials Science & Nanotechnology

Description

Engineering Science Letter is an international peer-reviewed letter that welcomes short original research submissions on any branch of engineering, computer science, and technology, as well as their applications in industry, education, health, business, and other fields. Artificial intelligence, ...