Subject Area Classification of Journal Articles Based on Metadata Using Bag of Words and Naïve Bayes
Ainunna’imah; Herman Yuliansyah; Imam Riadi
Engineering Science Letter Vol. 5 No. 02 (2026): In Press - Engineering Science Letter
Publisher : The Indonesian Institute of Science and Technology Research

DOI: 10.56741/IISTR.esl.002041

Abstract

The rapid growth of scientific publications makes it difficult to group journal articles by subject area, especially when using metadata such as titles, abstracts, and keywords. Differences in feature representation and classification algorithm often lead to varying performance, so comparative studies are needed to determine the optimal model combination. This study compares four model combinations for subject area classification: TF-IDF + Naïve Bayes, TF-IDF + Support Vector Machine, Bag-of-Words + Support Vector Machine, and Bag-of-Words + Naïve Bayes. The research process comprised text preprocessing, feature extraction, and testing with an 80% training and 20% testing data split across five scenarios. Performance was evaluated using confusion matrices, accuracy, precision, recall, and F1-score. The experimental results showed variation between models, with average F1-scores of 0.8103 for TF-IDF + Naïve Bayes, 0.8494 for TF-IDF + Support Vector Machine, 0.8297 for Bag-of-Words + Support Vector Machine, and 0.8335 for Bag-of-Words + Naïve Bayes, identified as the best-performing combination. These findings indicate that a word-frequency-based approach combined with Naïve Bayes is effective for classifying journal article subject areas from metadata, although challenges remain for subject areas that are semantically close to one another.
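To make the Bag-of-Words + Naïve Bayes combination described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a multinomial Naïve Bayes with Laplace smoothing and a macro-averaged F1-score; the abstract does not state which Naïve Bayes variant or F1 averaging was used. The class name `BagOfWordsNaiveBayes`, the helper functions, and the toy titles and labels are all hypothetical, and the 80/20 split is replaced here by a few hand-picked held-out examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for fuller preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


class BagOfWordsNaiveBayes:
    """Hypothetical multinomial Naive Bayes over raw word counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # per-class bag of words
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_docs[label] / self.total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Toy metadata, invented purely for illustration.
train_texts = [
    "deep learning for image classification",
    "neural network training with gradient descent",
    "clinical trial of a new vaccine",
    "patient outcomes after heart surgery",
]
train_labels = ["computer science", "computer science", "medicine", "medicine"]

test_texts = ["gradient descent for neural image models", "vaccine trial with patient"]
test_labels = ["computer science", "medicine"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
print(predictions, macro_f1(test_labels, predictions))
```

Swapping the raw counts for TF-IDF weights, or the classifier for a Support Vector Machine, would give the other three combinations the study compares; in practice a library such as scikit-learn would typically be used for this rather than hand-written code.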