Subject Area Classification of Journal Articles Based on Metadata Using Bag of Words and Naïve Bayes
Ainunna’imah; Herman Yuliansyah; Imam Riadi
Engineering Science Letter Vol. 5 No. 02 (2026): In Press - Engineering Science Letter
Publisher : The Indonesian Institute of Science and Technology Research

DOI: 10.56741/IISTR.esl.002041

Abstract

The rapid growth of scientific publications makes it difficult to group journal articles by subject area, especially when the grouping relies on metadata such as titles, abstracts, and keywords. Differences in feature representation and classification algorithm often lead to varying performance, so comparative studies are needed to determine the best model combination. This study compares four combinations of subject area classification models: TF-IDF + Naïve Bayes, TF-IDF + Support Vector Machine, Bag-of-Words + Support Vector Machine, and Bag-of-Words + Naïve Bayes. The research process included text preprocessing, feature extraction, and testing with an 80% training and 20% testing data split in five scenarios. Evaluation used confusion matrices, accuracy, precision, recall, and F1-score. The experimental results showed variation in performance between models, with an average F1-score of 0.8103 for TF-IDF + Naïve Bayes, 0.8494 for TF-IDF + Support Vector Machine, 0.8297 for Bag-of-Words + Support Vector Machine, and 0.8335 for Bag-of-Words + Naïve Bayes, which the study reports as the best performance. These findings indicate that a word frequency-based approach combined with Naïve Bayes is effective for classifying journal article subject areas from metadata, although challenges remain for subject areas that are semantically close.
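
The Bag-of-Words + Naïve Bayes combination named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the class name `BagOfWordsNaiveBayes`, and the toy training strings are all assumptions made for illustration, using only the Python standard library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for fuller preprocessing."""
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over raw word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_doc_counts = Counter(labels)   # documents per class
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            score = math.log(doc_count / self.n_docs)  # log prior
            for token in tokens:
                # Smoothed log likelihood of the token given the class.
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy metadata (title and keywords merged into one string).
train_texts = [
    "neural network image classification",
    "deep learning image recognition",
    "soil crop yield irrigation",
    "crop disease field irrigation",
]
train_labels = ["computer science", "computer science", "agriculture", "agriculture"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
print(model.predict("image recognition network"))
print(model.predict("crop irrigation schedule"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from zeroing out a class.
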
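
Model Klasifikasi Kesesuaian Artikel Pada Jurnal SINTA Berdasarkan Metadata Menggunakan Term Frequency–Inverse Document Frequency dan Naïve Bayes [Article Suitability Classification Model for SINTA Journals Based on Metadata Using Term Frequency–Inverse Document Frequency and Naïve Bayes]
Ainunna’imah; Imam Riadi; Herman Yuliansyah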
SemanTIK : Teknik Informasi Vol. 12 No. 1 (2026): Volume 12 Number 1 (January-june 2026)
Publisher : Informatics Engineering Department of Halu Oleo University

DOI: 10.55679/semantik.v12i1.270

Abstract

The rapid growth of scientific publications in Indonesia has created the need for efficient methods to assist researchers in classifying and determining suitable journals for academic articles. Previous studies have shown that machine learning methods, particularly Naïve Bayes, are effective for various Indonesian text classification tasks. However, research that specifically uses article metadata to determine the suitability of articles for SINTA-indexed journals remains limited, especially regarding the integration of TF–IDF features and cross-validation-based evaluation. This study aims to develop a classification model for determining article–journal suitability within SINTA using Term Frequency–Inverse Document Frequency and the Naïve Bayes algorithm. The dataset consists of 1,200 article metadata entries, including titles, abstracts, and keywords, collected through manual crawling of technology-related journals listed on the SINTA portal. The research stages include data collection, text preprocessing (case folding, translation, tokenization, stopword removal, and stemming), metadata merging, feature extraction using TF–IDF, and the application of Naïve Bayes with a 5-fold cross-validation scheme. Evaluation based on the confusion matrix shows that the model achieved an accuracy of 0.7058, precision of 0.6977, recall of 0.7133, and an F1-score of 0.7065. These results indicate that Naïve Bayes provides reasonably good classification performance on article metadata and has potential application in journal submission recommendation systems.
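
The evaluation scheme described above, 5-fold cross-validation scored with confusion-matrix metrics, can be sketched as follows. The abstract does not say how the folds were drawn or how per-class scores were averaged, so the contiguous unshuffled folds, the macro averaging, and the helper names `k_fold_indices` and `macro_scores` are assumptions for illustration only.

```python
from collections import Counter


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n_labels = len(labels)
    return {
        "accuracy": sum(confusion[(label, label)] for label in labels) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


# With 1,200 entries, each of the 5 folds holds out 240 and trains on 960.
for train_idx, test_idx in k_fold_indices(1200, k=5):
    print(len(train_idx), len(test_idx))

# Hypothetical labels and predictions for a single fold.
print(macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In practice the entries would usually be shuffled, and often stratified by class, before splitting, since data crawled journal by journal tends to arrive grouped by label.
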