Science in Information Technology Letters
Vol 3, No 2 (2022): November 2022

Text classification of traditional and national songs using naïve bayes algorithm

Simbolon, Triyanti
Wibawa, Aji Prasetya
Zaeni, Ilham Ari Elbaith
Ismail, Amelia Ritahani

Article Info

Publish Date
30 Nov 2022

Abstract

This research investigates the effectiveness of the multinomial Naïve Bayes algorithm for text classification, specifically for distinguishing folk songs from national songs. Naïve Bayes was chosen because it evaluates word frequencies not only within individual documents but across the entire dataset, which improves accuracy and stability. The dataset contains 480 folk songs and 90 national songs and is evaluated in six scenarios: two, four, and 31 labels, each with and without the Synthetic Minority Over-sampling Technique (SMOTE). The workflow begins with pre-processing (case folding, punctuation removal, and tokenization) followed by TF-IDF transformation. The text is then classified with the multinomial Naïve Bayes algorithm and tested using k-fold cross-validation together with SMOTE resampling. The best scenario is two labels with SMOTE applied, which reaches an accuracy of 93.75%. These findings indicate that the multinomial Naïve Bayes algorithm classifies effectively when the number of label categories is small.
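The stages named in the abstract (case folding, punctuation removal, tokenization, TF-IDF weighting, and multinomial Naïve Bayes) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy lyrics, the smoothed IDF formula, the Laplace smoothing value `alpha=1.0`, and all function names are assumptions made for illustration, and k-fold cross-validation and SMOTE are left out for brevity.

```python
import math
import string
from collections import Counter

# Hypothetical toy lyrics, NOT taken from the paper's dataset.
TRAIN = [
    ("Rowing the river boat, singing in the village!", "folk"),
    ("Harvest dance in the village by the river.", "folk"),
    ("Raise the flag, our nation stands united!", "national"),
    ("For the nation and the flag we march.", "national"),
]

def preprocess(text):
    """Case folding, punctuation removal, then whitespace tokenization."""
    folded = text.lower()
    cleaned = folded.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def fit_idf(token_docs):
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """TF-IDF weights for one document; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(members) / len(labels))
        weight = {}
        for vec in members:
            for term, w in vec.items():
                weight[term] = weight.get(term, 0.0) + w
        total = sum(weight.values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((weight.get(t, 0.0) + alpha) / total)
                       for t in vocab}
    return log_prior, log_like

def predict(vec, log_prior, log_like):
    """Return the class with the highest log posterior score."""
    scores = {c: log_prior[c] + sum(w * log_like[c][t] for t, w in vec.items())
              for c in log_prior}
    return max(scores, key=scores.get)

# Fit the toy model once at import time.
token_docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(token_docs)
vectors = [tfidf(doc, idf) for doc in token_docs]
log_prior, log_like = train_nb(vectors, labels, list(idf))

def classify(text):
    """Run the full pipeline on a new piece of text."""
    return predict(tfidf(preprocess(text), idf), log_prior, log_like)
```

With this toy model, `classify("The FLAG of our nation")` returns `"national"` and `classify("Singing by the river")` returns `"folk"`. Reproducing the six scenarios in the abstract would additionally require the real dataset, a cross-validation loop, and an oversampling step applied to the training folds.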

Copyrights © 2022

Journal Info

Abbrev

sitech

Subject

Computer Science & IT

Description

Science in Information Technology Letters (SITech) aims to keep abreast of current developments and innovations in the area of Science in Information Technology and to provide an engaging platform for scientists and engineers throughout the world to share research results in related ...