Books are an essential source of knowledge and information. As the number of published books grows, classification becomes more complex, especially in a multi-label setting where a book may belong to more than one genre. The shift of books to e-book and audiobook formats further increases the need for automatic genre classification. This research applies three machine learning techniques, Support Vector Machine (SVM), Logistic Regression (LR), and Multinomial Naive Bayes (MNB), to multi-label book genre classification and compares their performance with and without stemming during text preprocessing, using TF-IDF features and K-Fold cross-validation (k = 10). In addition, two problem transformation methods, Binary Relevance (BR) and Label Powerset (LP), are evaluated. The results show that SVM combined with stemming outperforms the other models on all four metrics: accuracy, precision, recall, and F1-score. SVM handles complex and imbalanced data distributions effectively, resulting in more accurate and consistent predictions. Stemming contributes positively by reducing word variation, which lets the model focus on word meanings. Among the problem transformation methods, LP yields better results because it captures relationships between labels more effectively than BR.
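The difference between the two problem transformation methods can be illustrated with a minimal sketch. It is not the implementation used in this research: the function names, the genre names, and the four toy books are hypothetical, and only the Python standard library is assumed. BR turns the multi-label targets into one binary target vector per genre, while LP turns each distinct combination of genres into a single class.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance_targets(
    label_sets: Sequence[Set[str]], labels: Sequence[str]
) -> Dict[str, List[int]]:
    """Binary Relevance: build one independent 0/1 target vector per label."""
    return {
        label: [1 if label in label_set else 0 for label_set in label_sets]
        for label in labels
    }


def label_powerset_targets(
    label_sets: Sequence[Set[str]],
) -> Tuple[List[int], Dict[Tuple[str, ...], int]]:
    """Label Powerset: map each distinct label combination to one class id."""
    class_ids: Dict[Tuple[str, ...], int] = {}
    targets: List[int] = []
    for label_set in label_sets:
        key = tuple(sorted(label_set))  # order-independent key for the combination
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # next unused class id
        targets.append(class_ids[key])
    return targets, class_ids


# Hypothetical toy data: the genre sets of four books.
genres = ["fantasy", "romance", "mystery"]
books = [
    {"fantasy", "romance"},
    {"mystery"},
    {"fantasy", "romance"},
    {"fantasy"},
]

br_targets = binary_relevance_targets(books, genres)
lp_targets, lp_classes = label_powerset_targets(books)

print(br_targets)
print(lp_targets, lp_classes)
```

Under BR, each target vector would train its own binary classifier (for example an SVM), so the fact that "fantasy" and "romance" occur together is invisible to any single classifier. Under LP, the combination of "fantasy" and "romance" is a class in its own right, which is one way to see why LP can capture relationships between labels, at the cost of a class count that grows with the number of distinct combinations.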
Copyright © 2025