KLIK: Kajian Ilmiah Informatika dan Komputer
Vol. 4 No. 5 (2024): April 2024

Comparative Analysis of DT and SVM Model Performance with SMOTE in Sentiment Classification

Yerik Afrianto Singgalen (Universitas Katolik Indonesia Atma Jaya, Jakarta)



Article Info

Publish Date
30 Apr 2024

Abstract

This research investigates the efficacy of using the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework to analyze sentiment classification models. The study compares Decision Tree (DT) and Support Vector Machine (SVM) models, each combined with the Synthetic Minority Over-sampling Technique (SMOTE), on accuracy, precision, recall, f-measure, and Area Under the Curve (AUC). CRISP-DM provides a systematic approach to data preprocessing, modeling, and evaluation. The findings show that both models achieve high accuracy with SMOTE: DT reaches 98.37% +/- 0.48% and SVM reaches 98.91% +/- 0.59%. The precision, recall, and f-measure scores indicate that both models distinguish effectively between positive and negative sentiments, and the AUC scores underscore their robustness in sentiment analysis tasks. These results highlight the potential of CRISP-DM as a structured methodology for sentiment classification research and provide insight into how different machine learning algorithms handle imbalanced datasets. Future studies should further explore the application of CRISP-DM to sentiment analysis tasks and investigate how DT and SVM models with SMOTE scale to larger datasets.
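The abstract names SMOTE and the precision, recall, and f-measure scores without showing how they are computed. The following is a minimal sketch of both ideas using only the Python standard library. It is not the author's pipeline: the function names `smote_oversample` and `binary_metrics`, the parameter defaults, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (the rare class).
    k: number of nearest minority neighbours considered per sample.
    Names and defaults are hypothetical, chosen for this sketch.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and f-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy usage: three minority-class vectors, four synthetic ones requested.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic_points = smote_oversample(minority_class, n_new=4, k=2)
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In a cross-validated evaluation, oversampling is normally applied to the training folds only, so that synthetic points do not leak into the data used for scoring.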

Copyrights © 2024






Journal Info

Abbrev

klik

Publisher

Subject

Computer Science & IT

Description

The main topics published include: 1. Informatics Engineering 2. Information Systems 3. Decision Support Systems 4. Expert Systems 5. Artificial Intelligence 6. Information Management 7. Data Mining 8. Big Data 9. Computer Networks 10. Others (other topics related to Information Technology and ...