Fornieli Gulo
Immanuel Christian University Yogyakarta, Indonesia

Data preprocessing approach for machine learning-based sentiment classification
Sunneng Sandino Berutu; Haeni Budiati; Jatmika Jatmika; Fornieli Gulo
JURNAL INFOTEL Vol 15 No 4 (2023): November 2023
Publisher : LPPM INSTITUT TEKNOLOGI TELKOM PURWOKERTO

DOI: 10.20895/infotel.v15i4.1030

Abstract

Public sentiment toward a particular issue, product, activity, or organization can be measured and monitored with applications based on artificial intelligence, using comments posted on social media as data. Because there are no standardized rules for writing such comments, they often contain non-standard words, and these words affect how sentiment is assigned to the positive, negative, and neutral categories. This study therefore proposes a data preprocessing approach that incorporates the Rabin-Karp algorithm to correct non-standard words. The research consists of several stages: data crawling, data preprocessing, feature extraction, model development (based on the Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) methods), and analysis of the results. The experimental results show that the proposed approach affects the composition of the sentiment categories. In model testing, all models achieve their highest precision in the Positive category, with a value of 1. All models achieve their highest recall in the Neutral category, with values close to 1, and their highest f1-score in the Neutral category as well, with an average of 0.95. Overall, the performance analysis shows that the NB- and SVM-based models perform better than the DT-based model.
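The abstract does not describe how the Rabin-Karp algorithm is wired into preprocessing. One plausible reading is that it is used to locate non-standard words in a comment so they can be replaced with standard forms. The sketch below illustrates that idea only; the function names, the word-boundary rule, and the small `LEXICON` of Indonesian abbreviations are hypothetical and are not taken from the paper.

```python
def rabin_karp_find(text, pattern, base=256, mod=1_000_000_007):
    """Return every start index where pattern occurs in text,
    using a rolling hash (Rabin-Karp) with explicit verification."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


# Hypothetical lexicon: non-standard form -> standard form (illustrative only).
LEXICON = {"yg": "yang", "tdk": "tidak", "bgs": "bagus"}


def normalize(comment, lexicon):
    """Lowercase a comment and replace whole-word non-standard forms."""
    text = comment.lower()
    for slang, standard in lexicon.items():
        # Replace from right to left so earlier indices stay valid.
        for start in reversed(rabin_karp_find(text, slang)):
            end = start + len(slang)
            left_ok = start == 0 or not text[start - 1].isalnum()
            right_ok = end == len(text) or not text[end].isalnum()
            if left_ok and right_ok:
                text = text[:start] + standard + text[end:]
    return text


print(normalize("Produk yg ini tdk bgs", LEXICON))
```

The word-boundary check is a design choice made for this sketch: without it, a short abbreviation would also be replaced inside longer words. The normalized text would then feed the feature-extraction and classification stages listed in the abstract.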