PERFORMANCE ANALYSIS OF MACHINE LEARNING AND INDOBERT IN CLASSIFYING SENTIMENTS ON INDONESIA'S FREE NUTRITIOUS MEAL
Maulyanda; Nazhifah, Sri Azizah; Pane, Syafrial Fachri; Irvanizam, Irvanizam
CYBERSPACE: Jurnal Pendidikan Teknologi Informasi Vol 10 No 1 (2026)
Publisher : Universitas Islam Negeri Ar-Raniry Banda Aceh

DOI: 10.22373/cj.v10i1.33886

Abstract

Natural Language Processing (NLP) is a branch of artificial intelligence widely used to determine whether a sentence expresses positive, negative, or neutral sentiment, particularly for opinions posted online. This study compares four models, Naïve Bayes, Support Vector Machine (SVM), XGBoost, and IndoBERT, to identify the best performer. The dataset, obtained from Kaggle, consists of 5,644 neutral, 2,934 positive, and 2,606 negative data points. Before modeling, the dataset was preprocessed with case folding, cleansing, tokenization, stemming, and stopword removal. The four models were then trained on the preprocessed data. Naïve Bayes achieved an accuracy of 75%, SVM 79%, XGBoost 76%, and IndoBERT the highest at 85%. On this dataset, IndoBERT was therefore the most accurate of the four models for sentiment classification.
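The preprocessing stages named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the stopword list, the suffix list, and every function name are hypothetical, and the naive suffix stripping only stands in for a proper Indonesian stemmer.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "untuk"}

# Naive suffix list standing in for a real Indonesian stemmer (assumption).
SUFFIXES = ("nya", "kan", "lah")


def case_fold(text):
    """Case folding: lowercase the whole text."""
    return text.lower()


def cleanse(text):
    """Cleansing: drop URLs, mentions, hashtags, and non-letter characters."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Tokenization: split on whitespace."""
    return text.split()


def stem(token):
    """Stemming: strip one known suffix if at least 4 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the stages in the order listed in the abstract."""
    tokens = tokenize(cleanse(case_fold(text)))
    stemmed = [stem(t) for t in tokens]
    return [t for t in stemmed if t not in STOPWORDS]
```

For example, `preprocess("Program ini BAGUS untuk anak-anak! https://example.com @user")` yields a short list of lowercase tokens with the URL, mention, punctuation, and stopwords removed. The resulting tokens would then be vectorized before being passed to a classifier.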
The Effect of Feature Selection Methods on the Accuracy of SVM Models in Classifying Customer Churn at Telecommunications Companies
Rohmaniar, Mayke Andani; Habibi, Roni; Pane, Syafrial Fachri
IJAI (Indonesian Journal of Applied Informatics) Vol 9, No 1 (2024)
Publisher : Universitas Sebelas Maret

DOI: 10.20961/ijai.v9i1.92983

Abstract

This study examines the effect of feature selection methods on the accuracy of the Support Vector Machine (SVM) model in predicting customer churn in the telecommunications sector. It compares feature selection techniques (Correlation Matrix, Principal Component Analysis (PCA), and Genetic Algorithm (GA)) and four SVM kernels (Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid). The study uses a telecom customer dataset from Kaggle with 7043 entries and 33 features and follows the CRISP-DM methodology, which comprises Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Implementation. The findings indicate that Correlation Matrix feature selection paired with the Linear kernel performs best, achieving the highest accuracy of 92.48%, with a precision of 0.93, a recall of 0.97, and an F1-score of 0.95. The other feature selection methods, PCA and GA, yield lower results. An accurate prediction model is expected to help telecommunications companies develop more effective customer retention strategies.
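The reported precision, recall, and F1-score can be cross-checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below assumes the three figures refer to the same class and the same evaluation split, which the abstract does not state; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.93
reported_recall = 0.97

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.95
```

The computed value agrees with the reported F1-score of 0.95 to two decimal places.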