SAER: Comparison of Rule Prediction Algorithms on Constructing a Corpus for Taxation Related Tweet Aspect-Based Sentiment Analysis
Sopian, Annisa Mufidah; Ilyas, Ridwan; Kasyidi, Fatan; Hadiana, Asep Id
JOIN (Jurnal Online Informatika) Vol 9 No 1 (2024)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v9i1.1275

Abstract

Twitter is a popular social media platform in Indonesia, and sentiment analysis of tweets plays an important role in measuring public trust, especially on taxation issues. Aspect extraction is an important task in sentiment analysis. In this research, we propose SAER (Syntactic Aspect-opinion Extraction and Rule prediction), a language rule-based approach that uses syntactic features for aspect and opinion extraction. We compare several algorithms for rule prediction: Random Forest Regression, Decision Tree Regression, K-Nearest Neighbor (KNN) Regression, Linear Regression, Support Vector Regression (SVR), and Extreme Gradient Boosting Regression (XGBoost), which can generate rules with a tree-based approach. By combining syntactic features with rule prediction, SAER is able to explore the important features in a sentence. The comparison shows that SVR was the most effective model for aspect rule prediction, with a Mean Squared Error (MSE) of 0.022, a Root Mean Squared Error (RMSE) of 0.150, and a Mean Absolute Error (MAE) of 0.123, while XGBoost was the most effective model for opinion rule prediction, with an MSE of 0.013, an RMSE of 0.117, and an MAE of 0.075. Because this work relies on syntactic feature-based approaches and rule prediction, it is expected to be applicable to other cases and to datasets from other domains.
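The three error metrics reported in the abstract can be illustrated with a minimal sketch. The target and predicted values below are hypothetical and are not taken from the paper; the function names are chosen here for illustration only.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical rule scores and model predictions (illustration only).
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.4, 0.7]

print(f"MSE={mse(y_true, y_pred):.3f}")
print(f"RMSE={rmse(y_true, y_pred):.3f}")
print(f"MAE={mae(y_true, y_pred):.3f}")
```

Lower values are better for all three metrics. RMSE is expressed in the same units as the target, which makes it easier to compare against MAE than the raw MSE.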
Diabetes Prediction Using the Missforest Imputation Technique and LightGBM Classification
FERDIANSYAH, ALDOVA; UMBARA, FAJRI RAKHMAT; KASYIDI, FATAN
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 10, No 2 (2025): MIND Journal
Publisher : Institut Teknologi Nasional Bandung

DOI: 10.26760/mindjournal.v10i2.221-234

Abstract

Diabetes mellitus is one of the most common chronic diseases, with a globally increasing prevalence. It is caused by metabolic disorders that affect blood glucose levels and, if not treated early, can lead to serious complications such as stroke, kidney failure, blindness, and even death. This research develops a diabetes risk prediction model based on binary classification using the LightGBM algorithm combined with the Missforest imputation technique to handle missing data. The dataset used is the publicly available Pima Indian dataset from Kaggle. The pre-processing stages include missing value imputation, outlier handling using Isolation Forest, and an 80:20 data split. Model evaluation shows an accuracy of 91.84% and a ROC AUC of 0.9614. BMI was found to be the most influential factor in the prediction, followed by DiabetesPedigreeFunction and Glucose.

Keywords: diabetes mellitus, data mining, classification, LightGBM, missforest
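The two evaluation measures named in the abstract, accuracy and ROC AUC, can be sketched in plain Python as follows. The labels and scores are hypothetical, the 0.5 decision threshold is an assumption, and the code does not reproduce the authors' pipeline or their reported numbers.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive sample
    is scored above a random negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical binary labels and predicted probabilities (illustration only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(accuracy(labels, scores))  # 0.75
print(roc_auc(labels, scores))   # 0.75
```

Accuracy depends on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two are often reported together for binary classifiers.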