Articles

Found 15 Documents

IndoBERT Optimization for Sentiment Analysis on DeepSeek App Reviews Sunan, Muh.; Resiloy, Unique Desyrre A.; Endriani, Desy; Suhaeni, Cici; Sartono, Bagus; Dito, Gerry Alfa
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 20, No 1 (2026): January
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.107507

Abstract

In the digital era, sentiment analysis is important for evaluating public opinion, especially for Play Store apps with Indonesian-language reviews. This research aims to improve the performance of the IndoBERT model in sentiment analysis of DeepSeek app reviews using data augmentation and hyperparameter tuning. Data augmentation is performed through back-translation, and the hyperparameters tested are the number of epochs, the learning rate, and the batch size. Experimental results show that data augmentation combined with 10 epochs, a learning rate of 2e-5, and a batch size of 16 produces the highest accuracy (93.95%) and F1-score (0.94), with better stability than the model without augmentation. The model without augmentation showed fluctuating performance, indicating overfitting in some configurations. These findings confirm the importance of augmentation and hyperparameter tuning for improving the accuracy and stability of sentiment analysis models, and contribute to the development of NLP models for Indonesian and other low-resource languages.
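The hyperparameter search described in this abstract can be pictured as an exhaustive sweep over epochs, learning rate, and batch size. The sketch below is only an illustration under assumptions: `grid_search` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever routine fine-tunes IndoBERT and returns a validation score. The abstract does not say how the authors organized their search or which candidate values they tried, apart from the best setting it reports.

```python
from itertools import product


def grid_search(score_fn, epochs, learning_rates, batch_sizes):
    """Try every combination and return (best_score, best_config).

    score_fn is a caller-supplied callable (hypothetical here) that trains
    and evaluates one configuration and returns a number, higher is better.
    """
    best = None
    for n_epochs, lr, batch_size in product(epochs, learning_rates, batch_sizes):
        score = score_fn(n_epochs, lr, batch_size)
        if best is None or score > best[0]:
            config = {
                "epochs": n_epochs,
                "learning_rate": lr,
                "batch_size": batch_size,
            }
            best = (score, config)
    return best
```

A caller would pass its own training-and-evaluation function as `score_fn`, once for the augmented data and once for the original data, and compare the two returned configurations.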
DETECTION OF ADULTERATION IN COCONUT MILK USING CUCKOO SEARCH-OPTIMIZED XGBOOST ON HIGH-DIMENSIONAL FTIR SPECTRAL DATA Sentana Putra, I Gusti Ngurah; Sadik, Kusman; Soleh, Agus Mohamad; Suhaeni, Cici
JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika) Vol 10, No 3 (2025)
Publisher : STKIP PGRI Tulungagung

DOI: 10.29100/jipi.v10i3.8376

Abstract

Coconut milk adulteration is an important issue because it can reduce food quality and endanger consumers. This study aims to develop a rapid and accurate detection method for coconut milk adulteration by combining FTIR spectroscopy with the XGBoost machine learning algorithm optimized with the Cuckoo Search Algorithm (CSA). FTIR spectral data from traditional and instant coconut milk samples were preprocessed with Standard Normal Variate (SNV) and Savitzky-Golay (SG) filtering to reduce noise and clarify spectral features. The XGBoost hyperparameters were then tuned with CSA. The results showed that the combined SNV+SG preprocessing raised the model accuracy to 84.44%, with a precision of 92.73% and an F1-score of 79.94%. In addition, CSA optimization provided a 19.7% increase in accuracy compared to the model without tuning. These findings support the effectiveness of the CSA-XGBoost approach for analyzing high-dimensional spectral data and suggest it is a potential solution for efficiently verifying the authenticity of coconut milk. In conclusion, this approach has the potential to be widely applied to test the authenticity of other food products quickly, non-destructively, and accurately.
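The two preprocessing steps named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the window length (5 points) and polynomial order (quadratic) are assumptions, and so is applying SNV before SG. SNV centers and scales each spectrum by its own mean and standard deviation; the SG smoother here uses the standard 5-point quadratic coefficients (-3, 12, 17, 12, -3)/35.

```python
from statistics import mean, stdev

# Standard 5-point quadratic Savitzky-Golay smoothing weights (sum to 35).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own stats."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in spectrum]


def savgol5(spectrum):
    """Smooth interior points with a 5-point quadratic SG filter.

    The first and last two points are left unchanged (an assumption made
    here for brevity; libraries usually offer several edge modes).
    """
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / 35
    return out


def preprocess(spectrum):
    """SNV followed by SG smoothing (the order is an assumption)."""
    return savgol5(snv(spectrum))
```

Each FTIR spectrum would be passed through `preprocess` as a list of absorbance values before being handed to the classifier.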
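KAJIAN SIMULASI PENDUGAAN SELANG KEPERCAYAAN BOOTSTRAP BAGI ARAH MEDIAN DATA SIRKULAR [Simulation Study of Bootstrap Confidence Interval Estimation for the Median Direction of Circular Data] Suhaeni, Cici; Sumertajaya, I Made; Djuraidah, Anik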
Indonesian Journal of Statistics and Applications Vol 2 No 1 (2018)
Publisher : Statistics and Data Science Program Study, SSMI, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v2i1.64

Abstract

The median direction is a measure of central tendency for circular data. Estimation usually requires the sampling distribution of the statistic used as the parameter estimate. In theory, the sampling distribution is derived from the population distribution, but the sampling distribution of the median is hard to obtain even when the population distribution is known. When the sampling distribution cannot be derived easily, the bootstrap method is an alternative. This simulation study evaluates how an increasing concentration parameter affects the performance of bootstrap confidence interval estimation for the median direction. Three methods were used to estimate the interval: equal-tailed arc (ETA), symmetric arc (SYMA), and likelihood-based arc (LBA). The main evaluation criteria were true coverage and interval width. The simulation results show that, in general, a larger concentration parameter is followed by a narrower interval. For a small concentration parameter (k < 1), all methods give unstable true coverage and interval width. The three methods also produce intervals of identical width when the concentration parameter is 20 or more. In terms of coverage and interval width, the best method was ETA.
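A rough sketch of a bootstrap equal-tailed arc for the median direction is given below. It is one plausible reading of the idea, not the authors' procedure. The function names are hypothetical, the median is restricted to sample points that minimize the total arc distance, and the arc is built from the lower and upper quantiles of the signed deviations of the bootstrap medians from the sample median.

```python
import math
import random

TWO_PI = 2 * math.pi


def arc_distance(a, b):
    """Shortest arc length between two angles in radians."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def signed_deviation(a, b):
    """Signed rotation from b to a, wrapped to roughly [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi


def circular_median(angles):
    """Sample point with the smallest total arc distance to all observations."""
    return min(angles, key=lambda c: sum(arc_distance(c, a) for a in angles))


def bootstrap_eta(angles, n_boot=200, alpha=0.05, seed=0):
    """Return (median, lower endpoint, upper endpoint, arc width)."""
    rng = random.Random(seed)
    centre = circular_median(angles)
    deviations = sorted(
        signed_deviation(
            circular_median(rng.choices(angles, k=len(angles))), centre
        )
        for _ in range(n_boot)
    )
    lo = deviations[int(n_boot * alpha / 2)]
    hi = deviations[int(n_boot * (1 - alpha / 2)) - 1]
    return centre, (centre + lo) % TWO_PI, (centre + hi) % TWO_PI, hi - lo
```

To mimic the simulation setting, one could draw samples with `random.vonmisesvariate(mu, kappa)` for several values of the concentration parameter and compare the returned arc widths.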
Bayesian Neural Network untuk Prediksi Diabetes: Uncertainty Quantification dalam Machine Learning [Bayesian Neural Network for Diabetes Prediction: Uncertainty Quantification in Machine Learning] Kamila, Sabrina Adnin; Sadik, Kusman; Suhaeni, Cici; Soleh, Agus Mohamad
Indonesian Journal of Applied Statistics Vol 9, No 1 (2026)
Publisher : Universitas Sebelas Maret

DOI: 10.13057/ijas.v9i1.103994

Abstract

This study evaluates and compares the performance of three machine learning models, namely random forest (RF), feedforward neural network (FNN), and Bayesian neural network (BNN), for diabetes classification using the Diabetes Health Indicators Dataset from the UCI Machine Learning Repository, which exhibits significant class imbalance. Data preprocessing includes feature normalization using StandardScaler and class imbalance handling through the synthetic minority over-sampling technique (SMOTE). Model performance is evaluated using accuracy and F1-score, supported by classification report and confusion matrix analysis. The results show that RF achieves high accuracy (0.8493) but a low F1-score (0.3386), indicating poor sensitivity to positive diabetes cases. FNN provides more balanced performance, with an F1-score of 0.4490 after optimal threshold adjustment. Meanwhile, BNN achieves an accuracy of 0.8498 and an F1-score of 0.4043, while offering the additional advantage of uncertainty quantification through Monte Carlo Dropout. Therefore, FNN is more effective for balanced classification performance, while BNN is more suitable for medical applications that require prediction confidence information to support more reliable and informed clinical decision-making.
Keywords: Diabetes prediction, uncertainty quantification, Bayesian neural network, classification imbalance, machine learning.
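Monte Carlo Dropout, mentioned in this abstract, keeps dropout switched on at prediction time and treats the spread of repeated stochastic forward passes as an uncertainty estimate. The toy sketch below shows the mechanism only. The function names are hypothetical, the one-hidden-layer network shape is an assumption, and the weights are supplied by the caller; it is not the model from the paper.

```python
import math
import random
from statistics import mean, stdev


def stochastic_forward(x, w_hidden, w_out, rng, p_drop):
    """One forward pass with dropout applied to the hidden layer."""
    hidden = []
    for weights in w_hidden:
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU unit
        keep = rng.random() >= p_drop
        # Inverted dropout: rescale kept units so the expected value is unchanged.
        hidden.append(h / (1 - p_drop) if keep else 0.0)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1 / (1 + math.exp(-logit))  # sigmoid probability of the positive class


def mc_dropout_predict(x, w_hidden, w_out, n_passes=100, p_drop=0.5, seed=0):
    """Return (mean probability, standard deviation) over n_passes passes."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(x, w_hidden, w_out, rng, p_drop)
        for _ in range(n_passes)
    ]
    return mean(probs), stdev(probs)
```

A large standard deviation for a given input signals a prediction the model is unsure about, which is the kind of confidence information the abstract argues is useful in clinical settings.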
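Deteksi Polycystic Ovary Syndrome (PCOS) Berbasis Machine Learning: Kombinasi SMOTE, Random Forest, Gradient Boosting, dan Bayesian Optimization [Machine Learning-Based Detection of Polycystic Ovary Syndrome (PCOS): A Combination of SMOTE, Random Forest, Gradient Boosting, and Bayesian Optimization] Alfiryal, Naufalia; Sadik, Kusman; Suhaeni, Cici; Soleh, Agus Mohamad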
Indonesian Journal of Applied Statistics Vol 8, No 2 (2025)
Publisher : Universitas Sebelas Maret

DOI: 10.13057/ijas.v8i2.109931

Abstract

Polycystic ovary syndrome (PCOS) is a common endocrine disorder among women of reproductive age. The condition can lead to ovulatory dysfunction, hormonal imbalance, and insulin resistance, and it increases the risk of cardiovascular disease, obesity, and psychological disorders. Despite its high prevalence, approximately 75% of PCOS cases remain undiagnosed in clinical settings because of the complexity of symptoms and the limitations of current diagnostic methods. To address this issue, a machine learning-based approach is proposed to improve the accuracy and efficiency of PCOS detection. This study compares the performance of two supervised learning algorithms, random forest and gradient boosting, for PCOS prediction. The dataset was obtained from a public repository and contains various clinical features associated with PCOS. To address class imbalance, the synthetic minority over-sampling technique (SMOTE) was applied to the training data. Additionally, Bayesian optimization was used to tune the hyperparameters of each model. Model performance was evaluated using several metrics, with the area under the receiver operating characteristic curve (AUC-ROC) as the primary measure. The gradient boosting model achieved the best results, with an AUC of 0.8983 and a recall of 0.95, indicating high sensitivity in identifying positive PCOS cases. These findings demonstrate that combining SMOTE with Bayesian optimization is effective in enhancing predictive accuracy, especially on imbalanced medical datasets. The proposed approach shows promise for integration into clinical decision-support systems to facilitate earlier and more reliable PCOS screening.
Keywords: Bayesian optimization; gradient boosting; PCOS; random forest; SMOTE.
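The core idea of SMOTE, as used in this abstract, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that step only. The function name `smote_like`, the default `k`, and the tuple-based data layout are assumptions; it is not a reproduction of any particular library's implementation.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic
```

As the abstract notes, oversampling of this kind belongs on the training split only, so that the evaluation data keeps its original class balance.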