Articles

Found 6 Documents

Regularization of machine learning models with penalized regression on data containing multicollinearity (Case study: Prediction of the Human Development Index in 34 provinces in Indonesia)
Khamidah, Nur; Sadik, Kusman; M Soleh, Agus; Dito, Gerry Alfa
Majalah Ilmiah Matematika dan Statistika Vol. 24 No. 1 (2024): Majalah Ilmiah Matematika dan Statistika
Publisher : Jurusan Matematika FMIPA Universitas Jember

DOI: 10.19184/mims.v24i1.40360

Abstract

This research models high-dimensional data containing multicollinearity using four machine-learning algorithms: Random Forest, K-Nearest Neighbor (kNN), XGBoost, and Regression Tree. Regularization was first carried out with penalized regression: ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and Elastic Net regression. The empirical data consist of 100 predictor variables and one response variable, the 2022 Human Development Index of 34 provinces in Indonesia, obtained from BPS (Statistics Indonesia); the data were used both with and without standardization. A simulation was also conducted on highly correlated data generated from two distributions, uniform and normal, with parameter values taken from the empirical data. The results showed that ridge regularization is the best method for producing accurate and stable predictions. Furthermore, there was no difference in root mean square error (RMSE) between standardized and unstandardized data. Across all the data analyzed, the kNN model performed better than the other models on the simulated data, while the Random Forest and XGBoost models performed better than the other models on the empirical data. The Regression Tree model is not recommended based on the results of this study.
Keywords: regularization, multicollinearity, ridge, LASSO, elastic net
MSC2020: 62J07
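To illustrate why a ridge penalty helps under multicollinearity, here is a minimal plain-Python sketch, not the authors' code: a closed-form ridge fit, beta = (X'X + λI)^(-1) X'y, for two standardized, nearly collinear predictors, plus the RMSE metric used above. The toy data and the function names are hypothetical. LASSO and Elastic Net add an L1 term and have no closed form, so they are usually fitted iteratively (for example by coordinate descent) and are not shown.

```python
import math


def standardize(values):
    """Center values to mean 0 and scale them to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def ridge_two_predictors(x1, x2, y, lam):
    """Closed-form ridge fit beta = (X'X + lam*I)^(-1) X'y for two standardized
    predictors and a centered response, so no intercept is needed."""
    # Entries of the 2x2 matrix X'X + lam*I and of the vector X'y.
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    # Near-collinear predictors make a*d - b*b tiny when lam = 0; lam > 0 enlarges it.
    det = a * d - b * b
    return (d * c1 - b * c2) / det, (a * c2 - b * c1) / det


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: x2 is almost a copy of x1.
x1 = standardize([1, 2, 3, 4, 5, 6])
x2 = standardize([1.1, 1.9, 3.05, 4.0, 4.9, 6.1])
y_raw = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

for lam in (0.0, 1.0, 10.0):
    b1, b2 = ridge_two_predictors(x1, x2, y, lam)
    fitted = [b1 * u + b2 * v for u, v in zip(x1, x2)]
    print(f"lambda={lam:5.1f}  beta=({b1:.3f}, {b2:.3f})  in-sample RMSE={rmse(y, fitted):.3f}")
```

With λ = 0 the fit reduces to ordinary least squares, whose two coefficients can swing widely when the predictors are almost identical; increasing λ shrinks the coefficient vector toward zero and stabilizes it, at the cost of a slightly higher in-sample RMSE.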
Evaluating Fasttext and Glove Embeddings for Sentiment Analysis of AI-Generated Ghibli-Style Images
Sentana Putra, I Gusti Ngurah; Yusran, Muhammad; Sari, Jefita Resti; Suhaeni, Cici; Sartono, Bagus; Dito, Gerry Alfa
Journal of Applied Informatics and Computing Vol. 9 No. 5 (2025): October 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i5.10600

Abstract

The development of AI-based text-to-image generation technology has triggered mixed public reactions, especially when applied to iconic visual styles such as that of Studio Ghibli. This research evaluates public sentiment toward the phenomenon of Ghibli-style AI images by comparing two static word embedding methods, FastText and GloVe, across three classification algorithms: Logistic Regression, Random Forest, and Convolutional Neural Network (CNN). Indonesian-language tweets were collected from Twitter using hashtags such as #ghibli, #ghiblistyle, and #hayaomiyazaki from 25 March to 25 April 2025. Each tweet was manually labelled as positive or negative, then preprocessed and represented using pre-trained FastText and GloVe embeddings. Evaluation used accuracy together with precision, recall, and F1-score, the last three in both macro and weighted averages. Results showed that FastText performed best with most models, especially in precision and overall accuracy, owing to its ability to handle sub-word information and spelling variations in social media texts. The combination of CNN with FastText yielded the highest performance, with a macro F1-score of 76.56% and an accuracy of 84.69%. However, GloVe still showed competitive recall with the Logistic Regression model, making it relevant in contexts that prioritise sentiment detection coverage. This study emphasizes the importance of selecting embeddings and models appropriate to the characteristics of the data and the purpose of the analysis when classifying sentiment in informal social media text.
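The abstract reports both macro and weighted averages. As a minimal plain-Python sketch (not the authors' evaluation code), the difference is that the macro average weights each class's F1 equally, while the weighted average weights it by that class's support in the true labels, so a well-predicted majority class lifts the weighted score. The labels and function names below are hypothetical.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as positive and everything else as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted_f1(y_true, y_pred):
    """Macro = unweighted mean of per-class F1; weighted = mean weighted by class support.
    Classes are taken from y_true."""
    support = Counter(y_true)
    per_class = {label: f1_for_label(y_true, y_pred, label) for label in support}
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[label] * n for label, n in support.items()) / len(y_true)
    return macro, weighted


# Hypothetical imbalanced example: 2 positive and 8 negative tweets.
y_true = ["pos", "pos"] + ["neg"] * 8
y_pred = ["pos", "neg", "pos"] + ["neg"] * 7
print(macro_and_weighted_f1(y_true, y_pred))  # (0.6875, 0.8)
```

Here the minority "pos" class has F1 = 0.5 and the majority "neg" class has F1 = 0.875, so the macro average (0.6875) sits well below the weighted average (0.8), which is why a macro score is the stricter summary on imbalanced data.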
Performance Comparative Study of Machine Learning Classification Algorithms for Food Insecurity Experience by Households in West Java
Khikmah, Khusnia Nurul; Sartono, Bagus; Susetyo, Budi; Dito, Gerry Alfa
JOIN (Jurnal Online Informatika) Vol 9 No 1 (2024)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v9i1.1012

Abstract

This study compares the classification performance of the random forest, gradient boosting, rotation forest, and extremely randomized trees methods in classifying the food insecurity experience scale in West Java. The dataset is based on the 2020 Socio-Economic Survey by Statistics Indonesia. The novelty of this research lies in comparing the performance of four methods that are all tree ensemble approaches. In addition, because of the class imbalance problem, the authors also applied three imbalance-handling techniques. The results show that the combination of the random forest algorithm and the random under-sampling technique is the best classifier, with a balanced accuracy of 65.795%. The best classification method shows that the food insecurity experience scale in West Java can be identified by considering floor area (house size), the number of depositors, type of floor, health insurance ownership status, and internet access capability.
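The sketch below, in plain Python with hypothetical data and function names, shows the two ideas the abstract pairs: random under-sampling, which keeps a random subset of each class sized to the smallest class, and balanced accuracy, the mean of per-class recall. It is an illustration under those assumptions, not the authors' pipeline; in practice under-sampling is applied only to the training split, and the random forest itself is omitted here.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, seed=0):
    """Keep a random subset of each class's rows, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally however many rows it has."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical training data: 8 rows of a majority class (0), 2 of a minority class (1).
X_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2
X_bal, y_bal = random_under_sample(X_train, y_train, seed=42)
print(y_bal)  # [0, 0, 1, 1]

# A classifier that almost always predicts 0 looks good only on plain accuracy (5/6).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```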
Spatio-temporal Clustering Analysis of Dengue Hemorrhagic Fever Cases in West Java 2016 – 2021
Yanti, Yusma; Rahardiantoro, Septian; Dito, Gerry Alfa
Indonesian Journal of Statistics and Applications Vol 7 No 1 (2023)
Publisher : Statistics and Data Science Program Study, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v7i1p56-63

Abstract

In 2020, WHO included dengue, alongside 10 other diseases, as a global health threat. Dengue is also a problem in Indonesia, especially in West Java Province: according to 2022 data from the Ministry of Health, West Java is the largest contributor of Dengue Hemorrhagic Fever (DHF) cases in Indonesia. Dengue is transmitted by mosquitoes, but climate also strongly influences its spread. West Java covers a wide area, consisting of 27 districts/cities, and has a relatively high population density, both of which contribute strongly to the increase in dengue cases. In this research, we grouped years with similar dengue case counts and identified groups of districts/cities in West Java with similar patterns of dengue cases from 2016 to 2021. The results show that 2016 forms the group with the highest number of cases. Meanwhile, the 27 districts/cities in West Java form three groups. Group 1, the group with the highest number of cases, consists of Sukabumi City, Bandung City, Cimahi City, Depok City, and Tasikmalaya City.
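The abstract does not state which clustering algorithm was used. Purely as an illustration of grouping districts/cities by their yearly case counts, the sketch below uses agglomerative clustering with average linkage and Euclidean distance, a swapped-in technique that may differ from the paper's method. The district names, counts, and function names are hypothetical, and the sketch ignores spatial adjacency. Grouping years instead of districts only requires transposing the data so that each year maps to its vector of district counts.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(series, k):
    """Agglomerative clustering: start with one cluster per name and repeatedly merge
    the two clusters with the smallest mean pairwise Euclidean distance until k remain.
    `series` maps each name to its vector of yearly case counts."""
    clusters = [[name] for name in series]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(euclidean(series[a], series[b])
                        for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Hypothetical yearly case counts (three years) for five districts/cities.
cases = {
    "District A": [10, 12, 11],
    "District B": [11, 13, 10],
    "City C": [100, 90, 95],
    "City D": [98, 92, 97],
    "District E": [50, 55, 52],
}
print(average_linkage(cases, k=3))
# [['District A', 'District B'], ['City C', 'City D'], ['District E']]
```

On raw counts, Euclidean distance mostly groups by magnitude; standardizing each series first would instead group districts by the shape of their trend over the years.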
IndoBERT Optimization for Sentiment Analysis on DeepSeek App Reviews
Sunan, Muh.; Resiloy, Unique Desyrre A.; Endriani, Desy; Suhaeni, Cici; Sartono, Bagus; Dito, Gerry Alfa
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 20, No 1 (2026): January
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.107507

Abstract

In the digital era, sentiment analysis is important for evaluating public opinion, especially for Play Store apps with Indonesian-language reviews. This research aims to improve the performance of the IndoBERT model in sentiment analysis of DeepSeek app reviews using data augmentation and hyperparameter tuning. Data augmentation is performed through back-translation, while the hyperparameters tested are the number of epochs, the learning rate, and the batch size. Experimental results show that data augmentation combined with 10 epochs, a learning rate of 2e-5, and a batch size of 16 produces the highest accuracy (93.95%) and F1-score (0.94), with better stability than the model without augmentation. The model without augmentation showed fluctuating performance, indicating overfitting in some configurations. These findings confirm the importance of augmentation and hyperparameter tuning for improving the accuracy and stability of sentiment analysis models, and contribute to the development of NLP models for Indonesian and other resource-constrained languages.
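A minimal sketch of exhaustive grid search over the three hyperparameters the abstract names. Only the best configuration (10 epochs, learning rate 2e-5, batch size 16) comes from the abstract; the other candidate values are hypothetical, and `evaluate` is a placeholder for fine-tuning IndoBERT on the (optionally back-translated) training data and returning a validation score. Back-translation itself, translating a review into another language and back to obtain a paraphrase, is not implemented here.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Evaluate every combination of values in `grid` and return (best_score, best_config).

    `evaluate` is supplied by the caller; here it stands in for a full
    fine-tune-and-validate run of the model for one configuration.
    """
    names = list(grid)
    best_score, best_config = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


# Hypothetical candidate values; only 10 / 2e-5 / 16 are reported in the abstract.
grid = {
    "epochs": [3, 5, 10],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
}


def fake_evaluate(config):
    """Hypothetical stand-in score that happens to peak at the reported configuration."""
    return -(abs(config["epochs"] - 10)
             + abs(config["batch_size"] - 16)
             + abs(config["learning_rate"] - 2e-5) * 1e5)


print(grid_search(grid, fake_evaluate))
```

A full grid grows multiplicatively (here 3 × 3 × 2 = 18 fine-tuning runs), which is why each `evaluate` call should reuse the same train/validation split so the configurations are compared fairly.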
Evaluation of Tree-Based Models for Predicting Social Assistance Recipient Status Based on National Socio-Economic Survey (SUSENAS) 2024
Hiola, Yani Prihantini; Zulhijrah; Putra, I Gusti Ngurah Sentana; Limba, Syella Zignora; Sartono, Bagus; Firdawanti, Aulia Rizki; Susetyo, Budi; Dito, Gerry Alfa
Journal of Mathematics, Computations and Statistics Vol. 9 No. 1 (2026): Volume 09 Issue 01 (March 2026)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/xyyv0f37

Abstract

Poverty is a major socioeconomic challenge in Indonesia that affects the effectiveness of social protection programs. In response, the government has created social assistance programs to improve the welfare of the people. However, the distribution of social assistance is often considered inaccurate, so that some households deemed eligible are not identified as recipients. One way to improve distribution accuracy is to apply machine learning classification. Three tree-based models, LightGBM, Random Forest, and XGBoost, were selected because of their advantages over classical models such as logistic regression, especially in handling complex data and in not depending on the model assumptions that classical models must satisfy. This study compares the performance of these three models in predicting social assistance recipient status using data from the 2024 National Socioeconomic Survey (SUSENAS) for West Java Province. Models were evaluated under several data pre-processing scenarios involving outlier handling, class balancing, and feature engineering. The results show that LightGBM consistently outperforms the other models on six of the eight evaluation metrics used: Accuracy, Balanced Accuracy, F1-Score, ROC-AUC, PR-AUC, and Brier Score. SHAP analysis identifies Social Assistance History and Asset Score as the most influential features for model prediction. The nonparametric Friedman and Nemenyi tests confirmed significant performance differences between LightGBM and the other models on the F1-Score, PR-AUC, and Brier Score metrics. These findings indicate that tree-based models, particularly LightGBM, can support the development of a more accurate, data-driven system for targeting social assistance.
Keywords: Social Assistance; Tree-Based; SHAP; SUSENAS; Hybrid Bayesian Optimization
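The abstract reports Friedman and Nemenyi tests. The following is a minimal plain-Python sketch of how such a comparison is commonly computed, not the authors' code: rank the models within each block (assumed here to be, for example, a pre-processing scenario or cross-validation fold), compute the Friedman chi-square statistic from the mean ranks, and compare pairwise mean-rank differences with the Nemenyi critical difference. The scores and function names are hypothetical, and the p-value lookup against a chi-square distribution with k − 1 degrees of freedom is left out to keep the sketch dependency-free.

```python
import math

# Nemenyi critical values q_0.05 indexed by the number of models k
# (standard tabulated values, e.g. in Demšar, 2006).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def average_ranks(row, higher_is_better=True):
    """Rank the models within one block (1 = best); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda m: row[m], reverse=higher_is_better)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman_nemenyi(scores, higher_is_better=True):
    """scores[b][m] is the metric of model m on block b.

    Returns (mean rank per model, Friedman chi-square, Nemenyi critical difference):
      chi2 = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
      CD   = q_0.05 * sqrt(k(k+1) / (6N))
    """
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row, higher_is_better) for row in scores]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    cd = Q_05[k] * math.sqrt(k * (k + 1) / (6 * n))
    return mean_ranks, chi2, cd


# Hypothetical F1-scores for three models (columns) on three blocks (rows).
f1 = [[0.90, 0.80, 0.70],
      [0.85, 0.80, 0.75],
      [0.88, 0.82, 0.70]]
ranks, chi2, cd = friedman_nemenyi(f1)
print(ranks, round(chi2, 3), round(cd, 3))
# Two models differ significantly if their mean ranks differ by more than cd.
# For a lower-is-better metric such as the Brier score, pass higher_is_better=False.
```

With only three blocks the test has little power: even perfectly consistent rankings in this toy example give a mean-rank gap of 2.0 between the best and worst model, only just above the critical difference of about 1.91, while the 1.0 gaps between neighbouring models are not significant.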