Journal : Building of Informatics, Technology and Science

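Klasifikasi Kelayakan Air Minum Mengkombinasikan Algoritma Random Forest dengan Teknik Optimasi Bayesian [Drinking Water Potability Classification Combining the Random Forest Algorithm with Bayesian Optimization]
Darmawan, Aditya Aqil; D, Ishak Bintang; Astuti, Yani Parti; Winarno, Agus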
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.7055

Abstract

Clean and safe drinking water is crucial for public health; however, pollution from industrial waste, domestic waste, and urbanization has significantly degraded water quality. Manual methods for water quality analysis, such as the Water Quality Index (WQI) and STORET, are limited in efficiency and accuracy. This study therefore proposes a machine learning-based classification system to determine the potability of drinking water more accurately and efficiently. The research used the Water Potability dataset from Kaggle, which consists of 3,276 samples with nine key parameters. Initial analysis showed that most features had a nearly normal distribution, although some variables, such as Solids and Conductivity, were right-skewed because of extreme values. Correlation analysis revealed no significant linear relationships between the water quality parameters. Preprocessing included mean imputation of missing data, normalization, feature engineering, and oversampling with SMOTE to address class imbalance. The models compared were LightGBM, Random Forest, XGBoost, and CatBoost, with hyperparameters tuned using Bayesian Search CV, which improved performance, particularly for Random Forest. The optimized Random Forest model achieved the best performance, with an accuracy of 85.38%, precision of 85.86%, recall of 85.38%, and an F1-score of 85.37%. Some misclassifications remained, especially in detecting potable water samples. Overall, the results indicate that ensemble learning methods can be used effectively to evaluate drinking water potability.
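The oversampling step mentioned in the abstract can be illustrated with a minimal, standard-library sketch of the interpolation idea behind SMOTE: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample`, the toy data, and the parameter values are assumptions made for illustration; the study's own implementation and settings are not given in the abstract.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative sketch of the SMOTE idea only, not the authors' code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test data contains no synthetic samples.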
A Comparative Analysis of LSTM and GRU Models for AQI Forecasting in Tourist Destinations
Ardianto, Luluk; Astuti, Yani Parti
Building of Informatics, Technology and Science (BITS) Vol 7 No 1 (2025): June (2025)
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i1.6633

Abstract

The Air Quality Index (AQI) is a critical metric for assessing air quality and its impact on human health, particularly in densely populated and tourist-heavy areas such as Malioboro, Yogyakarta. As one of Indonesia's most popular tourist destinations, the region experiences significant air quality fluctuations driven by human activities, including transportation and tourism. This study evaluates the performance of two deep learning models, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), in forecasting AQI and the key pollutant parameters PM10 and PM2.5, using two years of air quality data collected between January 2022 and December 2023. The results show that the LSTM model outperforms GRU in predicting AQI (MSE: 163.757, RMSE: 12.797, MAE: 7.432, MAPE: 0.133) and PM2.5 (MSE: 32.001, RMSE: 5.657, MAE: 3.005, MAPE: 0.139), indicating its capability to model complex temporal patterns effectively. Conversely, the GRU model achieves better accuracy for PM10 predictions (MSE: 58.592, RMSE: 7.655, MAE: 4.168, MAPE: 0.180), combining computational efficiency with competitive performance. These findings underscore the suitability of LSTM for applications prioritizing accuracy, while GRU provides a viable option for scenarios requiring faster computation. This research highlights the potential of deep learning models to tackle air quality challenges in urban and tourist areas, paving the way for informed decision-making and sustainable development initiatives.
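For readers unfamiliar with the four error measures quoted above, the following is a minimal sketch of their standard definitions. The function name `forecast_errors` and the sample values are hypothetical; MAPE is returned as a fraction rather than a percentage, which appears to match the scale of the reported values (for example 0.133).

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (as a fraction) for two equal-length series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so actual values must be non-zero
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical AQI readings and forecasts
actual = [50.0, 60.0, 80.0, 100.0]
predicted = [45.0, 66.0, 72.0, 110.0]
scores = forecast_errors(actual, predicted)
```

RMSE is the square root of MSE, which is consistent with the reported pairs: the square root of 163.757 is approximately 12.797.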
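Perbandingan Kinerja Metode Naïve Bayes dan Random Forest untuk Klasifikasi Penyakit Diabetes Berdasarkan Data Medis [Performance Comparison of the Naïve Bayes and Random Forest Methods for Diabetes Classification Based on Medical Data]
Pradana, Rendy Risqi; Astuti, Yani Parti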
Building of Informatics, Technology and Science (BITS) Vol 7 No 1 (2025): June (2025)
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i1.7446

Abstract

Diabetes mellitus is a non-communicable disease whose prevalence continues to rise in Indonesia. Conventional diagnosis often faces challenges such as delays and high costs. This study compares the performance of the Naive Bayes and Random Forest algorithms for diabetes classification using the Pima Indians Diabetes dataset. To address class imbalance, the dataset was processed with the Synthetic Minority Over-sampling Technique (SMOTE). Performance was evaluated using accuracy, precision, recall, and F1-score. The results show that Random Forest achieved an accuracy of 79.5%, precision of 79.6%, recall of 79.5%, and an F1-score of 79.5%, while Naive Bayes achieved an accuracy of 76.5%, precision of 76.5%, recall of 76.5%, and an F1-score of 76.5%. These findings indicate that Random Forest is better at handling complex data, with higher predictive accuracy, whereas Naive Bayes remains effective for simpler implementations because of its computational efficiency. The study contributes to the development of intelligent decision support systems for faster and more accurate early detection of diabetes, which can help reduce the burden on healthcare systems.
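In this abstract the reported recall equals the reported accuracy for both models. That is what a support-weighted average over classes produces, since weighted recall is algebraically identical to accuracy. The abstract does not state which averaging was used, so this is an inference rather than a confirmed detail. The sketch below, with a hypothetical `weighted_scores` function and made-up labels, shows the computation.

```python
def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Illustrative only; the averaging scheme is an assumption.
    """
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / n  # share of samples belonging to this class
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
result = weighted_scores(y_true, y_pred)
```

For screening tasks, the per-class recall of the positive class is often more informative than the weighted average, because the weighted figure can hide missed positive cases.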
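Perbandingan Kinerja Algoritma CatBoost, XGBoost, LightGBM dan Random Forest Dalam Memprediksi Risiko Infeksi Aids Dalam Dataset Kesehatan [Performance Comparison of the CatBoost, XGBoost, LightGBM, and Random Forest Algorithms in Predicting AIDS Infection Risk in a Health Dataset]
Yulianto, Pramudya Ridwan; Astuti, Yani Parti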
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8975

Abstract

This study investigates the prediction of AIDS infection risk using tree-based algorithms CatBoost, XGBoost, LightGBM, and Random Forest applied to a medical and demographic dataset consisting of 2,139 observations and 23 variables. The research process includes data exploration, cleaning, handling extreme values using the interquartile range (IQR) method, normalization with RobustScaler, and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). Due to the imbalanced nature of the dataset, model evaluation emphasizes not only accuracy but also Recall, F1-Score, and AUC-ROC to better assess infected class detection. Prior to SMOTE implementation, all models achieved high accuracy but relatively low recall for the positive class; after resampling, CatBoost demonstrated the most significant improvement, with recall increasing from 63% to 77% and F1-Score from 72% to 79%, achieving an overall accuracy of 90%. In comparison, XGBoost reached an accuracy of 88.63% with a more moderate recall improvement, while LightGBM and Random Forest showed consistent yet smaller gains, indicating that the combination of SMOTE and CatBoost is more effective in minimizing False Negatives in AIDS infection cases. The main contribution of this study lies in the integration of robust outlier handling, feature normalization, and class balancing within a structured experimental framework, with a specific emphasis on sensitivity optimization to enhance early detection reliability in clinical screening contexts.
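The outlier handling and scaling steps described above can be sketched with the standard library alone. The abstract does not say whether values outside the IQR fences were removed or capped, so the sketch assumes capping (clipping to the fences) with the conventional factor of 1.5. The scaling function mirrors the default behaviour of scikit-learn's RobustScaler, which centres on the median and divides by the interquartile range. All function names and data here are hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using linear interpolation between data points."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


def clip_outliers_iqr(values, factor=1.5):
    """Cap values outside [Q1 - factor*IQR, Q3 + factor*IQR] (assumed strategy)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]


def robust_scale(values):
    """Centre on the median and scale by the IQR (requires a non-zero IQR)."""
    q1, median, q3 = quartiles(values)
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


# Hypothetical feature column containing one extreme value
raw = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0, 11.0, 12.0]
clipped = clip_outliers_iqr(raw)
scaled = robust_scale(clipped)
```

Scaling by the median and IQR rather than the mean and standard deviation keeps a few extreme observations from dominating the scale of the feature, which is why the two steps are commonly paired.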