Articles

Found 4 Documents
Journal: Building of Informatics, Technology and Science

Predicting Diabetes with Machine Learning: Evaluating Tree-Based and Ensemble Models with Custom Metrics and Statistical Validation
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6419

Abstract

This study investigates the predictive performance of machine learning models in diagnosing diabetes using the Pima Indians Diabetes Dataset. Seven models were evaluated: Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, a Stacking Classifier, and a Voting Classifier. A 10-fold cross-validation strategy was employed to ensure robust and reliable performance assessment. The evaluation incorporated standard metrics such as accuracy, precision, recall, F1 score, and ROC AUC, as well as a custom metric designed to prioritize recall while maintaining precision, addressing the clinical importance of minimizing false negatives. LightGBM and Random Forest emerged as the top-performing individual models, achieving competitive scores across metrics. Ensemble methods, particularly the Stacking Classifier, demonstrated robustness by leveraging the complementary strengths of base models. Statistical validation using the Friedman test confirmed significant differences in model rankings, with a test statistic of 22.77 and a p-value of 0.00088. However, pairwise comparisons using the Wilcoxon signed-rank test revealed that the differences between top models, such as LightGBM and Random Forest, were not statistically significant. These results emphasize the effectiveness of tree-based and ensemble models in addressing clinical diagnostic challenges. The study highlights the importance of using a custom metric to align model evaluation with clinical priorities. Future work should explore hybrid modeling approaches and larger datasets to further enhance predictive accuracy and generalizability in real-world healthcare applications.
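
As a rough illustration of the evaluation protocol described above, the sketch below runs 10-fold cross-validation with the standard scorers plus a recall-weighted custom metric, then applies the Friedman and Wilcoxon signed-rank tests to the per-fold scores. The custom metric is assumed here to be an F-beta score with beta = 2 (the abstract does not give its exact form), synthetic data stands in for the Pima Indians Diabetes Dataset, and only three of the seven models are shown; XGBoost, LightGBM, and the stacking/voting ensembles would plug into the same loop.

```python
# Illustrative sketch only: 10-fold CV with standard metrics plus an assumed
# recall-weighted custom metric (F-beta, beta=2), followed by Friedman and
# Wilcoxon tests over per-fold scores. Synthetic data stands in for the
# Pima Indians Diabetes Dataset (768 rows, 8 features).
from scipy.stats import friedmanchisquare, wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=768, n_features=8, weights=[0.65, 0.35],
                           random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    # XGBClassifier, LGBMClassifier, StackingClassifier, and VotingClassifier
    # would be added here in the same way.
}

scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "roc_auc": "roc_auc",
    "custom_f2": make_scorer(fbeta_score, beta=2),  # weights recall over precision
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
per_fold = {}  # custom-metric score for each of the 10 folds, per model
for name, model in models.items():
    result = cross_validate(model, X, y, cv=cv, scoring=scoring)
    per_fold[name] = result["test_custom_f2"]
    print(name, {k: round(result[f"test_{k}"].mean(), 3) for k in scoring})

# Friedman test: are the models ranked differently across folds?
stat, p = friedmanchisquare(*per_fold.values())
print(f"Friedman statistic={stat:.2f}, p={p:.5f}")

# Pairwise follow-up on two of the models, paired by fold.
w_stat, w_p = wilcoxon(per_fold["random_forest"], per_fold["gradient_boosting"])
print(f"Wilcoxon RF vs GB: statistic={w_stat:.2f}, p={w_p:.5f}")
```

Pairing the Wilcoxon test by fold mirrors the abstract's pairwise comparisons of top models; when every pair of models is tested, a multiple-comparison correction would normally be applied as well.
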
A Comparative Analysis of Diabetes Prediction through Deep Learning Architectures
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6446

Abstract

Diabetes prediction plays a vital role in healthcare, enabling early diagnosis and timely interventions to mitigate the risks associated with the disease. This study investigates the application of advanced machine learning architectures to predict diabetes using the Pima Indians Diabetes Dataset, a widely used benchmark for medical diagnostics. Five models were developed and evaluated: a Deep Neural Network (DNN), a Convolutional Neural Network (CNN) with Attention, an LSTM with Residual Connections, a Bidirectional LSTM (BiLSTM) with Attention, and a GRU with Dense Layers. Each was assessed on multiple performance metrics, including accuracy, precision, recall, F1 score, and ROC AUC. A stratified five-fold cross-validation strategy was employed to ensure robustness, while SHAP analysis was conducted to enhance interpretability. Among the models, the GRU with Dense Layers achieved superior performance, recording the highest accuracy (76.17%), F1 score (69.85%), and ROC AUC (83.52%). SHAP analysis revealed Glucose as the most influential feature, with significant interactions identified between Glucose and Pregnancies, aligning with established medical insights. Statistical analysis confirmed the reliability of the results, with all metrics demonstrating statistically significant improvements over a baseline of random chance (p < 0.05). These findings underscore the efficacy of GRU-based models in capturing complex patterns in medical data while maintaining computational efficiency. Future work will explore hybrid architectures and larger datasets to enhance generalizability and real-world applicability, contributing to more effective decision-making in healthcare.
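
The best-performing architecture in this study, the GRU with Dense Layers, can be sketched roughly as follows under stratified five-fold cross-validation. Layer sizes, the reshaping of the eight tabular features into a length-8 sequence, and the training settings are assumptions, since the abstract does not specify them; synthetic data again stands in for the Pima dataset, and the SHAP analysis is omitted for brevity.

```python
# Illustrative sketch only: a GRU-with-dense-layers classifier evaluated with
# stratified five-fold CV. Architecture, reshaping, and training settings are
# assumptions; synthetic data stands in for the Pima Indians Diabetes Dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from tensorflow import keras
from tensorflow.keras import layers

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_seq = X.reshape(-1, 8, 1)  # treat each feature as one step of a length-8 "sequence"

def build_gru_dense():
    model = keras.Sequential([
        keras.Input(shape=(8, 1)),
        layers.GRU(32),                       # recurrent encoder
        layers.Dense(16, activation="relu"),  # dense head
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    model = build_gru_dense()
    model.fit(X_seq[train_idx], y[train_idx], epochs=30, batch_size=32, verbose=0)
    proba = model.predict(X_seq[test_idx], verbose=0).ravel()
    aucs.append(roc_auc_score(y[test_idx], proba))

print(f"ROC AUC: mean={np.mean(aucs):.3f}, std={np.std(aucs):.3f}")
```

Feeding tabular features to a recurrent layer requires some such reshaping; the (8, 1) layout used here is one common choice, not necessarily the one used in the paper.
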
Comparative Study of Machine Learning Models for Temperature Prediction: Analyzing Accuracy, Stability, and Generalization
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.7114

Abstract

Accurate temperature prediction is crucial for climate monitoring, energy management, and disaster preparedness. This study provides a comparative analysis of various machine learning models, including Random Forest, Gradient Boosting, Histogram-Based Gradient Boosting, XGBoost, Support Vector Regression (SVR), Ridge Regression, and Lasso Regression, to evaluate their predictive accuracy, stability, and generalization capability. The models are assessed using five-fold cross-validation, with the R² metric as the primary evaluation criterion. The results indicate that Random Forest achieves the highest accuracy, with an R² mean of 0.999994, demonstrating its strong ability to model temperature variations. Ridge Regression unexpectedly performs at a similar level, suggesting that the dataset contains strong linear dependencies. Gradient Boosting, Histogram-Based Gradient Boosting, and XGBoost also achieve high accuracy, confirming their effectiveness in capturing complex relationships between meteorological parameters. SVR, while effective, exhibits higher variance, indicating that it may require further tuning for improved consistency. Lasso Regression, with an R² mean of 0.9783, shows the lowest accuracy, confirming that linear models are less suitable for complex meteorological predictions. These findings highlight the superiority of ensemble-based methods in temperature forecasting, reinforcing their stability and adaptability. Future research should explore hybrid models that integrate ensemble techniques with feature engineering optimizations to further enhance predictive performance. This study contributes to the ongoing development of machine learning applications in meteorology, offering insights into model selection for climate-related forecasting tasks.
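
The comparison protocol can be illustrated with a short sketch: several regressors scored by five-fold cross-validated R², where the mean indicates accuracy and the spread across folds indicates stability. Hyperparameters are library defaults (an assumption), XGBoost is omitted to avoid an extra dependency, and make_regression stands in for the meteorological dataset, which the abstract does not describe in detail.

```python
# Illustrative sketch only: five-fold cross-validated R^2 for several
# regressors; synthetic data stands in for the meteorological dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              HistGradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=2000, n_features=10, noise=5.0, random_state=1)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=1),
    "gradient_boosting": GradientBoostingRegressor(random_state=1),
    "hist_gradient_boosting": HistGradientBoostingRegressor(random_state=1),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),  # SVR needs scaled inputs
    # xgboost.XGBRegressor would be added here in the same way.
}

cv = KFold(n_splits=5, shuffle=True, random_state=1)
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    # mean R^2 reflects accuracy; std across folds reflects stability
    print(f"{name:24s} R2 mean={r2.mean():.4f} std={r2.std():.4f}")
```
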
Hybrid Machine Learning Approaches for Atmospheric CO₂ Prediction: Evaluating Regression and Ensemble Models with Advanced Feature Engineering
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.7121

Abstract

The accurate prediction of atmospheric CO₂ concentrations is essential for understanding climate change dynamics and developing effective environmental policies. This study evaluates the predictive capabilities of various machine learning models, including ensemble-based regressors such as Random Forest, Gradient Boosting, and XGBoost, alongside traditional regression models such as Support Vector Regression (SVR), Ridge, and Lasso regression. The dataset, derived from meteorological observations, was preprocessed using multiple feature scaling techniques, including StandardScaler, MinMaxScaler, and RobustScaler, followed by feature engineering techniques such as polynomial transformation and Principal Component Analysis (PCA) to enhance predictive accuracy. Model performance was assessed using the coefficient of determination (R²) and cross-validation techniques. The results indicate that tree-based models, including Random Forest and XGBoost, struggled to generalize well, exhibiting negative R² values due to overfitting and an inability to capture the temporal dependencies in CO₂ variations. SVR emerged as the best-performing model, though its predictive power remained limited. Computational complexity analysis revealed that tree-based methods incurred high processing costs, while linear models such as Ridge and Lasso demonstrated lower complexity but failed to capture non-linear dependencies. The study highlights the challenges of CO₂ prediction using conventional machine learning techniques and underscores the need for advanced deep learning approaches, such as hybrid Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) models, to better capture spatial and temporal dependencies. Future research should explore integrating external environmental factors and leveraging deep learning architectures to improve predictive performance.
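
The preprocessing-plus-model pipeline described above (scaling, polynomial expansion, PCA, then a regressor scored by cross-validated R²) might look roughly like the sketch below. The polynomial degree, the PCA variance threshold, the use of plain K-fold splitting, and the synthetic stand-in data are all assumptions; the abstract does not give these details.

```python
# Illustrative sketch only: scale -> polynomial features -> PCA -> regressor,
# scored with cross-validated R^2. Synthetic data stands in for the CO2 series.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=1500, n_features=6, noise=2.0, random_state=7)

def with_feature_engineering(regressor):
    """Scale, expand to degree-2 polynomial features, reduce with PCA, then fit."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("pca", PCA(n_components=0.95)),  # keep components explaining 95% of variance
        ("model", regressor),
    ])

models = {
    "svr": with_feature_engineering(SVR(C=10.0)),
    "ridge": with_feature_engineering(Ridge(alpha=1.0)),
    "lasso": with_feature_engineering(Lasso(alpha=0.1)),
    "random_forest": with_feature_engineering(
        RandomForestRegressor(n_estimators=200, random_state=7)),
}

cv = KFold(n_splits=5, shuffle=True, random_state=7)
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:14s} R2 mean={r2.mean():.4f} std={r2.std():.4f}")
```

A negative cross-validated R², as reported for the tree-based models in this study, simply means the model predicts worse on held-out data than a constant equal to the mean of the test targets.
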