Articles

Predicting Diabetes with Machine Learning: Evaluating Tree-Based and Ensemble Models with Custom Metrics and Statistical Validation
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6419

Abstract

This study investigates the predictive performance of machine learning models in diagnosing diabetes using the Pima Indians Diabetes Dataset. Seven models, including Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, Stacking Classifier, and Voting Classifier, were evaluated. A 10-fold cross-validation strategy was employed to ensure robust and reliable performance assessment. The evaluation incorporated standard metrics such as accuracy, precision, recall, F1 score, and ROC AUC, as well as a custom metric designed to prioritize recall while maintaining precision, addressing the clinical importance of minimizing false negatives. LightGBM and Random Forest emerged as the top-performing individual models, achieving competitive scores across metrics. Ensemble methods, particularly the Stacking Classifier, demonstrated robustness by leveraging the complementary strengths of base models. Statistical validation using the Friedman test confirmed significant differences in model rankings, with a test statistic of 22.77 and a p-value of 0.00088. However, pairwise comparisons using the Wilcoxon signed-rank test revealed that the differences between top models, such as LightGBM and Random Forest, were not statistically significant. These results emphasize the effectiveness of tree-based and ensemble models in addressing clinical diagnostic challenges. The study highlights the importance of using a custom metric to align model evaluation with clinical priorities. Future work should explore hybrid modeling approaches and larger datasets to further enhance predictive accuracy and generalizability in real-world healthcare applications.
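The abstract above does not define its custom metric. As a minimal sketch of one common way to weight recall above precision, the block below computes an F-beta score with `beta=2`. The choice of F-beta, the value of `beta`, and the toy labels are all assumptions made for illustration, not the metric used in the study.

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta for binary labels (1 = positive). beta > 1 weights recall more.

    Assumption: this stands in for the study's unspecified custom metric.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical predictions: 3 true positives, 1 false negative, 2 false positives,
# so precision = 0.60 and recall = 0.75.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1]

f1 = fbeta_score(y_true, y_pred, beta=1.0)
f2 = fbeta_score(y_true, y_pred, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Because recall exceeds precision in this toy example, the recall-weighted F2 score comes out higher than F1. A model that misses more positive cases would be penalized more heavily under F2.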
A Comparative Analysis of Diabetes Prediction through Deep Learning Architectures
Airlangga, Gregorius
Building of Informatics, Technology and Science (BITS) Vol 6 No 3 (2024): December 2024
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i3.6446

Abstract

Diabetes prediction plays a vital role in healthcare, enabling early diagnosis and timely interventions to mitigate the risks associated with the disease. This study investigates the application of advanced machine learning architectures to predict diabetes using the Pima Indians Diabetes Dataset, a widely used benchmark for medical diagnostics. Five models, namely a Deep Neural Network (DNN), a Convolutional Neural Network (CNN) with Attention, an LSTM with Residual Connections, a Bidirectional LSTM (BiLSTM) with Attention, and a GRU with Dense Layers, were developed and evaluated on multiple performance metrics, including accuracy, precision, recall, F1 score, and ROC AUC. A stratified five-fold cross-validation strategy was employed to ensure robustness, while SHAP analysis was conducted to enhance interpretability. Among the models, the GRU with Dense Layers achieved superior performance, recording the highest accuracy (76.17%), F1 score (69.85%), and ROC AUC (83.52%). SHAP analysis revealed Glucose as the most influential feature, with significant interactions identified between Glucose and Pregnancies, aligning with established medical insights. Statistical analysis confirmed the reliability of the results, with all metrics demonstrating statistically significant improvements over a baseline of random chance (p < 0.05). These findings underscore the efficacy of GRU-based models in capturing complex patterns in medical data while maintaining computational efficiency. Future work will explore hybrid architectures and larger datasets to enhance generalizability and real-world applicability, contributing to more effective decision-making in healthcare.
UAV Logistics Pattern Language for Rural Areas
Rahmananta, Radyan; Airlangga, Gregorius; Sukwadi, Ronald; Basuki, Widodo Widjaja; Sugianto, Lai Ferry; Nugroho, Oskar Ika Adi; Kristian, Yoel
International Journal of Robotics and Control Systems Vol 5, No 1 (2025)
Publisher : Association for Scientific Computing Electronics and Engineering (ASCEE)

DOI: 10.31763/ijrcs.v5i1.1554

Abstract

Rural areas often face limited infrastructure, varied terrain, and dispersed populations, which lead to inefficient and costly delivery systems. Recent developments in Unmanned Aerial Vehicle (UAV) technology offer a theoretical framework for overcoming these challenges. This research proposes a comprehensive pattern language specifically designed for multi-UAV logistics operations in rural settings. The proposed system integrates critical components such as LiDAR-based map generation, altitude information storage, partial goal estimation, and collision avoidance into a unified framework. Unlike existing research that typically focuses on isolated aspects like route optimization or payload management, this system features an advanced path planning algorithm capable of real-time environmental assessment and direction-aware navigation. Focus group discussions with logistics experts from Talaud Island, North Sulawesi, Indonesia informed the design and refinement of the proposed patterns, ensuring that they address the practical needs of rural logistics. Our analysis suggests that this system offers a theoretical foundation for significantly improving the efficiency, reliability, and sustainability of delivering essential goods and services to rural areas, thereby supporting equitable development and improving the quality of life in these communities. While no empirical data is presented, the framework serves as a scalable foundation for future implementations of UAV-based rural logistics systems.
Stress Detection Using Hybrid Deep Learning Models with Attention Mechanisms: A Comparative Study of CNN-LSTM, CNN-GRU, and Ensemble Approaches
Airlangga, Gregorius
Journal of Computer System and Informatics (JoSYC) Vol 6 No 1 (2024): November 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josyc.v6i1.6284

Abstract

Accurate and reliable stress detection remains a critical challenge in health monitoring due to the multifaceted nature of stress and the difficulty in capturing its temporal and spatial characteristics from physiological data. Existing methods often lack the ability to effectively model these dependencies, leading to suboptimal performance and limited interpretability, which hinder their application in real-world scenarios such as wearable devices and mobile health systems. This study addresses these limitations by investigating hybrid deep learning models with attention mechanisms, specifically focusing on CNN-LSTM, CNN-GRU, and CNN-BiLSTM architectures and their ensemble. Leveraging the complementary strengths of convolutional and recurrent layers, these models aim to capture both spatial and temporal dependencies in stress-related data, while attention layers enhance interpretability by prioritizing relevant features. Experimental results reveal that the CNN-LSTM with Attention model achieved the best performance, with the lowest Mean Squared Error (MSE) and Mean Absolute Error (MAE), demonstrating its suitability for complex stress prediction tasks. The CNN-GRU model also performed well, offering a balance between computational efficiency and accuracy, while the CNN-BiLSTM model showed limitations, suggesting that additional model complexity may lead to overfitting. The ensemble model, combining predictions from all three architectures, delivered stable performance across metrics, underscoring the value of ensemble approaches in improving robustness and mitigating model-specific biases. These findings have significant implications for practical applications, such as wearable devices and mobile health systems, where accurate, interpretable, and reliable stress monitoring is essential for timely interventions. 
Future work should focus on optimizing these models for real-time deployment, exploring adaptive learning for personalized stress detection, and validating across diverse datasets to enhance generalizability. This research highlights the importance of hybrid architectures and attention mechanisms in addressing the challenges of stress detection, paving the way for responsive and user-centered health monitoring systems.
A Hybrid Machine Learning Framework for Enhanced Tsunami Prediction Using Ensemble Models and Neural Networks
Airlangga, Gregorius
Journal of Computer System and Informatics (JoSYC) Vol 6 No 1 (2024): November 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josyc.v6i1.6291

Abstract

Tsunami prediction is a critical task for mitigating risks associated with natural disasters, yet achieving accurate and reliable predictions remains a significant challenge due to the inherent complexity and uncertainty in earthquake-related data. Traditional predictive models often struggle to capture the intricate relationships between earthquake features, such as magnitude, latitude, longitude, depth, and instrumental intensities, leading to suboptimal performance and unreliable predictions. To address these challenges, this research proposes a hybrid machine learning framework that integrates ensemble models and neural networks to enhance both accuracy and robustness in tsunami prediction. The dataset undergoes rigorous preprocessing, including the removal of missing values, normalization, and shuffling, to improve data quality. The framework employs a diverse set of ensemble models such as Random Forest, Gradient Boosting, XGBoost, LightGBM, and CatBoost alongside a neural network with three hidden layers for predictive modeling. Predictions from these models are aggregated into meta-features and passed to a logistic regression meta-classifier for final decision-making. Using ten-fold stratified cross-validation, the framework is evaluated on key metrics, including precision, recall, F1-Score, accuracy, and ROC-AUC. Results demonstrate that the hybrid model significantly outperforms individual models, effectively addressing the challenges of low accuracy and instability in traditional approaches. By leveraging the complementary strengths of ensemble models and neural networks, the proposed framework offers a scalable and adaptable solution for tsunami prediction, contributing to enhanced disaster preparedness and risk mitigation strategies.
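The stacking step described in the abstract above (base-model predictions aggregated into meta-features, then passed to a logistic regression meta-classifier) can be sketched in a few lines. This is an illustrative sketch only: the probabilities below are invented, the three columns stand for three hypothetical base models, and the hand-written gradient-descent logistic regression is a stand-in for whatever library implementation the study used. In practice the meta-features are usually out-of-fold predictions, so that the meta-classifier is not trained on outputs the base models produced for their own training rows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_classifier(meta_features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on meta-features with batch gradient descent."""
    n = len(labels)
    weights = [0.0] * len(meta_features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(meta_features, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(meta_features, weights, bias):
    """Return hard 0/1 decisions from the meta-classifier."""
    return [
        1 if sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= 0.5 else 0
        for row in meta_features
    ]


# Hypothetical meta-features: each row holds the positive-class probability
# predicted by three base models for one event (values invented).
meta_features = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.2],
    [0.3, 0.2, 0.4],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_classifier(meta_features, labels)
final_decisions = predict(meta_features, weights, bias)
print(final_decisions)
```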
Evaluation of Ensemble and Hybrid Models for Predicting Household Energy Consumption: A Comparative Study of Machine Learning Approaches
Airlangga, Gregorius
Indonesian Journal of Artificial Intelligence and Data Mining Vol 8, No 1 (2025): March 2025
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v8i1.32819

Abstract

Accurately predicting household energy consumption is critical for efficient energy management, particularly as global energy demands rise. This study explores the predictive performance of various machine learning models, including linear regression, Ridge regression, Lasso regression, Random Forest, Gradient Boosting, XGBoost, CatBoost, and a hybrid model combining Long Short-Term Memory (LSTM) networks with Random Forest regression. The models were evaluated on a dataset consisting of minute-level energy readings over a 350-day period. Key performance metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R²) were used to assess model accuracy. The results demonstrate that ensemble models, particularly Random Forest and CatBoost, outperformed traditional regression models in terms of error minimization. CatBoost achieved the lowest MSE among all models, highlighting its effectiveness in handling non-linearities and categorical data. However, none of the models achieved a positive R² score, indicating their limitations in fully explaining the variance within the dataset. The hybrid LSTM + Random Forest model, despite its expected strength in capturing temporal dependencies, performed worse than simpler models, suggesting issues with feature extraction and model integration. These findings suggest that while ensemble methods are well-suited for energy consumption prediction, more advanced modeling techniques or enhanced feature engineering are needed to improve performance. Future research could explore deeper neural networks or time-series models such as ARIMA to better capture the temporal patterns in household energy consumption.
A Hybrid CNN-RNN Model for Enhanced Anemia Diagnosis: A Comparative Study of Machine Learning and Deep Learning Techniques
Airlangga, Gregorius
Indonesian Journal of Artificial Intelligence and Data Mining Vol 7, No 2 (2024): September 2024
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v7i2.29898

Abstract

This study proposes a hybrid Convolutional Neural Network-Recurrent Neural Network (CNN-RNN) model for the accurate diagnosis of anemia types, leveraging the strengths of both architectures in capturing spatial and temporal patterns in Complete Blood Count (CBC) data. The research involves the development and evaluation of several single-architecture deep learning (DL) models, specifically Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Fully Convolutional Network (FCN). The models are trained and validated using stratified k-fold cross-validation to ensure robust performance. Key metrics such as test accuracy are utilized to provide a comprehensive assessment of each model's performance. The hybrid CNN-RNN model achieved the highest test accuracy of 90.27%, surpassing the CNN (89.88%), FCN (85.60%), MLP (79.77%), and RNN (73.54%) models. The hybrid model also demonstrated superior performance in cross-validation, with an accuracy of 87.31% ± 1.77%. Comparative analysis highlights the hybrid model's advantages over single-architecture DL models, particularly in handling imbalanced data and providing reliable classifications across all anemia types. The results underscore the potential of advanced DL architectures in medical diagnostics and suggest pathways for further refinements, such as incorporating attention mechanisms or additional feature engineering, to enhance model performance. This study contributes to the growing body of knowledge on AI-driven medical diagnostics and presents a viable tool for clinical decision support in anemia diagnosis.
Decoding Energy Usage Predictions: An Application of XAI Techniques for Enhanced Model Interpretability
Airlangga, Gregorius
Indonesian Journal of Artificial Intelligence and Data Mining Vol 7, No 2 (2024): September 2024
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v7i2.29041

Abstract

The growing complexity of machine learning models has heightened the need for interpretability, particularly in applications impacting resource management and sustainability. This study addresses the challenge of interpreting predictions from sophisticated machine learning models used for building energy consumption predictions. By leveraging Explainable AI (XAI) techniques, including Permutation Importance, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME), we have dissected the predictive features influencing building energy usage. Our research delves into a dataset consisting of various building characteristics and weather conditions, applying an XGBoost model to predict Site Energy Usage Intensity (Site EUI). The Permutation Importance method elucidated the global significance of features across the dataset, while SHAP provided a dual perspective, revealing both the global importance and local impact of features on individual predictions. Complementing these, LIME offered rapid, locally focused interpretations, showcasing its utility for instances where immediate insights are essential. The findings indicate that 'Energy Star Rating', 'Facility Type', and 'Floor Area' are among the top predictors of energy consumption, with environmental factors also contributing to the models' decisions. The application of XAI techniques yielded a nuanced understanding of the model's behavior, enhancing transparency and fostering trust in the predictions. This study contributes to the field of sustainable energy management by demonstrating the application of XAI for insightful model interpretation, reinforcing the significance of interpretable AI in the development of energy policies and efficiency strategies. Our approach exemplifies the balance between predictive accuracy and the necessity for model transparency, advocating for the continued integration of XAI in AI-driven decision-making processes.
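Of the three XAI techniques named in the abstract above, permutation importance is the simplest to illustrate: shuffle one feature column, re-score the model, and record how much the error grows. The sketch below assumes a toy stand-in model and invented data. `toy_predict`, the two unnamed features, and the use of MSE as the score are hypothetical choices, not the study's XGBoost model or its dataset.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    # Hypothetical fitted model: depends on feature 0 only, ignores feature 1.
    return 3.0 * row[0]


# Invented data: the target is fully determined by feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(8)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is exactly zero, while shuffling the feature the model relies on raises the error.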
Performance Evaluation of Machine Learning Models for Predicting Household Energy Consumption: A Comparative Study
Airlangga, Gregorius
Indonesian Journal of Artificial Intelligence and Data Mining Vol 8, No 1 (2025): March 2025
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v8i1.32791

Abstract

Accurate prediction of household energy consumption is critical for improving energy efficiency and optimizing resource allocation in smart grids. This study evaluates the performance of several machine learning regression models, including Linear Regression, Ridge Regression, Lasso Regression, Random Forest, Gradient Boosting, XGBoost, CatBoost, and LightGBM, for predicting daily household energy consumption. The models were trained and tested on time series data, and their performance was measured using four key metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R². Results show that non-linear models, especially ensemble-based methods such as Random Forest and CatBoost, outperformed traditional linear regression models. Random Forest achieved the lowest MAE (0.1682) and competitive RMSE (0.2450), making it the best overall model. CatBoost, with its advanced gradient boosting algorithm, also demonstrated superior predictive accuracy, achieving an RMSE of 0.2421 and an MAE of 0.1830. In contrast, linear models struggled to capture the complex patterns in the data, with Linear Regression showing the worst performance. The negative R² scores across all models indicate challenges in explaining the variance in the dataset, which may be attributed to external factors or noise not captured by the models. This study highlights the importance of choosing appropriate machine learning models for time series forecasting and recommends further exploration of deep learning models and external features to improve prediction accuracy.
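The four metrics named in the abstract above, and the reason R² can be negative, can be made concrete with a short sketch. R² compares a model's squared error with that of a constant predictor that always outputs the mean of the observed values, so any model that does worse than that baseline scores below zero. The numbers below are invented for illustration and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # model's squared error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # mean predictor's squared error
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": 1.0 - ss_res / ss_tot}


y_true = [1.0, 2.0, 3.0, 4.0]

# Always predicting the mean gives R² = 0 by construction.
mean_baseline = regression_metrics(y_true, [2.5, 2.5, 2.5, 2.5])

# A hypothetical model that gets the trend backwards does worse than the mean.
reversed_model = regression_metrics(y_true, [4.0, 3.0, 2.0, 1.0])

print(mean_baseline["r2"], reversed_model["r2"])
```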
Leveraging Machine Learning for Accurate Anemia Diagnosis Using Complete Blood Count Data
Airlangga, Gregorius
Indonesian Journal of Artificial Intelligence and Data Mining Vol 7, No 2 (2024): September 2024
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v7i2.29869

Abstract

Anemia, a prevalent hematologic disorder, necessitates accurate and timely diagnosis for effective management and treatment. This study explores the application of various machine learning models to classify anemia types using complete blood count (CBC) data. We evaluated multiple models, including DecisionTreeClassifier, ExtraTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, XGBoost, LightGBM, and CatBoost, to identify the most effective approach for anemia diagnosis. The dataset comprised CBC data labeled with anemia diagnoses, sourced from multiple medical facilities. Rigorous data preprocessing was performed, followed by feature selection using methods such as Variance Inflation Factor (VIF), Predictive Power Score (PPS), and feature importance from ensemble models. The models were trained and evaluated using 5-fold cross-validation, with hyperparameter tuning conducted via GridSearchCV. Results demonstrated that the DecisionTreeClassifier achieved the highest balanced accuracy score of 94.17%, outperforming more complex ensemble methods. Confusion matrices validated its robust performance, highlighting its precision and recall. The study underscores the potential of simple decision tree models in medical diagnosis tasks, particularly when datasets are well-preprocessed. These findings have significant implications for clinical practice, suggesting that machine learning can enhance diagnostic accuracy and efficiency. Future work will explore advanced techniques to further improve performance and integration into clinical workflows.
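Balanced accuracy, the headline metric in the abstract above, is the unweighted mean of per-class recall, which keeps a majority class from dominating the score. The following is a minimal sketch with invented labels. The two class names are placeholders, not the anemia types in the study's dataset.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced sample: four "normal" cases, two "anemia" cases.
y_true = ["normal", "normal", "normal", "normal", "anemia", "anemia"]
y_pred = ["normal", "normal", "normal", "normal", "anemia", "normal"]

plain_accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {plain_accuracy:.4f}, balanced accuracy = {score:.4f}")
```

Here plain accuracy is 5/6, while balanced accuracy drops to 0.75 because half of the minority class is missed.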