Journal : Infolitika Journal of Data Science

Machine Learning Approach for Diabetes Detection Using Fine-Tuned XGBoost Algorithm
Maulana, Aga; Faisal, Farassa Rani; Noviandy, Teuku Rizky; Rizkia, Tatsa; Idroes, Ghazi Mauer; Tallei, Trina Ekawati; El-Shazly, Mohamed; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.72

Abstract

Diabetes is a chronic condition characterized by elevated blood glucose levels, which lead to organ dysfunction and an increased risk of premature death. The global prevalence of diabetes has been rising, necessitating an accurate and timely diagnosis to achieve the most effective management. Recent advancements in the field of machine learning have opened new possibilities for improving diabetes detection and management. In this study, we propose a fine-tuned XGBoost model for diabetes detection. We use the Pima Indian Diabetes dataset and employ a random search for hyperparameter tuning. The fine-tuned XGBoost model is compared with six other popular machine learning models and achieves the highest performance in accuracy, precision, sensitivity, and F1-score. This study demonstrates the potential of the fine-tuned XGBoost model as a robust and efficient tool for diabetes detection. The insights of this study advance medical diagnostics for efficient and personalized management of diabetes.

ANFIS-Based QSRR Modelling for Kovats Retention Index Prediction in Gas Chromatography
Idroes, Rinaldi; Noviandy, Teuku Rizky; Maulana, Aga; Suhendra, Rivansyah; Sasmita, Novi Reandy; Muslem, Muslem; Idroes, Ghazi Mauer; Jannah, Raudhatul; Afidh, Razief Perucha Fauzie; Irvanizam, Irvanizam
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.73

Abstract

This study aims to evaluate the implementation and effectiveness of the Adaptive Neuro-Fuzzy Inference System (ANFIS) based Quantitative Structure Retention Relationship (QSRR) to predict the Kovats retention index of compounds in gas chromatography. The model was trained using 340 essential oil compounds and their molecular descriptors. The evaluation of the ANFIS models revealed promising results, achieving an R² of 0.974, an RMSE of 48.12, and a MAPE of 3.3% on the testing set. These findings highlight the ANFIS approach as remarkably accurate in its predictive capacity for determining the Kovats retention index in the context of gas chromatography. This study provides valuable perspectives on the efficiency of retention index prediction through ANFIS-based QSRR methods and the potential practicality in compound analysis and chromatographic optimization.
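
For readers unfamiliar with the three metrics reported in this abstract, the following is a minimal sketch of how R², RMSE, and MAPE are conventionally computed. The function name `regression_metrics` and the sample values are illustrative assumptions, not data or code from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAPE in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # MAPE assumes no true value is zero.
    mape = 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return r2, rmse, mape


# Made-up retention index values, for illustration only (not from the study).
observed = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1010.0, 1190.0, 1420.0, 1580.0]
r2, rmse, mape = regression_metrics(observed, predicted)
print(f"R2={r2:.4f} RMSE={rmse:.2f} MAPE={mape:.2f}%")
```

RMSE is expressed in the same units as the retention index, while MAPE is scale-free, which is why abstracts in this listing tend to report both.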

Ensemble Machine Learning Approach for Quantitative Structure Activity Relationship Based Drug Discovery: A Review
Noviandy, Teuku Rizky; Maulana, Aga; Idroes, Ghazi Mauer; Emran, Talha Bin; Tallei, Trina Ekawati; Helwani, Zuchra; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.91

Abstract

This comprehensive review explores the pivotal role of ensemble machine learning techniques in Quantitative Structure-Activity Relationship (QSAR) modeling for drug discovery. It emphasizes the significance of accurate QSAR models in streamlining candidate compound selection and highlights how ensemble methods, including AdaBoost, Gradient Boosting, Random Forest, Extra Trees, XGBoost, LightGBM, and CatBoost, effectively address challenges such as overfitting and noisy data. The review presents recent applications of ensemble learning in both classification and regression tasks within QSAR, showcasing the exceptional predictive accuracy of these techniques across diverse datasets and target properties. It also discusses the key challenges and considerations in ensemble QSAR modeling, including data quality, model selection, computational resources, and overfitting. The review outlines future directions in ensemble QSAR modeling, including the integration of multi-modal data, explainability, handling imbalanced data, automation, and personalized medicine applications while emphasizing the need for ethical and regulatory guidelines in this evolving field.

Maternal Health Risk Detection Using Light Gradient Boosting Machine Approach
Noviandy, Teuku Rizky; Nainggolan, Sarah Ika; Raihan, Raihan; Firmansyah, Isra; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.123

Abstract

Maternal health risk detection is crucial for reducing morbidity and mortality among pregnant women. In this study, we employed the Light Gradient Boosting Machine (LightGBM) model to identify risk levels using data from rural healthcare facilities. The dataset included key health indicators aligned with the United Nations Sustainable Development Goals. The LightGBM model underwent rigorous optimization through hyperparameter tuning and 10-fold cross-validation. Its predictive performance was benchmarked against other algorithms using accuracy, precision, recall, and F1-score, with feature importance assessed to identify critical risk predictors. The LightGBM model demonstrated the highest performance across all metrics. The results underscore the value of advanced machine learning techniques in public health. Future research directions include expanding the demographic scope, incorporating temporal data, and enhancing model transparency. This study highlights the transformative potential of machine learning in maternal healthcare, providing a foundation for improved risk detection and proactive healthcare interventions.

A Statistical Clustering Approach: Mapping Population Indicators Through Probabilistic Analysis in Aceh Province, Indonesia
Sasmita, Novi Reandy; Khairul, Moh; Sofyan, Hizir; Kruba, Rumaisa; Mardalena, Selvi; Dahlawy, Arriz; Apriliansyah, Feby; Muliadi, Muliadi; Saputra, Dimas Chaerul Ekty; Noviandy, Teuku Rizky; Watsiq Maula, Ahmad
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.130

Abstract

Clustering, a statistical analysis technique, can be used for understanding population patterns and as a basis for more targeted policy-making. In this ecological study, we explored the population dynamics across 23 districts/cities in Aceh Province. The study used the Aceh Population Development Profile Year 2022 data, focusing on the total population, in-migrants, out-migrants, fertility, and maternal mortality as variables. The study employed descriptive statistics to ascertain the data distribution, followed by the Shapiro-Wilk test to evaluate normality, which is crucial for selecting the appropriate statistical methods. The Spearman test was used to determine correlations between the total population and the other variables used as indicators. The Probabilistic Fuzzy C-Means (PFCM) method was used for clustering. To optimize clustering, the silhouette coefficient was calculated using the Euclidean distance and the elbow method, with the results analyzed using R-4.3.2 software. This study's design and methods aim to provide a nuanced understanding of demographic patterns for targeted policy-making and regional development in Aceh, Indonesia. Based on the data normality test results, only fertility (p-value = 0.45) is normally distributed, while the other variables are not. The Spearman test results showed that only in-migrants (p-value = 1.78 × 10⁻⁶) and out-migrants (p-value = 2.30 × 10⁻⁶) correlated with the Aceh Province population. Using the population variable and the two variables associated with it, the optimal number of clusters was found to be four, where clusters 1, 2, 3, and 4 consist of three, nine, four, and seven districts/cities, respectively.

Decision Tree versus k-NN: A Performance Comparison for Air Quality Classification in Indonesia
Sasmita, Novi Reandy; Ramadeska, Siti; Kesuma, Zurnila Marli; Noviandy, Teuku Rizky; Maulana, Aga; Khairul, Mhd; Suhendra, Rivansyah
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.179

Abstract

Air quality can affect human health, the environment, and the sustainability of ecosystems, so efforts are needed to monitor and control air quality. The Plume Air Quality Index (PAQI) is one of the indices used to measure and determine the level of air quality. Determining the air quality level accurately requires correct classification. Some previous studies have conducted classification analysis using the decision tree and k-Nearest Neighbor (k-NN) methods, but evaluated them using accuracy values only. Therefore, this study uses both methods and evaluates the air quality level classification results not only with accuracy but also with precision, recall, and F1-score. The study used secondary data from Plume Labs on pollutant concentration values and PAQI categories based on particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), and ozone (O3) for 33 provincial capitals in Indonesia from July 1 to December 31, 2022. Comparing the performance of the two methods shows that the decision tree outperforms k-NN. The decision tree performance values for accuracy, precision, recall, and F1-score are 90.67%, 90.61%, 90.67%, and 90.63%, respectively. It can therefore be concluded that the decision tree performs better than k-NN in classifying PAQI categories, with better overall evaluation metric values.
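
The four metrics named in this abstract can be sketched for a multi-class problem as follows. The abstract does not state the averaging scheme, so support-weighted averaging is an assumption here; it is one scheme under which recall always equals accuracy, as it does in the reported figures (both 90.67%). The function name and the category labels below are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1), support-weighted over classes."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Made-up category labels, for illustration only (not PAQI data from the study).
y_true = ["good", "good", "good", "moderate", "moderate", "poor"]
y_pred = ["good", "good", "moderate", "moderate", "moderate", "poor"]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Under this weighting, F1 is an average of per-class F1 values rather than the harmonic mean of the averaged precision and recall, so the reported numbers need not satisfy the two-class F1 formula.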

Backpropagation Neural Network-Based Prediction of Kovats Retention Index for Essential Oil Compounds
Safhadi, Aulia Al-Jihad; Noviandy, Teuku Rizky; Irvanizam, Irvanizam; Suhendra, Rivansyah; Karma, Taufiq; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.197

Abstract

The identification of chemical compounds in essential oils is crucial in industries such as pharmaceuticals, perfumery, and food. Kovats Retention Index (RI) values are essential for compound identification using gas chromatography-mass spectrometry (GC-MS). Traditional RI determination methods are time-consuming, labor-intensive, and susceptible to experimental variability. Recent advancements in data science suggest that artificial intelligence (AI) can enhance RI prediction accuracy and efficiency. However, the full potential of AI, particularly artificial neural networks (ANN), in predicting RI values remains underexplored. This study develops a backpropagation neural network (BPNN) model to predict the Kovats RI values of essential oil compounds using five molecular descriptors: ATSc1, VCH-7, SP-1, Kier1, and MLogP. We trained the BPNN on a dataset of 340 essential oil compounds and optimized it through hyperparameter tuning. We show that the optimized BPNN model, with an epoch count of 100, a learning rate of 0.1, a hidden layer size of 10 neurons, and the ReLU activation function, achieves an R² value of 0.934 and a Root Mean Squared Error (RMSE) of 76.98. These results indicate a high correlation between predicted and actual RI values and a low average prediction error. Our findings demonstrate that BPNNs can significantly improve the efficiency and accuracy of compound identification, reducing reliance on traditional experimental methods.

A Model-Agnostic Interpretability Approach to Predicting Customer Churn in the Telecommunications Industry
Noviandy, Teuku Rizky; Idroes, Ghalieb Mutig; Hardi, Irsan; Afjal, Mohd; Ray, Samrat
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.199

Abstract

Customer churn is critical for businesses across various industries, especially in the telecommunications sector, where high churn rates can significantly impact revenue and growth. Understanding the factors leading to customer churn is essential for developing effective retention strategies. Despite the predictive power of machine learning models, there is a growing demand for model interpretability to ensure trust and transparency in decision-making processes. This study addresses this gap by applying advanced machine learning models, specifically Naïve Bayes, Random Forest, AdaBoost, XGBoost, and LightGBM, to predict customer churn in a telecommunications dataset. We enhanced model interpretability using SHapley Additive exPlanations (SHAP), which provides insights into feature contributions to predictions. Here, we show that LightGBM achieved the highest performance among the models, with an accuracy of 80.70%, precision of 84.35%, recall of 90.54%, and an F1-score of 87.34%. SHAP analysis revealed that features such as tenure, contract type, and monthly charges are significant predictors of customer churn. These results indicate that combining predictive analytics with interpretability methods can provide telecom companies with actionable insights to tailor retention strategies effectively. The study highlights the importance of understanding customer behavior through transparent and accurate models, paving the way for improved customer satisfaction and loyalty. Future research should focus on validating these findings with real-world data, exploring more sophisticated models, and incorporating temporal dynamics to enhance churn prediction models' predictive power and applicability.
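
As a quick consistency check on the figures in this abstract, the F1-score of a binary classifier is the harmonic mean of precision and recall. The short calculation below uses only the precision and recall quoted above and assumes they refer to the same (positive) class.

```python
# Precision and recall as reported in the abstract for LightGBM.
precision = 0.8435
recall = 0.9054

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {100 * f1:.2f}%")  # prints "F1 = 87.34%"
```

The result agrees with the reported F1-score of 87.34%.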

Artificial Neural Network–Particle Swarm Optimization Approach for Predictive Modeling of Kovats Retention Index in Essential Oils
Kurniadinur, Kurniadinur; Noviandy, Teuku Rizky; Idroes, Ghazi Mauer; Ahmad, Noor Atinah; Irvanizam, Irvanizam; Subianto, Muhammad; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 2 No. 2 (2024): November 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i2.220

Abstract

The Kovats retention index is a critical parameter in gas chromatography used for the identification of volatile compounds in essential oils. Traditional methods for determining the Kovats retention index are often labor-intensive, time-consuming, and prone to inaccuracies due to variations in experimental conditions. This study presents a novel approach combining Artificial Neural Networks (ANN) with Particle Swarm Optimization (PSO) to predict the Kovats retention index of essential oil compounds more accurately and efficiently. The ANN-PSO hybrid model leverages the strengths of both techniques: the ANN's capacity to model complex nonlinear relationships and PSO's capability to optimize hyperparameters by finding the global optimum. The model was trained using a dataset of 340 essential oil compounds with molecular descriptors, with the performance evaluated based on Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE). Results indicate that a simpler ANN configuration with one hidden neuron achieved the lowest RMSE (80.16) and MAPE (5.65%), suggesting that the relationship between the molecular descriptors and the Kovats retention index is not overly complex. This study demonstrates that the ANN-PSO model can serve as an effective tool for predictive modeling of the Kovats retention index, reducing the need for experimental procedures and improving analytical efficiency in essential oil research.
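
To illustrate the optimization half of the hybrid described in this abstract, here is a minimal sketch of a generic particle swarm optimizer minimizing a toy function. It is not the authors' ANN-PSO setup: the function name `pso_minimize`, the coefficient values, and the sphere objective are all assumptions chosen for illustration. In a hyperparameter search, the objective would instead return a validation error such as RMSE. Note that PSO is a heuristic and does not in general guarantee reaching the global optimum.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_best_val.__getitem__)
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to own best + pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move and clamp to the search box.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_val)
```

The fixed seed makes the run reproducible; different seeds or coefficients may converge at different speeds.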

Advanced Anemia Classification Using Comprehensive Hematological Profiles and Explainable Machine Learning Approaches
Noviandy, Teuku Rizky; Idroes, Ghifari Maulana; Suhendra, Rivansyah; Bakri, Tedy Kurniawan; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 2 No. 2 (2024): November 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i2.237

Abstract

Anemia is a common health issue with serious clinical effects, making timely and accurate diagnosis essential to prevent complications. This study explores the use of machine learning (ML) methods to classify anemia and its subtypes using detailed hematological data. Six ML models were tested: Gradient Boosting, Random Forest, Naive Bayes, Logistic Regression, Support Vector Machine, and K-Nearest Neighbors. The dataset was preprocessed using feature standardization and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Gradient Boosting delivered the highest accuracy, sensitivity, and F1-score, establishing itself as the top-performing model. SHapley Additive exPlanations (SHAP) analysis was applied to enhance model interpretability, identifying key predictive features. This study highlights the potential of explainable ML to develop efficient, accurate, and scalable tools for anemia diagnosis, fostering improved healthcare outcomes globally.