Contact Name
Teuku Rizky Noviandy
Contact Email
trizkynoviandy@gmail.com
Phone
+6282275731976
Journal Mail Official
editorial-office@heca-analitika.com
Editorial Address
Jl. Makam T. Nyak Arief Kompleks BUPERTA Blok L7B, Lamgapang, Aceh Besar, Provinsi Aceh
Location
Kab. Aceh Besar,
Aceh
INDONESIA
Infolitika Journal of Data Science
ISSN : -     EISSN : 3025-8618     DOI : https://doi.org/10.60084/ijds
Infolitika Journal of Data Science is an international scientific journal that publishes high-caliber original research articles and comprehensive review papers in the field of data science. The journal's core mission is to stimulate interdisciplinary research collaboration, facilitate the exchange of knowledge, and drive the advancement and application of innovative strategies within the data science domain. Topics of this journal include, but are not limited to: Data Mining and Analysis, Machine Learning and Artificial Intelligence, Big Data and Data Engineering, Predictive Modeling and Forecasting, Natural Language Processing, Computer Vision, Data Visualization and Interpretation, Ethics and Privacy in Data Science, Applications of Data Science, and Interdisciplinary Approaches.
Articles 5 Documents
Decision Tree versus k-NN: A Performance Comparison for Air Quality Classification in Indonesia
Sasmita, Novi Reandy; Ramadeska, Siti; Kesuma, Zurnila Marli; Noviandy, Teuku Rizky; Maulana, Aga; Khairul, Mhd; Suhendra, Rivansyah
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.179

Abstract

Air quality affects human health, the environment, and the sustainability of ecosystems, so it needs to be monitored and controlled. The Plume Air Quality Index (PAQI) is one of the indices used to measure and determine the level of air quality. Determining the air quality level accurately requires a suitable classification method. Some previous studies have conducted classification analysis using the decision tree and k-Nearest Neighbor (k-NN) methods, but evaluated them using accuracy alone. Therefore, this study uses both methods and evaluates the air quality classification results not only with accuracy but also with precision, recall, and F1-score. The study used secondary data on pollutant concentrations and PAQI categories based on particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), and ozone (O3), obtained from Plume Labs for 33 provincial capitals in Indonesia for the period from July 1 to December 31, 2022. Comparing the two methods shows that the decision tree outperforms k-NN. The decision tree achieves an accuracy, precision, recall, and F1-score of 90.67%, 90.61%, 90.67%, and 90.63%, respectively. It can therefore be concluded that the decision tree performs better than k-NN in classifying PAQI categories, with better overall values on the evaluation metrics.
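The four metrics named in this abstract can be derived from a multi-class confusion matrix. The sketch below is illustrative only: it assumes support-weighted averaging across classes (the abstract does not say which averaging was used), and the confusion matrix is hypothetical, not data from the study. Under that assumption, weighted recall equals accuracy by definition, which would be consistent with the two identical 90.67% figures reported above.

```python
def weighted_metrics(confusion):
    """Return accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] = number of samples whose true class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])                   # true samples of class k
        predicted = sum(row[k] for row in confusion)  # samples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                      # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 3-class confusion matrix (assumption, not from the paper).
example = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]
scores = weighted_metrics(example)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Reporting precision and F1 alongside accuracy matters most when the classes are imbalanced, because accuracy alone can hide poor performance on rare categories.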
Optimizing Geothermal Power Plant Locations in Indonesia: A Multi-Objective Optimization on The Basis of Ratio Analysis Approach
Rahman, Isra Farliadi; Misbullah, Alim; Irvanizam, Irvanizam; Yusuf, Muhammad; Maulana, Aga; Marwan, Marwan; Dharma, Dian Budi; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.184

Abstract

As the global energy landscape shifts towards sustainable sources, geothermal energy emerges as a pivotal renewable resource, particularly in regions with abundant geothermal potential like Indonesia. This study focuses on Mount Seulawah in Aceh Province, a region rich in geothermal resources, to optimize the selection of geothermal power plant (GPP) sites using the Multi-Objective Optimization on the Basis of Ratio Analysis (MOORA) method. Our approach integrates environmental, technical, and accessibility criteria, including distance to settlements, land slope, proximity to fault lines and heat sources, and road access. By employing a structured decision matrix and applying MOORA, we systematically evaluated and ranked potential sites based on their suitability for GPP development. The results highlight the site at Ie Brôuk as the most suitable due to its minimal environmental impact and superior geological and accessibility conditions. This study not only contributes to the strategic deployment of geothermal resources in Indonesia but also provides a replicable model for other regions with similar geothermal potential, emphasizing the importance of a balanced and informed approach to renewable energy site selection.
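The MOORA ratio system mentioned in this abstract can be sketched in a few lines: each criterion column of the decision matrix is divided by its Euclidean norm, and each site's score is the weighted sum of its benefit criteria minus the weighted sum of its cost criteria. Everything specific below is an assumption for illustration: the site names, the raw values, the equal weights, and the direction (benefit or cost) assigned to each criterion are hypothetical and are not taken from the study.

```python
import math


def moora_rank(matrix, weights, is_benefit):
    """Rank alternatives with the MOORA ratio system.

    matrix[i][j]  : raw value of alternative i on criterion j
    weights[j]    : importance of criterion j
    is_benefit[j] : True if larger is better, False if smaller is better
    Returns (scores, order), where order lists alternative indices, best first.
    """
    n_criteria = len(weights)
    # Vector normalization: divide each column by its Euclidean norm.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)
    ]
    scores = []
    for row in matrix:
        y = 0.0
        for j in range(n_criteria):
            term = weights[j] * row[j] / norms[j]
            y += term if is_benefit[j] else -term  # add benefits, subtract costs
        scores.append(y)
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return scores, order


# Hypothetical candidate sites and values (assumptions, not the paper's data).
sites = ["Site A", "Site B", "Site C"]
# Columns: distance to settlements (km), slope (degrees), distance to fault (km),
#          distance to heat source (km), distance to road (km)
matrix = [
    [2.0, 15.0, 1.5, 3.0, 4.0],
    [5.0, 8.0, 0.5, 1.0, 1.0],
    [3.0, 20.0, 2.0, 2.5, 6.0],
]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]               # equal weights, assumed
is_benefit = [True, False, False, False, False]   # criterion directions, assumed

scores, order = moora_rank(matrix, weights, is_benefit)
for i in order:
    print(f"{sites[i]}: {scores[i]:+.4f}")
```

Because the normalization makes every column dimensionless, criteria measured in kilometres and degrees can be combined in a single score; the ranking still depends on the chosen weights and criterion directions.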
Predicting Obesity Levels with High Accuracy: Insights from a CatBoost Machine Learning Model
Maulana, Aga; Afidh, Razief Perucha Fauzie; Maulydia, Nur Balqis; Idroes, Ghazi Mauer; Rahimah, Souvia
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.195

Abstract

This study aims to develop a machine learning model using the CatBoost algorithm to predict obesity based on demographic, lifestyle, and health-related features and to compare its performance with other machine learning algorithms. A dataset containing information on 2,111 individuals from Mexico, Peru, and Colombia was used to train and evaluate the CatBoost model. The dataset included gender, age, height, weight, eating habits, physical activity levels, and family history of obesity. The model's performance was assessed using accuracy, precision, recall, and F1-score and compared to logistic regression, K-nearest neighbors (KNN), random forest, and naive Bayes algorithms. Feature importance analysis was conducted to identify the most influential factors in predicting obesity levels. The results indicate that the CatBoost model achieved the highest accuracy at 95.98%, surpassing the other models. Furthermore, the CatBoost model demonstrated superior precision (96.08%), recall (95.98%), and F1-score (96.00%). The confusion matrix revealed that the model accurately predicted the majority of instances in each obesity level category. Feature importance analysis identified weight, height, and gender as the most influential factors in predicting obesity levels, followed by dietary habits, physical activity, and family history of overweight. The model's high accuracy, precision, recall, and F1-score, together with its ability to handle categorical variables effectively, make it a valuable tool for obesity risk assessment and classification. The insights gained from the feature importance analysis can guide the development of targeted obesity prevention and management strategies, focusing on modifiable risk factors such as diet and physical activity.
While further validation on diverse populations is necessary, the CatBoost model's results demonstrate its potential to support clinical decision-making and inform public health initiatives in the fight against the global obesity epidemic.
Backpropagation Neural Network-Based Prediction of Kovats Retention Index for Essential Oil Compounds
Safhadi, Aulia Al-Jihad; Noviandy, Teuku Rizky; Irvanizam, Irvanizam; Suhendra, Rivansyah; Karma, Taufiq; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.197

Abstract

The identification of chemical compounds in essential oils is crucial in industries such as pharmaceuticals, perfumery, and food. Kovats Retention Index (RI) values are essential for compound identification using gas chromatography-mass spectrometry (GC-MS). Traditional RI determination methods are time-consuming, labor-intensive, and susceptible to experimental variability. Recent advancements in data science suggest that artificial intelligence (AI) can enhance RI prediction accuracy and efficiency. However, the full potential of AI, particularly artificial neural networks (ANN), in predicting RI values remains underexplored. This study develops a backpropagation neural network (BPNN) model to predict the Kovats RI values of essential oil compounds using five molecular descriptors: ATSc1, VCH-7, SP-1, Kier1, and MLogP. We trained the BPNN on a dataset of 340 essential oil compounds and optimized it through hyperparameter tuning. We show that the optimized BPNN model, with an epoch count of 100, a learning rate of 0.1, a hidden layer size of 10 neurons, and the ReLU activation function, achieves an R² value of 0.934 and a Root Mean Squared Error (RMSE) of 76.98. These results indicate a high correlation between predicted and actual RI values and a low average prediction error. Our findings demonstrate that BPNNs can significantly improve the efficiency and accuracy of compound identification, reducing reliance on traditional experimental methods.
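For readers less familiar with the two regression metrics reported above, the following sketch shows how R² (the coefficient of determination) and RMSE are computed from actual and predicted values. The retention index numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not from the study's dataset of 340 compounds.

```python
import math


def r2_and_rmse(actual, predicted):
    """Return (R squared, RMSE) for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical retention index values (assumption, not the paper's data).
actual = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1010.0, 1190.0, 1420.0, 1580.0]

r2, rmse = r2_and_rmse(actual, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.2f}")
```

RMSE is expressed in the same units as the target, so the reported value of 76.98 is in retention index units, while R² is unitless.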
A Model-Agnostic Interpretability Approach to Predicting Customer Churn in the Telecommunications Industry
Noviandy, Teuku Rizky; Idroes, Ghalieb Mutig; Hardi, Irsan; Afjal, Mohd; Ray, Samrat
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.199

Abstract

Customer churn is a critical concern for businesses across various industries, especially in the telecommunications sector, where high churn rates can significantly impact revenue and growth. Understanding the factors leading to customer churn is essential for developing effective retention strategies. Despite the predictive power of machine learning models, there is a growing demand for model interpretability to ensure trust and transparency in decision-making processes. This study addresses this gap by applying machine learning models, specifically Naïve Bayes, Random Forest, AdaBoost, XGBoost, and LightGBM, to predict customer churn in a telecommunications dataset. We enhanced model interpretability using SHapley Additive exPlanations (SHAP), which provides insights into feature contributions to predictions. Here, we show that LightGBM achieved the highest performance among the models, with an accuracy of 80.70%, precision of 84.35%, recall of 90.54%, and an F1-score of 87.34%. SHAP analysis revealed that features such as tenure, contract type, and monthly charges are significant predictors of customer churn. These results indicate that combining predictive analytics with interpretability methods can provide telecom companies with actionable insights to tailor retention strategies effectively. The study highlights the importance of understanding customer behavior through transparent and accurate models, paving the way for improved customer satisfaction and loyalty. Future research should focus on validating these findings with real-world data, exploring more sophisticated models, and incorporating temporal dynamics to enhance the predictive power and applicability of churn prediction models.
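SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute-force enumeration for a single prediction, which is practical only for a handful of features. It is an illustration under stated assumptions: `toy_churn_score`, the feature names, the instance, and the baseline are all hypothetical, and the code does not use the SHAP library or the study's LightGBM model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are held at their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(subset):
        # Coalition members take the instance's value; the rest stay at baseline.
        x = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_churn_score(x):
    """Hypothetical churn score, used only for illustration."""
    score = 0.5 - 0.01 * x["tenure_months"] + 0.002 * x["monthly_charges"]
    if x["month_to_month"]:
        score += 0.2
    return score


# Hypothetical customer and reference point (assumptions).
instance = {"tenure_months": 2, "monthly_charges": 90.0, "month_to_month": 1}
baseline = {"tenure_months": 32, "monthly_charges": 65.0, "month_to_month": 0}

phi = shapley_values(toy_churn_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the customer and the score at the baseline, which is the additivity property that gives SHAP its name.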