Contact Name
Teuku Rizky Noviandy
Contact Email
trizkynoviandy@gmail.com
Phone
+6282275731976
Journal Mail Official
editorial-office@heca-analitika.com
Editorial Address
Jl. Makam T. Nyak Arief Kompleks BUPERTA Blok L7B, Lamgapang, Aceh Besar, Provinsi Aceh
Location
Kab. Aceh Besar,
Aceh
INDONESIA
Infolitika Journal of Data Science
ISSN: - | EISSN: 3025-8618 | DOI: https://doi.org/10.60084/ijds
Infolitika Journal of Data Science is a distinguished international scientific journal that showcases high-caliber original research articles and comprehensive review papers in the field of data science. The journal's core mission is to stimulate interdisciplinary research collaboration, facilitate the exchange of knowledge, and drive the advancement and application of innovative strategies within the data science domain. Topics include, but are not limited to: Data Mining and Analysis, Machine Learning and Artificial Intelligence, Big Data and Data Engineering, Predictive Modeling and Forecasting, Natural Language Processing, Computer Vision, Data Visualization and Interpretation, Ethics and Privacy in Data Science, Applications of Data Science, and Interdisciplinary Approaches.
Articles 30 Documents
Developing a Regional Framework for Disaster Risk Reduction Based on Disaster-Related Data from Aceh, Indonesia Yolanda, Yolanda; Oktari, Rina Suryani; Munawar, Munawar; Lola, Muhamad Safiih; Sofyan, Hizir
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.269

Abstract

Aceh Province is highly vulnerable to various hazards, necessitating effective disaster risk reduction strategies. This study aims to develop an instrument to evaluate disaster risk reduction efforts in Aceh Province and to assess progress toward global disaster resilience targets. The data includes secondary disaster-related records from 2005 to 2024 and primary data from the instrument validation process, demonstrating excellent validity results based on the Content Validity Ratio (CVR) and Content Validity Index (CVI). The findings highlight significant improvements in key areas, including reductions in disaster mortality, affected populations, economic losses, damage to critical infrastructure, and strengthened early warning systems. However, challenges persist in implementing local disaster risk reduction strategies and enhancing international cooperation. This study offers practical insights for policymakers and contributes to strengthening disaster resilience and advancing disaster risk management research in sub-national contexts.
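
The abstract reports CVR and CVI results without giving the formulas. As an illustration only, the sketch below assumes Lawshe's Content Validity Ratio, CVR = (n_e - N/2) / (N/2), and the usual item-level CVI (the share of experts who rate an item relevant). The function names and the panel counts in the usage note are hypothetical and are not taken from the study.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2), in the range [-1, 1]."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


def item_cvi(n_relevant: int, n_experts: int) -> float:
    """Item-level CVI: proportion of experts who rate the item as relevant."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    return n_relevant / n_experts


def scale_cvi_average(item_cvis: list[float]) -> float:
    """Scale-level CVI computed as the mean of the item-level CVIs."""
    if not item_cvis:
        raise ValueError("item_cvis must not be empty")
    return sum(item_cvis) / len(item_cvis)
```

For a hypothetical panel of 10 experts, `content_validity_ratio(9, 10)` evaluates the ratio for an item that 9 of them marked essential.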
Similarity-Based Network in the Industrial Community of Joyo City Takeuchi, Keita; Iwasaki, Masashi; Shinjo, Masato
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.267

Abstract

Data utilization is becoming increasingly widespread in a variety of fields around the world, and has become especially important in the industrial world. Data utilization techniques and approaches can contribute to the development of not only individual companies but also certain groups of companies. In this paper, we consider the industrial structure of Joyo City, Japan, by analyzing data collected through interviews with company presidents and managers. The main purpose of this paper is to understand this structure in terms of similarity across industrial categories. We first express the features of each company as a vector with entries determined from the interview data. We then compute vector similarities in order to draw a graphical network, in which nodes corresponding to similar companies are linked by an edge. From the resulting network, we derive the most similar companies in the same and different industrial categories for each company. Finally, we classify Joyo City's companies into new groups across the standard categories.
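
The vector-to-network step described above can be sketched roughly as follows. The abstract does not say which similarity measure or cut-off the authors used, so cosine similarity, the `threshold` parameter, and all names here are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_network(
    features: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return edges (company_a, company_b, similarity) for pairs at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        score = cosine_similarity(features[a], features[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges
```

Each returned edge links two companies whose feature vectors are at least as similar as the chosen threshold; lowering the threshold gives a denser network.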
Optimizing Energy Consumption Prediction Across the IMT-GT Region Through PCA-Based Modeling Farid, Muhammad; Nuzullah, Teuku Muhammad Faiz; Aklya, Zatul; Nazila, Syifa; Ulhaq, Muhammad Zia; Apriliansyah, Feby; Sasmita, Novi Reandy
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.286

Abstract

This study aims to improve the accuracy of energy consumption prediction in the Indonesia-Malaysia-Thailand Growth Triangle (IMT-GT) region by addressing multicollinearity among independent variables such as energy production (Mtoe), lignite coal production (million tons), crude oil production (million tons), refined oil production (million tons), natural gas production (billion cubic meters), and electricity production (terawatt-hours). By integrating Principal Component Analysis (PCA) with Random Forest (RF), six correlated variables were reduced into two uncorrelated principal components (PC1 and PC2), explaining 80.77% of the data variance. The PCA-RF hybrid model outperformed the standalone RF model, with an increase in the coefficient of determination (R²) from 0.976 to 0.993. Additionally, it achieved significant reductions in error metrics, with the mean absolute error (MAE) decreasing from 5.811 to 4.169 and the root mean square error (RMSE) dropping from 9.278 to 4.786. These results demonstrate PCA’s effectiveness in isolating dominant drivers such as energy and lignite coal production while improving model stability. The framework provides policymakers with a reliable tool to forecast energy demand and align economic growth with sustainability in fossil fuel-dependent economies.
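
The three scores used to compare the models (coefficient of determination, MAE, and RMSE) have standard definitions, sketched below in plain Python. This is not the authors' code; the function names are hypothetical and the sketch only shows how such scores are computed from observed and predicted values.

```python
import math


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true: list[float], y_pred: list[float]) -> float:
    """Square root of the mean squared difference; penalizes large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot
```

Because RMSE squares each error before averaging, a drop in RMSE that is larger than the drop in MAE usually means the largest individual errors shrank the most.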
Explainable Deep Learning with Lightweight CNNs for Tuberculosis Classification Noviandy, Teuku Rizky; Idroes, Ghazi Mauer; Zulfikar, Teuku; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.305

Abstract

Tuberculosis (TB) remains a major global health threat, particularly in low-resource settings where timely diagnosis is critical yet often limited by the lack of radiological expertise. Chest X-rays (CXRs) are widely used for TB screening, but manual interpretation is prone to errors and variability. While deep learning has shown promise in automating CXR analysis, most existing models are computationally intensive and lack interpretability, limiting their deployment in real-world clinical environments. To address this gap, we evaluated three lightweight and explainable CNN architectures, ShuffleNetV2, SqueezeNet 1.1, and MobileNetV3, for binary TB classification using a locally sourced dataset of 3,008 CXR images. Using transfer learning and Grad-CAM for visual explanation, we show that MobileNetV3 and ShuffleNetV2 achieved perfect test performance with 100% accuracy, sensitivity, specificity, precision, and F1-score, along with AUC scores of 1.00 and inference times of 94.66 and 103.63 seconds, respectively. SqueezeNet performed moderately, with a lower F1-score of 82.98% and several misclassifications. These results demonstrate that lightweight CNNs can deliver high diagnostic accuracy and transparency, supporting their use in scalable, AI-assisted TB screening systems for underserved healthcare settings.
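
The reported accuracy, sensitivity, specificity, precision, and F1-score all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage note are hypothetical and are not figures from the study.

```python
def binary_classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common screening metrics from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(numerator: int, denominator: int) -> float:
        # A ratio with an empty denominator is reported as 0.0 rather than raising.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": ratio(tp, tp + fn),  # recall on positive cases
        "specificity": ratio(tn, tn + fp),  # recall on negative cases
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and sensitivity
    }
```

Calling `binary_classification_metrics(tp=8, fp=1, tn=9, fn=2)` on a hypothetical 20-image test set returns all five scores in one dictionary.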
Inductive Biases in Feature Reduction for QSAR: SHAP vs. Autoencoders Noviandy, Teuku Rizky; Idroes, Ghifari Maulana; Lala, Andi; Helwani, Zuchra; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.306

Abstract

Machine learning models in drug discovery often depend on high-dimensional molecular descriptors, many of which may be redundant or irrelevant. Reducing these descriptors is essential for improving model performance, interpretability, and computational efficiency. This study compares two widely used reduction strategies: SHAP-based feature selection and autoencoder-based compression, within the context of Quantitative Structure-Activity Relationship (QSAR) classification. LightGBM is used as a consistent modeling framework to evaluate models trained on all descriptors, the top 50 and 100 SHAP-ranked descriptors, and a 64-dimensional autoencoder embedding. The results show that SHAP-based selection produces interpretable and stable models with minimal performance loss, particularly when using the top 100 descriptors. In contrast, the autoencoder achieves the highest test performance by capturing nonlinear patterns in a compact, low-dimensional representation, although this comes at the cost of interpretability and consistency across data splits. These findings reflect the differing inductive biases of each method. SHAP prioritizes sparsity and attribution, while autoencoders focus on reconstruction and continuity. The analysis emphasizes that descriptor reduction strategies are not interchangeable. SHAP-based selection is suitable for applications where interpretability and reliability are essential, such as in hypothesis-driven or regulatory settings. Autoencoders are more appropriate for performance-driven tasks, including virtual screening. The choice of reduction strategy should be guided not only by performance metrics but also by the specific modeling requirements and assumptions relevant to cheminformatics workflows.
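
Selecting the "top 50 and 100 SHAP-ranked descriptors" implies ranking descriptors by some summary of their attributions. The abstract does not state the exact ranking rule, so the sketch below assumes the common convention of mean absolute SHAP value per descriptor. It takes a precomputed attribution matrix as input and does not call the SHAP library; all names are hypothetical.

```python
def top_k_by_mean_abs_attribution(
    attributions: list[list[float]], feature_names: list[str], k: int
) -> list[str]:
    """Rank features by mean |attribution| over samples and return the top k names.

    attributions: one row per sample, one column per feature (e.g. SHAP values).
    """
    if not attributions:
        raise ValueError("attributions must contain at least one sample")
    if any(len(row) != len(feature_names) for row in attributions):
        raise ValueError("each row must have one value per feature")

    n_samples = len(attributions)
    importance = [
        sum(abs(row[j]) for row in attributions) / n_samples
        for j in range(len(feature_names))
    ]
    # Sort feature indices by descending importance; sorted() is stable, so ties keep input order.
    order = sorted(range(len(feature_names)), key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in order[:k]]
```

The selected names can then be used to subset the descriptor table before retraining the model. Unlike an autoencoder embedding, each retained column still corresponds to a named descriptor.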
Comparison of Spatial Interpolation Methods: Inverse Distance Weighted and Kriging for Earthquake Intensity Mapping in Aceh, Indonesia Rahayu, Latifah; Utami, Cut Chairilla Yolanda; Fauzi, Rahmatul; Sasmita, Novi Reandy
Infolitika Journal of Data Science Vol. 3 No. 2 (2025): November 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i2.347

Abstract

Aceh Province, located in the Sumatra megathrust zone of Indonesia, is one of the most seismically active regions in Southeast Asia. Understanding the spatial distribution of earthquake magnitudes is essential for disaster mitigation and risk management. This study compares two spatial interpolation methods, Inverse Distance Weighted (IDW) and Kriging, to determine the most accurate approach for mapping earthquake intensity in Aceh Province. A total of 2,255 earthquake events with magnitudes of 2.5 M and above, recorded between 1990 and 2024 by the United States Geological Survey (USGS), were analyzed. IDW was tested using five power parameters (p = 1–5), while Kriging applied three semivariogram models (spherical, exponential, and Gaussian). The interpolation accuracy was assessed through Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE). Results indicated that Kriging with the exponential semivariogram achieved the highest accuracy, with RMSE = 0.0848, MSE = 0.0072, and MAPE = 1.14%, outperforming IDW (RMSE = 0.2288, MSE = 0.0523, MAPE = 1.24%). The Kriging model effectively represented the gradual spatial decay of seismic energy, identifying Aceh Singkil and northern Simeulue as the most earthquake-prone zones, consistent with regional tectonic patterns. These findings confirm that incorporating spatial autocorrelation enhances interpolation accuracy and geophysical interpretation. The study establishes Kriging as a reliable tool for seismic hazard mapping and provides valuable insights for disaster preparedness, infrastructure planning, and future geostatistical applications in earthquake risk assessment.
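
As a rough illustration of the IDW side of the comparison, the sketch below estimates a value at a target point as a weighted average of sampled values, with weights 1 / d^p for the power parameter p mentioned above. It assumes planar Euclidean distance on projected coordinates, which may differ from the study's setup. The function name and the sample layout are hypothetical.

```python
import math


def idw_interpolate(
    samples: list[tuple[float, float, float]],
    target: tuple[float, float],
    power: float = 2.0,
) -> float:
    """Inverse Distance Weighted estimate at `target`.

    samples: (x, y, value) triples, e.g. epicenter coordinates and magnitude.
    power:   exponent p; larger values give nearby samples more influence.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            # The target coincides with a sample, so return the observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Unlike Kriging, this estimator uses no fitted semivariogram: the weights depend only on distance and on p, so it does not model spatial autocorrelation.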
A Convolutional Neural Network Model for Mushroom Toxicity Recognition Irvanizam, Irvanizam; Subianto, Muhammad; Jamil, Muhammad Salsabila
Infolitika Journal of Data Science Vol. 3 No. 2 (2025): November 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i2.359

Abstract

Mushroom poisoning remains a public health concern, often caused by misidentifying toxic species that visually resemble edible ones. This study investigates the feasibility of using a Convolutional Neural Network (CNN) to classify five mushroom species, Amanita caesarea, Amanita phalloides, Cantharellus cibarius, Omphalotus olearius, and Volvariella volvacea, into toxic and non-toxic categories based on image data. A dataset of 137 images was collected and preprocessed through resizing, normalization, and data augmentation. A modified AlexNet-based CNN was trained and evaluated using accuracy, precision, recall, and F1-score. The best-performing model achieved a validation accuracy of 0.40, indicating limited discriminative capability. These findings highlight that the dataset size is insufficient for training a CNN from scratch and that the model cannot reliably distinguish species with subtle morphological differences. The study concludes that larger datasets, improved image quality, and transfer learning approaches are essential for achieving practical and deployable mushroom classification performance.
Assessing the Performance of Ensemble and Regularized Models for Daily Rainfall Forecasting in Singapore Musliadi, Musliadi; Zulkarnaini, Muhammad; Musaffa, Asalul; Yolanda, Yolanda
Infolitika Journal of Data Science Vol. 3 No. 2 (2025): November 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i2.360

Abstract

This study benchmarks ensemble and regularized machine learning models for daily rainfall forecasting using meteorological data from forty-four observation stations across Singapore. The country’s highly variable tropical climate and frequent short-duration rainfall events pose major challenges for urban flood mitigation and operational forecasting. To address this, three algorithms—Lasso Regression, XGBoost Regression, and Gradient Boosting Regression—were developed and evaluated through a systematic comparison of predictive performance. Each model was trained using data from 1980–2023 and validated on independent observations from 2024–2025. The input variables included sub-hourly rainfall intensity, temperature, and wind-related parameters processed through a standardized data-cleaning and imputation pipeline. Results show that XGBoost achieved the most consistent and accurate predictions, with superior performance under both normal and heavy rainfall conditions. Statistical tests confirmed that the improvement was significant compared to Lasso and Gradient Boosting. These findings demonstrate the effectiveness of ensemble-based approaches for enhancing the reliability of data-driven rainfall forecasting in tropical urban environments and support their integration into early warning and hydrological risk management systems.
Enhanced Thyroid Disorder Classification Through XGBoost-Based Machine Learning Techniques Maulana, Aga
Infolitika Journal of Data Science Vol. 3 No. 2 (2025): November 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i2.361

Abstract

Thyroid disorders are common endocrine conditions whose diagnosis often requires integrating multiple clinical and laboratory indicators. This study proposes a machine learning framework for multiclass classification of thyroid diseases using XGBoost combined with an automated preprocessing and feature-engineering pipeline. A dataset of 9,167 patient records and 30 clinical and biochemical features was processed using a structured pipeline that included imputation, encoding, scaling, and hyperparameter optimization with RandomizedSearchCV and GridSearchCV. The optimized XGBoost model achieved 95.20% test accuracy, a high weighted F1-score (0.94), and consistent cross-validated performance. Classification results showed excellent discrimination for major thyroid conditions and reliable identification of healthy individuals. Feature importance analysis revealed that TBG-related measurements, thyroxine therapy status, and key hormone indices (TSH, TT4, FTI) were the most influential predictors. Overall, the findings demonstrate that the proposed XGBoost-based framework provides accurate and robust support for multiclass thyroid disease diagnosis and can serve as a practical foundation for clinical decision-support applications.
An Interpretable Machine Learning Framework for Predicting Advanced Tumor Stages Noviandy, Teuku Rizky; Patwekar, Mohsina; Patwekar, Faheem; Idroes, Rinaldi
Infolitika Journal of Data Science Vol. 3 No. 2 (2025): November 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i2.364

Abstract

Accurate identification of advanced tumor stages is essential for timely clinical decision-making and personalized treatment planning. This study proposes an explainable ensemble learning framework for predicting advanced tumor stage using a dataset containing 10,000 samples with 18 clinical and radiological features. Four machine learning models, namely Logistic Regression, Naïve Bayes, AdaBoost, and LightGBM, were evaluated using stratified train–test splits along with standard performance metrics. LightGBM achieved the highest performance, with an accuracy of 86.05% and an F1-score of 76.61%, outperforming linear and probabilistic classifiers. ROC–AUC and precision–recall analyses further confirmed the superior discriminative ability of ensemble methods. SHAP explainability techniques highlighted mitotic count, Ki-67 index, enhancement, and necrosis as the most influential predictors of advanced stage. The proposed framework demonstrates strong predictive capability and provides clinically interpretable insights, underscoring its potential as a decision-support tool in oncological diagnostics. Future work will involve external validation and integration of additional multimodal data to enhance generalizability.
