Decision Tree versus k-NN: A Performance Comparison for Air Quality Classification in Indonesia
Sasmita, Novi Reandy; Ramadeska, Siti; Kesuma, Zurnila Marli; Noviandy, Teuku Rizky; Maulana, Aga; Khairul, Mhd; Suhendra, Rivansyah
Infolitika Journal of Data Science Vol. 2 No. 1 (2024): May 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i1.179

Abstract

Air quality affects human health, the environment, and the sustainability of ecosystems, so it needs to be monitored and controlled. The Plume Air Quality Index (PAQI) is one index for measuring and categorizing air quality levels. Determining the air quality level accurately depends on correct classification. Some previous studies have classified air quality with the decision tree and K-Nearest Neighbor (k-NN) methods but evaluated the results using accuracy alone. This study therefore applies both methods and evaluates the classification not only with accuracy but also with precision, recall, and F1-score. It uses secondary data from Plume Labs: pollutant concentrations and PAQI categories based on particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), and ozone (O3) for 33 provincial capitals in Indonesia from July 1 to December 31, 2022. Comparing the two methods shows that the decision tree scores higher than k-NN. The decision tree's accuracy, precision, recall, and F1-score are 90.67%, 90.61%, 90.67%, and 90.63%, respectively. The decision tree therefore classifies PAQI categories better than k-NN, with better overall evaluation metric values.
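
As an illustration of the four metrics named in this abstract, the sketch below computes accuracy together with support-weighted precision, recall, and F1-score for a multi-class problem. The labels are invented for illustration and are not the study's data. Weighted averaging is an assumption: the abstract does not say which averaging scheme was used, although a recall equal to accuracy (both 90.67%) is what weighted averaging produces.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # class share of the true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical PAQI-style categories, invented for illustration only.
y_true = ["good", "good", "moderate", "moderate", "poor", "poor"]
y_pred = ["good", "moderate", "moderate", "moderate", "poor", "good"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

Reporting all four values side by side, as the study does, exposes cases where a classifier reaches high accuracy by favoring the most frequent category.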

Classifying Beta-Secretase 1 Inhibitor Activity for Alzheimer’s Drug Discovery with LightGBM
Noviandy, Teuku Rizky; Nisa, Khairun; Idroes, Ghalieb Mutig; Hardi, Irsan; Sasmita, Novi Reandy
Journal of Computing Theories and Applications Vol. 1 No. 4 (2024): JCTA 1(4) 2024
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.10129

Abstract

This study explores the use of LightGBM, a gradient-boosting framework, to classify the inhibitory activity of beta-secretase 1 inhibitors, addressing the challenges of Alzheimer's disease drug discovery. The study aims to enhance classification performance by overcoming the limitations of traditional statistical models and conventional machine-learning techniques in handling complex molecular datasets. Using a dataset of 7298 compounds from the ChEMBL database, with molecular descriptors calculated for each compound as features, we employed LightGBM with a carefully selected set of molecular descriptors to analyze compound activities. The model's performance was benchmarked against traditional machine-learning algorithms, revealing LightGBM's superior accuracy (84.93%), precision (87.14%), sensitivity (89.93%), specificity (77.63%), and F1-score (88.17%) in classifying beta-secretase 1 inhibitor activity. The study underscores the critical role of molecular descriptors in understanding drug efficacy and highlights LightGBM's potential for streamlining the virtual screening process. In conclusion, the findings support LightGBM's adoption in computational drug discovery, offering a promising avenue for advancing Alzheimer's disease therapeutic development by helping to identify potential drug candidates with greater precision and reliability.
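
Sensitivity and specificity, reported here alongside accuracy, precision, and F1-score, all follow from the four cells of a binary confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical and are not taken from the paper, and treating "active" as the positive class is an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive the five reported metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of active compounds correctly flagged
    specificity = tn / (tn + fp)  # share of inactive compounds correctly rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, invented for illustration only.
m = binary_metrics(tp=90, fp=10, tn=40, fn=10)
print(m)
```

A gap between sensitivity and specificity, like the one in the abstract, indicates that the model identifies one class more reliably than the other.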

Spatial Estimation for Tuberculosis Relative Risk in Aceh Province, Indonesia: A Bayesian Conditional Autoregressive Approach with the Besag-York-Mollie (BYM) Model
Sasmita, Novi Reandy; Arifin, Mauzatul; Kesuma, Zurnila Marli; Rahayu, Latifah; Mardalena, Selvi; Kruba, Rumaisa
Journal of Applied Data Sciences Vol 5, No 2: MAY 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i2.185

Abstract

Tuberculosis (TB) remains a significant public health challenge globally, with Indonesia having the second-highest number of TB cases worldwide. Aceh Province has one of the highest TB incidence rates in Indonesia. This study aims to estimate and map the spatial distribution of TB relative risk across districts in Aceh Province, Indonesia, to reveal significant variations. The study employed an ecological time-series design, using the Bayesian Conditional Autoregressive (CAR) approach with the Besag-York-Mollie (BYM) model for spatial estimation and mapping of TB relative risk. TB case data and population data for 23 districts/cities in Aceh Province from 2016 to 2022 were analyzed. Spatial analysis was used to estimate and map TB's relative risk, helping to identify areas with higher transmission risk. The results showed that the relative risk of TB varied across districts/cities in Aceh Province over the study period, but Lhokseumawe and Banda Aceh consistently exhibited high to very high relative risks. In 2022, Lhokseumawe City and Banda Aceh City had the highest relative risks, at 2.26 and 2.17, respectively, while Sabang City and Bener Meriah District had the lowest, at 0.43 and 0.32, respectively. This study provides valuable insights into the heterogeneous landscape of TB risk in Aceh Province, which can inform targeted interventions and planning strategies for effective TB control. The Bayesian CAR BYM model proved effective in estimating and mapping TB's relative risk, highlighting areas that require prioritized attention in TB prevention and control efforts.
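
For reference, the BYM model is commonly written as follows. This is the standard textbook form, not a specification taken from the paper: the abstract does not state the priors or hyperparameters used, so every symbol here is an assumption. O_i and E_i are the observed and expected case counts in district i, θ_i is the relative risk, u_i is the spatially structured effect, v_i is the unstructured effect, j ~ i denotes the neighbours of district i, and n_i is their number.

```latex
\begin{aligned}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i \theta_i), \\
\log \theta_i &= \alpha + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right), \\
v_i &\sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

The structured term u_i pulls each district's estimate toward the average of its neighbours, which stabilizes relative risks in districts with small populations.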

Spatial Estimation of Relative Risk for Dengue Fever in Aceh Province using Conditional Autoregressive Method
Rahayu, Latifah; Sasmita, Novi Reandy; Adila, Wulan Farisa; Kesuma, Zurnila Marli; Kruba, Rumaisa
Journal of Applied Data Sciences Vol 4, No 4: DECEMBER 2023
Publisher : Bright Publisher

DOI: 10.47738/jads.v4i4.141

Abstract

Dengue Hemorrhagic Fever (DHF) is a dangerous infectious disease that can be fatal. It is transmitted by the Aedes aegypti mosquito. Dengue cases have been reported in 449 districts/cities across 34 provinces, with deaths in 162 districts/cities across 31 provinces, including Aceh Province. Within Aceh Province, some districts and cities have many cases and a large population at risk, while others have fewer cases and a smaller population at risk, so the number of cases and the population at risk of DHF vary. It is therefore important to identify which districts and cities have a high risk of DHF for planning purposes. This study uses secondary data from the Aceh Provincial Health Profile from 2016 to 2022. The approach used is the Bayesian Conditional Autoregressive (CAR) model with the Besag-York-Mollie (BYM) prior. The results showed that dengue mortality in Aceh Province between 2016 and 2022 was highest in 2016 and 2022. The relative risk estimates for DHF cases obtained with the Bayesian CAR BYM model span all risk categories: some districts/cities have high relative risk values, while others have low values. Sabang City had the highest relative risk, at 3.54, and Bener Meriah District had the lowest, at 0.2.

Forecasting Upwelling Phenomena in Lake Laut Tawar: A Semi-Supervised Learning Approach
Ulhaq, Muhammad Zia; Farid, Muhammad; Aziza, Zahra Ifma; Nuzullah, Teuku Muhammad Faiz; Syakir, Fakhrus; Sasmita, Novi Reandy
Infolitika Journal of Data Science Vol. 2 No. 2 (2024): November 2024
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v2i2.211

Abstract

Climate change is causing upwelling to occur more frequently in lakes and reservoirs. When it occurs, thousands of fish die and floating net cage fish farmers suffer losses. Existing studies use temperature sensors to determine whether a body of water is currently experiencing upwelling. This study instead applies clustering to historical climate data from 2017 to 2023 in a semi-supervised learning approach that produces two labels: "potential for upwelling" and "no potential for upwelling." The data are divided into two clusters using K-Means Clustering, and a Support Vector Machine (SVM) is then used to classify them. The proposed algorithm achieves accuracy, precision, recall, and F1-score values of 0.99, 0.995, 0.970, and 0.985, respectively, which shows that the model performs very well in identifying upwelling potential. With this method, information about upwelling potential can be obtained more quickly and accurately, allowing fish farmers to take appropriate preventive measures. The study also shows that the combination of K-Means Clustering and SVM can be used effectively to analyze historical climate data and generate useful predictions.
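
The first stage of the approach described above, dividing unlabeled records into two clusters whose memberships then serve as labels, can be sketched in plain Python. This is a minimal illustration of the clustering step only, and the SVM stage is not shown. The two features (water temperature and wind speed) and all values are hypothetical, the seeding rule is a simplification chosen to keep the example deterministic, and which cluster corresponds to "potential for upwelling" is not something the abstract specifies.

```python
def kmeans_two_clusters(points, iterations=20):
    """Lloyd's algorithm with k=2 and a simple deterministic seeding."""
    # Assumption: seed with the points of smallest and largest coordinate sum.
    centroids = [min(points, key=sum), max(points, key=sum)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min((0, 1), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (water temperature in degrees C, wind speed in m/s) records.
points = [(18.0, 2.0), (18.5, 2.5), (19.0, 2.2), (24.0, 6.0), (24.5, 6.5), (25.0, 5.8)]
labels, centroids = kmeans_two_clusters(points)
print(labels)
```

The resulting cluster labels can then be used as training targets for a supervised classifier, which is the role the SVM plays in the study.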

Optimizing Long-Term Meteorological Data Completeness in North Aceh, Indonesia: A Comparative Analysis of Interpolation Methods
Sasmita, Novi Reandy; Saragih, Novita Sari; Rahayu, Latifah; Malfirah, Malfirah
JTAM (Jurnal Teori dan Aplikasi Matematika) Vol 9, No 1 (2025): January
Publisher : Universitas Muhammadiyah Mataram

DOI: 10.31764/jtam.v9i1.27929

Abstract

Meteorological records need to be more complete to ensure the accuracy of meteorological modeling, particularly in long-term datasets. This study aims to identify the most effective interpolation method for addressing missing data in North Aceh's meteorological dataset from 2010 to 2023, with a focus on the accuracy of methods applied across various meteorological variables. The study analyzed data from North Aceh Regency, Indonesia, comprising 25,565 daily observations of temperature, humidity, rainfall, sunshine duration, and wind speed. Missing values were interpolated using three methods: spline, Stineman, and moving average interpolation. Performance was evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Squared Logarithmic Error (MSLE) at 10%, 20%, and 30% levels of simulated missing data. All analyses in this study were carried out using R version 4.4.2. While spline interpolation performed reasonably well, it showed increased variability, especially for high-variance variables such as rainfall. Moving average interpolation was less reliable, with error rates increasing as the level of missing data rose. In contrast, Stineman interpolation consistently achieved the lowest error metrics across all levels of missing data, with MAE ranging from 0.219 to 0.6691, MSLE from 0.035 to 0.109, and RMSE from 1.247 to 2.245, demonstrating superior robustness. Stineman interpolation offers a highly effective approach for managing missing meteorological data in North Aceh’s long-term dataset, enhancing data reliability for meteorological modeling and decision-making in meteorologically sensitive sectors. This study provides practical recommendations for selecting optimal interpolation techniques, especially in regions with variable meteorological data quality.
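
The evaluation design described here (hide a share of known values, fill them in, and score the estimates against the hidden truth) can be sketched as below. The study worked in R with spline, Stineman, and moving average interpolation. This sketch is in Python and substitutes plain linear interpolation purely to keep the example short, so it is not the authors' method. The temperature series and the positions of the hidden values are invented.

```python
import math


def linear_fill(values):
    """Fill None gaps by linear interpolation between the nearest known values.

    Assumes at least one known value; edge gaps copy the nearest known value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def error_metrics(actual, estimated):
    """Return (MAE, RMSE, MSLE) for paired non-negative values."""
    n = len(actual)
    pairs = list(zip(actual, estimated))
    mae = sum(abs(a - e) for a, e in pairs) / n
    rmse = math.sqrt(sum((a - e) ** 2 for a, e in pairs) / n)
    msle = sum((math.log1p(a) - math.log1p(e)) ** 2 for a, e in pairs) / n
    return mae, rmse, msle


# Hypothetical daily temperatures (degrees C); 3 of 10 values hidden (30%).
truth = [26.0, 27.0, 29.0, 28.0, 27.5, 26.5, 27.0, 28.0, 29.5, 30.0]
hidden = [2, 5, 8]
masked = [None if i in hidden else v for i, v in enumerate(truth)]
filled = linear_fill(masked)
mae, rmse, msle = error_metrics([truth[i] for i in hidden], [filled[i] for i in hidden])
print(mae, rmse, msle)
```

Scoring only the hidden positions matters: including the untouched values would dilute the errors and make every method look better than it is.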

Optimizing Energy Consumption Prediction Across the IMT-GT Region Through PCA-Based Modeling
Farid, Muhammad; Nuzullah, Teuku Muhammad Faiz; Aklya, Zatul; Nazila, Syifa; Ulhaq, Muhammad Zia; Apriliansyah, Feby; Sasmita, Novi Reandy
Infolitika Journal of Data Science Vol. 3 No. 1 (2025): May 2025
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v3i1.286

Abstract

This study aims to improve the accuracy of energy consumption prediction in the Indonesia-Malaysia-Thailand Growth Triangle (IMT-GT) region by addressing multicollinearity among independent variables such as energy production (Mtoe), lignite coal production (million tons), crude oil production (million tons), refined oil production (million tons), natural gas production (billion cubic meters), and electricity production (terawatt-hours). By integrating Principal Component Analysis (PCA) with Random Forest (RF), six correlated variables were reduced into two uncorrelated principal components (PC1 and PC2), explaining 80.77% of the data variance. The PCA-RF hybrid model outperformed the standalone Random Forest (RF) model, with an increase in the coefficient of determination (R²) from 0.976 to 0.993. Additionally, it achieved significant reductions in error metrics, with the mean absolute error (MAE) decreasing from 5.811 to 4.169 and the root mean square error (RMSE) dropping from 9.278 to 4.786. These results demonstrate PCA’s effectiveness in isolating dominant drivers such as energy and lignite coal production while improving model stability. The framework provides policymakers with a reliable tool to forecast energy demand and align economic growth with sustainability in fossil fuel-dependent economies.
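
The 80.77% figure is the share of total variance carried by the two retained components. As a sketch in standard PCA notation, with λ_k denoting the eigenvalues of the covariance (or correlation) matrix of the six predictors (these symbols are assumed here and do not appear in the abstract):

```latex
\frac{\lambda_1 + \lambda_2}{\sum_{k=1}^{6} \lambda_k} = 0.8077,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_6 \ge 0 .
```

Because principal components are mutually uncorrelated by construction, feeding PC1 and PC2 to the Random Forest removes the multicollinearity among the original six variables.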

Relative Risk and Distribution Assessment of Tuberculosis Cases: A Time-Series Ecological Study in Aceh, Indonesia
Sasmita, Novi Reandy; Khairul, Mhd; Fikri, Mumtaz Kemal; Rahayu, Latifa; Kesuma, Zurnila Marli; Mardalena, Selvi; Kruba, Rumaisa; Chongsuvivatwong, Virasakdi; Asshiddiqi, M. Ischaq Nabil
Media Publikasi Promosi Kesehatan Indonesia (MPPKI) Vol. 8 No. 6: JUNE 2025
Publisher : Fakultas Kesehatan Masyarakat, Universitas Muhammadiyah Palu

DOI: 10.56338/mppki.v8i6.7264

Abstract

Introduction: Tuberculosis (TB) remains a critical public health issue, particularly in high-incidence regions like Aceh Province, Indonesia. This study aimed to estimate the Relative Risk (RR) and analyze significant differences in the temporal distribution of TB cases across Aceh Province. Methods: A time-series ecological study was conducted using TB case and population data from 23 districts/cities in Aceh Province between 2016 and 2022. Data were analyzed in R using descriptive and inferential statistics. RR was estimated with the Standardized Morbidity Ratio (SMR) method and categorized into five risk levels. The Kolmogorov-Smirnov test assessed data normality, guiding the selection of statistical tests. The Friedman and Wilcoxon Signed-Rank tests examined differences in TB case distribution trends. Results: Significant spatial and temporal variations in TB risk were identified. Districts such as Banda Aceh (RR = 2.29–2.13) and Lhokseumawe (RR = 1.89–2.21) consistently showed high RR from 2016 to 2022, reflecting persistent TB transmission. A general upward trend in TB cases was observed across districts, with significant spatial variation (p < 0.001), highlighting a worsening TB burden. Conclusions: The study emphasizes the urgent need for targeted public health interventions tailored to TB's unique spatial and temporal dynamics in Aceh Province, Indonesia. Applying SMR and robust statistical analyses provides valuable insights to inform localized TB control policies and strengthen management strategies in high-burden areas.
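
The SMR estimate of relative risk divides each district's observed cases by the number expected if the district shared the overall rate of the whole region. A minimal sketch follows, with invented district names and counts. It uses the common indirect-standardization form without age adjustment, which is an assumption because the abstract does not spell out how expected counts were obtained. The five risk-level cut-offs are not given in the abstract and are therefore not reproduced.

```python
def smr_relative_risk(cases, population):
    """Return observed / expected cases per district (SMR)."""
    overall_rate = sum(cases.values()) / sum(population.values())
    return {
        district: cases[district] / (population[district] * overall_rate)
        for district in cases
    }


# Hypothetical districts and counts, invented for illustration only.
cases = {"A": 300, "B": 100, "C": 100}
population = {"A": 100_000, "B": 100_000, "C": 50_000}
rr = smr_relative_risk(cases, population)
print(rr)
```

An RR above 1 means a district has more cases than its population share would predict, and a value below 1 means fewer.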