Garuda - Garba Rujukan Digital

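Perbaikan Akurasi Naïve Bayes dengan Chi-Square dan SMOTE Dalam Mengatasi High Dimensional dan Imbalanced Data Banjir [Improving Naïve Bayes Accuracy with Chi-Square and SMOTE to Handle High-Dimensional and Imbalanced Flood Data]
Rivaldo, Vito Junivan; Siswa, Taghfirul Azhima Yoga; Pranoto, Wawan Joko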
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): Juli 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7886

Floods are among the most frequent natural disasters in Indonesia. The city of Samarinda is affected by floods every year, resulting in significant losses. The data used in this study come from the Regional Disaster Management Agency (BPBD) and the Meteorology, Climatology, and Geophysics Agency (BMKG) for Samarinda, covering 2021-2023. The dataset has 11 attributes and 1,095 records. Data mining studies on floods have been conducted before, but high-dimensional and imbalanced data remain problematic: high dimensionality leads to overfitting and reduced accuracy, while class imbalance biases the model toward the majority class and misrepresents the minority class. This study aims to improve the accuracy of the Naive Bayes algorithm on high-dimensional, imbalanced flood data. The approach combines Chi-Square feature selection, used to find the features most relevant to flood prediction, with oversampling by the Synthetic Minority Over-sampling Technique (SMOTE). Validation uses 10-fold cross-validation, and accuracy is calculated from a confusion matrix. The results show that Chi-Square identifies four best features: average humidity (rh_avg), rainfall (rr), maximum wind direction (ddd_x), and most frequent wind direction (ddd_car). Naive Bayes with SMOTE achieved an accuracy of 71.58%. After applying Chi-Square feature selection, however, accuracy dropped to 60.82%. This decline is attributed to the reduced minority-class representation after feature selection. The study therefore concludes that Chi-Square feature selection is not sufficiently effective in improving the accuracy of Naive Bayes on high-dimensional data.
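As a rough illustration of the oversampling step named in the abstract above, the sketch below generates SMOTE-style synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and computes accuracy from confusion-matrix counts. It is a minimal standard-library sketch, not the authors' implementation: the function names, the sample values, and the choice of k are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def accuracy(tp, tn, fp, fn):
    """Accuracy from the four cells of a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical minority-class rows (rh_avg, rr); the values are invented.
flood_days = [(88.0, 42.5), (91.0, 60.1), (85.5, 38.0), (90.2, 55.3)]
new_rows = smote_sketch(flood_days, n_new=6)
```

In practice oversampling is normally applied only to the training folds inside cross-validation, so that synthetic points never leak into the held-out fold; whether the study did this is not stated in the abstract.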

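Optimasi Random Forest dengan Genetic Algorithm dan Recursive Feature Elimination pada High Dimensional Data Stunting Samarinda [Optimizing Random Forest with a Genetic Algorithm and Recursive Feature Elimination on High-Dimensional Stunting Data from Samarinda]
Satria, Bima; Siswa, Taghfirul Azhima Yoga; Pranoto, Wawan Joko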
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): Juli 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7883

Stunting is a chronic malnutrition problem that disrupts children's growth, with long-term effects on physical growth, cognitive development, and productivity in adulthood. In Indonesia, the prevalence of stunting remains above the WHO threshold, at 24.4% according to the 2021 Indonesian Nutritional Status Study (SSGI); in Samarinda City it reached 24.7% in 2021, with 1,402 toddlers identified as stunted. Addressing this problem requires a more structured, data-driven approach that supports targeted interventions. This study uses data from the Samarinda City Health Office, encompassing 150,474 stunting data points, and involves data collection, data cleaning, feature selection, and application of a classification model. It aims to improve the accuracy of stunting data classification in Samarinda City in 2023 using the Random Forest algorithm, with Recursive Feature Elimination (RFE) for feature selection and a Genetic Algorithm (GA) for optimization. The RFE results show that the most influential features are Weight, ZS TB/U, ZS BB/U, and BB/U. Applying RFE increased the model's average accuracy from 91.91% to 93.64%, and GA optimization further increased it to 98.39%. The definite accuracy increased from 94.23% (baseline model) to 97.10% (with RFE) and reached 99.70% (with RFE and GA). The combination of RFE and GA proved effective in handling data complexity and improving the reliability of stunting predictions. The study contributes to machine learning techniques for high-dimensional data analysis in health and is expected to serve as a foundation for more effective intervention programs addressing stunting in Indonesia.
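The abstract above does not say what the Genetic Algorithm optimizes or how it is configured, so the following is only a generic sketch of GA-based tuning under stated assumptions: the two tuned parameters (named n_estimators and max_depth here), their ranges, the population settings, and above all toy_fitness, a made-up stand-in for cross-validated Random Forest accuracy, are hypothetical.

```python
import random

# Hypothetical search space: (n_estimators, max_depth); ranges are assumptions.
BOUNDS = [(10, 300), (2, 30)]


def toy_fitness(genes):
    """Stand-in for cross-validated accuracy; this toy surface peaks at (150, 12)."""
    n_estimators, max_depth = genes
    return 1.0 - ((n_estimators - 150) / 300) ** 2 - ((max_depth - 12) / 30) ** 2


def random_individual(rng):
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]


def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from one parent at random.
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(genes, rng, rate=0.2):
    # With probability `rate`, replace a gene by a fresh value within its bounds.
    return [rng.randint(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(genes, BOUNDS)]


def run_ga(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), history


best, history = run_ga(toy_fitness)
```

In a real setting the fitness function would train and cross-validate a Random Forest on the RFE-selected features for each candidate, which is far more expensive than this toy surface; elitism is used here so that the best score never decreases between generations.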


