Garuda - Garba Rujukan Digital

Perbaikan Akurasi Naïve Bayes dengan Chi-Square dan SMOTE Dalam Mengatasi High Dimensional dan Imbalanced Data Banjir [Improving Naïve Bayes Accuracy with Chi-Square and SMOTE to Handle High-Dimensional and Imbalanced Flood Data]
Rivaldo, Vito Junivan; Siswa, Taghfirul Azhima Yoga; Pranoto, Wawan Joko
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): July 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7886

Floods are one of the natural disasters that frequently occur in Indonesia. The city of Samarinda is affected by floods every year, resulting in significant losses. The data used in this study comes from the Regional Disaster Management Agency (BPBD) and the Meteorology, Climatology, and Geophysics Agency (BMKG) for the years 2021-2023 in Samarinda. This data includes 11 attributes and 1095 records. Previous studies on data mining related to floods have been conducted. However, issues arise with high-dimensional data and data imbalance. High dimensionality leads to overfitting and reduced accuracy, while imbalanced data causes overfitting to the majority class and inaccurate representation. This study aims to improve the accuracy of the Naive Bayes algorithm in predicting high-dimensional and imbalanced flood data. The approach involves using the Chi-Square feature selection technique and oversampling with the Synthetic Minority Over-sampling Technique (SMOTE). Chi-Square is used to find optimal features for predicting floods and to enhance the accuracy of the Naive Bayes algorithm in predicting high-dimensional and imbalanced flood data. The validation method used is 10-fold cross-validation, and a confusion matrix model is employed to calculate accuracy values. The results of the study show that Chi-Square can identify four best features: average humidity (rh_avg), rainfall (rr), maximum wind direction (ddd_x), and most frequent wind direction (ddd_car). The use of the Naive Bayes algorithm with SMOTE achieved an accuracy of 71.58%. However, after applying Chi-Square feature selection, the accuracy dropped to 60.82%. This decline is attributed to the reduced number of minority classes after feature selection. Therefore, Chi-Square feature selection is not sufficiently effective in improving the accuracy of Naive Bayes on high-dimensional data.
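
The Chi-Square ranking step described above can be illustrated with a minimal sketch. It assumes each feature has already been discretized into categories, and the toy data and the names `rain`, `noise`, and `chi_square_score` are hypothetical, not taken from the study.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature against the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

# Hypothetical toy data (not from the study): binned rainfall vs. flood label.
rain = ["high", "high", "low", "low", "high", "low", "low", "high"]
noise = ["a", "b", "a", "b", "a", "b", "a", "b"]
flood = [1, 1, 0, 0, 1, 0, 0, 0]

scores = {"rain": chi_square_score(rain, flood),
          "noise": chi_square_score(noise, flood)}
# Features with a higher score depend more strongly on the label.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Keeping only the top-ranked features is the selection step. As the abstract reports, a reduced feature set does not guarantee higher accuracy.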

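Optimasi Random Forest dengan Genetic Algorithm dan Recursive Feature Elimination pada High Dimensional Data Stunting Samarinda [Optimizing Random Forest with Genetic Algorithm and Recursive Feature Elimination on High-Dimensional Stunting Data from Samarinda]
Satria, Bima; Siswa, Taghfirul Azhima Yoga; Pranoto, Wawan Joko
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): July 2024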
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7883

Stunting is a chronic malnutrition problem that disrupts children's growth, with long-term impacts on physical growth, cognitive development, and productivity in adulthood. In Indonesia, the prevalence of stunting is still above the WHO threshold, reaching 24.4% according to the 2021 Indonesian Nutritional Status Study (SSGI), and in Samarinda City, the prevalence reached 24.7% in 2021 with 1,402 toddlers identified as stunted. Addressing this problem requires a more structured data-driven approach to provide targeted interventions. This study uses data from the Samarinda City Health Office, encompassing 150,474 stunting data points, and involves data collection, data cleaning, feature selection, and classification model application. This study aims to improve the accuracy of stunting data classification in Samarinda City in 2023 using the Random Forest algorithm enhanced with Recursive Feature Elimination (RFE) feature selection techniques and Genetic Algorithm (GA) optimization. The feature selection results using RFE show that the most influential features are Weight, ZS TB/U, ZS BB/U, and BB/U. The application of RFE increased the model's average accuracy from 91.91% to 93.64%, while GA optimization further increased the average accuracy to 98.39%. The definite accuracy increased from 94.23% (baseline model) to 97.10% (with RFE) and reached 99.70% (with RFE and GA). The combination of RFE and GA has proven effective in tackling data complexity and improving the reliability of stunting predictions. This study significantly contributes to the development of machine learning techniques for high-dimensional data analysis in health and is expected to be the foundation for more effective intervention programs in addressing stunting issues in Indonesia.

IMPLEMENTASI METODE NAIVE BAYES UNTUK KLASIFIKASI KECELAKAAN LALU LINTAS DI KOTA SAMARINDA [Implementation of the Naive Bayes Method for Classifying Traffic Accidents in Samarinda City]
Salsabila, Cindy Azra; Yulianto, Fendy; Siswa, Taghfirul Azhima Yoga
Jurnal Informatika dan Teknik Elektro Terapan Vol 13, No 1 (2025)
Publisher : Universitas Lampung

DOI: 10.23960/jitet.v13i1.5890

Traffic accidents are a serious problem in Samarinda City, influenced by factors such as lighting conditions, weather, road class, road type, road surface condition, road gradient, the speed limit at the location, and road status. The problem of determining the type of traffic accident can be approached as a classification task using the Naive Bayes method. The data were split into training and testing sets at an 80:20 ratio and validated using K-Fold Cross Validation with K=12, yielding an accuracy of 84%. This result shows that the Naive Bayes method can be used to determine the type of traffic accidents occurring in Samarinda City.
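
As an illustration of the K-Fold Cross Validation step with K=12, here is a minimal sketch of how the folds can be formed. The sample count of 100 and the function name `k_fold_indices` are assumptions for the example, since the abstract does not state the dataset size.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 100 records, validated with K=12 as in the abstract.
folds = list(k_fold_indices(n_samples=100, k=12))
```

Each record appears in exactly one test fold, and the reported accuracy would typically be the mean over the 12 folds. In practice the data are usually shuffled before splitting, which this sketch omits.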

ANALISIS SENTIMEN APLIKASI MYSILOAM MENGGUNAKAN METODE NAÏVE BAYES [Sentiment Analysis of the MySiloam Application Using the Naïve Bayes Method]
lia, Alvina; Rahim, Abdul; Yoga Siswa, Taghfirul Azhima
Jurnal Informatika dan Teknik Elektro Terapan Vol 13, No 1 (2025)
Publisher : Universitas Lampung

DOI: 10.23960/jitet.v13i1.5997

The Mysiloam application, developed by Siloam Hospitals, is a platform that provides a range of health services. It is designed to make it easier for patients to access those services efficiently and conveniently, so it is important to understand user perception through sentiment analysis. This study aims to analyze user sentiment toward the Mysiloam application using the Naive Bayes method. The data consist of 1995 user reviews of the Mysiloam application collected from its Google Play Store page through scraping. The analysis begins with a data processing stage, including text cleaning, stop-word removal, and tokenization, to prepare the data for analysis. After the data were processed, the model was trained using TF-IDF, and a confusion matrix was used to test the accuracy of the analysis. The results show that the Naive Bayes model achieved an accuracy of 86%, indicating the method's effectiveness in analyzing positive and negative sentiment in user reviews. The analysis found that the majority of users gave positive reviews of the application's features and ease of use, although there was some criticism of the application's performance.
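
The text cleaning, stop-word removal, tokenization, and TF-IDF stages can be sketched as follows. This is an illustration only: the stop-word list, the sample reviews, and the plain `tf * log(N / df)` weighting are assumptions, and the study may have used a different TF-IDF variant.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "is", "and"}  # hypothetical stop-word list

def preprocess(text):
    """Lowercase, replace punctuation with spaces, tokenize, and drop stop words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical reviews, not taken from the scraped dataset.
reviews = ["The app is easy and fast!", "The app is slow.", "Booking is easy."]
docs = [preprocess(r) for r in reviews]
vectors = tf_idf(docs)
```

Terms that occur in few reviews, such as "slow", receive a higher weight than terms shared by many reviews, and these weighted vectors are what a Naive Bayes classifier would be trained on.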

Penerapan Metode PSO-SMOTE Pada Algoritma Random Forest Untuk Mengatasi Class Imbalance Data Bencana Tanah Longsor [Applying the PSO-SMOTE Method to the Random Forest Algorithm to Address Class Imbalance in Landslide Disaster Data]
Ariyadi, Dedy; Siswa, Taghfirul Azhima Yoga; Rudiman, R
Kesatria : Jurnal Penerapan Sistem Informasi (Komputer dan Manajemen) Vol 6, No 1 (2025): January Edition
Publisher : LPPM STIKOM Tunas Bangsa

DOI: 10.30645/kesatria.v6i1.574

Landslides are natural disasters that frequently occur in Samarinda City, with 45-80 affected areas in 2022-2023. The use of machine learning to classify landslide data faces the challenge of data imbalance, which can lead to bias towards the majority class. This study aims to address this issue by implementing the Random Forest algorithm combined with the Synthetic Minority Oversampling Technique (SMOTE) and optimization using Particle Swarm Optimization (PSO). The data used comes from BMKG and BPBD Samarinda City, consisting of 11 features and 730 records. The results show that SMOTE successfully balanced the data, improving accuracy from 89.91% to 94.76%, an increase of 4.85 percentage points.
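
The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration with hypothetical data and a hypothetical function name. It is not the study's implementation and it omits the PSO optimization.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stay inside the region the minority class already occupies.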

Optimasi Algoritma KNN dengan Parameter K dan PSO Untuk Klasifikasi Status Gizi Balita [Optimizing the KNN Algorithm with the K Parameter and PSO for Classifying Toddler Nutritional Status]
Rochman, Bagus Fathur; Rahim, Abdul; Siswa, Taghfirul Azhima Yoga
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): July 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7841

The toddler years are a crucial phase that requires constant nutritional monitoring, because rapid growth and development require optimal nutritional intake. Nutritional problems in toddlers can hinder physical growth and can even be fatal. In assessing the nutritional status of toddlers, it is important to use efficient methods. One approach that can be used is machine learning, which can help determine the nutritional status of toddlers. K-Nearest Neighbors (KNN) is an algorithm commonly used in object classification based on nearest neighbors. Even though it is simple, determining the correct K value is very important because it can significantly influence KNN performance. This research emphasizes the importance of choosing the right parameters to increase the accuracy of the KNN model in classifying the nutritional status of toddlers. The test results show that the optimal combination for KNN is at K=4, using the 'distance' weight and distance metric p=1, producing the highest accuracy of 91.15% on the test data. Furthermore, the research applied Particle Swarm Optimization (PSO) to optimize KNN parameters, and it was found that the optimal combination was with K=6, 'distance' weight, and distance metric p=1, achieving a mean accuracy of 93.44% and a test accuracy of 93.98%. PSO is proven to be effective in finding the best parameters that increase model generalization to test data. Test results with an 80% training and 20% testing split show the best accuracy of 93.98%. The use of PSO for parameter optimization succeeded in increasing model accuracy by 3.10% compared to the model without optimization.
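
To make the three tuned parameters concrete (K, the 'distance' weight, and the metric p=1), here is a minimal sketch of a distance-weighted KNN vote with the Minkowski metric. The feature values, class labels, and the function name `knn_predict` are hypothetical, and the parameter search itself (grid search or PSO) is not shown.

```python
from collections import defaultdict

def knn_predict(train_X, train_y, query, k=4, p=1):
    """Distance-weighted KNN vote using the Minkowski metric (p=1 is Manhattan)."""
    distances = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1 / p), label)
        for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for dist, label in distances[:k]:
        if dist == 0:
            return label  # an exact match decides the prediction
        votes[label] += 1.0 / dist  # closer neighbours carry more weight
    return max(votes, key=votes.get)

# Hypothetical, already-scaled measurements with two features per toddler.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
train_y = ["normal", "normal", "normal", "underweight", "underweight", "underweight"]

prediction_k4 = knn_predict(train_X, train_y, (1.1, 1.0), k=4, p=1)
prediction_k6 = knn_predict(train_X, train_y, (3.0, 3.0), k=6, p=1)
```

With the 'distance' weight, a larger K adds only weak votes from far-away neighbours, which helps explain why K=4 and K=6 can both perform well with the same metric.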

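Penerapan Metode GA-TL Pada Algoritma Naive Bayes Untuk Mengatasi Class Imbalance Data Beasiswa KIP-Kuliah [Applying the GA-TL Method to the Naive Bayes Algorithm to Address Class Imbalance in KIP-Kuliah Scholarship Data]
Widyastuti, Dessy; Siswa, Taghfirul Azhima Yoga; Rudiman, Rudiman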
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6737

The Indonesia Smart Card (KIP) Scholarship Program aims to support students from underprivileged families in pursuing higher education, yet the distribution of recipient data often experiences class imbalance, leading to inaccuracies in scholarship allocation. This imbalance, characterized by disproportionate data between recipient and non-recipient groups, affects classification model performance, causing models to favor the majority class and overlook the minority class, potentially excluding eligible recipients. To address this issue, this study combines the Genetic Algorithm for feature selection and optimization with Tomek Links-Random Undersampling for data balancing. The research process includes data preprocessing, 10-fold cross-validation, and performance evaluation using a confusion matrix. Results indicate that without Tomek Links-Random Undersampling, Naïve Bayes accuracy increased from 65.2% to 66.0% after feature selection and optimization using the Genetic Algorithm, while applying Tomek Links-Random Undersampling improved accuracy from 56% to 63%. This method also enhanced fairness in recipient classification, promoting a more equitable distribution of benefits. The improved model accuracy significantly aids future scholarship selection processes, demonstrating that integrating efficient machine learning approaches optimizes the KIP Scholarship Program by ensuring beneficiaries are appropriately targeted based on predetermined criteria.
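
The Tomek Links part of the balancing step can be sketched as follows. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and removing the majority-class member of each pair cleans the class boundary. The data and names below are hypothetical, and the random undersampling and Genetic Algorithm stages are not shown.

```python
def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
    links = []
    for i in range(len(X)):
        j = nearest(i)
        # i < j avoids reporting the same pair twice.
        if i < j and nearest(j) == i and y[i] != y[j]:
            links.append((i, j))
    return links

# Hypothetical data: label 0 is the majority class, label 1 the minority class.
X = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0), (2.2, 2.0), (0.0, 0.3)]
y = [0, 0, 0, 1, 1, 1, 0]

links = tomek_links(X, y)
majority = 0
# Drop the majority-class member of every link.
drop = {i if y[i] == majority else j for i, j in links}
X_clean = [x for idx, x in enumerate(X) if idx not in drop]
y_clean = [label for idx, label in enumerate(y) if idx not in drop]
```

Only the majority sample sitting right next to a minority sample is removed, so the minority class keeps all of its records.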

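Penerapan Metode GA-CBU Pada Algoritma Logistic Regression Untuk Mengatasi Class Imbalance Data Beasiswa KIP-Kuliah [Applying the GA-CBU Method to the Logistic Regression Algorithm to Address Class Imbalance in KIP-Kuliah Scholarship Data]
Poernamawan, Ahmad Nugraha; Siswa, Taghfirul Yoga Azhima; Rudiman, Rudiman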
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6747

The issue of class imbalance often poses a challenge in data analysis, where the number of instances in the majority class is significantly higher than that in the minority class. This can lead classification models to be biased towards predicting the majority class, resulting in low accuracy in identifying the minority class. This research aims to implement the Logistic Regression (LR) algorithm combined with the Clustering Based Undersampling (CBU) method as an undersampling technique, together with feature selection and optimization using a Genetic Algorithm (GA), in classifying KIP-College scholarship data at Muhammadiyah University of East Kalimantan. In addition, this research evaluates the performance of the model using 10-Fold Cross Validation and a confusion matrix to measure accuracy, with the goal of overcoming class imbalance in the scholarship recipient (KIP) data at Muhammadiyah University of East Kalimantan. The data used consists of 1075 records with 37 features related to the socio-economic factors of scholarship recipients. The results from the application of the CBU method indicate an increase in the accuracy of the Logistic Regression model from 62.51% to 67.68%. Furthermore, the combination of GA and CBU provides more stable results in classifying minority classes. It is hoped that this research can make a significant contribution to the development of a more accurate and efficient scholarship recipient selection system, as well as serve as a reference for future studies in the fields of data mining and machine learning.
