Articles

Analysis of Important Features in Software Defect Prediction Using Synthetic Minority Oversampling Techniques (SMOTE), Recursive Feature Elimination (RFE) and Random Forest Ghinaya, Helma; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi; Indriani, Fatma
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.453

Abstract

Software Defect Prediction (SDP) is essential for improving software quality during testing. As software systems grow more complex, accurately predicting defects becomes increasingly challenging. One challenge is imbalanced class distributions, where the number of defective instances is significantly lower than the number of non-defective ones. This study addresses the class imbalance with the SMOTE technique. Random Forest is used as the classification algorithm because of its ability to handle non-linear data, its resistance to overfitting, and its ability to provide information about the importance of features in classification. This research aims to evaluate important features and measure accuracy in SDP using the SMOTE+RFE+Random Forest technique. The dataset used in this study is NASA MDP D", which includes 12 datasets. The study is conducted in two stages. The first stage uses the RFE+Random Forest technique; the second stage adds the SMOTE technique before RFE and Random Forest to measure accuracy on the NASA MDP data. The results show that SMOTE improves accuracy on most datasets, with the best performance achieved on the MC1 dataset with an accuracy of 0.9998. Feature importance analysis identifies "maintenance severity" and "cyclomatic density" as the most crucial features in data modeling for SDP. Therefore, the SMOTE+RFE+RF technique effectively improves prediction accuracy across various datasets and successfully addresses class imbalance issues.
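To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the authors' implementation: the function name `smote_sketch`, the toy minority samples, and the choice of `k=2` are assumptions made purely for illustration.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples (e.g. defective modules), two features each
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without duplicating rows exactly.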
Classification of Lung Disease in X-Ray Images Using Gray Level Co-Occurrence Matrix Method and Convolutional Neural Network Nurcahyati, Ica; Saragih, Triando Hamonangan; Farmadi, Andi; Kartini, Dwi; Muliadi, Muliadi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.457

Abstract

The lungs are a vital part of the human body, serving as the site of oxygen exchange. They perform a complex task and are susceptible to damage from the polluted air we breathe every day, which can lead to various diseases. Lung disease is a very common health problem that can affect anyone, yet many people still pay little attention to their lung health, leaving them vulnerable. One method for detecting lung disorders is examining X-ray images. Image processing is widely used on medical images and can also be applied to lung disease identification. Therefore, the purpose of this research is to implement image processing to determine the accuracy of lung disease identification using deep learning algorithms and feature extraction. Two experiments were conducted: classification with a Convolutional Neural Network (CNN) alone, and Gray Level Co-Occurrence Matrix (GLCM) feature extraction combined with CNN. The results show that the CNN model achieves a precision of 0.92, recall of 0.92, f1-score of 0.92, and average accuracy of 0.92. The combination of the GLCM method with CNN produces a precision of 0.87, recall of 0.87, f1-score of 0.87, and average accuracy of 0.87. These results indicate that CNN alone is superior to the GLCM-CNN method for lung disease classification based on X-ray images.
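For readers unfamiliar with GLCM, the sketch below shows how a co-occurrence matrix and one texture feature (contrast) can be computed for a tiny grayscale image. The 4x4 image, the four gray levels, and the horizontal offset are assumptions chosen for illustration; the abstract does not state which offsets or features the study used.

```python
def glcm(image, levels, dr=0, dc=1):
    """Count how often gray level i has gray level j at offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def contrast(counts):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    total = sum(map(sum, counts))
    return sum(
        n * (i - j) ** 2
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    ) / total

# Hypothetical 4x4 image with gray levels 0..3
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)   # horizontal neighbour, one pixel to the right
texture_contrast = contrast(matrix)
```

Features such as contrast summarise texture in a few numbers, which is the kind of hand-crafted input that the GLCM-CNN experiment combines with a learned model.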
Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models Junaidi, Ridha Fahmi; Faisal, Mohammad Reza; Farmadi, Andi; Herteno, Rudy; Nugrahadi, Dodon Turianto; Ngo, Luu Duc; Abapihi, Bahriddin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.465

Abstract

Baby cries contain patterns that indicate their needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries using Mel Spectrogram images. The primary objective of this research is to compare the effectiveness of various CNN architectures such as VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152 in detecting baby needs based on their cries. The datasets used include the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. Other models like ResNet-50 and LeNet-5 also demonstrated good performance, although their computational efficiency varied, while VGG-16 and VGG-19 exhibited lower performance. This research provides significant contributions to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
The Impactness of SMOTE as Imbalance Class Handling for Myocardial Infarction Complication Classification using Machine Learning Approach with Data Imputation and Hyperparameter Ahmad Tajali; Saragih, Triando Hamonangan; Mazdadi, Muhammad Itqan; Budiman, Irwan; Farmadi, Andi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 6 No. 4 (2024): November
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v6i4.13

Abstract

Myocardial Infarction (MI) is a critical medical emergency characterized by the sudden blockage of blood flow to the heart muscle, often resulting from a blood clot in a coronary artery that has been narrowed by atherosclerotic plaque buildup. This condition demands immediate attention, as prolonged disruption of blood supply can cause irreversible damage to the heart muscle. Diagnosing MI typically involves a combination of methods, including a physical examination, electrocardiogram (ECG) analysis, blood tests to measure heart-specific enzymes, and imaging techniques such as coronary angiography. Early prediction of potential MI complications is crucial to prevent severe outcomes and improve patient prognosis. This study focuses on the early prediction of MI complications through the application of machine learning classification methods. We employed algorithms such as Support Vector Machine (SVM), Random Forest, and XGBoost to analyze patient medical records and accurately predict these complications. The selection of Support Vector Machine (SVM), Random Forest, and XGBoost in this study is driven by their proven effectiveness in handling complex classification problems. To manage incomplete datasets and preserve valuable information, data imputation techniques like K-Nearest Neighbors (KNN) Imputation, Iterative Imputation, and MissForest were applied. KNN, Iterative, and MissForest imputations were chosen to handle missing data due to their effectiveness in preserving data integrity, which is crucial for accurate predictions in myocardial infarction complication studies. Additionally, Bayesian Optimization was utilized to fine-tune the hyperparameters of the models, thereby enhancing their predictive accuracy. The Iterative Imputation method yielded the best performance, particularly in SVM and XGBoost algorithms. SVM achieved 100% accuracy, precision, sensitivity, F1 score, and Area Under the Curve (AUC), while XGBoost attained 99.4% accuracy, 100% precision, 79.6% sensitivity, an F1 score of 88.7%, and an AUC of 0.898. While XGBoost and MissForest proved to be the most successful pairing, the overall effectiveness of the models suggests that Iterative Imputation and Random Forest also have potential under certain conditions.
Classification of brain tumor based on shape and texture features and machine learning Rizki, M. Alfi; Faisal, Mohammad Reza; Farmadi, Andi; Saragih, Triando Hamonangan; Nugrahadi, Dodon Turianto; Bachtiar, Adam Mukharil; Keswani, Ryan Rhiveldi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 6 No. 4 (2024): November
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/27236g49

Abstract

Information from brain tumour visualisation using MRI can be used for brain tumour classification. The information can be extracted using different feature extraction techniques. This study compares shape-based feature extraction, such as Zernike Moment (ZM) and Pyramid Histogram of Oriented Gradients (PHOG), with texture-based feature extraction, such as Local Binary Patterns (LBP), Gray Level Co-occurrence Matrix (GLCM), and Histogram of Oriented Gradients (HOG), in brain tumour classification. This research aims to find out which feature extraction is better for handling brain tumour images through the accuracy and f1-score produced. This research proposes to combine each feature based on its approach, i.e. ZM+PHOG for shape-based feature extraction and LBP+GLCM+HOG for texture-based feature extraction, with default parameters from the library and modified parameters configured based on previous research. The dataset used comes from Kaggle and has three classes: meningioma, glioma, and pituitary. The machine learning classification models used are Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB) and K-Nearest Neighbours (KNN) with default parameters from the library. The models were evaluated using 10-fold stratified cross-validation. This research resulted in an accuracy and f1-score of 84% for texture-based feature extraction with modified parameters in RF classification. In comparison, shape-based feature extraction resulted in accuracy and f1-score of 70% and 68% with modified parameters in RF classification. From the results, it can be concluded that texture-based feature extraction is better in handling brain tumour images compared to shape-based feature extraction. This study suggests that focusing on texture details in feature extraction can significantly improve classification performance in medical imaging such as brain tumours.
Applying XGBoost-ADASYN in the Classification Process of Bank Customers Who Will Take Time Deposits Abdilah, Muhammad Fariz Fata; Mazdadi, Muhammad Itqan; Farmadi, Andi; Muliadi, Muliadi; Indriani, Fatma; Rozaq, Hasri Akbar Awal; Yıldız, Oktay
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.551

Abstract

Investment in the form of time deposits at banks offers stable returns. Identifying and attracting potential customers, however, poses challenges. This research enhances the predictive capabilities of deposit classification models by addressing data imbalance with a combination of XGBoost, ADASYN, and Random Search optimization techniques. The integration of ADASYN improves minority class representation, while Random Search efficiently optimizes model parameters. Our findings show a significant accuracy of 94.93%, benchmarked against baseline models, highlighting our method's effectiveness compared to traditional approaches. This hybrid model advances customer data analysis and achieves our research objectives. We discuss the integration challenges, including computational demands and technique selection. The research underscores the application of machine learning to address financial industry issues, emphasizing the impact of data preprocessing and feature engineering on performance. Future studies might explore AutoML to reduce complexity further and enhance model scalability, promising more innovation in customer data analysis.
The Enhancing Diabetes Prediction Accuracy Using Random Forest and XGBoost with PSO and GA-Based Feature Selection Dzira Naufia Jawza; Mazdadi, Muhammad Itqan; Farmadi, Andi; Saragih, Triando Hamonangan; Kartini, Dwi; Abdullayev, Vugar
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 7 No 2 (2025): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v7i2.626

Abstract

Diabetes represents a global health concern classified as a non-communicable disease, impacting more than 422 million people worldwide, with the number expected to increase each year. This study aims to evaluate the performance of the Random Forest and Extreme Gradient Boosting (XGBoost) classification algorithms on the diabetes disease dataset taken from Kaggle. To improve prediction accuracy, feature selection was carried out using Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) which are expected to filter the most relevant features. The study results showed that the Random Forest model without feature selection yielded an Area Under Curve (AUC) value of 0.8120, while XGBoost achieved an AUC of 0.7666. After applying feature selection with PSO, the AUC increased to 0.8582 for Random Forest and 0.8250 for XGBoost. The use of feature selection with GA gave better results, with an AUC of 0.8612 for Random Forest and 0.8351 for XGBoost. These results indicate that the increase in accuracy after feature selection using PSO ranges from 5.7% to 7.6%, while the increase with GA ranges from 6.1% to 8.9%, with GA providing more significant results. This study contributes to improving the accuracy of diabetes disease classification, which is expected to support the diagnosis process more quickly and accurately.
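The percentage ranges quoted above can be reproduced from the reported AUC values as relative gains over the no-selection baseline. The short sketch below does that arithmetic; the helper name `relative_gain` and the dictionary layout are illustrative choices, and the gains are computed on AUC, since that is the only metric the abstract reports.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100

# AUC values as reported in the abstract
auc = {
    "Random Forest": {"none": 0.8120, "PSO": 0.8582, "GA": 0.8612},
    "XGBoost": {"none": 0.7666, "PSO": 0.8250, "GA": 0.8351},
}

gains = {
    (model, method): round(relative_gain(scores["none"], scores[method]), 1)
    for model, scores in auc.items()
    for method in ("PSO", "GA")
}

for (model, method), gain in gains.items():
    print(f"{model} + {method}: +{gain}%")
```

The four results line up with the 5.7% to 7.6% range for PSO and the 6.1% to 8.9% range for GA.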
Implementation of Chi-Square Feature Selection for Parkinson’s Disease Classification Using LightGBM Ahdyani, Annisa Salsabila; Budiman, Irwan; Kartini, Dwi; Farmadi, Andi; Mazdadi, Muhammad Itqan
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 19, No 3 (2025): July
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.107881

Abstract

Parkinson's disease is caused by damage to nerve cells in the brain and is among the diseases whose case numbers are rising rapidly worldwide. One way to help curb the increase in Parkinson's disease cases is to diagnose it through classification methods with an algorithmic learning approach. This study implements the Chi-Square technique as an approach for selecting relevant features, combined with the Light Gradient Boosting Machine (LightGBM) algorithm, for Parkinson's disease classification. Chi-Square feature selection aims to reduce less relevant features so that model performance improves. In addition, the SMOTE method is applied to handle data imbalance, and hyperparameter tuning is used to determine the optimal parameter combination. Tests were run on ten variations of the number of features, with the best result obtained using 200 features, which yielded an accuracy of 96.05%. With the Chi-Square method, the performance of the LightGBM model improved compared with its performance without feature selection. Applying this combination of methods can significantly improve the performance of the classification model and has the potential to be applied in diagnosis support systems for Parkinson's disease.
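As a rough illustration of Chi-Square feature scoring, the sketch below computes the Pearson chi-square statistic from a feature-versus-class contingency table and ranks two features by it. The tables, feature names, and the contingency-table form of the statistic are assumptions for illustration; the abstract does not say which Chi-Square variant or library the study used.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and the class were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical tables: rows = feature value (low/high), columns = class (healthy/Parkinson's)
features = {
    "feature_a": [[10, 20], [30, 40]],
    "feature_b": [[25, 5], [5, 25]],
}

scores = {name: chi_square(table) for name, table in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # most class-dependent first
```

Keeping only the top-ranked features is the selection step; a higher score means the feature's distribution differs more between the classes.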
An Empirical Study of Cross-Project and Within-Project Performance in Software Defect Prediction Models Using Tree-Based and Boosting Classifiers Raidra Zeniananto; Herteno, Rudy; Radityo Adi Nugroho; Andi Farmadi; Setyo Wahyu Saputro
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 3 (2025): August
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i3.95

Abstract

Software Defect Prediction (SDP) is a vital process in modern software engineering aimed at identifying faulty components in the early stages of development. In this study, we conducted a comprehensive evaluation of two widely employed SDP approaches, Within-Project Software Defect Prediction (WP-SDP) and Cross-Project Software Defect Prediction (CP-SDP), using identical preprocessing steps to ensure an objective comparison. We utilized the NASA MDP dataset, where each project was split into 70% training and 30% testing data, and applied three distinct resampling strategies—no sampling, oversampling, and undersampling—to address the challenge of class imbalance. Five classification algorithms were examined, including Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), and LightGBM (LGBM). Performance was measured primarily using Accuracy and Area Under the Curve (AUC) metrics, resulting in 360 experimental outcomes. Our findings revealed that WP-SDP, combined with oversampling and Random Forest, demonstrated superior predictive capability on most projects, achieving an Accuracy of 89.92% and an AUC of 0.931 on PC4. Nonetheless, CP-SDP excelled in certain small-scale projects (e.g., MW1), underscoring its potential when local historical data is scarce but inter-project characteristics remain sufficiently similar. This study’s results underscore the importance of selecting a prediction scheme tailored to specific project attributes, class imbalance levels, and available historical data. By establishing a standardized methodological framework, our work contributes to a clearer understanding of the strengths and limitations of WP-SDP and CP-SDP, paving the way for more effective defect detection strategies and improved software quality.
Improving Diabetes Prediction Using Feedforward Neural Network with Adam Optimization and SMOTE Technique Wijaya Kusuma, Arizha; Mazdadi, Muhammad Itqan; Kartini, Dwi; Farmadi, Andi; Indriani, Fatma; P., Chandrasekaran
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 3 (2025): August
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i3.127

Abstract

Diabetes mellitus is a chronic metabolic disorder that demands early and accurate detection to prevent life-threatening complications. Traditional diagnostic procedures, such as blood glucose tests and oral glucose tolerance tests, are often invasive, time-consuming, and resource-intensive, making them less practical for widespread screening. This study aims to explore the potential of artificial intelligence, specifically Feedforward Neural Networks (FNN), in predicting diabetes based on clinical data from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The main contribution of this research lies in the application of the Adaptive Moment Estimation (Adam) optimization algorithm and the Synthetic Minority Oversampling Technique (SMOTE) to enhance the performance and generalization of the FNN on imbalanced medical datasets. The methodology involves preprocessing steps such as imputing zero values with feature means, normalizing input features using Min-Max scaling, and applying SMOTE to balance class distribution. Two model configurations were compared: a baseline FNN trained manually using full-batch gradient descent and a second FNN optimized using Adam. Experimental results demonstrated that the baseline model achieved an accuracy of 70.13%, precision of 56.06%, recall of 68.52%, and F1-score of 61.67%, while the Adam-optimized model achieved superior results with an average accuracy of 73.31%, precision of 60.97%, recall of 66.67%, and F1-score of 63.64% across ten independent runs. These findings indicate that combining adaptive optimization with oversampling significantly enhances the robustness and reliability of neural networks for medical classification tasks. In conclusion, the proposed method provides an effective framework for AI-assisted early diabetes detection and opens pathways for future development using deeper network architectures and explainable AI models for clinical applications.
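The F1-scores above are tied to precision and recall by the harmonic-mean formula, which the sketch below applies to the reported figures. The function name `f1_score` is an illustrative choice. The baseline figures satisfy the identity directly; the Adam figures are averages over ten runs, so they need not match exactly, because a mean of per-run F1-scores is not the F1 of the mean precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Baseline model: precision and recall as reported in the abstract
baseline_f1 = f1_score(0.5606, 0.6852)

# Adam-optimized model: F1 recomputed from the reported mean precision and recall
adam_f1_from_means = f1_score(0.6097, 0.6667)

print(f"baseline F1: {baseline_f1 * 100:.2f}%")
print(f"Adam F1 from mean precision/recall: {adam_f1_from_means * 100:.2f}%")
```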