
Found 8 Documents

Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction
Suryadi, Mulia Kevin; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.375

Abstract

Software defect prediction is necessary for desktop and mobile applications. Random Forest defect prediction performance can be increased significantly by optimizing its parameters rather than using the defaults, yet the tuning step is commonly neglected. Random Forest has numerous tunable parameters, so adjusting them manually diminishes its efficiency, yields suboptimal results, and takes considerable time. This research aims to improve the performance of Random Forest classification by using SMOTE to balance the data, a Genetic Algorithm for feature selection, and hyperparameter tuning for optimization. It also aims to determine which hyperparameter tuning method produces the best improvement for the Random Forest classifier. The study uses the NASA MDP corpus, which comprises 13 datasets. The method combines SMOTE for handling imbalanced data, Genetic Algorithm feature selection, Random Forest classification, and seven hyperparameter tuning methods: Grid Search, Random Search, Optuna, Bayesian optimization (with Hyperopt), Hyperband, TPE, and Nevergrad. Performance was evaluated using accuracy and AUC. In terms of accuracy improvement, the three best methods are Nevergrad, TPE, and Hyperband; in terms of AUC improvement, they are Hyperband, Optuna, and Random Search. Nevergrad improves accuracy by about 3.9% on average, and Hyperband improves AUC by about 3.51% on average. This study indicates that hyperparameter tuning improves Random Forest performance and that, among all the tuning methods used, Hyperband performs best, with the highest average increase in both accuracy and AUC.
The implication of this research is to encourage the use of hyperparameter tuning in software defect prediction and thereby improve software defect prediction performance.
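The tuning-versus-default comparison at the core of this abstract can be sketched with scikit-learn. This is a minimal illustration only: the synthetic dataset, grid values, and seeds are assumptions of the sketch, not the paper's NASA MDP setup, and the SMOTE and Genetic Algorithm stages are omitted; Grid Search stands in for the seven tuners compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data as a stand-in for a NASA MDP dataset.
X, y = make_classification(n_samples=400, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Baseline: Random Forest with default parameters.
default_rf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
default_auc = roc_auc_score(y_te, default_rf.predict_proba(X_te)[:, 1])

# Grid Search over a small, illustrative hyperparameter space.
grid = {"n_estimators": [100, 200], "max_depth": [None, 10],
        "min_samples_leaf": [1, 3]}
search = GridSearchCV(RandomForestClassifier(random_state=42), grid,
                      scoring="roc_auc", cv=5)
search.fit(X_tr, y_tr)
tuned_auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
```

The same skeleton applies to the other tuners: only the search object changes, while the estimator, scoring metric, and cross-validation stay fixed.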
Optimizing Software Defect Prediction Models: Integrating Hybrid Grey Wolf and Particle Swarm Optimization for Enhanced Feature Selection with Popular Gradient Boosting Algorithm
Angga Maulana Akbar; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.388

Abstract

Software defects, also referred to as software bugs, are anomalies or flaws in a computer program that cause software to behave unexpectedly or produce incorrect results. These defects can take various forms, including coding errors, design flaws, and logic mistakes, and they can emerge at any stage of the software development lifecycle. Traditional prediction models usually deliver low prediction performance. To address this issue, this paper proposes a novel prediction model using a Hybrid Grey Wolf Optimizer and Particle Swarm Optimization (HGWOPSO). This research aims to determine whether the hybrid model improves the effectiveness of software defect prediction compared to the base PSO and GWO algorithms without hybridization, and how effective different Gradient Boosting classification algorithms are when combined with HGWOPSO feature selection. The study utilizes 13 NASA MDP datasets, divided into training and testing data using 10-fold cross-validation. After the split, the SMOTE technique is applied to the training data, generating synthetic samples to balance the dataset and ensure better performance of the predictive model. Feature selection is then conducted using the HGWOPSO algorithm. Each subset of the NASA MDP datasets is processed by three boosting classification algorithms: XGBoost, LightGBM, and CatBoost. Performance evaluation is based on the Area under the ROC Curve (AUC). The average AUC values yielded by HGWOPSO XGBoost, HGWOPSO LightGBM, and HGWOPSO CatBoost are 0.891, 0.881, and 0.894, respectively. The results indicate that the HGWOPSO algorithm improves AUC compared to the base GWO and PSO algorithms; specifically, HGWOPSO CatBoost achieved the highest AUC of 0.894.
This represents a 6.5% increase in AUC over PSO CatBoost (significance value 0.00552) and a 6.3% increase over GWO CatBoost (significance value 0.00148). This study demonstrated that HGWOPSO significantly improves the performance of software defect prediction. The implication of this research is to enhance software defect prediction models by combining hybrid optimization techniques with gradient boosting algorithms, which can potentially identify and address defects more accurately.
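The evaluation step of this pipeline, i.e. scoring a boosting classifier on a selected feature subset with 10-fold cross-validated AUC, can be sketched as follows. Note the assumptions: scikit-learn's GradientBoostingClassifier stands in for XGBoost/LightGBM/CatBoost, the dataset is synthetic, and the feature indices are arbitrary placeholders for what HGWOPSO would actually select.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a NASA MDP dataset.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=8, random_state=7)

# Hypothetical feature subset; in the paper this comes from HGWOPSO.
selected = [0, 2, 3, 5, 8, 11, 14]

# 10-fold cross-validated AUC on the selected subset.
auc = cross_val_score(GradientBoostingClassifier(random_state=7),
                      X[:, selected], y, cv=10, scoring="roc_auc")
mean_auc = auc.mean()
```

The wrapper optimizer simply repeats this scoring for many candidate subsets and keeps the one with the highest mean AUC.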
Effect of SMOTE Variants on Software Defect Prediction Classification Based on Boosting Algorithm
Aflaha, Rahmina Ulfah; Herteno, Rudy; Faisal, Mohammad Reza; Abadi, Friska; Saputro, Setyo Wahyu
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 2 (2024): June
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i2.28521

Abstract

Detecting software defects early is critical for avoiding significant financial losses. However, building accurate software defect prediction models is challenging due to class imbalance: data for defective modules is far scarcer than for standard modules. This research addresses the issue on the imbalanced NASA MDP datasets. The proposed method combines a data-level balancing approach, using 14 variants of the SMOTE algorithm to increase the amount of defective-module data, with an algorithm-level approach using three boosting algorithms, Catboost, LightGBM, and Gradient Boosting, to classify modules as defective or non-defective. The results show that this method produces a more accurate classification than previous studies. The DSMOTE and Gradient Boosting pair achieved the highest average accuracy (0.9161), the DSMOTE and Catboost pair the highest average AUC (0.9637), and the ADASYN kernel variant with Catboost the highest average G-mean (0.9154). The contribution of this research to software defect prediction lies in developing these techniques and evaluating their effectiveness in addressing class imbalance.
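The data-level balancing idea shared by all SMOTE variants can be illustrated with a minimal from-scratch sketch: each synthetic minority sample is an interpolation between a real minority point and one of its nearest minority neighbours. The Gaussian toy data, neighbour count, and sample sizes below are assumptions of this sketch, not any particular variant from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)               # column 0 is the point itself
    out = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]      # a genuine neighbour, not self
        out[s] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out

# Toy imbalanced data: 90 majority vs 10 minority samples.
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, (90, 4))
X_min = rng.normal(2.0, 1.0, (10, 4))

# Oversample the minority class to parity, then stack a balanced set.
X_new = smote(X_min, n_new=80)
X_bal = np.vstack([X_maj, X_min, X_new])
y_bal = np.array([0] * 90 + [1] * 90)
```

The 14 variants studied in the paper differ mainly in *where* they place the synthetic points (e.g. near the class border, or weighted toward harder examples), not in this interpolation core.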
Implementation of Ant Colony Optimization in Obesity Level Classification Using Random Forest
Wardana, Muhammad Difha; Budiman, Irwan; Indriani, Fatma; Nugrahadi, Dodon Turianto; Saputro, Setyo Wahyu; Rozaq, Hasri Akbar Awal; Yıldız, Oktay
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, October 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.4696

Abstract

Obesity is a pressing global health issue characterized by excessive body fat accumulation and associated risks of chronic diseases. This study investigates the integration of Ant Colony Optimization (ACO) for feature selection in obesity-level classification using Random Forests. Results demonstrate that feature selection significantly improves classification accuracy, rising from 94.49% to 96.17% when using ten features selected by ACO. Despite limitations, such as challenges in tuning parameters like alpha (α), beta (β), and evaporation rate in ACO techniques, the study provides valuable insights into developing a more efficient obesity classification system. The proposed approach outperforms other algorithms, including KNN (78.98%), CNN (82.00%), Decision Tree (94.00%), and MLP (95.06%), emphasizing the importance of feature selection methods like ACO in enhancing model performance. This research addresses a critical gap in intelligent healthcare systems by providing the first comprehensive study of ACO-based feature selection specifically for obesity classification, contributing significantly to medical informatics and computer science. The findings have immediate practical implications for developing automated diagnostic tools that can assist healthcare professionals in early obesity detection and intervention, potentially reducing healthcare costs through improved diagnostic efficiency and supporting digital health transformation in clinical settings. Furthermore, the study highlights the broader applicability of ACO in various classification tasks, suggesting that similar techniques could be used to address other complex health issues, ultimately improving diagnostic accuracy and patient outcomes.
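A highly simplified pheromone-guided feature-selection loop can illustrate the role of the parameters the abstract names (pheromone weight α, heuristic weight β, evaporation rate ρ). Everything else here, including the synthetic data, the correlation-based heuristic, the ant count, and the subset-sampling rule, is an assumption of this sketch, not the paper's ACO formulation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=12,
                           n_informative=5, random_state=3)
rng = np.random.default_rng(3)

n_feat, n_ants, n_iter = X.shape[1], 6, 5
alpha, beta, rho = 1.0, 1.0, 0.3        # pheromone weight, heuristic weight, evaporation
tau = np.ones(n_feat)                    # pheromone level per feature
# Heuristic desirability: absolute feature-label correlation (an assumption).
eta = np.abs(np.corrcoef(X.T, y)[-1, :-1]) + 1e-6

def subset_score(mask):
    """Fitness of a feature subset: 3-fold CV accuracy of a small Random Forest."""
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=30, random_state=3)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

best_mask, best_score = None, -1.0
for _ in range(n_iter):
    attract = (tau ** alpha) * (eta ** beta)     # per-feature attractiveness
    prob = attract / attract.sum()
    tau *= (1.0 - rho)                           # pheromone evaporation
    for _ in range(n_ants):
        mask = rng.random(n_feat) < prob * n_feat / 2   # each ant samples a subset
        s = subset_score(mask)
        if s > best_score:
            best_mask, best_score = mask.copy(), s
        tau[mask] += s                           # deposit pheromone on used features
```

Raising α makes ants follow past successes more strongly, raising β makes them trust the heuristic more, and a larger ρ forgets old trails faster; mis-tuning these is exactly the difficulty the abstract notes.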
Accurate Skin Tone Classification for Foundation Shade Matching using GLCM Features-K-Nearest Neighbor Algorithm
Syahputra, Muhammad Reza; Mazdadi, Muhammad Itqan; Budiman, Irwan; Farmadi, Andi; Saputro, Setyo Wahyu; Rozaq, Hasri Akbar Awal; Sutaji, Deni
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, October 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.4723

Abstract

Foundation shade matching remains a significant challenge in the beauty industry, particularly in Indonesia where consumers exhibit three distinct skin tone categories: ivory white, amber yellow, and tan. Manual foundation selection often results in mismatched shades, leading to customer dissatisfaction. This study presents a novel automated skin tone classification system combining Gray Level Co-Occurrence Matrix (GLCM) feature extraction with the K-Nearest Neighbor (KNN) algorithm. The GLCM method extracts four key texture features (contrast, homogeneity, energy, and entropy) from facial images, while KNN performs classification. A comprehensive dataset of 963 facial images was used, with 770 training and 193 test samples collected under controlled lighting conditions. After testing K values from 1 to 15, the optimal K=1 achieved 75.65% accuracy. Compared to baseline color histogram methods (60% accuracy), the GLCM-KNN approach demonstrates a 15.65-percentage-point improvement in classification performance. This research contributes to computer vision applications in beauty technology, enabling the development of mobile applications for virtual foundation try-on and personalized product recommendations. The findings have significant implications for the cosmetics industry, particularly for automated cosmetic shade matching systems and enhanced customer experience in online beauty retail. Further research is recommended to explore deep learning approaches and expand dataset diversity to improve accuracy.
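The GLCM-to-KNN pipeline can be sketched from first principles: build a co-occurrence matrix for one pixel offset, reduce it to the four texture features named in the abstract, and feed those features to a 1-nearest-neighbour classifier. The random 16x16 patches and mock labels below are placeholders for the paper's facial images and tone categories.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(img, levels=8, dx=1, dy=0):
    """GLCM for one offset (dx, dy), reduced to the four texture features
    used in the paper: contrast, homogeneity, energy, entropy."""
    glcm = np.zeros((levels, levels))
    h, w = img.shape
    for yy in range(h - dy):
        for xx in range(w - dx):
            glcm[img[yy, xx], img[yy + dy, xx + dx]] += 1
    p = glcm / glcm.sum()                      # normalise to probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sum(p ** 2)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([contrast, homogeneity, energy, entropy])

# Toy patches quantised to 8 gray levels stand in for facial images.
rng = np.random.default_rng(0)
patches = [rng.integers(0, 8, (16, 16)) for _ in range(20)]
labels = [i % 2 for i in range(20)]            # two mock tone classes
feats = np.array([glcm_features(p) for p in patches])

# K=1 was the optimal setting reported in the paper.
knn = KNeighborsClassifier(n_neighbors=1).fit(feats, labels)
pred = knn.predict(feats[:1])
```

In practice multiple offsets (e.g. 0, 45, 90, 135 degrees) are usually averaged, and skimage's `graycomatrix`/`graycoprops` offer an optimized equivalent of the loop above.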
Dimensionality Reduction Using Principal Component Analysis and Feature Selection Using Genetic Algorithm with Support Vector Machine for Microarray Data Classification
Kartini, Dwi; Badali, Rahmat Amin; Muliadi, Muliadi; Nugrahadi, Dodon Turianto; Indriani, Fatma; Saputro, Setyo Wahyu
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 1 (2025): February
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/mr7x9713

Abstract

DNA microarray is used to analyze gene expression on a large scale simultaneously and plays a critical role in cancer detection. The creation of a DNA microarray starts with RNA isolation from the sample, which is then converted into cDNA and scanned to generate gene expression data. However, the data generated through this process is high-dimensional, which can degrade the performance of predictive models for cancer detection; dimensionality reduction is therefore required to reduce data complexity. This study analyzes the impact of applying Principal Component Analysis (PCA) for dimensionality reduction, a Genetic Algorithm (GA) for feature selection, and their combination on microarray data classification using Support Vector Machine (SVM). The datasets used are microarray datasets covering breast cancer, ovarian cancer, and leukemia. The research methodology involves preprocessing, PCA for dimensionality reduction, GA for feature selection, data splitting, SVM classification, and evaluation. Based on the results, PCA dimensionality reduction combined with GA feature selection and SVM classification achieved the best performance among the configurations tested. For the breast cancer dataset, the highest accuracy was 73.33%, with recall 0.74, precision 0.75, and F1 score 0.73. For the ovarian cancer dataset, the highest accuracy was 98.68%, with recall 0.98, precision 0.99, and F1 score 0.99. For the leukemia dataset, the highest accuracy was 95.45%, with recall 0.94, precision 0.97, and F1 score 0.95. It can be concluded that combining PCA for dimensionality reduction with GA for feature selection in microarray classification can simplify the data and improve the accuracy of the SVM classification model. The implications of this study emphasize the effectiveness of PCA and GA in enhancing the classification performance of microarray data.
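The PCA-then-SVM core of this pipeline can be sketched as a scikit-learn pipeline. As assumptions of the sketch: the bundled breast-cancer dataset stands in for the microarray sets, the GA feature-selection stage is omitted, and the component count and seed are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset; real microarray data would have thousands of gene features.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

# Standardize, project onto 10 principal components, then classify with SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
```

Fitting scaler and PCA inside the pipeline keeps them from seeing the test fold, which matters even more on high-dimensional microarray data where leakage inflates accuracy easily.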
Hybrid Feature Selection and Balancing Data Approach for Improved Software Defect Prediction
Febrian, Muhamad Michael; Saputro, Setyo Wahyu; Saragih, Triando Hamonangan; Abadi, Friska; Herteno, Rudy
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.67

Abstract

Software Defect Prediction (SDP) plays a vital role in identifying defects within software modules. Accurate early detection of software defects can reduce development costs and enhance software reliability. However, SDP remains a significant challenge in the software development lifecycle. This study employs Particle Swarm Optimization (PSO) and addresses several challenges associated with its application, including noisy attributes, high-dimensional data, and imbalanced class distribution. To address these challenges, this study proposes a hybrid filter-based feature selection and class balancing method. The feature selection process incorporates Chi-Square (CS), Correlation-Based Feature Selection (CFS), and Correlation Matrix-Based Feature Selection (CMFS), which have been proven effective in reducing noisy and redundant attributes. Additionally, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to mitigate class imbalance in the dataset. The K-Nearest Neighbors (KNN) algorithm is employed as the classification model due to its simplicity, non-parametric nature, and suitability for handling the feature subsets produced. Performance evaluation uses the Area Under Curve (AUC) metric with a significance threshold of 0.05. The proposed method achieved an AUC of 0.872, demonstrating its effectiveness in enhancing predictive performance. It also significantly outperformed other combinations, with significance values of 0.0043 versus PSO SMOTE, 0.0091 versus PSO SMOTE CS, 0.0111 versus PSO SMOTE CFS, and 0.0007 versus PSO SMOTE CFS CMFS, all below the 0.05 threshold. The findings show that the proposed method significantly enhances the efficiency and accuracy of PSO in software defect prediction tasks. This hybrid strategy demonstrates strong potential as a robust solution for future research and application in predictive software quality assurance.
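One filter stage of this hybrid, Chi-Square feature selection feeding a KNN classifier scored by AUC, can be sketched as follows. The synthetic data, the choice of k=8 features, and the seeds are assumptions; the CFS/CMFS filters, SMOTE, and PSO wrapper are omitted. Chi-Square requires non-negative inputs, hence the MinMaxScaler in front.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic imbalanced stand-in for a defect dataset.
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=11)

# Scale to [0, 1] so chi2 sees non-negative values, keep the 8 highest-scoring
# features, then classify with KNN.
model = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=8),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

Stacking further filters before the classifier is just a matter of adding pipeline steps, which mirrors how the paper layers CS, CFS, and CMFS.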
Application of Adaboost Algorithm with SMOTE and Optuna Techniques in Sleep Disorder Classification
Anshory, Muhammad Naufal; Mazdadi, Muhammad Itqan; Saragih, Triando Hamonangan; Budiman, Irwan; Saputro, Setyo Wahyu
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.99

Abstract

Data imbalance is a serious challenge in developing machine learning models for sleep disorder classification. When models are trained on an uneven class distribution, classification performance for minority classes such as insomnia and sleep apnea is often low: overall accuracy may seem elevated, yet sensitivity to important cases remains weak. This research therefore aims to design and develop a robust sleep disorder classification model based on the AdaBoost algorithm, with performance improved through the integration of two main approaches: data balancing using SMOTE and hyperparameter optimization using Optuna. This research contributes by showing that the combination of the two approaches can significantly improve model performance, not only in terms of global accuracy but also accuracy on previously overlooked minority classes. The dataset utilized is the Sleep Health and Lifestyle Dataset, which consists of 374 synthetic records divided into three categories: insomnia, sleep apnea, and none. The method's stages include data preprocessing, data division using an 80:20 train-test split, application of SMOTE to balance the class distribution, hyperparameter tuning using Optuna, and model training with the AdaBoost algorithm. Evaluation was performed using classification metrics: accuracy, precision, recall, and F1-score. Results showed that the combination of SMOTE and Optuna yielded the best performance: 90.6% accuracy, with F1-scores of 0.83871 for insomnia and 0.81250 for sleep apnea. This performance was consistently superior to scenarios without SMOTE or without tuning, confirming the importance of combination strategies for obtaining fair and accurate classification on medical data. Future research is recommended to use real datasets and to evaluate other models such as XGBoost or LightGBM.
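The tune-then-train stage of this method can be sketched with scikit-learn. As explicit assumptions: RandomizedSearchCV stands in for Optuna's sampler, the SMOTE step is omitted, and the synthetic three-class dataset (sized like the 374-record Sleep Health and Lifestyle data), search space, and seeds are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Three imbalanced classes mimic insomnia / sleep apnea / none.
X, y = make_classification(n_samples=374, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.6, 0.25, 0.15],
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=2)

# Randomized search over the usual AdaBoost knobs (Optuna would sample
# this space adaptively instead of uniformly).
space = {"n_estimators": [50, 100, 200], "learning_rate": [0.1, 0.5, 1.0]}
search = RandomizedSearchCV(AdaBoostClassifier(random_state=2), space,
                            n_iter=5, cv=3, random_state=2)
search.fit(X_tr, y_tr)

# Macro-averaged F1 weights each class equally, so minority classes count.
macro_f1 = f1_score(y_te, search.predict(X_te), average="macro")
```

Macro-F1 (rather than plain accuracy) is the right lens here for the reason the abstract gives: accuracy alone can look good while the minority sleep-disorder classes are poorly detected.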