Journal : Journal of Electronics, Electromedical Engineering, and Medical Informatics

Feature Selection Using Firefly Algorithm With Tree-Based Classification In Software Defect Prediction
Maulida, Vina; Herteno, Rudy; Kartini, Dwi; Abadi, Friska; Faisal, Mohammad Reza
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 5 No 4 (2023): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v5i4.315

Abstract

Defects in software products are a universal occurrence. Software defect prediction research typically evaluates the accuracy, precision, and overall performance of a prediction model or method on a variety of datasets, and it is an area of software engineering that receives considerable attention from researchers. This research was conducted to determine the performance of tree-based classification algorithms, namely Decision Tree, Random Forest, and Deep Forest, with and without firefly feature selection, and to identify which tree-based classifier combined with firefly feature selection gives the best software defect prediction performance. The dataset used in this study is the ReLink dataset, which consists of Apache, Safe, and Zxing. The data is divided into training and testing data with 10-fold cross-validation, and feature selection is then performed using the Firefly Algorithm. Each ReLink dataset is processed by each tree-based classification algorithm (Decision Tree, Random Forest, and Deep Forest) using the features chosen by firefly feature selection. Performance is evaluated using the AUC (Area Under the ROC Curve). The experiments were run on Google Colab. The average AUC is 0.66 for Firefly-Decision Tree, 0.77 for Firefly-Random Forest, and 0.76 for Firefly-Deep Forest. These results indicate that the Firefly Algorithm combined with Random Forest classification predicts software defects better than the other tree-based algorithms. In previous studies, tree-based classification with hyperparameter tuning on software defect prediction datasets obtained quite good results. In another study, firefly feature selection improved the classification performance of SVM, Naïve Bayes, and K-nearest neighbor. Therefore, this research was conducted to determine the performance of tree-based algorithms using firefly feature selection.
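The AUC used as the evaluation metric above can be computed without plotting a ROC curve. The following is a minimal sketch, not the authors' code: it uses the rank-sum identity (AUC equals the fraction of positive/negative pairs that the scores order correctly, with ties counted as half) and averages the result over folds. The function names `auc_score` and `mean_auc` are illustrative assumptions.

```python
def auc_score(labels, scores):
    """AUC via the pairwise (Mann-Whitney) identity; labels are 0 or 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct pair
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per cross-validation fold."""
    return sum(auc_score(y, s) for y, s in folds) / len(folds)
```

With 10-fold cross-validation, `folds` would hold ten (labels, scores) pairs, one per held-out fold. The pairwise loop is quadratic in the fold size, which is acceptable for small defect datasets but not for large ones.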

Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction
Suryadi, Mulia Kevin; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.375

Abstract

Software defect prediction is necessary for desktop and mobile applications. The defect prediction performance of Random Forest can be significantly increased by optimizing its parameters rather than using the defaults. However, the parameter tuning step is commonly neglected. Random Forest has numerous tunable parameters, so adjusting them manually is inefficient, time-consuming, and likely to yield suboptimal results. This research aims to improve the performance of Random Forest classification by using SMOTE to balance the data, a Genetic Algorithm for feature selection, and hyperparameter tuning to optimize performance. It also aims to find out which hyperparameter tuning method produces the greatest improvement for Random Forest classification. The dataset used in this study is NASA MDP, which includes 13 datasets. The method combines SMOTE to handle imbalanced data, Genetic Algorithm feature selection, Random Forest classification, and hyperparameter tuning methods including Grid Search, Random Search, Optuna, Bayesian (with Hyperopt), Hyperband, TPE, and Nevergrad. Performance was evaluated using accuracy and AUC. In terms of accuracy improvement, the three best methods are Nevergrad, TPE, and Hyperband. In terms of AUC improvement, the three best methods are Hyperband, Optuna, and Random Search. Nevergrad on average improves accuracy by about 3.9% and Hyperband on average improves AUC by about 3.51%. This study indicates that hyperparameter tuning improves Random Forest performance and that, among all the hyperparameter tuning methods used, Hyperband has the best hyperparameter tuning performance with the highest average increase in both accuracy and AUC. The implication of this research is to increase the use of hyperparameter tuning in software defect prediction and improve software defect prediction performance.

Sentiment Analysis of TikTok Shop Closure in Indonesia on Twitter Using Supervised Machine Learning
Al Habesyah, Noor Zalekha; Herteno, Rudy; Indriani, Fatma; Budiman, Irwan; Kartini, Dwi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.381

Abstract

TikTok Shop is a feature of the TikTok application that lets users buy and sell products. The integration of TikTok Shop with social media has provided new opportunities to reach customers and increase sales. However, the closure of TikTok Shop has caused controversy among the public. This study aims to analyze the views and responses of TikTok users in Indonesia to the closure of TikTok Shop. The dataset used was obtained from Twitter. The research methodology consists of labeling, oversampling, splitting, and machine learning, which includes SVM, Random Forest, Decision Tree, and Deep Learning (H2O). This research contributes to the understanding of how machine learning can be applied to sentiment analysis of the TikTok Shop closure. The test results show that Deep Learning (H2O) obtained an AUC of 0.900 with SMOTE and 0.867 without SMOTE. SVM obtained an AUC of 0.885 with SMOTE and 0.881 without. Random Forest obtained an AUC of 0.822 with SMOTE and 0.830 without. Decision Tree obtained an AUC of 0.59 with SMOTE and 0.646 without. Deep Learning (H2O) with SMOTE performs better than SVM, Random Forest, and Decision Tree. With an AUC of 0.900, Deep Learning (H2O) can be said to perform excellently for sentiment analysis of the TikTok Shop closure. This research has significant implications for social electronic commerce due to its potential use by social media analysts.

Comparison of CatBoost and Random Forest Methods for Lung Cancer Classification using Hyperparameter Tuning Bayesian Optimization-based
Zamzam, Yra Fatria; Saragih, Triando Hamonangan; Herteno, Rudy; Muliadi; Nugrahadi, Dodon Turianto; Huynh, Phuoc-Hai
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.382

Abstract

Lung cancer is a disease that has a high mortality rate and is often difficult to detect until it reaches a very severe stage. Data indicates that lung cancer cases are typically diagnosed late, posing significant challenges to effective treatment. Early detection offers the potential for better chances of recovery. Therefore, this research aims to develop methods for the identification and classification of lung cancer in the hope of providing further knowledge on effective ways to detect this condition at an early stage. One approach under scrutiny involves employing machine learning classification techniques, which are anticipated to serve as a pivotal tool in early disease detection and in improving patient survival rates. This study involves five stages: data collection, data preprocessing, data partitioning for training and testing using 10-fold cross-validation, model training, and analysis of evaluation results. Four experiments were run: two classification methods, CatBoost and Random Forest, were each tested with default hyperparameters and with hyperparameters tuned using Bayesian Optimization. The Random Forest model tuned with Bayesian Optimization outperformed the other models, with accuracy (0.97106), precision (0.97339), recall (0.97185), f-measure (0.97011), and AUC (0.99974) on the lung cancer data. These findings highlight that Bayesian Optimization for hyperparameter tuning in classification models can improve clinical prediction of lung cancer from patient medical records. The integration of Bayesian Optimization in hyperparameter tuning represents a significant step forward in refining the accuracy and effectiveness of classification models, thus contributing to the ongoing enhancement of medical diagnostics and healthcare strategies.
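The accuracy, precision, recall, and f-measure reported above all derive from the same confusion-matrix counts. The following is a minimal sketch for the binary case, assuming one class is treated as positive; whether the authors used binary, macro, or weighted averaging is not stated in the abstract, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-measure from binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

In a 10-fold setup, these values would typically be computed per fold and then averaged.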

Optimizing Software Defect Prediction Models: Integrating Hybrid Grey Wolf and Particle Swarm Optimization for Enhanced Feature Selection with Popular Gradient Boosting Algorithm
Angga Maulana Akbar; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.388

Abstract

Software defects, also referred to as software bugs, are anomalies or flaws in a computer program that cause software to behave unexpectedly or produce incorrect results. These defects can manifest in various forms, including coding errors, design flaws, and logic mistakes, and they can emerge at any stage of the software development lifecycle. Traditional prediction models usually have lower prediction performance. To address this issue, this paper proposes a novel prediction model using Hybrid Grey Wolf Optimizer and Particle Swarm Optimization (HGWOPSO). This research aims to determine whether the Hybrid Grey Wolf and Particle Swarm Optimization model can improve the effectiveness of software defect prediction compared to the base PSO and GWO algorithms without hybridization. Furthermore, this study aims to determine the effectiveness of different gradient boosting classification algorithms when combined with HGWOPSO feature selection in predicting software defects. The study uses 13 NASA MDP datasets. These datasets are divided into testing and training data using 10-fold cross-validation. After the data is divided, the SMOTE technique is applied to the training data. This technique generates synthetic samples to balance the dataset, ensuring better performance of the predictive model. Subsequently, feature selection is conducted using the HGWOPSO algorithm. Each NASA MDP dataset is processed by three boosting classification algorithms, namely XGBoost, LightGBM, and CatBoost. Performance evaluation is based on the Area Under the ROC Curve (AUC) value. The average AUC values yielded by HGWOPSO XGBoost, HGWOPSO LightGBM, and HGWOPSO CatBoost are 0.891, 0.881, and 0.894, respectively. The results of this study indicate that the HGWOPSO algorithm improved AUC performance compared to the base GWO and PSO algorithms. Specifically, HGWOPSO CatBoost achieved the highest AUC of 0.894. This represents a 6.5% increase in AUC with a significance value of 0.00552 compared to PSO CatBoost, and a 6.3% AUC increase with a significance value of 0.00148 compared to GWO CatBoost. This study demonstrates that HGWOPSO significantly improves the performance of software defect prediction. The implication of this research is to enhance software defect prediction models by incorporating hybrid optimization techniques and combining them with gradient boosting algorithms, which can potentially identify and address defects more accurately.
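The ordering described above (split first, then apply SMOTE to the training data only) keeps synthetic samples out of the test fold. The following is a minimal sketch of the standard SMOTE interpolation step in plain Python, not the authors' implementation. The function names, the default `k=5`, and the choice to oversample until the two classes are equal are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance_training_fold(X_train, y_train, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a training fold; the test fold is untouched."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    majority_count = len(y_train) - len(minority)
    n_new = max(0, majority_count - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_rows, y_train + [minority_label] * len(new_rows)
```

Inside a cross-validation loop, `balance_training_fold` would be called once per fold on the training portion, and the classifier would then be evaluated on the original, unbalanced test portion.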

A Comparative Study: Application of Principal Component Analysis and Recursive Feature Elimination in Machine Learning for Stroke Prediction
Hermiati, Arya Syifa; Herteno, Rudy; Indriani, Fatma; Saragih, Triando Hamonangan; Muliadi; Triwiyanto, Triwiyanto
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.446

Abstract

Stroke is a disease that occurs in the brain and can cause both focal and global brain dysfunction. Stroke research mainly aims to predict risk and mortality. Machine learning can be used to diagnose and predict diseases in the healthcare field, especially in stroke prediction. However, medical record data collected to predict a disease usually contains considerable noise because not all variables are important and relevant to the prediction process. In this case, dimensionality reduction is essential to remove noisy (i.e., irrelevant) and redundant features. This study aims to predict stroke using Recursive Feature Elimination as feature selection, Principal Component Analysis as feature extraction, and a combination of Recursive Feature Elimination and Principal Component Analysis. The dataset used in this research is the stroke prediction dataset from Kaggle. The research methodology consists of pre-processing, SMOTE, 10-fold cross-validation, feature selection, feature extraction, and machine learning, which includes SVM, Random Forest, Naive Bayes, and Linear Discriminant Analysis (LDA). According to the results, SVM and Random Forest achieve their highest accuracy, 0.8775 and 0.9511 respectively, without PCA or RFE; Naive Bayes achieves its highest accuracy, 0.7685, with PCA retaining 20 features followed by RFE selecting 5 features; and LDA achieves its highest accuracy, 0.7963, with 20 features from feature selection followed by feature extraction. The study concludes that SVM and Random Forest obtain their highest accuracy without PCA and RFE, while Naive Bayes and LDA perform better with a combination of PCA and RFE. The implication of this research is a better understanding of the effect of RFE and PCA on machine learning for improving stroke prediction.

Analysis of Important Features in Software Defect Prediction Using Synthetic Minority Oversampling Techniques (SMOTE), Recursive Feature Elimination (RFE) and Random Forest
Ghinaya, Helma; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi; Indriani, Fatma
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.453

Abstract

Software Defect Prediction (SDP) is essential for improving software quality during testing. As software systems grow more complex, accurately predicting defects becomes increasingly challenging. One of the challenges is dealing with imbalanced class distributions, where the number of defective instances is significantly lower than the number of non-defective ones. To tackle the class imbalance issue, this study uses the SMOTE technique. Random Forest is chosen as the classification algorithm due to its ability to handle non-linear data, its resistance to overfitting, and its ability to provide information about the importance of features in classification. This research aims to evaluate important features and measure accuracy in SDP using the SMOTE+RFE+Random Forest technique. The dataset used in this study is NASA MDP D'', which includes 12 datasets. The method combines the SMOTE, RFE, and Random Forest techniques. The study is conducted in two stages. The first stage uses the RFE+Random Forest technique; the second stage adds the SMOTE technique before RFE and Random Forest to measure accuracy on the NASA MDP data. The results show that the SMOTE technique enhances accuracy across most datasets, with the best performance achieved on the MC1 dataset with an accuracy of 0.9998. Feature importance analysis identifies "maintenance severity" and "cyclomatic density" as the most crucial features in data modeling for SDP. Therefore, the SMOTE+RFE+RF technique effectively improves prediction accuracy across various datasets and successfully addresses class imbalance issues.

A Comparative Analysis of Polynomial-fit-SMOTE Variations with Tree-Based Classifiers on Software Defect Prediction
Nur Hidayatullah, Wildan; Herteno, Rudy; Reza Faisal, Mohammad; Adi Nugroho, Radityo; Wahyu Saputro, Setyo; Akhtar, Zarif Bin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.455

Abstract

Software defects present a significant challenge to the reliability of software systems, often resulting in substantial economic losses. This study examines the efficacy of polynomial-fit SMOTE (pf-SMOTE) variants in combination with tree-based classifiers for software defect prediction, utilising the NASA Metrics Data Program (MDP) dataset. The research methodology involves partitioning the dataset into training and test subsets, applying pf-SMOTE oversampling, and evaluating classification performance using Decision Trees, Random Forests, and Extra Trees. Findings indicate that the combination of pf-SMOTE-star oversampling with Extra Tree classification achieves the highest average accuracy (90.91%) and AUC (95.67%) across 12 NASA MDP datasets. This demonstrates the potential of pf-SMOTE variants to enhance classification effectiveness. However, it is important to note that caution is warranted regarding potential biases introduced by synthetic data. These findings represent a significant advancement over previous research endeavors, underscoring the critical role of meticulous algorithm selection and dataset characteristics in optimizing classification outcomes. Noteworthy implications include advancements in software reliability and decision support for software project management. Future research may delve into synergies between pf-SMOTE variants and alternative classification methods, as well as explore the integration of hyperparameter tuning to further refine classification performance.

1D and 2D Feature Extraction Based on AAC and DC Protein Descriptors for Classification of Acetylation in Lysine Proteins using Convolutional Neural Network
Faisal, Mohammad Reza; Adawiyah, Laila; Saragih, Triando Hamonangan; Kartini, Dwi; Herteno, Rudy; Lumbanraja, Favorisen Rosyking; Handayani, Lilies; Solechah, Siti Aisyah
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.458

Abstract

Post-Translational Modification (PTM) denotes a biochemical alteration observed in an amino acid, playing crucial roles in protein activity, functionality, and the regulation of protein structure. The recognition of associated PTMs serves as a fundamental basis for understanding biological processes, therapeutic interventions for diseases, and the development of pharmaceutical agents. Using computational approaches (in silico) offers an efficient and cost-effective means to identify PTM sites swiftly. The exploration of protein classification commences with extracting protein sequence features that are subsequently transformed into numerical features for utilization in classification algorithms. Feature extraction methodologies involve using protein descriptors like Amino Acid Composition (AAC) and Dipeptide Composition (DC). Yet, these approaches exhibit a limitation by neglecting crucial amino acid sequence details. Moreover, both descriptor techniques generate a limited number of 1-dimensional (1D) features, which may not be ideal for processing through the Convolutional Neural Network (CNN) classification method. This investigation presents a novel approach to enhance feature diversity through protein sequence segmentation techniques, employing adjacent and overlapping segment strategies. Furthermore, the study illustrates the organization of features into 1D and 2D formats to facilitate processing through 1D CNN and 2D CNN classification methodologies. The findings of this research endeavour highlight the potential for enhancing the accuracy of acetylation classification in lysine proteins through the multiplication of protein sequence segments in a 2D configuration. The highest accuracy achieved for AAC and DC-based feature extraction methods is 77.39% and 76.75%, respectively.
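The AAC and DC descriptors and the segmentation idea described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' pipeline: AAC is taken as the relative frequency of each of the 20 standard amino acids, DC as the relative frequency of each of the 400 ordered residue pairs, and the helper names `segments` and `segmented_features` as well as the window `size` and `step` parameters are hypothetical. The paper's exact segment lengths are not given in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq):
    """Amino Acid Composition: 20 relative frequencies."""
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide Composition: 400 relative frequencies of adjacent residue pairs."""
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


def segments(seq, size, step):
    """Fixed-size windows: step == size gives adjacent segments, step < size overlapping ones."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def segmented_features(seq, size, step, descriptor=aac):
    """2D layout: one row of descriptor values per segment."""
    return [descriptor(s) for s in segments(seq, size, step)]
```

The 2D result (segments by descriptor values) could feed a 2D CNN, while concatenating the rows into a single list would give the corresponding 1D input.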

Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models
Junaidi, Ridha Fahmi; Faisal, Mohammad Reza; Farmadi, Andi; Herteno, Rudy; Nugrahadi, Dodon Turianto; Ngo, Luu Duc; Abapihi, Bahriddin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.465

Abstract

Baby cries contain patterns that indicate their needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries using Mel Spectrogram images. The primary objective of this research is to compare the effectiveness of various CNN architectures, namely VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152, in detecting baby needs based on their cries. The datasets used include the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance, with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. Other models such as ResNet-50 and LeNet-5 also demonstrated good performance, although their computational efficiency varied, while VGG-16 and VGG-19 exhibited lower performance. This research provides significant contributions to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
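A Mel spectrogram rescales the frequency axis of an ordinary spectrogram onto the mel scale, which spaces bands more densely at low frequencies. The sketch below shows only that frequency mapping, using the common 2595 * log10(1 + f / 700) formula. Which mel variant, band count, and frequency range the authors used is not stated in the abstract, so the function names and parameters here are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

The edges returned by `mel_band_edges` are the points where triangular mel filters would start, peak, and end. A full Mel spectrogram additionally needs a short-time Fourier transform and the filter bank itself, which audio libraries normally provide.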