Articles

Found 31 Documents

Analysis of Important Features in Software Defect Prediction Using Synthetic Minority Oversampling Techniques (SMOTE), Recursive Feature Elimination (RFE) and Random Forest Ghinaya, Helma; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi; Indriani, Fatma
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.453

Abstract

Software Defect Prediction (SDP) is essential for improving software quality during testing. As software systems grow more complex, accurately predicting defects becomes increasingly challenging. One challenge is an imbalanced class distribution, where defective instances are significantly fewer than non-defective ones. To tackle this imbalance, this study uses the SMOTE technique. Random Forest is used as the classification algorithm because it handles non-linear data, resists overfitting, and provides information about the importance of each feature in classification. This research aims to evaluate important features and measure accuracy in SDP using the SMOTE+RFE+Random Forest technique. The dataset used in this study is NASA MDP D'', which includes 12 datasets. The method combines SMOTE, RFE, and Random Forest and is carried out in two stages: the first stage uses RFE+Random Forest, and the second stage adds SMOTE before RFE and Random Forest, with accuracy measured on the NASA MDP data. The results show that SMOTE improves accuracy on most datasets, with the best performance achieved on the MC1 dataset at an accuracy of 0.9998. Feature importance analysis identifies "maintenance severity" and "cyclomatic density" as the most important features for SDP modeling. The SMOTE+RFE+RF technique therefore improves prediction accuracy across various datasets and successfully addresses class imbalance.
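SMOTE creates new minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The standard-library sketch below illustrates only that oversampling idea; the function names and toy data are hypothetical, it is not the authors' code, and the RFE and Random Forest stages are not shown. Libraries such as imbalanced-learn provide a full implementation (`imblearn.over_sampling.SMOTE`).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new rows, each at a random point on the segment between
    a minority row and one of its k nearest minority neighbours.
    Needs at least two minority rows."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda row: euclidean(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy module metrics: 6 non-defective rows, 2 defective rows
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.0, 1.2]]
minority = [[5.0, 5.0], [6.0, 5.5]]
new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=1)
balanced_minority = minority + new_rows  # now the same size as `majority`
```

Oversampling is usually applied to the training data only, so that synthetic rows never reach the evaluation set.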
A Comparative Analysis of Polynomial-fit-SMOTE Variations with Tree-Based Classifiers on Software Defect Prediction Nur Hidayatullah, Wildan; Herteno, Rudy; Reza Faisal, Mohammad; Adi Nugroho, Radityo; Wahyu Saputro, Setyo; Akhtar, Zarif Bin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.455

Abstract

Software defects present a significant challenge to the reliability of software systems, often resulting in substantial economic losses. This study examines the efficacy of polynomial-fit SMOTE (pf-SMOTE) variants combined with tree-based classifiers for software defect prediction, using the NASA Metrics Data Program (MDP) dataset. The methodology partitions the dataset into training and test subsets, applies pf-SMOTE oversampling, and evaluates classification performance using Decision Trees, Random Forests, and Extra Trees. Findings indicate that combining pf-SMOTE-star oversampling with Extra Trees classification achieves the highest average accuracy (90.91%) and AUC (95.67%) across 12 NASA MDP datasets, demonstrating the potential of pf-SMOTE variants to enhance classification effectiveness. However, caution is warranted regarding potential biases introduced by synthetic data. These findings represent a significant advance over previous research and underscore the critical role of careful algorithm selection and dataset characteristics in optimizing classification outcomes. Implications include improved software reliability and better decision support for software project management. Future research may explore synergies between pf-SMOTE variants and alternative classification methods, as well as the integration of hyperparameter tuning to further refine classification performance.
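Polynomial-fit SMOTE, as commonly described, places synthetic samples according to a topology over the minority class (for example star, bus, or mesh) rather than along random nearest-neighbour pairs. In the star topology, samples lie on the segments joining each minority sample to the minority-class centroid. The deterministic sketch below assumes evenly spaced points on each segment; that spacing rule, the function name, and the toy data are assumptions, and the pf-SMOTE implementations compared in the study may distribute samples differently.

```python
import math


def pf_smote_star(minority, n_synthetic):
    """Star-topology sketch: synthetic rows lie on the segments that join
    each minority row to the minority-class centroid."""
    dim = len(minority[0])
    centroid = [sum(row[d] for row in minority) / len(minority) for d in range(dim)]
    per_row = math.ceil(n_synthetic / len(minority))  # points per segment
    synthetic = []
    for row in minority:
        for step in range(1, per_row + 1):
            t = step / (per_row + 1)  # evenly spaced, endpoints excluded
            synthetic.append([r + t * (c - r) for r, c in zip(row, centroid)])
    return synthetic[:n_synthetic]  # trim any surplus from rounding up


# Hypothetical minority class at the corners of a square; centroid is (2, 2)
square = [[0, 0], [4, 0], [0, 4], [4, 4]]
print(pf_smote_star(square, 4))
# [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0], [3.0, 3.0]]
```

Consistent with the partition-then-oversample order described above, the generated rows would be added to the training subset only, before fitting the Decision Tree, Random Forest, or Extra Trees classifier.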
1D and 2D Feature Extraction Based on AAC and DC Protein Descriptors for Classification of Acetylation in Lysine Proteins using Convolutional Neural Network Faisal, Mohammad Reza; Adawiyah, Laila; Saragih, Triando Hamonangan; Kartini, Dwi; Herteno, Rudy; Lumbanraja, Favorisen Rosyking; Handayani, Lilies; Solechah, Siti Aisyah
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.458

Abstract

Post-Translational Modification (PTM) denotes a biochemical alteration of an amino acid that plays crucial roles in protein activity, function, and the regulation of protein structure. Recognizing PTMs provides a fundamental basis for understanding biological processes, developing therapeutic interventions for diseases, and developing pharmaceutical agents. Computational (in silico) approaches offer an efficient and cost-effective way to identify PTM sites quickly. Protein classification begins by extracting features from protein sequences, which are then transformed into numerical features for use by classification algorithms. Common feature extraction methods use protein descriptors such as Amino Acid Composition (AAC) and Dipeptide Composition (DC). However, these descriptors neglect important details of the amino acid sequence. Moreover, both techniques generate a limited number of one-dimensional (1D) features, which may not be ideal for a Convolutional Neural Network (CNN) classifier. This study presents a novel approach that increases feature diversity through protein sequence segmentation, using adjacent and overlapping segment strategies. It also arranges the features in 1D and 2D formats so that they can be processed by 1D CNN and 2D CNN classifiers. The findings show that multiplying the number of protein sequence segments in a 2D configuration can improve the accuracy of acetylation classification in lysine proteins. The highest accuracies achieved with AAC- and DC-based feature extraction are 77.39% and 76.75%, respectively.
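AAC is a 20-value vector of residue frequencies and DC a 400-value vector of adjacent-pair frequencies. The sketch below computes both per segment and arranges the results as a 2D grid (one row per segment) or a flattened 1D vector. The segment count, window size, step, and all function names are assumptions chosen for illustration, not the settings or code used in the study.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq):
    """Amino Acid Composition: relative frequency of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide Composition: relative frequency of each of the 400 adjacent pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [pairs[d] / (len(seq) - 1) for d in DIPEPTIDES]


def adjacent_segments(seq, n):
    """n contiguous, non-overlapping pieces; the last piece absorbs any remainder."""
    size = len(seq) // n
    return [seq[i * size:(i + 1) * size] if i < n - 1 else seq[i * size:] for i in range(n)]


def overlapping_segments(seq, size, step):
    """Sliding windows of length `size` moved by `step`; step < size gives overlap."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def as_2d(segments, descriptor):
    """2D layout for a 2D CNN: one row per segment, one column per feature."""
    return [descriptor(s) for s in segments]


def as_1d(matrix):
    """1D layout for a 1D CNN: the rows of the 2D layout concatenated."""
    return [value for row in matrix for value in row]


# Hypothetical toy peptide
peptide = "ACDEFGHIKL"
grid = as_2d(overlapping_segments(peptide, size=4, step=2), aac)  # 4 x 20
vector = as_1d(grid)  # length 80
```

Segmenting multiplies the number of features (here 4 segments x 20 AAC values instead of 20), which is the effect the study exploits to feed richer inputs to a CNN.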
Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models Junaidi, Ridha Fahmi; Faisal, Mohammad Reza; Farmadi, Andi; Herteno, Rudy; Nugrahadi, Dodon Turianto; Ngo, Luu Duc; Abapihi, Bahriddin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.465

Abstract

Baby cries contain patterns that indicate a baby's needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries using Mel Spectrogram images. The primary objective is to compare the effectiveness of various CNN architectures, namely VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152, in detecting a baby's needs from its cries. The datasets used are the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance, with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. Other models such as ResNet-50 and LeNet-5 also performed well, although their computational efficiency varied, while VGG-16 and VGG-19 performed worse. This research makes significant contributions to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
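A Mel spectrogram pools short-time power spectra through triangular filters whose centres are equally spaced on the mel scale, giving finer resolution at low frequencies than at high ones. The sketch below uses the common HTK mel formula; the band count, frequency range, and function names are assumptions because the study's spectrogram settings are not given here, and libraries such as librosa default to the slightly different Slaney variant.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 frequencies (Hz) equally spaced on the mel scale; each
    consecutive triple is (lower edge, centre, upper edge) of one filter."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def triangular_weight(f, left, centre, right):
    """Weight of one triangular mel filter at frequency f (Hz)."""
    if left < f <= centre:
        return (f - left) / (centre - left)
    if centre < f < right:
        return (right - f) / (right - centre)
    return 0.0


# Hypothetical setting: 8 mel bands between 0 Hz and 8 kHz
edges = mel_band_edges(8, 0.0, 8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]  # grow with frequency
```

Each spectrogram frame's power at frequency f is multiplied by `triangular_weight` for every band and summed, and the resulting band energies (usually log-scaled) form one column of the image fed to the CNN.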
Telecommunication Customer Churn Prediction with Optimized Feature Selection and Hyperparameter Tuning in the C4.5 Classification Algorithm Antoh, Soterio; Herteno, Rudy; Budiman, Irwan; Kartini, Dwi; Mazdadi, Muhammad Itqan
Jurnal Sistem Informasi Bisnis Vol 15, No 1 (2025): Volume 15 Number 1 Year 2025
Publisher : Diponegoro University

DOI: 10.14710/vol15iss1pp60-67

Abstract

In the telecommunications industry, predicting customer churn is crucial for maintaining business sustainability. High churn rates can negatively impact profitability, necessitating effective retention strategies. This research aims to enhance the accuracy of telecommunications customer churn prediction by optimizing the C4.5 classification algorithm through feature selection and hyperparameter tuning. The methods used include Information Gain for feature selection and hyperparameter tuning with Random Search and Grid Search. This study utilizes the Telco Customer Churn dataset from Kaggle, split into an 80:20 ratio for training and testing data. Six approaches are applied: (1) the basic C4.5 algorithm, (2) C4.5 with Information Gain, (3) C4.5 with Random Search, (4) C4.5 with Grid Search, (5) C4.5 with a combination of Information Gain and Random Search, and (6) C4.5 with a combination of Information Gain and Grid Search. The results indicate that the C4.5 algorithm alone achieves an accuracy of 74.09%, while applying Information Gain increases accuracy to 78.42%. Hyperparameter tuning with Random Search achieves the highest accuracy of 80.05%, whereas Grid Search reaches 77.71%. Combining Information Gain with Random Search results in an accuracy of 78.99%, while combining Information Gain with Grid Search yields an accuracy of 78.85%. These findings suggest that hyperparameter tuning using Random Search significantly improves accuracy compared to other methods, while Information Gain feature selection does not have a significant impact on performance in this context.
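Information Gain measures how much knowing a feature reduces the entropy of the churn label, and C4.5 chooses its splits by gain ratio, which is information gain divided by the feature's own split entropy. The sketch below computes both for categorical features using only the standard library; the feature names and toy rows are hypothetical, and C4.5's handling of numeric thresholds and missing values is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """C4.5 split criterion: information gain normalised by split information."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy rows: contract type separates churners perfectly, gender does not
contract = ["monthly", "monthly", "yearly", "yearly"]
gender = ["f", "m", "f", "m"]
churn = ["yes", "yes", "no", "no"]
```

Ranking features by `information_gain` and keeping the top-scoring ones gives the filter step; Random Search and Grid Search are a separate outer loop over the tree's hyperparameters.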
Image Classification of Traditional Indonesian Cakes Using Convolutional Neural Network (CNN) Azizah, Azkiya Nur; Budiman, Irwan; Indriani, Fatma; Faisal, M. Reza; Herteno, Rudy
Computer Engineering and Applications Journal (ComEngApp) Vol. 13 No. 2 (2024)
Publisher : Universitas Sriwijaya

Abstract

Indonesia is famous for its traditional cuisine. Traditional Indonesian cakes are snacks typical of the archipelago's culture and vary widely in texture, shape, and color; some look similar, so many people still do not know the names of the many types of traditional Indonesian cakes. This problem can be addressed with an image recognition system trained to classify the various types of traditional Indonesian cakes. This research uses a Convolutional Neural Network with the AlexNet architecture to predict the type of traditional Indonesian cake. The dataset consists of 1,846 images in 8 cake classes. The AlexNet model was trained with several optimizers, namely Adam, SGD, and RMSprop. The best parameters from model testing were a batch size of 16, 50 epochs, a learning rate of 0.01 for the SGD optimizer, and a learning rate of 0.001 for the Adam and RMSprop optimizers. Each optimizer produced different accuracy, precision, recall, and F1-score values. The best test result on the traditional Indonesian cake image dataset was obtained with the Adam optimizer, at an accuracy of 79%.
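The three optimizers differ in how each update step is scaled. The sketch applies the textbook SGD, RMSprop, and Adam update rules to a one-dimensional toy loss f(x) = (x - 3)^2, using the learning rates reported above; the decay constants (0.9, 0.999, 1e-8), the step count, and the toy loss are assumptions, and nothing else about the AlexNet training setup is modelled.

```python
import math


def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def sgd(x, lr, steps):
    """Plain SGD: step proportional to the raw gradient."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def rmsprop(x, lr, steps, rho=0.9, eps=1e-8):
    """RMSprop: divide the gradient by a running RMS of past gradients."""
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(x)
        avg_sq = rho * avg_sq + (1 - rho) * g * g
        x -= lr * g / (math.sqrt(avg_sq) + eps)
    return x


def adam(x, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected running means of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment
        v = beta2 * v + (1 - beta2) * g * g  # second moment
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Learning rates as reported above; 500 steps from x = 0 is an arbitrary choice
x_sgd = sgd(0.0, lr=0.01, steps=500)
x_rms = rmsprop(0.0, lr=0.001, steps=500)
x_adam = adam(0.0, lr=0.001, steps=500)
```

SGD's step shrinks as the gradient shrinks, while RMSprop and Adam normalise by a running gradient magnitude, so each of their steps moves x by roughly the learning rate. This is one reason a smaller learning rate is typical for the adaptive optimizers.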
An Empirical Study of Cross-Project and Within-Project Performance in Software Defect Prediction Models Using Tree-Based and Boosting Classifiers Raidra Zeniananto; Herteno, Rudy; Radityo Adi Nugroho; Andi Farmadi; Setyo Wahyu Saputro
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 3 (2025): August
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i3.95

Abstract

Software Defect Prediction (SDP) is a vital process in modern software engineering aimed at identifying faulty components in the early stages of development. In this study, we conducted a comprehensive evaluation of two widely employed SDP approaches, Within-Project Software Defect Prediction (WP-SDP) and Cross-Project Software Defect Prediction (CP-SDP), using identical preprocessing steps to ensure an objective comparison. We utilized the NASA MDP dataset, where each project was split into 70% training and 30% testing data, and applied three distinct resampling strategies—no sampling, oversampling, and undersampling—to address the challenge of class imbalance. Five classification algorithms were examined, including Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), and LightGBM (LGBM). Performance was measured primarily using Accuracy and Area Under the Curve (AUC) metrics, resulting in 360 experimental outcomes. Our findings revealed that WP-SDP, combined with oversampling and Random Forest, demonstrated superior predictive capability on most projects, achieving an Accuracy of 89.92% and an AUC of 0.931 on PC4. Nonetheless, CP-SDP excelled in certain small-scale projects (e.g., MW1), underscoring its potential when local historical data is scarce but inter-project characteristics remain sufficiently similar. This study’s results underscore the importance of selecting a prediction scheme tailored to specific project attributes, class imbalance levels, and available historical data. By establishing a standardized methodological framework, our work contributes to a clearer understanding of the strengths and limitations of WP-SDP and CP-SDP, paving the way for more effective defect detection strategies and improved software quality.
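The difference between the two schemes lies in where the training rows come from. The sketch below builds WP-SDP and CP-SDP train/test sets with a 70/30 split and optional random over- or undersampling. Training CP-SDP on the other projects' training splits, resampling only the training rows, and all names and toy data are assumptions, not the study's documented procedure.

```python
import random


def split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of one project's rows and cut it into train/test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def _by_label(rows):
    groups = {}
    for row in rows:
        groups.setdefault(row[-1], []).append(row)  # label is the last column
    return groups


def random_oversample(rows, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _by_label(rows)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out += g + [rng.choice(g) for _ in range(target - len(g))]
    return out


def random_undersample(rows, seed=0):
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    groups = _by_label(rows)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out += rng.sample(g, target)
    return out


def within_project(projects, target, resample=None):
    """WP-SDP: train and test on the 70/30 split of the same project."""
    train, test = split(projects[target])
    return (resample(train) if resample else train), test


def cross_project(projects, target, resample=None):
    """CP-SDP (assumed setup): train on the training splits of all other
    projects, test on the same held-out split of the target project."""
    _, test = split(projects[target])
    train = [row for name, rows in projects.items() if name != target
             for row in split(rows)[0]]
    return (resample(train) if resample else train), test


# Hypothetical projects: rows are [metric, defective_label]
projects = {
    "A": [[i, int(i < 4)] for i in range(10)],
    "B": [[i, int(i < 10)] for i in range(20)],
}
wp_train, wp_test = within_project(projects, "A", resample=random_oversample)
cp_train, cp_test = cross_project(projects, "A", resample=random_undersample)
```

Because `split` uses a fixed seed, both schemes are scored on the identical target test split, which keeps the comparison objective in the sense the study describes.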
Implementation of Extra Trees Classifier and Chi-Square Feature Selection for Early Detection of Liver Disease Al Ghifari, Muhammad Akmal; Budiman, Irwan; Saragih, Triando Hamonangan; Mazdadi, Muhammad Itqan; Herteno, Rudy; Rozaq, Hasri Akbar Awal
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, October 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.4261

Abstract

The imbalanced distribution of medical data makes it challenging to detect liver disease accurately, which matters because symptoms often remain unnoticed until advanced stages. This study examines the application of the Extra Trees Classifier algorithm and chi-square feature selection for early detection of liver disease. Compared to traditional methods such as Random Forest and SVM, the Extra Trees Classifier offers greater computational efficiency and better handling of imbalanced datasets, while chi-square feature selection helps identify the most relevant medical indicators. The data consists of five medical variables, likely laboratory test results from patient samples, with labels indicating classes A and B. The data is randomly divided with a ratio of 80% for each class. To address the class imbalance, the SMOTE technique was applied before the data was randomly split into 80% for training and 20% for testing, to ensure effective learning and evaluation of the model's performance. The results show that, with chi-square feature selection, the Extra Trees Classifier provides fairly accurate predictions in liver disease classification, with an accuracy of 82.6%, sensitivity of 85.5%, precision of 78.3%, and F1-Score of 81.7%. These results demonstrate a significant improvement over existing methods, and the proposed approach can help healthcare practitioners make timely diagnostic decisions, potentially reducing mortality through early intervention in liver disease cases.
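Chi-square feature selection scores each non-negative feature by how far its class-wise totals deviate from what would be expected if the feature were independent of the label; scikit-learn's `chi2` scorer follows the same idea. The sketch below ranks features that way and also checks the reported metrics, since F1 is the harmonic mean of precision and sensitivity (recall). The function names and toy values are hypothetical.

```python
def chi2_scores(X, y):
    """Chi-square score per non-negative feature: compares each class's summed
    feature values with the sums expected if the feature ignored the class."""
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in set(y):
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * y.count(c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices of the k features with the highest chi-square scores."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical lab values: feature 0 differs by class, feature 1 is constant
X = [[1, 5], [2, 5], [9, 5], [8, 5]]
y = ["A", "A", "B", "B"]
print(select_k_best(X, y, 1))            # [0]
print(round(f1_score(0.783, 0.855), 3))  # 0.817, matching the reported F1-Score
```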
Feature-Selection-Based Intrusion Detection System Using a Combination of Information Gain Ratio and Correlation Filters Putri, Nitami Lestari; Nugroho, Radityo Adi; Herteno, Rudy
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 8 No 3: June 2021
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.0813154

Abstract

An Intrusion Detection System (IDS) is a system developed to monitor and filter network activity by identifying attacks. Because the volume of data an IDS must inspect is very large, and many irrelevant features make it difficult to detect suspicious behavior patterns, an IDS needs to reduce the amount of data it processes by reducing the number of features, which can be done through feature selection. This study combines two feature-ranking methods, Information Gain Ratio and Correlation, and classifies the data using the K-Nearest Neighbor algorithm. The rankings produced by the two methods are divided into two groups: the median value is computed for the first group, and the second group is removed. K-Nearest Neighbor classification is then performed using 10-fold cross-validation with k = 5. The proposed model achieves a highest accuracy of 99.61%, compared with a highest accuracy of 99.59% without feature selection.
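The classification stage described above (K-Nearest Neighbor with k = 5, evaluated by 10-fold cross-validation) can be sketched with the standard library as follows. The ranking-and-median combination of Information Gain Ratio and Correlation is not reproduced, so the hypothetical `keep_features` helper simply takes a list of already-selected feature indices; Euclidean distance, the helper names, and the toy records are assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=5):
    """Majority label among the k training rows nearest to `query`.
    Each training row is a (features, label) pair."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def keep_features(data, indices):
    """Project every row onto the selected feature indices."""
    return [([x[i] for i in indices], y) for x, y in data]


def cross_val_accuracy(data, k=5, folds=10, seed=0):
    """Mean k-NN accuracy over `folds`-fold cross-validation
    (assumes len(data) >= folds so no fold is empty)."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    scores = []
    for f in range(folds):
        test = rows[f::folds]  # every `folds`-th row, starting at f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


# Hypothetical traffic records: two informative features plus one noise feature
data = (
    [([0.1 * i, 0.0, i % 3], "normal") for i in range(20)]
    + [([10 + 0.1 * i, 10.0, i % 3], "attack") for i in range(20)]
)
print(cross_val_accuracy(keep_features(data, [0, 1])))  # 1.0
```

Comparing `cross_val_accuracy` on the full feature set with the score on a selected subset reproduces the with/without feature selection comparison reported in the study, on whatever data is supplied.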
Combining Filter- and Wrapper-Based Feature Selection Using Naive Bayes for Heart Disease Classification Azizah, Siti Roziana; Herteno, Rudy; Farmadi, Andi; Kartini, Dwi; Budiman, Irwan
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 10 No 6: December 2023
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.2023107467

Abstract

Heart disease is one of the leading causes of death, along with other diseases. In technology, data mining can be used to diagnose diseases from patient medical record data. For the classification of medical datasets, Naive Bayes is one of the best methods used. The purpose of this study is to compare the accuracy of Naive Bayes using several feature selection methods: Forward Selection, Backward Elimination, the union of the Forward Selection and Backward Elimination results, Information Gain, Gain Ratio, and the union of the Information Gain and Gain Ratio results. The data used in this research is heart disease data obtained from the UCI Machine Learning Repository. The highest accuracy, 91.80%, was achieved by Naive Bayes with the union of the Information Gain and Gain Ratio feature selection results, using an 80:20 ratio of training to test data. By comparison, Naive Bayes with the union of the Forward Selection and Backward Elimination results achieved an accuracy of only 83.61%.
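Forward Selection and Backward Elimination are wrapper methods: they repeatedly evaluate the classifier on candidate feature subsets and keep the change that improves its score most. The sketch below implements both greedy searches plus the union step. `score` stands in for training Naive Bayes on the 80% split and measuring accuracy on the 20% split; the toy scoring weights and feature names are hypothetical.

```python
def forward_selection(features, score):
    """Start empty; add the feature that most improves score(subset) until none does."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected


def backward_elimination(features, score):
    """Start full; drop the feature whose removal most improves score(subset) until none does."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        candidate = max(selected, key=lambda f: score([g for g in selected if g != f]))
        candidate_score = score([g for g in selected if g != candidate])
        if candidate_score <= best:
            break
        selected.remove(candidate)
        best = candidate_score
    return selected


def union(*subsets):
    """Combine selection results: every feature chosen by at least one method."""
    return sorted(set().union(*subsets))


# Hypothetical stand-in for Naive Bayes test accuracy on a feature subset
WEIGHTS = {"a": 0.30, "b": 0.20, "c": 0.25, "noise": -0.05}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)


fs = forward_selection(WEIGHTS, toy_score)     # ['a', 'c', 'b']
be = backward_elimination(WEIGHTS, toy_score)  # ['a', 'b', 'c']
print(union(fs, be))                           # ['a', 'b', 'c']
```

The same `union` helper applies to filter methods: take the top-ranked features from Information Gain and from Gain Ratio and merge the two lists.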