Articles

Impact of a Synthetic Data Vault for Imbalanced Class in Cross-Project Defect Prediction Putri Nabella; Rudy Herteno; Setyo Wahyu Saputro; Mohammad Reza Faisal; Friska Abadi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.409

Abstract

Software Defect Prediction (SDP) is crucial for ensuring software quality. However, class imbalance (CI) poses a significant challenge in predictive modeling. This study examines the effectiveness of the Synthetic Data Vault (SDV) in mitigating CI within Cross-Project Defect Prediction (CPDP). The study addresses CI across the ReLink, MDP, and PROMISE datasets by using SDV to augment the minority classes. Classification uses Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF), and model performance is evaluated using AUC and a t-test. The results consistently show that SDV performs better than SMOTE and other techniques across projects, with statistically significant improvements. KNN achieves the highest average AUC, with values of 0.695, 0.704, and 0.750. On ReLink, KNN shows a 16.06% improvement over the imbalanced data and 12.84% over SMOTE. Similarly, on MDP, KNN shows a 20.71% improvement over the imbalanced data and 10.16% over SMOTE. On PROMISE, KNN shows a 13.55% improvement over the imbalanced data and 7.01% over SMOTE. RF displays moderate performance, closely followed by LR and DT, while NB lags behind. The statistical significance of these findings is confirmed by t-tests, with all p-values below the 0.05 threshold. These findings underscore SDV's potential for enhancing CPDP outcomes and tackling CI challenges in SDP, with KNN as the best classification algorithm. SDV could prove to be a promising tool for enhancing defect detection and CI mitigation.
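The AUC metric and the percentage gains quoted in this abstract can be illustrated with a minimal sketch. It assumes the reported percentages are relative gains over a baseline AUC, which the abstract does not state, and the function names `auc` and `relative_improvement` are hypothetical. The labels and scores in the usage note are toy values, not data from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    labels: 1 for defective, 0 for clean. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent (assumed definition)."""
    return (new - baseline) / baseline * 100.0
```

For example, `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` ranks three of the four defective/clean pairs correctly, and `relative_improvement(0.75, 0.5)` expresses that result as a gain over a chance-level baseline.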
Analysis of Important Features in Software Defect Prediction Using Synthetic Minority Oversampling Techniques (SMOTE), Recursive Feature Elimination (RFE) and Random Forest Ghinaya, Helma; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi; Indriani, Fatma
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.453

Abstract

Software Defect Prediction (SDP) is essential for improving software quality during testing. As software systems grow more complex, accurately predicting defects becomes increasingly challenging. One of the challenges is dealing with imbalanced class distributions, where the number of defective instances is significantly lower than the number of non-defective ones. To tackle the class imbalance issue, this study uses the SMOTE technique. Random Forest is chosen as the classification algorithm because of its ability to handle non-linear data, its resistance to overfitting, and its ability to provide information about the importance of features in classification. This research aims to evaluate important features and measure accuracy in SDP using the SMOTE+RFE+Random Forest technique. The dataset used in this study is NASA MDP D'', which includes 12 datasets. The method combines SMOTE, RFE, and Random Forest, and the study is conducted in two stages. The first stage uses the RFE+Random Forest technique; the second stage adds the SMOTE technique before RFE and Random Forest to measure accuracy on the NASA MDP data. The results show that SMOTE enhances accuracy across most datasets, with the best performance achieved on the MC1 dataset with an accuracy of 0.9998. Feature importance analysis identifies "maintenance severity" and "cyclomatic density" as the most crucial features in data modeling for SDP. Therefore, the SMOTE+RFE+RF technique effectively improves prediction accuracy across various datasets and successfully addresses class imbalance issues.
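The oversampling step named in this abstract can be sketched as follows. This is a minimal illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the study. The function name `smote_sketch` and the parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric feature vectors (defective class).
    k: number of nearest minority neighbours considered for each sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Each synthetic point lies on the line segment between two existing defective samples, so the minority class grows without exact duplicates. In the two-stage design described above, a step like this would run before feature elimination and classification.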
1D and 2D Feature Extraction Based on AAC and DC Protein Descriptors for Classification of Acetylation in Lysine Proteins using Convolutional Neural Network Faisal, Mohammad Reza; Adawiyah, Laila; Saragih, Triando Hamonangan; kartini, Dwi; Herteno, Rudy; Lumbanraja, Favorisen Rosyking; Handayani, Lilies; Solechah, Siti Aisyah
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.458

Abstract

Post-Translational Modification (PTM) denotes a biochemical alteration observed in an amino acid, playing crucial roles in protein activity, functionality, and the regulation of protein structure. The recognition of associated PTMs serves as a fundamental basis for understanding biological processes, therapeutic interventions for diseases, and the development of pharmaceutical agents. Using computational approaches (in silico) offers an efficient and cost-effective means to identify PTM sites swiftly. The exploration of protein classification commences with extracting protein sequence features that are subsequently transformed into numerical features for utilization in classification algorithms. Feature extraction methodologies involve using protein descriptors like Amino Acid Composition (AAC) and Dipeptide Composition (DC). Yet, these approaches exhibit a limitation by neglecting crucial amino acid sequence details. Moreover, both descriptor techniques generate a limited number of 1-dimensional (1D) features, which may not be ideal for processing through the Convolutional Neural Network (CNN) classification method. This investigation presents a novel approach to enhance feature diversity through protein sequence segmentation techniques, employing adjacent and overlapping segment strategies. Furthermore, the study illustrates the organization of features into 1D and 2D formats to facilitate processing through 1D CNN and 2D CNN classification methodologies. The findings of this research endeavour highlight the potential for enhancing the accuracy of acetylation classification in lysine proteins through the multiplication of protein sequence segments in a 2D configuration. The highest accuracy achieved for AAC and DC-based feature extraction methods is 77.39% and 76.75%, respectively.
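The descriptors and segmentation strategies mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the 20-letter alphabet is the standard amino acid set, "adjacent" is taken to mean non-overlapping windows (`step == size`), "overlapping" is taken to mean a smaller `step`, and all function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq):
    """Amino Acid Composition: frequency of each residue (20 features)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide Composition: frequency of each adjacent pair (400 features)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


def segments(seq, size, step):
    """Split seq into windows; step == size is adjacent, step < size overlaps."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def features_2d(seq, size, step, descriptor=aac):
    """One descriptor row per segment, giving a 2D grid for a 2D CNN."""
    return [descriptor(s) for s in segments(seq, size, step)]


def features_1d(seq, size, step, descriptor=aac):
    """The same rows flattened into one long vector for a 1D CNN."""
    return [v for row in features_2d(seq, size, step, descriptor) for v in row]
```

Computing the descriptor per segment rather than once over the whole sequence keeps some positional information and multiplies the number of features, which is the motivation the abstract gives for segmentation.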
Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models Junaidi, Ridha Fahmi; Faisal, Mohammad Reza; Farmadi, Andi; Herteno, Rudy; Nugrahadi, Dodon Turianto; Ngo, Luu Duc; Abapihi, Bahriddin
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.465

Abstract

Baby cries contain patterns that indicate their needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries using Mel Spectrogram images. The primary objective of this research is to compare the effectiveness of various CNN architectures, namely VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152, in detecting baby needs based on their cries. The datasets used are the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance, with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. Other models such as ResNet-50 and LeNet-5 also performed well, although their computational efficiency varied, while VGG-16 and VGG-19 exhibited lower performance. This research contributes to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
Implementation of Extreme Learning Machine Method with Particle Swarm Optimization to Classify of Chronic Kidney Disease Muhammad Mursyidan Amini; Mazdadi, Muhammad Itqan; Muliadi, Muliadi; Faisal, Mohammad Reza; Saragih, Triando Hamonangan
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.561

Abstract

Chronic Kidney Disease (CKD) is a pathological condition caused by infection of the kidneys and blockages due to the formation of kidney stones. In Indonesia, kidney disease is the second most common disease after heart disease, according to BPJS Health data. Medical practitioners and specialists still face challenges in classifying CKD cases effectively, which makes them vulnerable to erroneous diagnostic conclusions. The main objective of this research is to increase the accuracy of CKD classification by incorporating Particle Swarm Optimization (PSO) into the Extreme Learning Machine (ELM), with the aim of optimizing the configuration of input weights and biases to achieve superior diagnostic results. The best results were obtained with 11 hidden nodes, a population size of 80, 20 iterations, an inertia weight of 0.5, and acceleration constants c1 and c2 each set to 1, yielding an accuracy of 98.50%. These results support the assertion that PSO optimization within ELM has the potential to yield major advances in classification for CKD diagnosis.
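The PSO settings reported in this abstract (population 80, 20 iterations, inertia weight 0.5, c1 = c2 = 1) can be plugged into the standard velocity and position update as a minimal sketch. The objective here is a caller-supplied function to minimize; in the study it would be the classification error of an ELM whose input weights and biases are encoded in each particle, which this sketch does not implement. The function name `pso_sketch` and the initialization range [-1, 1] are assumptions.

```python
import random


def pso_sketch(objective, dim, n_particles=80, n_iter=20,
               w=0.5, c1=1.0, c2=1.0, seed=0):
    """Minimize `objective` over dim-dimensional vectors with basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

With an ELM, `dim` would equal the number of input weights plus hidden biases (for example, 11 hidden nodes times the number of input features, plus 11), and the output weights would be solved analytically for each candidate particle.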
A Specific Marker Approach to Improve Object Recognition in Bullet Launchers with Computer Vision Ahmad, Umar Ali; Tresna, Wildan Panji; Sugiarto, Iyon Titok; Delimayanti, Mera Kartika; Mustofa, Fahmi Charish; Faisal, Mohammad Reza; Septiawan, Reza Rendian
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.460

Abstract

The capability of a computer vision system determines the accuracy of object recognition. This study tested a camera's ability to recognize both passive markers and active markers using LEDs. A specific active marker is analyzed using a blinking LED. One of the factors to consider when choosing a specific marker is the accuracy at a given duty cycle value. The proposed system is validated by implementing an integrated control system and the hardware to develop a specific marker. The results show that the commercial camera can recognize all colors used as test markers. Here, a specific marker was developed to improve the bullet launcher system's ability to track, identify, detect, mark, lock onto, and shoot a target precisely. In general, image processing time increased with pixel resolution: the higher the resolution, the longer the processing time. When the object moves at a certain speed, the camera can detect several marker shapes, such as circles, squares, and triangles. The results show that a circle marker gives higher accuracy at every speed level. In the duty cycle variation test, with the duty cycle set to 50%, the best accuracy is obtained with the red LED, reaching 96%. The LED test also found that lighting affects the color detection results for the LED. The LED configuration with the highest accuracy is therefore recommended for the implementation stage.
Classification of brain tumor based on shape and texture features and machine learning Rizki, M. Alfi; Faisal, Mohammad Reza; Farmadi, Andi; Saragih, Triando Hamonangan; Nugrahadi, Dodon Turianto; Bachtiar, Adam Mukharil; Keswani, Ryan Rhiveldi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 6 No. 4 (2024): November
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/27236g49

Abstract

Information from brain tumour visualisation using MRI can be used for brain tumour classification. The information can be extracted using different feature extraction techniques. This study compares shape-based feature extraction, namely Zernike Moment (ZM) and Pyramid Histogram of Oriented Gradients (PHOG), with texture-based feature extraction, namely Local Binary Patterns (LBP), Gray Level Co-occurrence Matrix (GLCM), and Histogram of Oriented Gradients (HOG), in brain tumour classification. This research aims to find out which feature extraction approach is better for handling brain tumour images, based on the accuracy and F1-score produced. It proposes combining the features within each approach, i.e. ZM+PHOG for shape-based feature extraction and LBP+GLCM+HOG for texture-based feature extraction, with both default parameters from the library and modified parameters configured based on previous research. The dataset comes from Kaggle and has three classes: meningioma, glioma, and pituitary. The machine learning classification models used are Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), and K-Nearest Neighbours (KNN), with default parameters from the library. The models were evaluated using 10-fold stratified cross-validation. Texture-based feature extraction with modified parameters achieved an accuracy and F1-score of 84% with RF classification. In comparison, shape-based feature extraction with modified parameters achieved an accuracy of 70% and an F1-score of 68% with RF classification. From these results, it can be concluded that texture-based feature extraction handles brain tumour images better than shape-based feature extraction. This study suggests that focusing on texture details in feature extraction can significantly improve classification performance in medical imaging such as brain tumours.
Analisis Perbandingan Metode Harmonic Mean dan Local Mean Vector Dalam Penyeleksian Tetangga Pada Algoritma KNN Said, Muhammad Al Ichsan Nur Rizqi; Faisal, Mohammad Reza; Kartini, Dwi; Budiman, Irwan; Saragih, Triando Hamonangan
Jurnal Sains dan Informatika Vol. 9 No. 2 (2023): Jurnal Sains dan Informatika
Publisher : Teknik Informatika, Politeknik Negeri Tanah Laut

DOI: 10.34128/jsi.v9i2.376

Abstract

The K-Nearest Neighbour (KNN) algorithm is a classification algorithm that has been used in many studies, but it has several weaknesses, one of which is the choice of the number of nearest neighbours. If the number of nearest neighbours is too small, the classifier is sensitive to noise; if it is too large, the neighbourhood may include outlier neighbours from other classes. Majority Voting is also a simple method, which can be a problem when distances vary. One solution to the outlier problem is to use the Local Mean Vector, supplemented by the Harmonic Mean. This study aims to compare the performance of the final neighbour selection obtained using the Local Mean Vector and Harmonic Mean. The results show that neighbour selection based on the Local Mean Vector and Harmonic Mean gives better accuracy, at 0.78, than the Majority Voting technique, with an accuracy of 0.75.
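One way to combine local mean vectors with a harmonic mean is sketched below. The abstract does not specify the exact formulation, so this follows one common variant as an assumption: for each class, the means of the first 1, 2, ..., k nearest same-class neighbours are computed, and the class whose local means have the smallest harmonic mean distance to the query wins. All function names are hypothetical.

```python
import math


def local_means(query, points, k):
    """Means of the 1, 2, ..., k nearest points of one class to the query."""
    nearest = sorted(points, key=lambda p: math.dist(query, p))[:k]
    running = [0.0] * len(query)
    means = []
    for r, p in enumerate(nearest, start=1):
        running = [s + v for s, v in zip(running, p)]
        means.append([s / r for s in running])
    return means


def harmonic_mean_distance(query, means):
    """Harmonic mean of the distances from the query to each local mean."""
    dists = [math.dist(query, m) for m in means]
    if any(d == 0.0 for d in dists):
        return 0.0  # the query coincides with a local mean
    return len(dists) / sum(1.0 / d for d in dists)


def classify(query, train, k=3):
    """train maps each class label to a list of feature vectors."""
    return min(
        train,
        key=lambda label: harmonic_mean_distance(
            query, local_means(query, train[label], k)
        ),
    )
```

Because each class is summarized by averaged neighbours rather than by raw votes, a single outlier from another class has less influence than it would under Majority Voting, and the harmonic mean gives more weight to the closest local means.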
Effect of SMOTE Variants on Software Defect Prediction Classification Based on Boosting Algorithm Rahmina Ulfah Aflaha; Rudy Herteno; Mohammad Reza Faisal; Friska Abadi; Setyo Wahyu Saputro
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 2 (2024): June
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i2.28521

Abstract

Detecting software defects early is critical for avoiding significant financial losses. However, building accurate software defect prediction models can be challenging due to class imbalance, where the data for defective modules is much scarcer than for standard modules. This research addresses this issue using the imbalanced NASA MDP dataset. It proposes a method that combines a data-level balancing approach, using 14 variations of the SMOTE algorithm to increase the amount of defective module data, with an algorithm-level approach in which three boosting algorithms, CatBoost, LightGBM, and Gradient Boosting, classify modules as defective or non-defective. These methods aim to improve the accuracy of software defect prediction. The results show that this method can produce a more accurate classification than previous studies. The DSMOTE and Gradient Boosting pair has the highest average accuracy (0.9161). The DSMOTE and CatBoost pair achieved the highest average AUC (0.9637). The ADASYN kernel and CatBoost pair achieved the best average G-mean (0.9154). The research contributes to software defect prediction by developing new techniques and evaluating their effectiveness in addressing class imbalance.
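The G-mean reported alongside accuracy and AUC can be computed as the geometric mean of sensitivity and specificity. The sketch below uses that standard definition, with label 1 assumed to mean defective. The function names and the toy labels in the usage note are illustrative, not taken from the study.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of modules classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (defective recall) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sample of each class")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

On imbalanced data the two metrics can disagree sharply: with `y_true = [1, 1, 0, 0, 0, 0]`, a model that predicts every module as non-defective still gets two thirds of the labels right, yet its G-mean is zero because it finds no defective modules.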
Enhancing Natural Disaster Monitoring: A Deep Learning Approach to Social Media Analysis Using Indonesian BERT Variants Karlina Elreine Fitriani; Mohammad Reza Faisal; Muhammad Itqan Mazdadi; Fatma Indriani; Dodon Turianto Nugrahadi; Septyan Eka Prastya
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 1 (2025): February
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/t158qq37

Abstract

Social media has become a primary source of real-time information that can be leveraged by artificial intelligence to identify relevant messages, thereby enhancing disaster management. The rapid dissemination of disaster-related information through social media allows authorities to respond to emergencies more effectively. However, filtering and accurately categorizing these messages remains a challenge due to the vast amount of unstructured data that must be processed efficiently. This study compares the performance of IndoRoBERTa, IndoRoBERTa MLM, IndoDistilBERT, and IndoDistilBERT MLM in classifying social media messages about natural disasters into three categories: eyewitness, non-eyewitness, and don’t know. Additionally, this study analyzes the impact of batch size on model performance to determine the optimal batch size for each type of disaster dataset. The dataset used in this study consists of 1000 messages per category related to natural disasters in the Indonesian language, ensuring sufficient data diversity. The results show that IndoDistilBERT achieved the highest accuracy of 81.22%, followed by IndoDistilBERT MLM at 80.83%, IndoRoBERTa at 79.17%, and IndoRoBERTa MLM at 78.72%. Compared to previous studies, this study demonstrates a significant improvement in classification accuracy and model efficiency, making it more reliable for real-world disaster monitoring. Pre-training with MLM enhances IndoRoBERTa’s sensitivity and IndoDistilBERT’s specificity, allowing both models to better understand context and optimize classification results. Additionally, this study identifies the optimal batch sizes for each disaster dataset: 32 for floods, 128 for earthquakes, and 256 for forest fires, contributing to improved model performance. These findings confirm that this approach significantly improves classification accuracy, supporting the development of machine learning-based early warning systems for disaster management. 
This study highlights the potential for further model optimization to enhance real-time disaster response and improve public safety measures more effectively and efficiently.
Co-Authors Abdul Gafur Abdullayev, Vugar Achmad Zainudin Nur Adawiyah, Laila Admi Syarif Ahmad Rusadi Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Andi Farmadi Andi Farmadi Andi Farmadi Angga Maulana Akbar Annisa Rizqiana Arie Sapta Nugraha Arif, Nuuruddin Hamid Arifin Hidayat Azizah, Azkiya Nur Bachtiar, Adam Mukharil Bahriddin Abapihi Bayu Hadi Sudrajat Dike Bayu Magfira, Dike Bayu Djordi Hadibaya Dodon Turianto Nugrahadi Dwi Kartini Dwi Kartini Dwi Kartini Dwi Kartini, Dwi Emma Andini Fatma Indriani Fatma Indriani Fatma Indriani Favorisen R. Lumbanraja Fitra Ahya Mubarok Fitriyana, Silfia Friska Abadi Friska Abadi Friska Abadi Ghinaya, Helma Hanif Rahardian Herteno, Rudy Irwan Budiman Irwan Budiman Irwan Budiman Ivan Sitohang Julius Tunggono Jumadi Mabe Parenreng Junaidi, Ridha Fahmi Karlina Elreine Fitriani Keswani, Ryan Rhiveldi Kevin Yudhaprawira Halim Kurnianingsih, Nia Lilies Handayani Liling Triyasmono Lisnawati Mahmud Mahmud Mauldy Laya Mera Kartika Delimayanti Miftahul Muhaemen Muflih Ihza Rifatama Muhamad Ihsanul Qamil Muhammad Al Ichsan Nur Rizqi Said Muhammad Alkaff Muhammad Angga Wiratama Muhammad Fauzan Nafiz Muhammad Haekal Muhammad Haekal Muhammad Iqbal Muhammad Irfan Saputra Muhammad Itqan Mazdadi Muhammad Janawi Muhammad Khairi Ihsan Muhammad Mada Muhammad Mursyidan Amini Muhammad Rizky Adriansyah Muhammad Rusli Muhammad Sholih Afif Muhammad Zaien MUJIZAT KAWAROE Muliadi Muliadi Muliadi Muliadi Aziz Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Mustofa, Fahmi Charish Ngo, Luu Duc Nor Indrani Noryasminda Nugrahadi, Dodon Nurlatifah Amini Nursyifa Azizah Oni Soesanto Prastya, Septyan Eka Purnajaya, Akhmad Rezki Putri Nabella Radityo Adi Nugroho Radityo Adi Nugroho Rahayu, Fenny Winda Rahmad Ubaidillah Rahmat Ramadhani Rahmat Ramadhani Rahmina Ulfah Aflaha Ratna Septia Devi RAUDLATUL MUNAWARAH Reina Alya Rahma Reza Rendian Septiawan Riadi, Putri Agustina Rinaldi Riza Susanto Banner 
Rizal, Muhammad Nur Rizki, M. Alfi Rizky, Muhammad Hevny Rossyking, Favorisen Rozaq, Hasri Akbar Awal Rudy Herteno Rudy Herteno Rudy Herteno Rudy Herteno Said, Muhammad Al Ichsan Nur Rizqi SALLY LUTFIANI Salsabila Anjani Saputro, Setyo Wahyu Saragih, Triando Hamonangan Sarah Monika Nooralifa Sari, Risna Sa’diah, Halimatus Septyan Eka Prastya Septyan Eka Prastya Setyo Wahyu Saputro Setyo Wahyu Saputro Siti Aisyah Solechah Solly Aryza Sri Redjeki Sri Redjeki Sugiarto, Iyon Titok Sulastri Norindah Sari Suryadi, Mulia Kevin Tri Mulyani Triando Hamonangan Saragih Umar Ali Ahmad Utami, Juliyatin Putri Vina Maulida, Vina Wahyu Caesarendra Wahyu Dwi Styadi Wahyudi Wahyudi Wildan Panji Tresna Winda Agustina Yenni Rahman YILDIZ, Oktay Yudha Sulistiyo Wibowo Yunida, Rahmi