Articles

The Effectiveness of Data Imputations on Myocardial Infarction Complication Classification Using Machine Learning Approach with Hyperparameter Tuning
Mazdadi, Muhammad Itqan; Saragih, Triando Hamonangan; Budiman, Irwan; Farmadi, Andi; Tajali, Ahmad
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 3 (2024): September
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i3.29479

Abstract

Myocardial Infarction (MI), whose complications are the focus of this study, is a critical medical emergency caused by the blockage of blood flow to the heart muscle, primarily due to a blood clot in a coronary artery narrowed by atherosclerotic plaque. Diagnosing MI involves physical examination, electrocardiogram (ECG) evaluation, blood sample analysis for specific heart enzyme levels, and imaging techniques such as coronary angiography. Predicting acute MI complications early can mitigate adverse outcomes, and this study focuses on such early prediction using classification methods. The machine learning algorithms Support Vector Machine (SVM), Random Forest, and XGBoost were employed to classify patient medical records. K-Nearest Neighbors (KNN) imputation, Iterative Imputation, and MissForest were used to fill in the missing values of the incomplete dataset while preserving vital information. Hyperparameter optimization, which is crucial for model performance, was performed using Bayesian Optimization, which models past evaluations to choose hyperparameters that minimize the objective function. The contribution of this study is to measure how strongly data imputation affects machine learning classification on data with missing values, and how strongly the optimization method affects hyperparameter tuning. Results showed that Iterative Imputation yielded excellent performance with the SVM and XGBoost algorithms. SVM achieved 100% accuracy, precision, sensitivity, F1 score, and AUC. XGBoost reached 99.4% accuracy, 100% precision, 79.6% sensitivity, an F1 score of 88.7%, and an AUC of 0.898. KNN Imputation with SVM produced results similar to Iterative Imputation with SVM, while Random Forest produced poor classification results because data imbalance caused overfitting.
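The abstract names KNN imputation but gives no implementation details. As an illustration only, the sketch below implements the general technique in plain Python, loosely following the behaviour of scikit-learn's `KNNImputer`: a NaN-aware Euclidean distance computed on co-observed features, and an unweighted mean over the k nearest rows that actually observe the missing feature. The use of `None` for missing cells and the function names are assumptions made for this example. This is not the authors' code.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing cell


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over co-observed features, scaled up for the missing ones."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(len(a) / len(pairs) * squared)


def knn_impute(rows: List[Row], k: int = 5) -> List[List[float]]:
    """Replace each missing cell with the mean of that feature over its k nearest donor rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Donors are the other rows that actually observe feature j.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            if not donors:
                raise ValueError(f"feature {j} has no observed values to borrow from")
            donors.sort(key=lambda r: nan_euclidean(row, r))
            nearest = donors[:k]
            new_row.append(sum(r[j] for r in nearest) / len(nearest))
        filled.append(new_row)
    return filled


data = [[1.0, 2.0], [1.1, None], [5.0, 8.0], [0.9, 2.2]]
print(knn_impute(data, k=2))  # the gap in row 1 is filled from rows 0 and 3
```

Distances are always computed on the original rows, so a value imputed for one row never influences the imputation of another row.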
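Implementation of Principal Component Analysis (PCA) and Gap Statistic for Breast Cancer Clustering with the K-Means Algorithm [original title: Implementasi Principal Component Analysis (PCA) dan Gap Statistic untuk Clustering Kanker Payudara pada Algoritma K-Means]
Afifa, Ridha; Mazdadi, Muhammad Itqan; Saragih, Triando Hamonangan; Indriani, Fatma; Muliadi, Muliadi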
Sistemasi: Jurnal Sistem Informasi Vol 13, No 5 (2024)
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v13i5.4015

Abstract

Breast cancer is one of the most common causes of death worldwide. Data mining can be used to detect breast cancer by extracting valuable information from data. Clustering breast cancer data helps medical professionals group the characteristics of each cancer type. However, multicollinearity in breast cancer data can distort clustering results. To address this issue, the dimensionality is reduced with Principal Component Analysis (PCA), which handles multicollinearity and improves computational efficiency. In addition, K-Means cannot determine the optimal number of clusters by itself, so the Gap Statistic method is used to find the optimal K for the breast cancer data. This study compares the evaluation results of three clustering models: K-Means alone, PCA combined with K-Means, and PCA combined with the Gap Statistic and K-Means. The findings indicate that K-Means with PCA dimensionality reduction and the optimal K from the Gap Statistic outperforms K-Means without dimensionality reduction. The Gap Statistic suggests 2 clusters as the optimal number, with an evaluation result of 1.195513.
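The Gap Statistic compares the log within-cluster dispersion log W_k of the data with its expected value under uniform reference data, Gap(k) = E*[log W*_k] - log W_k. It then picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. Below is a minimal plain-Python sketch of that rule. It assumes uniform reference sets drawn from the data's bounding box and a simple k-means with deterministic farthest-first initialisation. The PCA step is omitted, and all names are illustrative rather than taken from the study.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def log_dispersion(points: List[Point], k: int) -> float:
    """log W_k, where W_k is the pooled within-cluster sum of squares."""
    labels = kmeans(points, k)
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            centre = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sq_dist(p, centre) for p in members)
    return math.log(total)


def gap_statistic(points: List[Point], k_max: int, n_refs: int = 10,
                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """Gap(k) and s_k for k = 1..k_max, using uniform reference sets over the bounding box."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        refs = []
        for _ in range(n_refs):
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            refs.append(log_dispersion(ref, k))
        mean_ref = sum(refs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in refs) / n_refs)
        gaps.append(mean_ref - log_dispersion(points, k))
        s.append(sd * math.sqrt(1 + 1 / n_refs))
    return gaps, s


def choose_k(gaps: List[float], s: List[float]) -> int:
    """Smallest k such that Gap(k) >= Gap(k+1) - s_{k+1}."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)


blob_a = [(0.1 * i, 0.1 * (i % 3)) for i in range(5)]
blob_b = [(10 + 0.1 * i, 10 + 0.1 * (i % 3)) for i in range(5)]
gaps, s = gap_statistic(blob_a + blob_b, k_max=3)
print(choose_k(gaps, s))
```

With two well-separated blobs, the gap rises sharply from k = 1 to k = 2. Raising `n_refs` makes s_k more stable at the cost of runtime.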
Effectiveness of SMOTE in Enhancing Adult Autism Spectrum Disorder Diagnosis Predictive Performance With Missforest Imputation And Random Forest
Musyaffa, Muhammad Hafizh; Saragih, Triando Hamonangan; Nugrahadi, Dodon Turianto; Kartini, Dwi; Farmadi, Andi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.66

Abstract

Autism Spectrum Disorder (ASD), originally described by Leo Kanner in 1943, is a complex developmental condition that manifests through social, emotional, and behavioral challenges, often including speech delays and difficulties in interpersonal interactions. Despite significant advancements in diagnostic criteria over the years, accurate diagnosis of ASD in adults remains challenging due to limited access to comprehensive datasets and inherent methodological constraints. The Autism Screening Adult dataset used in this study exemplifies these issues, as it contains missing values and exhibits a marked class imbalance, both of which can adversely affect model performance. To address these challenges, we proposed a framework that integrates Random Forest classification with MissForest imputation and the Synthetic Minority Over-sampling Technique (SMOTE). MissForest effectively imputes missing data by employing an iterative random forest approach that preserves the underlying structure of the data without relying on strict parametric assumptions. Meanwhile, SMOTE generates synthetic samples for the minority class, thereby balancing the dataset and reducing prediction bias. Experimental evaluation through 10-Fold Cross Validation demonstrated that the application of SMOTE significantly enhanced model performance. Notably, the overall accuracy improved from 70.17% to 79.32%, and the AUC-ROC increased from 47.13% to 85.84%, indicating a robust improvement in the model’s ability to distinguish between positive and negative cases. These results underscore the critical importance of addressing data imbalance and missing values in predictive modeling for ASD. The promising outcomes of this study provide a solid foundation for developing more reliable diagnostic tools for adult ASD, and future research may further refine feature selection and incorporate additional data sources to optimize performance even further.
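SMOTE creates a synthetic sample by taking a minority-class point, picking one of its k nearest minority-class neighbours, and placing a new point at a random position on the segment between them. The following minimal, library-free sketch shows that idea. It draws base samples at random, and the function name and parameters are illustrative assumptions rather than the study's implementation. Synthetic points are normally generated only from training data, for example inside each training fold of the cross-validation, so that they do not leak into evaluation.

```python
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolating towards minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample, excluding itself.
        others = sorted(
            (p for m, p in enumerate(minority) if m != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, n_new=3, k=2))
```

Because every new point lies between two real minority samples, SMOTE fills in the minority region rather than duplicating existing rows, which is what reduces the prediction bias described above.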
Hybrid Feature Selection and Balancing Data Approach for Improved Software Defect Prediction
Febrian, Muhamad Michael; Saputro, Setyo Wahyu; Saragih, Triando Hamonangan; Abadi, Friska; Herteno, Rudy
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.67

Abstract

Software Defect Prediction (SDP) plays a vital role in identifying defects within software modules. Accurate early detection of software defects can reduce development costs and enhance software reliability. However, SDP remains a significant challenge in the software development lifecycle. This study employs Particle Swarm Optimization (PSO) and addresses several challenges associated with its application, including noisy attributes, high-dimensional data, and imbalanced class distribution. To address these challenges, this study proposes a hybrid filter-based feature selection and class balancing method. The feature selection process incorporates Chi-Square (CS), Correlation-Based Feature Selection (CFS), and Correlation Matrix-Based Feature Selection (CMFS), which have been shown to be effective in reducing noisy and redundant attributes. Additionally, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to mitigate class imbalance in the dataset. The K-Nearest Neighbors (KNN) algorithm is employed as the classification model due to its simplicity, non-parametric nature, and suitability for handling the feature subsets produced. Performance is evaluated with the Area Under the Curve (AUC) metric, and differences between methods are tested at a significance level of 0.05. The proposed method achieved an AUC of 0.872, demonstrating its effectiveness in enhancing predictive performance. It was also significantly better than other combinations such as PSO SMOTE (p = 0.0043), PSO SMOTE CS (p = 0.0091), PSO SMOTE CFS (p = 0.0111), and PSO SMOTE CFS CMFS (p = 0.0007). These findings show that the proposed method significantly improves the efficiency and accuracy of PSO in software defect prediction tasks. This hybrid strategy shows strong potential as a robust solution for future research and applications in predictive software quality assurance.
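Of the three filters, Chi-Square is the simplest to illustrate. The sketch below scores non-negative features the way scikit-learn's `chi2` does: it compares the observed per-class sum of each feature with the sum expected from the class frequencies, then keeps the top-k features. CFS, CMFS, PSO, and SMOTE are not shown, and the helper names and example metrics are hypothetical.

```python
from typing import List, Sequence


def chi2_scores(X: List[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square statistic per non-negative feature (observed vs expected class sums)."""
    n = len(y)
    class_sizes = {c: sum(1 for label in y if label == c) for c in sorted(set(y))}
    scores = []
    for j in range(len(X[0])):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c, size in class_sizes.items():
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = feature_total * size / n  # share of the total expected for class c
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical module metrics: [branch_count, constant_metric]; label 1 = defective.
X = [[1, 5], [1, 5], [0, 5], [0, 5]]
y = [1, 1, 0, 0]
scores = chi2_scores(X, y)
print(scores, select_k_best(scores, 1))
```

A feature whose values are spread over the classes exactly in proportion to class size scores 0 and is the first candidate for removal as noise.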
Analysis of the Effect of Feature Extraction on Sentiment Analysis using BiLSTM: Monkeypox Case Study on X/Twitter
Noryasminda; Saragih, Triando Hamonangan; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.73

Abstract

The monkeypox outbreak has again become a global concern as the disease has spread to many countries. Information about the disease is widely shared through social media, especially Twitter, which is a major source of public opinion. However, the complexity of language and the diverse viewpoints of users often make it difficult to analyze sentiment accurately. Sentiment analysis of tweets about monkeypox is therefore important for understanding public perception and its impact on the dissemination of health information. This research contributes by identifying the most effective word embedding-based feature extraction method for sentiment analysis of health issues on social media. The purpose of this study is to compare the performance of the word embedding methods Word2Vec, GloVe, and FastText in sentiment analysis of tweets about monkeypox using a BiLSTM model. A total of 1,511 tweets were collected by crawling with the Twitter API. The collected tweets were then manually labeled into three sentiment categories: positive, negative, and neutral. Next, the data were preprocessed through data cleaning, case folding, tokenization, stopword removal, and stemming. The evaluation results show that FastText with BiLSTM produced the highest accuracy (90%), followed by Word2Vec (89%) and GloVe (87%). FastText was more effective at reducing classification errors, especially in distinguishing between negative and positive sentiments, owing to its ability to capture subword information and broader context. These findings suggest that FastText can improve the accuracy of sentiment analysis, especially for health issues that develop on social media, and can thereby support data-driven decision making by relevant parties in handling information dissemination.
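The subword advantage credited to FastText comes from how it represents words. fastText treats each word as a set of character n-grams (lengths 3 to 6 by default, with `<` and `>` marking the word boundaries) plus the word itself, and builds the word vector from the vectors of those pieces. The sketch below shows only the n-gram extraction. The overlap function is a hypothetical toy proxy for "shares subword vectors" and is not fastText's similarity measure. No embeddings are trained here.

```python
from typing import List, Set


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def subword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of two words' n-gram sets (toy proxy for shared subword vectors)."""
    ga: Set[str] = set(char_ngrams(a))
    gb: Set[str] = set(char_ngrams(b))
    return len(ga & gb) / len(ga | gb)


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(subword_overlap("monkeypox", "monkeypoxvirus") > subword_overlap("monkeypox", "vaccine"))  # True
```

A word-level model such as Word2Vec has no vector for an unseen token like "monkeypoxvirus". A subword model can still compose one from n-grams it shares with "monkeypox".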
Machine Learning Implementation for Sentiment Analysis on X/Twitter: Case Study of Class Of Champions Event in Indonesia
Hafizah, Rini; Saragih, Triando Hamonangan; Muliadi, Muliadi; Indriani, Fatma; Mazdadi, Muhammad Itqan
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.81

Abstract

Sentiment analysis on social media is becoming an important approach in understanding public opinion towards an event. Twitter, as a microblogging platform, generates a large amount of data that can be utilized for this analysis. This study aims to evaluate and compare the performance of three classification algorithms, namely Support Vector Machine (SVM), Random Forest, and Extreme Gradient Boosting (XGBoost), in sentiment analysis related to the Clash of Champions event in Indonesia. To represent the text data, two feature extraction techniques are used, namely Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW). In addition, Synthetic Minority Over-sampling Technique (SMOTE) is applied to handle data imbalance, while model optimization is performed using GridSearchCV. The research dataset consists of 1,000 tweets collected through web scraping, then manually processed and labeled before model training and testing. The results showed that the TF-IDF technique provided superior results compared to BoW. The Random Forest model with TF-IDF achieved the highest accuracy of 91%, while XGBoost with TF-IDF had the highest Area Under the Curve (AUC) of 0.91. The findings confirm that the selection of appropriate feature extraction techniques and algorithms can improve accuracy in sentiment analysis. This study can be applied in public opinion monitoring and data-driven decision-making. Future research can explore word embedding techniques and transformer-based deep learning models to improve semantic understanding and accuracy of sentiment analysis.
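The two representations differ only in how each term is weighted. Bag of Words keeps raw counts, while TF-IDF scales each count by how rare the term is across documents. The plain-Python sketch below uses the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 normalisation per document, which are the defaults of scikit-learn's `TfidfVectorizer`. The study's exact settings are not stated, and the example tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(docs: List[List[str]]) -> List[Dict[str, int]]:
    """Raw term counts per document."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Counts reweighted by smoothed IDF, then L2-normalised per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for counts in bag_of_words(docs):
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


tweets = [["great", "event", "event"], ["boring", "event"], ["fun", "event"]]
print(bag_of_words(tweets)[0])
print(tf_idf(tweets)[0])
```

In the first tweet, "event" appears in every document and is down-weighted, while the sentiment-bearing "great" gains relative weight. This kind of reweighting is one plausible reason TF-IDF outperformed BoW here.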
Application of Adaboost Algorithm with SMOTE and Optuna Techniques in Sleep Disorder Classification
Anshory, Muhammad Naufal; Mazdadi, Muhammad Itqan; Saragih, Triando Hamonangan; Budiman, Irwan; Saputro, Setyo Wahyu
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 2 (2025): May
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v7i2.99

Abstract

Data imbalance is a serious challenge in developing machine learning models for sleep disorder classification. When models are trained on an uneven class distribution, classification performance for minority classes such as insomnia and sleep apnea is often low. As a result, overall accuracy may appear high while sensitivity to the important cases remains weak. This research therefore aims to design and develop a robust sleep disorder classification model based on the AdaBoost algorithm, whose performance is improved by integrating two approaches: data balancing with SMOTE and hyperparameter optimization with Optuna. The research shows that combining the two approaches can significantly improve model performance, not only in overall accuracy but also on the previously overlooked minority classes. The dataset is the Sleep Health and Lifestyle Dataset, which consists of 374 synthetic records divided into three categories: insomnia, sleep apnea, and none. The method's stages include data preprocessing, an 80:20 train-test split, application of SMOTE to balance the class distribution, hyperparameter tuning with Optuna, and model training with AdaBoost. Evaluation used the classification metrics accuracy, precision, recall, and F1-score. The combination of SMOTE and Optuna yielded the best results: 90.6% accuracy, an F1-score of 0.83871 for insomnia, and an F1-score of 0.81250 for sleep apnea. This performance was consistently superior to the scenarios without SMOTE or without tuning, which confirms the importance of combined strategies for fair and accurate classification of medical data. Future research is recommended to use real (non-synthetic) datasets and to test this approach with other models such as XGBoost or LightGBM.
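AdaBoost builds an ensemble of weak learners. After each round it re-weights the training samples so that the next learner concentrates on the samples the previous ones got wrong. Because this task has three classes, the plain-Python sketch below uses the multi-class SAMME variant with decision stumps as weak learners. It is an illustration of the technique under those assumptions, not the study's code, and SMOTE and Optuna are not reproduced. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature index, threshold, left class, right class)


def fit_stump(X: List[Sequence[float]], y: List[int], w: List[float],
              classes: List[int]) -> Tuple[Optional[Stump], float]:
    """Exhaustively pick the decision stump with the lowest weighted error."""
    best, best_err = None, math.inf
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            lc = max(classes, key=lambda c: left[c])  # weighted majority on each side
            rc = max(classes, key=lambda c: right[c])
            err = sum(left.values()) - left[lc] + sum(right.values()) - right[rc]
            if err < best_err:
                best, best_err = (j, t, lc, rc), err
    return best, best_err


def stump_predict(stump: Stump, row: Sequence[float]) -> int:
    j, t, lc, rc = stump
    return lc if row[j] <= t else rc


def adaboost_samme(X, y, n_rounds: int = 20):
    """Multi-class AdaBoost (SAMME) with decision stumps as weak learners."""
    classes = sorted(set(y))
    K = len(classes)
    w = [1.0 / len(y)] * len(y)
    model: List[Tuple[float, Stump]] = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w, classes)
        if stump is None or err >= 1 - 1 / K:  # nothing better than chance is left
            break
        err = max(err, 1e-10)  # a perfect stump would otherwise give log(0)
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        model.append((alpha, stump))
        if err <= 1e-10:
            break
        # Up-weight the misclassified samples, then renormalise.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, model


def adaboost_predict(classes, model, row) -> int:
    """Class with the largest alpha-weighted vote."""
    votes = {c: 0.0 for c in classes}
    for alpha, stump in model:
        votes[stump_predict(stump, row)] += alpha
    return max(classes, key=lambda c: votes[c])
```

In a pipeline like the one described, `adaboost_samme` would be fitted only on the SMOTE-balanced 80% training split, and the untouched 20% test split would be used for evaluation. `n_rounds` is the kind of hyperparameter an Optuna search would tune.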
Comparative Study of Filter, Wrapper, and Hybrid Feature Selection Using Tree-Based Classifiers for Software Defect Prediction
Rahmayanti, Rahmayanti; Herteno, Rudy; Saputro, Setyo Wahyu; Saragih, Triando Hamonangan; Abadi, Friska
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 8 No. 1 (2026): February
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v8i1.294

Abstract

Software defect prediction (SDP) is essential for improving software reliability because it enables early identification of modules that may contain defects before release. SDP datasets commonly contain redundant or non-contributing metrics, so feature selection is needed to derive a more informative subset. To address this problem, the present study compares the effectiveness of three feature selection strategies, SelectKBest (SKB), Recursive Feature Elimination (RFE), and the hybrid SKB+RFE, in enhancing the performance of tree-based classifiers on the NASA Metrics Data Program (MDP) datasets. The study uses three classification algorithms, namely Random Forest (RF), Extra Trees (ET), and Bagging (with Decision Trees), with the Area Under the Curve (AUC) as the primary performance metric. Experimental results show that the combination of RFE and Extra Trees performs best, with an average AUC of 0.7855, followed by the SKB+RFE+ET configuration (AUC 0.7809) and SKB+ET (AUC 0.7776). These findings demonstrate that iterative wrapper-based approaches such as RFE can identify more relevant and effective feature subsets than filter or hybrid strategies, and that wrapper-based methods are more stable across heterogeneous datasets. Because the experiments use no hyperparameter tuning and rely solely on class weighting rather than explicit resampling, the results offer empirical insight into the isolated influence of feature selection on predictive performance. Overall, the study confirms that RFE combined with Extra Trees offers the strongest predictive performance on the NASA MDP datasets and forms a foundation for developing more adaptive and robust models.
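RFE is a wrapper method. It refits the model on the current feature subset, ranks the features by importance, removes the weakest ones, and repeats until the requested number remain. The plain-Python sketch below hides the model behind a callback. In a real setup that callback might refit an Extra Trees model and return its `feature_importances_`. The feature names and fixed scores in `toy_importance` are hypothetical placeholders, not NASA MDP results.

```python
from typing import Callable, Dict, List, Sequence

# importance_fn refits a model on the given feature subset and returns one importance per feature.
ImportanceFn = Callable[[List[str]], Dict[str, float]]


def recursive_feature_elimination(features: Sequence[str], importance_fn: ImportanceFn,
                                  n_keep: int, step: int = 1) -> List[str]:
    """Refit, drop the `step` least important features, and repeat until n_keep remain."""
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and the number of features")
    selected = list(features)
    while len(selected) > n_keep:
        importances = importance_fn(selected)
        n_drop = min(step, len(selected) - n_keep)
        weakest = set(sorted(selected, key=lambda f: importances[f])[:n_drop])
        selected = [f for f in selected if f not in weakest]
    return selected


# Toy stand-in for a refitted model: fixed raw scores renormalised over the current subset.
RAW = {"loc": 0.50, "cyclomatic_complexity": 0.30, "blank_lines": 0.15, "comment_lines": 0.05}
calls: List[List[str]] = []


def toy_importance(subset: List[str]) -> Dict[str, float]:
    calls.append(list(subset))
    total = sum(RAW[f] for f in subset)
    return {f: RAW[f] / total for f in subset}


print(recursive_feature_elimination(list(RAW), toy_importance, n_keep=2))  # ['loc', 'cyclomatic_complexity']
```

A hybrid SKB+RFE configuration would pass only the features that survive SelectKBest into this loop. This makes the wrapper stage cheaper, but it cannot recover a feature the filter has already discarded.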
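Improving the Accuracy of Boosting Models for Sleep Health Prediction Using Optuna [original title: Peningkatan Akurasi Model Boosting pada Prediksi Kesehatan Tidur Menggunakan Optuna]
Mazdadi, Muhammad Itqan; Saragih, Triando Hamonangan; Budiman, Irwan; Anshory, Muhammad Naufal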
Jurnal Informatika Polinema Vol. 12 No. 2 (2026)
Publisher : UPT P2M State Polytechnic of Malang

DOI: 10.33795/jip.v12i2.8878

Abstract

Sleep quality plays an important role in maintaining both physical and mental health, while sleep disorders can increase the risk of various chronic diseases. Advances in machine learning open up opportunities to predict sleep health more accurately using lifestyle data. This study focuses on applying boosting algorithms, namely XGBoost, LightGBM, AdaBoost, and GradientBoosting, supported by Optuna-based hyperparameter tuning to improve prediction accuracy. The dataset is the Sleep Health and Lifestyle Dataset, which contains demographic, lifestyle, and sleep-condition variables. The research stages include data preprocessing, splitting the data into training and test sets, model training, hyperparameter optimization using Optuna with the Tree-structured Parzen Estimator (TPE) method, and model evaluation using the accuracy metric. The experimental results show that tuning with Optuna improved the accuracy of several models, particularly LightGBM and AdaBoost, which reached accuracies of 93.3% and 90.7%, respectively. Meanwhile, XGBoost and GradientBoosting showed stable performance, with accuracy remaining high both before and after tuning. These findings confirm that the effectiveness of tuning depends on the characteristics of the algorithm used. Overall, this study demonstrates that Optuna can be an effective way to improve the performance of boosting models for sleep health prediction. For future research, the authors recommend using more diverse evaluation metrics, applying data balancing techniques, and exploring integration with deep learning methods to enrich the analysis.
Comparasion Of Weather Classification Methods On Weather Images Using GLCM Features With Random Forest And Catboost Algoritms
Noorhafizi, Muhammad; Saragih, Triando Hamonangan; Mazdadi, Muhammad Itqan; Muliadi, Muliadi; Herteno, Rudy; Rozaq, Hasri Awal Akbar
International Journal of Advances in Data and Information Systems Vol. 7 No. 1 (2026): April 2026
Publisher : Indonesian Scientific Journal

DOI: 10.59395/ijadis.v7i1.1456

Abstract

Weather image classification is an essential process for improving automated weather information systems. However, most existing studies rely on numerical meteorological data and rarely utilize the textural characteristics embedded in atmospheric imagery. This study addresses that limitation by applying the Gray Level Co-Occurrence Matrix (GLCM) for texture feature extraction combined with Random Forest (RF) and CatBoost algorithms for classification. The dataset, obtained from Kaggle, consists of 1,125 weather images categorized into four classes: cloudy, rain, shine, and sunrise. All images were uniformly normalized and augmented using four rotation angles (0°, 45°, 90°, 135°). GLCM features were extracted with a pixel distance of 1 and gray-level quantization of 8, generating four statistical attributes: contrast, correlation, energy, and homogeneity. Both algorithms were optimized through parameter tuning and evaluated using a 5-fold cross-validation scheme with an 80:20 split ratio. Results show that the Random Forest model (n_estimators = 100, max_depth = 10, random_state = 42) achieved the highest accuracy of 92.43% (±1.12), precision of 92.50%, recall of 92.43%, and F1-score of 92.42%. In comparison, CatBoost (iterations = 100, learning_rate = 0.1, depth = 6) achieved an accuracy of 68.88% (±2.31). The findings demonstrate that GLCM feature extraction combined with Random Forest offers superior stability and accuracy for weather image classification, providing a foundation for efficient and interpretable weather information systems.
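The abstract specifies a pixel distance of 1, 8 grey levels, four angles, and four GLCM properties. The plain-Python sketch below computes those properties under stated assumptions: a symmetric, normalised co-occurrence matrix; the (row, column) offsets listed in `ANGLE_OFFSETS` (libraries differ in their angle conventions); 8-bit input quantised into equal-width bins; and energy defined as the square root of the angular second moment, as in scikit-image's `graycoprops`. Treating a constant image's correlation as 1 is a convention chosen here. This is not the study's code.

```python
import math
from typing import Dict, List, Tuple

# Assumed (row, column) offsets for distance 1; libraries differ in their angle conventions.
ANGLE_OFFSETS: Dict[int, Tuple[int, int]] = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(image: List[List[int]], levels: int = 8) -> List[List[int]]:
    """Map 8-bit grey values (0-255) onto `levels` equal-width bins."""
    return [[v * levels // 256 for v in row] for row in image]


def glcm(image: List[List[int]], offset: Tuple[int, int], levels: int = 8) -> List[List[float]]:
    """Symmetric, normalised grey-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1  # count the pair in both directions (symmetric GLCM)
                P[j][i] += 1
    total = sum(map(sum, P))
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in P]


def glcm_features(P: List[List[float]]) -> Dict[str, float]:
    """Contrast, correlation, energy and homogeneity of a normalised GLCM."""
    cells = [(i, j, p) for i, row in enumerate(P) for j, p in enumerate(row)]
    contrast = sum(p * (i - j) ** 2 for i, j, p in cells)
    homogeneity = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    energy = math.sqrt(sum(p * p for _, _, p in cells))
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum(p * (i - mu_i) ** 2 for i, _, p in cells))
    sd_j = math.sqrt(sum(p * (j - mu_j) ** 2 for _, j, p in cells))
    cov = sum(p * (i - mu_i) * (j - mu_j) for i, j, p in cells)
    # Treat a constant image (zero variance) as perfectly correlated by convention here.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}


stripes = quantize([[0, 255, 0, 255]] * 4)  # vertical stripes of grey levels 0 and 7
for angle, offset in ANGLE_OFFSETS.items():
    print(angle, glcm_features(glcm(stripes, offset)))
```

For the striped image, the horizontal offset sees only alternating 0/7 pairs (high contrast, correlation -1), while the vertical offset sees identical pairs (zero contrast). These four numbers per angle are the kind of compact texture descriptor that the Random Forest then classifies.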
Co-Authors
AA Sudharmawan, AA Abadi, Friska Abdul Latief Abadi Abdullayev, Vugar Achmad Rizal Adawiyah, Laila Afifa, Ridha Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Ahmad Tajali Aida, Nor Ajwa Helisa Al Ghifari, Muhammad Akmal Alamudin, Muhammad Faiq Alfita Rakhmandasari Amelia Aditya Santika Andi Farmadi Andi Farmadi Anshari, Muhammad Ridha Anshory, Muhammad Naufal Ansyari, Muhammad Ridho Arif Darmawan Athavale, Vijay Anant Athavale, Vijay Annant Bachtiar, Adam Mukharil Bachtiar, Adam Mukharil Difa Fitria Dina Arifah Diny Melsye Nurul Fajri Diny Melsye Nurul Fajri Dodon Turianto Nugrahadi Dwi Kartini Dwi Kartini, Dwi Dzira Naufia Jawza Erdi, Muhammad Erlianita, Noor Faisal, Mohammad Reza Fatma Indriani Fatma Indriani Febrian, Muhamad Michael Friska Abadi Haekal, Muhammad Haekal, Muhammad Hafizah, Rini Hermiati, Arya Syifa Herteno, Rudy Huynh, Phuoc-Hai Ichwan Dwi Nugraha Indriani, Fatma Irwan Budiman Irwan Budiman Irwan Budiman Itqan Mazdadi, Muhammad Ivan Sitohang Jumadi Mabe Parenreng Keswani, Ryan Rhiveldi Lilies Handayani Lumbanraja, Favorisen R M. Khairul Rezki Mafazy, Muhammad Meftah Mariana Dewi Muhamad Fawwaz Akbar Muhammad Al Ichsan Nur Rizqi Said Muhammad Alkaff Muhammad Darmadi Muhammad Fauzan Nafiz Muhammad Haekal Muhammad Haekal Muhammad Ikhwan Rizki Muhammad Itqan Mazdadi Muhammad Mursyidan Amini Muhammad Nadim Mubaarok Muhammad Reza Faisal, Muhammad Reza Muhammad Rofiq Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Musyaffa, Muhammad Hafizh Nafiz, Muhammad Fauzan Noorhafizi, Muhammad Noryasminda Nugraha, Muhammad Amir Nurcahyati, Ica Nurlatifah Amini Okta Muthia Sari Purwoko, Agus Putra, Aditya Maulana Perdana Radityo Adi Nugraha Radityo Adi Nugroho Rahmat Ramadhani Rahmat Ramadhani Rahmatullah, Satrio Wibowo Rahmayanti Rahmayanti Ramadhani, Rahmat Ratna Septia Devi Regina Reza Faisal, Mohammad Rezeki, Abdillah Rizki, M. Alfi Rozaq, Hasri Akbar Awal Rozaq, Hasri Awal Akbar Rudy Herteno Rudy Herteno Safitri, Yasmin Dwi Said, Muhammad Al Ichsan Nur Rizqi SALLY LUTFIANI Salsha Farahdiba Setyo Wahyu Saputro Siena, Laifansan Siti Aisyah Solechah Siti Napi'ah Suci Permata Sari Sulastri Norindah Sari Tajali, Ahmad Totok Wianto Vivi Nur Wijayaningrum Wahyu Caesarendra Wayan Firdaus Mahmudy Winda Agustina Yanche Kurniawan Mangalik YILDIZ, Oktay Yusuf Priyo Anggodo Zamzam, Yra Fatria