Articles

Comparative Study of Imbalanced Data Oversampling Techniques for Peer-to-Peer Landing Loan Prediction Muzayanah, Rini; Lestari, Apri Dwi; Jumanto, Jumanto; Prasetiyo, Budi; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
Scientific Journal of Informatics Vol 11, No 1 (2024): February 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i1.50274

Abstract

Purpose: Data imbalance, which often occurs when classifying loan data on Peer-to-Peer Lending platforms, can degrade algorithm performance and lower the resulting accuracy. Appropriate resampling techniques are therefore needed so that the classification algorithm can work optimally and achieve the best possible accuracy. This research aims to identify the most suitable resampling technique for addressing data imbalance in loan data from peer-to-peer lending platforms.

Methods: This study uses the XGBoost classification algorithm to evaluate and compare the resampling techniques. The techniques compared are SMOTE, ADASYN, Borderline-SMOTE, and Random Oversampling.

Results: The highest training accuracy was achieved by XGBoost combined with Borderline-SMOTE (0.99988) and by XGBoost combined with SMOTE. On the test data, the highest accuracy was achieved by XGBoost combined with SMOTE.

Novelty: This research seeks the resampling technique best suited to the XGBoost classification algorithm for handling imbalanced loan data on peer-to-peer lending platforms, so that the classifier works optimally and produces the best possible accuracy.
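The four oversampling strategies compared above differ in how they create new minority-class rows. Random oversampling simply duplicates existing minority samples, whereas SMOTE synthesizes a new point on the line segment between a minority sample and one of its k nearest minority neighbours; Borderline-SMOTE and ADASYN are SMOTE variants that concentrate generation on minority samples near the class boundary or on harder-to-learn samples. The minimal pure-Python sketch below contrasts the first two. The helper names (`random_oversample`, `smote`, `balance`) and the toy data are illustrative assumptions, not the study's code; in practice the imbalanced-learn library provides `SMOTE`, `ADASYN`, `BorderlineSMOTE`, and `RandomOverSampler`, and resampling should be applied to the training split only.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Random oversampling: duplicate randomly chosen minority rows."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """Basic SMOTE: place each synthetic row at a random point on the segment
    between a minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, method="smote", k=5, seed=0):
    """Oversample the minority class of a binary problem until both classes
    have the same number of rows. Returns new lists; inputs are not modified."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = n_majority - len(minority)
    if method == "smote":
        new_rows = smote(minority, n_new, k, rng)
    else:
        new_rows = random_oversample(minority, n_new, rng)
    return X + new_rows, y + [minority_label] * len(new_rows)
```

For example, with 8 majority rows and 3 minority rows, `balance(X, y, 1, "smote", k=2)` adds 5 synthetic rows that lie between existing minority points, while `method="random"` adds 5 exact copies of existing minority rows.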
Penanganan Ketidakseimbangan Data Ekstrim pada Sistem Prediksi [Handling Extreme Data Imbalance in Prediction Systems] Putro, Ari Nugroho; Muslim, Much Aziz
Techno.Com Vol. 24 No. 4 (2025): November 2025
Publisher : LPPM Universitas Dian Nuswantoro

DOI: 10.62411/tc.v24i4.15005

Abstract

One of the main problems in prediction systems is data imbalance, in which certain classes are severely underrepresented compared with the others. Data imbalance can bias a model: it detects the majority class easily but detects the minority class poorly. On extremely imbalanced data with an imbalance ratio (IR) > 9 in particular, a model can reach high accuracy while its recall stays low. This is harmful for prediction systems that prioritize detecting the minority class. This study aims to improve recall on highly imbalanced datasets using four imbalance-handling techniques: SMOTE and OHIT at the data level, and cost-sensitive learning (CSL) and class weighting (CW) at the model level. Data-level techniques balance the class distribution by adding synthetic data, whereas model-level techniques increase the model's sensitivity to the minority class. Logistic Regression (LR) is used as the baseline model to observe the recall improvement from each of the four techniques. In testing, all of the imbalance-handling techniques improved recall, by a margin of 0.3243; the highest improvement, a margin of 0.3256, was achieved by LR-SMOTE. This study shows that model recall can be improved by applying data imbalance handling techniques.
Keywords: extreme data imbalance, prediction system, recall, data imbalance handling
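The gap between accuracy and recall described above is easy to reproduce. The minimal stdlib-only sketch below, with hypothetical helper names, computes the imbalance ratio, shows that a model predicting only the majority class scores high accuracy but zero recall, and derives the inverse-frequency class weights that class-weighting approaches commonly use (the same n_samples / (n_classes × n_c) formula as scikit-learn's `class_weight="balanced"`). The study's exact CSL and CW settings are not stated here, so these weights are an assumption.

```python
from collections import Counter


def imbalance_ratio(y):
    """IR = size of the majority class / size of the minority class."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (minority class) that the model detects.
    Assumes y_true contains at least one positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def balanced_class_weights(y):
    """Weight per class: n_samples / (n_classes * n_c). Rare classes get
    larger weights, so their errors cost more during training."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}
```

With 95 negatives and 5 positives, IR is 19 (well above the IR > 9 threshold), an all-negative predictor has accuracy 0.95 and recall 0.0, and the minority class receives weight 10.0 versus about 0.53 for the majority class.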
Analysis and Visualization of Purchasing Pattern in Retail Product Transaction using Apriori Algorithm Febriani SM, N. Nelis; Setyoningrum, Nuk Ghurroh; Lodana, Mae; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
Journal of Information System Exploration and Research Vol. 4 No. 1 (2026): January 2026
Publisher : shmpublisher

DOI: 10.52465/joiser.v4i1.650

Abstract

The rapid growth of the retail industry generates large volumes of transaction data that can be analyzed to support data-driven business decision making. This study aims to analyze and visualize purchasing patterns in retail product transactions by applying data mining techniques using the Apriori algorithm and business intelligence visualization through Microsoft Power BI. The dataset consists of 1 million retail transactions collected from an open retail transaction repository. The research stages include data collection, transaction data preprocessing, implementation of the Apriori algorithm with a minimum support threshold of 0.002 and a minimum confidence of 0.5, and visualization of the analysis results through interactive dashboards using Power BI and a Python-based application developed with the Streamlit framework. The results indicate that the Apriori algorithm successfully identifies frequent product associations and generates 12 association rules that meet the criteria of strong association rules. Power BI visualizations provide comprehensive insights into transaction trends based on customer categories, store types, payment methods, seasons, and transaction regions. These findings are expected to assist retail companies in formulating marketing strategies, developing product recommendations, and optimizing inventory management in a more effective and data-driven manner. This study contributes by integrating large-scale association rule mining with interactive business intelligence visualization for retail decision support.
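The Apriori step can be sketched as follows: frequent itemsets are grown level by level, candidates with an infrequent subset are pruned, and a rule X → Y is kept when confidence = support(X ∪ Y) / support(X) reaches the threshold. This is a minimal pure-Python illustration on toy baskets with hypothetical function names, not the study's implementation; for 1 million transactions at a minimum support of 0.002, an optimized library implementation (for example mlxtend's `apriori`) would be more practical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Share of baskets that contain every item of `itemset`
        return sum(itemset <= b for b in baskets) / n

    # Level 1 candidates: every single item
    current = {frozenset([i]) for b in baskets for i in b}
    frequent = {}
    k = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each rule."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules
```

On five toy baskets with `min_support=0.4` and `min_confidence=0.7`, the sketch keeps bread → milk (confidence 0.75), milk → bread (0.75), and butter → bread (1.0).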
Optimizing Stacking Ensemble Models for Customer Churn Prediction in the Telecommunications Industry Rofik, Rofik; Unjung, Jumanto; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
JOIN (Jurnal Online Informatika) Vol 11 No 1 (2026)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v11i1.1783

Abstract

One of the biggest challenges in the telecommunications industry is predicting churn, that is, when a customer unsubscribes and switches to another service provider. In a competitive market, retaining customers is much more efficient than acquiring new ones. Conventional prediction models often fail to capture the complexity of customer behavior patterns, which results in suboptimal accuracy. This study aims to improve customer churn prediction by developing a stacking ensemble model that combines several classification algorithms. Fourteen algorithms were tested; the six with the best accuracy were selected as base learners, and Logistic Regression was selected as the meta-learner. Stacking models were then tested sequentially on combinations of these six base learners, always with the same meta-learner. Testing was also carried out with and without the SMOTE data balancing method to evaluate the effect of data balancing on the prediction results. The results show that the combination of AdaBoost, Ridge Classifier, and Logistic Regression achieves the highest accuracy, 82.97%, exceeding the performance of the single algorithms. This research demonstrates an effective stacking ensemble configuration for predicting customer churn in the telecommunications industry and shows that choosing the right combination of algorithms has a greater impact on model performance than the number of algorithms used.
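A stacking configuration like the best one reported above can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under stated assumptions, not the study's pipeline: the data are synthetic (`make_classification` with a roughly 3:1 class ratio standing in for a churn table), hyperparameters are library defaults, and the Ridge base learner is wrapped with a `StandardScaler`. Because `RidgeClassifier` has no `predict_proba`, the stacker passes its `decision_function` output to the meta-learner, and `cv=5` trains the Logistic Regression meta-learner on out-of-fold base-learner outputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn table (assumption): ~25% positives (churners).
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=6,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base learners: AdaBoost + Ridge, the pairing the abstract reports as best.
base_learners = [
    ("ada", AdaBoostClassifier(random_state=42)),
    ("ridge", make_pipeline(StandardScaler(), RidgeClassifier())),
]

# Logistic Regression meta-learner, trained on 5-fold out-of-fold outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)

# Single-model baselines on the same split, for comparison.
single_acc = {
    name: est.fit(X_train, y_train).score(X_test, y_test)
    for name, est in base_learners
}
```

If SMOTE is added, it should be applied inside the training folds only (for instance with imbalanced-learn's `Pipeline`), so that synthetic rows never leak into the data used to score the model.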
Enhanced Out-of-Fold Stacking with Feature Grouping and Model-Specific Transformations for Diabetes Prediction Improvement Putro, Ari Nugroho; Kharisma, Sidiq Noor; Al-Zahra, Gea Destadia; Muslim, Much Aziz; Pertiwi, Dwika Ananda Agustina
Journal of Student Research Exploration Vol. 4 No. 1 (2026): January 2026
Publisher : SHM Publisher

DOI: 10.52465/josre.v4i1.674

Abstract

Diabetes mellitus is a chronic disease with serious implications for global health. Early detection is essential to reduce these risks, and machine learning methods are widely used for diabetes prediction. However, improving accuracy remains a major challenge in developing predictive models. This study proposes a stacking-based ensemble learning approach with an out-of-fold (OOF) scheme to improve classification performance. The proposed method consists of five steps: (1) data preprocessing, namely median imputation of invalid values and feature transformations tailored to each model's characteristics; (2) construction of the base learners: Logistic Regression, Gaussian Naïve Bayes, Support Vector Machine, Random Forest, and XGBoost; (3) training with stratified 5-fold cross-validation to generate OOF predictions; (4) combining all OOF predictions into a meta-feature matrix; and (5) training an XGBoost meta-model to produce the final prediction. This approach lets the meta-model learn the relationships among the outputs of the base models. Experimental results show that the proposed method achieves an accuracy of 91.15%, precision of 90.65%, recall of 83.21%, and an F1-score of 86.77%. These results indicate that stacking is effective in improving the accuracy of diabetes prediction.
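The five steps above can be sketched with scikit-learn as follows. Several parts are assumptions made to keep the example self-contained: the data are synthetic rather than a diabetes dataset, column 0 plays the role of a feature where 0 is an invalid reading, XGBoost is left out of the base learners and replaced as meta-model by scikit-learn's `GradientBoostingClassifier` to avoid an extra dependency, and `impute_invalid_zeros` and `oof_meta_features` are hypothetical helper names. The key property is that every value in the meta-feature matrix comes from a base model that did not see that row during training.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption); ~5% of column 0 set to an invalid 0.
X, y = make_classification(n_samples=800, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)
X[rng.random(len(X)) < 0.05, 0] = 0.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def impute_invalid_zeros(X, cols, medians=None):
    """Replace 0 in `cols` with the median of the non-zero values.
    Medians are learned on the training split and reused for the test split."""
    X = X.copy()
    if medians is None:
        medians = {c: np.median(X[X[:, c] != 0, c]) for c in cols}
    for c in cols:
        X[X[:, c] == 0, c] = medians[c]
    return X, medians


X_train, medians = impute_invalid_zeros(X_train, cols=[0])
X_test, _ = impute_invalid_zeros(X_test, cols=[0], medians=medians)

# Model-specific transformations: scaling for margin-based models,
# raw features for Naive Bayes and the tree ensemble.
base_models = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    GaussianNB(),
    make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    RandomForestClassifier(n_estimators=200, random_state=0),
]


def oof_meta_features(models, X, y, n_splits=5, seed=0):
    """Column j holds model j's positive-class probability for every row,
    predicted by a copy of the model that was not trained on that row."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros((len(X), len(models)))
    for fit_idx, hold_idx in skf.split(X, y):
        for j, model in enumerate(models):
            fitted = clone(model).fit(X[fit_idx], y[fit_idx])
            oof[hold_idx, j] = fitted.predict_proba(X[hold_idx])[:, 1]
    return oof


meta_train = oof_meta_features(base_models, X_train, y_train)
# Test-time meta-features: base models refit on the full training split.
meta_test = np.column_stack([
    clone(m).fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Meta-model trained on the OOF matrix (GradientBoosting stands in for XGBoost).
meta_model = GradientBoostingClassifier(random_state=0).fit(meta_train, y_train)
test_accuracy = meta_model.score(meta_test, y_test)
```

At prediction time the base models are refit on the whole training split, and their probabilities on new rows form the meta-model's input with the same column layout as the OOF matrix.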
Co-Authors: Afifah Ratna Safitri; Agus Harjoko; Ahmad, Kamilah; Al-Zahra, Gea Destadia; Alabid, Noralhuda N.; Alamsyah; Aldi Nurzahputra; Aldi Nurzahputra, Aldi; Alfatah, Abdul Muis; Ali, Muazam; Amanah Febrian Indriani; Aminuyati; Anggyi Trisnawan Putra; Annegrat, Ahmed Mohamed; Astuti, Winda Try; Atikah Ari Pramesti, Atikah Ari; Budi Prasetiyo; Budi Prasetiyo, Budi; Darmawan, Aditya Yoga; Dewi Handayani Untari Ningsih; Dinova, Dony Benaya; Djuniharto Djun; Doni Aprilianto; Dullah, Ahmad Ubai; Eka Listiana; Endang Sugiharti, Endang; Fadhilah, Muhammad Syafiq; Fadli Dony Pradana; Falasari, Anisa; Farih, Habib al; Florentina Yuni Arini; Hadiq, Hadiq; Hakim, M. Faris Al; Hendi Susanto; Imam Ahmad Ashari, Imam Ahmad; Irfan, Mohammad Syarif; Jeffry Nur Rifa’i; Jumanto, Jumanto; Jumanto Jumanto, Jumanto; Jumanto Unjung; Khan, Atta Ullah; Kharisma, Sidiq Noor; Larasati, Ukhti Ikhsani; Lestari, Apri Dwi; Listiana, Eka; Lodana, Mae; Maulana, Muhamad Irvan; Miranita Khusniati; Moh Minhajul Mubarok; Muhamad Anbiya Nur Islam; Mustaqim, Amirul; Muzayanah, Rini; N. Nelis Febriani SM; Nikmah, Tiara Lailatul; Nina Fitriani, Nina; Ningsih, Maylinna Rahayu; Nugraha, Faizal Widya; Nuk Ghurroh Setyoningrum; Nur Astri Retno, Nur Astri; Nurdin, Alya Aulia; Nurriski, Yopi Julia; Perbawawati, Anna Adi; Pertiwi, Dwika Ananda Agustina; Priliani, Erlin Mega; Purnawan, Dedy; Putri Utami, Putri; Putri, Salma Aprilia Huda; Putriaji Hendikawati; Putro, Ari Nugroho; Qohar, Bagus Al; Raharjo, Bagus Purbo; Rahman, Raihan Muhammad Rizki; Rahmanda, Primana Oky; Riza Arifudin; Rofik Rofik, Rofik; Roni Kurniawan; Rukmana, Siti Hardiyanti; Ryo Pambudi; I Ketut Sudiana, S.Pd., M.Kes.; Safri, Yofi Firdan; Saiful Arifin; Salahudin, Shahrul Nizam; Sanjani, Fathimah Az Zahra; Seivany, Ravenia; Simanjuntak, Robert Panca R.; Solehatin, Solehatin; Sugiman Sugiman; Sulistiana; Syarifah, Aulia; Tanga, Yulizchia Malica Pinkan; Tanzilal Mustaqim; Trihanto, Wandha Budhi; Triyana Fadila; Varindya Ditta Iswari; Vedayoko, Lucky Gagah; Wibowo, Kevyn Alifian Hernanda; Yosza Dasril