

Effects of Data Resampling on Predicting Customer Churn via a Comparative Tree-based Random Forest and XGBoost
Rita Erhovwo Ako; Fidelis Obukohwo Aghware; Margaret Dumebi Okpor; Maureen Ifeanyi Akazue; Rume Elizabeth Yoro; Arnold Adimabua Ojugo; De Rosal Ignatius Moses Setiadi; Chris Chukwufunaya Odiakaose; Reuben Akporube Abere; Frances Uche Emordi; Victor Ochuko Geteloma; Patrick Ogholuwarami Ejeh
Journal of Computing Theories and Applications Vol. 2 No. 1 (2024): JCTA 2(1) 2024
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.10562

Abstract

Customer attrition has become a focus for many businesses today, since the online marketplace continues to offer customers a wide choice of alternative goods, services, and products for their money. Businesses must seek to improve value, meet customers' demands and needs, enhance their customer-retention strategies, and monetize better. This study compares the effects of data resampling schemes on predicting customer churn with both Random Forest (RF) and XGBoost ensembles. The resampling schemes used are: (a) default mode, (b) random under-sampling (RUS), (c) synthetic minority oversampling technique (SMOTE), and (d) SMOTE-edited nearest neighbor (SMOTEEN). Both tree-based ensembles were constructed and trained to assess how well they performed with chi-square feature selection. The results show that RF achieved an F1 of 0.9898, accuracy of 0.9973, precision of 0.9457, and recall of 0.9698 for the default, RUS, SMOTE, and SMOTEEN resampling, respectively. XGBoost outperformed RF with an F1 of 0.9945, accuracy of 0.9984, precision of 0.9616, and recall of 0.9890 for the default, RUS, SMOTE, and SMOTEEN resampling, respectively. Studies support the finding that SMOTEEN resampling outperforms the other schemes, while the enhanced performance of XGBoost is attributed to hyper-parameter tuning of its decision trees. Retention strategies based on recency-frequency-monetization were used and were found to curb churn and improve monetization policies, placing business managers ahead of the curve on customer churn.
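As a rough illustration of two of the resampling schemes named in this abstract, the sketch below shows random under-sampling and a SMOTE-style interpolation step in plain Python. It is not the authors' code: the function names, the toy data, and the choice of k are assumptions made for illustration, and the edited-nearest-neighbor cleaning step of SMOTEEN is not shown.

```python
import random

def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps only the minority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy data (hypothetical): six non-churners (0) and three churners (1).
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 2.1], [2.2, 1.9],
     [8.0, 9.0], [8.5, 9.5], [9.0, 8.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_rus, y_rus = random_under_sample(X, y)
minority = [row for row, label in zip(X, y) if label == 1]
new_points = smote_like_oversample(minority, n_new=3)
```

Under-sampling discards majority rows, so it trades information for balance; the interpolation step instead adds minority points that always lie between existing ones.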
Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction
De Rosal Ignatius Moses Setiadi; Ahmad Rofiqul Muslikh; Syahroni Wahyu Iriananda; Warto Warto; Jutono Gondohanindijo; Arnold Adimabua Ojugo
Journal of Computing Theories and Applications Vol. 2 No. 2 (2024): JCTA 2(2) 2024
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.11638

Abstract

Credit approval prediction is one of the critical challenges in the financial industry, where the accuracy and efficiency of credit decision-making can significantly affect business risk. This study proposes an outlier detection method using the Gaussian Mixture Model (GMM) combined with Extreme Gradient Boosting (XGBoost) to improve prediction accuracy. GMM detects outliers with a probabilistic approach, allowing finer-grained anomaly identification than distance- or density-based methods. The data cleaned through GMM is then processed using XGBoost, a decision tree-based boosting algorithm that efficiently handles complex datasets. The study compares the performance of XGBoost combined with various outlier detection methods, such as LOF, CBLOF, DBSCAN, IF, and K-Means, and also compares it with various other classification algorithms based on machine learning and deep learning. Experimental results show that the combination of GMM and XGBoost provides the best performance, with an accuracy of 95.493%, a recall of 91.650%, and an AUC of 95.145%, outperforming other models in credit approval prediction on an imbalanced dataset. The proposed method was shown to reduce prediction errors and improve the model's reliability in detecting eligible credit applications.
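The probabilistic outlier step described in this abstract can be sketched as follows: fit a Gaussian mixture, score each point by its log-likelihood under the mixture, and drop the lowest-scoring points before training a classifier. This is a minimal one-dimensional illustration in plain Python, not the authors' pipeline. The toy data, the initial means, and the `contamination` fraction are assumptions, and the abstract does not say how the cutoff was chosen.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate spike
    return weights, means, variances

def log_likelihood(x, weights, means, variances):
    density = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
    return math.log(max(density, 1e-300))

# Toy data (hypothetical): two tight groups and one far-away value.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2, 20.0]
weights, means, variances = fit_gmm_1d(data, means=[1.0, 5.0])
scores = [log_likelihood(x, weights, means, variances) for x in data]

# Assumed rule: flag the lowest-scoring fraction of points as outliers.
contamination = 0.1
n_flag = max(1, round(contamination * len(data)))
ranked = sorted(zip(scores, data))
flagged = [x for _, x in ranked[:n_flag]]
cleaned = [x for x in data if x not in flagged]
```

In this sketch `cleaned` is what would be passed on to the classifier; the XGBoost stage itself is not shown.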
Feature Fusion with Albumentation for Enhancing Monkeypox Detection Using Deep Learning Models
Nizar Rafi Pratama; De Rosal Ignatius Moses Setiadi; Imanuel Harkespan; Arnold Adimabua Ojugo
Journal of Computing Theories and Applications Vol. 2 No. 3 (2025): JCTA 2(3) 2025
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.12255

Abstract

Monkeypox is a zoonotic disease caused by Orthopoxvirus, presenting clinical challenges due to its visual similarity to other dermatological conditions. Early and accurate detection is crucial to prevent further transmission, yet conventional diagnostic methods are often resource-intensive and time-consuming. This study proposes a deep learning-based classification model by integrating Xception and InceptionV3 using feature fusion to enhance performance in classifying Monkeypox skin lesions. Given the limited availability of annotated medical images, data augmentation was applied using Albumentation to improve model generalization. The proposed model was trained and evaluated on the Monkeypox Skin Lesion Dataset (MSLD), achieving 85.96% accuracy, 86.47% precision, 85.25% recall, 78.43% specificity, and an AUC score of 0.8931, outperforming existing methods. Notably, data augmentation significantly improved recall from 81.23% to 85.25%, demonstrating its effectiveness in enhancing sensitivity to positive cases. Ablation studies further validated that augmentation increased overall accuracy from 82.02% to 85.96%, emphasizing its role in improving model robustness. Comparative analysis with other models confirmed the superiority of our approach. This research enhances automated Monkeypox detection, offering a robust and efficient tool for low-resource clinical settings. The findings reinforce the potential of feature fusion and augmentation in improving deep learning-based medical image classification, facilitating more reliable and accessible disease identification.
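For readers comparing the accuracy, precision, recall, and specificity figures reported in this abstract, the sketch below shows how those four metrics follow from a binary confusion matrix. The counts are hypothetical and are not taken from the paper; `binary_metrics` is an illustrative helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive = disease present)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct predictions
        "precision": tp / (tp + fp),        # predicted positives that are real
        "recall": tp / (tp + fn),           # real positives that were found
        "specificity": tn / (tn + fp),      # real negatives that were found
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Recall and specificity look at different classes, which is why a model can improve one (as reported here for recall after augmentation) without the other moving in step.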
Integrating Hybrid Statistical and Unsupervised LSTM-Guided Feature Extraction for Breast Cancer Detection
De Rosal Ignatius Moses Setiadi; Arnold Adimabua Ojugo; Octara Pribadi; Etika Kartikadarma; Bimo Haryo Setyoko; Suyud Widiono; Robet Robet; Tabitha Chukwudi Aghaunor; Eferhire Valentine Ugbotu
Journal of Computing Theories and Applications Vol. 2 No. 4 (2025): JCTA 2(4) 2025
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.12698

Abstract

Breast cancer is the most prevalent cancer among women worldwide, requiring early and accurate diagnosis to reduce mortality. This study proposes a hybrid classification pipeline that integrates Hybrid Statistical Feature Selection (HSFS) with unsupervised LSTM-guided feature extraction for breast cancer detection using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Initially, 20 features were selected using HSFS based on Mutual Information, Chi-square, and Pearson Correlation. To address class imbalance, the training set was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Subsequently, an LSTM encoder extracted non-linear latent features from the selected features. A fusion strategy was applied by concatenating the statistical and latent features, followed by re-selection of the top 30 features. The final classification was performed using a Support Vector Machine (SVM) with RBF kernel and evaluated using 5-fold cross-validation and a held-out test set. Experimental results showed that the proposed method achieved an average training accuracy of 98.13%, F1-score of 98.13%, and AUC-ROC of 99.55%. On the held-out test set, the model reached an accuracy of 99.30%, precision of 100%, and F1-score of 99.05%, with an AUC-ROC of 0.9973. The proposed pipeline demonstrates improved generalization and interpretability compared to existing methods such as LightGBM-PSO, DHH-GRU, and ensemble deep networks. These results highlight the effectiveness of combining statistical selection and LSTM-based latent feature encoding in a balanced classification framework.
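The abstract states that HSFS draws on Mutual Information, Chi-square, and Pearson Correlation but does not say how the three criteria are combined. One plausible scheme, shown here purely as an assumption, is to rank the features under each criterion and keep those with the best average rank. The feature names and score values below are made up for illustration, and `hybrid_select` is a hypothetical helper, not the authors' method.

```python
def rank_scores(scores):
    """Map each feature to its rank under one criterion (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def hybrid_select(score_tables, top_k):
    """Keep the top_k features by mean rank across all criteria (assumed rule)."""
    ranks = [rank_scores(table) for table in score_tables]
    features = list(score_tables[0])
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    # Break ties alphabetically so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top_k]

# Hypothetical per-feature scores under each criterion.
mutual_info = {"radius": 0.45, "texture": 0.10, "area": 0.40, "symmetry": 0.05}
chi_square = {"radius": 30.0, "texture": 12.0, "area": 55.0, "symmetry": 1.0}
abs_pearson = {"radius": 0.73, "texture": 0.42, "area": 0.71, "symmetry": 0.33}

selected = hybrid_select([mutual_info, chi_square, abs_pearson], top_k=2)
```

Averaging ranks rather than raw scores avoids having to put the three criteria on a common scale; other combinations, such as voting or intersection, would be equally consistent with the abstract.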