Setyo Wahyu Saputro
Lambung Mangkurat University

Published: 2 Documents
Articles


Enhancing Software Defect Prediction through Hybrid Multi-Filter Feature Selection and Imbalance Handling
Muhammad Khalid Maulana; Setyo Wahyu Saputro; Mohammad Reza Faisal; Radityo Adi Nugroho; As’ary Ramadhan
Journal of Computing Theories and Applications Vol. 3 No. 4 (2026): JCTA 3(4) 2026
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.15943

Abstract

Software Defect Prediction (SDP) aims to identify defective modules early in the software development lifecycle to improve software quality and reduce maintenance costs. However, SDP datasets commonly suffer from high dimensionality, feature redundancy, and class imbalance, which can degrade model performance and stability. This study proposes a hybrid feature selection framework to address these challenges and enhance prediction performance. The proposed approach integrates Combined Correlation and Mutual Information (CONMI), which combines the Pearson Correlation Coefficient (PCC) and Mutual Information (MI) to capture both linear and nonlinear feature relevance. The selected features are further refined through Top-K selection, correlation-based filtering to reduce multicollinearity, and Backward Elimination (BE) to obtain an optimal feature subset. To address class imbalance, SMOTE-Tomek is applied by combining over-sampling and data cleaning techniques. Experiments are conducted on twelve NASA MDP datasets using Logistic Regression (LR) and Naïve Bayes (NB) classifiers. The results show that the proposed framework consistently achieves the best performance, with Logistic Regression combined with SMOTE-Tomek obtaining the highest average AUC of 0.7923 ± 0.0714, while NB achieves 0.7554 ± 0.0580. Statistical analysis using a paired t-test indicates that the proposed method significantly outperforms MI+SMOTE-Tomek and BE+SMOTE-Tomek for Logistic Regression, whereas no significant differences are observed for NB. In addition to improving overall classification performance (AUC), the proposed approach also enhances minority class detection, as reflected in improved Recall and F1-score. Overall, the proposed hybrid framework provides an effective and reliable solution for software defect prediction, particularly for high-dimensional and imbalanced datasets.
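
The filter stages named in the abstract above (relevance scoring, Top-K selection, correlation-based filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how CONMI combines PCC and MI, so the sketch assumes an unweighted average of min-max-normalized |PCC| and MI scores and estimates MI with equal-width binning. The names `select_features` and `redundancy`, the threshold of 0.9, and the toy metrics are all hypothetical. Backward Elimination and SMOTE-Tomek are left out.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mutual_info(x, y, bins=4):
    """MI (in nats) between an equal-width-binned feature and a discrete label."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    pxy, px, py = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def minmax(scores):
    """Rescale scores to [0, 1] so PCC and MI are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def select_features(columns, y, k=3, redundancy=0.9):
    """Rank by an ASSUMED combined score, keep the top k, drop redundant ones."""
    names = list(columns)
    pcc = minmax([abs(pearson(columns[f], y)) for f in names])
    mi = minmax([mutual_info(columns[f], y) for f in names])
    # Assumption: equal-weight average of the two normalized scores.
    score = {f: (p + m) / 2 for f, p, m in zip(names, pcc, mi)}
    ranked = sorted(names, key=score.get, reverse=True)[:k]
    kept = []
    for f in ranked:
        # Keep a feature only if it is not highly correlated with one already kept.
        if all(abs(pearson(columns[f], columns[g])) < redundancy for g in kept):
            kept.append(f)
    return kept, score

# Toy module metrics (hypothetical values); label 1 marks a defective module.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "loc": [10, 12, 11, 13, 40, 42, 41, 45],
    "loc_copy": [20, 24, 22, 26, 80, 84, 82, 90],  # exact multiple of "loc"
    "cyclo": [1, 2, 1, 3, 2, 4, 3, 5],
    "noise": [5, 9, 5, 9, 5, 9, 5, 9],             # unrelated to the label
}
kept, scores = select_features(toy, labels, k=3)
print(kept)
```

On this toy data the uninformative `noise` column falls outside the top three, and only one of the two perfectly correlated size metrics survives the redundancy filter.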

Quantifying the Impact of Text Preprocessing on IndoBERT Fine-Tuning for Indonesian Informal Culinary Sentiment Analysis
Rahmat Budianoor; Setyo Wahyu Saputro; Friska Abadi; Radityo Adi Nugroho; Andi Farmadi
Journal of Computing Theories and Applications Vol. 3 No. 4 (2026): JCTA 3(4) 2026
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.15980

Abstract

Indonesian culinary comments on social media platforms such as Instagram are characterized by informal spelling, regional language mixing, slang expressions, and emojis, posing substantial challenges for automated sentiment classification. While IndoBERT has demonstrated strong performance across Indonesian natural language processing tasks, the contribution of individual preprocessing components to fine-tuning performance on informal text remains underexplored, particularly in the culinary domain. This study addresses this gap by conducting a systematic preprocessing ablation study on IndoBERT-Base fine-tuning for Indonesian culinary sentiment classification, accompanied by a comparative evaluation against Naive Bayes with TF-IDF, SVM with TF-IDF, and BiLSTM as representative baselines. A dataset of 3,500 manually labeled Instagram culinary comments across three sentiment classes was used, with a stratified 80/10/10 split. Six preprocessing variants were evaluated under identical experimental conditions to isolate the contribution of each component. The results show that slang normalization is the most impactful single preprocessing step, yielding a macro F1-score gain of +0.0609 over the no-preprocessing baseline, while the full pipeline achieves an accuracy of 0.8800 and a macro F1-score of 0.8465. IndoBERT-Base with the full pipeline outperforms all baselines across all evaluation metrics. Per-class analysis reveals that the negative class achieves the lowest F1-score of 0.7600, with sarcastic expressions and Banjar regional vocabulary identified as primary sources of misclassification. These findings indicate that preprocessing decisions have a measurable and non-uniform effect on IndoBERT fine-tuning performance. In this study, slang normalization provides the most substantial individual contribution in bridging the vocabulary gap between informal user-generated text and the model’s pre-training distribution.
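
The idea of a preprocessing ablation, where each variant switches individual steps on or off before the text reaches the model, can be sketched in Python as follows. This is an illustration only: the abstract does not list the six variants or the slang dictionary used in the study, so the step names, the three variants, the small `SLANG` lexicon, and the sample comment are all hypothetical.

```python
import re

# Hypothetical slang lexicon; the study's actual dictionary is not given.
SLANG = {"gak": "tidak", "bgt": "banget", "yg": "yang", "enk": "enak"}

def normalize_slang(text):
    """Replace whitespace-separated tokens found in the slang lexicon."""
    return " ".join(SLANG.get(tok, tok) for tok in text.split())

def strip_noise(text):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

# Steps run in this fixed order whenever they are enabled.
STEPS = {
    "lowercase": str.lower,
    "strip_noise": strip_noise,
    "slang": normalize_slang,
}

def preprocess(text, enabled):
    """Apply only the enabled steps, in the order defined by STEPS."""
    for name, fn in STEPS.items():
        if name in enabled:
            text = fn(text)
    return text

# Assumed ablation variants (the paper evaluates six; these are examples).
VARIANTS = {
    "none": [],
    "slang_only": ["slang"],
    "full": ["lowercase", "strip_noise", "slang"],
}

comment = "Enk BGT sambalnya @warung_x gak nyesel"
outputs = {name: preprocess(comment, steps) for name, steps in VARIANTS.items()}
for name, text in outputs.items():
    print(f"{name:>10}: {text}")
```

In this sketch, slang normalization on its own misses the capitalized tokens, which shows how preprocessing steps can interact. Each variant's output would then be tokenized and used for fine-tuning under otherwise identical settings.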