Articles

Journal : Scientific Journal of Informatics

Integrating C4.5 and K-Nearest Neighbor Imputation with Relief Feature Selection for Enhancing Breast Cancer Diagnosis
Purwinarko, Aji; Budiman, Kholiq; Widiyatmoko, Arif; Sasi, Fitri Arum; Hardyanto, Wahyu
Scientific Journal of Informatics Vol. 12 No. 1: February 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i1.21673

Abstract

Purpose: Breast cancer remains a significant cause of mortality among women and requires accurate diagnostic methods. Traditional classification models often lose accuracy because of missing values and irrelevant features. This study improves breast cancer classification by combining the C4.5 algorithm with K-Nearest Neighbor (KNN) imputation and Relief feature selection, improving both data integrity and classification performance. Methods: The Wisconsin Breast Cancer Database (WBCD) was used to evaluate the proposed method. KNN imputation filled in missing values, and Relief selected the most relevant features. The C4.5 algorithm was trained with train/test splits of 70:30, 80:20, and 90:10 and evaluated using accuracy, precision, recall, and F1-score. Result: The proposed method achieved a highest classification accuracy of 98.57%, surpassing several existing models: PSO-C4.5 (96.49%), EBL-RBFNN (98.40%), Gaussian Naïve Bayes (97.50%), and t-SNE (98.20%), which corresponds to improvements of 2.08, 0.17, 1.07, and 0.37 percentage points, respectively. These results support its effectiveness in handling missing values and selecting relevant features. Novelty: Unlike prior studies that addressed missing values and feature selection separately, this research integrates both techniques, enhancing classification accuracy and computational efficiency. The findings suggest that this approach provides a reliable breast cancer diagnosis method. Future work could explore deep learning integration and validation on larger datasets to improve generalizability.
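The two preprocessing steps named in the abstract above, KNN imputation followed by Relief feature weighting, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `knn_impute` and `relief_weights`, the choice of k, the distance definitions, and the toy data are all assumptions made for illustration, and the C4.5 training step is omitted.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is the root-mean-square difference over the
    features that are observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def relief_weights(X, y):
    """Basic two-class Relief: reward features that differ on the nearest
    miss (other class) and penalise features that differ on the nearest hit.

    Assumption: every row is visited once instead of being sampled at random.
    """
    n, d = len(X), len(X[0])
    lo = [min(r[f] for r in X) for f in range(d)]
    hi = [max(r[f] for r in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [X[m] for m in range(n) if m != i and y[m] == y[i]]
        misses = [X[m] for m in range(n) if y[m] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for f in range(d):
            w[f] += (diff(f, X[i], miss) - diff(f, X[i], hit)) / n
    return w


# Toy data (invented): features 0 and 1 track the class, feature 2 is noise.
raw = [
    [1.0, 1.0, 5.0],
    [2.0, 2.0, 1.0],
    [None, 1.5, 3.0],
    [8.0, 9.0, 4.0],
    [9.0, 8.0, 2.0],
    [10.0, 10.0, 3.0],
]
labels = [0, 0, 0, 1, 1, 1]

filled = knn_impute(raw, k=2)
weights = relief_weights(filled, labels)
print(filled[2][0], weights)
```

In this toy setup the noisy third feature receives a negative weight, so a threshold on the Relief weights would drop it before a classifier is trained on the imputed data.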
Improving Sentiment Analysis with a Context-Aware RoBERTa–BiLSTM and Word2Vec Branch
Hardyanto, Wahyu; Aryani, Nila Prasetya; Andestian, Defin; Sugiyanto; Setyaningrum, Wahyu; Mardiansyah, M Fadil; Islam, Muhamad Anbiya Nur; Purwinarko, Aji
Scientific Journal of Informatics Vol. 12 No. 4: November 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i4.35918

Abstract

Purpose: We improve the accuracy of Twitter/X sentiment analysis with a hybrid model combining Word2Vec and the Robustly Optimized BERT Pretraining Approach (RoBERTa). Twitter/X text is noisy (slang and out-of-vocabulary (OOV) words) and ambiguous, which degrades the performance of pre-trained transformers, while Word2Vec is limited to local context. Studies that integrate the two remain limited. The idea is that Word2Vec is strong on slang and novel vocabulary (distributional semantics), while RoBERTa excels at contextual meaning, so combining them lets each offset the other's weaknesses. Methods: The Sentiment140 dataset contains 1.6 million balanced tweets. The split is stratified, and Word2Vec is trained solely on the training data. RoBERTa is used pretrained: frozen in the first stage, then fine-tuned on some layers in the second stage. The Word2Vec and RoBERTa vectors are concatenated and processed using Bidirectional Long Short-Term Memory (BiLSTM) with sigmoid activation. Training uses TensorFlow and the Adam optimizer, with dropout and early stopping. The decision threshold is optimized during validation. Result: The hybrid model achieved an accuracy of 88.09%, an F1-score of 88.09%, and an area under the Receiver Operating Characteristic (ROC) curve (AUC) of approximately 95.19%. No overfitting was observed, and the hybrid model outperformed both single baselines. The confusion matrix and ROC curve corroborate the findings. Novelty: The novelty lies in the fusion of distributional and contextual representations with a structured fusion mechanism. Limitations: Computational requirements and hyperparameter tuning are not yet extensive. Further directions: Systematic hyperparameter search and cross-validation across other large sentiment datasets to assess generalization.
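The abstract above mentions that the decision threshold is optimized during validation. A minimal sketch of one way to do this is shown below, assuming the threshold is chosen to maximize F1 on held-out validation scores; the abstract does not state which criterion was used. The function names `f1_at` and `best_threshold`, the candidate grid, and the validation scores are hypothetical and invented for illustration.

```python
def f1_at(threshold, probs, labels):
    """F1-score when a sigmoid output >= threshold is read as positive."""
    tp = sum(1 for p, t in zip(probs, labels) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(probs, labels) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(probs, labels) if p < threshold and t == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(probs, labels, candidates):
    """Return the candidate threshold with the highest validation F1.

    Ties resolve to the first (lowest) candidate in the list.
    """
    return max(candidates, key=lambda th: f1_at(th, probs, labels))


# Invented validation outputs of a sigmoid layer and their true labels.
val_probs = [0.12, 0.31, 0.38, 0.42, 0.56, 0.61, 0.83, 0.91]
val_labels = [0, 0, 1, 1, 0, 1, 1, 1]

grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
chosen = best_threshold(val_probs, val_labels, grid)
print(chosen, f1_at(chosen, val_probs, val_labels), f1_at(0.5, val_probs, val_labels))
```

On this toy data the selected threshold is lower than the default 0.5 and gives a higher validation F1, which is the kind of gain threshold tuning is meant to capture. The chosen value would then be fixed before evaluating on the test split.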