Journal : Jurnal Teknik Informatika (JUTIF)

A Comprehensive Benchmarking Pipeline for Transformer-Based Sentiment Analysis using Cross-Validated Metrics
Abidin, Dodo Zaenal; Afuan, Lasmedi; Toscany, Afrizal Nehemia; Nurhadi, Nurhadi
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 4 (2025): JUTIF Volume 6, Number 4, Agustus 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.4.4894

Abstract

Transformer-based models have significantly advanced sentiment analysis in natural language processing. However, many existing studies still lack robust, cross-validated evaluations and comprehensive performance reporting. This study proposes an integrated benchmarking pipeline for sentiment classification on the IMDb dataset using BERT, RoBERTa, and DistilBERT. The methodology includes systematic preprocessing, stratified 5-fold cross-validation, and aggregate evaluation through confusion matrices, ROC and precision-recall (PR) curves, and multi-metric classification reports. Experimental results demonstrate that all models achieve high accuracy, precision, recall, and F1-score, with RoBERTa leading overall (94.1% mean accuracy and F1), followed by BERT (92.8%) and DistilBERT (92.1%). All models exceed 0.97 in ROC-AUC and PR-AUC, confirming strong discriminative capability. Compared to prior approaches, this pipeline enhances result robustness, interpretability, and reproducibility. The provided results and open-source code offer a reliable reference for future research and practical deployment. This study is limited to the IMDb dataset in English, suggesting future work on multilingual, cross-domain, and explainable AI integration.
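
The stratified k-fold evaluation described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' released code: the function names (`stratified_kfold`, `binary_metrics`, `cross_validate`) and the `predict_fn` hook are hypothetical, and a real pipeline would fine-tune BERT, RoBERTa, or DistilBERT inside that hook.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class ratios similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets a similar share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one fold (binary labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def cross_validate(labels, predict_fn, k=5, seed=0):
    """Average per-fold metrics; predict_fn(train_idx, test_idx) is a hypothetical
    hook that trains on train_idx and returns predicted labels for test_idx."""
    per_fold = []
    for train, test in stratified_kfold(labels, k, seed):
        y_true = [labels[i] for i in test]
        per_fold.append(binary_metrics(y_true, predict_fn(train, test)))
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}


# Toy usage: a dummy "model" that always predicts the positive class.
toy_labels = [0] * 50 + [1] * 50
summary = cross_validate(toy_labels, lambda train, test: [1] * len(test))
```

Reporting the mean over folds, as above, is one simple way to aggregate; per-fold standard deviations could be added in the same loop.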

Enhancing Fake News Detection on Imbalanced Data Using Resampling Techniques and Classical Machine Learning Models
Abidin, Dodo Zaenal; Siswanto, Agus; Saputra, Chindra; Betantiyo, Betantiyo; Nehemia Toscany, Afrizal
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, Oktober 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.5177

Abstract

Class imbalance remains a critical challenge in fake news detection, particularly in domains such as entertainment media where class distributions are highly skewed. This study evaluates seven resampling techniques—Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek Links, NearMiss, and No Resampling—applied to three classical machine learning models: Logistic Regression, Support Vector Machine (SVM), and Random Forest. Using the imbalanced GossipCop dataset comprising 24,102 news headlines, the proposed pipeline integrates TF-IDF vectorization, stratified 3-fold cross-validation, and five evaluation metrics: F1-score, precision, recall, ROC AUC, and PR AUC. Experimental results show that oversampling methods, particularly SMOTE and Random Oversampling, substantially improve minority class (fake news) detection. Among all model–resampling combinations, SVM with SMOTE achieved the highest performance (F1-score = 0.67, PR AUC = 0.74), demonstrating its robustness in handling imbalanced short-text classification. Conversely, undersampling methods frequently reduced recall, especially with ensemble models like Random Forest. This approach enhances model robustness in fake news detection on skewed datasets and contributes a reproducible, domain-specific framework for developing more reliable misinformation classifiers.
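
Of the resampling techniques listed in this abstract, Random Oversampling is the simplest to illustrate. The sketch below is a plain-Python approximation, not the authors' implementation: `random_oversample` is a hypothetical helper that duplicates randomly chosen minority-class examples until every class matches the largest one. It does not synthesize new points the way SMOTE or ADASYN do. A common practice, assumed here rather than stated in the abstract, is to resample only the training split of each fold so that the test split keeps its original class distribution.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes are equally sized.

    X and y are parallel lists; the originals are kept and copies are appended.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        # Indices of the original samples belonging to this class.
        pool = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(pool)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


# Toy usage: four "real" headlines (label 0) and one "fake" headline (label 1).
headlines = ["a", "b", "c", "d", "e"]
labels = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(headlines, labels)
```

After balancing, the training data would be vectorized (for example with TF-IDF) and passed to the classifier; that step is omitted here.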