Transformer-based models have significantly advanced sentiment analysis in natural language processing. However, many existing studies still lack robust, cross-validated evaluations and comprehensive performance reporting. This study proposes an integrated benchmarking pipeline for sentiment classification on the IMDb dataset using BERT, RoBERTa, and DistilBERT. The methodology includes systematic preprocessing, stratified 5-fold cross-validation, and aggregate evaluation through confusion matrices, ROC and precision-recall (PR) curves, and multi-metric classification reports. Experimental results demonstrate that all models achieve high accuracy, precision, recall, and F1-score, with RoBERTa leading overall (94.1% mean accuracy and F1), followed by BERT (92.8%) and DistilBERT (92.1%). All models exceed 0.97 in ROC-AUC and PR-AUC, confirming strong discriminative capability. Compared to prior approaches, this pipeline enhances the robustness, interpretability, and reproducibility of results. The reported results and open-source code offer a reliable reference for future research and practical deployment. This study is limited to the English-language IMDb dataset; future work should extend to multilingual and cross-domain settings and integrate explainable AI.
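To make the described evaluation protocol concrete, the following is a minimal sketch of a stratified 5-fold cross-validation loop with per-fold metric aggregation. It assumes the Hugging Face `transformers` and `datasets` libraries and scikit-learn; the checkpoint name, the subsample size, and the elided fine-tuning step are illustrative assumptions, not the exact configuration used in this study.

```python
# Sketch of stratified 5-fold CV for transformer sentiment classification on IMDb.
# Checkpoint, subsample size, and hyperparameters are illustrative assumptions.
import numpy as np
import torch
from datasets import load_dataset
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # swap for "bert-base-uncased" or "roberta-base"

# Small subsample for brevity; the full IMDb train split has 25,000 reviews.
data = load_dataset("imdb", split="train").shuffle(seed=42).select(range(2000))
texts, labels = data["text"], np.array(data["label"])
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_metrics = []
for fold, (train_idx, test_idx) in enumerate(skf.split(texts, labels)):
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    # ... fine-tune `model` on texts[train_idx] here (e.g., with transformers.Trainer) ...
    model.eval()
    probs = []
    with torch.no_grad():
        for i in test_idx:
            enc = tokenizer(texts[i], truncation=True, max_length=256, return_tensors="pt")
            # Positive-class probability from the softmax over the two logits.
            probs.append(torch.softmax(model(**enc).logits, dim=-1)[0, 1].item())
    probs = np.array(probs)
    preds = (probs >= 0.5).astype(int)
    y_true = labels[test_idx]
    fold_metrics.append({
        "accuracy": accuracy_score(y_true, preds),
        "f1": f1_score(y_true, preds),
        "roc_auc": roc_auc_score(y_true, probs),  # uses scores, not hard labels
    })

# Aggregate per-fold metrics as means, mirroring the reported mean accuracy/F1.
for key in fold_metrics[0]:
    print(key, np.mean([m[key] for m in fold_metrics]))
```

Stratification preserves the positive/negative class ratio in every fold, so the per-fold means reported above are comparable across BERT, RoBERTa, and DistilBERT.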