Scientific Journal of Informatics
Vol. 12 No. 4: November 2025

Improving Sentiment Analysis with a Context-Aware RoBERTa–BiLSTM and Word2Vec Branch

Hardyanto, Wahyu
Aryani, Nila Prasetya
Andestian, Defin
Sugiyanto
Setyaningrum, Wahyu
Mardiansyah, M Fadil
Islam, Muhamad Anbiya Nur
Purwinarko, Aji



Article Info

Publish Date
16 Jan 2026

Abstract

Purpose: We improve the accuracy of Twitter sentiment analysis with a hybrid model that combines Word to Vector (Word2Vec) and the Robustly Optimized BERT Pretraining Approach (RoBERTa). The idea is that Word2Vec is strong on slang and novel vocabulary (distributional semantics), while RoBERTa excels at contextual meaning; combining the two offsets the weaknesses of each.

Methods/Study design/approach: The Sentiment140 dataset contains 1.6 million balanced tweets. The split is stratified, and Word2Vec is trained solely on the training data. A pretrained RoBERTa encoder is kept frozen in the first stage; in the second stage, a subset of its layers is fine-tuned. The Word2Vec and RoBERTa vectors are concatenated and processed by a Bidirectional Long Short-Term Memory (BiLSTM) network with sigmoid activation. Training uses TensorFlow and the Adam optimizer, with dropout and early stopping. The decision threshold is optimized on the validation data. The pipeline supports caching and resuming training.

Result/Findings: The hybrid model achieved an accuracy of 88.09%, an F1-score of 88.09%, and an area under the Receiver Operating Characteristic (ROC) curve (AUC) of approximately 95.19%. No overfitting was observed, and the hybrid model outperformed both single-model baselines. The confusion matrix and ROC curve corroborate these findings.

Novelty/Originality/Value: The novelty lies in the fusion of distributional and contextual representations with resource-efficient fine-tuning. Limitations: the model is computationally demanding, and hyperparameter tuning has not yet been extensive. Future directions: systematic hyperparameter search and cross-validation across other large sentiment datasets to assess generalization.
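The abstract states that the decision threshold is optimized during validation but does not say which metric is maximized or how candidate thresholds are chosen. The following is a minimal sketch of one common way to do this, assuming the validation F1-score is the selection criterion and that thresholds are swept over a fixed grid. The function names `f1_at_threshold` and `best_threshold` and the grid from 0.05 to 0.95 are hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the positive class when predicting 1 iff prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # With no positives predicted or present, F1 is undefined; report 0.0.
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, f1) for the candidate with the highest validation F1.

    Assumption: F1 is the criterion; the abstract does not name one.
    Ties are resolved in favor of the smallest candidate threshold.
    """
    if candidates is None:
        # Hypothetical grid: 0.05, 0.06, ..., 0.95.
        candidates = [i / 100 for i in range(5, 96)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        score = f1_at_threshold(probs, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    # Toy validation outputs (sigmoid probabilities) and gold labels.
    val_probs = [0.2, 0.3, 0.7, 0.9]
    val_labels = [0, 0, 1, 1]
    threshold, f1 = best_threshold(val_probs, val_labels)
    print(f"selected threshold={threshold:.2f}, validation F1={f1:.4f}")
```

The threshold selected this way would then be held fixed when scoring the test split, so that the test data play no part in choosing it.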

Copyrights © 2025






Journal Info

Abbrev

sji

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Scientific Journal of Informatics (p-ISSN 2407-7658 | e-ISSN 2460-0040), published by the Department of Computer Science, Universitas Negeri Semarang, is a scientific journal of Information Systems and Information Technology that includes scholarly writings on pure research and applied research in the ...