The effectiveness of cyberbullying detection depends on the availability of sufficient, diverse, and contextually rich training data, which is often limited in low-resource languages such as Indonesian. To address dataset limitations, researchers have extensively explored data augmentation (DA) as a promising approach to improving model performance. DA generates new data instances by applying transformations to existing data, thereby increasing both dataset size and variability. Prior studies have demonstrated that applying Easy Data Augmentation (EDA) with Support Vector Machine (SVM) classification improved cyberbullying detection performance, despite EDA's limited ability to capture semantic and contextual nuances. In this paper, we investigated DA methods for Indonesian text using the Transformer-based GPT-2 model. The augmented sentences were evaluated and filtered for context, semantics, diversity, and novelty, with similarity measures such as Euclidean Distance (ED), Cosine Similarity (CS), Jaccard Similarity (JS), and BLEU Score (BLS) used to ensure augmentation quality. Furthermore, we compared text classification performance using both SVM and the Transformer-based ALBERT model. Experimental results revealed that incorporating similarity measures and GPT-2 as a DA method failed to improve cyberbullying detection performance, potentially due to the semantic drift introduced by GPT-2 and the inadequacy of similarity measures in capturing nuanced contextual information. However, we found that ALBERT outperformed SVM as a classification model, achieving average F1-scores of 91.77% and 91.72%, respectively. This study contributes to the informatics field by exploring the potential of Transformer-based augmentation and similarity evaluation for enhancing low-resource text classification, while acknowledging the limitations in data quality and model adaptation.
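The similarity-based filtering described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses simple bag-of-words vectors over whitespace tokens, and the function names and threshold values (`cs_min`, `js_max`, `ed_max`) are hypothetical placeholders for whatever representation and cutoffs a real implementation would use.

```python
import math
from collections import Counter

def bow_vectors(a: str, b: str):
    """Build aligned bag-of-words count vectors (and token sets) for two sentences."""
    ta, tb = a.lower().split(), b.lower().split()
    vocab = sorted(set(ta) | set(tb))
    ca, cb = Counter(ta), Counter(tb)
    return [ca[w] for w in vocab], [cb[w] for w in vocab], set(ta), set(tb)

def cosine_similarity(va, vb):
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0

def euclidean_distance(va, vb):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))

def jaccard_similarity(sa, sb):
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def keep_augmented(original: str, augmented: str,
                   cs_min=0.5, js_max=0.9, ed_max=5.0) -> bool:
    """Hypothetical filter: keep a generated paraphrase only if it stays
    on-topic (cosine similarity high enough) yet is not a near-copy
    (Jaccard similarity below a cap, Euclidean distance bounded)."""
    va, vb, sa, sb = bow_vectors(original, augmented)
    return (cosine_similarity(va, vb) >= cs_min
            and jaccard_similarity(sa, sb) <= js_max
            and euclidean_distance(va, vb) <= ed_max)
```

Under this sketch, an exact duplicate is rejected (its Jaccard similarity of 1.0 exceeds the cap), while a paraphrase sharing most content words passes; a production pipeline would typically swap the bag-of-words vectors for sentence embeddings before applying CS and ED.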
Copyright © 2026