Journal of Applied Data Sciences
Vol 6, No 3: September 2025

Enhancing Aspect-Based Sentiment Analysis in Tourism Reviews Through Hybrid Data Augmentation

Iswari, Ni Made Satvika
Afriliana, Nunik



Article Info

Publish Date
20 Jul 2025

Abstract

The increasing reliance on online reviews in tourism has made User-Generated Content (UGC) an invaluable resource for understanding visitor perceptions. However, extracting meaningful insights from these reviews remains challenging due to their unstructured nature, aspect imbalance, and the prevalence of code-mixing between languages such as Indonesian and English, particularly in multicultural destinations like Bali. Aspect-Based Sentiment Analysis (ABSA) offers a promising solution by associating sentiment polarity with specific aspects of tourist experiences. Yet its performance is often constrained by limited and imbalanced datasets, especially for underrepresented aspects such as sanitation and amenities. This study proposes a hybrid data augmentation framework that integrates three complementary strategies: generative augmentation using ChatGPT, semantic filtering via Sentence-BERT (SBERT), and domain refinement through Masked Language Modeling (MLM). The framework is designed to improve ABSA performance on multilingual tourism reviews by generating synthetic aspect-relevant data while preserving semantic integrity and contextual nuance. Using 398 reviews of Kuta Beach in Bali, we evaluate the effectiveness of the proposed approach across five tourism aspects: scenery, dusk, surf, amenities, and sanitation. Results show that the hybrid strategy reduces hallucination rates from 12% (using ChatGPT alone) to 3.8%, increases F1-scores for underrepresented aspects by up to 5.1%, and improves cross-lingual alignment (Cohen’s κ = 0.78). These improvements demonstrate the synergy between generative and semantic augmentation in addressing real-world ABSA challenges. The proposed method not only advances multilingual ABSA but also has practical implications for tourism analytics, allowing destination managers to better understand and respond to aspect-specific visitor feedback.
The framework is extensible to other low-resource domains where linguistic diversity and data scarcity present similar limitations.
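The semantic-filtering stage of such a pipeline can be illustrated with a minimal sketch. It assumes each review and each ChatGPT-generated candidate has already been encoded as a sentence embedding (in practice an SBERT model, for example via the sentence-transformers library, would produce these vectors), and that a candidate is kept only if its cosine similarity to the seed review it was generated from meets a threshold. The function names `cosine_similarity` and `filter_synthetic`, the 0.8 threshold, and the toy three-dimensional vectors are hypothetical illustrations, not code or values from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_synthetic(
    seed_embedding: Vector,
    candidates: List[Tuple[str, Vector]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Keep generated sentences whose embedding stays close to the seed review's embedding."""
    return [
        text
        for text, embedding in candidates
        if cosine_similarity(seed_embedding, embedding) >= threshold
    ]


# Toy 3-dimensional "embeddings"; real SBERT vectors have hundreds of dimensions.
seed = [1.0, 0.0, 0.0]  # stands in for a sanitation review of Kuta Beach
candidates = [
    ("The toilets near the beach were clean.", [0.9, 0.1, 0.0]),  # on-aspect paraphrase
    ("Great nasi goreng in Ubud.", [0.0, 1.0, 0.0]),  # off-topic candidate
]
kept = filter_synthetic(seed, candidates)
print(kept)  # ['The toilets near the beach were clean.']
```

The threshold trades volume for fidelity: raising it discards more off-topic or drifted paraphrases, while lowering it keeps more varied synthetic data for underrepresented aspects such as sanitation, at the risk of admitting unfaithful samples.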

Copyright © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...