Journal of Computing Theories and Applications
Vol. 1 No. 4 (2024): JCTA 1(4) 2024

SentiGEN: Synthetic Data Generator for Sentiment Analysis

Sundarreson, Pushpika (Unknown)
Kumarapathirage, Sapna (Unknown)



Article Info

Publish Date
27 Apr 2024

Abstract

Obtaining high-quality, diverse, and accurate datasets for sentiment analysis has long been a significant challenge. Traditional approaches rely on human annotators, who may introduce bias into the data and whose work is time-consuming and expensive. Datasets built this way may also lack the variety needed to train robust, generalizable sentiment analysis models. This study addresses the problem with a novel combination of techniques. The proposed system, SentiGEN, uses a T5 transformer, fine-tuned and optimized with an evolutionary algorithm, to generate high-quality, diverse, and accurate data for sentiment analysis. The generated data is validated with XLNet to ensure high sentiment accuracy. Results from evaluating multiple models indicate that the combination is effective: from complex transformers such as BERT to simpler approaches such as KNN, models trained on the synthetic data outperformed their counterparts trained on real data. This gain in predictive accuracy was observed on benchmark datasets such as SST-2 and Yelp. Overall, SentiGEN generates high-quality, diverse, accurate, and realistic data for sentiment analysis, and models trained on its synthetic data outperformed the same models trained on real data.
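
The validation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how XLNet is applied, so the sketch assumes a simple rule (keep a generated sample only when the validator's predicted label matches the intended label with enough confidence). The names `toy_validator`, `filter_by_validator`, `distinct_n`, and the `min_confidence` threshold are hypothetical, and a tiny keyword lexicon stands in for a fine-tuned classifier so the example runs without any model download.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (generated text, intended sentiment label)


def toy_validator(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a fine-tuned classifier such as XLNet.

    Returns (predicted_label, confidence) using a tiny keyword lexicon.
    """
    positive = {"great", "love", "excellent"}
    negative = {"awful", "hate", "terrible"}
    words = [w.strip(".,!") for w in text.lower().split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    if pos == neg:
        return "neutral", 0.5
    label = "positive" if pos > neg else "negative"
    return label, max(pos, neg) / (pos + neg)


def filter_by_validator(
    samples: List[Sample],
    validator: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.9,  # assumed threshold, not from the paper
) -> List[Sample]:
    """Keep samples whose predicted label agrees with the intended label."""
    kept = []
    for text, intended in samples:
        predicted, confidence = validator(text)
        if predicted == intended and confidence >= min_confidence:
            kept.append((text, intended))
    return kept


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Share of unique n-grams among all n-grams: a rough diversity score."""
    ngrams = []
    for t in texts:
        tokens = t.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Hypothetical generator output; in the described system this would come from T5.
generated = [
    ("I love this phone, the camera is excellent.", "positive"),
    ("The service was awful and I hate the decor.", "negative"),
    ("Great food but terrible parking.", "positive"),  # mixed signal: rejected
    ("The package arrived on Tuesday.", "negative"),   # no signal: rejected
]
kept = filter_by_validator(generated, toy_validator)
diversity = distinct_n([text for text, _ in kept])
```

In a real pipeline the validator would be replaced by an actual classifier, and a score such as `distinct_n` could serve as one signal of dataset diversity; whether SentiGEN measures diversity this way is not stated in the abstract.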

Copyright © 2024






Journal Info

Abbrev

jcta

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

Journal of Computing Theories and Applications (JCTA) is a refereed, international journal that covers all aspects of foundations, theories and the practical applications of computer science. FREE OF CHARGE for submission and publication. All accepted articles will be published online and accessed ...