Engineering, Mathematics and Computer Science Journal (EMACS)
Vol. 7 No. 2 (2025): EMACS

The Impact of Text Preprocessing in Sarcasm Detection on Indonesian Social Media Contents

Jeremy, Nicholaus Hendrik



Article Info

Publish Date
31 May 2025

Abstract

Sarcasm conveys a message by stating the opposite of what is meant. It is common on social media, where examples are plentiful. In natural language processing, sarcasm detection is difficult primarily because of the lack of context, and the colloquial style of social media communication adds another layer of difficulty. One sarcasm benchmark, IdSarcasm, has alleviated one key issue in the development of sarcasm detection. However, there has been no attempt to further preprocess the input before feeding it into the model. Pre-trained language models rely on preprocessed corpora to ensure that the model is built on a quality dataset. Given the current condition of IdSarcasm, further preprocessing steps are necessary to ensure better quality. Specifically, the additional steps needed are handling HTML code, code-mixing, and colloquial writing, which consists of shortened forms, extended forms, spelling variations, and reduplication. Several scenarios are created to observe the effect of the additional preprocessing steps, and each step is also tested independently to observe its individual effect. We show that preprocessing is still relevant for data sourced from social media, and we recommend IndoNLU’s IndoBERT or a large multilingual model for sarcasm classification.
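The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the tiny `SLANG` lookup, and the regular expressions are all assumptions made for illustration, and reading "extended form" as character elongation is likewise an assumption. Code-mixing is left out because it would need a language-identification or translation component that the abstract does not describe.

```python
import html
import re

# Hypothetical colloquial-to-standard lookup (illustrative only);
# a real system would need a curated Indonesian slang lexicon.
SLANG = {"yg": "yang", "gk": "tidak", "bgt": "banget"}


def strip_html(text):
    """Decode HTML entities, then replace any tags with a space."""
    text = html.unescape(text)
    return re.sub(r"<[^>]+>", " ", text)


def expand_reduplication(text):
    """Rewrite the 'word2' shorthand as 'word-word', e.g. 'anak2' -> 'anak-anak'."""
    return re.sub(r"\b([a-zA-Z]+)2\b", r"\1-\1", text)


def squeeze_elongation(text):
    """Collapse runs of three or more identical letters to a single letter."""
    return re.sub(r"([a-zA-Z])\1{2,}", r"\1", text)


def normalize_slang(text):
    """Replace whitespace-separated tokens found in the SLANG lookup."""
    return " ".join(SLANG.get(tok, tok) for tok in text.split())


def preprocess(text):
    """Apply the steps in a fixed order: HTML, case, reduplication, elongation, slang."""
    text = strip_html(text)
    text = text.lower()
    text = expand_reduplication(text)
    text = squeeze_elongation(text)
    return normalize_slang(text)


if __name__ == "__main__":
    print(preprocess("Anak2 itu lucu bgt &amp; pinteeer <br>"))
```

Keeping each step in its own function mirrors the abstract's setup, in which every preprocessing step is also tested independently: any single function can be switched off or applied alone to build a scenario.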

Copyright © 2025






Journal Info

Abbrev

EMACS

Publisher

Subject

Civil Engineering, Building, Construction & Architecture; Computer Science & IT; Engineering; Industrial & Manufacturing Engineering; Mathematics

Description

Engineering, Mathematics and Computer Science (EMACS) Journal invites academicians and professionals to contribute their ideas, concepts, new theories, or scientific developments in the fields of Information Systems, Architecture, Civil Engineering, Computer Engineering, Industrial Engineering, Food ...