Citra Lestari
Universitas Ciputra Surabaya


Penggunaan Kamus Singkatan Kata Bahasa Indonesia Sehari-Hari dalam Pembangkitan Fitur Teks [The Use of a Dictionary of Everyday Indonesian Word Abbreviations in Text Feature Generation]
Citra Lestari; Kenny Jihiro; Andreas Lim; Daniel Aprilio; Franciscus Valentinus
Jurnal Informatika Universitas Pamulang, Vol 8, No 2 (2023)
Publisher : Teknik Informatika Universitas Pamulang

DOI: 10.32493/informatika.v8i2.29306

Abstract

Natural Language Processing (NLP) research on the Indonesian language has progressed relatively slowly compared with research on other languages, such as English or Chinese. Most of this research deals with formal Indonesian texts. NLP research on informal Indonesian texts faces considerable difficulty because informal Indonesian usually mixes formal language, everyday language, and local languages. In addition, Indonesians habitually use abbreviations when texting. These factors cause great difficulty in the feature generation process, where machines fail to identify stopwords and form lemmas from the bag of words. Dictionaries exist that can support lemmatization for formal Indonesian, everyday language, local languages, and even formal Indonesian abbreviations, but there is still no dictionary for informal Indonesian abbreviations. This research built a dictionary of informal Indonesian abbreviations from 4000 Indonesian tweets. The dictionary contains 706 unique abbreviations as its corpus. The dictionary was then used to generate features. In this research, feature generation used only this dictionary, in order to measure its significance. Feature generation with the informal abbreviations dictionary was tested on Indonesian tweets about the Covid-19 vaccine. The process identified 2262 abbreviations, 71.09% of which were identified as stopwords. As a further step, the generated features were tested for their impact on sentiment analysis, which used a Multi-Layer Perceptron. Unfortunately, the features did not improve the performance of the sentiment analysis: accuracy decreased by 3.5%, while precision, recall, and F1-score decreased by 0.02 to 0.04. From this result, it can be concluded that using this dictionary alone for lemmatization is not enough; it needs to be combined with other dictionaries to achieve a better result.
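
The dictionary-only feature generation described in the abstract might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abbreviation entries, the stopword list, the tokenizer, and the function name `generate_features` are all hypothetical stand-ins, since the abstract does not publish the 706-entry dictionary or the processing details.

```python
import re

# Hypothetical sample entries; the dictionary in the paper holds 706
# abbreviations collected from tweets and is not reproduced here.
ABBREVIATIONS = {
    "yg": "yang",
    "dgn": "dengan",
    "utk": "untuk",
    "sdh": "sudah",
    "bgt": "banget",
}

# Hypothetical stopword list, limited to what this example needs.
STOPWORDS = {"yang", "dengan", "untuk", "sudah"}


def generate_features(text):
    """Expand known abbreviations and drop those that expand to stopwords.

    Returns the feature tokens, the number of abbreviations found, and
    how many of those abbreviations were stopwords.
    """
    # Assumed tokenizer: lowercase alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = []
    n_abbrev = 0
    n_abbrev_stop = 0
    for tok in tokens:
        expanded = ABBREVIATIONS.get(tok)
        if expanded is not None:
            n_abbrev += 1
            if expanded in STOPWORDS:
                n_abbrev_stop += 1
                continue  # abbreviation of a stopword: not a feature
            tok = expanded
        # Tokens not in the dictionary pass through unchanged, because
        # only the abbreviation dictionary is applied in this sketch.
        features.append(tok)
    return features, n_abbrev, n_abbrev_stop


features, n_abbrev, n_abbrev_stop = generate_features(
    "Vaksin yg sdh datang bagus bgt utk lansia"
)
print(features, n_abbrev, n_abbrev_stop)
```

In this toy input, four tokens match the dictionary and three of them expand to stopwords, which mirrors the kind of ratio the abstract reports (71.09% of identified abbreviations were stopwords) without reproducing its data.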