Jurnal Ilmu Siber dan Teknologi Digital
Vol. 1 No. 1 (2022): November

A Multiclass Classification Model Combining IndoBERT Embedding and Long Short-Term Memory for Indonesian-Language Tweets

Thariq Iskandar Zulkarnain Maulana Putra (Computer Science Study Program, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta)
Suprapto Suprapto (Computer Science Study Program, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta)
Arif Farhan Bukhori (Computer Science Study Program, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta)

Article Info

Publish Date
11 Nov 2022

Abstract

Purpose: This research aims to improve on the performance of text classification models from previous studies by combining the pre-trained IndoBERT model with the Long Short-Term Memory (LSTM) architecture to classify Indonesian-language tweets into several categories. Method: Multiclass text classification was performed with a model that combines pre-trained IndoBERT embeddings with an LSTM network. The dataset was collected by crawling the Twitter API. The resulting IndoBERT-LSTM model was then compared with Word2Vec-LSTM and fine-tuned IndoBERT models. Result: The IndoBERT-LSTM model with the best hyperparameter combination (batch size of 16, learning rate of 2e-5, and average pooling) achieved an F1-score of 98.90% on the unmodified dataset (a 0.70% increase from the Word2Vec-LSTM model and 0.40% from the fine-tuned IndoBERT model) and 92.83% on the modified dataset (a 4.51% increase from the Word2Vec-LSTM model and 0.69% from the fine-tuned IndoBERT model). However, the improvement over the fine-tuned IndoBERT model is small, and the Word2Vec-LSTM model has a much shorter total training time.
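
The average pooling mentioned in the result can be illustrated with a minimal sketch. The abstract does not say where pooling is applied, so this assumes it averages the per-token LSTM output vectors of one tweet while skipping padding positions. The function name `average_pool`, the mask convention (1 = real token, 0 = padding), and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_pool(hidden_states: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Average the per-token vectors of one sequence, skipping padding.

    hidden_states: one vector per time step (assumed to be LSTM outputs).
    attention_mask: 1 for a real token, 0 for padding (assumed convention).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    # Keep only the vectors at positions the mask marks as real tokens.
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Mean over time steps, computed independently for each hidden dimension.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: 4 time steps (the last one is padding), hidden size 3.
lstm_outputs = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding position, ignored by the mask
]
mask = [1, 1, 1, 0]
pooled = average_pool(lstm_outputs, mask)
print(pooled)
```

In a full model, a fixed-size vector like `pooled` would typically be passed to a dense softmax layer that outputs one probability per tweet category. That classification head is also an assumption here, not a detail stated in the abstract.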

Copyrights © 2022

Journal Info

Abbrev

jisted

Publisher

Subject

Computer Science & IT; Education; Engineering; Other

Description

Jurnal Ilmu Siber dan Teknologi Digital (JISTED) is a national, open-access and peer-reviewed journal welcoming high-quality manuscripts of original articles, reports and literature reviews in the field of software engineering and information technology. Jurnal Ilmu Siber dan Teknologi Digital ...