Building of Informatics, Technology and Science
Vol 6 No 2 (2024): September 2024

LSTM Optimization to Reduce Overfitting for Text Classification Using the Kaggle IMDB Movie Review Dataset

Alkhairi, Putrama
Windarto, Agus Perdana
Efendi, Muhamad Masjun



Article Info

Publish Date
16 Sep 2024

Abstract

This study develops and optimizes a Long Short-Term Memory (LSTM) model to reduce overfitting in text classification on the Kaggle IMDB movie review dataset. Overfitting is a common problem in machine learning in which a model fits the training data too closely and therefore performs worse on unseen test data. To improve the generalization of the LSTM model, the study applies several optimization techniques, including regularization, dropout, and careful training methods. The results show that overfitting-reduction techniques such as dropout and the RMSProp optimizer improve the performance of the LSTM model on IMDB movie review classification. The optimized LSTM model achieves an accuracy of 83.45%, which is 2.07 percentage points higher than the 81.38% of the standard model. Precision rises to 89.65% from 84.46% in the standard model, although recall is slightly lower (75.69% compared with 76.91%). The F1-score of the optimized model is also higher, at 82.07% compared with 80.53% for the standard model. These results indicate that the techniques improve the accuracy and reliability of the text classification model on the test data. The research contributes to understanding and reducing overfitting in deep learning models for natural language processing, and offers insights into best practices for applying LSTM models to text classification.
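The reported figures can be cross-checked with a little arithmetic. The sketch below is illustrative and is not the authors' code: it recomputes the F1-score as the harmonic mean of the precision and recall quoted in the abstract, and the accuracy gain in percentage points. The names `reported`, `f1_score`, `recomputed_f1`, and `accuracy_gain_points` are hypothetical helpers introduced here. The recomputed F1 values land within about 0.03 points of the published ones; the small gap may come from rounding or from how the paper averaged its metrics, which the abstract does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, converted from percent to fractions.
reported = {
    "standard": {"accuracy": 0.8138, "precision": 0.8446, "recall": 0.7691, "f1": 0.8053},
    "optimized": {"accuracy": 0.8345, "precision": 0.8965, "recall": 0.7569, "f1": 0.8207},
}


def recomputed_f1(model: str) -> float:
    """F1 derived from the reported precision and recall of the given model."""
    metrics = reported[model]
    return f1_score(metrics["precision"], metrics["recall"])


def accuracy_gain_points() -> float:
    """Accuracy difference (optimized minus standard) in percentage points."""
    return (reported["optimized"]["accuracy"] - reported["standard"]["accuracy"]) * 100


if __name__ == "__main__":
    for name, metrics in reported.items():
        print(f"{name}: reported F1 = {metrics['f1']:.4f}, "
              f"recomputed F1 = {recomputed_f1(name):.4f}")
    print(f"accuracy gain = {accuracy_gain_points():.2f} percentage points")
```

Note that the gain is an absolute difference in percentage points, not a relative increase of 2.07% over the baseline accuracy.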

Copyrights © 2024






Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting research results in information technology and computing. Papers submitted to this journal are checked for plagiarism and peer-reviewed to maintain quality. ...