Indonesian Journal of Electrical Engineering and Computer Science
Vol 35, No 2: August 2024

The impact of feature extraction techniques on the performance of text data classification models

Maiti, Abdallah
Abarda, Abdallah
Hanini, Mohamed



Article Info

Publish Date
01 Aug 2024

Abstract

Sentiment analysis focuses on interpreting the feelings and opinions expressed in textual data. Our study assesses the impact of different feature extraction methods on the accuracy of opinion mining models. Bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), Word2Vec, global vectors (GloVe) and bidirectional encoder representations from transformers (BERT) were each paired with three machine learning algorithms and three deep learning networks as classifiers. The IMDB movie review dataset was used for evaluation. The results showed that combining BERT with long short-term memory (LSTM), convolutional neural network (CNN) and recurrent neural network (RNN) classifiers improved performance, achieving an accuracy of 94%, precision of 94.14%, recall of 93.27% and an F1 score of 89.33%. These results highlight the significant contribution of BERT to model performance: it outperformed the other feature extraction techniques in text classification. The study concludes that the fusion of BERT and LSTM significantly improves model accuracy for opinion mining, and recommends BERT as the main feature extraction method for optimizing performance in NLP tasks.
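To make the two simplest feature extraction methods named above concrete, the following minimal sketch builds BoW and TF-IDF vectors by hand for three toy reviews. It is an illustration only, not the authors' pipeline: the abstract does not say which TF-IDF weighting or tokenizer was used, so the textbook form (term frequency normalized by document length, idf = ln(N / df)) and whitespace tokenization are assumptions, and the function names and example reviews are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    # A real pipeline would also handle punctuation and stop words.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) with tf = count / doc length
    and idf = ln(N / df). This is one common variant among several."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for vec in counts:
        length = sum(vec)
        vectors.append(
            [(c / length) * w if length else 0.0 for c, w in zip(vec, idf)]
        )
    return vocab, vectors


# Toy reviews invented for illustration; they are not taken from IMDB.
reviews = [
    "a great movie with a great cast",
    "a boring movie",
    "great acting but a boring plot",
]

vocab, bow = bag_of_words(reviews)
_, tfidf = tf_idf(reviews)
```

In this sketch the word "a" occurs in every review, so its TF-IDF weight is zero in all three vectors even though its raw BoW count is the highest in the first review. That is the main practical difference between the two representations: BoW keeps raw frequency, while TF-IDF downweights terms that do not discriminate between documents. Either kind of vector could then be passed to a classifier.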

Copyrights © 2024