Articles

Found 2 Documents
Journal: International Journal of Electrical and Computer Engineering

Multi-label text classification of Indonesian customer reviews using bidirectional encoder representations from transformers language model
Nuzulul Khairu Nissa; Evi Yulianti
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 5: October 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i5.pp5641-5652

Abstract

Customer reviews are a critical resource for supporting decision-making in various industries. To understand how customers perceive each aspect of a product, we can first identify all aspects discussed in the customer reviews by performing multi-label text classification. In this work, we examine the effectiveness of two proposed strategies using the bidirectional encoder representations from transformers (BERT) language model pre-trained on the Indonesian language, referred to as IndoBERT, to perform multi-label text classification. First, IndoBERT is used as a feature representation combined with a convolutional neural network-extreme gradient boosting (CNN-XGBoost) classifier. Second, IndoBERT is used both as the feature representation and as the classifier, solving the classification task directly. Additional analysis compares our results with those of the multilingual BERT model. According to our experimental results, our first model, using IndoBERT as feature representation, significantly outperforms several baselines. Our second model, using IndoBERT as both feature representation and classifier, significantly enhances the effectiveness of the first. In summary, our proposed models improve on the Word2Vec-CNN-XGBoost baseline by 19.19% in accuracy and 6.17% in F1-score.
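The pipeline described in this abstract (encode each review into a dense vector, then run one binary decision per aspect) can be sketched as follows. This is an illustrative sketch only: the embeddings here are random toy vectors standing in for IndoBERT [CLS] representations, the aspect names are hypothetical, and a plain one-vs-rest logistic regression stands in for the paper's CNN-XGBoost or fine-tuned IndoBERT head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: pretend each review is already encoded as a dense
# vector (in the paper this comes from IndoBERT; here we draw random toy
# embeddings). A review may discuss several aspects at once, so the target
# is a binary vector with one slot per aspect (multi-label).
ASPECTS = ["price", "quality", "delivery"]          # hypothetical aspect labels
X = rng.normal(size=(60, 8))                        # 60 "reviews", 8-dim embeddings
true_W = rng.normal(size=(8, len(ASPECTS)))
Y = (X @ true_W > 0).astype(float)                  # synthetic multi-label targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One independent logistic regression per aspect, trained by gradient
# descent (a simple stand-in for the paper's CNN-XGBoost classifier).
W = np.zeros((8, len(ASPECTS)))
for _ in range(500):
    grad = X.T @ (sigmoid(X @ W) - Y) / len(X)
    W -= 0.5 * grad

def predict_aspects(x, threshold=0.5):
    """Return every aspect whose sigmoid score clears the threshold."""
    probs = sigmoid(x @ W)
    return [a for a, p in zip(ASPECTS, probs) if p >= threshold]

train_pred = (sigmoid(X @ W) >= 0.5).astype(float)
accuracy = (train_pred == Y).mean()
```

The key multi-label detail is in `predict_aspects`: each aspect is thresholded independently, so a single review can receive zero, one, or several labels, unlike softmax-based single-label classification.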
Enhanced TextRank using weighted word embedding for text summarization
Evi Yulianti; Nicholas Pangestu; Meganingrum Arista Jiwanggi
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 5: October 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i5.pp5472-5482

Abstract

The length of a news article may influence people’s interest in reading it. Text summarization can help by creating a shorter, representative version of an article to reduce reading time. This paper proposes using weighted word embeddings based on Word2Vec, FastText, and bidirectional encoder representations from transformers (BERT) models to enhance the TextRank summarization algorithm. Weighted word embeddings are intended to create better sentence representations and thereby produce more accurate summaries. The results show that using (unweighted) word embeddings significantly improves the performance of the TextRank algorithm, with the best performance achieved by the summarization system using BERT word embeddings. When each word embedding is weighted using term frequency-inverse document frequency (TF-IDF), performance further improves significantly for all systems, with the largest gains achieved by the systems using Word2Vec (a 6.80% to 12.92% increase) and FastText (a 7.04% to 12.78% increase). Overall, our systems using weighted word embeddings outperform the TextRank method by up to 17.33% in ROUGE-1 and 30.01% in ROUGE-2. This demonstrates the effectiveness of weighted word embeddings in the TextRank algorithm for text summarization.
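The enhancement this abstract describes (sentence vectors built as TF-IDF-weighted averages of word embeddings, fed into TextRank's graph ranking) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the word vectors are random stand-ins for Word2Vec/FastText/BERT embeddings, the "document" is three toy sentences, and the ranking is a plain PageRank power iteration over cosine similarities.

```python
import numpy as np
from collections import Counter

# Toy document split into sentences. In the paper, word vectors come from
# Word2Vec/FastText/BERT; here we assign random vectors (a hypothetical
# stand-in) so the sketch stays self-contained.
sentences = [
    "the new model improves summarization quality",
    "weighted embeddings improve sentence representation",
    "the weather today is sunny and warm",
]
tokens = [s.split() for s in sentences]
vocab = sorted({w for t in tokens for w in t})
rng = np.random.default_rng(1)
word_vec = {w: rng.normal(size=16) for w in vocab}

# TF-IDF weights, treating each sentence as one "document".
df = Counter(w for t in tokens for w in set(t))
n = len(sentences)

def tfidf(w, sent):
    tf = sent.count(w) / len(sent)
    idf = np.log((1 + n) / (1 + df[w])) + 1.0      # smoothed idf, always > 0
    return tf * idf

# Sentence vector = TF-IDF-weighted average of its word vectors,
# the "weighted word embedding" idea from the abstract.
def sent_vector(sent):
    weights = np.array([tfidf(w, sent) for w in sent])
    vecs = np.array([word_vec[w] for w in sent])
    return weights @ vecs / weights.sum()

V = np.array([sent_vector(t) for t in tokens])

# Cosine-similarity matrix drives the TextRank graph; negative
# similarities are floored so every row stays normalizable.
U = V / np.linalg.norm(V, axis=1, keepdims=True)
S = U @ U.T
np.fill_diagonal(S, 0.0)
S = np.clip(S, 1e-9, None)

# Plain PageRank power iteration over the similarity graph.
P = S / S.sum(axis=1, keepdims=True)
scores = np.full(n, 1.0 / n)
for _ in range(50):
    scores = 0.15 / n + 0.85 * P.T @ scores

summary = sentences[int(scores.argmax())]           # top-ranked sentence
```

Swapping the random `word_vec` table for real embeddings is the only change needed to recover the paper's setup; the TF-IDF weighting makes rare, content-bearing words dominate each sentence vector instead of frequent function words.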