So, Raymond
Unknown Affiliation

Multiword target-independent transformer-based model for financial sentiment analysis in colloquial Cantonese
Chun Fai Chu, Carlin; So, Raymond; Kan Lam Kwong, Ernest; Chan, Andy
Bulletin of Electrical Engineering and Informatics Vol 14, No 3: June 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/eei.v14i3.8963

Abstract

Tokenization decomposes a multiword instrument name into several tokens, and the transformer attention mechanism handles each token individually, which hinders treating the related tokens as a single entity. The presence of multiple instruments in a single message further exacerbates the problem and results in low predictive performance. This study proposed the use of sequentially tagged, target-independent sentinel tokens to encapsulate multiword instrument aspects for fine-tuning a natural language inference model. The encapsulation not only helped the attention mechanism handle an instrument name as a single entity but also enabled the model to handle unseen instruments effectively. Our empirical analysis was based on 5,178 manually annotated instrument–sentiment pairs drawn from finance discussion board messages, each of which addressed sentiments toward one to four instruments in a single post. The proposed approach consistently outperformed the direct bidirectional encoder representations from transformers (BERT) based approach in terms of recall, precision, and F1-score when handling financial commentaries written in colloquial Cantonese. This study demonstrated the potential benefits of target-independent sentinel token encapsulation for natural language inference. The underlying logic of multiword target-independent encapsulation is expected to hold for other languages, including Chinese, Japanese, and Thai.
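
The encapsulation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker format (`[TGT1]`, `[/TGT1]`), the function names `encapsulate` and `build_nli_pair`, and the hypothesis template are all hypothetical choices made for illustration. The sketch numbers the markers by order of appearance, so they carry no information about which instrument they wrap, and then pairs the wrapped message with a hypothesis that refers to one marker.

```python
from typing import List, Tuple


def encapsulate(message: str, spans: List[Tuple[int, int]]) -> str:
    """Wrap each instrument span in numbered sentinel markers.

    `spans` holds (start, end) character offsets of instrument names.
    Markers are numbered by position in the message, not by instrument
    identity, so the same markers can wrap names never seen in training.
    The [TGTn] format is an assumption, not taken from the paper.
    """
    pieces = []
    cursor = 0
    for idx, (start, end) in enumerate(sorted(spans), start=1):
        if start < cursor or start >= end or end > len(message):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        pieces.append(message[cursor:start])
        pieces.append(f"[TGT{idx}]{message[start:end]}[/TGT{idx}]")
        cursor = end
    pieces.append(message[cursor:])
    return "".join(pieces)


def build_nli_pair(
    message: str, spans: List[Tuple[int, int]], target_index: int, label: str
) -> Tuple[str, str]:
    """Build a (premise, hypothesis) pair for one instrument in the message.

    The hypothesis wording is a hypothetical template; it names the target
    only through its sentinel marker.
    """
    premise = encapsulate(message, spans)
    hypothesis = f"The sentiment towards [TGT{target_index}] is {label}."
    return premise, hypothesis


if __name__ == "__main__":
    # Made-up English message with two made-up instrument names.
    msg = "Buy Alpha Bank now, sell Beta Motor Holdings"
    names = ["Alpha Bank", "Beta Motor Holdings"]
    spans = [(msg.find(n), msg.find(n) + len(n)) for n in names]
    premise, hypothesis = build_nli_pair(msg, spans, 2, "negative")
    print(premise)
    print(hypothesis)
```

If such markers were used with a real transformer tokenizer, they would presumably need to be registered as special tokens so that each marker stays a single token instead of being split into subwords. Whether the study did this, and how, is not stated in the abstract.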