International Journal of Artificial Intelligence
Vol 12 No 1: June 2025

Evaluation of Perplexity and Syntactic Handling Capabilities of ClueAI Models on Japanese Medical Texts

Haga, Tatsuhiro
Matsumoto, Keiyo
Asahiko, Ippei
Mizoguchi, Shunzo



Article Info

Publish Date
28 Jun 2025

Abstract

This study evaluates ClueAI, a large Japanese language model adapted to the medical domain, on the task of predicting Japanese medical text. The motivation is that general-purpose language models, including multilingual models such as multilingual BERT, are limited in handling the linguistic complexity and specialized terminology of Japanese medical text. The method fine-tunes ClueAI on the MedNLP corpus, using MeCab-based tokenization through the Fugashi library. Evaluation uses the perplexity metric to measure how well the model generalizes when predicting text probabilistically. The results show that the domain-adapted ClueAI achieves lower perplexity than the multilingual BERT baseline and better captures the context and sentence structure of medical text. MeCab-based tokenization is found to contribute significantly to prediction accuracy through more precise morphological analysis. However, the model remains weak on complex syntactic structures such as passive sentences and nested clauses. The study concludes that domain adaptation improves performance, but that limited linguistic generalization remains a challenge. Further research should explore models that are more sensitive to syntactic structure, expand the variety of medical corpora, and apply other Japanese language models to broader medical NLP tasks such as clinical entity extraction and classification.
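
As a rough illustration of the perplexity metric mentioned in the abstract, the sketch below computes perplexity as the exponential of the average negative log-probability per token. It is a minimal example under stated assumptions, not the authors' evaluation code: the function name `perplexity` and the per-token probabilities are hypothetical, chosen only to show that a model assigning higher probability to the observed tokens gets a lower score.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities for the same four-token sentence.
# These are illustrative values, NOT results reported by the study.
domain_model = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
baseline_model = [math.log(p) for p in (0.125, 0.25, 0.125, 0.25)]

print(f"domain-adapted: {perplexity(domain_model):.3f}")
print(f"baseline:       {perplexity(baseline_model):.3f}")
```

Perplexity depends on how the text is split into tokens, so scores are only directly comparable between models that are evaluated under the same tokenization.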

Copyrights © 2025






Journal Info

Abbrev

ijai

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

The aim is to publish high-quality articles dedicated to Artificial Intelligence. IJAI is published biannually, in Indonesian, Malay and ...