Tsaniya, Hilya
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

RadEval: A novel semantic evaluation framework for radiology report Tsaniya, Hilya; Fatichah, Chastine; Suciati, Nanik
International Journal of Advances in Intelligent Informatics Vol 11, No 4 (2025): November 2025
Publisher : Universitas Ahmad Dahlan

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.26555/ijain.v11i4.2151

Abstract

The evaluation of automatically generated radiology reports remains a critical challenge, as conventional metrics fail to capture the semantic, clinical, and contextual correctness required for automatic medical analysis. This study proposes RadEval, a semantic-aware evaluation framework, to assess the quality of generated radiology reports. This method integrates domain-specific knowledge and contextual embeddings to evaluate the quality of generated radiology reports using a four-level scoring system. Given a reference report and a predicted report from a radiology image, RadEval performs scoring evaluation by first extracting relevant medical entities using a fine-tuned biomedical NER model. These entities are normalized through ontology mapping using RadLex concept identifiers to resolve lexical variation. Then, semantically related entities were clustered using BioBERT's contextual embeddings to capture deeper semantic similarity. In addition, predicted abnormality tags are incorporated to weight clinically significant terms during score aggregation. The final semantic score reflects a weighted combination of exact match, ontology match, and contextual similarity, modulated by tag importance. Experiments were conducted on the MIMIC-CXR dataset, which contains over 200,000 report pairs. Comparative evaluations show that RadEval outperforms traditional metrics, achieving an F1-score of 0.69, compared to 0.56 for BERTScore. Using this method, a more precise clinical interpretation of the predicted report was captured from the reference report. These findings suggest that RadEval method provides a more accurate and clinically aligned framework for evaluating the medical report generation model.