Chen, Yushan
Unknown Affiliation

Published: 2 Documents

Articles

Found 2 Documents

From Hand-Drawn Sketches to Interactive Web Prototypes: A Reproducible Vision-Language Approach with Structural and Visual Consistency Evaluation
Chen, Yushan; Li, Maoxi
Journal of Technology Informatics and Engineering Vol. 4 No. 2 (2025): AUGUST | JTIE : Journal of Technology Informatics and Engineering
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.490

Abstract

Service design workflows often begin with low-fidelity sketches that must be quickly translated into interactive prototypes. This paper studies the Sketch-to-Web problem: generating HTML/CSS prototypes from hand-drawn UI sketches and evaluating fidelity with both structural and visual metrics. Because the original Sketch2Code benchmark is distributed primarily as compressed artifacts that are not executable in our restricted runtime, we construct Sketch2Code-Synth, a size-matched and protocol-matched instantiation containing 731 hand-drawn-style sketches paired with 484 webpage prototypes while preserving the same sketch-to-HTML task interface. We implement a lightweight constrained sketch-to-HTML baseline (ProtoVLM) that combines HOG-based template recognition with template-conditioned HTML/CSS instantiation. We compare ProtoVLM against three baselines (kNN retrieval, heuristic computer vision layout extraction, and majority-template generation) and an oracle upper bound. Evaluation uses (i) DOM tree edit distance computed on a containment-induced layout tree, (ii) element-level IoU with Hungarian matching, and (iii) wireframe SSIM on 200×150 rasterized layouts. On the held-out test split (97 pages, 147 sketches), ProtoVLM achieves a mean tree edit distance of 2.224, mean element IoU of 0.755, and mean SSIM of 0.474. Relative to kNN retrieval, the main gain is in localization stability (IoU 0.755 vs. 0.697), while structural distance is similar (TED 2.224 vs. 2.422). Because the benchmark uses a controlled template library and wireframe renderings, the results should be interpreted as evidence on constrained layout recognition and prototype normalization rather than unconstrained real-world sketch understanding. In this setting, SSIM measures layout resemblance only, not interface realism or usability.  
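The element-level IoU with Hungarian matching used in the evaluation can be sketched as follows. This is a minimal illustration, not the authors' code: the (x, y, w, h) box format, the helper names, and the use of SciPy's assignment solver are all assumptions.

```python
# Sketch: element-level IoU with optimal (Hungarian) matching.
# Box format (x, y, w, h) and helper names are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def mean_matched_iou(pred_boxes, gt_boxes):
    """Match predicted elements to ground-truth elements so that total
    IoU is maximized, then average IoU over the matched pairs."""
    cost = np.array([[1.0 - iou(p, g) for g in gt_boxes]
                     for p in pred_boxes])
    rows, cols = linear_sum_assignment(cost)  # minimizes total cost
    return float(np.mean([iou(pred_boxes[r], gt_boxes[c])
                          for r, c in zip(rows, cols)]))
```

Because the matching is solved globally rather than greedily, a swapped pair of predicted elements still matches perfectly: two predictions listed in the opposite order of the ground truth yield a mean matched IoU of 1.0.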
Automatic Detection and Explanation of Dark Patterns from Interface Microcopy: Empirical Comparison of BERT-Style Encoders, RoBERTa-Style Encoders, and LLM-Style Decoders on the ec-darkpattern Dataset
Xu, Haosen; Chen, Yushan; Med, Aron
Journal of Technology Informatics and Engineering Vol. 4 No. 3 (2025): DECEMBER | JTIE : Journal of Technology Informatics and Engineering
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i3.491

Abstract

Dark patterns (also called deceptive design patterns) are interface choices that steer or pressure users into unintended actions such as rushed purchases, unnecessary disclosures, or hard-to-cancel subscriptions. In e-commerce, many dark patterns are expressed directly in microcopy (e.g., button labels, banners, and inline messages), which makes text-only detection attractive for scalable auditing. This paper presents a fully reproducible experimental study on ec-darkpattern, a text-based dataset of e-commerce interface strings with balanced binary labels (1,178 dark pattern vs. 1,178 non-dark pattern) and seven dark pattern categories. We compare (i) a rule-based lexicon baseline, (ii) hashed n-gram linear models, (iii) a lightweight BERT-style bidirectional transformer encoder with word tokenization, (iv) a lightweight RoBERTa-style bidirectional transformer encoder with character tokenization, and (v) an LLM-style causal decoder trained as a classifier on the same inputs. On a fixed 80/10/10 split with seed 42, the best-performing model is a hashing + linear SVM baseline (F1=0.9437, ROC-AUC=0.9810), while the BERT-style encoder achieves F1=0.9038 (ROC-AUC=0.9695), the RoBERTa-style encoder achieves F1=0.8907 (ROC-AUC=0.9573), and the LLM-style decoder achieves F1=0.7884 (ROC-AUC=0.8808). These results should be interpreted as a controlled comparison under low-resource, no-pretraining conditions on a single fixed split, rather than as a general ranking of encoder-style versus decoder-style transformers. To support explainability, we generate token-level attributions using gradient-based saliency, summarize them as key phrases, and estimate explanation consistency via top-k token overlap on an exploratory 20-instance sample (mean Jaccard up to 0.7482 between the two character-based transformers). Finally, we curate an error-case library that links misclassifications to their most influential phrases. Within this short-microcopy setting, the findings show that lexical baselines are especially strong, while transformer directionality and tokenization change both accuracy and the stability of highlighted cues.
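A hashed n-gram + linear SVM baseline of the kind the abstract names can be sketched with scikit-learn. The toy microcopy strings, labels, and the specific hyperparameters below are illustrative assumptions, not the authors' pipeline or the ec-darkpattern data.

```python
# Sketch of a hashed n-gram + linear SVM classifier in the spirit of
# the paper's strongest baseline. All strings, labels, and settings
# here are illustrative assumptions, not the authors' exact setup.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical microcopy samples (1 = dark pattern, 0 = not).
texts = [
    "Only 2 left in stock, order now!",
    "Hurry, offer ends in 5 minutes",
    "9 people are viewing this item right now",
    "No thanks, I hate saving money",
    "Add to cart",
    "View shipping options",
    "Read our privacy policy",
    "Contact customer support",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Word 1- and 2-grams hashed into a fixed-size feature space,
# classified by a linear SVM.
clf = make_pipeline(
    HashingVectorizer(ngram_range=(1, 2), n_features=2**18),
    LinearSVC(),
)
clf.fit(texts, labels)
```

The hashing trick keeps the feature space fixed regardless of vocabulary size, which is one reason such baselines remain competitive on short microcopy where local lexical cues ("hurry", "only N left") carry most of the signal.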