International Journal of Graphic Design
Vol. 4 No. 1 (2026): April | IJGD: International Journal of Graphic Design

Trust-Calibrated Multilingual RAG for Humanitarian Information Platforms: Empirical Evaluation on OMoS-QA for Migration Information Access

Chen, Yushan
Xu, Haosen



Article Info

Publish Date
15 Apr 2026

Abstract

Humanitarian information platforms increasingly serve migrants, refugees, and crisis-affected users who need correct answers about housing, schooling, legal procedures, benefits, health, and emergency services. In this setting, a wrong answer is more harmful than a missing answer, so multilingual question-answering systems must not only retrieve and summarize relevant content but also calibrate when to answer, when to abstain, and how to communicate uncertainty to the user. This paper develops a trust-calibrated multilingual retrieval-augmented generation (RAG) design for humanitarian information platforms and evaluates it on the public OMoS-QA benchmark for migration information access. The study combines two empirical layers. First, we run a direct page-retrieval evaluation over the full public corpus and compare BM25, word-level TF-IDF, character-level TF-IDF, and a lexical-character hybrid retriever. Second, we reanalyze the officially scored benchmark outputs released with OMoS-QA for sentence-level answer extraction, question-level no-answer detection, multilingual transfer, and cross-language transfer. All reported numbers are measured empirically; none are illustrative placeholders. The hybrid retriever reaches 69.4% recall at rank 1, 82.6% at rank 3, and 86.1% at rank 5, outperforming the BM25 and TF-IDF baselines. On same-language answer extraction, DeBERTa achieves the strongest balanced F1 (62.5 for German, 64.9 for English), while Llama-3-70B and GPT-3.5-Turbo obtain the strongest no-answer detection results. Explicit answerability prompting raises Llama-3-70B recall on unanswerable questions to 83.6% in German and 78.2% in English. Multilingual experiments show moderate degradation for French and larger losses for Arabic and Ukrainian, while cross-language transfer remains surprisingly robust.
Based on these findings, the paper proposes a contribution to graphic and interaction design: a trust-calibrated evidence-card pattern that combines evidence highlighting, citation links, uncertainty cues, and escalation to human support. The result is a benchmark-grounded interface logic for safer public-interest LLM applications, not a user-validated final interface.
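
The retrieval figures in the abstract are recall values at ranks 1, 3, and 5. As a rough illustration of how such a metric is typically computed, the following minimal sketch assumes each question has exactly one gold page and each retriever returns a ranked list of page identifiers. The function name `recall_at_k` and the toy data are hypothetical and do not come from OMoS-QA or from the paper's evaluation code.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of questions whose gold page appears among the top-k retrieved pages.

    Assumption: one gold page per question (the benchmark may define this differently).
    """
    if not gold:
        return 0.0
    hits = sum(1 for qid, page in gold.items() if page in ranked.get(qid, [])[:k])
    return hits / len(gold)


# Toy rankings and gold labels (hypothetical, for illustration only).
ranked = {
    "q1": ["p3", "p1", "p7"],
    "q2": ["p2", "p9", "p4"],
    "q3": ["p8", "p5", "p6"],
    "q4": ["p1", "p2", "p3"],
}
gold = {"q1": "p1", "q2": "p2", "q3": "p6", "q4": "p9"}

# Recall at the same cutoffs the abstract reports.
scores = {k: recall_at_k(ranked, gold, k) for k in (1, 3, 5)}
```

Because a top-k list always contains the top-(k-1) list, recall computed this way cannot decrease as k grows, which matches the ordering of the reported figures (69.4%, 82.6%, 86.1%).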

Copyrights © 2026






Journal Info

Abbrev

ijgd

Subject

Arts Computer Science & IT

Description

This is a peer-reviewed, open journal for publications in visual communication design. Its fields of study include the subgroups of Performing Arts, Arts, Journalism, Crafts, Media, and Design. The Art, Design, and Media ...