Contact Name
Mars Caroline Wibowo
Contact Email
garuda@apji.org
Phone
+628122925000
Journal Mail Official
agus.wibowo@stekom.ac.id
Editorial Address
Jl. Majapahit No.304, Pedurungan Kidul, Kec. Pedurungan, Semarang, Provinsi Jawa Tengah, 52361
Location
Kota Semarang,
Jawa Tengah
INDONESIA
International Journal of Graphic Design
ISSN: 2988-0343     EISSN: 2987-9434     DOI: 10.51903
Core Subject : Science, Art,
This journal is a peer-reviewed, open publication in Visual Communication Design. Its fields of study include the sub-groups of Performing Arts, Arts, Journalism, Crafts, Media, and Design. The Art, Design, and Media Research
Articles 51 Documents
Trust-Calibrated Multilingual RAG for Humanitarian Information Platforms: Empirical Evaluation on OMoS-QA for Migration Information Access
Chen, Yushan; Xu, Haosen
International Journal of Graphic Design Vol. 4 No. 1 (2026): April | IJGD: International Journal of Graphic Design
Publisher : University of Science and Computer Technology

DOI: 10.51903/ijgd.v4i1.3552

Abstract

Humanitarian information platforms increasingly serve migrants, refugees, and crisis-affected users who need correct answers about housing, schooling, legal procedures, benefits, health, and emergency services. In this setting, a wrong answer is more harmful than a missing answer, so multilingual question-answering systems must not only retrieve and summarize relevant content but also calibrate when to answer, when to abstain, and how to communicate uncertainty to the user. This paper develops a trust-calibrated multilingual retrieval-augmented generation (RAG) design for humanitarian information platforms and evaluates it on the public OMoS-QA benchmark for migration information access. The study combines two empirical layers. First, we run a direct page-retrieval evaluation over the full public corpus and compare BM25, word-level TF-IDF, character-level TF-IDF, and a lexical-character hybrid retriever. Second, we reanalyze the officially scored benchmark outputs released with OMoS-QA for sentence-level answer extraction, question-level no-answer detection, multilingual transfer, and cross-language transfer. All numerical results are empirically measured; no illustrative placeholders are used. The hybrid retriever reaches 69.4% recall at rank 1, 82.6% at rank 3, and 86.1% at rank 5, outperforming the sparse baselines. On same-language answer extraction, DeBERTa achieves the strongest balanced F1 (62.5 German, 64.9 English), while Llama-3-70B and GPT-3.5-Turbo obtain the strongest no-answer detection results. Explicit answerability prompting raises Llama-3-70B recall on unanswerable questions to 83.6% in German and 78.2% in English. Multilingual experiments show moderate degradation for French and larger losses for Arabic and Ukrainian, while cross-language transfer remains surprisingly robust. 
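To make the retrieval metric concrete, the following is a minimal sketch of a lexical-character hybrid ranker and of recall at rank k. The abstract does not state how the word-level and character-level scores are combined, so the weighted sum, the weight `alpha`, the trigram size, and every function name here are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def word_tokens(text):
    """Lowercased whitespace tokens (word-level view)."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Overlapping character n-grams (character-level view); n=3 is an assumption."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf_vectors(docs_tokens):
    """Return (vectors, idf) for tokenized documents using a smoothed IDF."""
    n_docs = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in docs_tokens]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, pages, alpha=0.5):
    """Rank page indices by alpha * word score + (1 - alpha) * char score.

    The linear fusion and alpha=0.5 are hypothetical choices for illustration.
    """
    scores = [0.0] * len(pages)
    for tokenize, weight in ((word_tokens, alpha), (char_ngrams, 1 - alpha)):
        vectors, idf = tfidf_vectors([tokenize(p) for p in pages])
        counts = Counter(tokenize(query))
        q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
        for i, v in enumerate(vectors):
            scores[i] += weight * cosine(q, v)
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings, gold, k):
    """Fraction of questions whose gold page appears in the top k results."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)
```

In this sketch the character-level view lets a query such as "schooling registration" match a page that only contains "school" and "register", which is one plausible reason a hybrid can outperform purely word-level sparse baselines.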
Based on these findings, the paper formulates a design contribution for graphic and interaction design: a trust-calibrated evidence-card pattern that combines evidence highlighting, citation links, uncertainty cues, and escalation to human support. The result is a benchmark-grounded interface logic for safer public-interest LLM applications rather than a user-validated final interface.
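One way to read the evidence-card pattern is as a small decision rule that maps extracted evidence and a confidence estimate to what the interface shows. The sketch below is a hedged illustration only: the `EvidenceCard` fields, the `build_card` function, the two thresholds, and the cue wording are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceCard:
    mode: str                                          # "answer", "caution", or "escalate"
    evidence: List[str] = field(default_factory=list)  # sentences to highlight
    citation_url: Optional[str] = None                 # link to the source page
    uncertainty_cue: str = ""                          # message shown to the user
    escalation_contact: Optional[str] = None           # route to human support


def build_card(evidence, citation_url, confidence,
               answer_threshold=0.8, caution_threshold=0.5,
               contact="human support"):
    """Choose what the card shows; both thresholds are illustrative assumptions."""
    # Abstain when nothing was extracted or confidence is low: a missing
    # answer is treated as safer than a wrong one.
    if not evidence or confidence < caution_threshold:
        return EvidenceCard(
            mode="escalate",
            uncertainty_cue="No reliable answer was found.",
            escalation_contact=contact,
        )
    # Middle band: show the evidence, but flag uncertainty and offer escalation.
    if confidence < answer_threshold:
        return EvidenceCard(
            mode="caution",
            evidence=list(evidence),
            citation_url=citation_url,
            uncertainty_cue="This answer may be incomplete. Please check the source.",
            escalation_contact=contact,
        )
    # High confidence: highlighted evidence plus a citation link.
    return EvidenceCard(mode="answer", evidence=list(evidence),
                        citation_url=citation_url)
```

Keeping the citation link on every card that shows evidence lets users verify the answer themselves, while the escalation contact appears whenever confidence falls below the answer threshold.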