Garuda - Garba Rujukan Digital

Narra J

Vol. 5 No. 1 (2025): April 2025

Sallam, Malik
Alasfoor, Israa M.
W. Khalid, Shahad
Al-Mulla, Rand I.
Al-Farajat, Amwaj
M. Mijwil, Maad
Zahrawi, Reem
Sallam, Mohammed
Egger, Jan
Al-Adwan, Ahmad S.

Publish Date
08 Apr 2025

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 presented a challenge to the dominance of OpenAI’s ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance across English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed for completeness, accuracy, and relevance using the CLEAR tool in common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43±0.28), outperforming both DeepSeek-R1 (4.31±0.43) and ChatGPT-4o (4.14±0.41), with p=0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40±0.29), followed by DeepSeek-R1 (4.20±0.49) and ChatGPT-4o (4.14±0.41), with p=0.007. Each tested genAI model exhibited near-identical performance across the two languages, with ChatGPT-4o demonstrating the most balanced linguistic capabilities (p=0.957), while Qwen-2.5 and DeepSeek-R1 showed a marginal superiority for English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models being able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.
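The language comparison in the abstract can be checked with simple arithmetic on the reported means. The sketch below is illustrative only: it copies the mean ± SD CLEAR scores from the abstract into a dictionary (the layout and the names `reported` and `language_gap` are assumptions, not part of the study) and computes each model's English-minus-Arabic gap and the per-language ranking. It does not reproduce the reported p-values, which would require the item-level ratings that the abstract does not provide.

```python
# Mean CLEAR scores as (mean, SD), copied from the abstract.
# The dictionary layout and all names here are illustrative assumptions.
reported = {
    "Qwen-2.5":    {"English": (4.43, 0.28), "Arabic": (4.40, 0.29)},
    "DeepSeek-R1": {"English": (4.31, 0.43), "Arabic": (4.20, 0.49)},
    "ChatGPT-4o":  {"English": (4.14, 0.41), "Arabic": (4.14, 0.41)},
}


def language_gap(scores):
    """Return the English mean minus the Arabic mean, rounded to two decimals."""
    return round(scores["English"][0] - scores["Arabic"][0], 2)


# Per-model gap between the English and Arabic mean scores.
gaps = {model: language_gap(scores) for model, scores in reported.items()}

# Models ordered from highest to lowest mean score in each language.
ranking_en = sorted(reported, key=lambda m: reported[m]["English"][0], reverse=True)
ranking_ar = sorted(reported, key=lambda m: reported[m]["Arabic"][0], reverse=True)

for model in ranking_en:
    print(f"{model}: English - Arabic = {gaps[model]:+.2f}")
```

Under these reported values, the ordering of the three models is the same in both languages, and the gaps are at most about a tenth of a point on the CLEAR scale, which is consistent with the abstract's description of a marginal advantage for English.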

Journal Info

Narra J

Publisher

PT Narra Sains Indonesia

Subject

Biochemistry, Genetics & Molecular Biology; Health Professions; Immunology & Microbiology; Medicine & Pharmacology; Public Health

Description

Narra J is a multidisciplinary journal published three times a year (April, August, December). Its objective is to promote articles on infection, public health, global health, tropical infection, One Health, and diseases in the tropics. Narra J publishes original research work across all ...

Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English
