This author has published in the following journal: Narra J
Articles

Found 1 document

Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English
Sallam, Malik; Alasfoor, Israa M.; W. Khalid, Shahad; Al-Mulla, Rand I.; Al-Farajat, Amwaj; M. Mijwil, Maad; Zahrawi, Reem; Sallam, Mohammed; Egger, Jan; Al-Adwan, Ahmad S.
Narra J Vol. 5 No. 1 (2025): April 2025
Publisher: Narra Sains Indonesia

DOI: 10.52225/narra.v5i1.2371

Abstract

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 presented a challenge to the dominance of OpenAI’s ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance across English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed for completeness, accuracy, and relevance using the CLEAR tool in common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43±0.28), outperforming both DeepSeek-R1 (4.31±0.43) and ChatGPT-4o (4.14±0.41), with p=0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40±0.29), followed by DeepSeek-R1 (4.20±0.49) and ChatGPT-4o (4.14±0.41), with p=0.007. Each tested genAI model exhibited near-identical performance across the two languages, with ChatGPT-4o demonstrating the most balanced linguistic capabilities (p=0.957), while Qwen-2.5 and DeepSeek-R1 showed a marginal superiority for English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models being able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.
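To make the reported comparison concrete, the following is a minimal illustrative sketch (not the authors' code) of how per-query CLEAR-style ratings for the three models could be summarized as mean±SD and compared with an omnibus test. The ratings shown are hypothetical placeholders, and the Kruskal-Wallis test is used only as one plausible choice, since the abstract reports p-values without naming the statistical test applied.

# Illustrative sketch, assuming one overall CLEAR score (1-5) per ophthalmology
# query, per model, in a single language (e.g., English). Values are hypothetical.
import numpy as np
from scipy import stats

scores = {
    "Qwen-2.5":    np.array([4.6, 4.4, 4.2, 4.5, 4.4]),
    "DeepSeek-R1": np.array([4.5, 4.1, 4.3, 4.4, 4.0]),
    "ChatGPT-4o":  np.array([4.2, 4.0, 4.1, 4.3, 3.9]),
}

# Mean ± SD per model, mirroring how results are reported in the abstract.
for model, vals in scores.items():
    print(f"{model}: {vals.mean():.2f}±{vals.std(ddof=1):.2f}")

# Omnibus comparison across the three models; the actual test used in the
# study is not specified in the abstract, so this is an assumption.
h_stat, p_value = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis H={h_stat:.3f}, p={p_value:.3f}")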