Windows users have embraced Copilot with surprising readiness, even entrusting it with translation projects, yet few would accept the accuracy of its cross-language explanations solely on the developer's claims. Building on that observation, this research set out to test Copilot in a manner distinct from other assessments: the researchers evaluated how accurately it interpreted and understood the advanced Arabic of the intricate works Alfiyah ibn Malik and Nadham Al-Imrithy. The aim was to map Copilot's strengths and weaknesses in literal accuracy, terminological-analogical mastery, and contextual depth. Using a mixed-method approach under the Collect-Measure-Repeat (CMR) framework of Responsible AI, the researchers conducted qualitative performance assessments with three experts and quantitative evaluations using METEOR (Metric for Evaluation of Translation with Explicit Ordering). The results showed that while Copilot comprehended and translated simple Arabic commands without difficulty, especially word for word, it struggled with contextual understanding in many of the complex texts and produced numerous inconsistencies when the instructions were vague. Its performance also degraded from context saturation during iterative prompting phases. The study concludes that, although Copilot is competent enough to attempt the challenging task of interpreting complex linguistic structures, it still requires human assistance and cross-referencing.
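For readers unfamiliar with METEOR, the sketch below shows how such a score can be computed with NLTK's implementation; the example sentences, the English-language reference, and the tokenization choices are illustrative assumptions, not data or tooling from this study.

```python
# Minimal sketch: scoring a candidate translation against a reference with METEOR.
# Assumes NLTK's meteor_score; the sentences are hypothetical placeholders.
import nltk
from nltk.translate.meteor_score import meteor_score

# METEOR's synonym matching relies on WordNet resources.
nltk.download("wordnet")
nltk.download("omw-1.4")

# Hypothetical expert reference translation and Copilot output, pre-tokenized
# (recent NLTK versions require token lists, not raw strings).
reference = "The verb is declinable when no particle precedes it".split()
hypothesis = "The verb can be inflected if nothing comes before it".split()

# meteor_score takes a list of tokenized references and one tokenized hypothesis.
score = meteor_score([reference], hypothesis)
print(f"METEOR: {score:.3f}")
```

Scores range from 0 to 1; unlike purely n-gram based metrics, METEOR credits stem and synonym matches, which is why the paper pairs it with expert qualitative judgment rather than relying on it alone.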