The increasing use of artificial intelligence in academic writing has raised concerns about the accuracy and coherence of AI-generated texts, particularly in underrepresented languages such as Malay. This study evaluates the performance of ChatGPT in generating Malay academic texts by comparing AI-generated outputs with expert-corrected versions, focusing on grammatical errors, structural inconsistencies, and lexical inaccuracies. A comparative analysis was conducted on two datasets: the Trained Dataset (TD), in which prompts included detailed context, and the Untrained Dataset (UTD), in which they did not. ChatGPT-generated texts were reviewed by Malay linguistics and translation experts, who identified and corrected grammatical errors. A quantitative and qualitative analysis assessed error frequency and categorized linguistic challenges. The findings reveal that the UTD contained significantly more grammatical errors (87) than the TD (18), demonstrating the role of structured prompts in enhancing text quality. Common errors in the UTD included incorrect sentence structure (27.59%), omission of names (14.94%), and inappropriate word choices (11.49%). While the TD showed improved grammatical accuracy, errors in phrase structure, conjunction usage, and affixation persisted. The study concludes that AI-generated Malay texts lack syntactic stability and require expert intervention and model refinement. These findings highlight the need for linguistic adaptation, expanded training datasets, and the integration of expert review to enhance AI-generated Malay academic writing. Ultimately, this case study provides empirical evidence that context-aware prompt engineering and expert-in-the-loop approaches are essential for improving the quality of AI outputs, especially in non-English settings. It also advocates for the development of AI models that can capture linguistic nuance and diversity, which is vital for inclusive education for all.