Journal of Applied Science, Engineering, Technology, and Education
Vol. 8 No. 1 (2026)

Developing a Bilingual English-Arabic Dataset for Textbook Question Answering: A Hybrid Translation and Validation Approach

Amani Jamal (King Abdulaziz University)



Article Info

Publish Date
08 Apr 2026

Abstract

Textbook Question Answering (TQA) has become a central component of educational artificial intelligence, enabling curriculum-aligned machine reading that supports personalized learning and diagnostic testing. Although English-language TQA datasets have advanced significantly, Arabic still lags behind because high-quality, domain-specific resources are scarce. This paper presents a new bilingual English-Arabic TQA dataset created with a hybrid translation and validation method. The method applies machine translation to the CK12-QA dataset using the Google Sheets translator. Semantic consistency was evaluated with automated metrics based on multilingual sentence embeddings and translation quality scores. A cosine similarity of 0.87 and a BLEU score of 38.5 confirmed strong semantic equivalence and translation reliability across the bilingual dataset. These results demonstrate robust linguistic alignment and completeness. Compared with previous efforts that relied on machine translation alone or on small-batch annotation alone, this approach balances the competing demands of scalability and accuracy while addressing long-standing problems in Arabic educational datasets: semantic drift, morphological variation, and contextual misalignment. The resulting dataset stores English-Arabic question-answer pairs in a parallel format, which simplifies cross-lingual research in both multiple-choice and textbook-based settings. By focusing on specific subject areas of the K-12 science curriculum, the dataset supports training and testing of models for monolingual and cross-lingual educational QA applications. This not only makes AI-based learning more inclusive for Arabic-speaking students but also encourages work on cross-lingual transfer learning and benchmarking in TQA. The source materials and data are openly published to improve reproducibility, enable verifiable peer collaboration, and promote the development of AI in multilingual education.
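The embedding-based consistency check mentioned in the abstract can be illustrated with a minimal sketch. It assumes each English item and its Arabic translation have already been encoded as fixed-length vectors by some multilingual sentence-embedding model; the abstract does not name the model, so the toy 3-dimensional vectors and the helper names `cosine_similarity` and `mean_pair_similarity` below are illustrative assumptions, not the author's implementation.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pair_similarity(english_vecs, arabic_vecs):
    """Average cosine similarity over aligned English/Arabic embedding pairs."""
    if not english_vecs or len(english_vecs) != len(arabic_vecs):
        raise ValueError("need two non-empty lists of equal length")
    scores = [cosine_similarity(e, a) for e, a in zip(english_vecs, arabic_vecs)]
    return sum(scores) / len(scores)


# Toy vectors standing in for real sentence embeddings (assumption:
# a real pipeline would obtain these from a multilingual encoder).
english_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
arabic_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

dataset_score = mean_pair_similarity(english_vecs, arabic_vecs)
print(f"mean cosine similarity: {dataset_score:.4f}")
```

A dataset-level figure such as the reported 0.87 would plausibly be an aggregate of this kind over all translated pairs, although the abstract does not state how the per-pair scores were combined.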

Copyrights © 2026





