Pashto, a major language of Afghanistan and Pakistan, shows persistent orthographic inconsistency in the use of two Yā graphemes: "ی" (U+06CC) and "ې" (U+06D0). These graphemes serve distinct phonological and morphological functions but are frequently used interchangeably, and the resulting ambiguity hampers literacy acquisition, digital text processing, and educational practice. This study uses a convergent mixed-methods design, analyzing a stratified corpus of more than 2.1 million Pashto words from print and digital sources (2000–2024) alongside 120 educator surveys and 30 expert interviews with linguists, curriculum developers, and software engineers. Quantitative corpus analysis reveals a 68% inconsistency rate in dual Yā usage, which significantly reduces Optical Character Recognition (OCR) accuracy, by an average of 23% (±2.5%). Qualitative data highlight the difficulties that the lack of standardization creates for educators and developers, particularly in early-grade literacy instruction and digital tool development. Drawing on orthographic theory, sociolinguistics, educational psychology, and Unicode standards, the study proposes a comprehensive, Unicode-compliant orthographic framework. A pilot implementation in three Kabul schools showed a 22% improvement in reading fluency (p = 0.013) and an 18% reduction in spelling errors (p = 0.021), supporting Sustainable Development Goal 4 (quality education). The findings offer an empirically grounded pathway for orthographic reform and underline the need for coordinated policy interventions, teacher training, and technological updates. This interdisciplinary approach enhances linguistic clarity and promotes educational equity and digital inclusion for Pashto speakers globally.
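To make the code-point distinction concrete, the following is a minimal Python sketch of how interchangeable use of U+06CC and U+06D0 might be surfaced in a word list. It is not the study's corpus pipeline: the function names `ya_counts` and `ya_variant_groups` are hypothetical, and the folding heuristic is an assumption chosen for illustration.

```python
import unicodedata
from collections import defaultdict

FARSI_YEH = "\u06CC"  # "ی", Unicode name: ARABIC LETTER FARSI YEH
PASHTO_E = "\u06D0"   # "ې", Unicode name: ARABIC LETTER E


def ya_counts(text):
    """Count raw occurrences of each of the two Yā code points in a string."""
    return {"U+06CC": text.count(FARSI_YEH), "U+06D0": text.count(PASHTO_E)}


def ya_variant_groups(words):
    """Group spellings that become identical once both Yā forms are folded.

    Assumption (illustrative only): a group with more than one distinct
    spelling is a *candidate* inconsistency. Because the two graphemes also
    mark real phonological and morphological contrasts, each candidate
    still needs linguistic review before it is counted as an error.
    """
    groups = defaultdict(set)
    for word in words:
        word = unicodedata.normalize("NFC", word)
        if FARSI_YEH in word or PASHTO_E in word:
            # Fold U+06D0 into U+06CC to obtain a comparison key.
            key = word.replace(PASHTO_E, FARSI_YEH)
            groups[key].add(word)
    # Keep only keys that were spelled in more than one way.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}
```

Applied to a tokenized corpus, `ya_variant_groups` returns only the words attested with both graphemes in the same position, which gives a shortlist for manual checking rather than an inconsistency rate in itself.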
Copyright © 2025