This study develops and evaluates Convolutional Neural Network (CNN) models for recognizing images of Arabic vocabulary (mufradat) and examines their deployment on resource-constrained mobile devices. Whereas most prior research on Arabic-script recognition has concentrated on isolated characters and on models run on desktop hardware, the recognition of whole words, whose connected and visually similar glyphs increase classification difficulty, remains comparatively underexplored, particularly for on-device educational use. To address this gap, the study contributes (i) a purpose-built image dataset of fifteen academic Arabic words, (ii) a systematic comparison between a CNN trained from scratch and a MobileNetV2 transfer-learning model, and (iii) a quantified analysis of mobile deployment. An experimental approach was adopted using 3,000 images (200 per class) compiled from tablet handwriting and screen captures from Microsoft Word, partitioned through a stratified 70/15/15 split into training, validation, and test sets. Both models were trained with the Adam optimizer (learning rate 1×10⁻⁴) and a batch size of 32 for 50 epochs. The from-scratch five-convolution model attained 94.4% test accuracy (loss 0.26; macro-averaged F1-score 0.95), whereas the MobileNetV2 model attained 99.1% test accuracy (loss 0.20; macro-averaged F1-score 0.99). After conversion to TensorFlow Lite, the MobileNetV2 model required only 9.1 MB of storage and 42 ms per inference on a mid-range Android device, compared with 103 MB and 180 ms for the from-scratch model, which supports its suitability for real-time use. The findings demonstrate that transfer learning achieves higher accuracy with markedly fewer parameters and a smaller computational footprint, providing an efficient foundation for mobile-assisted Arabic vocabulary learning.
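As an illustration of the stratified 70/15/15 partition described above, the following is a minimal sketch in plain Python. The function name `stratified_split`, the integer placeholder labels, and the random seed are assumptions made for illustration and are not taken from the study's code; the only figures drawn from the abstract are the fifteen classes, the 200 images per class, and the split ratios.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists with the same class ratios in each part.

    Hypothetical helper: splits each class separately so that every class
    contributes the same proportion of samples to each subset.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Placeholder labels (assumption): 15 classes x 200 images, matching the dataset size.
labels = [c for c in range(15) for _ in range(200)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 2100 450 450
```

Under these assumptions each class contributes 140 training, 30 validation, and 30 test images, so the three subsets contain 2,100, 450, and 450 images respectively.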