Detecting cracks in buildings at an early stage is crucial for maintaining structural integrity and preventing further damage. This study aims to improve the accuracy of crack severity classification in digital images by combining five on-the-fly data augmentation techniques (flip, rotation, zoom, translation, and contrast adjustment) with the MobileNetV2 architecture. The augmentations are applied dynamically during training, so the transformed images are never stored; this makes the process more efficient in storage and computation time and more adaptable to variations in the data. Using a dataset of 900 images, the proposed approach achieved a classification accuracy of 93%, compared with 89% for the previous approach, which used MobileNetV1 with offline augmentation. Previous research was limited to static augmentation and less efficient CNN architectures; this study addresses both limitations by integrating dynamic augmentation with a lightweight architecture. It thereby contributes to more efficient and more accurate crack image classification under limited data and limited computing resources, with strong potential for deployment in automated detection systems on mobile or edge computing devices.
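To make the on-the-fly idea concrete, the following is a minimal, dependency-free sketch of how the five augmentations could be drawn afresh for every batch without writing any transformed image to disk. It is not the study's implementation: the abstract does not state the framework, the parameter ranges, or the number of severity classes, so every function name, range, and label here is an illustrative assumption. For simplicity, images are small square grayscale grids with values in [0, 1], rotation is limited to multiples of 90 degrees, zoom is a nearest-neighbour zoom-in, and the MobileNetV2 training step is omitted.

```python
import random

# All names and parameter ranges below are hypothetical, chosen for illustration.

def flip(img):
    """Mirror the image horizontally."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a square image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]

def zoom(img, factor):
    """Zoom in about the centre (factor >= 1) with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return [
        [img[int(round(cy + (y - cy) / factor))][int(round(cx + (x - cx) / factor))]
         for x in range(w)]
        for y in range(h)
    ]

def translate(img, dx, dy):
    """Shift by (dx, dy) pixels, filling uncovered pixels with 0.0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out

def contrast(img, factor):
    """Scale each pixel's distance from the image mean, clamped to [0, 1]."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(1.0, max(0.0, mean + factor * (p - mean))) for p in row]
            for row in img]

def augment(img, rng):
    """Apply one random draw of all five augmentations; the input is not modified."""
    if rng.random() < 0.5:
        img = flip(img)
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    img = zoom(img, rng.uniform(1.0, 1.2))
    img = translate(img, rng.randint(-2, 2), rng.randint(-2, 2))
    return contrast(img, rng.uniform(0.8, 1.2))

def batches(images, labels, batch_size, rng):
    """Yield shuffled batches, augmenting each image at the moment it is used."""
    order = list(range(len(images)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [augment(images[i], rng) for i in idx], [labels[i] for i in idx]

# Toy data: six 8x8 gradient images with hypothetical severity labels 0, 1, 2.
rng = random.Random(0)
images = [[[(x + y) / 14 for x in range(8)] for y in range(8)] for _ in range(6)]
labels = [0, 1, 2, 0, 1, 2]
for epoch in range(2):
    for batch_images, batch_labels in batches(images, labels, 4, rng):
        # A real pipeline would pass batch_images to the model's training step here.
        print(epoch, len(batch_images), batch_labels)
```

Because `augment` is called inside the batch generator, each epoch sees a different random variant of every image while only the original images are kept in memory, which is the storage saving the abstract describes. In practice a deep learning framework's built-in random preprocessing layers would typically replace these hand-written functions.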