This study compares three convolutional neural network (CNN) architectures (MobileNetV2, Xception, and EfficientNet-B0) for classifying retinal fundus images into four categories: Cataract, Diabetic Retinopathy, Glaucoma, and Normal. The models were trained on a dataset of 4,217 images using transfer learning, image augmentation, and regularization, and were evaluated with 5-fold cross-validation. EfficientNet-B0 achieved the highest mean accuracy (0.85) and performed consistently across all metrics. MobileNetV2 reached competitive accuracy at a lower computational cost, which makes it suitable for resource-limited environments. Xception had the lowest and least stable performance, suggesting a greater tendency to overfit. External validation on clinical images showed a substantial drop in accuracy for all three models, pointing to domain shift and limited generalization. Grad-CAM analysis further indicated that the models had difficulty localizing the subtle pathological features of Diabetic Retinopathy and Glaucoma. The study is limited by its small dataset, its reliance on a single data source, and the absence of additional clinical information. Future work should use larger and more diverse datasets, apply domain adaptation strategies, and integrate multimodal clinical data to improve robustness and clinical applicability.
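The 5-fold cross-validation protocol mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the stratified split, the roughly balanced toy labels, and the names `stratified_folds`, `majority_class_predictor`, and `cross_validate` are assumptions made for illustration, and a trivial majority-class predictor stands in for a trained CNN. Only the fold count, the four class names, and the dataset size of 4,217 come from the text.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev

CLASSES = ["Cataract", "Diabetic Retinopathy", "Glaucoma", "Normal"]


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def majority_class_predictor(train_labels):
    """Hypothetical stand-in for a trained CNN: predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample_index: most_common


def cross_validate(labels, k=5, seed=0):
    """Return one held-out accuracy per fold."""
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_labels = [
            labels[i] for i in range(len(labels)) if i not in held_out_set
        ]
        predict = majority_class_predictor(train_labels)
        correct = sum(predict(i) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Toy labels: 4,217 samples with an assumed, roughly balanced class distribution.
labels = [CLASSES[i % len(CLASSES)] for i in range(4217)]
folds = stratified_folds(labels)
fold_accuracies = cross_validate(labels)
mean_accuracy = mean(fold_accuracies)
accuracy_spread = pstdev(fold_accuracies)  # spread across folds as a stability proxy
```

Reporting the spread across folds alongside the mean is one simple way to express the stability differences between architectures that the abstract describes. In practice the placeholder predictor would be replaced by fine-tuning and evaluating each CNN on every fold.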
Copyright © 2026