Purpose: This study evaluates the EfficientNetB3 and VGG16 deep learning architectures for image classification, focusing on stability, accuracy, and interpretability, and uses Gradient-weighted Class Activation Mapping (Grad-CAM) to improve transparency and robustness. The aim is to support the development of reliable AI-based diagnostic tools.

Methods: The study used a dataset of 4,217 color retinal fundus images spanning four classes: cataract, diabetic retinopathy, glaucoma, and normal. The data were split into 70% for training, 10% for validation, and 20% for testing. A transfer learning approach was applied with EfficientNetB3 and VGG16 models pretrained on ImageNet, and real-time augmentation was used to prevent overfitting and improve generalization. The models were compiled with the Adam optimizer and trained with categorical cross-entropy loss. Early stopping was implemented to allocate computational resources efficiently and reduce overfitting, and a learning rate scheduler (ReduceLROnPlateau) reduced the learning rate when validation loss showed no significant improvement. EfficientNetB3 is far more compact, with roughly 12 million parameters versus VGG16's 138 million, making it suitable for resource-constrained mobile or embedded systems. The final evaluation was performed on the held-out test set.

Results: EfficientNetB3 outperformed VGG16 in classification accuracy and loss stability, achieving an average accuracy of 93%. It also exhibited greater transparency and more dependable predictions, making it a reliable model for medical image classification.

Novelty: This work introduces a framework integrating the EfficientNetB3 architecture, stratified cross-validation, L2 regularization, and Grad-CAM-based interpretability, emphasizing openness and explainability in model evaluation.
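The 70%/10%/20% stratified split described in the Methods can be sketched as follows. This is an illustrative numpy-only helper, not the authors' code; the class counts used in the example are hypothetical, chosen only so that they sum to the reported 4,217 images.

```python
import numpy as np

def stratified_split(labels, train_frac=0.70, val_frac=0.10, seed=42):
    """Return (train_idx, val_idx, test_idx) index arrays with per-class
    proportions preserved; the remainder after train+val goes to test."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx, test_idx = [], [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # all samples of this class
        rng.shuffle(idx)
        n_train = int(round(train_frac * len(idx)))
        n_val = int(round(val_frac * len(idx)))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)
```

Splitting per class before concatenating guarantees that rarer conditions keep the same 70/10/20 ratio as common ones, which a purely random split would not.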
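The ReduceLROnPlateau rule mentioned in the Methods can be re-implemented in a few lines to make its behavior concrete. The `factor`, `patience`, and `min_delta` values below are assumptions for illustration; the paper does not report the settings used.

```python
def schedule_lr(val_losses, lr=1e-3, factor=0.5, patience=3, min_delta=1e-4):
    """Return the learning rate after processing epoch-wise validation losses,
    multiplying it by `factor` whenever the loss fails to improve by at least
    `min_delta` for `patience` consecutive epochs (assumed settings)."""
    best, wait = float("inf"), 0
    for loss in val_losses:
        if loss < best - min_delta:   # significant improvement: reset patience
            best, wait = loss, 0
        else:                         # plateau: count epochs without progress
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
    return lr
```

Early stopping follows the same pattern, except that training halts instead of the learning rate being reduced once patience is exhausted.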
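The Grad-CAM heatmap used for interpretability reduces to a small computation once the last convolutional layer's activations and the class-score gradients are available. A minimal numpy sketch of that core step (framework-agnostic; the paper's actual implementation is not shown):

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (H, W, K) arrays from the last conv layer.
    Returns an (H, W) heatmap in [0, 1] highlighting class-relevant regions."""
    alphas = gradients.mean(axis=(0, 1))                 # (K,) channel weights
    cam = np.maximum((activations * alphas).sum(-1), 0)  # weighted sum + ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                            # min-max normalize
    return cam
```

The heatmap is then upsampled to the input resolution and overlaid on the fundus image, letting clinicians verify that the model attends to pathological regions rather than artifacts.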