Early and accurate detection of stroke is critical for timely medical intervention and improved patient outcomes. This study explores the application of deep learning models, particularly the Vision Transformer (ViT), to the automated classification of brain stroke from medical images. A curated dataset of brain scans was used to train and evaluate the ViT model, which was benchmarked against ResNet18, a widely used convolutional neural network (CNN). Both models were trained with transfer learning under identical preprocessing and training configurations to ensure a fair comparison. The results indicate that the ViT model significantly outperforms ResNet18 in validation accuracy, class-wise precision, and recall, achieving a peak accuracy of 99.60%. Visual analyses, including confusion matrices and sample prediction comparisons, show that ViT is more robust in detecting subtle stroke patterns. However, ViT requires more computational resources, which may limit its deployment in real-time or low-resource settings. These findings suggest that transformer-based architectures are highly effective for medical image classification, particularly stroke diagnosis, and offer a viable alternative to traditional CNN-based approaches.
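The comparison described above can be reproduced in spirit with a short transfer-learning setup. The following is a minimal sketch, assuming a PyTorch/torchvision implementation with ImageNet-pretrained ViT-B/16 and ResNet18 backbones; the framework, number of classes, input size, and hyperparameters are not stated in the abstract and are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 2  # assumed: stroke vs. no stroke; adjust for the actual label set
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Identical preprocessing for both models (224x224 inputs and ImageNet
# normalization statistics are assumptions, not values from the paper).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def build_resnet18(num_classes: int) -> nn.Module:
    """ResNet18 with a pretrained backbone and a freshly initialized head."""
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def build_vit(num_classes: int) -> nn.Module:
    """ViT-B/16 with a pretrained backbone and a freshly initialized head."""
    model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
    model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
    return model

def train_one_epoch(model: nn.Module, loader, lr: float = 1e-4) -> None:
    """One training pass; the same loop, loss, and optimizer settings are
    applied to both models so the comparison stays controlled (lr is illustrative)."""
    model.to(DEVICE).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for images, labels in loader:
        images, labels = images.to(DEVICE), labels.to(DEVICE)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Keeping a single preprocessing pipeline and training loop for both architectures mirrors the "identical configurations" constraint in the study, so any accuracy gap can be attributed to the backbone rather than the training recipe.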