Breast cancer ranks among the leading causes of death among women worldwide. Early detection through mammographic image analysis plays a crucial role in improving survival rates; however, manual interpretation of mammograms requires expert knowledge and is prone to error. This study develops a breast cancer classification model for mammography images based on the Vision Transformer (ViT) architecture, trained from scratch without transfer learning. The dataset used is the Digital Database for Screening Mammography (DDSM), comprising two categories: benign and malignant. To address class imbalance, undersampling and data augmentation techniques (flipping, rotation, cropping, and noise injection) were applied. All images were normalized and resized to 224×224 pixels to match the ViT input requirements, and the model was trained for five epochs with a batch size of 16. Evaluation on the test set used seven metrics: accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), Cohen's Kappa Score, and Area Under the Curve (AUC). The model achieved an accuracy of 92.50%, precision of 90.48%, recall of 95.00%, F1-score of 92.68%, MCC of 85.11%, Kappa Score of 85.00%, and AUC of 95.75%. These findings indicate that the Vision Transformer is highly effective for mammographic image classification and holds potential as a reliable tool for automated breast cancer diagnosis support.
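The preprocessing and augmentation steps described above (normalization, resizing to 224×224, flipping, rotation, cropping, and noise injection) can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual code: the nearest-neighbour resize, the 5% crop margin, and the noise standard deviation of 0.01 are illustrative assumptions.

```python
import numpy as np

def preprocess(img, size=(224, 224)):
    """Normalize a grayscale mammogram to [0, 1] and resize to the ViT
    input size. Nearest-neighbour sampling stands in for a proper
    interpolation here (assumption for brevity)."""
    img = img.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]

def augment(img, rng):
    """Return one randomly augmented copy using the four operations named
    in the abstract: flipping, rotation, cropping, and noise injection."""
    out = img
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                 # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4))) # random 90-degree rotation
    h, w = out.shape
    m = int(0.05 * min(h, w))                      # crop a 5% border (assumed margin)
    out = out[m:h - m, m:w - m]
    out = out + rng.normal(0.0, 0.01, out.shape)   # Gaussian noise injection
    return np.clip(out, 0.0, 1.0)
```

In practice each augmented copy would be re-resized to 224×224 before being fed to the model, since cropping and rotation can change the spatial dimensions.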
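The seven-metric evaluation can be reproduced with scikit-learn's standard implementations; the sketch below assumes a binary encoding in which the malignant class is the positive label (1) and that `y_prob` holds the model's predicted probability for that class.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, cohen_kappa_score,
                             roc_auc_score)

def evaluate(y_true, y_pred, y_prob):
    """Compute the seven metrics reported in the study. y_pred holds hard
    class labels; y_prob holds predicted probabilities of the positive
    (malignant) class, which AUC requires."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "mcc":       matthews_corrcoef(y_true, y_pred),
        "kappa":     cohen_kappa_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_prob),
    }
```

Note that AUC is threshold-free (computed from probabilities), while the other six depend on the chosen decision threshold, which is why the reported recall (95.00%) can exceed the reported precision (90.48%).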