Breast cancer is one of the leading causes of death among women worldwide. Early and accurate detection plays a vital role in improving survival rates and guiding effective treatment. In this study, we propose a deep learning-based model for automatic breast cancer detection from mammogram images. The model has three phases: preprocessing (image enhancement), segmentation, and classification. The first two phases were developed and validated in our previous work. Both rely on deep learning networks, VGG-16 for preprocessing and U-Net for segmentation, and their output helps improve overall classification performance. In this paper, we focus on the classification phase and introduce a novel hybrid deep learning-based model that combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The model captures both fine-grained image details and the broader global context, which makes it highly effective at distinguishing between benign and malignant breast tumors. We also include attention-based feature fusion and Grad-CAM visualizations to make predictions more explainable for clinical use and reference. The model was tested on three benchmark datasets (DDSM, INbreast, and MIAS) and on a combination of all three. It achieved excellent results, including 100% accuracy on MIAS and over 99% accuracy on the other datasets. Compared with recent deep learning models, our method outperforms existing approaches in both accuracy and reliability. This research offers a promising step toward supporting radiologists with intelligent tools that can improve the speed and accuracy of breast cancer diagnosis.
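
To make the idea of attention-based feature fusion concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this work. It assumes that the CNN and ViT branches each produce a feature vector of the same length and that a single scoring vector (here the hypothetical `score_vec`, passed in by hand rather than learned) rates how relevant each branch is. The function name `attention_fuse` and all values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(cnn_feat, vit_feat, score_vec):
    """Fuse two equal-length feature vectors with branch-level attention.

    score_vec stands in for a learned parameter (an assumption of this
    sketch); in a trained model it would be optimized with the network.
    """
    if not (len(cnn_feat) == len(vit_feat) == len(score_vec)):
        raise ValueError("all vectors must have the same length")
    # One scalar relevance score per branch: dot product with score_vec.
    s_cnn = sum(f * w for f, w in zip(cnn_feat, score_vec))
    s_vit = sum(f * w for f, w in zip(vit_feat, score_vec))
    # Softmax turns the two scores into weights that sum to 1.
    a_cnn, a_vit = softmax([s_cnn, s_vit])
    # The fused feature is the attention-weighted sum of both branches.
    fused = [a_cnn * c + a_vit * v for c, v in zip(cnn_feat, vit_feat)]
    return fused, (a_cnn, a_vit)


if __name__ == "__main__":
    local_features = [2.0, 1.0]   # stand-in for CNN (local detail) features
    global_features = [0.5, 3.0]  # stand-in for ViT (global context) features
    fused, weights = attention_fuse(local_features, global_features, [1.0, -1.0])
    print("branch weights:", weights)
    print("fused features:", fused)
```

In this sketch the branch with the higher score contributes more to the fused vector, which is one simple way a classifier could balance local texture cues against global context. A practical model would more likely compute such weights per channel or per token and learn them end to end.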