Deepfake detection has become increasingly vital in the era of sophisticated fake-media generation techniques, as the threats posed by deepfakes make reliable detection essential. Although deepfake detection has been studied extensively, challenges such as resource-intensive models and limited generalizability across datasets persist. To address these problems, we propose a framework that leverages transfer learning and the lightweight architecture of the Xception model. The framework consists of three main steps. The first step is a feature extractor that uses a pretrained Xception as its backbone and has two branches, one for global and one for local feature extraction. The global branch uses the pretrained Xception directly, while the local branch augments Xception with a Convolutional Block Attention Module (CBAM) to capture deepfake-specific features and is trained with contrastive learning to strengthen its discriminative power. In the second step, two separate Random Forest classifiers are trained on the extracted global and local features. In the final step, the predicted probabilities of these two classifiers are ensembled using a logistic regression meta-model. To mitigate the effect of class imbalance on model performance, the samples in each class were balanced through data augmentation. The model is trained on the FaceForensics++ dataset and evaluated in cross-dataset settings on Celeb-DF and UADFV. Because generalization across datasets is a major challenge for deepfake detection models, we also integrate domain adaptation, under which our model performs noticeably well after minimal fine-tuning on only 10% of the target data. The proposed framework shows significant improvements over state-of-the-art (SOTA) models, with a 5% increase in accuracy, a 1% increase in ROC-AUC, and a 2% increase in precision.
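
As a concrete illustration of the classifier-fusion stage outlined above, the following minimal sketch stacks two Random Forest branch classifiers with a logistic regression meta-model using scikit-learn. The feature arrays, dimensions, and split strategy are hypothetical placeholders for illustration only; they stand in for the paper's actual Xception and CBAM-enhanced embeddings and its training protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical placeholder features: in the framework these would come from
# the Xception global branch and the CBAM-enhanced local branch.
rng = np.random.default_rng(0)
X_global = rng.normal(size=(1000, 2048))   # global-branch embeddings (assumed dim)
X_local = rng.normal(size=(1000, 2048))    # local-branch embeddings (assumed dim)
y = rng.integers(0, 2, size=1000)          # 0 = real, 1 = fake

# Hold out a portion of the data so the meta-model is fit on probabilities
# the branch classifiers did not see during training.
idx_train, idx_meta = train_test_split(
    np.arange(len(y)), test_size=0.3, stratify=y, random_state=0)

# Step 2: one Random Forest per feature branch.
rf_global = RandomForestClassifier(n_estimators=300, random_state=0)
rf_local = RandomForestClassifier(n_estimators=300, random_state=0)
rf_global.fit(X_global[idx_train], y[idx_train])
rf_local.fit(X_local[idx_train], y[idx_train])

# Step 3: stack the two predicted fake-probabilities and fit a
# logistic regression meta-model on the held-out split.
meta_features = np.column_stack([
    rf_global.predict_proba(X_global[idx_meta])[:, 1],
    rf_local.predict_proba(X_local[idx_meta])[:, 1],
])
meta_model = LogisticRegression()
meta_model.fit(meta_features, y[idx_meta])

def predict_fake_prob(global_feat, local_feat):
    """Ensemble probability that each input face crop is fake."""
    p = np.column_stack([
        rf_global.predict_proba(global_feat)[:, 1],
        rf_local.predict_proba(local_feat)[:, 1],
    ])
    return meta_model.predict_proba(p)[:, 1]
```

Fitting the meta-model on a held-out split, rather than on the branch classifiers' training data, is one common way to keep the stacked probabilities from being over-confident; the paper's exact splitting and fine-tuning procedure may differ.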