Prediction of facial attractiveness is inherently subjective, shaped by diverse cultural, social, and psychological considerations. The task is important for applications in many fields, such as aesthetics, entertainment, and wardrobe recommendation, and requires accurate and robust models. Current methods predominantly adopt a single model, which cannot capture the diverse attributes that influence perceived facial beauty. To overcome these challenges, this study proposes a hybrid transfer learning framework for feature extraction and prediction that combines ResNet50 and InceptionV3. In this methodology, Multi-task Cascaded Convolutional Networks (MTCNN) are used for accurate face detection and preprocessing, after which features are extracted using pretrained ResNet50 and InceptionV3 architectures. The extracted features are normalized, fused, and passed through a dense classification head with dropout and regularization to improve robustness. The model was trained on the CelebA dataset, using class weights to account for imbalanced data and callbacks to optimize training. The proposed model achieves a test accuracy of 83.58% and an F1 score of 0.8384, demonstrating good generalization to unseen data. These results validate the hybrid framework, which leverages the complementary strengths of multiple CNNs to deliver robust performance.
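A minimal sketch of the fusion architecture described above, assuming Keras pretrained backbones and a binary attractiveness label; the input size, dense-layer width, dropout rate, and regularization strength are illustrative assumptions rather than the study's exact configuration:

```python
# Hypothetical sketch of the hybrid ResNet50 + InceptionV3 fusion model (Keras).
# Input size, layer widths, dropout, and L2 strength are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import Model, layers, regularizers
from tensorflow.keras.applications import InceptionV3, ResNet50

def build_hybrid_model():
    # Shared input; assumes RGB face crops produced by MTCNN preprocessing.
    inp = layers.Input(shape=(224, 224, 3))

    # Frozen pretrained backbones used purely as feature extractors (transfer learning).
    resnet = ResNet50(weights="imagenet", include_top=False, pooling="avg")
    inception = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
    resnet.trainable = False
    inception.trainable = False

    # InceptionV3 is normally fed 299x299 inputs, so that branch is resized.
    res_feat = resnet(inp)                                # 2048-d feature vector
    inc_feat = inception(layers.Resizing(299, 299)(inp))  # 2048-d feature vector

    # Normalize each feature vector, then fuse by concatenation.
    res_feat = layers.LayerNormalization()(res_feat)
    inc_feat = layers.LayerNormalization()(inc_feat)
    fused = layers.Concatenate()([res_feat, inc_feat])

    # Dense classification head with dropout and L2 regularization for robustness.
    x = layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))(fused)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # binary attractiveness prediction

    return Model(inp, out)

model = build_hybrid_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Training would additionally pass class_weight=... and callbacks=... to model.fit()
# to handle class imbalance and optimize training, as described in the abstract.
```

The sketch illustrates the fusion idea only; the paper's actual pooling, normalization, and head configuration may differ.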