Memes are a popular internet format that spreads quickly across social media platforms. Through memes, people express ideas, criticism, interests, or aversions. In some cases, however, other people may interpret a meme differently and take offense. This variation in interpretation is a challenge for sentiment analysis, since the same meme can be judged negative or positive by different individuals. There is therefore a need for an automated system that can consistently predict the sentiment polarity of memes. A meme is multimodal content, typically combining visual and textual components, which makes it well suited to a sentiment polarity analysis study. To leverage multimodal features effectively, a model must understand the features of the meme. This study proposes a joint deep learning model, combining BERT and DenseNet121, that concatenates text, image, and cluster features derived from extracted face encodings. To better capture the context of meme texts, BERT was additionally trained on a sarcasm dataset. The widely used ‘SemEval 2020 Task 8: Memotion Analysis’ dataset was chosen for its comprehensive annotation of meme sentiment and sarcasm, which aligns with this study’s approach. The proposed model achieved 0.3738 accuracy (+2.52%) and 0.3735 weighted F1 (+1.04%) while maintaining a competitive macro F1 of 0.3047. These results indicate effective sarcasm adaptation on an imbalanced dataset, with improved detection of positive and neutral sentiments and fewer sarcastic false negatives compared to the base model. This highlights the effectiveness of integrating sarcasm detection into the model framework for robust sentiment classification of memes.
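The fusion step described above, concatenating text, image, and face-cluster features before classification, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, the one-hot cluster encoding, and the linear classifier head are assumptions, and random vectors stand in for the actual BERT and DenseNet121 outputs.

```python
import math
import random

random.seed(0)

# Hypothetical dimensions; the abstract does not specify them.
TEXT_DIM, IMAGE_DIM, CLUSTER_DIM = 768, 1024, 8  # BERT-base / DenseNet121 pooled / cluster one-hot
NUM_CLASSES = 3  # negative, neutral, positive

def fuse(text_feat, image_feat, cluster_feat):
    """Late fusion by simple concatenation of the three feature vectors."""
    return text_feat + image_feat + cluster_feat  # list concatenation

def softmax(logits):
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, weights, bias):
    """A stand-in linear classifier head over the fused vector."""
    logits = [
        sum(w * x for w, x in zip(weights[c], fused)) + bias[c]
        for c in range(NUM_CLASSES)
    ]
    return softmax(logits)

# Placeholder features standing in for the BERT, DenseNet121, and clustering outputs.
text_feat = [random.gauss(0, 1) for _ in range(TEXT_DIM)]
image_feat = [random.gauss(0, 1) for _ in range(IMAGE_DIM)]
cluster_feat = [0.0] * CLUSTER_DIM
cluster_feat[2] = 1.0  # e.g. the meme's face encoding fell into cluster 2

fused = fuse(text_feat, image_feat, cluster_feat)
assert len(fused) == TEXT_DIM + IMAGE_DIM + CLUSTER_DIM

weights = [[random.gauss(0, 0.01) for _ in range(len(fused))] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
probs = classify(fused, weights, bias)
print([round(p, 3) for p in probs])  # sentiment class probabilities summing to 1
```

In practice the fused vector would feed a trained classification head, but the shape of the pipeline, two modality encoders plus a cluster feature joined by concatenation, is what the sketch conveys.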
Copyright © 2026