After completing a purchase on an e-commerce platform, users can leave a review. By analyzing these reviews, businesses can gain insight into customer emotions, while researchers and policymakers can monitor social dynamics. Large Language Models (LLMs) are a promising approach to emotion analysis. LLMs have transformed natural language processing, yet their performance in non-English languages such as Indonesian still requires thorough evaluation. This study analyzes and compares DeepSeek-LLM-7B-Chat and Qwen1.5-7B-Chat, two prominent open-source LLMs, on emotion classification of Indonesian product reviews. Using the PRDECT-ID dataset, it evaluates both models in a few-shot setting through prompt engineering. The methodology covers the data preprocessing pipeline, a few-shot prompt engineering strategy tailored to each model's characteristics, model inference, and performance assessment using accuracy, precision, recall, and F1-score. The results show that DeepSeek achieved an accuracy of 43.41% and followed instructions considerably better than Qwen, which attained a maximum accuracy of only 20.35% and often yielded near-random predictions. An in-depth error analysis indicates that this performance gap is likely attributable to factors such as pre-training data bias and tokenization mismatches with the Indonesian language. This research offers empirical evidence on the comparative strengths and weaknesses of DeepSeek and Qwen, providing a diagnostic benchmark that underscores the importance of instruction tuning and robust multilingual representation for Indonesian NLP tasks.
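The four evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's evaluation code: the label set in `LABELS`, the helper names `normalize_prediction` and `classification_metrics`, the `"unknown"` fallback for outputs that match no label, and the choice of macro averaging are all assumptions made for illustration.

```python
# Hypothetical label set, for illustration only; the dataset defines the real one.
LABELS = ["happy", "sadness", "anger", "love", "fear"]


def normalize_prediction(raw_output, labels, fallback="unknown"):
    """Map free-form model output to a single label, or to `fallback`.

    An output that mentions zero labels or several labels is treated as
    unparseable, so it counts as an error in the metrics below.
    """
    text = raw_output.lower()
    matches = [label for label in labels if label.lower() in text]
    return matches[0] if len(matches) == 1 else fallback


def classification_metrics(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (no predictions or no true instances) default to 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In this sketch, a model reply such as `"Emotion: Happy"` would be normalized to `"happy"` before scoring, while a reply that names no label becomes `"unknown"` and lowers recall for the true class without counting as a false positive for any other class. Whether the study handled unparseable outputs this way is not stated in the abstract.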
Copyright © 2025