The evaluation of academic services in Indonesian higher education is often hindered by the scarcity of labeled datasets (the cold-start problem) and the complexity of culturally implicit feedback. This study evaluates the efficacy of Zero-Shot Classification by benchmarking two distinct inference paradigms: the Direct Multilingual approach (XLM-RoBERTa-large-xnli) and the Translate-Test approach (facebook/bart-large-mnli). Using a dataset of 280 student reviews validated by human annotators (κ = 0.844), the research reveals a significant performance trade-off. While XLM-RoBERTa demonstrates greater robustness in maintaining global performance equilibrium (Macro F1-Score: 0.67), it exhibits a pronounced 'Politeness Bias', frequently failing to detect negative reviews masked by courteous language (Recall: 0.48). Conversely, the Translate-Test approach (BART) shows higher sensitivity in capturing negative sentiment (Recall: 0.77). Qualitative analysis suggests that the translation process functions as a dual mechanism: a cultural decontextualization filter that isolates implicit criticism, and a denoising layer that normalizes informal slang and typographical errors. However, this enhanced sensitivity comes at the cost of an approximately 2.6x increase in computational latency and weaker neutral-class detection. These findings indicate that while XLM-RoBERTa offers balanced generalization for broad analysis, the Translate-Test strategy is highly effective for accurately uncovering latent student grievances obscured by local linguistic styles.
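Both benchmarked paradigms rest on the same underlying mechanism: framing classification as natural language inference (NLI), where the review is the premise and each candidate label is phrased as a hypothesis. The sketch below illustrates that selection logic only; the hypothesis template, label set, and toy scorer are assumptions for illustration, standing in for the actual NLI models (XLM-RoBERTa-large-xnli scores Indonesian text directly; facebook/bart-large-mnli scores a machine-translated English version in the Translate-Test setting).

```python
# Sketch of zero-shot classification via NLI entailment scoring.
# The scorer is injected so the label-selection logic runs stand-alone;
# in the study it would be an NLI model, not the toy heuristic below.
from typing import Callable

LABELS = ["positive", "neutral", "negative"]      # assumed label set
TEMPLATE = "This review is {}."                   # assumed hypothesis template

def zero_shot_classify(text: str,
                       entail_score: Callable[[str, str], float],
                       labels=LABELS) -> str:
    """Return the label whose hypothesis the premise most strongly entails."""
    scores = {lab: entail_score(text, TEMPLATE.format(lab)) for lab in labels}
    return max(scores, key=scores.get)

def toy_scorer(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model (illustration only, not a real model)."""
    hit = any(cue in premise.lower() for cue in ("unfortunately", "slow", "not"))
    if "negative" in hypothesis:
        return 0.9 if hit else 0.1
    return 0.1 if hit else 0.5

print(zero_shot_classify("The service was slow.", toy_scorer))  # → negative
```

In the Translate-Test paradigm, the only change is a preprocessing step: the Indonesian premise is machine-translated to English before scoring, which the paper argues acts as both a decontextualization filter and a denoising layer.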
Copyright © 2025