Large Language Models (LLMs) have shown strong potential for intent classification tasks, yet their practical deployment in low-resource language environments remains underexplored. This study presents an informatics-based evaluation framework that compares three LLM architectures on Indonesian intent classification: a fine-tuned GPT-Neo, and Mistral and Phi-2 under zero-shot inference. The methodology integrates classic informatics techniques, including stratified sampling, label encoding, model evaluation with Scikit-learn, and a local REST API inference pipeline built on the Ollama framework. The study also benchmarks computational efficiency by profiling execution times on consumer-grade hardware. GPT-Neo achieved 100% accuracy after fine-tuning, while Mistral and Phi-2 scored approximately 55% and 18%, respectively, in the zero-shot setting. The hybrid architecture designed in this work demonstrates how LLMs can be systematically evaluated and deployed through lightweight, modular informatics workflows. The results suggest that fine-tuned lightweight models are viable for high-accuracy deployment, whereas zero-shot models enable rapid prototyping under constrained resources.
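To make the evaluation pipeline concrete, the sketch below strings the named components together: label encoding and stratified train/test splitting, zero-shot inference against Ollama's local REST endpoint, wall-clock profiling, and Scikit-learn metrics. The dataset file, column names, prompt wording, and the classify_zero_shot helper are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the evaluation pipeline, assuming a CSV dataset with
# hypothetical "text" and "intent" columns and a local Ollama server.
import time
import requests
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

# Load the Indonesian intent dataset (hypothetical file name).
df = pd.read_csv("intents_id.csv")

# Label encoding: map intent strings to integer ids.
encoder = LabelEncoder()
df["label"] = encoder.fit_transform(df["intent"])

# Stratified sampling: preserve the intent distribution in both splits.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

def classify_zero_shot(text: str, model: str = "mistral") -> str:
    """Zero-shot intent prediction via Ollama's local REST API."""
    prompt = (
        "Classify the intent of the following Indonesian sentence. "
        f"Answer with exactly one of: {', '.join(encoder.classes_)}.\n\n{text}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

# Profile execution time while collecting predictions.
start = time.perf_counter()
predictions = [classify_zero_shot(t) for t in test_df["text"]]
elapsed = time.perf_counter() - start

# Evaluate with Scikit-learn; answers outside the label set count as errors.
print(f"Inference time: {elapsed:.1f}s for {len(test_df)} samples")
print(f"Accuracy: {accuracy_score(test_df['intent'], predictions):.3f}")
print(classification_report(test_df["intent"], predictions, zero_division=0))
```

Streaming is disabled so each HTTP call returns a single JSON object, which keeps the timing measurement a straightforward wall-clock sum over sequential requests, matching a consumer-grade, single-machine deployment.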