Purpose - Government websites in Indonesia face persistent content injection threats, including embedded online-gambling content, webshell installation, and SEO cloaking, that conventional File Integrity Monitoring (FIM) cannot adequately detect. Existing approaches have not yet integrated multi-model LLM Coder analytics with Retrieval-Augmented Generation (RAG) in an on-premise, host-based architecture tailored for government CSIRT operations. Methods - This study designs, implements, and evaluates a four-zone system that integrates an event-driven file monitoring agent (Agent-Watcher), automated orchestration, and a multi-model LLM Coder analytics engine augmented with a Qdrant-based RAG knowledge base of 16,508 documents, all deployed on-premise. An ablation study compared five models (Qwen 2.5 Coder 7B, CodeGemma 7B, DeepSeek Coder 6.7B, CodeLlama 7B, and StarCoder2 7B) under two scenarios (LLM Only vs. LLM + RAG) on 3,000 unseen PHP, JavaScript, and Python samples, scoring each configuration on five metrics (Accuracy, Precision, Recall, F1-Score, and Specificity). Findings - RAG improved performance in three of the five models. CodeGemma 7B achieved the best balanced profile (F1-Score 99.27%), while Qwen 2.5 Coder 7B maintained 100% Precision with zero false positives across all three languages. DeepSeek Coder 6.7B and StarCoder2 7B degraded under RAG, indicating that RAG compatibility depends on model architecture. Research Implication - This study contributes a reproducible all-metric evaluation and proposes a layered deployment strategy (CodeGemma as primary detector, Qwen as validator) for data-sovereign government CSIRT operations. Originality - The integration of multi-model LLM Coder analytics with RAG in an on-premise, host-based architecture tailored for government CSIRT operations has not yet been addressed by existing approaches.
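For readers who want to reproduce the all-metric evaluation, the five reported metrics follow from the four cells of a binary confusion matrix (malicious = positive). The sketch below is a minimal illustration under that assumption, not the study's evaluation code; the names `ConfusionCounts` and `evaluate` are hypothetical, and the counts in the usage lines are made up for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for one model/scenario confusion matrix."""
    tp: int  # malicious samples flagged as malicious
    fp: int  # benign samples flagged as malicious
    tn: int  # benign samples passed as benign
    fn: int  # malicious samples passed as benign


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute Accuracy, Precision, Recall, F1-Score, and Specificity."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Illustrative counts only (not taken from the study).
example = ConfusionCounts(tp=90, fp=0, tn=100, fn=10)
scores = evaluate(example)
```

With zero false positives, as in the illustrative counts, both Precision and Specificity equal 1.0 while Recall and F1-Score still reflect the missed malicious samples, which is why a single metric is not sufficient to rank the models.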
Copyrights © 2026