Faishal Azhiman Suryadi
Universitas Indonesia

Multi-Model LLM Coder with RAG for Indonesian Government Web Content Injection Detection: An Ablation Study in an On-Premise CSIRT Architecture
Faishal Azhiman Suryadi; Kalamullah Ramli
Journal of Vocational, Informatics and Computer Education Vol 4, No 1 (2026): March 2026
Publisher : Academic Bright Collaboration

DOI: 10.66053/voice.v4i1.449

Abstract

Purpose - Government websites in Indonesia face persistent content injection threats, including embedded online gambling content, webshell installation, and SEO cloaking, that conventional File Integrity Monitoring (FIM) cannot adequately detect. Existing approaches have not yet integrated multi-model LLM Coder analytics with Retrieval-Augmented Generation (RAG) in an on-premise, host-based architecture tailored for government CSIRT operations.

Methods - This study designs, implements, and evaluates a four-zone system that integrates an event-driven file monitoring agent (Agent-Watcher), automated orchestration, and a Multi-Model LLM Coder analytics engine augmented with a 16,508-document Qdrant-based RAG knowledge base, all deployed on-premise. An ablation study compared five models (Qwen 2.5 Coder 7B, CodeGemma 7B, DeepSeek Coder 6.7B, CodeLlama 7B, and StarCoder2 7B) under two scenarios (LLM Only vs. LLM + RAG) on 3,000 unseen PHP, JavaScript, and Python samples, using five metrics (Accuracy, Precision, Recall, F1-Score, and Specificity).

Findings - RAG improved performance in three of the five models. CodeGemma 7B achieved the best balanced profile (F1-Score 99.27%), while Qwen 2.5 Coder 7B maintained 100% Precision with zero false positives across languages. DeepSeek Coder 6.7B and StarCoder2 7B degraded under RAG, indicating that RAG compatibility is architecture-dependent.

Research Implication - This study contributes a reproducible all-metric evaluation and proposes a layered deployment strategy (CodeGemma as primary detector, Qwen as validator) for data-sovereign government CSIRT operations.

Originality - Existing approaches have not yet integrated multi-model LLM Coder analytics with Retrieval-Augmented Generation (RAG) in an on-premise, host-based architecture tailored for government CSIRT operations.
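
The five metrics named in the abstract all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions, with "malicious" as the positive class. The names `ConfusionCounts` and `detection_metrics` are hypothetical, and the counts in the usage note are made up for illustration; none of this is taken from the study's code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix; 'positive' means the file is flagged as injected."""
    tp: int  # malicious samples correctly flagged
    fp: int  # benign samples wrongly flagged
    tn: int  # benign samples correctly passed
    fn: int  # malicious samples missed


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the five metrics from the standard definitions."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }
```

For example, `detection_metrics(ConfusionCounts(tp=90, fp=0, tn=100, fn=10))` gives Precision and Specificity of 1.0 because there are no false positives, while Recall drops to 0.9 because ten malicious samples were missed. This is why a zero-false-positive model can still trail another model on F1-Score.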