Garuda - Garba Rujukan Digital

Journal of Vocational, Informatics and Computer Education

Vol 4, No 1 (2026): March 2026

Faishal Azhiman Suryadi (Universitas Indonesia)
Kalamullah Ramli (Universitas Indonesia)

Publish Date
05 Apr 2026

Purpose - Government websites in Indonesia face persistent content injection threats, including embedded online gambling content, webshell installation, and SEO cloaking, that conventional File Integrity Monitoring (FIM) cannot adequately detect. Existing approaches have not yet integrated multi-model LLM Coder analytics with Retrieval-Augmented Generation (RAG) in an on-premise, host-based architecture tailored for government CSIRT operations.

Methods - This study designs, implements, and evaluates a four-zone system that integrates an event-driven file monitoring agent (Agent-Watcher), automated orchestration, and a Multi-Model LLM Coder analytics engine augmented with a 16,508-document Qdrant-based RAG knowledge base, all deployed on-premise. An ablation study compared five models (Qwen 2.5 Coder 7B, CodeGemma 7B, DeepSeek Coder 6.7B, CodeLlama 7B, and StarCoder2 7B) under two scenarios (LLM Only vs. LLM + RAG) on 3,000 unseen PHP, JavaScript, and Python samples, using five metrics (Accuracy, Precision, Recall, F1-Score, and Specificity).

Findings - RAG improved performance in three of the five models. CodeGemma 7B achieved the best balanced profile (F1-Score 99.27%), while Qwen 2.5 Coder 7B maintained 100% Precision with zero false positives across all three languages. DeepSeek Coder 6.7B and StarCoder2 7B degraded under RAG, indicating that RAG compatibility depends on model architecture.

Research Implication - This study contributes a reproducible all-metric evaluation and proposes a layered deployment strategy (CodeGemma as primary detector, Qwen as validator) for data-sovereign government CSIRT operations.

Originality - Existing approaches have not yet integrated multi-model LLM Coder analytics with Retrieval-Augmented Generation (RAG) in an on-premise, host-based architecture tailored for government CSIRT operations.
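The five metrics named in the abstract are the standard confusion-matrix ratios. The sketch below shows how they are conventionally computed for a binary "injected vs. clean" classifier; it is not the authors' evaluation code, and the names `ConfusionCounts` and `detection_metrics`, as well as the example counts, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'injected' as the positive class."""
    tp: int  # injected files flagged as injected
    fp: int  # clean files flagged as injected
    tn: int  # clean files passed as clean
    fn: int  # injected files passed as clean


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> dict:
    """Compute the five standard metrics from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Illustrative counts only (NOT results from the study).
example = detection_metrics(ConfusionCounts(tp=90, fp=0, tn=100, fn=10))
```

With zero false positives, as in this made-up example, Precision and Specificity both equal 1.0 regardless of how many injected files are missed, which is why Recall and F1-Score are reported alongside them.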

Journal Info

Journal of Vocational, Informatics and Computer Education

Website

Abbrev

VOICE

Publisher

PT. Academic Bright Collaboration

Subject

Computer Science & IT Education

Description

1. Informatics and Computing Research addressing the design, development, implementation, and evaluation of computing technologies relevant to educational, professional, and digital learning environments, including but not limited to: Artificial Intelligence and Machine Learning Deep Learning and ...

Multi-Model LLM Coder with RAG for Indonesian Government Web Content Injection Detection: An Ablation Study in an On-Premise CSIRT Architecture
