Garuda - Garba Rujukan Digital

Journal of Information Systems Engineering and Business Intelligence

Vol. 11 No. 3 (2025): October

Hadiprakoso, Raden Budiarto (Unknown)
Wilujengning , Wiyar (Unknown)
Amiruddin, Amiruddin (Unknown)

Publish Date
28 Oct 2025

Background: Prompt injection attacks are methods that exploit the instruction‐following nature of fine‐tuned large language models (LLMs), leading to the execution of unintended or malicious commands. This vulnerability shows the limitation of traditional defenses, including static filters, keyword blocks, and multi‐LLMs cross‐checks, which lack semantic understanding or incur high latency and operational overhead. Objective: This study aimed to develop and evaluate a lightweight adaptive framework capable of detecting and neutralizing prompt injection attacks in real-time. Methods: Prompt-Shield Framework (PSF) was developed around a locally hosted Llama 3.2 API. This PSF integrated three modules, namely Context-Aware Parsing (CAP), Output Validation (OV), and Self-Feedback Loop (SFL), to pre-filter inputs, validate outputs, and iteratively refine detection rules. Subsequently, five scenarios were tested, comprising baseline (without any defenses), CAP only, OV only, CAP+OV, and CAP+OV+SFL. The evaluation was performed over a near-balanced dataset of 1,405 adversarial and 1,500 benign prompt, measuring classification performance through confusion matrices, precision, recall, and accuracy. Results: The results showed that baseline achieved 63.06% accuracy (precision = 0.678; recall = 0.450), while OV only improved performance to 79.28% (precision = 0.796; recall = 0.768). CAP reached 84.68% accuracy (precision = 0.891; recall = 0.779), while CAP+OV yielded 95.25% accuracy (precision = 0.938; recall = 0.966). Finally, integrating SFL over 10 epochs further improved performance to 97.83% accuracy (precision = 0.980; recall = 0.975) and reduced the false-negative count from 48 (CAP+OV) to 35 (CAP+OV+SFL). Conclusion: The results show the significance of using multiple defenses, such as contextual understanding, OV, and adaptive learning fusion, which are needed for efficient prompt injection mitigation. This shows that PSF framework is an effective solution to protect LLMs against advancing threats. Moreover, further studies should aim to refine the adaptive thresholds in CAP and OV, particularly in multilingual or highly specialized environments, and examine other forms of SFL solutions for better efficiency. Keywords: Prompt Injection, LLMs Security, Jailbreak, Natural Language Processing

Citation Download

EndNote, Reference Manager, ProCite

Latex, Jabref

Check in Google Scholar

Journal Info

Journal of Information Systems Engineering and Business Intelligence

Website

Abbrev

JISEBI

Publisher

Universitas Airlangga

Subject

Computer Science & IT

Description

Jurnal ini menerima makalah ilmiah dengan fokus pada Rekayasa Sistem Informasi ( Information System Engineering) dan Sistem Bisnis Cerdas (Business Intelligence) Rekayasa Sistem Informasi ( Information System Engineering) adalah Pendekatan multidisiplin terhadap aktifitas yang berkaitan dengan ...

Article Info

Abstract

Adaptive Multi‑Layer Framework for Detecting and Mitigating Prompt Injection Attacks in Large Language Models

Article Info

Abstract