Journal of Information Systems Engineering and Business Intelligence
Vol. 11 No. 3 (2025): October

Adaptive Multi‑Layer Framework for Detecting and Mitigating Prompt Injection Attacks in Large Language Models

Hadiprakoso, Raden Budiarto (Unknown)
Wilujengning , Wiyar (Unknown)
Amiruddin, Amiruddin (Unknown)



Article Info

Publish Date
28 Oct 2025

Abstract

Background: Prompt injection attacks are methods that exploit the instruction‐following nature of fine‐tuned large language models (LLMs), leading to the execution of unintended or malicious commands. This vulnerability shows the limitation of traditional defenses, including static filters, keyword blocks, and multi‐LLMs cross‐checks, which lack semantic understanding or incur high latency and operational overhead. Objective: This study aimed to develop and evaluate a lightweight adaptive framework capable of detecting and neutralizing prompt injection attacks in real-time. Methods: Prompt-Shield Framework (PSF) was developed around a locally hosted Llama 3.2 API. This PSF integrated three modules, namely Context-Aware Parsing (CAP), Output Validation (OV), and Self-Feedback Loop (SFL), to pre-filter inputs, validate outputs, and iteratively refine detection rules. Subsequently, five scenarios were tested, comprising baseline (without any defenses), CAP only, OV only, CAP+OV, and CAP+OV+SFL. The evaluation was performed over a near-balanced dataset of 1,405 adversarial and 1,500 benign prompt, measuring classification performance through confusion matrices, precision, recall, and accuracy. Results: The results showed that baseline achieved 63.06% accuracy (precision = 0.678; recall = 0.450), while OV only improved performance to 79.28% (precision = 0.796; recall = 0.768). CAP reached 84.68% accuracy (precision = 0.891; recall = 0.779), while CAP+OV yielded 95.25% accuracy (precision = 0.938; recall = 0.966). Finally, integrating SFL over 10 epochs further improved performance to 97.83% accuracy (precision = 0.980; recall = 0.975) and reduced the false-negative count from 48 (CAP+OV) to 35 (CAP+OV+SFL). Conclusion: The results show the significance of using multiple defenses, such as contextual understanding, OV, and adaptive learning fusion, which are needed for efficient prompt injection mitigation. This shows that PSF framework is an effective solution to protect LLMs against advancing threats. Moreover, further studies should aim to refine the adaptive thresholds in CAP and OV, particularly in multilingual or highly specialized environments, and examine other forms of SFL solutions for better efficiency.  Keywords: Prompt Injection, LLMs Security, Jailbreak, Natural Language Processing

Copyrights © 2025






Journal Info

Abbrev

JISEBI

Publisher

Subject

Computer Science & IT

Description

Jurnal ini menerima makalah ilmiah dengan fokus pada Rekayasa Sistem Informasi ( Information System Engineering) dan Sistem Bisnis Cerdas (Business Intelligence) Rekayasa Sistem Informasi ( Information System Engineering) adalah Pendekatan multidisiplin terhadap aktifitas yang berkaitan dengan ...