A CNN–LSTM–DQN Policy with Prioritized Experience Replay for Cost-Aware Intrusion Detection on CSE-CIC-IDS2018
Rushendra; Kalamullah Ramli; Prima Dewi Purnamasari
Journal of Embedded Systems, Security and Intelligent Systems Vol 7 No 2 (2026): June 2026
Publisher: Program Studi Teknik Komputer

DOI: 10.59562/jessi.v7i2.2629

Abstract

Purpose – This study treats intrusion detection as a policy learning problem and examines how replay strategy alone controls the operational trade-off between attack recall and alert volume under extreme class imbalance.

Design/methods/approach – A controlled ablation study was conducted in which a Deep Q-Network (DQN) agent was trained on top of a fixed CNN–LSTM feature extractor, with the feature set and reward structure also held fixed. Three configurations were compared: naïve DQN without replay, uniform experience replay with a target network, and Prioritized Experience Replay (PER). Experiments used the CSE-CIC-IDS2018 dataset, split into 10,788,508 training flows and 2,697,128 testing flows, with attack events occurring at fewer than 70 per million flows. Performance was assessed by recall and alerts per million flows (ARMF).

Findings – Supervised CNN–LSTM baselines achieved recall above 95% but generated 31,000–45,000 ARMF. Naïve DQN reduced ARMF to 383 but cut recall sharply, to 42.47%. Uniform replay raised recall to 84.95% but increased ARMF to 12,728. PER achieved the most balanced operating point: 91.40% recall at 1,031 ARMF, approximately 30 times fewer alerts than the supervised CNN–LSTM reference, at a recall cost of 5.91 percentage points.

Research implications/limitations – The findings indicate that the replay distribution is a critical operational design variable for controlling alert volume in highly imbalanced intrusion detection settings. However, the study is limited to a single backbone, feature set, reward shape, and dataset.

Originality/value – This study demonstrates that replay strategy can substantially reshape IDS operating points independently of model architecture or feature representation.
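The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes binary per-flow labels (1 = attack, 0 = benign) and assumes that ARMF counts every raised alert, true or false, normalized to one million flows; the abstract does not spell out either convention. The function name `recall_and_armf` and the toy data are hypothetical.

```python
def recall_and_armf(y_true, y_pred):
    """Return (recall, alerts per million flows) for binary per-flow labels.

    Assumption: an "alert" is any flow predicted as 1, whether or not it
    is a real attack. The abstract does not define this explicitly.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    total_flows = len(y_true)
    attacks = sum(1 for t in y_true if t == 1)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    alerts = sum(1 for p in y_pred if p == 1)

    # Recall: fraction of real attacks that raised an alert.
    recall = true_positives / attacks if attacks else float("nan")
    # ARMF: alert count rescaled to a volume of one million flows.
    armf = alerts * 1_000_000 / total_flows if total_flows else float("nan")
    return recall, armf


# Toy data (hypothetical): 1,000 flows, 2 attacks, 4 alerts, 1 of them correct.
y_true = [1, 1] + [0] * 998
y_pred = [1, 0, 1, 1, 1] + [0] * 995
recall, armf = recall_and_armf(y_true, y_pred)
print(f"recall={recall:.2%}, ARMF={armf:.0f}")
```

Under this reading, ARMF depends only on how many alerts are raised, so a policy can lower it either by suppressing false alarms or by missing attacks; reporting it alongside recall is what separates the two cases.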