Yunhe Li
Computer and Information Technology, University of Pennsylvania, PA, USA

Language-Guided Feature Selection for DDoS and Intrusion Detection on CICIDS2017
Yunhe Li; Shenghan Lu
Journal of Technology Informatics and Engineering (JTIE), Vol. 4 No. 1 (2025): April
Publisher: University of Science and Computer Technology

DOI: 10.51903/jtie.v4i1.531

Abstract

This paper reports a complete empirical study of language-guided feature selection for DDoS and intrusion detection on the CICIDS2017 MachineLearningCSV flow data. The central question is whether an LLM-style semantic reading of CICFlowMeter feature names can reduce the feature set while preserving detection performance and lowering false alarms. The experiment used the eight labeled CICIDS2017 CSV sessions, removed only non-finite numeric rows, and retained 2,827,876 flows with 78 original numeric features. A semantic feature screen selected 32 features describing service context, duration, packet and byte volume, flow rates, inter-arrival timing, TCP flags, window sizes, and active/idle behavior. The evaluation compared all features with the language-selected set under full-corpus binary and multiclass stochastic logistic regression, DDoS-specific Random Forest, DDoS-specific stochastic logistic regression, and a compact multilayer perceptron. The best DDoS result was obtained by Random Forest with the selected features: F1 = 0.999896, false-positive rate = 0.000068, and eight errors on 67,714 test flows. The selected features reduced the DDoS Random Forest training time by 23.78% and reduced full-corpus SGD training time by about one half, although the full feature set was stronger for the full binary linear model. Ablation showed that TCP flag/window and destination-port semantics produced the largest DDoS degradation when removed. The findings support language-guided feature selection as a practical compression step for latency-sensitive DDoS mitigation, while retaining all features remains advisable for broad multiclass intrusion detection when a linear learner is used.
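
The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's prompt, its 32-feature list, or its code, so everything below is an assumption: the keyword table `SEMANTIC_KEYWORDS` is a hypothetical stand-in for the LLM-style semantic reading, the column names are examples in the style of CICFlowMeter output, and the helper names `screen_features`, `drop_non_finite`, and `f1_and_fpr` are illustrative only.

```python
import math

# Hypothetical keyword groups standing in for the semantic screen.
# They mirror the feature families named in the abstract; they are
# NOT the paper's actual selection rule or its 32-feature list.
SEMANTIC_KEYWORDS = {
    "service context": ["destination port"],
    "duration": ["duration"],
    "packet and byte volume": ["total fwd packets", "total backward packets", "total length"],
    "flow rates": ["bytes/s", "packets/s"],
    "inter-arrival timing": [" iat "],
    "tcp flags": ["flag"],
    "window sizes": ["win_bytes"],
    "active/idle behavior": ["active", "idle"],
}


def screen_features(names):
    """Keep feature names that match at least one keyword group."""
    keywords = [k for group in SEMANTIC_KEYWORDS.values() for k in group]
    selected = []
    for name in names:
        lowered = name.strip().lower()
        if any(k in lowered for k in keywords):
            selected.append(name)
    return selected


def drop_non_finite(rows):
    """Remove rows containing NaN or +/-inf, the only cleaning step the abstract reports."""
    return [row for row in rows if all(math.isfinite(v) for v in row)]


def f1_and_fpr(tp, fp, tn, fn):
    """Compute F1 and false-positive rate from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


if __name__ == "__main__":
    # Example column names only; a real run would read them from the CSV header.
    columns = [
        "Destination Port", "Flow Duration", "Flow IAT Mean", "SYN Flag Count",
        "Init_Win_bytes_forward", "Idle Mean", "Fwd Header Length", "Bwd Avg Bulk Rate",
    ]
    print(screen_features(columns))
    print(drop_non_finite([[1.0, 2.0], [float("inf"), 1.0], [3.0, 4.0]]))
    # Toy confusion counts, not the paper's results.
    print(f1_and_fpr(tp=8, fp=1, tn=90, fn=1))
```

A keyword screen is far cruder than a language-model reading of feature names, but it shows the shape of the step: the selection depends only on column names, so it can run before any flow data is loaded, and the reduced column list can then be passed to whichever learner is being compared.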