SQL Injection (SQLi) and Cross-Site Scripting (XSS) remain severe threats to web application security, particularly as attackers use increasingly sophisticated obfuscation techniques to bypass conventional detection systems. This study develops a machine learning framework that combines two ensemble learners, Random Forest and XGBoost, with character-level n-gram feature extraction. A large-scale dataset was curated by reducing 156,636 raw samples to 151,783 unique entries to ensure high-quality training data. From these entries, 10,000 character-level n-gram features were extracted so that the model can capture the structural patterns of complex and obfuscated payloads. In experiments, the proposed ensemble model achieved an overall accuracy of 99.67%. Stability was confirmed through 5-fold cross-validation, which yielded a mean accuracy of 99.64% with a standard deviation of 0.0003. These findings are reinforced by ROC AUC scores of 1.0000 for XSS and 0.9999 for SQLi, indicating near-perfect discriminative capability. The results suggest that combining character-level representation with ensemble learning yields an accurate and robust approach to protecting modern web environments against dynamic and evolving cyber threats.
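To make the feature-extraction step concrete, the following is a minimal sketch of deduplication followed by character-level n-gram vectorization with a capped vocabulary. It is illustrative only and is not the implementation used in this study: the n-gram range of 1 to 3, the use of raw counts rather than a weighting such as TF-IDF, the frequency-based vocabulary selection, and all function names and example payloads are assumptions. The resulting vectors would then be passed to the Random Forest and XGBoost classifiers, which are omitted here.

```python
from collections import Counter


def deduplicate(samples):
    """Drop exact duplicate payloads, keeping first-occurrence order."""
    return list(dict.fromkeys(samples))


def char_ngrams(text, n_min=1, n_max=3):
    """Return every character n-gram of length n_min..n_max.

    The 1..3 range is an assumption; the abstract does not state it.
    """
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def build_vocabulary(samples, max_features=10_000, n_min=1, n_max=3):
    """Map the most frequent n-grams to column indices (at most max_features)."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n_min, n_max))
    # Sort by descending frequency, then lexicographically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: idx for idx, (gram, _) in enumerate(ranked[:max_features])}


def vectorize(text, vocabulary, n_min=1, n_max=3):
    """Turn one payload into a count vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for gram in char_ngrams(text, n_min, n_max):
        idx = vocabulary.get(gram)
        if idx is not None:  # n-grams outside the vocabulary are ignored
            vector[idx] += 1
    return vector


# Hypothetical payloads, not drawn from the study's dataset.
raw = [
    "' OR 1=1 --",
    "<script>alert(1)</script>",
    "' OR 1=1 --",  # exact duplicate, removed by deduplicate()
    "SELECT name FROM users",
]
unique = deduplicate(raw)
vocab = build_vocabulary(unique, max_features=10_000)
features = [vectorize(sample, vocab) for sample in unique]
```

Working at the character level means that fragments such as `<sc` or `1=1` become features in their own right, without relying on a tokenizer that obfuscated payloads could defeat.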
Copyright © 2026