Purpose – SMS spam remains a persistent cybersecurity threat, with 68% of mobile users exposed to unsolicited messages. Existing lightweight classifiers suffer from two compounding problems: feature representations that fail to capture semantic spam patterns, and class imbalance that biases probabilistic classifiers toward the majority class. This study proposes a unified pipeline that addresses both problems simultaneously. Methods – A dual feature extraction scheme combines TF-IDF with 12 empirically validated semantic features and feeds a two-stage feature selection pipeline: a Chi-Square filter followed by Binary Particle Swarm Optimization (BPSO). A Prior-Corrected Multinomial Naive Bayes (PC-MNB) classifier then recalibrates class priors at inference time to counteract the bias introduced by Random Oversampling. Experiments were conducted on the UCI SMS Spam Collection. Findings – The proposed model achieved 98.07% accuracy, 95.45% macro F1, and 96.64% spam precision, with only 4 false positives across 903 legitimate messages, an 89.5% reduction in false alarms relative to the strongest baseline. Research implications – Evaluation is limited to English-language SMS; generalization to multilingual corpora remains unvalidated. The rule-based semantic features are brittle against adversarial obfuscation, and BPSO incurs a one-time offline training cost of 10–25 minutes. Future work will explore multilingual validation and SHAP-based explainability. Originality – This study is the first to integrate dual semantic-statistical feature extraction, filter-wrapper hybrid selection, and inference-time prior correction into a single CPU-deployable pipeline for SMS spam detection. It differs from prior CS-BPSO work in its domain, feature architecture, and probabilistic calibration mechanism.
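The inference-time prior correction mentioned in the Methods can be illustrated with a minimal sketch. It assumes the standard prior-shift rescaling, in which each posterior is multiplied by the ratio of deployment prior to training prior and the result is renormalized; the abstract does not state the exact PC-MNB formula, so this is not necessarily the authors' implementation. The function name `correct_priors` and all numeric values are hypothetical and chosen only for illustration.

```python
def correct_priors(posteriors, train_priors, true_priors):
    """Rescale class posteriors from the training priors to the deployment priors.

    posteriors:   P(class | message) from a model fit on oversampled data
    train_priors: class priors seen during training (balanced after oversampling)
    true_priors:  class priors expected at inference time
    """
    # Multiply each posterior by (true prior / training prior), then renormalize.
    weighted = [p * (t / s) for p, s, t in zip(posteriors, train_priors, true_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Illustrative values only (not taken from the study); order is [ham, spam].
balanced_posterior = [0.40, 0.60]   # model trained on oversampled data leans spam
train_priors = [0.50, 0.50]         # oversampling made the classes equally frequent
true_priors = [0.87, 0.13]          # assumed real-world class proportions

corrected = correct_priors(balanced_posterior, train_priors, true_priors)
print(corrected)
```

In this example the uncorrected model would label the message as spam, while the corrected posterior favors the legitimate class. This is the kind of shift that would reduce false positives when oversampling has inflated the spam prior.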
Copyrights © 2026