Shenghan Lu
Information Technology, Fordham University, NY, USA

Published: 2 Documents
Articles

Uncertainty-Aware Medical Vision–Language Classification on a Lightweight MedMNIST-Compatible Biomedical Patch Benchmark
Shenghan Lu; Xiaohan Chang; Tracey Zou
Journal of Technology Informatics and Engineering (JTIE), Vol. 5 No. 2 (2026): August
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v5i2.530

Abstract

Medical image classifiers can be accurate while still being unsafe to use when their confidence values are poorly calibrated or when their predictions are communicated in language that overstates diagnostic certainty. This paper presents an uncertainty-aware medical vision-language classification workflow for lightweight 28×28 biomedical images. The target setting is MedMNIST-style classification, where images are standardized to small spatial sizes and where compact CNN, residual, and transformer models can be trained on ordinary hardware. The official MedMNIST v2 collection contains 12 two-dimensional and 6 three-dimensional biomedical image subsets; however, the execution environment used for this manuscript could read the official documentation but could not fetch binary Zenodo files. Three lightweight models were trained and evaluated across three random seeds: a 53,380-parameter CNN, a 392,092-parameter tiny residual network, and a 77,956-parameter tiny Vision Transformer. Each model used the same 2,240/320/640 train/validation/test split, AdamW optimization, and validation-set temperature scaling. The evaluated metrics were top-1 accuracy, macro one-vs-rest ROC-AUC, negative log likelihood, multiclass Brier score, expected calibration error, predictive entropy, and confusion-matrix/class-level metrics. TinyViT achieved the highest mean calibrated top-1 accuracy, 0.9906 ± 0.0016, while SmallCNN achieved the best mean macro ROC-AUC, 0.9993 ± 0.0005, and the best mean post-calibration ECE, 0.0115 ± 0.0028. Temperature scaling reduced ECE for all models, with reductions of 0.1153 for SmallCNN, 0.0853 for TinyResNet, and 0.1189 for TinyViT. A deterministic language-card module converted calibrated predictions into patient-friendly decision-support text that explicitly includes confidence, uncertainty, visual cue wording, and a non-diagnostic safety caveat.
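
The validation-set temperature scaling and expected calibration error (ECE) named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the grid search over temperatures, the ten equal-width confidence bins, the function names, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax for one logit vector."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log likelihood at a given temperature."""
    losses = [-math.log(max(softmax(row, temperature)[y], 1e-12))
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature that minimizes NLL (grid search is an assumption)."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # candidates 0.05 .. 10.0
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))

def expected_calibration_error(prob_rows, labels, n_bins=10):
    """ECE: sample-weighted gap between confidence and accuracy per bin."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum(conf), sum(correct)
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        pred = probs.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += 1.0 if pred == y else 0.0
    # |mean conf - accuracy| * (count / n) simplifies to |sum conf - sum correct| / n
    return sum(abs(c - a) for cnt, c, a in bins if cnt) / len(labels)

# Toy, overconfident validation logits (hypothetical, not from the paper):
# the third sample is predicted as class 0 but labeled class 1.
val_logits = [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [8.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
val_labels = [0, 1, 1, 2]

temperature = fit_temperature(val_logits, val_labels)
ece_before = expected_calibration_error([softmax(r) for r in val_logits], val_labels)
ece_after = expected_calibration_error(
    [softmax(r, temperature) for r in val_logits], val_labels)
print(temperature, ece_before, ece_after)
```

Dividing logits by a temperature above 1 softens the probabilities without changing the predicted class, so accuracy is unaffected while confidence moves toward the observed accuracy. In the workflow the abstract describes, the temperature is fitted on the validation split and ECE is reported on the test split; the toy example reuses one set only to stay short.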

Language-Guided Feature Selection for DDoS and Intrusion Detection on CICIDS2017
Yunhe Li; Shenghan Lu
Journal of Technology Informatics and Engineering (JTIE), Vol. 4 No. 1 (2025): April
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i1.531

Abstract

This paper reports a complete empirical study of language-guided feature selection for DDoS and intrusion detection on the CICIDS2017 MachineLearningCSV flow data. The central question is whether an LLM-style semantic reading of CICFlowMeter feature names can reduce the feature set while preserving detection performance and lowering false alarms. The experiment used the eight labeled CICIDS2017 CSV sessions, removed only non-finite numeric rows, and retained 2,827,876 flows with 78 original numeric features. A semantic feature screen selected 32 features describing service context, duration, packet and byte volume, flow rates, inter-arrival timing, TCP flags, window sizes, and active/idle behavior. The evaluation compared all features with the language-selected set under full-corpus binary and multiclass stochastic logistic regression, DDoS-specific Random Forest, DDoS-specific stochastic logistic regression, and a compact multilayer perceptron. The best DDoS result was obtained by Random Forest with the selected features: F1 = 0.999896, false-positive rate = 0.000068, and eight errors on 67,714 test flows. The selected features reduced the DDoS Random Forest training time by 23.78% and reduced full-corpus SGD training time by about one half, although the full feature set was stronger for the full binary linear model. Ablation showed that TCP flag/window and destination-port semantics produced the largest DDoS degradation when removed. The findings support language-guided feature selection as a practical compression step for latency-sensitive DDoS mitigation, while retaining all features remains advisable for broad multiclass intrusion detection when a linear learner is used.
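
The F1 and false-positive-rate figures quoted in the abstract follow the standard binary confusion-matrix definitions, sketched below. The function name and the confusion counts are hypothetical and are not taken from the paper, which does not report its confusion matrix here; only the final line uses numbers from the abstract.

```python
def f1_and_fpr(tp, fp, fn, tn):
    """Return (F1, false-positive rate) from binary confusion counts."""
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr

# Hypothetical counts for illustration only, NOT the paper's results.
f1, fpr = f1_and_fpr(tp=90, fp=2, fn=8, tn=900)

# From the abstract: eight errors on 67,714 test flows.
error_rate = 8 / 67714
print(f1, fpr, error_rate)
```

Eight errors on 67,714 flows is an overall error rate of roughly 0.0118%. The false-positive rate is computed over benign flows only, which is why it can differ from the overall error rate.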