Journal of Technology Informatics and Engineering
Vol. 5 No. 2 (2026): AUGUST | JTIE : Journal of Technology Informatics and Engineering

Uncertainty-Aware Medical Vision–Language Classification on a Lightweight MedMNIST-Compatible Biomedical Patch Benchmark

Shenghan Lu (Information Technology, Fordham University, NY, USA)
Xiaohan Chang (Computer Science, University of Connecticut, CT, USA)
Tracey Zou (Computer Science, UCB, CA, USA)



Article Info

Publish Date
13 Jun 2026

Abstract

Medical image classifiers can be accurate yet still unsafe to use when their confidence values are poorly calibrated or when their predictions are communicated in language that overstates diagnostic certainty. This paper presents an uncertainty-aware medical vision-language classification workflow for lightweight 28×28 biomedical images. The target setting is MedMNIST-style classification, in which images are standardized to small spatial sizes and compact CNN, residual, and transformer models can be trained on ordinary hardware. The official MedMNIST v2 collection contains 12 two-dimensional and 6 three-dimensional biomedical image subsets; however, the execution environment used for this manuscript could read the official documentation but could not fetch the binary data files hosted on Zenodo, so the experiments use a lightweight MedMNIST-compatible biomedical patch benchmark instead. Three lightweight models were trained and evaluated across three random seeds: a 53,380-parameter CNN (SmallCNN), a 392,092-parameter tiny residual network (TinyResNet), and a 77,956-parameter tiny Vision Transformer (TinyViT). All models used the same 2,240/320/640 train/validation/test split, AdamW optimization, and temperature scaling fitted on the validation set. The evaluation metrics were top-1 accuracy, macro one-vs-rest ROC-AUC, negative log likelihood (NLL), multiclass Brier score, expected calibration error (ECE), predictive entropy, and confusion-matrix and per-class metrics. TinyViT achieved the highest mean calibrated top-1 accuracy, 0.9906 ± 0.0016, while SmallCNN achieved the best mean macro ROC-AUC, 0.9993 ± 0.0005, and the best mean post-calibration ECE, 0.0115 ± 0.0028. Temperature scaling reduced ECE for all models, with absolute reductions of 0.1153 for SmallCNN, 0.0853 for TinyResNet, and 0.1189 for TinyViT. A deterministic language-card module converted calibrated predictions into patient-friendly decision-support text that explicitly states confidence and uncertainty, describes visual cues, and includes a non-diagnostic safety caveat.
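The abstract names validation-set temperature scaling and several calibration metrics without giving their formulas. The minimal Python/NumPy sketch below shows one common way to compute them; it is not the authors' code. The equal-width 15-bin ECE, the NLL-minimizing grid search over the temperature, and the helper names (`softmax`, `nll`, `brier`, `ece`, `entropy`, `fit_temperature`, `make_overconfident`) are assumptions, and the 4-class synthetic logits only imitate an overconfident classifier.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax of logits divided by a positive scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nll(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log likelihood of the true class."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, 1e-12, 1.0))))


def brier(probs: np.ndarray, labels: np.ndarray) -> float:
    """Multiclass Brier score: squared error to the one-hot label, summed over classes."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Expected calibration error over equal-width bins of top-1 confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight |accuracy - confidence| in this bin by the bin's share of samples
            total += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(total)


def entropy(probs: np.ndarray) -> np.ndarray:
    """Per-sample predictive entropy in nats."""
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)


def fit_temperature(val_logits, val_labels, grid=np.linspace(0.05, 10.0, 400)) -> float:
    """Return the grid temperature that minimizes NLL on the validation split."""
    losses = [nll(softmax(val_logits, t), val_labels) for t in grid]
    return float(grid[int(np.argmin(losses))])


def make_overconfident(n: int, k: int = 4, acc: float = 0.7, scale: float = 10.0):
    """Hypothetical synthetic logits: correct on a fraction `acc` of samples, always near-certain."""
    labels = np.arange(n) % k
    preds = labels.copy()
    n_wrong = int(round(n * (1 - acc)))
    preds[:n_wrong] = (labels[:n_wrong] + 1) % k  # force these predictions to be wrong
    return scale * np.eye(k)[preds], labels


val_logits, val_labels = make_overconfident(100)
test_logits, test_labels = make_overconfident(200)
T = fit_temperature(val_logits, val_labels)  # fitted on validation only
before, after = softmax(test_logits), softmax(test_logits, T)
print(f"T = {T:.2f}")
print(f"ECE   {ece(before, test_labels):.4f} -> {ece(after, test_labels):.4f}")
print(f"NLL   {nll(before, test_labels):.4f} -> {nll(after, test_labels):.4f}")
print(f"Brier {brier(before, test_labels):.4f} -> {brier(after, test_labels):.4f}")
print(f"mean entropy {entropy(before).mean():.4f} -> {entropy(after).mean():.4f}")
```

On this synthetic data the fitted temperature should land around 5 and the test ECE should fall from roughly 0.30 to well under 0.01. Because dividing logits by a positive temperature never changes the arg-max, temperature scaling leaves top-1 accuracy unchanged and acts only on confidence, which is why it can lower ECE and NLL without affecting accuracy.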
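The abstract does not give the language-card templates. The sketch below is one hypothetical way to make such a module deterministic, so that the same calibrated probabilities always yield the same text containing confidence, uncertainty, a visual cue, and a safety caveat. The confidence bands (0.90 and 0.70), the entropy normalization by log K, the caveat wording, and the names `make_card`, `LanguageCard`, `confidence_band`, and `visual_cues` are illustrative assumptions, not the authors' wording.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence

# Hypothetical confidence bands (lower bound, label), checked from the top down.
CONFIDENCE_BANDS = [(0.90, "high"), (0.70, "moderate"), (0.0, "low")]

# Hypothetical non-diagnostic caveat appended to every card.
SAFETY_CAVEAT = (
    "This text is automated decision support, not a diagnosis; "
    "a qualified clinician should review the image."
)


@dataclass(frozen=True)
class LanguageCard:
    label: str
    confidence: float
    normalized_entropy: float
    text: str


def confidence_band(p: float) -> str:
    """Map a calibrated top-1 probability to a coarse verbal band."""
    for lower, name in CONFIDENCE_BANDS:
        if p >= lower:
            return name
    return "low"


def make_card(probs: Sequence[float], class_names: Sequence[str],
              visual_cues: Dict[str, str]) -> LanguageCard:
    """Deterministically render calibrated class probabilities as a short text card."""
    top = max(range(len(probs)), key=lambda i: probs[i])  # ties resolve to the first index
    label = class_names[top]
    conf = float(probs[top])
    # Predictive entropy divided by log(K): 0 means certain, 1 means uniform.
    h = -sum(p * math.log(p) for p in probs if p > 0)
    h_norm = h / math.log(len(probs))
    cue = visual_cues.get(label, "no specific visual pattern is described for this class")
    text = (
        f"The model's most likely category is '{label}' with {confidence_band(conf)} "
        f"confidence ({conf:.0%}). Uncertainty across all categories is {h_norm:.2f} "
        f"on a 0-1 scale. Visual cue: {cue}. {SAFETY_CAVEAT}"
    )
    return LanguageCard(label, conf, h_norm, text)


card = make_card([0.95, 0.03, 0.02], ["class_a", "class_b", "class_c"],
                 {"class_a": "hypothetical cue text for class_a"})
print(card.text)
```

A fixed template trades natural phrasing for predictability: the card can only restate the calibrated numbers it receives, so it cannot express more certainty than the classifier reports.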

Copyright © 2026






Journal Info

Abbrev

jtie

Subject

Computer Science & IT

Description

Power Engineering, Telecommunication Engineering, Computer Engineering, Control and Computer Systems, Electronics, Information Technology, Informatics, Data and Software Engineering, Biomedical ...