Mi, Gaotian
Unknown Affiliation

Published: 1 Document

A Lightweight Medical Foundation Model for Cross-Modal Multi-Task Pretraining and Parameter-Efficient Few-Shot Transfer on MedMNIST
Mi, Gaotian; Ye, Tong; Wood, Dan
JTIE: Journal of Technology Informatics and Engineering, Vol. 4 No. 3 (2025): December
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i3.492

Abstract

Medical imaging has rapidly adopted pre-trained backbones, yet many transfer-learning pipelines remain expensive to train and difficult to adapt when data, compute, or privacy constraints limit full fine-tuning. We present STMedFM, a lightweight medical multi-task backbone baseline designed for fast prototyping across 2D images and 3D volumes. STMedFM uses modality-specific convolutional stems (2D and 3D) and a shared low-depth encoder, and it supports parameter-efficient transfer via Low-Rank Adaptation (LoRA) and bottleneck adapters. We pretrain STMedFM with supervised multi-task learning on four MedMNIST tasks (PathMNIST, BloodMNIST, DermaMNIST, and OrganMNIST3D) using official train/validation/test splits. We then compare (i) training from scratch, (ii) full fine-tuning from the multi-task checkpoint, and (iii) parameter-efficient fine-tuning (LoRA or adapters) that updates only a small fraction of parameters. Under a fixed compute budget (200 pretraining steps; 120 fine-tuning steps for 2D tasks; 50 steps for the 3D task), multi-task pretraining improved performance on PathMNIST (test accuracy 0.568 → 0.634; macro AUROC 0.886 → 0.914) and preserved most gains under PEFT (LoRA AUROC 0.909; Adapter AUROC 0.913) while training only 4,041–5,225 parameters versus 160,105 for full fine-tuning. For DermaMNIST, pretraining increased macro AUROC from 0.746 (Scratch, weighted) to 0.756 (Pretrain+Full), with similar AUROC under LoRA (0.760) and Adapter (0.763). In contrast, BloodMNIST and OrganMNIST3D showed mixed behavior, including cases where Scratch outperformed pretrained variants, indicating that transfer in this compact shared encoder is task-dependent and budget-sensitive. Calibration results were similarly non-monotonic: methods with better AUROC did not always achieve lower ECE. 
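The parameter-efficient transfer described above hinges on Low-Rank Adaptation: the pretrained weight matrix stays frozen while only a low-rank update is trained. A minimal numpy sketch of this idea (layer sizes and rank are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # hypothetical layer width and LoRA rank

# Frozen pretrained weight: never updated during PEFT.
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: only A and B are trainable. B is zero-initialized so
# the adapted layer initially reproduces the pretrained layer exactly.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))
alpha = 8.0  # scaling hyperparameter


def lora_forward(x):
    """y = W x + (alpha / r) * B A x  -- base path plus low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.standard_normal(d_in)
y0 = lora_forward(x)
assert np.allclose(y0, W @ x)  # zero-init B => identical to base model

# Trainable-parameter comparison: full fine-tuning vs. LoRA.
full_params = W.size            # d_out * d_in = 4096
lora_params = A.size + B.size   # r * (d_in + d_out) = 512
print(full_params, lora_params)
```

The LoRA path trains `r * (d_in + d_out)` parameters per layer instead of `d_in * d_out`, which is the source of the roughly 30-40x savings the abstract reports (4,041-5,225 trained parameters versus 160,105 for full fine-tuning).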
Overall, our results show that a small cross-modal multi-task model can serve as a practical MedMNIST-scale transfer baseline and that LoRA/adapters offer substantial parameter savings when task alignment is favorable. STMedFM should therefore be viewed as a lightweight supervised multi-task backbone on benchmark-scale tasks rather than a broadly general medical foundation model.
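The other PEFT variant the abstract evaluates, the bottleneck adapter, can be sketched just as compactly: a small down-projection, a nonlinearity, an up-projection, and a residual connection around a frozen hidden representation. Dimensions below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 64, 8  # hypothetical hidden size and bottleneck width

# Frozen hidden representation from the pretrained encoder.
h = rng.standard_normal(d)

# Bottleneck adapter: down-project, ReLU, up-project, residual add.
# Zero-initializing the up-projection makes the adapter start as identity.
W_down = rng.standard_normal((r, d)) * 0.01
W_up = np.zeros((d, r))


def adapter(h):
    z = np.maximum(W_down @ h, 0.0)  # ReLU in the bottleneck
    return h + W_up @ z              # residual connection


out = adapter(h)
assert np.allclose(out, h)  # identity at initialization

adapter_params = W_down.size + W_up.size  # 2 * d * r = 1024
print(adapter_params)
```

Like LoRA, the adapter trains `2 * d * r` parameters per insertion point rather than touching the frozen backbone, which is why both variants land in the same few-thousand-parameter regime in the reported experiments.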