Journal of Technology Informatics and Engineering
Vol. 4 No. 3 (2025): December

A Lightweight Medical Foundation Model for Cross-Modal Multi-Task Pretraining and Parameter-Efficient Few-Shot Transfer on MedMNIST

Mi, Gaotian
Ye, Tong
Wood, Dan



Article Info

Publish Date
25 Aug 2025

Abstract

Medical imaging has rapidly adopted pre-trained backbones, yet many transfer-learning pipelines remain expensive to train and difficult to adapt when data, compute, or privacy constraints limit full fine-tuning. We present STMedFM, a lightweight medical multi-task backbone baseline designed for fast prototyping across 2D images and 3D volumes. STMedFM uses modality-specific convolutional stems (2D and 3D) and a shared low-depth encoder, and it supports parameter-efficient transfer via Low-Rank Adaptation (LoRA) and bottleneck adapters. We pretrain STMedFM with supervised multi-task learning on four MedMNIST tasks (PathMNIST, BloodMNIST, DermaMNIST, and OrganMNIST3D) using official train/validation/test splits. We then compare (i) training from scratch, (ii) full fine-tuning from the multi-task checkpoint, and (iii) parameter-efficient fine-tuning (LoRA or adapters) that updates only a small fraction of parameters. Under a fixed compute budget (200 pretraining steps; 120 fine-tuning steps for 2D tasks; 50 steps for the 3D task), multi-task pretraining improved performance on PathMNIST (test accuracy 0.568 → 0.634; macro AUROC 0.886 → 0.914) and preserved most gains under PEFT (LoRA AUROC 0.909; Adapter AUROC 0.913) while training only 4,041–5,225 parameters versus 160,105 for full fine-tuning. For DermaMNIST, pretraining increased macro AUROC from 0.746 (Scratch, weighted) to 0.756 (Pretrain+Full), with similar AUROC under LoRA (0.760) and Adapter (0.763). In contrast, BloodMNIST and OrganMNIST3D showed mixed behavior, including cases where Scratch outperformed pretrained variants, indicating that transfer in this compact shared encoder is task-dependent and budget-sensitive. Calibration results were similarly non-monotonic: methods with better AUROC did not always achieve lower ECE. 
Overall, our results show that a small cross-modal multi-task model can serve as a practical MedMNIST-scale transfer baseline and that LoRA/adapters offer substantial parameter savings when task alignment is favorable. STMedFM should therefore be viewed as a lightweight supervised multi-task backbone on benchmark-scale tasks rather than a broadly general medical foundation model.
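The parameter savings reported above come from the structure of the LoRA update: a frozen weight matrix W is augmented by a trainable low-rank product, so only the two small factor matrices are updated during fine-tuning. The following is a minimal NumPy sketch of that mechanism; the dimensions, rank, and scaling are illustrative choices, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical LoRA sketch (illustrative only, not the STMedFM code).
# The frozen weight W gets a low-rank update (alpha / r) * B @ A, and only
# A and B -- r * (d_in + d_out) parameters in total -- are trained.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 2, 4.0

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero init

def lora_forward(x):
    """y = W x + (alpha / r) * B (A x); equals W x at init because B = 0."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Zero-initialising B means fine-tuning starts exactly at the pretrained model.
assert np.allclose(lora_forward(x), W @ x)

trainable = A.size + B.size  # 2*16 + 8*2 = 48 trainable parameters
full = W.size                # 8*16 = 128 parameters for full fine-tuning
print(trainable, full)
```

With rank r much smaller than the layer dimensions, the trainable fraction shrinks in the same way the abstract reports (thousands of adapter parameters versus ~160k for full fine-tuning), while the zero-initialised up-projection guarantees the adapted model starts from the pretrained behavior.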

Copyrights © 2025






Journal Info

Abbrev

jtie

Subject

Computer Science & IT

Description

Power Engineering, Telecommunication Engineering, Computer Engineering, Control and Computer Systems, Electronics, Information Technology, Informatics, Data and Software Engineering, Biomedical ...