Garuda - Garba Rujukan Digital

Xiaohan Chang

Computer Science, University of Connecticut, CT, USA

Author-ID : 10212058

Computer Science & IT

Published : 2 Documents

Articles

Distilling VMAF into an Edge-Deployable Quality Predictor: A Pilot Shot-Level Proxy with LLM-Ready Quality Tokens
Xiaohan Chang; Heyu Wang
Journal of Technology Informatics and Engineering (JTIE) Vol. 4 No. 2 (2025): AUGUST
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.522

This pilot study evaluates whether a compact student model can approximate VMAF well enough to support low-latency release guarding on edge-class CPU environments. The corpus comprises a 62.31-second Big Buck Bunny excerpt at 1280 × 720 and 25 fps, segmented into 13 shots. Twelve distorted variants were generated by crossing H.264/AVC and H.265/HEVC with 180p, 240p, and 360p delivery resolutions and two quality levels per codec-resolution pair, yielding 156 shot-level samples. Frame-level VMAF scores were aggregated into shot-level teacher labels, and a student proxy consumed 14 low-cost no-reference features derived from decoded frames and stream metadata. Shot-grouped five-fold cross-validation was used to prevent content leakage across train-test splits. On this corpus, a 50-tree gradient-boosted decision tree achieved MAE = 6.56 VMAF points, RMSE = 8.32, and Pearson r = 0.913. Relative to simple regressors, the student reduced MAE by approximately 21.5% versus bitrate-only regression and 10.7% versus metadata-only regression. In a single CPU-only benchmark, predictor latency was 0.484 ms per sample, and the full decode-feature-predict chain averaged 42.61 ms versus 1117.41 ms for the teacher, corresponding to a 26.22× end-to-end speed-up. As a thresholded guard, the same student reached F1 = 0.826, 0.893, and 0.900 at 60, 70, and 80 VMAF, respectively. These findings support the feasibility of a practical edge proxy on this specific pilot corpus, but they should not be interpreted as broad generalization across content classes or production ladders. The paper also introduces an LLM-ready token interface intended for downstream reporting rather than for replacing the underlying quality measurement.
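
The sample count, the leakage-safe split, and the reported speed-up in the abstract above can be checked with a few lines of arithmetic. What follows is a minimal sketch, not the authors' code. The quality-level labels, the round-robin fold assignment, and all function names are assumptions made for illustration; the abstract states only that the five folds were grouped by shot.

```python
from itertools import product

CODECS = ["h264", "hevc"]
RESOLUTIONS = ["180p", "240p", "360p"]
QUALITY_LEVELS = ["q_low", "q_high"]  # hypothetical labels; the abstract gives only "two quality levels"
NUM_SHOTS = 13
NUM_FOLDS = 5


def build_samples():
    """Cross every shot with every codec/resolution/quality variant."""
    variants = list(product(CODECS, RESOLUTIONS, QUALITY_LEVELS))
    return [(shot, *variant) for shot in range(NUM_SHOTS) for variant in variants]


def shot_grouped_folds(samples, num_folds=NUM_FOLDS):
    """Place all variants of a shot in the same fold (round-robin by shot id, an assumption)."""
    folds = [[] for _ in range(num_folds)]
    for sample in samples:
        shot_id = sample[0]
        folds[shot_id % num_folds].append(sample)
    return folds


def speedup(teacher_ms, student_ms):
    """End-to-end speed-up of the student chain over the teacher."""
    return teacher_ms / student_ms


samples = build_samples()
folds = shot_grouped_folds(samples)
# Shots seen in each fold; no shot may appear in more than one fold.
fold_shots = [{s[0] for s in fold} for fold in folds]
ratio = speedup(1117.41, 42.61)
```

Because every variant of a shot lands in the same fold, a model is never tested on content it saw during training, which is the leakage the abstract refers to.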

Uncertainty-Aware Medical Vision–Language Classification on a Lightweight MedMNIST-Compatible Biomedical Patch Benchmark
Shenghan Lu; Xiaohan Chang; Tracey Zou
Journal of Technology Informatics and Engineering (JTIE) Vol. 5 No. 2 (2026): AUGUST
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v5i2.530

Medical image classifiers can be accurate while still being unsafe to use when their confidence values are poorly calibrated or when their predictions are communicated in language that overstates diagnostic certainty. This paper presents an uncertainty-aware medical vision-language classification workflow for lightweight 28×28 biomedical images. The target setting is MedMNIST-style classification, where images are standardized to small spatial sizes and where compact CNN, residual, and transformer models can be trained on ordinary hardware. The official MedMNIST v2 collection contains 12 two-dimensional and 6 three-dimensional biomedical image subsets; however, the execution environment used for this manuscript could read the official documentation but could not fetch binary Zenodo files. Three lightweight models were trained and evaluated across three random seeds: a 53,380-parameter CNN, a 392,092-parameter tiny residual network, and a 77,956-parameter tiny Vision Transformer. Each model used the same 2,240/320/640 train/validation/test split, AdamW optimization, and validation-set temperature scaling. The evaluated metrics were top-1 accuracy, macro one-vs-rest ROC-AUC, negative log likelihood, multiclass Brier score, expected calibration error, predictive entropy, and confusion-matrix/class-level metrics. TinyViT achieved the highest mean calibrated top-1 accuracy, 0.9906 ± 0.0016, while SmallCNN achieved the best mean macro ROC-AUC, 0.9993 ± 0.0005, and the best mean post-calibration ECE, 0.0115 ± 0.0028. Temperature scaling reduced ECE for all models, with reductions of 0.1153 for SmallCNN, 0.0853 for TinyResNet, and 0.1189 for TinyViT. A deterministic language-card module converted calibrated predictions into patient-friendly decision-support text that explicitly includes confidence, uncertainty, visual cue wording, and a non-diagnostic safety caveat.
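
Validation-set temperature scaling and expected calibration error (ECE), both named in the abstract above, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the grid search over temperatures, the 10 equal-width confidence bins, the function names, and the four-sample toy logits are all hypothetical and chosen only to show the mechanics.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a scalar temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log likelihood at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels):
    """Pick the temperature minimizing validation NLL (grid search is an assumption)."""
    grid = [0.25 + 0.05 * i for i in range(96)]  # roughly 0.25 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


def expected_calibration_error(prob_rows, labels, num_bins=10):
    """Bin-weighted mean gap between accuracy and confidence."""
    bins = [[] for _ in range(num_bins)]
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        pred = probs.index(conf)
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, pred == y))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            acc = sum(hit for _, hit in members) / len(members)
            ece += len(members) / len(labels) * abs(acc - avg_conf)
    return ece


# Toy validation set (hypothetical): overconfident logits, 3 of 4 predictions correct.
val_logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [5.0, 0.0]]
val_labels = [0, 0, 0, 1]

T = fit_temperature(val_logits, val_labels)
ece_before = expected_calibration_error([softmax(r) for r in val_logits], val_labels)
ece_after = expected_calibration_error([softmax(r, T) for r in val_logits], val_labels)
```

On this toy set the fitted temperature is above 1, which softens the overconfident probabilities toward the observed accuracy and lowers ECE. Temperature scaling divides all logits by the same positive scalar, so it does not change the predicted class, only the reported confidence.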
