Heyu Wang
Electrical and Computer Engineering, Rice University, TX, USA


Distilling VMAF into an Edge-Deployable Quality Predictor: A Pilot Shot-Level Proxy with LLM-Ready Quality Tokens
Xiaohan Chang; Heyu Wang
Journal of Technology Informatics and Engineering (JTIE), Vol. 4 No. 2 (2025): August
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.522

Abstract

This pilot study evaluates whether a compact student model can approximate VMAF well enough to support low-latency release guarding on edge-class CPU environments. The corpus comprises a 62.31-second Big Buck Bunny excerpt at 1280 × 720 and 25 fps, segmented into 13 shots. Twelve distorted variants were generated by crossing H.264/AVC and H.265/HEVC with 180p, 240p, and 360p delivery resolutions and two quality levels per codec-resolution pair, yielding 156 shot-level samples. Frame-level VMAF scores were aggregated into shot-level teacher labels, and a student proxy consumed 14 low-cost no-reference features derived from decoded frames and stream metadata. Shot-grouped five-fold cross-validation was used to prevent content leakage across train-test splits. On this corpus, a 50-tree gradient-boosted decision tree achieved MAE = 6.56 VMAF points, RMSE = 8.32, and Pearson r = 0.913. Relative to simple regressors, the student reduced MAE by approximately 21.5% versus bitrate-only regression and 10.7% versus metadata-only regression. In a single CPU-only benchmark, predictor latency was 0.484 ms per sample and the full decode-feature-predict chain averaged 42.61 ms versus 1117.41 ms for the teacher, corresponding to a 26.22× end-to-end speed-up. As a thresholded guard, the same student reached F1 = 0.826, 0.893, and 0.900 at 60, 70, and 80 VMAF respectively. These findings support the feasibility of a practical edge proxy on this specific pilot corpus, but they should not be interpreted as broad generalization across content classes or production ladders. The paper also introduces an LLM-ready token interface intended for downstream reporting rather than for replacing the underlying quality measurement.
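
The corpus arithmetic and the shot-grouped split described in this abstract can be checked with a minimal sketch. The quality-level labels and the modulo-based assignment of shots to folds are assumptions made only for illustration; the abstract does not say how shots were distributed across the five folds.

```python
from itertools import product

# Design factors named in the abstract; the quality labels are hypothetical.
codecs = ["h264", "h265"]
resolutions = ["180p", "240p", "360p"]
quality_levels = ["q_low", "q_high"]
n_shots = 13

# Crossing the factors gives the distorted variants; each shot gets every variant.
variants = list(product(codecs, resolutions, quality_levels))
samples = [(shot, variant) for shot in range(n_shots) for variant in variants]

def shot_grouped_folds(samples, n_folds=5):
    """Yield (train_idx, test_idx) so that every sample of a shot shares one test fold.

    Assigning shots to folds by `shot % n_folds` is an illustrative assumption.
    """
    for fold in range(n_folds):
        test = [i for i, (shot, _) in enumerate(samples) if shot % n_folds == fold]
        train = [i for i, (shot, _) in enumerate(samples) if shot % n_folds != fold]
        yield train, test

folds = list(shot_grouped_folds(samples))
for train, test in folds:
    train_shots = {samples[i][0] for i in train}
    test_shots = {samples[i][0] for i in test}
    # No shot may appear on both sides of a split (no content leakage).
    assert train_shots.isdisjoint(test_shots)

# End-to-end speed-up from the two reported chain latencies (in ms).
speedup = 1117.41 / 42.61
print(len(variants), len(samples), round(speedup, 2))  # 12 156 26.22
```

Grouping by shot matters here because the twelve variants of one shot share the same content; a plain random split would put near-duplicates of a test sample into the training set.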

Layout-Aware Progressive PDF Rendering: AI Prioritization of PDF Slices to Reduce Time-to-Functional-First-Frame on FUNSD
Heyu Wang; Yuxuan Ren
Journal of Technology Informatics and Engineering (JTIE), Vol. 4 No. 2 (2025): August
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.523

Abstract

Progressive PDF rendering is attractive because users rarely need every visible pixel at once; they need the semantically useful parts of the current viewport early enough for reading and interaction. This paper studies whether layout-aware AI can prioritize PDF slices more effectively than geometric or density-based heuristics. We reconstruct vector PDFs from official FUNSD form annotations and evaluate a tile scheduler that predicts tile utility from inexpensive layout and preview features before high-resolution rendering begins. The empirical study covers 26 reconstructed documents from the FUNSD test split that were fully processed in the present environment, four viewport scenarios, and measured clip-render timings for all visible tiles. The main configuration uses an 8×10 grid and a random-forest regressor trained with page-level 5-fold GroupKFold, then compares the learned scheduler with row-major visible-first, center-first, ink-density, text-density, a hand-tuned layout heuristic, full-page rendering, and an oracle upper bound. The proposed model reaches TTFF-90 in 14.21 ms, compared with 15.18 ms for the best non-AI heuristic, 20.48 ms for full-page rendering, and 24.09 ms for row-major rendering. It also achieves Utility@20ms of 0.941, AUC@25ms of 0.730, NDCG@10 of 0.963, and Recall@10 of 0.969. The results show that slice rendering is not inherently beneficial: the summed visible-tile cost in the main 8×10 setting is 28.80 ms, which is higher than the full-page cost of 20.48 ms, so scheduling quality determines whether slicing improves or harms TTFF. A coarser 6×8 grid reduces AI TTFF-90 to 10.58 ms, while the densest pages favor a full-page fallback. Paired Wilcoxon signed-rank tests over the page-scenario cases yield p < .001 for TTFF-90 improvements of the proposed model over every non-AI baseline. However, those tests should be interpreted as case-level rather than document-level evidence.
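
The observation that slicing only helps when the schedule is good can be illustrated with a toy sketch. It assumes that TTFF-90 means the elapsed render time at which 90% of the total visible-tile utility has been drawn; the abstract does not define the metric, and the tile utilities, tile costs, and full-page cost below are hypothetical values, not the paper's measurements.

```python
def time_to_utility(tiles, order, target=0.9):
    """Return elapsed render time (ms) when cumulative utility first reaches
    `target` times the total utility, rendering tiles sequentially in `order`."""
    total = sum(utility for utility, _ in tiles)
    elapsed = 0.0
    gained = 0.0
    for i in order:
        utility, cost_ms = tiles[i]
        elapsed += cost_ms
        gained += utility
        if gained >= target * total:
            return elapsed
    return elapsed

# Hypothetical (utility, render cost in ms) for four visible tiles.
tiles = [(4, 4.0), (50, 5.0), (4, 4.0), (42, 5.0)]
full_page_ms = 12.0  # hypothetical cost of rendering the whole page at once

row_major = list(range(len(tiles)))
by_utility = sorted(range(len(tiles)), key=lambda i: tiles[i][0], reverse=True)

summed_tile_ms = sum(cost for _, cost in tiles)
print(summed_tile_ms)                      # 18.0, more than the full-page cost
print(time_to_utility(tiles, row_major))   # 18.0, slower than full-page rendering
print(time_to_utility(tiles, by_utility))  # 10.0, faster than full-page rendering
```

In this toy case the per-tile overhead makes the sliced total exceed the full-page cost, as in the 8×10 setting the abstract reports, so only an ordering that renders high-utility tiles first beats the full-page baseline. Sorting by true utility stands in for an oracle; a learned scheduler would sort by predicted utility instead.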