Service design workflows often begin with low-fidelity sketches that must be quickly translated into interactive prototypes. This paper studies the Sketch-to-Web problem: generating HTML/CSS prototypes from hand-drawn UI sketches and evaluating fidelity with both structural and visual metrics. Because the original Sketch2Code benchmark is distributed primarily as compressed artifacts that are not executable in our restricted runtime, we construct Sketch2Code-Synth, a size-matched and protocol-matched instantiation containing 731 hand-drawn-style sketches paired with 484 webpage prototypes while preserving the same sketch-to-HTML task interface. We implement a lightweight constrained sketch-to-HTML baseline (ProtoVLM) that combines HOG-based template recognition with template-conditioned HTML/CSS instantiation. We compare ProtoVLM against three baselines (kNN retrieval, heuristic computer vision layout extraction, and majority-template generation) and an oracle upper bound. Evaluation uses (i) DOM tree edit distance computed on a containment-induced layout tree, (ii) element-level IoU with Hungarian matching, and (iii) wireframe SSIM on 200×150 rasterized layouts. On the held-out test split (97 pages, 147 sketches), ProtoVLM achieves a mean tree edit distance of 2.224, mean element IoU of 0.755, and mean SSIM of 0.474. Relative to kNN retrieval, the main gain is in localization stability (IoU 0.755 vs. 0.697), while structural distance is similar (TED 2.224 vs. 2.422). Because the benchmark uses a controlled template library and wireframe renderings, the results should be interpreted as evidence on constrained layout recognition and prototype normalization rather than unconstrained real-world sketch understanding. In this setting, SSIM measures layout resemblance only, not interface realism or usability.
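The "containment-induced layout tree" used for the tree-edit-distance metric can be illustrated with a small sketch. The function below is an assumption about how such a tree might be built, not the paper's implementation: each element box becomes a child of the smallest box that fully contains it, with top-level boxes attached to an implicit root (parent index -1).

```python
def contains(outer, inner):
    # True if box `inner` lies fully inside box `outer`; boxes are (x1, y1, x2, y2).
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def containment_tree(boxes):
    # Returns parent indices: parent[i] is the index of the smallest box
    # strictly containing box i, or -1 for top-level boxes.
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    parent = [-1] * len(boxes)
    for i, b in enumerate(boxes):
        best, best_area = -1, float("inf")
        for j, c in enumerate(boxes):
            if i != j and contains(c, b) and area(c) < best_area:
                best, best_area = j, area(c)
        parent[i] = best
    return parent
```

Tree edit distance (e.g. the Zhang–Shasha algorithm) can then be computed between the containment trees of the predicted and ground-truth layouts.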
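The element-level IoU metric with Hungarian matching can be sketched as follows. This is a minimal illustrative version (box format and the handling of unmatched elements are assumptions, not the paper's exact protocol): build the pairwise IoU matrix between predicted and ground-truth boxes, solve the optimal one-to-one assignment with `scipy.optimize.linear_sum_assignment`, and average, counting unmatched elements as IoU 0.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def matched_mean_iou(pred_boxes, gt_boxes):
    # Hungarian matching maximizes total IoU (negate to get a cost matrix).
    iou = np.array([[box_iou(p, g) for g in gt_boxes] for p in pred_boxes])
    rows, cols = linear_sum_assignment(-iou)
    # Divide by the larger count so unmatched elements contribute 0.
    return iou[rows, cols].sum() / max(len(pred_boxes), len(gt_boxes))
```

Dividing by the larger of the two element counts penalizes both missing and spurious elements, which is one common convention for set-level localization scores.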
Copyright © 2025