Chen, Yushan
Unknown Affiliation

Published: 2 Documents

Articles

Found 2 Documents

From Hand-Drawn Sketches to Interactive Web Prototypes: A Reproducible Vision-Language Approach with Structural and Visual Consistency Evaluation
Chen, Yushan; Li, Maoxi
Journal of Technology Informatics and Engineering Vol. 4 No. 2 (2025): AUGUST | JTIE : Journal of Technology Informatics and Engineering
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i2.490

Abstract

Service design workflows often begin with low-fidelity sketches that must be quickly translated into interactive prototypes. This paper studies the Sketch-to-Web problem: generating HTML/CSS prototypes from hand-drawn UI sketches and evaluating fidelity with both structural and visual metrics. Because the original Sketch2Code benchmark is distributed primarily as compressed artifacts that are not executable in our restricted runtime, we construct Sketch2Code-Synth, a size-matched and protocol-matched instantiation containing 731 hand-drawn-style sketches paired with 484 webpage prototypes while preserving the same sketch-to-HTML task interface. We implement a lightweight constrained sketch-to-HTML baseline (ProtoVLM) that combines HOG-based template recognition with template-conditioned HTML/CSS instantiation. We compare ProtoVLM against three baselines (kNN retrieval, heuristic computer vision layout extraction, and majority-template generation) and an oracle upper bound. Evaluation uses (i) DOM tree edit distance computed on a containment-induced layout tree, (ii) element-level IoU with Hungarian matching, and (iii) wireframe SSIM on 200×150 rasterized layouts. On the held-out test split (97 pages, 147 sketches), ProtoVLM achieves a mean tree edit distance of 2.224, mean element IoU of 0.755, and mean SSIM of 0.474. Relative to kNN retrieval, the main gain is in localization stability (IoU 0.755 vs. 0.697), while structural distance is similar (TED 2.224 vs. 2.422). Because the benchmark uses a controlled template library and wireframe renderings, the results should be interpreted as evidence on constrained layout recognition and prototype normalization rather than unconstrained real-world sketch understanding. In this setting, SSIM measures layout resemblance only, not interface realism or usability.  
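The element-level IoU with Hungarian matching used in the evaluation can be sketched as follows. This is a minimal illustration, not the authors' code: the (x, y, w, h) box format, the helper names, and the use of SciPy's assignment solver are all assumptions.

```python
# Sketch: element-level IoU with optimal (Hungarian) matching.
# Box format (x, y, w, h) and helper names are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def mean_matched_iou(pred_boxes, gt_boxes):
    """Match predicted elements to ground-truth elements so that total
    IoU is maximized, then average IoU over the matched pairs."""
    cost = np.array([[1.0 - iou(p, g) for g in gt_boxes]
                     for p in pred_boxes])
    rows, cols = linear_sum_assignment(cost)  # minimizes total cost
    return float(np.mean([iou(pred_boxes[r], gt_boxes[c])
                          for r, c in zip(rows, cols)]))
```

Because the matching is solved globally rather than greedily, a swapped pair of predicted elements still matches perfectly: two predictions listed in the opposite order of the ground truth yield a mean matched IoU of 1.0.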
Automatic Detection and Explanation of Dark Patterns from Interface Microcopy: Empirical Comparison of BERT-Style Encoders, RoBERTa-Style Encoders, and LLM-Style Decoders on the ec-darkpattern Dataset
Xu, Haosen; Chen, Yushan; Med, Aron
Journal of Technology Informatics and Engineering Vol. 4 No. 3 (2025): DECEMBER | JTIE : Journal of Technology Informatics and Engineering
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i3.491

Abstract

Dark patterns (also called deceptive design patterns) are interface choices that steer or pressure users into unintended actions such as rushed purchases, unnecessary disclosures, or hard-to-cancel subscriptions. In e-commerce, many dark patterns are expressed directly in microcopy (e.g., button labels, banners, and inline messages), which makes text-only detection attractive for scalable auditing. This paper presents a fully reproducible experimental study on ec-darkpattern, a text-based dataset of e-commerce interface strings with balanced binary labels (1,178 dark pattern vs. 1,178 non-dark pattern) and seven dark pattern categories. We compare (i) a rule-based lexicon baseline, (ii) hashed n-gram linear models, (iii) a lightweight BERT-style bidirectional transformer encoder with word tokenization, (iv) a lightweight RoBERTa-style bidirectional transformer encoder with character tokenization, and (v) an LLM-style causal decoder trained as a classifier on the same inputs. On a fixed 80/10/10 split with seed 42, the best-performing model is a hashing + linear SVM baseline (F1=0.9437, ROC-AUC=0.9810), while the BERT-style encoder achieves F1=0.9038 (ROC-AUC=0.9695), the RoBERTa-style encoder achieves F1=0.8907 (ROC-AUC=0.9573), and the LLM-style decoder achieves F1=0.7884 (ROC-AUC=0.8808). These results should be interpreted as a controlled comparison under low-resource, no-pretraining conditions on a single fixed split, rather than as a general ranking of encoder-style versus decoder-style transformers. To support explainability, we generate token-level attributions using gradient-based saliency, summarize them as key phrases, and estimate explanation consistency via top-k token overlap on an exploratory 20-instance sample (mean Jaccard up to 0.7482 between the two character-based transformers). Finally, we curate an error-case library that links misclassifications to their most influential phrases. Within this short-microcopy setting, the findings show that lexical baselines are especially strong, while transformer directionality and tokenization change both accuracy and the stability of highlighted cues.
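A hashed n-gram + linear SVM baseline of the kind the abstract names can be sketched with scikit-learn. The toy microcopy strings, labels, and the specific hyperparameters below are illustrative assumptions, not the authors' pipeline or the ec-darkpattern data.

```python
# Sketch of a hashed n-gram + linear SVM classifier in the spirit of
# the paper's strongest baseline. All strings, labels, and settings
# here are illustrative assumptions, not the authors' exact setup.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical microcopy samples (1 = dark pattern, 0 = not).
texts = [
    "Only 2 left in stock, order now!",
    "Hurry, offer ends in 5 minutes",
    "9 people are viewing this item right now",
    "No thanks, I hate saving money",
    "Add to cart",
    "View shipping options",
    "Read our privacy policy",
    "Contact customer support",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Word 1- and 2-grams hashed into a fixed-size feature space,
# classified by a linear SVM.
clf = make_pipeline(
    HashingVectorizer(ngram_range=(1, 2), n_features=2**18),
    LinearSVC(),
)
clf.fit(texts, labels)
```

The hashing trick keeps the feature space fixed regardless of vocabulary size, which is one reason such baselines remain competitive on short microcopy where local lexical cues ("hurry", "only N left") carry most of the signal.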