Video-based automated strabismus screening is difficult in unconstrained settings, where brief events such as blinks, head movement, or tracking errors are easily mistaken for true ocular misalignment. This study aims to improve diagnostic specificity while maintaining sensitivity in automated pre-screening. To this end, a temporal analysis framework, the Temporal Cross-Eye Regression Network (T-CER-Net), is proposed. The method introduces the Cross-Eye Regression Error (CERE), a scale- and position-invariant temporal signal that characterizes deviations in binocular coordination by measuring the prediction error between the two eyes. Rather than relying on frame-level deviation estimates, the approach analyzes extended CERE sequences with a Transformer encoder to assess temporal consistency. The training procedure also accounts for real-world variability by oversampling normal sequences that contain common artifacts and by applying class weighting. The method was evaluated against static threshold-based classifiers and a CNN–LSTM temporal baseline. On a held-out test set, T-CER-Net achieved an area under the ROC curve (AUC) of 0.9140, with a sensitivity of 0.8421 and a specificity of 0.8500, showing improved robustness to noise-induced false positives. These findings suggest that treating binocular misalignment as a temporal pattern, combined with attention-based sequence analysis, offers a practical and robust basis for automated strabismus pre-screening in real-world settings.
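The abstract does not state the CERE formula. Purely as an illustration of how a prediction error between the two eyes can be made insensitive to scale and position, the sketch below standardises each eye's pupil trajectory, fits a least-squares linear map from the left eye to the right eye, and returns the per-frame residual norm. The function name `cross_eye_regression_error`, the `(T, 2)` input layout, and the choice of a linear fit are assumptions made for this example; they are not the authors' definition.

```python
import numpy as np


def cross_eye_regression_error(left, right):
    """Hypothetical CERE-style signal (an assumption, not the paper's formula).

    left, right: arrays of shape (T, 2) holding pupil-centre coordinates
    for T video frames. Returns an array of shape (T,) with the per-frame
    error of predicting the right-eye trajectory from the left-eye one.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    def standardise(points):
        # Subtract the mean (removes position) and divide by the RMS
        # radius (removes scale, e.g. camera distance).
        centred = points - points.mean(axis=0)
        scale = np.sqrt((centred ** 2).sum(axis=1).mean())
        return centred / (scale + 1e-8)

    l_std = standardise(left)
    r_std = standardise(right)

    # Least-squares linear map from left-eye to right-eye coordinates.
    coef, *_ = np.linalg.lstsq(l_std, r_std, rcond=None)

    # Frames where the eyes stop moving together produce large residuals.
    residual = r_std - l_std @ coef
    return np.linalg.norm(residual, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left_eye = rng.normal(size=(50, 2))
    right_eye = left_eye + np.array([30.0, 0.0])  # eyes move together
    right_eye[20:30, 0] += 5.0                    # simulated deviation
    print(cross_eye_regression_error(left_eye, right_eye).round(2))
```

A sequence of such values, rather than any single frame, is the kind of input a temporal model could then inspect for sustained versus transient deviations.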
Copyright © 2026