Crowd counting remains challenging in computer vision because occlusion, scale variation, and perspective distortion degrade the performance of existing methods. In addition, the labels used to train crowd counting systems are often highly noisy under real-world conditions. Although accuracy has improved in recent years, most crowd counting models still rely on clean supervision and lack mechanisms for dynamically correcting corrupted labels, which limits their robustness when deployed in real-world applications. In this work we present a Noise-Robust Crowd Counting with Label Correction (NRCC-LC) framework that obtains reliable density estimates from noisy supervision. Our approach uses a combined CNN-Transformer architecture to capture both local and global visual information (i.e., image content and context), together with a Noise-Robust Module (NRM) and a Dynamic Label Correction (DLC) mechanism. Our principal experimental results on four benchmark datasets (ShanghaiTech Part A, ShanghaiTech Part B, NWPU-Crowd, and JHU-Crowd++) indicate that NRCC-LC performs competitively with existing state-of-the-art crowd counting methods; most notably, it produces per-image MAEs of 97.8 and 392.3 on NWPU-Crowd. These results also have real-world implications for public safety and urban planning: by combining noise-aware feature learning with iterative label correction, the method shows that automated monitoring systems can be made significantly more reliable in complex, real-world environments.
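For readers unfamiliar with the per-image MAE reported above, the following is a minimal sketch of how such a metric is commonly computed from density maps. It assumes the usual convention that the predicted count is the sum of the density-map values. The function names and toy data are hypothetical, chosen for illustration only, and are not taken from the NRCC-LC implementation.

```python
from typing import Sequence

# A density map is treated here as a 2-D grid of non-negative floats.
DensityMap = Sequence[Sequence[float]]


def count_from_density(density: DensityMap) -> float:
    """Estimated count for one image: the sum of all density-map values (assumed convention)."""
    return sum(sum(row) for row in density)


def mean_absolute_error(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    """Per-image MAE: average absolute difference between predicted and true counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(pred_counts)


# Hypothetical toy data: two tiny "density maps" and their annotated counts.
pred_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],  # sums to 5.0
]
true_counts = [3, 4]

pred_counts = [count_from_density(m) for m in pred_maps]
mae = mean_absolute_error(pred_counts, true_counts)
print(f"MAE = {mae}")  # (|2 - 3| + |5 - 4|) / 2 = 1.0
```

Because the metric depends only on per-image counts, it measures counting accuracy but says nothing about where in the image the density is placed.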
Copyrights © 2026