This study develops a modular CNN encoder–decoder framework for single-image dehazing in which the conventional bottleneck is replaced by interchangeable token-mixing modules such as FNet, Spatial-FNet, MLP-Mixer, and gMLP-style designs. The pipeline integrates adaptive preprocessing (CLAHE and histogram matching), photometric augmentations, and training on a controlled subset of the SOTS dataset. Quantitative and qualitative evaluations demonstrate substantial improvements over a baseline CNN: mean PSNR rises from approximately 18.4 dB to 23.0–24.0 dB, and SSIM improves from about 0.75 to 0.89–0.91. Several variants, however, require careful hyperparameter selection and loss-weight tuning to achieve stable performance. The results offer practical guidance for deploying such models in real-world vision systems.
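As a rough illustration of the modular-bottleneck idea only, the sketch below shows a toy PyTorch encoder–decoder whose bottleneck accepts any token-mixing module. All names (DehazeNet, FNetMixer, dim, and so on) are illustrative assumptions and do not reflect the authors' actual implementation, layer counts, or training setup.

# Minimal sketch (not the authors' code): a CNN encoder-decoder whose bottleneck
# accepts any token-mixing module. Class/parameter names are hypothetical.
import torch
import torch.nn as nn


class FNetMixer(nn.Module):
    """FNet-style token mixing: attention replaced by a 2-D Fourier transform."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (B, N, C)
        # Mix tokens via FFT over the sequence and channel dims, keeping the real part.
        mixed = torch.fft.fft(torch.fft.fft(tokens, dim=-1), dim=-2).real
        tokens = self.norm(tokens + mixed)
        return tokens + self.mlp(tokens)


class DehazeNet(nn.Module):
    """CNN encoder-decoder with an interchangeable token-mixing bottleneck."""
    def __init__(self, mixer: nn.Module, dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.mixer = mixer  # FNet, MLP-Mixer, gMLP, ... dropped in here
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, hazy: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(hazy)                 # (B, C, H/4, W/4)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, C) token sequence
        tokens = self.mixer(tokens)                # swappable bottleneck
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(feats)


if __name__ == "__main__":
    model = DehazeNet(mixer=FNetMixer(dim=64))
    out = model(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 3, 128, 128])

Swapping in another bottleneck (e.g. an MLP-Mixer or gMLP block operating on the same (B, N, C) token layout) only requires passing a different module to DehazeNet, which is the modularity the abstract describes.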