Recognition problems such as object detection, scene understanding, and fine-grained categorisation are central topics in computer vision, yet modelling spatial coherence and contextual dependencies under changing configurations remains challenging. Although convolutional neural networks (CNNs) excel at feature extraction, their dependence on local receptive fields limits their ability to capture long-range spatial relationships and high-order interactions. To address these shortcomings, we present an enhanced hybrid framework that integrates CNNs with a two-dimensional hidden Markov model (2D-HMM), Markov random fields (MRFs), and variational autoencoders (VAEs) in a single model. The framework employs the 2D-HMM for pairwise spatial modelling, MRFs for higher-order context, and VAEs for stable latent representation learning. On the MNIST and CIFAR-10 benchmark datasets, our approach achieves accuracies of 98.2% and 89.5%, respectively, consistently surpassing state-of-the-art methods while remaining robust to noise and occlusion. Ablation studies further show that the MRFs improve recall by 1.6% and the VAEs improve precision by 1.3%, indicating that the two components complement each other in overall test performance. This work unifies deep learning with probabilistic graphical models, yielding more interpretable, scalable, and accurate recognition systems.
Copyright © 2026