PROtek : Jurnal Ilmiah Teknik Elektro
Vol 13 No 2 (2026): Protek : Jurnal Ilmiah Teknik Elektro

Transformer Models in Digital Image Processing: A Systematic Review of Architectures and Applications

Manar Abdulkareem Al-Abaji (University of Mosul)
Meaad Salih (University of Mosul)
Maher Khalaf Hussein (University of Mosul)



Article Info

Publish Date
06 Jun 2026

Abstract

Over the past few years, digital image processing has changed profoundly: Transformer-based architectures now dominate a wide range of tasks and have replaced long-standing convolutional counterparts. The self-attention mechanism of Transformer models, which originated in natural language processing, captures long-range spatial relationships in images more effectively than the inherently limited receptive fields of Convolutional Neural Networks (CNNs). In this paper, we conduct a systematic review of Transformer architectures for digital image processing from 2020 to 2026, covering key foundational models such as Vision Transformer (ViT), Swin Transformer, DeiT, and BEiT, together with their numerous variants. We trace the development of these models from simple image classification to complex tasks including object detection, semantic and instance segmentation, image restoration, medical imaging, and generative image synthesis. We identify four major trends in architectural design: purely Transformer-based vision models, CNN-Transformer hybrid architectures, hierarchical windowed attention networks, and diffusion-Transformer fusion models. We also provide a structured comparative analysis of 42 influential methods on 18 benchmark datasets, including their performance trajectories, computational and memory trade-offs, and emerging best practices in model design.
Finally, we discuss open challenges, such as the quadratic computational cost of standard attention, the need for large-scale pre-training data, and limited domain generalization, and we summarize future directions, including more efficient attention, tighter integration of multi-modal information, and lightweight Transformer designs for edge and resource-constrained devices. This review is intended as a rigorous and timely reference for researchers and practitioners interested in improving visual intelligence with Transformer-based methods.
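
To illustrate the self-attention mechanism and the quadratic cost mentioned in the abstract, the following is a minimal NumPy sketch; it is not taken from the reviewed paper. It splits an image into non-overlapping patches, as ViT-style models do, and applies single-head scaled dot-product attention. The function names `patchify` and `self_attention`, the 32x32 image, the patch size of 8, the projection width of 16, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (N, N): every patch scores every patch
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Illustrative (assumed) sizes and random, untrained projection weights.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
tokens = patchify(image, patch=8)  # 4 x 4 grid of patches -> 16 tokens
d_model = 16
w_q, w_k, w_v = (
    rng.standard_normal((tokens.shape[1], d_model)) * 0.02 for _ in range(3)
)
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Because the attention matrix has one entry per pair of patches, its size grows with the square of the token count: halving the patch size from 8 to 4 on the same image raises the number of tokens from 16 to 64 and the number of attention entries from 256 to 4096.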

Copyrights © 2026






Journal Info

Abbrev

protk

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy

Description

PROtek is a scientific journal of electrical engineering, first published in September 2013. The journal is managed by the Electrical Engineering Study Program, Faculty of Engineering, Universitas Khairun, and serves as a scholarly forum for disseminating the results of research and analytical studies that ...