Recent advances in object detection have demonstrated remarkable performance in autonomous systems; however, most deep learning models still suffer significant accuracy degradation under low-light or adversarial conditions. This study proposes an Explainable Transformer-Based Object Detection (ETOD) framework that integrates a Vision Transformer (ViT) architecture with Explainable Artificial Intelligence (XAI) mechanisms to achieve robust and interpretable object detection in adverse environments. The proposed ETOD model employs a dual-branch structure: (i) a low-light enhancement module that uses contrastive illumination normalization to recover critical features, and (ii) a transformer-based detection head optimized for global contextual reasoning. To ensure explainability, Grad-CAM and attention visualization maps are incorporated to highlight the model’s focus regions, providing interpretive insight for human operators and safety auditors. Experimental evaluation was conducted on benchmark datasets (ExDark, BDD100K-Night, and COCO-Adversarial) under simulated adversarial perturbations and low-illumination conditions. The proposed ETOD achieved a 12.8% improvement in mAP over standard DETR and 17.5% higher robustness against adversarial attacks while maintaining real-time inference on edge GPUs. Qualitative analysis demonstrates that the explainability module produces clear visual cues that correlate strongly with detected object boundaries. The findings suggest that integrating transformer-based detection with explainable reasoning mechanisms offers a promising pathway toward trustworthy, safety-critical perception systems in autonomous vehicles and drones.
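To make the Grad-CAM component of the XAI pipeline concrete, the following is a minimal sketch of Grad-CAM-style saliency over ViT patch tokens. It is not the authors' implementation: it assumes a stock torchvision vit_b_16 classifier as a stand-in for the ETOD backbone, and the hook target (model.encoder.layers[-1]) and the 14x14 patch-grid reshape are choices specific to that stand-in.

    # Hedged sketch: Grad-CAM-style saliency on a ViT backbone.
    # ETOD's detection head and enhancement module are not reproduced here;
    # a torchvision vit_b_16 classifier stands in for the backbone.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vit_b_16, ViT_B_16_Weights

    model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT).eval()
    acts, grads = {}, {}

    def fwd_hook(_module, _inputs, output):
        acts["tokens"] = output            # (B, 197, 768): class token + 196 patch tokens
        output.register_hook(lambda g: grads.__setitem__("tokens", g))

    # Hook the last encoder block; its output tokens serve as the "feature map".
    model.encoder.layers[-1].register_forward_hook(fwd_hook)

    img = torch.randn(1, 3, 224, 224)      # placeholder input; use a real image in practice
    logits = model(img)
    logits[0, logits.argmax()].backward()  # gradient of the top-class score

    a = acts["tokens"][:, 1:, :]           # drop the class token -> (1, 196, 768)
    g = grads["tokens"][:, 1:, :]
    w = g.mean(dim=1, keepdim=True)        # channel weights, averaged over patch tokens
    cam = F.relu((w * a).sum(-1))          # (1, 196) patch-level saliency
    cam = cam.reshape(1, 1, 14, 14)        # 196 patches -> 14x14 grid for a 224-px input
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

Averaging gradients over the patch tokens plays the role of Grad-CAM's global-average-pooled channel weights; the class token is dropped because it has no spatial location, so the resulting map can be overlaid directly on the input image as the abstract's visualization maps describe.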