In modern agriculture, rapid identification of pests is essential for maintaining high crop yields and ensuring global food security. In diverse and dynamic agricultural environments, traditional pest detection methods suffer from reduced accuracy, limited scalability, and a lack of interpretability. In this study, EfficientNetV2-L and Grad-CAM were combined to improve both the performance and the transparency of pest detection systems. EfficientNetV2-L is a fast, resource-efficient model that is well suited to computationally constrained environments. Like other CNN models, however, it is often criticized as an uninterpretable "black box" despite its high accuracy. To address this issue, Grad-CAM was used to generate saliency maps that highlight the regions of the input image that most influence the model’s decision. This combination not only delivers higher pest detection accuracy but also provides actionable insight into the model’s predictions, an important feature for building trust among agricultural practitioners. Our experimental results show a 15% improvement in detection accuracy over conventional models, particularly for visually similar pest species that are often misclassified. In addition, the interpretability provided by Grad-CAM led to a deeper understanding of the model’s behaviour, enabling iterative adjustments that further improved the reliability of the system. The practical implications of these findings are significant: the integrated model offers a robust solution that can be incorporated into real-time agricultural monitoring systems. By enabling early detection and accurate classification of pests, the model can support more effective pest management strategies that minimize crop damage and increase agricultural productivity.
This research not only advances the technological frontier of pest detection but also contributes to the broader goals of sustainable agriculture and food security. Future research will focus on extending the model to other agricultural contexts, improving its adaptability to varied environmental conditions, and further optimizing its performance through techniques such as transfer learning and ensemble methods.
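
The Grad-CAM step mentioned above, turning a convolutional layer's feature maps into a saliency map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grad_cam_heatmap` and the toy arrays are hypothetical. The sketch assumes the feature maps and the gradients of the class score with respect to them have already been extracted from the network (in practice, typically from the last convolutional layer using the deep learning framework's automatic differentiation). It shows only the standard Grad-CAM combination: average each channel's gradients to obtain a weight, take the weighted sum of the feature maps, apply ReLU, and rescale to [0, 1].

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations: array of shape (K, H, W), feature maps of one conv layer.
    gradients:   array of shape (K, H, W), gradient of the target class
                 score with respect to those feature maps.
    Returns an (H, W) array scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if activations.ndim != 3 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, H, W)")

    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)

    # Weighted sum of the feature maps, then ReLU to keep only the regions
    # that push the class score up.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)

    # Rescale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input (hypothetical values): two 2x2 feature maps, one with positive
# and one with negative gradients.
toy_activations = [[[1.0, 0.0], [0.0, 2.0]],
                   [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]],
                 [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam_heatmap(toy_activations, toy_gradients)
```

In a real pipeline the resulting map would be upsampled to the input resolution and overlaid on the pest image, so that a practitioner can see which regions drove the classification.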