Despite growing interest in automated waste detection, existing surveys either focus on a narrow set of models or lack systematic comparisons across object detection paradigms. This review addresses that gap by examining recent advances in deep learning for waste management, spanning two-stage detectors (Faster R-CNN and Mask R-CNN, both region-based convolutional neural networks), single-shot frameworks (You Only Look Once (YOLO), from YOLOv1 to YOLOv11), and emerging Transformer-based models (ViT-WM and AL-DETR). In the studies reviewed, Faster R-CNN achieved a category-level accuracy of 91.68% and an overall accuracy of 89.68%, while Mask R-CNN reported average precision (AP) values between 26.2% and 34.5% across varied datasets. YOLO models demonstrated strong real-time capability, with YOLOv5 reaching a mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 92.96% and YOLOv8 achieving 97.63% accuracy with precision and recall above 93%. Transformer-based approaches are especially promising: ViT-WM achieved 98.17% accuracy, the highest among the reviewed models, and AL-DETR reported a mAP of 58.9% while integrating active learning (AL) strategies to reduce reliance on extensive labeled data. These results highlight YOLO’s efficiency for real-time waste sorting and the potential of Transformer architectures for handling complex, cluttered environments. Remaining challenges include dataset variability, high computational demand, and a lack of standardized benchmarks. Future research should prioritize developing comprehensive datasets, optimizing Transformers for real-time use, and leveraging AL to improve generalizability with reduced annotation effort.
Copyrights © 2026