Detecting small and partially hidden objects in rivers and other water bodies remains a major challenge for real-time waste detection systems. Such objects are often missed because of their small size, low contrast, and cluttered surroundings. The task is further complicated by a lack of dedicated datasets for small floating debris, which limits the development of more capable detection models. To bridge this gap, we developed D_six, a custom dataset of 495 high-resolution images capturing six classes of floating waste under real-world conditions. We then improve the YOLOv5s object detection model by integrating atrous (dilated) convolutions at three backbone layers: P1/2, P3/8, and P5/32. These layers correspond to different scales of the feature pyramid, and placing an atrous convolution at each level helps the model recognize small and occluded objects more effectively. With a dilation rate of 6, the model’s receptive field is expanded without increasing model size or slowing inference. Trained and evaluated on the D_six dataset, the resulting model, FloYO-Net (Floating Object YOLO Network), consistently outperformed the standard YOLOv5s, achieving a mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of 0.828 and an mAP@0.5:0.95 of 0.509, compared with 0.787 and 0.498, respectively. Improvements were especially notable for hard-to-detect items such as plastic bottles and plastic drink containers, with average precision gains of 6.6% and 7.1%, respectively. These results demonstrate that atrous convolution, when placed at appropriate backbone levels, can measurably improve detection accuracy, making it a useful enhancement for real-time environmental cleanup systems.
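The claim that dilation enlarges the receptive field without adding parameters can be checked with a little arithmetic. The sketch below is illustrative only and is not the FloYO-Net implementation: it assumes a 3×3 kernel and 64 input and output channels, neither of which is stated above, and the helper names are hypothetical. Only the dilation rate of 6 comes from the text.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights in a 2D convolution (bias excluded).

    Dilation does not appear here: it spreads the kernel taps apart
    but does not add any taps.
    """
    return in_channels * out_channels * kernel_size * kernel_size


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that preserves spatial size at stride 1 for an odd kernel."""
    return dilation * (kernel_size - 1) // 2


if __name__ == "__main__":
    K = 3          # assumed kernel size (not given in the text)
    C = 64         # assumed channel width (not given in the text)
    for d in (1, 6):  # 1 = ordinary convolution, 6 = rate used in the text
        print(
            f"dilation={d}: effective kernel "
            f"{effective_kernel_size(K, d)}x{effective_kernel_size(K, d)}, "
            f"weights={conv_weight_count(C, C, K)}, "
            f"padding={same_padding(K, d)}"
        )
```

Under these assumptions, a 3×3 kernel with dilation 6 spans a 13×13 window while keeping the same nine weights per channel pair as an ordinary 3×3 convolution. A dense 13×13 kernel covering the same window would need 169.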
Copyrights © 2025