Detecting multiple spermatozoa in microscopic videos remains challenging due to their small size, high velocity, frequent overlap, and inconsistent illumination. This study introduces an enhanced real-time detection framework based on the YOLOv5 deep learning algorithm, advancing beyond previous Computer-Assisted Sperm Analysis (CASA) systems that relied primarily on classical image processing or earlier YOLO versions (e.g., YOLOv3, YOLOv4). Unlike these predecessors, the proposed YOLOv5-based model integrates a Cross Stage Partial (CSP) backbone and an optimized feature pyramid network, enabling superior detection of small, fast-moving spermatozoa at reduced computational cost and model size. A curated dataset of sperm motility videos was processed through standardized steps (frame extraction, contrast enhancement, and manual annotation) to ensure uniformity and data quality. The model, trained via transfer learning on 640×640-pixel images over 50 epochs, achieved a precision of 0.6333, a recall of 0.627, and an mAP@0.5 of 0.618 while sustaining real-time performance at 93 frames per second (FPS). Compared with YOLOv4, the proposed framework cut training time by two-thirds (from 3 hours to 1 hour) and reduced model size from 244 MB to 13.8 MB without compromising accuracy. These improvements establish the YOLOv5-based framework as a lightweight, scalable model for sperm detection, enabling automated, objective, and reproducible motility assessment. Clinically, this approach improves the precision and consistency of male fertility diagnostics, paving the way toward AI-driven reproductive health evaluation and more accessible fertility screening in both advanced and resource-limited laboratory settings.
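The contrast-enhancement step of the preprocessing pipeline described above can be sketched as follows. The abstract does not specify the enhancement method, so plain global histogram equalization in NumPy stands in here as an illustrative assumption (CASA pipelines often use CLAHE via OpenCV instead); the function name `equalize_contrast` and the synthetic low-contrast frame are hypothetical, not from the paper.

```python
import numpy as np

def equalize_contrast(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale frame.

    Illustrative stand-in for the paper's unspecified contrast-enhancement
    step; real pipelines might use CLAHE (e.g. cv2.createCLAHE) instead.
    """
    # Histogram over the 256 possible intensities, then cumulative counts.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the lowest intensity present
    # Map each intensity through the normalized CDF, stretching the
    # occupied intensity range to the full [0, 255] span.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# Synthetic low-contrast frame: intensities squeezed into [100, 140].
rng = np.random.default_rng(0)
frame = rng.integers(100, 141, size=(64, 64), dtype=np.uint8)
enhanced = equalize_contrast(frame)
```

After equalization the occupied intensity range is stretched to the full 8-bit span, which makes small, dim objects such as sperm heads easier to separate from the background before annotation and training.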