Accurate, real-time people counting is essential for crowd management and public safety, yet precision in high-density environments remains challenging due to severe visual occlusion. While the recently released YOLO11 architecture introduces advanced features such as the C3k2 and C2PSA modules, its performance as a pre-trained model for people counting has not been fully explored. This study evaluates a head-detection-based fine-tuning strategy for YOLO11 against the default pre-trained baseline. Fine-tuning performance is analyzed across three scenarios: S1 (full fine-tuning at 960 pixels), S2 (partial backbone freezing at 960 pixels), and S3 (partial backbone freezing at 640 pixels). Fine-tuning was conducted on the CC_Mach_1 dataset from Roboflow Universe, which consists of high-density images annotated for head detection. The results show that the baseline pre-trained YOLO11, which relies on full-body features, performs extremely poorly, with an mAP@0.5 of 0.017 and a Mean Absolute Error (MAE) of 100.3. In contrast, all fine-tuned scenarios achieved substantial improvements, led by S1, which reached the highest accuracy with an mAP@0.5 of 0.682 and reduced the MAE by 62% to 37.8. While S2 remained highly competitive with an MAE of 39.6, the MAE in S3 worsened to 46.9, confirming that lower input resolutions limit the model's ability to identify small-scale head features. These findings provide empirical evidence that domain-specific fine-tuning for head detection substantially improves the robustness of YOLO11 against occlusion. Beyond accuracy, this detection-based approach is more computationally efficient than traditional density-map-based methods, making it well suited for deployment in real-time surveillance systems for large-scale public monitoring.
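The three scenarios above can be sketched as Ultralytics training configurations; this is a minimal illustration only, assuming the Ultralytics `YOLO` API, an illustrative checkpoint (`yolo11n.pt`), a hypothetical dataset config file, and an assumed count of 10 frozen backbone layers for S2/S3 (the paper's exact freezing depth is not stated here). The MAE metric used to score per-image counts is also shown:

```python
# Sketch: the three fine-tuning scenarios (S1-S3) expressed as Ultralytics
# training settings, plus the MAE metric for per-image people counts.
# Checkpoint, dataset path, epoch count, and freeze depth are illustrative
# assumptions, not values taken from the study.

SCENARIOS = {
    "S1": {"imgsz": 960, "freeze": None},  # full fine-tuning at 960 px
    "S2": {"imgsz": 960, "freeze": 10},    # freeze first 10 layers (assumed depth)
    "S3": {"imgsz": 640, "freeze": 10},    # same freezing, lower resolution
}

def train_scenario(name: str):
    """Fine-tune YOLO11 under one scenario (requires `pip install ultralytics`)."""
    from ultralytics import YOLO            # deferred import: optional dependency
    cfg = SCENARIOS[name]
    model = YOLO("yolo11n.pt")              # pre-trained checkpoint (assumed variant)
    model.train(
        data="cc_mach_1.yaml",              # hypothetical dataset config file
        imgsz=cfg["imgsz"],
        freeze=cfg["freeze"],               # None = train all layers
        epochs=100,                         # illustrative
    )
    return model

def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth head counts."""
    assert len(pred_counts) == len(true_counts) and pred_counts
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(pred_counts)
```

For counting, each image's prediction count (number of detected heads) is compared with the annotated count, so `mae([10, 20], [12, 18])` evaluates to 2.0.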