Flooding is one of the most frequent natural disasters in Indonesia, particularly in densely populated urban areas. A main cause of its impact is the delayed response to rising river water levels, due in part to the continued use of manual river water monitoring systems, which often struggle under varying lighting and weather conditions. This study compares two segmentation models, YOLOv11 and Mask R-CNN, for river water level detection, and evaluates their suitability for real-time water level monitoring of dams and rivers under diverse lighting conditions. Data were gathered from publicly available sources, including river monitoring CCTV footage and social media content related to river activities, and then annotated for model training. The YOLOv11 model, implemented with the Ultralytics framework and the PyTorch library, achieved a mean Average Precision (mAP) of 99.657% over Intersection over Union (IoU) thresholds from 0.50 to 0.95 (mAP@50-95) and a recall of 99.930%. The Mask R-CNN model, developed with Detectron2, attained an Average Precision (AP) of 98.620% over the same IoU range (AP@50-95) and a recall of 99.200%. Both models were also tested in real-time scenarios, where they accurately detected water-level objects, although performance degraded under complex environmental conditions such as low light and water turbidity. Future work will focus on incorporating more diverse environmental data and optimizing model parameters. Overall, the YOLOv11 model offers higher accuracy and better resource efficiency, making it the more suitable choice for real-time water level monitoring applications.
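To make the "IoU 50-95" figures concrete, the following is a minimal sketch of the idea behind them. It assumes the COCO convention of ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; the abstract does not state how the metrics were computed. The function names `mask_iou` and `matched_thresholds` and the toy masks are hypothetical, and this is not the evaluation code used in the study.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(iou):
    """Thresholds at which a prediction with this IoU counts as a correct match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Toy 4x4 example: ground-truth water covers the bottom two rows (8 pixels);
# the prediction covers the same pixels plus one extra pixel above them.
truth = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

iou = mask_iou(pred, truth)          # 8 overlapping pixels / 9 in the union
passed = matched_thresholds(iou)     # thresholds this prediction satisfies
print(f"IoU = {iou:.3f}, correct at {len(passed)} of {len(IOU_THRESHOLDS)} thresholds")
```

Under this convention, average precision is computed separately at each threshold and then averaged, so a score near 100% over the 50-95 range means predicted masks overlap the annotations almost exactly, even at the strictest thresholds.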