Real-time streaming architectures are reshaping urban transit analytics by enabling low-latency, data-driven decision making. This study evaluates and compares the real-time data processing capabilities of public transit systems in London, New York, and Singapore. The objective is to determine how architectural choices, data freshness, and machine learning integration influence key performance indicators such as latency, estimated time of arrival (ETA) accuracy, and anomaly detection. The methodology is a multi-city case study in which Kafka-based pipelines integrated with Apache Flink and Spark were assessed for ingestion, processing, and service delivery. Datasets included GTFS Realtime, SIRI feeds, and contextual APIs (e.g., speed bands and crowd density). Evaluation metrics were feed latency, mean absolute error (MAE) and root mean square error (RMSE) for ETA, and response time for anomaly detection. The results show that Singapore’s transit system outperformed its counterparts, with the lowest latency (~12 s), the highest ETA accuracy (MAE = 18 s; RMSE = 25 s), and superior anomaly detection via multi-sensor fusion. London and New York, while technologically robust, were constrained by longer feed update intervals and integration complexities. Kafka ML's online learning enhanced model adaptability, significantly reducing ETA prediction errors across dynamic conditions. Furthermore, stress testing revealed Singapore’s architecture as the most resilient under peak load. The study concludes that the effectiveness of real-time urban transit systems depends on harmonizing streaming infrastructure... Singapore’s architecture may serve as a scalable reference model for other cities, provided that contextual differences in implementation are recognized. Ethical considerations, including data governance and passenger privacy, are essential for sustainable implementation.
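For readers unfamiliar with the two ETA error metrics named above, the following is a minimal sketch of how MAE and RMSE can be computed from paired predicted and observed arrival times. The function names and the sample values are hypothetical and chosen only for illustration; they are not the study's data or its evaluation code.

```python
import math


def mae(predicted, actual):
    """Mean absolute error: average magnitude of the prediction errors."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mean_sq)


# Hypothetical ETA values in seconds (illustrative only, not from the study).
predicted_eta_s = [120, 300, 45, 600]
observed_eta_s = [110, 330, 45, 580]

print(f"MAE  = {mae(predicted_eta_s, observed_eta_s):.1f} s")
print(f"RMSE = {rmse(predicted_eta_s, observed_eta_s):.1f} s")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, and a wide gap between the two suggests that a few large misses dominate the error.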
Copyrights © 2025