Civil Registration and Vital Statistics (CRVS) systems in archipelagic contexts like Indonesia face persistent challenges in location data standardisation because free-text entries vary in spelling, formatting, and granularity. This study introduces a multi-stage hybrid framework that systematically converts these unstructured entries into official administrative codes using deterministic matching, fuzzy probabilistic matching, and geocoding. The framework was applied to 841,126 birth and death records using Python (Pandas, RapidFuzz, Geopy). Across all stages, the cumulative match rate was 85.44% for births and 67.12% for deaths. The layered pipeline balanced speed, precision, and coverage on real-world CRVS data. The findings demonstrate enhanced geographic precision in vital statistics, enabling more reliable public health and demographic applications. Future improvements may include transformer-based embeddings, active learning for ambiguous records, and uncertainty-aware geocoding techniques. This framework establishes a scalable, robust pathway for improving the granularity and reliability of geolocated vital event data.
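The staged matching described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a tiny hypothetical reference table (`REFERENCE`, with placeholder codes rather than official ones), uses the standard-library `difflib` as a stand-in for RapidFuzz, and represents the geocoding stage as an optional caller-supplied function (`geocode`, hypothetical) instead of a live Geopy call. The 0.85 similarity cutoff is likewise an illustrative assumption.

```python
from difflib import get_close_matches

# Hypothetical reference table: normalised place name -> administrative code.
# Names and codes are illustrative placeholders, not official values.
REFERENCE = {
    "kota bandung": "CODE-A",
    "kabupaten sleman": "CODE-B",
    "kota surabaya": "CODE-C",
}


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def match_location(raw, geocode=None, cutoff=0.85):
    """Return (code, stage) for a free-text location entry.

    Stages run from cheapest to most expensive; the first hit wins.
    `geocode` is an assumed callable mapping a name to a code or None.
    """
    name = normalise(raw)

    # Stage 1: deterministic matching (exact lookup after normalisation).
    if name in REFERENCE:
        return REFERENCE[name], "exact"

    # Stage 2: fuzzy matching (difflib here; the study reports using RapidFuzz).
    close = get_close_matches(name, list(REFERENCE), n=1, cutoff=cutoff)
    if close:
        return REFERENCE[close[0]], "fuzzy"

    # Stage 3: geocoding fallback, only if a geocoder was supplied.
    if geocode is not None:
        code = geocode(name)
        if code is not None:
            return code, "geocoded"

    return None, "unmatched"
```

Returning the stage label alongside the code makes it possible to report per-stage and cumulative match rates, and to route unmatched entries to manual review.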
Copyright © 2025