Abstract: Precision geocoding in a multifunctional diabetes data integration system to support predictive modeling and population health analytics

M. Barnes (1), M. Clements (1, 2)

(1) Children's Mercy Kansas City, Pediatrics, Kansas City, United States; (2) University of Missouri-Kansas City School of Medicine, Pediatrics, Kansas City, United States

Introduction: Socioeconomic status (SES) influences diabetes-related outcomes, and SES-related features may improve the performance of predictive models for those outcomes. The steps required to obtain these data can be time-consuming and expensive, and can expose Protected Health Information to outside services.

Objectives: We sought to implement a method to automatically geocode patient address data entirely within a clinic-hosted diabetes data integration system (D-Data Dock), in order to support SES- and other geolocation-derived feature engineering.

Methods: DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies) is standalone software capable of geocoding location data without exposing data to third-party services. We use Azure-API to connect to a local Docker image of DeGAUSS, which outputs a range of geocoding variables (latitude, longitude, and FIPS [i.e., county] codes) along with their precision. By combining these outputs with local copies of Census variables across counties within the Midwest USA, we have generated, and continuously update, a multi-variable SES-related dataset for our clinic.
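As a rough illustration of the final step described in the Methods, the following Python sketch joins geocoded records to a local table of county-level Census variables by FIPS code. It is a minimal sketch under stated assumptions, not the D-Data Dock implementation: the record fields, the function name attach_census_features, the feature name median_household_income, and all data values are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeocodedAddress:
    """One geocoded record (hypothetical field names, not a DeGAUSS schema)."""
    patient_id: str
    lat: float
    lon: float
    county_fips: str  # 5-digit state + county FIPS code


def attach_census_features(records, census_by_fips):
    """Join each geocoded record to county-level Census variables by FIPS code.

    Records whose county has no row in the local Census table are skipped.
    """
    enriched = []
    for rec in records:
        features = census_by_fips.get(rec.county_fips)
        if features is None:
            continue  # no local Census row for this county
        enriched.append(
            {"patient_id": rec.patient_id, "county_fips": rec.county_fips, **features}
        )
    return enriched


# Invented example values; they are not real patient or Census data.
census_by_fips = {
    "29095": {"median_household_income": 50000},
    "20091": {"median_household_income": 80000},
}
records = [
    GeocodedAddress("p1", 39.08, -94.58, "29095"),
    GeocodedAddress("p2", 38.88, -94.82, "20091"),
    GeocodedAddress("p3", 0.0, 0.0, "99999"),  # county absent from the local table
]
enriched = attach_census_features(records, census_by_fips)
```

Because both the geocoded records and the Census table stay in local memory, a join of this kind sends no address data to an outside service, which is the property the abstract emphasizes.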

Results: Of the 19,763 addresses for our current and historic pediatric diabetes population, all but 1,297 successfully returned geocoded results with the desired precision of 0.5 or greater. Those that failed either had an erroneous or invalid address (e.g., a PO Box) or returned a low-precision match. The results allowed mapping to Census tract-derived features, including the American Community Survey-derived community deprivation index.
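The precision cutoff and the resulting success rate can be sketched in a few lines of Python. This is an illustrative sketch only: the row layout, the key name "precision", and the helper names split_by_precision and success_rate are assumptions rather than the authors' code, and the three example rows are invented. The counts passed to success_rate are the ones reported above.

```python
def split_by_precision(rows, threshold=0.5):
    """Split geocoded rows into accepted and rejected lists.

    A row is accepted when its precision value is present and at least
    `threshold`; rows with a missing value (e.g., an address that could not
    be geocoded at all) are rejected.
    """
    accepted, rejected = [], []
    for row in rows:
        value = row.get("precision")
        if value is not None and value >= threshold:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


def success_rate(total, failed):
    """Fraction of addresses that geocoded successfully."""
    return (total - failed) / total


# Invented rows for illustration.
rows = [
    {"id": "a", "precision": 0.9},
    {"id": "b", "precision": 0.3},   # low precision
    {"id": "c", "precision": None},  # no geocode returned
]
accepted, rejected = split_by_precision(rows)

# Counts reported in the Results: 19,763 addresses, 1,297 failures.
rate = success_rate(19_763, 1_297)
print(f"{rate:.1%}")  # prints 93.4%
```

With the reported counts, 18,466 of 19,763 addresses met the threshold, or roughly 93.4%.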

Conclusions: The deployment of DeGAUSS in a clinical diabetes data integration system allows geocoding without the need for external, premium services like Google Maps and ArcGIS. We are seeking to expand the current functionality with new images providing data for census block, weather, food desert, and traffic features. Whether such features can enhance the performance of predictive models for near-term diabetes outcomes (e.g., worsening glycemic control or hospitalization for diabetic ketoacidosis) remains to be determined.

Link: https://onlinelibrary.wiley.com/doi/full/10.1111/pedi.13399