Geocoded data

Geocoding is the process of converting addresses to latitude/longitude coordinates. Geocoding individual patient address records allows researchers to examine how social, physical, and built environment factors influence health outcomes. Patient addresses are the most granular place-based information available and, once geocoded, can be linked to other publicly available data. Geocoding can also allow healthcare workers to map where services are needed and conduct targeted public health outreach within patients’ neighborhoods.


What is available for UCSF researchers to use? 

The Population Health Data Initiative has geocoded residential address data for UCSF Health and San Francisco Health Network (SFHN) patients, including some historical addresses and the most current address as of August 14, 2023.

The geocoded data includes latitude/longitude coordinates for patient addresses as well as both 2010 and 2020 census tract IDs and block group IDs. Therefore, you can use it to link in any place-based data using census tract ID or block group ID. 
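
As a minimal sketch of how these IDs relate to each other, the example below assumes the tract and block group IDs follow the standard Census GEOID layout: an 11-digit tract ID (2-digit state, 3-digit county, 6-digit tract) and a 12-digit block group ID (the tract ID plus one block group digit). The function names and the sample ID are hypothetical; the data dictionaries remain the authority on the actual variable formats. The zero-padding step guards against a common problem where IDs read as numbers lose their leading zero (California's state code is 06).

```python
def normalize_geoid(value, width):
    """Return a GEOID as a zero-padded string of the given width."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > width:
        raise ValueError(f"not a valid {width}-digit GEOID: {value!r}")
    return text.zfill(width)


def tract_from_block_group(block_group_geoid):
    """Derive the 11-digit tract GEOID from a 12-digit block group GEOID."""
    return normalize_geoid(block_group_geoid, 12)[:11]


def split_tract_geoid(tract_geoid):
    """Split an 11-digit tract GEOID into state, county, and tract codes."""
    tract = normalize_geoid(tract_geoid, 11)
    return {"state": tract[:2], "county": tract[2:5], "tract": tract[5:]}


# Illustrative ID only (state 06 = California, county 075 = San Francisco).
example_block_group = "060750101001"
example_tract = tract_from_block_group(example_block_group)
example_parts = split_tract_geoid(example_tract)
```

Keeping the IDs as strings throughout, rather than integers, avoids having to repair them before linking.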

How do I access the data? 

Because patient address data is protected health information (PHI), researchers must submit an IRB application to access geocoded data. If requesting data from SFHN, you must also submit the SFDPH Research Protocol Form and Statement of Work.

See more about requirements for requesting EHR data here. 

Once the IRB application is approved, researchers can request a consultation and data extraction from UCSF Academic Research Services (ARS). ARS analysts can provide census tract identifiers, which are sufficient for linking most neighborhood-level characteristics. If you would like more granular geocoded data, such as latitude/longitude or block group identifiers, please request a consultation with the PHDI team prior to your request for ARS data extraction.

ARS will provide the data via the secure UCSF Research Analysis Environment (RAE). 


What are neighborhood-level characteristics?

Individual-level characteristics relate to individuals, such as a person’s health status, education level, or income. Neighborhood-level characteristics, on the other hand, relate to a defined geographic area. For example, a measure of neighborhood education may include the proportion of individuals in a census tract who have completed high school. 
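
To make the distinction concrete, the sketch below turns individual-level records into a neighborhood-level measure: the share of people in each tract who completed high school. The record layout, field names, and values are hypothetical and purely illustrative. In practice, measures like this are usually taken from published census estimates rather than computed from patient records.

```python
from collections import defaultdict


def share_completed_high_school(people):
    """Aggregate individual records into a per-tract proportion."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for person in people:
        totals[person["tract"]] += 1
        if person["completed_high_school"]:
            completed[person["tract"]] += 1
    return {tract: completed[tract] / totals[tract] for tract in totals}


# Hypothetical individual-level records (one row per person).
people = [
    {"tract": "A", "completed_high_school": True},
    {"tract": "A", "completed_high_school": True},
    {"tract": "A", "completed_high_school": False},
    {"tract": "A", "completed_high_school": True},
    {"tract": "B", "completed_high_school": True},
]

# One value per tract: a neighborhood-level characteristic.
tract_education = share_completed_high_school(people)
```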

What are census tracts?

Census tracts are geographic units defined by the US Census Bureau that are used to approximate neighborhoods. Census tracts tend to be geographically smaller in places with dense populations, as the Census Bureau attempts to draw each tract around an area where roughly 4,000 people live. Census tracts do not cross county or state boundaries, but may span city or municipal boundaries.

What is neighborhood socioeconomic status (nSES)?

Neighborhood socioeconomic status (nSES) is a composite measure of seven indicator variables created by principal component analysis. The indicators are education, blue-collar job, unemployment, household income, poverty, rent, and house value. Quintiles are based on the state distribution, with quintile 1 being the lowest SES and quintile 5 the highest.
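
The quintile step can be sketched as follows. This is an illustration under stated assumptions, not the method used to build the published nSES measure: it skips the principal component analysis entirely and starts from a hypothetical composite score per tract, and both the cut-point method and the handling of ties are assumptions. The key idea is that cut points come from all tracts in the state, so a tract's quintile reflects its position in the statewide distribution.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quintiles(scores):
    """Map each tract's score to a quintile from 1 (lowest) to 5 (highest).

    `scores` maps a tract ID to a composite score for every tract in the
    reference distribution (here assumed to be the whole state).
    """
    # Four cut points split the distribution into five groups.
    cuts = quantiles(scores.values(), n=5, method="inclusive")
    return {tract: bisect_left(cuts, score) + 1 for tract, score in scores.items()}


# Hypothetical composite scores for ten tracts (higher = higher SES).
scores = {f"tract_{i:02d}": float(i) for i in range(1, 11)}
nses_quintile = assign_quintiles(scores)
```

With these ten evenly spaced scores, each quintile receives two tracts.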

What is the difference between 2010 and 2020 Census shapes?

Every ten years the US Census Bureau adjusts the boundaries of census geographies to account for significant population changes. Based on population count, census tracts may be split or merged, or new ones may be created, to keep each tract at approximately 4,000 people. New census-defined boundaries were introduced for US Census data released in 2020.

How do I merge the data with Health Atlas variables?

To link geocoded patient data with public place-based data, researchers can download neighborhood data from Health Atlas and merge the two datasets on census tract code (labeled ‘geoid’ in Health Atlas). 
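
A minimal sketch of this merge, using only the Python standard library, is shown below. The patient field names (`patient_id`, `tract_2020`), the neighborhood variable (`nses_quintile`), and all values are hypothetical; only `geoid` comes from the description above. The sketch assumes 11-digit tract IDs and assumes the neighborhood file uses the same census vintage (2010 or 2020) as the patient tract column chosen for the join. It keeps every patient row and reports the patients whose tract had no match, so gaps are visible rather than silently dropped.

```python
def merge_on_tract(patients, neighborhood, tract_field="tract_2020"):
    """Left-join neighborhood attributes onto patient rows by census tract.

    Returns the merged rows and the IDs of patients with no matching tract.
    """
    # Index neighborhood rows by zero-padded tract ID for fast lookup.
    by_geoid = {str(row["geoid"]).zfill(11): row for row in neighborhood}
    merged, unmatched = [], []
    for patient in patients:
        key = str(patient[tract_field]).zfill(11)
        match = by_geoid.get(key)
        if match is None:
            unmatched.append(patient["patient_id"])
            match = {}
        extra = {k: v for k, v in match.items() if k != "geoid"}
        merged.append({**patient, **extra})
    return merged, unmatched


# Hypothetical inputs: two patients and one neighborhood row.
patients = [
    {"patient_id": "P1", "tract_2020": "06075010101"},
    {"patient_id": "P2", "tract_2020": "06075999999"},
]
neighborhood = [{"geoid": "06075010101", "nses_quintile": 4}]

merged, unmatched = merge_on_tract(patients, neighborhood)
```

For larger files, the same left join can be done with pandas using `DataFrame.merge` with `how="left"` on the tract column.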


Geocoded data documentation

Information about the Population Health Data Initiative geocoded data can be found here.  This folder includes: 

  1. PHDI Documentation Summary: summary of available geocoded data and resources

  2. PHDI Background: background, utility and example use cases of geocoded data

  3. Geocoding Process: description of process for geocoding patient addresses

  4. Data Dictionaries: description of variables in geocoded datasets

Helpful references

Liu E, Rubinsky AD, Pacca L, Mujahid M, Fontil V, DeRouen MC, Fields J, Bibbins-Domingo K, and Lyles C. Examining Neighborhood Socioeconomic Status as a Mediator of Racial/Ethnic Disparities in Hypertension Control Across Two San Francisco Health Systems. Circulation: Cardiovascular Quality and Outcomes. 2022 Jan 31. doi: 10.1161/CIRCOUTCOMES.121.008256 

Nouri S, Lyles CR, Rubinsky AD, Patel K, Desai R, Fields J, DeRouen MC, Volow A, Bibbins-Domingo K, Sudore RL. Evaluation of Neighborhood Socioeconomic Characteristics and Advance Care Planning Among Older Adults. JAMA Netw Open. 2020 Dec 1;3(12):e2029063. doi: 10.1001/jamanetworkopen.2020.29063

Nguyen TT, Nguyen QC, Rubinsky AD, Tasdizen T, Deligani AHN, Dwivedi P, Whitaker R, Fields JD, DeRouen MC, Mane H, Lyles CR, Brunisholz KD, Bibbins-Domingo K. Google Street View-Derived Neighborhood Characteristics in California Associated with Coronary Heart Disease, Hypertension, Diabetes. Int J Environ Res Public Health. 2021 Oct 3;18(19):10428. doi: 10.3390/ijerph181910428

Diez Roux AV. Estimating neighborhood health effects: the challenges of causal inference in a complex world. Soc Sci Med. 2004 May;58(10):1953-60. doi: 10.1016/S0277-9536(03)00414-3

Hatef E, Ma X, Rouhizadeh M, Singh G, Weiner JP, Kharrazi H. Assessing the Impact of Social Needs and Social Determinants of Health on Health Care Utilization: Using Patient- and Community-Level Data. Popul Health Manag. 2021 Apr;24(2):222-230. doi: 10.1089/pop.2020.0043