3. Additional indices of socioeconomic disparity

Ian D. Buller (GitHub: @idblr)

2024-08-30

Start with the necessary packages for the vignette.

loadedPackages <- c('dplyr', 'ggplot2', 'ndi', 'sf', 'tidycensus', 'tigris')
invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)

Set your U.S. Census Bureau access key, which you can request from the U.S. Census Bureau if you do not already have one. Supply the key either through the key argument of the get_acs() function from the tidycensus package, which each function below calls internally, or by running the census_api_key() function from the tidycensus package before running the functions.

census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. CENSUS API
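
As a minimal sketch of keeping the key out of the script itself, it can be read from an environment variable before it is passed on. CENSUS_API_KEY is the variable name that tidycensus looks for; the helper name get_census_key() is hypothetical and only used for illustration.

```r
# Hypothetical helper: return the Census API key stored in an environment
# variable, or stop with an informative message when it is missing
get_census_key <- function(var = 'CENSUS_API_KEY') {
  key <- Sys.getenv(var, unset = '')
  if (!nzchar(key)) {
    stop('No U.S. Census Bureau access key found in ', var, call. = FALSE)
  }
  key
}

# Usage (not run here because it requires a real key):
# census_api_key(get_census_key())
# atkinson(state = 'KY', year = 2021, subgroup = 'MedHHInc', key = get_census_key())
```

Failing early with a clear message avoids a less obvious error later, when the first request to the Census API is made.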

Additional indices of socioeconomic disparity

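Since version v0.1.1, the ndi package can use data from the ACS to compute additional indices of socioeconomic disparity, including the income Atkinson Index (A), the Educational Isolation Index (EI), the income Gini Index (G), and the Index of Concentration at the Extremes (ICE).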

Compute income Atkinson Index (A)

Compute the income A values (2017-2021 5-year ACS) for census block groups within counties of Kentucky. This metric is based on Atkinson (1970), who assessed the distribution of income within 12 countries. To compare median household income, specify subgroup = 'MedHHInc', which uses the ACS variable ‘B19013_001’ and the Hölder mean in the computation. A measures inequality when comparing smaller geographical units to the larger ones within which they are located. A can range in value from 0 to 1, and smaller values of the index indicate lower levels of income inequality.

A is sensitive to the choice of the epsilon argument, the shape parameter that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The epsilon argument must take a value between 0 and 1.0. For 0 <= epsilon < 0.5 (less ‘inequality-averse’), smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality (‘over-representation’). For 0.5 < epsilon <= 1.0 (more ‘inequality-averse’), smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality (‘under-representation’). If epsilon = 0.5 (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) for one method to select epsilon. We choose epsilon = 0.33 in the example below:

atkinson2021KY <- atkinson(
  geo_large = 'county',
  geo_small = 'block group',
  state = 'KY',
  year = 2021,
  subgroup = 'MedHHInc',
  epsilon = 0.33
)

# Obtain the 2021 counties from the 'tigris' package
county2021KY <- counties(state = 'KY', year = 2021, cb = TRUE)

# Join the A values to the county geometries
KY2021atkinson <- county2021KY %>% 
  left_join(atkinson2021KY$a, by = 'GEOID')
# Visualize the A values (2017-2021 5-year ACS) for census block groups within counties of Kentucky
ggplot() +
  geom_sf(
    data = KY2021atkinson,
    aes(fill = A),
    size = 0.05,
    color = 'white'
  ) +
  geom_sf(
    data = county2021KY,
    fill = 'transparent',
    color = 'white',
    size = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c() +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
  ggtitle(
    'Atkinson Index (Atkinson)\nCensus block groups within counties of Kentucky',
    subtitle = expression(paste('Median Household Income (', epsilon, ' = 0.33)'))
  )
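
To see how the choice of epsilon moves the index, the textbook income form of the Atkinson Index can be sketched in base R with a Hölder (power) mean. This is a minimal illustration on made-up incomes, not the internal code of atkinson(); the function names holder_mean() and atkinson_toy() and the income values are hypothetical.

```r
# Hölder (power) mean of x with exponent p; p = 0 gives the geometric mean
holder_mean <- function(x, p) {
  if (p == 0) exp(mean(log(x))) else mean(x^p)^(1 / p)
}

# Textbook Atkinson Index:
# 1 - (Hölder mean with exponent 1 - epsilon) / (arithmetic mean)
atkinson_toy <- function(x, epsilon = 0.5) {
  stopifnot(all(x > 0), epsilon >= 0, epsilon <= 1)
  1 - holder_mean(x, 1 - epsilon) / mean(x)
}

# Hypothetical median household incomes for five small areas in one larger area
income <- c(25000, 40000, 52000, 61000, 120000)

# The index grows as epsilon grows for the same incomes
sapply(c(0.33, 0.5, 0.67), function(e) atkinson_toy(income, e))

# Identical incomes give an index of (numerically) zero
atkinson_toy(rep(50000, 5), 0.33)
```

Because a power mean with a smaller exponent gives more weight to low values, larger epsilon values yield a larger A for the same set of incomes.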

Compute Educational Isolation Index (EI)

Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for census tracts of Oklahoma. This metric is based on Bravo et al. (2021), who assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the bravo() function, including:

ACS table source | educational attainment category | character for subgroup argument
--- | --- | ---
B06009_002 | less than high school graduate | LtHS
B06009_003 | high school graduate (includes equivalency) | HSGiE
B06009_004 | some college or associate’s degree | SCoAD
B06009_005 | Bachelor’s degree | BD
B06009_006 | graduate or professional degree | GoPD

Note: The ACS-5 data (2005-2009) uses the ‘B15002’ question.

A census geography (and its neighbors) that has nearly all of its population in the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to 1. In contrast, a census geography (and its neighbors) that has nearly none of its population in the specified educational attainment category will have an EI (Bravo) value close to 0.
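
The behavior described above can be sketched on a toy example as the share of the category across each tract and its neighbors. This is only an illustration under simple assumptions: the adjacency matrix, the counts, and the equal weighting of a tract and its neighbors are hypothetical, and the weights used by bravo() may differ.

```r
# Hypothetical adjacency for four tracts in a row: 1-2-3-4 (1 = share a border)
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)
diag(adj) <- 1  # assumption: the index tract counts with the same weight as a neighbor

# Hypothetical counts: residents in the attainment category and total residents
in_category <- c(950, 900, 100, 50)
total <- c(1000, 1000, 1000, 1000)

# Share of the category across each tract and its neighbors
ei_toy <- as.vector(adj %*% in_category) / as.vector(adj %*% total)
round(ei_toy, 3)
# returns 0.925 0.650 0.350 0.075
```

Tracts 1 and 4 sit at the two ends of the range because both they and their only neighbor are dominated by, or nearly lack, the category.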

bravo2010OK <- bravo(state = 'OK', year = 2010, subgroup = c('LtHS', 'HSGiE', 'SCoAD'))

# Obtain the 2010 census tracts from the 'tigris' package
tract2010OK <- tracts(state = 'OK', year = 2010, cb = TRUE)
# Remove first 9 characters from GEO_ID to create a GEOID compatible with the ndi output
tract2010OK$GEOID <- substring(tract2010OK$GEO_ID, 10) 

# Obtain the 2010 counties from the 'tigris' package
county2010OK <- counties(state = 'OK', year = 2010, cb = TRUE)

# Join the EI  values to the census tract geometries
OK2010bravo <- tract2010OK %>%
  left_join(bravo2010OK$ei, by = 'GEOID')
# Visualize the EI values (2006-2010 5-year ACS) for census tracts of Oklahoma
ggplot() +
  geom_sf(
    data = OK2010bravo,
    aes(fill = EI),
    size = 0.05,
    color = 'transparent'
  ) +
  geom_sf(
    data = county2010OK,
    fill = 'transparent',
    color = 'white',
    size = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Educational Isolation Index (Bravo)\nCensus tracts of Oklahoma',
    subtitle = 'Without a four-year college degree (not corrected for edge effects)'
  )
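
The GEO_ID handling before the join above can be checked on a toy example. The sketch below uses base R only; the two identifiers and the EI values are hypothetical, and merge() with all.x = TRUE stands in for the left_join() used in the vignette.

```r
# Hypothetical 2010 cartographic boundary identifiers: a 9-character prefix
# ('1400000US') followed by the 11-digit census tract GEOID
shapes <- data.frame(
  GEO_ID = c('1400000US40001376600', '1400000US40001376700'),
  stringsAsFactors = FALSE
)
shapes$GEOID <- substring(shapes$GEO_ID, 10)

# Hypothetical index values keyed by the 11-digit GEOID
values <- data.frame(
  GEOID = c('40001376600', '40001376700'),
  EI = c(0.42, 0.58),
  stringsAsFactors = FALSE
)

# Base R equivalent of a left join on GEOID
joined <- merge(shapes, values, by = 'GEOID', all.x = TRUE)
joined$GEOID

# Number of geometries left without an index value after the join
sum(is.na(joined$EI))
```

A nonzero count in the last line would point to identifiers that do not match between the geometries and the index output.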

One source of edge effect can be corrected in the same manner as shown for the RI metric in the vignette ‘2. Racial or Ethnic Residential Segregation Indices’.

The income Gini Index (G)

Retrieve the income Gini Index (G) values (2006-2010 5-year ACS) for census tracts within counties of Massachusetts. This metric is based on Gini (1921). The gini() function retrieves the income estimate from the ACS-5 while calculating the Gini Index (G) for racial or ethnic inequality, which is why the call below also specifies a subgroup.

According to the U.S. Census Bureau: ‘The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini Index is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.’
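
The definition quoted above can be illustrated with a short base R sketch that computes a Gini coefficient from the Lorenz curve of a vector of incomes. This is not how the ACS estimate is produced (the U.S. Census Bureau computes it from detailed income data); the function name gini_toy() and the income values are hypothetical.

```r
# Gini coefficient from a vector of incomes via the Lorenz curve
gini_toy <- function(x) {
  stopifnot(all(x >= 0), sum(x) > 0)
  x <- sort(x)
  n <- length(x)
  lorenz <- c(0, cumsum(x) / sum(x))  # cumulative income shares
  # Area under the Lorenz curve by the trapezoid rule (equal population steps)
  area <- sum((lorenz[-1] + lorenz[-(n + 1)]) / 2) / n
  1 - 2 * area
}

# Perfect equality: the index is (numerically) zero
gini_toy(rep(30000, 10))

# One recipient holds all income: 0.9 for n = 10, approaching 1 as n grows
gini_toy(c(rep(0, 9), 100000))
```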

gini2010MA <- gini(
  geo_large = 'county',
  geo_small = 'tract',
  state = 'MA',
  year = 2010,
  subgroup = c('NHoLB', 'HoLB')
)

# Obtain the 2010 census tracts from the 'tigris' package
tract2010MA <- tracts(state = 'MA', year = 2010, cb = TRUE)
# Remove first 9 characters from GEO_ID to create a GEOID compatible with the ndi output
tract2010MA$GEOID <- substring(tract2010MA$GEO_ID, 10) 

# Obtain the 2010 counties from the 'tigris' package
county2010MA <- counties(state = 'MA', year = 2010, cb = TRUE)

# Join the G values to the census tract geometries
MA2010gini <- tract2010MA %>%
  left_join(gini2010MA$g_data, by = 'GEOID')
# Visualize the G values (2006-2010 5-year ACS) for census tracts within counties of Massachusetts
ggplot() +
  geom_sf(
    data = MA2010gini,
    aes(fill = G_inc),
    size = 0.05,
    color = 'transparent'
  ) +
  geom_sf(
    data = county2010MA,
    fill = 'transparent',
    color = 'white',
    size = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)', 
    caption = 'Source: U.S. Census ACS 2006-2010 estimates'
  ) +
  ggtitle(
    'Gini Index (Gini)\nCensus tracts within counties of Massachusetts', 
    subtitle = 'Median Household Income'
  )

Index of Concentration at the Extremes (ICE)

Compute the Index of Concentration at the Extremes values (2006-2010 5-year ACS) for census tracts within Wayne County, Michigan. Wayne County is the home of Detroit, Michigan, a highly segregated city in the U.S. This metric is based on Feldman et al. (2015) and Krieger et al. (2016) who expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) initially designed for residential segregation. The krieger() function computes five ICE metrics using the following ACS groups:

ACS table group | ICE metric | Comparison
--- | --- | ---
B19001 | Income, ‘ICE_inc’ | 80th income percentile vs. 20th income percentile
B15002 | Education, ‘ICE_edu’ | less than high school vs. four-year college degree or more
B03002 | Race or Ethnicity, ‘ICE_rewb’ | white non-Hispanic vs. black non-Hispanic
B19001 & B19001B & B19001H | Income and race or ethnicity combined, ‘ICE_wbinc’ | white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile
B19001 & B19001H | Income and race or ethnicity combined, ‘ICE_wpcinc’ | white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile

ICE metrics can range in value from −1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups.
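
The range and the two meanings of a zero value can be shown with a small base R sketch of the ICE formula, the difference between the counts in the most privileged and most deprived groups divided by the total population. The function name ice_toy() and the counts are hypothetical.

```r
# ICE = (count in privileged group - count in deprived group) / total population
ice_toy <- function(privileged, deprived, total) {
  stopifnot(all(privileged + deprived <= total), all(total > 0))
  (privileged - deprived) / total
}

# Hypothetical counts for five areas
areas <- data.frame(
  privileged = c(900,   0,  300,    0,  100),
  deprived   = c(  0, 900,  300,    0,  500),
  total      = c(900, 900, 1000, 1000, 1000)
)

with(areas, ice_toy(privileged, deprived, total))
# returns 1, -1, 0, 0, -0.4
```

The third and fourth areas both return 0, the former because the two groups are equal in size and the latter because no residents fall in either group.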

ice2020WC <- krieger(
  state = 'MI', 
  county = 'Wayne', 
  year = 2010
)

# Obtain the 2010 census tracts from the 'tigris' package
tract2010WC <- tracts(state = 'MI', county = 'Wayne', year = 2010, cb = TRUE)
# Remove first 9 characters from GEO_ID to create a GEOID compatible with the ndi output
tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10) 

# Join the ICE values to the census tract geometries
ice2020WC <- tract2010WC %>%
  left_join(ice2020WC$ice, by = 'GEOID')
# Plot ICE for Income
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_inc),
    color = 'white',
    size = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome',
    subtitle = '80th income percentile vs. 20th income percentile'
  )

# Plot ICE for Education
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_edu),
    color = 'white',
    size = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nEducation',
    subtitle = 'less than high school vs. four-year college degree or more'
  )

# Plot ICE for Race or Ethnicity
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_rewb),
    color = 'white',
    size = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nRace or Ethnicity',
    subtitle = 'white non-Hispanic vs. Black non-Hispanic'
  )

# Plot ICE for Income and Race or Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. 
## black (including Hispanic) in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_wbinc),
    color = 'white',
    size = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome & race or ethnicity combined',
    subtitle = 'white non-Hispanic in 80th inc pctl vs. black alone in 20th inc pctl'
  )

# Plot ICE for Income and Race or Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_wpcinc),
    color = 'white',
    size = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome & race or ethnicity combined',
    subtitle = 'white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl'
  )

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tigris_2.1       tidycensus_1.6.5 sf_1.0-16        ndi_0.1.6.9008  
## [5] ggplot2_3.5.1    dplyr_1.1.4      knitr_1.48      
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5       xfun_0.47          bslib_0.8.0        psych_2.4.6.26    
##  [5] lattice_0.22-6     tzdb_0.4.0         Cairo_1.6-2        vctrs_0.6.5       
##  [9] tools_4.4.1        generics_0.1.3     curl_5.2.2         parallel_4.4.1    
## [13] tibble_3.2.1       proxy_0.4-27       fansi_1.0.6        highr_0.11        
## [17] pkgconfig_2.0.3    Matrix_1.7-0       KernSmooth_2.23-24 uuid_1.2-1        
## [21] lifecycle_1.0.4    farver_2.1.2       compiler_4.4.1     stringr_1.5.1     
## [25] munsell_0.5.1      mnormt_2.1.1       carData_3.0-5      htmltools_0.5.8.1 
## [29] class_7.3-22       sass_0.4.9         yaml_2.3.10        pillar_1.9.0      
## [33] car_3.1-2          crayon_1.5.3       jquerylib_0.1.4    tidyr_1.3.1       
## [37] MASS_7.3-61        classInt_0.4-10    cachem_1.1.0       abind_1.4-5       
## [41] nlme_3.1-166       tidyselect_1.2.1   rvest_1.0.4        digest_0.6.36     
## [45] stringi_1.8.4      purrr_1.0.2        labeling_0.4.3     fastmap_1.2.0     
## [49] grid_4.4.1         colorspace_2.1-1   cli_3.6.3          magrittr_2.0.3    
## [53] utf8_1.2.4         e1071_1.7-14       readr_2.1.5        withr_3.0.1       
## [57] scales_1.3.0       rappdirs_0.3.3     rmarkdown_2.28     httr_1.4.7        
## [61] hms_1.1.3          evaluate_0.24.0    viridisLite_0.4.2  rlang_1.1.4       
## [65] Rcpp_1.0.13        glue_1.7.0         DBI_1.2.3          xml2_1.3.6        
## [69] rstudioapi_0.16.0  jsonlite_1.8.8     R6_2.5.1           units_0.8-5