Start by loading the packages needed for this vignette.

```r
loadedPackages <- c('dplyr', 'ggplot2', 'ndi', 'sf', 'tidycensus', 'tigris')
invisible(lapply(loadedPackages, library, character.only = TRUE))
# Cache the shapefiles downloaded by tigris so later calls can reuse them
options(tigris_use_cache = TRUE)
```
Set your U.S. Census Bureau access key (you can request one from the U.S. Census Bureau). You can supply it in two ways. One is the `key` argument of the `get_acs()` function from the tidycensus package, which each function below calls internally. The other is the `census_api_key()` function from the tidycensus package, run before the functions.
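The snippet below sketches both options. `my_census_key` is a hypothetical placeholder for your own key. It relies on `tidycensus::census_api_key()` storing the key in the `CENSUS_API_KEY` environment variable for the current session, and falls back to setting that variable directly if tidycensus is not installed. The commented-out call assumes, as stated above, that an ndi function forwards `key` to `get_acs()`.

```r
# Hypothetical placeholder: substitute the key issued to you by the U.S. Census Bureau
my_census_key <- 'YOUR_CENSUS_API_KEY'

# Option 1: register the key for the current R session.
# census_api_key() (without install = TRUE) stores it in the CENSUS_API_KEY
# environment variable; set that variable directly if tidycensus is unavailable.
if (requireNamespace('tidycensus', quietly = TRUE)) {
  tidycensus::census_api_key(my_census_key)
} else {
  Sys.setenv(CENSUS_API_KEY = my_census_key)
}

# Option 2 (not run): pass the key to each ndi call, which forwards it to get_acs()
# atkinson(geo_large = 'county', geo_small = 'block group', state = 'KY',
#          year = 2021, subgroup = 'MedHHInc', key = my_census_key)
```

Calling `census_api_key(my_census_key, install = TRUE)` instead writes the key to your `.Renviron` file so it persists across sessions.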
Since version v0.1.1, the ndi package can use data from the ACS to compute additional indices of socioeconomic disparity, including:

- the `atkinson()` function, which also computes the Atkinson Index (A) of income based on Atkinson (1970)
- the `bravo()` function, which computes the Educational Isolation Index (EI) based on Bravo et al. (2021)
- the `gini()` function, which also retrieves the Gini Index (G) of income inequality based on Gini (1921)
- the `krieger()` function, which computes the Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) and Krieger et al. (2016)

Compute the income A values (2017-2021 5-year ACS) for census block groups within counties of Kentucky. This metric is based on Atkinson (1970), which assessed the distribution of income within 12 countries. To compare median household income, specify `subgroup = 'MedHHInc'`. This setting uses the ACS variable 'B19013_001' and the Hölder mean in the computation. A measures inequality when comparing smaller geographical units to the larger ones within which they are located. A can range in value from 0 to 1, and smaller values of the index indicate lower levels of income inequality.
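To make the Hölder-mean computation concrete, here is a minimal sketch of the textbook Atkinson (1970) formula. It computes A = 1 - M(y) / mean(y), where M(y) = (mean(y^(1 - epsilon)))^(1 / (1 - epsilon)) is the Hölder (power) mean and the geometric mean is used when epsilon = 1. This sketch assumes the text's description is the whole story. It is not ndi's internal code. `atkinson_income()` and the block-group incomes in `bg` are hypothetical.

```r
# Atkinson index of income for positive incomes y (e.g., block-group median
# household incomes within one county) with inequality aversion epsilon >= 0
atkinson_income <- function(y, epsilon = 0.5) {
  y <- y[!is.na(y)]                        # drop missing incomes
  stopifnot(length(y) > 0, all(y > 0), epsilon >= 0)
  if (abs(epsilon - 1) < 1e-12) {
    ede <- exp(mean(log(y)))               # geometric mean when epsilon = 1
  } else {
    p <- 1 - epsilon
    ede <- mean(y^p)^(1 / p)               # power (Holder) mean of order 1 - epsilon
  }
  1 - ede / mean(y)                        # 1 - (equally distributed equivalent / mean)
}

# Hypothetical block-group median household incomes in two counties
bg <- data.frame(
  county     = c('A', 'A', 'A', 'B', 'B', 'B'),
  med_hh_inc = c(30000, 45000, 60000, 50000, 51000, 52000)
)

# One A value per county, in the spirit of geo_large = 'county'
a_by_county <- tapply(bg$med_hh_inc, bg$county, atkinson_income, epsilon = 0.33)
a_by_county
```

County A, whose block groups differ more, gets the larger value. With epsilon = 0 the power mean equals the arithmetic mean, so A is 0 for any income distribution.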
A is sensitive to the choice of the `epsilon` argument, the shape parameter that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The `epsilon` argument must have a value between 0 and 1.0. For `0 <= epsilon < 0.5` (less 'inequality-averse'), smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For `0.5 < epsilon <= 1.0` (more 'inequality-averse'), smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If `epsilon = 0.5` (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) for one method to select `epsilon`. We choose `epsilon = 0.33` in the example below:
```r
# Compute A of median household income from block groups within Kentucky counties
atkinson2021KY <- atkinson(
  geo_large = 'county',
  geo_small = 'block group',
  state = 'KY',
  year = 2021,
  subgroup = 'MedHHInc',
  epsilon = 0.33
)

# Obtain the 2021 counties from the 'tigris' package
county2021KY <- counties(state = 'KY', year = 2021, cb = TRUE)

# Join the A values to the county geometries
KY2021atkinson <- county2021KY %>%
  left_join(atkinson2021KY$a, by = 'GEOID')

# Visualize the A values (2017-2021 5-year ACS) for census block groups within counties of Kentucky
# (ggplot2 >= 3.4.0 sets polygon border widths with 'linewidth' rather than 'size')
ggplot() +
  geom_sf(
    data = KY2021atkinson,
    aes(fill = A),
    linewidth = 0.05,
    color = 'white'
  ) +
  geom_sf(
    data = county2021KY,
    fill = 'transparent',
    color = 'white',
    linewidth = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c() +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
  ggtitle(
    'Atkinson Index (Atkinson)\nCensus block groups within counties of Kentucky',
    subtitle = expression(paste('Median Household Income (', epsilon, ' = 0.33)'))
  )
```
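The discussion of `epsilon` above is phrased in terms of subgroup proportions. The sketch below shows a subgroup-proportion form of the Atkinson index that is widely used in residential segregation research, with the shape parameter `b` playing the role of `epsilon`. It assumes this is the form that discussion refers to; it is not ndi's code. `atkinson_subgroup()`, `p_i`, and `t_i` are hypothetical names for the subgroup proportions and population totals of the smaller units within one larger unit.

```r
# Subgroup-proportion Atkinson index within one larger unit, shape parameter 0 < b < 1
atkinson_subgroup <- function(p_i, t_i, b = 0.5) {
  stopifnot(length(p_i) == length(t_i), b > 0, b < 1)
  T_total <- sum(t_i)
  P <- sum(p_i * t_i) / T_total                  # subgroup proportion of the larger unit
  inner <- sum((1 - p_i)^(1 - b) * p_i^b * t_i) / (P * T_total)
  1 - (P / (1 - P)) * abs(inner)^(1 / (1 - b))
}

p_i <- c(0.10, 0.50, 0.90)    # hypothetical subgroup proportions of three smaller units
t_i <- c(100, 200, 300)       # hypothetical population totals of those units

# The same units give different A values as the shape parameter changes
sapply(c(0.1, 0.5, 0.9), function(b) atkinson_subgroup(p_i, t_i, b))

# With b = 0.5 the index is unchanged when the subgroup and its complement are swapped
atkinson_subgroup(p_i, t_i, 0.5) - atkinson_subgroup(1 - p_i, t_i, 0.5)
```

A is 0 when every smaller unit has the same subgroup proportion as the larger unit. It is 1 when each smaller unit is entirely inside or entirely outside the subgroup.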
Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for census tracts of Oklahoma. This metric is based on Bravo et al. (2021), which assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the `bravo()` function, including:
| ACS table source | Educational attainment category | Value for `subgroup` argument |
|---|---|---|
| B06009_002 | less than high school graduate | LtHS |
| B06009_003 | high school graduate (includes equivalency) | HSGiE |
| B06009_004 | some college or associate’s degree | SCoAD |
| B06009_005 | Bachelor’s degree | BD |
| B06009_006 | graduate or professional degree | GoPD |

Note: The 2005-2009 ACS-5 data use table 'B15002' instead.
A census geography (and its neighbors) in which nearly all of the population falls in the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to 1. In contrast, a census geography (and its neighbors) in which nearly none of the population falls in that category will have an EI (Bravo) value close to 0.
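As the paragraph above notes, EI pools a geography with its neighbors. The minimal sketch below assumes equal weights for a tract and each of its first-order (shared-border) neighbors. EI is then the subgroup count divided by the total population, both summed over that pooled area. The four tracts, their counts, and `ei_sketch()` are hypothetical. In practice the neighbor structure would come from the tract geometries rather than a hand-built matrix.

```r
# EI sketch: subgroup share of the pooled population of each tract and its neighbors
ei_sketch <- function(x, t_pop, adj) {
  w <- adj + diag(nrow(adj))             # weight 1 for the tract itself and each neighbor
  as.vector((w %*% x) / (w %*% t_pop))   # pooled subgroup count / pooled total
}

# Four hypothetical tracts in a row (1-2-3-4); adj[i, j] = 1 if i and j share a border
adj <- matrix(0, nrow = 4, ncol = 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

x     <- c(900, 800, 100, 50)            # people in the specified attainment category
t_pop <- c(1000, 1000, 1000, 1000)       # total population in the table's universe

ei_sketch(x, t_pop, adj)                 # 0.850 0.600 0.317 0.075 (rounded)
```

Tract 2 scores lower than its own share (0.8) because one of its neighbors, tract 3, has few people in the category.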
```r
# Compute EI for the population without a four-year college degree
bravo2010OK <- bravo(state = 'OK', year = 2010, subgroup = c('LtHS', 'HSGiE', 'SCoAD'))

# Obtain the 2010 census tracts from the 'tigris' package
tract2010OK <- tracts(state = 'OK', year = 2010, cb = TRUE)
# Create GEOID by removing the first 9 characters of GEO_ID (the summary-level
# prefix, e.g., '1400000US') for compatibility with the ACS data
tract2010OK$GEOID <- substring(tract2010OK$GEO_ID, 10)

# Obtain the 2010 counties from the 'tigris' package
county2010OK <- counties(state = 'OK', year = 2010, cb = TRUE)

# Join the EI values to the census tract geometries
OK2010bravo <- tract2010OK %>%
  left_join(bravo2010OK$ei, by = 'GEOID')

# Visualize the EI values (2006-2010 5-year ACS) for census tracts of Oklahoma
ggplot() +
  geom_sf(
    data = OK2010bravo,
    aes(fill = EI),
    linewidth = 0.05,
    color = 'transparent'
  ) +
  geom_sf(
    data = county2010OK,
    fill = 'transparent',
    color = 'white',
    linewidth = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Educational Isolation Index (Bravo)\nCensus tracts of Oklahoma',
    subtitle = 'Without a four-year college degree (not corrected for edge effects)'
  )
```
One source of edge effects can be corrected in the same manner as shown for the RI metric in vignette 2 (Racial or Ethnic Residential Segregation Indices).
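The tract `GEOID` step used in the Oklahoma code above, and in the Massachusetts and Michigan code below, keeps everything after the first 9 characters of `GEO_ID`. The quick check below uses a hypothetical identifier and assumes the 2010 cartographic boundary layout of the prefix '1400000US' followed by the 11-digit tract FIPS code.

```r
geo_id <- '1400000US40109107100'     # hypothetical 2010 tract GEO_ID
geoid  <- substring(geo_id, 10)       # drop the 9-character prefix '1400000US'
geoid                                  # "40109107100"

# A pattern-based alternative that does not depend on the prefix length
sub('^.*US', '', geo_id)
```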
Retrieve the income Gini Index (G) values (2006-2010 5-year ACS) for census tracts within counties of Massachusetts. This metric is based on Gini (1921). The `gini()` function calculates G for racial or ethnic inequality, and in the same call it retrieves the income G estimates from the ACS-5 (the `G_inc` values mapped below).
According to the U.S. Census Bureau: ‘The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini Index is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.’
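Because `gini()` retrieves the published ACS estimate, the sketch below only illustrates the definition quoted above. It does not show how the Census Bureau or ndi produce the values. It computes G two ways on hypothetical incomes. The first uses the mean absolute difference between all pairs. The second uses the Lorenz curve, as 1 minus twice the area under it, with the area from the trapezoid rule. `gini_pairs()` and `gini_lorenz()` are hypothetical helpers.

```r
# G from the mean absolute difference between all pairs of incomes
gini_pairs <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum(abs(outer(x, x, '-'))) / (2 * n^2 * mean(x))
}

# G from the Lorenz curve: 1 - 2 * (area under the curve), trapezoid rule
gini_lorenz <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  cum_pop <- c(0, seq_len(n) / n)         # cumulative share of recipients
  cum_inc <- c(0, cumsum(x) / sum(x))      # cumulative share of income (Lorenz curve)
  area <- sum(diff(cum_pop) * (head(cum_inc, -1) + tail(cum_inc, -1)) / 2)
  1 - 2 * area
}

incomes <- c(10, 20, 30, 40)              # hypothetical incomes
gini_pairs(incomes)                        # 0.25
gini_lorenz(incomes)                       # 0.25
```

With only n recipients, giving all income to one of them yields (n - 1) / n. That value approaches the quoted upper bound of 1 as n grows.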
```r
# Compute G for racial or ethnic inequality; the income G (G_inc) is retrieved as well
gini2010MA <- gini(
  geo_large = 'county',
  geo_small = 'tract',
  state = 'MA',
  year = 2010,
  subgroup = c('NHoLB', 'HoLB')
)

# Obtain the 2010 census tracts from the 'tigris' package
tract2010MA <- tracts(state = 'MA', year = 2010, cb = TRUE)
# Create GEOID by removing the first 9 characters of GEO_ID (the summary-level
# prefix, e.g., '1400000US') for compatibility with the ACS data
tract2010MA$GEOID <- substring(tract2010MA$GEO_ID, 10)

# Obtain the 2010 counties from the 'tigris' package
county2010MA <- counties(state = 'MA', year = 2010, cb = TRUE)

# Join the G values to the census tract geometries
MA2010gini <- tract2010MA %>%
  left_join(gini2010MA$g_data, by = 'GEOID')

# Visualize the G values (2006-2010 5-year ACS) for census tracts within counties of Massachusetts
ggplot() +
  geom_sf(
    data = MA2010gini,
    aes(fill = G_inc),
    linewidth = 0.05,
    color = 'transparent'
  ) +
  geom_sf(
    data = county2010MA,
    fill = 'transparent',
    color = 'white',
    linewidth = 0.2
  ) +
  theme_minimal() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2006-2010 estimates'
  ) +
  ggtitle(
    'Gini Index (Gini)\nCensus tracts within counties of Massachusetts',
    subtitle = 'Median Household Income'
  )
```
Compute the Index of Concentration at the Extremes (ICE) values (2006-2010 5-year ACS) for census tracts within Wayne County, Michigan. Wayne County is home to Detroit, Michigan, a highly segregated city in the U.S. This metric is based on Feldman et al. (2015) and Krieger et al. (2016). Those authors expanded a metric that Massey initially designed for residential segregation, described in a chapter of Booth & Crouter (2001). The `krieger()` function computes five ICE metrics using the following ACS groups:
| ACS table group | ICE metric | Comparison |
|---|---|---|
| B19001 | Income, 'ICE_inc' | 80th income percentile vs. 20th income percentile |
| B15002 | Education, 'ICE_edu' | less than high school vs. four-year college degree or more |
| B03002 | Race or Ethnicity, 'ICE_rewb' | white non-Hispanic vs. black non-Hispanic |
| B19001 & B19001B & B19001H | Income and race or ethnicity combined, 'ICE_wbinc' | white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile |
| B19001 & B19001H | Income and race or ethnicity combined, 'ICE_wpcinc' | white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile |
ICE metrics can range in value from −1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups.
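Each ICE metric has the form ICE = (A - P) / T. Here A is the number of people (or households) in the most privileged group, P is the number in the most deprived group, and T is the total population of the area. The sketch below uses hypothetical counts for four tracts to show the range described above, including both ways to reach 0. `ice_sketch()` is a hypothetical helper, not the `krieger()` implementation.

```r
# ICE for each area: (privileged - deprived) / total
ice_sketch <- function(privileged, deprived, total) {
  stopifnot(all(total > 0), all(privileged + deprived <= total))
  (privileged - deprived) / total
}

# Hypothetical counts for four census tracts
privileged <- c(500,   0,   0, 200)
deprived   <- c(  0, 500,   0, 200)
total      <- c(500, 500, 500, 500)

ice_sketch(privileged, deprived, total)    # 1 -1 0 0
```

The third tract has no one in either extreme group. The fourth has equal numbers in both. Each scores 0.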
```r
# Compute the five ICE metrics for census tracts in Wayne County, Michigan (2006-2010 ACS)
ice2020WC <- krieger(
  state = 'MI',
  county = 'Wayne',
  year = 2010
)

# Obtain the 2010 census tracts from the 'tigris' package
tract2010WC <- tracts(state = 'MI', county = 'Wayne', year = 2010, cb = TRUE)
# Create GEOID by removing the first 9 characters of GEO_ID (the summary-level
# prefix, e.g., '1400000US') for compatibility with the ACS data
tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10)

# Join the ICE values to the census tract geometries
ice2020WC <- tract2010WC %>%
  left_join(ice2020WC$ice, by = 'GEOID')
```
```r
# Plot ICE for Income
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_inc),
    color = 'white',
    linewidth = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome',
    subtitle = '80th income percentile vs. 20th income percentile'
  )
```
```r
# Plot ICE for Education
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_edu),
    color = 'white',
    linewidth = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nEducation',
    subtitle = 'less than high school vs. four-year college degree or more'
  )
```
```r
# Plot ICE for Race or Ethnicity
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_rewb),
    color = 'white',
    linewidth = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nRace or Ethnicity',
    subtitle = 'white non-Hispanic vs. Black non-Hispanic'
  )
```
```r
# Plot ICE for Income and Race or Ethnicity Combined
## white non-Hispanic in 80th income percentile vs.
## black (including Hispanic) in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_wbinc),
    color = 'white',
    linewidth = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome & race or ethnicity combined',
    subtitle = 'white non-Hispanic in 80th inc pctl vs. black alone in 20th inc pctl'
  )
```
```r
# Plot ICE for Income and Race or Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020WC,
    aes(fill = ICE_wpcinc),
    color = 'white',
    linewidth = 0.05
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
  ggtitle(
    'Index of Concentration at the Extremes (Krieger)\nIncome & race or ethnicity combined',
    subtitle = 'white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl'
  )
```
```
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tigris_2.1 tidycensus_1.6.5 sf_1.0-16 ndi_0.1.6.9008
## [5] ggplot2_3.5.1 dplyr_1.1.4 knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 xfun_0.47 bslib_0.8.0 psych_2.4.6.26
## [5] lattice_0.22-6 tzdb_0.4.0 Cairo_1.6-2 vctrs_0.6.5
## [9] tools_4.4.1 generics_0.1.3 curl_5.2.2 parallel_4.4.1
## [13] tibble_3.2.1 proxy_0.4-27 fansi_1.0.6 highr_0.11
## [17] pkgconfig_2.0.3 Matrix_1.7-0 KernSmooth_2.23-24 uuid_1.2-1
## [21] lifecycle_1.0.4 farver_2.1.2 compiler_4.4.1 stringr_1.5.1
## [25] munsell_0.5.1 mnormt_2.1.1 carData_3.0-5 htmltools_0.5.8.1
## [29] class_7.3-22 sass_0.4.9 yaml_2.3.10 pillar_1.9.0
## [33] car_3.1-2 crayon_1.5.3 jquerylib_0.1.4 tidyr_1.3.1
## [37] MASS_7.3-61 classInt_0.4-10 cachem_1.1.0 abind_1.4-5
## [41] nlme_3.1-166 tidyselect_1.2.1 rvest_1.0.4 digest_0.6.36
## [45] stringi_1.8.4 purrr_1.0.2 labeling_0.4.3 fastmap_1.2.0
## [49] grid_4.4.1 colorspace_2.1-1 cli_3.6.3 magrittr_2.0.3
## [53] utf8_1.2.4 e1071_1.7-14 readr_2.1.5 withr_3.0.1
## [57] scales_1.3.0 rappdirs_0.3.3 rmarkdown_2.28 httr_1.4.7
## [61] hms_1.1.3 evaluate_0.24.0 viridisLite_0.4.2 rlang_1.1.4
## [65] Rcpp_1.0.13 glue_1.7.0 DBI_1.2.3 xml2_1.3.6
## [69] rstudioapi_0.16.0 jsonlite_1.8.8 R6_2.5.1 units_0.8-5
```