Start with the necessary packages for the vignette.
loadedPackages <- c('dplyr', 'ggplot2', 'ndi', 'sf', 'tidycensus', 'tigris')
invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)
Set your U.S. Census Bureau access key. Follow this link to
obtain one. Specify your access key in the messer()
or
powell_wiley()
functions using the key
argument of the get_acs()
function from the tidycensus
package called within each or by using the census_api_key()
function from the tidycensus
package before running the messer()
or
powell_wiley()
functions (see an example of the latter
below).
Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) variables:
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
OCC | Occupation | C24030 | Percent males in management, science, and arts occupation |
CWD | Housing | B25014 | Percent of crowded housing |
POV | Poverty | B17017 | Percent of households in poverty |
FHH | Poverty | B25115 | Percent of female headed households with dependents |
PUB | Poverty | B19058 | Percent of households on public assistance |
U30 | Poverty | B19001 | Percent households earning <$30,000 per year |
EDU | Education | B06009 | Percent earning less than a high school education |
EMP | Employment | B23001 (2010 only); B23025 (2011 onward) | Percent unemployed |
One output from the messer()
function is a tibble
containing the identification, geographic name, NDI (Messer)
values, and raw census characteristics for each tract.
## # A tibble: 1,969 × 14
## GEOID state county tract NDI NDIQuart OCC CWD POV FHH PUB U30
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1300… Geor… Appli… 9501 -0.0075 2-Below… 0 0 0.1 0.1 0.1 0.3
## 2 1300… Geor… Appli… 9502 0.0458 4-Most … 0 0 0.3 0.1 0.2 0.5
## 3 1300… Geor… Appli… 9503 0.0269 3-Above… 0 0 0.2 0 0.2 0.4
## 4 1300… Geor… Appli… 9504 -0.0083 2-Below… 0 0 0.1 0 0.1 0.3
## 5 1300… Geor… Appli… 9505 0.0231 3-Above… 0 0 0.2 0 0.2 0.4
## 6 1300… Geor… Atkin… 9601 0.0619 4-Most … 0 0.1 0.2 0.2 0.2 0.5
## 7 1300… Geor… Atkin… 9602 0.0593 4-Most … 0 0.1 0.3 0.1 0.2 0.4
## 8 1300… Geor… Atkin… 9603 0.0252 3-Above… 0 0 0.3 0.1 0.2 0.4
## 9 1300… Geor… Bacon… 9701 0.0061 3-Above… 0 0 0.2 0 0.2 0.4
## 10 1300… Geor… Bacon… 9702… 0.0121 3-Above… 0 0 0.2 0.1 0.1 0.5
## # ℹ 1,959 more rows
## # ℹ 2 more variables: EDU <dbl>, EMP <dbl>
A second output from the messer()
function is the
results from the principal component analysis used to compute the
NDI (Messer) values.
## Principal Components Analysis
## Call: psych::principal(r = ndi_data_pca, nfactors = 1, n.obs = nrow(ndi_data_pca),
## covar = FALSE, scores = TRUE, missing = imp)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## OCC -0.59 0.35 0.65 1
## CWD 0.47 0.22 0.78 1
## POV 0.87 0.76 0.24 1
## FHH 0.67 0.45 0.55 1
## PUB 0.89 0.79 0.21 1
## U30 0.90 0.81 0.19 1
## EDU 0.79 0.62 0.38 1
## EMP 0.46 0.21 0.79 1
##
## PC1
## SS loadings 4.21
## Proportion Var 0.53
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.11
## with the empirical chi square 1221.09 with prob < 2.3e-246
##
## Fit based upon off diagonal values = 0.95
A third output from the messer()
function is a tibble
containing a breakdown of the missingness of the census characteristics
used to compute the NDI (Messer) values.
## # A tibble: 8 × 4
## variable total n_missing percent_missing
## <chr> <int> <int> <chr>
## 1 CWD 1969 14 0.71 %
## 2 EDU 1969 13 0.66 %
## 3 EMP 1969 13 0.66 %
## 4 FHH 1969 14 0.71 %
## 5 OCC 1969 15 0.76 %
## 6 POV 1969 14 0.71 %
## 7 PUB 1969 14 0.71 %
## 8 U30 1969 14 0.71 %
We can visualize the NDI (Messer) values geographically by linking them to spatial information from the [tigris](tidycensus package and plotting with the [ggplot2](tidycensus package suite.
# Obtain the 2010 counties from the 'tigris' package
county2010GA <- counties(state = 'GA', year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10)
# Obtain the 2010 census tracts from the 'tigris' package
tract2010GA <- tracts(state = 'GA', year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10)
# Join the NDI (Messer) values to the census tract geometry
GA2010messer <- tract2010GA %>%
left_join(messer2010GA$ndi, by = 'GEOID')
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts
## Continuous Index
ggplot() +
geom_sf(
data = GA2010messer,
aes(fill = NDI),
size = 0.05,
color = 'transparent'
) +
geom_sf(
data = county2010GA,
fill = 'transparent',
color = 'white',
size = 0.2
) +
theme_minimal() +
scale_fill_viridis_c() +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Messer)',
subtitle = 'GA census tracts as the referent'
)
## Categorical Index
### Rename '9-NDI not avail' level as NA for plotting
GA2010messer$NDIQuartNA <-
factor(
replace(
as.character(GA2010messer$NDIQuart),
GA2010messer$NDIQuart == '9-NDI not avail',
NA
),
c(levels(GA2010messer$NDIQuart)[-5], NA)
)
ggplot() +
geom_sf(
data = GA2010messer,
aes(fill = NDIQuartNA),
size = 0.05,
color = 'transparent'
) +
geom_sf(
data = county2010GA,
fill = 'transparent',
color = 'white',
size = 0.2
) +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Messer) Quartiles',
subtitle = 'GA census tracts as the referent'
)
The results above are at the tract level. The NDI (Messer) values can also be calculated at the county level.
messer2010GA_county <- messer(geo = 'county', state = 'GA', year = 2010)
# Join the NDI (Messer) values to the county geometry
GA2010messer_county <- county2010GA %>%
left_join(messer2010GA_county$ndi, by = 'GEOID')
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
## Continuous Index
ggplot() +
geom_sf(
data = GA2010messer_county,
aes(fill = NDI),
size = 0.20,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_c() +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Messer)',
subtitle = 'GA counties as the referent'
)
## Categorical Index
### Rename '9-NDI not avail' level as NA for plotting
GA2010messer_county$NDIQuartNA <-
factor(
replace(
as.character(GA2010messer_county$NDIQuart),
GA2010messer_county$NDIQuart == '9-NDI not avail',
NA
),
c(levels(GA2010messer_county$NDIQuart)[-5], NA)
)
ggplot() +
geom_sf(
data = GA2010messer_county,
aes(fill = NDIQuartNA),
size = 0.20,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Messer) Quartiles',
subtitle = 'GA counties as the referent'
)
Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
MedHHInc | Wealth and income | B19013 | Median household income (dollars) |
PctRecvIDR | Wealth and income | B19054 | Percent of households receiving dividends, interest, or rental income |
PctPubAsst | Wealth and income | B19058 | Percent of households receiving public assistance |
MedHomeVal | Wealth and income | B25077 | Median home value (dollars) |
PctMgmtBusSciArt | Occupation | C24060 | Percent in a management, business, science, or arts occupation |
PctFemHeadKids | Occupation | B11005 | Percent of households that are female headed with any children under 18 years |
PctOwnerOcc | Housing conditions | DP04 | Percent of housing units that are owner occupied |
PctNoPhone | Housing conditions | DP04 | Percent of households without a telephone |
PctNComPlmb | Housing conditions | DP04 | Percent of households without complete plumbing facilities |
PctEducHSPlus | Education | S1501 | Percent with a high school degree or higher (population 25 years and over) |
PctEducBchPlus | Education | S1501 | Percent with a college degree or higher (population 25 years and over) |
PctFamBelowPov | Wealth and income | S1702 | Percent of families with incomes below the poverty level |
PctUnempl | Occupation | S2301 | Percent unemployed |
More information about the codebook and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.
powell_wiley2020DMVW <- powell_wiley(
state = c('DC', 'MD', 'VA', 'WV'),
year = 2020,
round_output = TRUE
)
One output from the powell_wiley()
function is a tibble
containing the identification, geographic name, NDI
(Powell-Wiley) values, and raw census characteristics for each
tract.
## # A tibble: 4,425 × 20
## GEOID state county tract NDI NDIQuint MedHHInc PctRecvIDR PctPubAsst
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 11001000101 Distr… Distr… 1.01 -2.13 1-Least… 187839 50.9 0.8
## 2 11001000102 Distr… Distr… 1.02 -2.46 1-Least… 184167 52.2 0.6
## 3 11001000201 Distr… Distr… 2.01 NA 9-NDI n… NA NaN NaN
## 4 11001000202 Distr… Distr… 2.02 -2.30 1-Least… 164261 49.6 0.9
## 5 11001000300 Distr… Distr… 3 -2.06 1-Least… 156483 46 0.6
## 6 11001000400 Distr… Distr… 4 -2.09 1-Least… 153397 47.8 0
## 7 11001000501 Distr… Distr… 5.01 -2.11 1-Least… 119911 44.5 0.8
## 8 11001000502 Distr… Distr… 5.02 -2.21 1-Least… 153264 46.8 0.5
## 9 11001000600 Distr… Distr… 6 -2.16 1-Least… 154266 60.8 7.4
## 10 11001000702 Distr… Distr… 7.02 -1.20 1-Least… 71747 22.9 0
## # ℹ 4,415 more rows
## # ℹ 11 more variables: MedHomeVal <dbl>, PctMgmtBusScArti <dbl>,
## # PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## # PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## # PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>
A second output from the powell_wiley()
function is the
results from the principal component analysis used to compute the
NDI (Powell-Wiley) values.
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## logMedHHInc -0.638 -0.364
## PctNoIDRZ 0.612 0.319
## PctPubAsstZ 0.379 0.615
## logMedHomeVal -0.893
## PctWorkClassZ 0.974
## PctFemHeadKidsZ 0.128 0.697 -0.233
## PctNotOwnerOccZ -0.375 0.923
## PctNoPhoneZ 0.329 0.524
## PctNComPlmbZ -0.141 0.869
## PctEducLTHSZ 0.642 0.164
## PctEducLTBchZ 1.020 -0.121
## PctFamBelowPovZ 0.219 0.698
## PctUnemplZ 0.596
##
## PC1 PC2 PC3
## SS loadings 4.340 2.971 1.102
## Proportion Var 0.334 0.229 0.085
## Cumulative Var 0.334 0.562 0.647
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.68516447 0.4403017 0.04517229
## [2,] -0.95446519 1.0806634 0.03196818
## [3,] -0.09078904 -0.2028145 1.02053725
##
## $rotation
## [1] "promax"
##
## $Phi
## [,1] [,2] [,3]
## [1,] 1.0000000 0.5250277 0.1649550
## [2,] 0.5250277 1.0000000 0.1923867
## [3,] 0.1649550 0.1923867 1.0000000
##
## $Structure
## [,1] [,2] [,3]
## [1,] -0.8432264 -0.71542812 -0.26037289
## [2,] 0.7750210 0.63477178 0.13357439
## [3,] 0.7001489 0.81237803 0.17181801
## [4,] -0.8931597 -0.46211477 -0.19777858
## [5,] 0.9253217 0.42037509 0.12424330
## [6,] 0.4559503 0.71962940 -0.07780352
## [7,] 0.1195748 0.73799969 0.17788881
## [8,] 0.2552300 0.42753150 0.58693047
## [9,] 0.1155170 0.05008712 0.84948765
## [10,] 0.7325510 0.50620197 0.16403994
## [11,] 0.9557341 0.41335682 0.13787215
## [12,] 0.5881663 0.81650181 0.18717861
## [13,] 0.3841044 0.63036827 0.09925687
##
## $communality
## [1] 0.5468870 0.4774810 0.5220533 0.8012746 0.9576793 0.5566938 0.9968490
## [8] 0.3829550 0.7774209 0.4398519 1.0560478 0.5359573 0.3616523
##
## $uniqueness
## logMedHHInc PctNoIDRZ PctPubAsstZ logMedHomeVal PctWorkClassZ
## 0.453113047 0.522518997 0.477946655 0.198725386 0.042320711
## PctFemHeadKidsZ PctNotOwnerOccZ PctNoPhoneZ PctNComPlmbZ PctEducLTHSZ
## 0.443306153 0.003150984 0.617045044 0.222579117 0.560148142
## PctEducLTBchZ PctFamBelowPovZ PctUnemplZ
## -0.056047809 0.464042693 0.638347726
##
## $Vaccounted
## [,1] [,2] [,3]
## SS loadings 4.5987837 3.2131788 1.10386926
## Proportion Var 0.3537526 0.2471676 0.08491302
## Cumulative Var 0.3537526 0.6009202 0.68583321
## Proportion Explained 0.5157997 0.3603902 0.12381002
## Cumulative Proportion 0.5157997 0.8761900 1.00000000
A third output from the powell_wiley()
function is a
tibble containing a breakdown of the missingness of the census
characteristics used to compute the NDI (Powell-Wiley)
values.
## # A tibble: 13 × 4
## variable total n_missing percent_missing
## <chr> <int> <int> <chr>
## 1 PctEducLTBchZ 4425 47 1.06 %
## 2 PctEducLTHSZ 4425 47 1.06 %
## 3 PctFamBelowPovZ 4425 63 1.42 %
## 4 PctFemHeadKidsZ 4425 60 1.36 %
## 5 PctNComPlmbZ 4425 60 1.36 %
## 6 PctNoIDRZ 4425 60 1.36 %
## 7 PctNoPhoneZ 4425 60 1.36 %
## 8 PctNotOwnerOccZ 4425 60 1.36 %
## 9 PctPubAsstZ 4425 60 1.36 %
## 10 PctUnemplZ 4425 57 1.29 %
## 11 PctWorkClassZ 4425 57 1.29 %
## 12 logMedHHInc 4425 73 1.65 %
## 13 logMedHomeVal 4425 148 3.34 %
A fourth output from the powell_wiley()
function is a
character string or numeric value of a standardized Cronbach’s alpha. A
value greater than 0.7 is desired.
## [1] 0.9321693
We can visualize the NDI (Powell-Wiley) values geographically by linking them to spatial information from the [tigris](tidycensus package and plotting with the [ggplot2](tidycensus package suite.
# Obtain the 2020 counties from the 'tigris' package
county2020 <- counties(cb = TRUE)
county2020DMVW <- county2020[county2020$STUSPS %in% c('DC', 'MD', 'VA', 'WV'), ]
# Obtain the 2020 census tracts from the 'tigris' package
tract2020D <- tracts(state = 'DC', year = 2020, cb = TRUE)
tract2020M <- tracts(state = 'MD', year = 2020, cb = TRUE)
tract2020V <- tracts(state = 'VA', year = 2020, cb = TRUE)
tract2020W <- tracts(state = 'WV', year = 2020, cb = TRUE)
tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)
# Join the NDI (Powell-Wiley) values to the census tract geometry
DMVW2020pw <- tracts2020DMVW %>%
left_join(powell_wiley2020DMVW$ndi, by = 'GEOID')
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts
## Continuous Index
ggplot() +
geom_sf(
data = DMVW2020pw,
aes(fill = NDI),
color = NA
) +
geom_sf(
data = county2020DMVW,
fill = 'transparent',
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_c(na.value = 'grey80') +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley)',
subtitle = 'DC, MD, VA, and WV tracts as the referent'
)
## Categorical Index (Population-weighted quintiles)
### Rename '9-NDI not avail' level as NA for plotting
DMVW2020pw$NDIQuintNA <-
factor(replace(
as.character(DMVW2020pw$NDIQuint),
DMVW2020pw$NDIQuint == '9-NDI not avail',
NA
),
c(levels(DMVW2020pw$NDIQuint)[-6], NA))
ggplot() +
geom_sf(data = DMVW2020pw, aes(fill = NDIQuintNA), color = NA) +
geom_sf(data = county2020DMVW, fill = 'transparent', color = 'white') +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles',
subtitle = 'DC, MD, VA, and WV tracts as the referent'
)
Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley).
# Obtain the 2020 counties from the 'tigris' package
county2020DMVW <- counties(state = c('DC', 'MD', 'VA', 'WV'), year = 2020, cb = TRUE)
# NDI (Powell-Wiley) at the county level (2016-2020)
powell_wiley2020DMVW_county <- powell_wiley(
geo = 'county',
state = c('DC', 'MD', 'VA', 'WV'),
year = 2020
)
# Join the NDI (Powell-Wiley) values to the county geometry
DMVW2020pw_county <- county2020DMVW %>%
left_join(powell_wiley2020DMVW_county$ndi, by = 'GEOID')
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
## Continuous Index
ggplot() +
geom_sf(
data = DMVW2020pw_county,
aes(fill = NDI),
size = 0.20,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_c() +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley)',
subtitle = 'DC, MD, VA, and WV counties as the referent'
)
## Categorical Index
### Rename '9-NDI not avail' level as NA for plotting
DMVW2020pw_county$NDIQuintNA <-
factor(
replace(
as.character(DMVW2020pw_county$NDIQuint),
DMVW2020pw_county$NDIQuint == '9-NDI not avail',
NA
),
c(levels(DMVW2020pw_county$NDIQuint)[-6], NA)
)
ggplot() +
geom_sf(
data = DMVW2020pw_county,
aes(fill = NDIQuint),
size = 0.20,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles',
subtitle = 'DC, MD, VA, and WV counties as the referent'
)
In the messer()
and powell_wiley()
functions, missing census characteristics can be imputed using the
missing
and impute
arguments of the
pca()
function in the psych
package called within the messer()
and
powell_wiley()
functions. Impute values using the logical
imp
argument (currently only calls
impute = 'median'
by default, which assigns the median
values of each missing census variable for a geography).
powell_wiley2020DC <- powell_wiley(state = 'DC', year = 2020) # without imputation
powell_wiley2020DCi <- powell_wiley(state = 'DC', year = 2020, imp = TRUE) # with imputation
table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)
# Join the NDI (Powell-Wiley) values to the census tract geometry
DC2020pw <- tract2020DC %>%
left_join(powell_wiley2020DC$ndi, by = 'GEOID')
DC2020pw <- DC2020pw %>%
left_join(powell_wiley2020DCi$ndi, by = 'GEOID', suffix = c('_nonimp', '_imp'))
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for
## Washington, D.C., census tracts
## Continuous Index
ggplot() +
geom_sf(
data = DC2020pw,
aes(fill = NDI_nonimp),
size = 0.2,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_c() +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed',
subtitle = 'DC census tracts as the referent'
)
ggplot() +
geom_sf(
data = DC2020pw,
aes(fill = NDI_imp),
size = 0.2,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_c() +
labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley), Imputed',
subtitle = 'DC census tracts as the referent'
)
## Categorical Index
### Rename '9-NDI not avail' level as NA for plotting
DC2020pw$NDIQuintNA_nonimp <-
factor(
replace(
as.character(DC2020pw$NDIQuint_nonimp),
DC2020pw$NDIQuint_nonimp == '9-NDI not avail',
NA
),
c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA)
)
ggplot() +
geom_sf(
data = DC2020pw,
aes(fill = NDIQuintNA_nonimp),
size = 0.2,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed',
subtitle = 'DC census tracts as the referent'
)
### Rename '9-NDI not avail' level as NA for plotting
DC2020pw$NDIQuintNA_imp <-
factor(
replace(
as.character(DC2020pw$NDIQuint_imp),
DC2020pw$NDIQuint_imp == '9-NDI not avail',
NA
),
c(levels(DC2020pw$NDIQuint_imp)[-6], NA)
)
ggplot() +
geom_sf(
data = DC2020pw,
aes(fill = NDIQuintNA_imp),
size = 0.2,
color = 'white'
) +
theme_minimal() +
scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
ggtitle(
'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed',
subtitle = 'DC census tracts as the referent'
)
To conduct a contiguous US-standardized index, compute an
NDI for all states as in the example below that replicates the
nationally standardized NDI (Powell-Wiley) values (2013-2017
ACS-5) found in Slotman et
al. (2022) and available from a GIS Portal for
Cancer Research website. To replicate the nationally standardized
NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et
al. (2020) change the year
argument to
2010
(i.e., year = 2010
).
us <- states()
n51 <- c(
'Commonwealth of the Northern Mariana Islands',
'Guam',
'American Samoa',
'Puerto Rico',
'United States Virgin Islands'
)
y51 <- us$STUSPS[!(us$NAME %in% n51)]
start_time <- Sys.time() # record start time
powell_wiley2017US <- powell_wiley(state = y51, year = 2017)
end_time <- Sys.time() # record end time
time_srr <- end_time - start_time # Calculate run time
ggplot(powell_wiley2017US$ndi, aes(x = NDI)) +
geom_histogram(color = 'black', fill = 'white') +
theme_minimal() +
ggtitle(
'Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)',
subtitle = 'U.S. census tracts as the referent (including AK, HI, and DC)'
)
The process to compute a US-standardized NDI (Powell-Wiley) took about 4.5 minutes to run on a machine with the features listed at the end of the vignette.
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tigris_2.1 tidycensus_1.6.5 sf_1.0-16 ndi_0.1.6.9007
## [5] ggplot2_3.5.1 dplyr_1.1.4 knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 xfun_0.47 bslib_0.8.0 psych_2.4.6.26
## [5] lattice_0.22-6 tzdb_0.4.0 Cairo_1.6-2 vctrs_0.6.5
## [9] tools_4.4.1 generics_0.1.3 curl_5.2.2 parallel_4.4.1
## [13] tibble_3.2.1 proxy_0.4-27 fansi_1.0.6 highr_0.11
## [17] pkgconfig_2.0.3 Matrix_1.7-0 KernSmooth_2.23-24 uuid_1.2-1
## [21] lifecycle_1.0.4 farver_2.1.2 compiler_4.4.1 stringr_1.5.1
## [25] munsell_0.5.1 mnormt_2.1.1 carData_3.0-5 htmltools_0.5.8.1
## [29] class_7.3-22 sass_0.4.9 yaml_2.3.10 pillar_1.9.0
## [33] car_3.1-2 crayon_1.5.3 jquerylib_0.1.4 tidyr_1.3.1
## [37] MASS_7.3-61 classInt_0.4-10 cachem_1.1.0 abind_1.4-5
## [41] nlme_3.1-166 tidyselect_1.2.1 rvest_1.0.4 digest_0.6.36
## [45] stringi_1.8.4 purrr_1.0.2 labeling_0.4.3 fastmap_1.2.0
## [49] grid_4.4.1 colorspace_2.1-1 cli_3.6.3 magrittr_2.0.3
## [53] utf8_1.2.4 e1071_1.7-14 readr_2.1.5 withr_3.0.1
## [57] scales_1.3.0 rappdirs_0.3.3 rmarkdown_2.28 httr_1.4.7
## [61] hms_1.1.3 evaluate_0.24.0 viridisLite_0.4.2 rlang_1.1.4
## [65] Rcpp_1.0.13 glue_1.7.0 DBI_1.2.3 xml2_1.3.6
## [69] rstudioapi_0.16.0 jsonlite_1.8.8 R6_2.5.1 units_0.8-5