Motivation

In the event of a wildfire, there is nothing more important than a successful evacuation. Wildfires in California can be very destructive; recently, they have been particularly concerning in Southern California. For example, one fire alone in October 2019 resulted in more than 100,000 people being ordered to evacuate their homes.

The California Department of Forestry and Fire Protection (CAL FIRE) recommends that all residents assemble an emergency supply kit to be better prepared in the event of a fire evacuation. Emergency supply kits provide critical goods as well as educational materials, such as an evacuation route, that make it feasible for a household to evacuate in the event of a wildfire.

We want to work with public sector agencies at the county level to strategically distribute emergency supply kits to the right individuals.

To facilitate this engagement, we developed an algorithm and an application called CalKit. The algorithm has two parts that feed into a composite score: wildfire risk and social vulnerability. For wildfire risk, we built a model to predict for fire occurrences using environmental risk factors and other spatial features. The predicted probabilities were then incorporated with the Social Vulnerability Index provided by the Centers for Disease Control and Prevention.

This final algorithm outputs a prioritized list of census tracts to target. Census tracts are prioritized based in equal parts on the probability of a fire occurring and the Social Vulnerability Index score for that tract. In practice, this will help ensure that kits make it to both those most in-need and those that are most vulnerable to emergencies like wildfires. The algorithm is accessible via an interactive application that allows users to tweak inputs such as budget and cost of the kits. In this way, users can visualize the exact areas and total number of households served.

Ultimately, the goal is for CalKit to make it into the hands of program managers who can facilitate the distribution of emergency supply kits, leveraging a user-friendly interface that helps them optimize allocation based on the households who need the support the most.

## Setup
#load libraries, initial functions, and palettes

library(tidyverse)
library(dplyr)
library(plyr)
library(tidyr)
library(sf)
library(sp)
library(viridis)
library(spatstat)
library(rgdal)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(knitr)
library(kableExtra)
library(tidycensus)
library(mapview)
library(leaflet)
library(rgeos)
library(nngeo)
library(gstat)
library(tigris)
library(pscl)
library(plotROC)
library(pROC)
library(scales)
library(lubridate)
library(caret)
library(jtools)
library(ggrepel)

root.dir = "C:/Users/jenna/Documents/GitHub/MUSA508-final/"

# nn function
nn_function <- function(measureFrom,measureTo,k) {
  measureFrom_Matrix <-
    as.matrix(measureFrom)
  measureTo_Matrix <-
    as.matrix(measureTo)
  nn <-
    get.knnx(measureTo, measureFrom, k)$nn.dist
    output <-
      as.data.frame(nn) %>%
      rownames_to_column(var = "thisPoint") %>%
      gather(points, point_distance, V1:ncol(.)) %>%
      arrange(as.numeric(thisPoint)) %>%
      group_by(thisPoint) %>%
      summarize(pointDistance = mean(point_distance)) %>%
      arrange(as.numeric(thisPoint)) %>%
      dplyr::select(-thisPoint) %>%
      pull()

  return(output)
}

#r cross validate function
crossValidate <- function(dataset, id, dependentVariable, indVariables) {

allPredictions <- data.frame()
cvID_list <- unique(dataset[[id]])

for (i in cvID_list) {

  thisFold <- i
  cat("This hold out fold is", thisFold, "\n")

  fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% as.data.frame() %>%
                dplyr::select(id, geometry, indVariables, dependentVariable)
  fold.test  <- filter(dataset, dataset[[id]] == thisFold) %>% as.data.frame() %>%
                dplyr::select(id, geometry, indVariables, dependentVariable)

  regression <-
    glm(n_fires_int ~ ., family = "poisson",
      data = fold.train %>%
      dplyr::select(-geometry, -id))

  thisPrediction <-
    mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))

  allPredictions <-
    rbind(allPredictions, thisPrediction)

  }
  return(st_sf(allPredictions))
}

#quintile breaks
qBr <- function(df, variable, rnd) {
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]],0),
                          c(.01,.2,.4,.6,.8), na.rm=T))
  } else if (rnd == FALSE | rnd == F) {
    as.character(formatC(quantile(df[[variable]]), digits = 3),
                 c(.01,.2,.4,.6,.8), na.rm=T)
  }
}

q5 <- function(variable) {as.factor(ntile(variable, 5))}

#themes and palettes
mapTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.text.x = element_text(size = 14))
}

plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=12),
    axis.title = element_text(size=12),
    axis.text = element_text(size=10),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "italic"),
    legend.text = element_text(colour = "black", face = "italic"),
    strip.text.x = element_text(size = 14)
  )
}
palette2 <- c("#fcff93", "#f7b61f")


#getting discrete viridis inferno
library(scales)
#q_colors =  5
#v_colors =  viridis(q_colors, option = "B")
#print(v_colors)

paletteInferno5 <- c("#f6d746", "#f7b61f", "#c83233", "#51005b", "#000005")
paletteInferno8 <- c("#fcff93", "#f6d746", "#f7b61f", "#f06813", "#c83233", "#8c1750", "#51005b", "#1d0042")

Data Exploration and Feature Engineering

# need this to standardize the crs. will discuss features in detail later in the section.

#JENNA
fishnet_rasterfeatures <- st_read("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/fishnet_veg_elev_imperv_nofires/fishnet_raster_features.shp")

#JEFF - change to your local path
#fishnet_veg_elev_imperv <- st_read("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/fishnet_veg_elev_imperv_nofires/fishnet_raster_features.shp")

# fishnet adjustments
fishnet_rasterfeatures <- fishnet_rasterfeatures %>% dplyr::mutate(uniqueID = row_number())
fishnet_rasterfeatures$uniqueID <-as.integer(fishnet_rasterfeatures$uniqueID)

# rename some columns
fishnet_rasterfeatures <- fishnet_rasterfeatures %>%
  dplyr::rename(VEG_MAJORITY = MAJORITY) %>%
  dplyr::rename(ELEVATION_MEAN = MEAN) %>%
  dplyr::rename(IMPERV_SURFACE_PCT_MEAN = MEAN_1) %>%
  dplyr::select(-AREA, -MEAN_12) %>%
  st_sf()

#no longer needed, but just in case need this to refactor:
#fishnet_veg_elev <- fishnet_veg_elev %>% as.factor(fishnet_veg_elev$VEG_MAJORITY)

# convert the numbers for vegetation types back to the categories
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==0] <- "agriculture"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==1] <- "barren_or_other"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==2] <- "conifer"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==3] <- "hardwood"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==4] <- "herbaceous"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==5] <- "shrub"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==6] <- "urban"
fishnet_rasterfeatures$VEG_MAJORITY[fishnet_rasterfeatures$VEG_MAJORITY==7] <- "water"

Wildfire is not a phenomenon that depends on municipal boundaries, so feature identification and model development took place at a grid cell level to account for a smoother landscape across space. We created a “fishnet” of grid cells in ArcMap to cover Southern California. Five counties along the coast make up the study area for this analysis: Santa Barbara, Ventura, Los Angeles, Orange, and San Diego.

Note that after the model is developed and cross-validated for accuracy and generalizability, we extrapolate the fire probabilities up to the census tract level, since this is a more meaningful geography for our use case of evacuation kit allocation.

#all cali boundaries (counties)
counties <-  st_read("https://opendata.arcgis.com/datasets/8713ced9b78a4abb97dc130a691a8695_0.geojson") %>% st_transform(st_crs(fishnet_rasterfeatures))

#socal counties
socal_counties <- counties %>% filter(COUNTY_NAME == "Kern"| COUNTY_NAME == "Riverside" | COUNTY_NAME == "Los Angeles" | COUNTY_NAME == "Imperial" | COUNTY_NAME == "San Bernardino" | COUNTY_NAME == "Ventura"| COUNTY_NAME == "Orange" | COUNTY_NAME == "Santa Barbara" | COUNTY_NAME == "San Luis Obispo" | COUNTY_NAME == "San Diego")

socal_counties <- socal_counties %>% filter(OBJECTID < 59)

socal_counties_coastal <- socal_counties %>% filter(COUNTY_NAME == "Los Angeles" | COUNTY_NAME == "Santa Barbara" | COUNTY_NAME == "San Diego" | COUNTY_NAME == "Ventura" | COUNTY_NAME == "Orange")


# centroids of southern california counties
centroids.socal_counties <- cbind(socal_counties_coastal, st_coordinates(st_centroid(socal_counties_coastal)))

#join fishnet with counties
fishnet_features <-
  st_centroid(fishnet_rasterfeatures) %>%
    st_join(dplyr::select(socal_counties_coastal, COUNTY_NAME)) %>%
      st_drop_geometry() %>%
      left_join(dplyr::select(fishnet_rasterfeatures, geometry, uniqueID)) %>%
      st_sf() %>%
  na.omit()

Wildfire Data

Perimeters and Presence of Fires

#all perimeters
all_cali_fires <- st_read("https://opendata.arcgis.com/datasets/e3802d2abf8741a187e73a9db49d68fe_2.geojson") %>% st_transform(st_crs(fishnet_features))
# last five years
cali_fires_recent <- all_cali_fires %>% filter(YEAR_ > 2014)

#clipped fire perimeter data to southern california counties in ArcMap
#JENNA
fires_clipped <- st_read("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/fires_clipped/fires_clipped.shp") %>% st_sf() %>% filter(YEAR_ > 2014)

#JEFF - add your local path
#fishnet_veg_elev <- st_read("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/fires_clipped/fires_clipped.shp")

#filter to only include previous 10 years
fires_clipped <- fires_clipped %>% st_transform(st_crs(fishnet_features))

The historical fire perimeter data comes from the California Department of Forestry and Fire Protection (CAL FIRE), supplied on the California State Geoportal. The data was clipped to the southern portion of California and filtered to only include fire perimeters from incidents between 2015 and 2019. With climate change, environmental conditions have been changing more rapidly over more recent years, so we decided to use a smaller year range of data to develop a more accurate model that is reflective of recent conditions.

#map
#fires clipped for map
fires_clipped2 <- st_intersection(st_buffer(fires_clipped,0), socal_counties_coastal)


ggplot()+
  geom_sf(data=socal_counties_coastal, fill="gray", color="white") +
  geom_sf(data=socal_counties_coastal, fill="gray", color="white") +
  geom_sf(data=fires_clipped2, fill="#f7b61f", color="#f06813", alpha=0.9) +
     geom_text(data = centroids.socal_counties, aes(X, Y, label = COUNTY_NAME), size = 3, fontface="bold")+
  labs(title = "Fire Perimeters in Southern California",
       subtitle="2015-2019",
       caption="Data Sources: CAL FIRE; California State Geoportal") +
  mapTheme()

Some feature engineering is done to to calculate the number of fire intersections per fishnet cell and create a new count variable to explore, n_fires_int. Then, a binary numeric variable is created called prev_fire to indicate the historical presence of a fire (1) or not (0) at any point over the past five years.

## 1. intersections of fire perimeters with each fishnet cell. using length function: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/lengths
fires_fishnet <- fishnet_features %>% mutate(n_fires_int = lengths(st_intersects(fishnet_features, fires_clipped)))

## 2. adding a column for yes or no for historical fire presence
fires_fishnet <-
  fires_fishnet %>%
  mutate(prev_fire = ifelse(fires_fishnet$n_fires_int > 0, "1", "0"))

Outcome Variable: Historical Fire Occurence (Yes or No)

Here, some feature engineering is done to to calculate the number of fire intersections per fishnet cell. Using this count data, a binary numeric variable is created called prev_fire to indicate any historical presence of a fire: yes (1) or no (0). This is our outcome variable - instead of relying on frequency (count data), it looks at the presence (or not) of any fire occurring at a given cell over the past five years. Note that the count data will be explored when looking at the spatial process of fire distribution later.

Below is a map of this outcome variable, prev_fire. It is clear that incidents of fires over the past five years were distributed throughout the area, but especially concentrated around the three middle counties: Ventura, Los Angeles, and Orange.

# mapping fire incident - yes or no (1 or 0)

#convert to character first
fires_fishnet$prev_fire <- as.character(fires_fishnet$prev_fire)

ggplot() +
  geom_sf(data = fires_fishnet, aes(fill = prev_fire)) +
  scale_fill_manual(values=palette2)+
  labs(title = "Presence of Historical Fire Incidents in Southern California",
       subtitle="Per Fishnet Cell | 2015-2019 \n \n0=no fire occurred; 1=at least one fire occurred",
       fill="Fire Occurrence",
      caption="Data Sources: CAL FIRE; California State Geoportal") +
  mapTheme()+ theme(legend.position='right')

Exploring Raster Data: Vegetation, Impervious Surfaces, Elevation

Due to the size of raster files covering the entire state of California, ArcMap is used to do some initial data wrangling.

Feature: Vegetation/Land Type Majority

Vegetation is fuel to wildfires, especially certain types that are concentrated in the same area. To explore the relationship between majority vegetation types and wildfire occurrence, this vegetation/land type (VEG_MAJORITY) variable was built from a raster file of statewide vegetation and land type from CAL FIRE’s Fire and Resource Assessment Program (FRAP), in cooperation with the California Department of Fish and Wildlife and the USDA Forest Service. In ArcMap, zonal statistics was used to calculate the majority vegetation life-form type per grid cell of the fishnet. The cells in the northwest area are majority conifer and hardwood, indicating a greater presence of forestry throughout. By contrast, the middle section - where Los Angeles County is located - contains cells that are majority urban and shrub.

# map the majority vegetation classification by fishnet cell
ggplot() +
  geom_sf(data = fires_fishnet, aes(fill = VEG_MAJORITY)) +
  scale_fill_manual(values = paletteInferno8)+
  labs(title = "Majority Vegetation/Land Type Classification in Southern California",
       subtitle="Per Fishnet Cell",
       caption="Data source: frap.fire.ca.gov - FVEG geodatabase (2015)") + mapTheme()

#in preparation for the model development, the a new fire variable created to a yes/no variable.
fires_fishnet <- fires_fishnet %>% mutate(fire_yesno = case_when(fires_fishnet$prev_fire == 1 ~ "yes",
                                                                 fires_fishnet$prev_fire == 0 ~ "no"))

This bar chart shows that most cells with at least one fire are of majority vegetation type “shrub” or “urban.” This is interesting, as it is not typically assumed at urban areas would be more fire prone than those with more tree cover. However, it is clear that the majority of cells are of type “shrub” to begin with. Given the high number of cells of this type in the data, we must not assume that shrubbery will be a strong predictive factor.

fires_fishnet_veg <- as.data.frame(fires_fishnet)

fires_fishnet_veg <- fires_fishnet_veg %>%
  dplyr::select(fire_yesno, VEG_MAJORITY) %>%
  gather(Variable, value, -fire_yesno) %>%
  dplyr::count(Variable, value, fire_yesno) %>%
  ggplot(aes(value, n, fill = fire_yesno)) +
    geom_bar(position = "dodge", stat="identity") +
    scale_fill_manual(values = palette2) +
    labs(x="Majority Type", y="Count of Cells",
         title = "Vegetation/Land Type Majority Associations with Fire Occurrence") +
    plotTheme() + theme(axis.text.x = element_text(angle = 45, hjust = 1))+
theme(legend.position = "bottom")

fires_fishnet_veg

Feature: Mean Percent of Impervious Surface

Impervious surfaces present concerns for wildfire, since they do not allow water to soak into the ground. If the surface cannot properly absorb water, it may be more difficult to diffuse fires or limit their spread. This is especially a concern in primarily highly-developed or industrial urban areas. As discussed above, majority urban areas contain a higher count of fire occurrences than most other vegetation land cover categories.

An impervious surfaces (IMPERV_SURFACE_PCT_MEAN) variable was created based on impervious surface raster data from the National Land Cover Database (NLCD). NLCD represents urban impervious surfaces as a percentage of developed surface over 30-meter pixels. The raster file was brought into ArcMap, clipped to the study area, and zonal statistics was used to calculate the mean percent of impervious surface per grid cell. As expected, the map shows that areas of Los Angeles County, especially by the coast where lots of people live, have a higher percentage of impervious surface area.

ggplot() +
  geom_sf(data = fishnet_features, aes(fill = IMPERV_SURFACE_PCT_MEAN)) +
  scale_fill_viridis(option="B")+
  labs(title = "Mean Impervious Surface across Southern California",
       subtitle="Per Fishnet Cell",
       fill="Mean Percent of \nImpervious Surface",
       caption="Data source: National Land Cover Database (2016)") + mapTheme()

Feature: Mean Elevation

Elevation affects fire behavior on multiple fronts: changing the level of exposure of an area from wind and precipitation, causing greater spread on steeper slopes, and increasing ignition potential. To create this elevation (ELEVATION_MEAN) variable, a raster image of Digital Elevation Model of 15 arc-seconds resolution from the U.S. Geological Survey was brought into ArcMap. Zonal statistics as table was used to calculate the mean elevation per grid cell of the fishnet.

# map the mean elevation by fishnet cell
# source: https://geodata.lib.berkeley.edu/catalog/stanford-wj406qc5431; publisher - USGS
ggplot() +
  geom_sf(data = fishnet_features, aes(fill = ELEVATION_MEAN)) +
  scale_fill_viridis(option="B")+
  labs(title = "Mean Elevation across Southern California",
       subtitle="Per Fishnet Cell",
       fill="Mean Elevation \n(meters)",
       caption="Data source: USGS - Digital Elevation Model of North America, 15 arc-seconds resolution (2009)") + mapTheme()

Engineering Weather Data Features

Weather is also a major factor affecting the conditions for a wildfire ignition and future spread. To incorporate weather data into this analysis, the {riem} package was used to import data from the National Oceanic and Atmospheric Administration (NOAA)’s Automated Surface Observing Systems across southern California weather stations. Data was aggregated over the 2015-2019 five-year period, inverse distanced weighting interpolation was used to create a smooth weather surface, and the values were applied to each nearest fishnet cell.

The variables are annual averages of four weather phenomena - precipitation, wind speed, temperature, and humidity - and are joined with the fishnet. The maps below show each of these four weather variables.

#Convert prev_fire to integer to prepare for model development later
fires_fishnet$prev_fire <- as.numeric(fires_fishnet$prev_fire)

#load in specific weather data selection for the stations
#JENNA
weather <- read.csv("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/weather_data_socal_2015to2019.csv")

weather_annual <- weather %>% st_as_sf(coords = c("lon", "lat"), crs = 4236, agr="constant") %>% st_transform(st_crs("EPSG:3310"))

#JEFF - add local file path
#weather <- read.csv("C:/Users/jenna/Documents/GitHub/MUSA508-final/data/weather_data_socal_2015to2019.csv")

library(dplyr)
weather_annual <- weather_annual %>% dplyr::group_by(year, weather_station_id) %>%
      dplyr::summarize(mean_ann_tmpf = mean(mean_tmpf, na.rm = TRUE),
                       mean_ann_precipitation = mean(mean_precipitation, na.rm = TRUE),
                       mean_ann_humidity = mean(mean_humidity, na.rm = TRUE),
                       mean_ann_windspeed = mean(mean_wind_Speed, na.rm = TRUE))
#create new df with only precipitation variable
weather_precip <- weather_annual %>% dplyr::select(geometry, mean_ann_precipitation)

# perform idw for precipitation surface. DW interpolation that uses a power parameter of 2 (idp=2.0).
weather_precip.idw <- gstat::idw(mean_ann_precipitation ~ 1, weather_precip, newdata=fishnet_features, idp=2.0)

#rename var1.pred
weather_precip.idw <- weather_precip.idw %>% dplyr::rename(avg_precip_idw = var1.pred)

#create new df
precip.idw <- weather_precip.idw %>% dplyr::select(avg_precip_idw, geometry)

#join to main fishnet
fishnet_precip <- st_join(fires_fishnet, precip.idw, join=st_equals)

library(dplyr)
#new df with only wind speed variable
weather_wind <- weather_annual %>% dplyr::select(geometry, mean_ann_windspeed)

#idw for wind speed surface. DW interpolation that uses a power parameter of 2 (idp=2.0).
weather_wind.idw <- gstat::idw(mean_ann_windspeed ~ 1, weather_wind, newdata=fishnet_features, idp=2.0)

#rename var1.pred
weather_wind.idw <- weather_wind.idw %>% dplyr::rename(avg_windspeed_idw = var1.pred)

avg_windspeed.idw <- weather_wind.idw %>% dplyr::select(avg_windspeed_idw, geometry)

#join to precip fishnet, create a final fishnet
fishnet_with_wind <- st_join(fishnet_precip, avg_windspeed.idw, join=st_equals)

library(dplyr)#new df with only temp variable
weather_temp <- weather_annual %>% dplyr::select(geometry, mean_ann_tmpf)

#idw for temp surface. DW interpolation that uses a power parameter of 2 (idp=2.0).
weather_temp.idw <- gstat::idw(mean_ann_tmpf ~ 1, weather_temp, newdata=fishnet_features, idp=2.0)

#rename var1.pred
weather_temp.idw <- weather_temp.idw %>% dplyr::rename(avg_temp_idw = var1.pred)

avg_temp.idw <- weather_temp.idw %>% dplyr::select(avg_temp_idw, geometry)

#join to wind fishnet, create a final fishnet
fishnet_with_temp <- st_join(fishnet_with_wind, avg_temp.idw, join=st_equals)

library(dplyr)
#new df with only humidity variable
weather_hum <- weather_annual %>% dplyr::select(geometry, mean_ann_humidity)

#idw for temp surface. DW interpolation that uses a power parameter of 2 (idp=2.0).
weather_hum.idw <- gstat::idw(mean_ann_humidity ~ 1, weather_hum, newdata=fishnet_features, idp=2.0)

#rename var1.pred
weather_hum.idw <- weather_hum.idw %>% dplyr::rename(avg_hum_idw = var1.pred)

avg_hum.idw <- weather_hum.idw %>% dplyr::select(avg_hum_idw, geometry)

#join to wind fishnet, create a final fishnet
fishnet_with_hum <- st_join(fishnet_with_temp, avg_hum.idw, join=st_equals)
#maps with weather features

precip <- ggplot()+
  geom_sf(data=weather_precip.idw, aes(fill=avg_precip_idw, round(avg_precip_idw, digits=2))) +
  scale_fill_viridis(option="B") +
  labs(title="Average Precipitation",
       subtitle="2015-2019",
       caption="Data source: NOAA",
       fill="Inches")+
          mapTheme() + theme(legend.position='right')

wind <- ggplot()+
  geom_sf(data=weather_wind.idw, aes(fill=avg_windspeed_idw)) +
  scale_fill_viridis(option="B") +
  labs(title="Average Wind Speed",
       subtitle="2015-2019",
       caption="Data source: NOAA",
       fill="Tenths m/sec")+
          mapTheme() + theme(legend.position='right')

temp <- ggplot()+
  geom_sf(data=weather_temp.idw, aes(fill=avg_temp_idw)) +
  scale_fill_viridis(option="B") +
  labs(title="Average Temperature",
       subtitle="2015-2019",
       caption="Data source: NOAA",
       fill="Degrees (F)")+
          mapTheme()+ theme(legend.position='right')

humidity <- ggplot()+
  geom_sf(data=weather_hum.idw, aes(fill=avg_hum_idw)) +
  scale_fill_viridis(option="B") +
  labs(title="Average Relative Humidity",
       subtitle="2015-2019",
       caption="Data source: NOAA",
       fill="Percentage")+
          mapTheme()+ theme(legend.position='right')

#print maps
precip

wind

temp

humidity

The four bar charts below help to better explore the relationship between each weather variable and the likelihood of a fire occurrence. None of these variables, however, differ much in their respective means across the respective yes and no categories. Precipitation appears to have the biggest difference of likelihood, but not by a large degree. The importance of these variables will be explored further during iterative model development - alone they may not be strong indicators of fire occurrence, but they may contribute to predictive power as part of a model.

#rename to final fishnet
fishnet_final <- fishnet_with_hum

# creating new data frame to visualize weather features on bar plots
fishnet_final_weatherfeatures <- fishnet_final %>% st_drop_geometry() %>%
  dplyr::rename(avg_relative_humidity = avg_hum_idw) %>%
    dplyr::rename(avg_temperature = avg_temp_idw) %>%
  dplyr::rename(avg_wind_speed = avg_windspeed_idw) %>%
  dplyr::rename(avg_precipitation = avg_precip_idw)


fishnet_final_weatherfeatures %>%
  dplyr::select(fire_yesno, avg_precipitation, avg_wind_speed, avg_temperature, avg_relative_humidity) %>%
  gather(Variable, value, -fire_yesno) %>%
  ggplot(aes(fire_yesno, value, fill=fire_yesno)) +
    geom_bar(position = "dodge", stat = "summary", fun = "mean") +
    facet_wrap(~Variable, scales = "free_y") +
    scale_fill_manual(values = palette2) +
    labs(x="fire_yesno", y="Mean",
         title = "Weather Feature Associations with Fire Occurrence",
         subtitle="Averages, 2015-2019") +
    plotTheme() + theme(legend.position = "bottom")

Additional Feature Engineering

Feature: Tree Mortality Hazard Classification

Areas with trees classified as being at higher risk of mortality may contribute to the level of wildfire risk - more dead trees means more fuel for fires to burn. CAL FIRE’s Fire and Resource Assessment Program (FRAP) and the U.S. Forest Service have a classification system for areas of high tree mortality, which could be a decent predictor for wildfire risk. After loading the data source, the count of cells with at least one intersection of a tree mortality “tier one hazard zone” is calculated to create the feature variable n_tier1hazard_int.

tier1hazard_treemortality <- st_read("https://opendata.arcgis.com/datasets/a71a85136b0b414ea734fdfbe3d7674a_0.geojson") %>% st_transform(st_crs(fishnet_final))

## intersections of tier 1 areas with each fishnet cell, using length function
fishnet_final <- fishnet_final %>% mutate(n_tier1hazard_int = lengths(st_intersects(fishnet_final, tier1hazard_treemortality)))

Feature: Distance to Nearest Electric Substation

Given the presence of high voltage electricity coupled with flammable liquids and industrial equipment, internal malfunctions and damages to electric substations increase risk of fire eruption and spread. In 2018, electric company PG&E neglected maintenance of transmission infrastructure, which sparked the Camp Fire in Paradise, California that caused widespread structural damage and the deaths of over 80 people. Electric substations connect transmission lines, and contain high-voltage switches control electricity flow and magnitude. Given the high amount of electricity at these central hubs relative to the transmission lines, proximity to electric substations is explored as a fire risk feature.

The point data for electric substations comes from the California State Geoportal, and the distance from each fishnet cell’s centroid to the nearest substation is calculated (in meters). The map below illustrates this dist_elec_substation variable. The darker cells, indicating closer distance, are concentrated around the middle of the study area (which covers lots of Los Angeles County).

#more feature engineering...
#read in electric substations
electric_substations <- st_read("https://opendata.arcgis.com/datasets/7f37f2535d3144e898a53b9385737ee0_0.geojson") %>%
  dplyr::select(Y = Latitude, X = Longitute) %>%
    na.omit() %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
    st_transform(st_crs(fishnet_final))

# calculate distance (meters) from each cell centroid to nearest electric substation and add as a variable
fishnet_final <-
  fishnet_final %>%
    mutate(dist_elec_substation=st_nn(st_centroid(fishnet_final), electric_substations, 1))

fishnet_final$dist_elec_substation <- as.numeric(fishnet_final$dist_elec_substation)  # Convert variable to numeric
ggplot()+
  geom_sf(data=fishnet_final, aes(fill=dist_elec_substation)) +
  scale_fill_viridis(option="B") +
  labs(title="Distance to Nearest Electrical Substation in Southern California",
       fill="Distance \n(meters)",
  subtitle="Per Fishnet Cell",
  caption="Data source: California Open Data Portal")+
          mapTheme()

Feature: Distance to nearest CAL FIRE or Contract Facility

The distance to the nearest CAL FIRE or affiliated contract fire facility is also explored as an additional feature. Similarly to the substations, the distance is calculated from each cell’s centroid to the nearest facility and stored in a new variable called dist_fire_facility. The map below shows that areas in the southern part of the study area (mainly San Diego county) are closer, on average, to such facilities, where those in the more northern parts are further away.

#read in facilities
fire_facilities <- st_read("https://opendata.arcgis.com/datasets/1c8a93cac92f418e98a8fa6a2eaf4265_0.geojson") %>%
  dplyr::select(Y = LAT, X = LON) %>%
    na.omit() %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
    st_transform(st_crs(fishnet_final))

# calculate distance (meters) from each cell centroid to nearest facility and add as a variable
fishnet_final <-
  fishnet_final %>%
    mutate(dist_fire_facility=st_nn(st_centroid(fishnet_final), fire_facilities, 1))

fishnet_final$dist_fire_facility <- as.numeric(fishnet_final$dist_fire_facility)  # Convert variable to numeric
ggplot()+
  geom_sf(data=fishnet_final, aes(fill=dist_fire_facility)) +
  scale_fill_viridis(option="B") +
  labs(title="Distance to Nearest Fire Facility in Southern California",
       fill="Distance \n(meters)",
  subtitle="Per Fishnet Cell",
  caption="Data source: California Open Data Portal")+
          mapTheme()

Exploring the Spatial Process of Fire Incidents

After looking back at the outcome variable map of where the fire incidents occurred, we wanted to explore the distribution of fire incidents across space.

Local Moran’s I

The Local Moran’s I statistic is calculated to explore local spatial autocorrelation of fire incidents. It explores spatial clustering by measuring how similar locations are to their immediate neighbors. High Local Moran’s I values indicate that there is evidence of local clustering.

The results of the Local Moran’s I are plotted below, alongside the count of incidents (on the far left). Also mapped are the p-value and significant hotspots (cells with higher local incident counts than expected).

Based on these maps, there is evidence of spatial clustering exists in the middle part of the study area - specifically, Los Angeles County and into Ventura.

library(spdep)
#use {spdep}
fishnet_final.nb <- poly2nb(as_Spatial(fishnet_final), queen=TRUE)
## and neighborhoods to list of weights
fishnet_final.weights <- nb2listw(fishnet_final.nb, style="W", zero.policy=TRUE)

fishnet_final.localMorans <-
  cbind(
    as.data.frame(localmoran(fishnet_final$n_fires_int, fishnet_final.weights)),
    as.data.frame(fishnet_final)) %>%
    st_sf() %>%
      dplyr::select(n_fires_int_Count = n_fires_int,
                    Local_Morans_I = Ii,
                    P_Value = `Pr(z > 0)`) %>%
      mutate(Significant_Hotspots = ifelse(P_Value <= 0.05, 1, 0)) %>%
      gather(Variable, Value, -geometry)
vars <- unique(fishnet_final.localMorans$Variable)
varList <- list()

for(i in vars){
  varList[[i]] <-
    ggplot() +
      geom_sf(data = filter(fishnet_final.localMorans, Variable == i),
              aes(fill = Value), colour=NA) +
      scale_fill_viridis(option="B", name="") +
      labs(title=i) +
      mapTheme() + theme(legend.position="right")}

do.call(grid.arrange,c(varList, ncol = 2, top = "Local Morans I statistics, Number of Fire Intersections"))

# create Local Moran's I feature in fishnet_final
fishnet_final <-
  fishnet_final %>%
  mutate(n_fires_int.isSig =
           ifelse(spdep::localmoran(fishnet_final$n_fires_int,
                             fishnet_final.weights)[,5] <= 0.0000001, 1, 0))

# using st_nn
fishnet_final <-
  fishnet_final %>%
  mutate(n_fires_int.isSig.dist =
           st_nn(st_centroid(fishnet_final),(st_centroid(filter(fishnet_final, n_fires_int.isSig == 1))), 1))


# Convert variable to numeric
fishnet_final$n_fires_int.isSig.dist <-as.numeric(fishnet_final$n_fires_int.isSig.dist)

In order to develop a model that generalizes well, Local Moran’s I is added as a feature in the final fishnet grid, using a dummy variable to indicate that a cell is part of a significant cluster. Then, the average nearest neighbor distance is measured from the centroid of each grid cell to its nearest significant cluster. The p-value chosen is 0.0000001 - the smaller the p-value, the more significant the clusters.

# Convert variable to numeric
fishnet_final$prev_fire <- as.numeric(fishnet_final$prev_fire)  

Model Development and Testing

A logistic regression (in the generalized linear model family) predicts the probability of an observation based on features of the model. Since we are predicting the probability that a fire will occur, it is well-suited for this case. To maximize the probability of the data observed, the variables included and engineered are iteratively swapped in and out to find the best possible fit. It turns out that all of the engineered features together help improve the model, so they are all included.

The data is divided into training (70%) and testing (30%) sets, and the model runs on the training set.

set.seed(3456)
trainIndex <- createDataPartition(fishnet_final$prev_fire, p = .70,
                                list = FALSE,
                               times = 1)
dataTrain <- fishnet_final[ trainIndex,]
dataTest  <- fishnet_final[-trainIndex,]

reg.binomial<- glm(prev_fire ~ .,
                  data=st_drop_geometry(dataTrain) %>% dplyr::select(-fire_yesno, -uniqueID, -n_fires_int,), family="binomial")

#reg.binomial.r2 <- pR2(reg.binomial)[4]
#summary(reg.binomial)
#print(reg.binomial.r2)

Results and Goodness of Fit

To ensure the generalizabiilty of the model, it needs to both perform well on new data, and also predict relatively accurately across groups.

The results of the model for the training data are included below. Note that the evaluation across groups will be explored later when looking at the model performance across the five counties in the study area.

Results Summary

summ(reg.binomial)
Observations 1020
Dependent variable prev_fire
Type Generalized linear model
Family binomial
Link logit
χ²(22) 371.68
Pseudo-R² (Cragg-Uhler) 0.43
Pseudo-R² (McFadden) 0.29
AIC 947.96
BIC 1061.29
Est. S.E. z val. p
(Intercept) 30.04 10.96 2.74 0.01
VEG_MAJORITYbarren_or_other -16.71 1391.46 -0.01 0.99
VEG_MAJORITYconifer -1.29 0.81 -1.60 0.11
VEG_MAJORITYhardwood -1.01 0.85 -1.19 0.23
VEG_MAJORITYherbaceous 0.03 0.57 0.06 0.96
VEG_MAJORITYshrub 0.22 0.52 0.42 0.67
VEG_MAJORITYurban 0.21 0.66 0.32 0.75
VEG_MAJORITYwater 1.29 1.74 0.74 0.46
ELEVATION_MEAN -0.00 0.00 -3.77 0.00
IMPERV_SURFACE_PCT_MEAN -0.07 0.01 -5.37 0.00
COUNTY_NAMEOrange 0.02 0.43 0.05 0.96
COUNTY_NAMESan Diego -2.76 0.44 -6.22 0.00
COUNTY_NAMESanta Barbara -2.86 0.52 -5.48 0.00
COUNTY_NAMEVentura -0.01 0.36 -0.03 0.98
avg_precip_idw 206.28 111.01 1.86 0.06
avg_windspeed_idw -0.77 0.20 -3.89 0.00
avg_temp_idw -0.38 0.12 -3.02 0.00
avg_hum_idw -0.04 0.05 -0.82 0.41
n_tier1hazard_int 0.04 0.03 1.47 0.14
dist_elec_substation 0.00 0.00 1.06 0.29
dist_fire_facility 0.00 0.00 0.21 0.84
n_fires_int.isSig 17.19 471.10 0.04 0.97
n_fires_int.isSig.dist 0.01 0.00 1.53 0.13
Standard errors: MLE

Confusion Matrix and Statistics

The confusion matrix and its statistics for the regression are shown below. The matrix of 0s and 1s shows the comparison between the observed and predicted instances of fire occurrence at the 50% threshold. Sensitivity (the ability to predict true positive outcomes) here is ~46%, and specificity (the ability to predict true negative outcomes) is ~91%. This indicates that the model is better at predicting ares with an absence of a fire, and no fire occurred.

testProbs <- data.frame(Outcome = as.factor(dataTest$prev_fire),
                        COUNTY_NAME = dataTest$COUNTY_NAME,
                        Probs = predict(reg.binomial, dataTest, type= "response"))

testProbs <-
  testProbs %>%
  mutate(predOutcome  = as.factor(ifelse(testProbs$Probs > 0.5 , 1, 0)))
xtab.reg1 <-caret::confusionMatrix(testProbs$predOutcome, testProbs$Outcome,
                       positive = "1")

as.matrix(xtab.reg1) %>% kable(caption = "Confusion Matrix") %>% kable_styling("striped", full_width = T, font_size = 14, position = "left")
Confusion Matrix
0 1
0 274 73
1 27 63
as.matrix(xtab.reg1, what="classes") %>% kable(caption = "Confusion Matrix - Statistics") %>% kable_styling(font_size = 14, full_width = T,
                bootstrap_options = c("striped", "hover"))
Confusion Matrix - Statistics
Sensitivity 0.4632353
Specificity 0.9102990
Pos Pred Value 0.7000000
Neg Pred Value 0.7896254
Precision 0.7000000
Recall 0.4632353
F1 0.5575221
Prevalence 0.3112128
Detection Rate 0.1441648
Detection Prevalence 0.2059497
Balanced Accuracy 0.6867671

Distribution of Predicted Probabilities

The visualization below shows the distribution of outcomes for the model. It is clear that the model is better at predicting 0s (the “no fire occurred” outcome) than predicting 1s (the “at least one fire occurred” outcome).

ggplot(testProbs, aes(x = Probs, fill = as.factor(Outcome))) +
  geom_density() +
  facet_grid(Outcome ~ .) +
  scale_fill_manual(values = palette2) + xlim(0, 1) +
  labs(x = "Fire Occurred or Not (outcome variable)", y = "Density of Probabilities",
       title = "Distribution of Predicted Probabilities by Observed Outcome") +
  plotTheme() + theme(strip.text.x = element_text(size = 18),
        legend.position = "none")

ROC Curve

The Receiver Operating Characteristic (ROC) Curve for the model is shown below. This curve is a helpful goodness of fit indicator, while helping to visualize trade-offs between true positive and false positive metrics at each threshold from 0.01 to 1. A line going “over” the curve indicates a useful fit.

The area under the curve (AUC) here is about 0.78, indicating a fairly useful fit. A reasonable AUC is between 0.5 and 1.

ggplot(testProbs, aes(d = as.numeric(Outcome), m = Probs)) +
  geom_roc(n.cuts = 50, labels = FALSE, colour = "#FE9900") +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
  labs(title = "Receiver Operating Characteristic (ROC) Curve", subtitle = "AUC = ~0.78") + plotTheme()

Cross Validation (CV)

While the ROC indicates a relatively good fit, cross-validation is key to ensure that the model generalizes well across contexts.

A k-fold cross validation (CV) is performed. The factor levels for the outcome variable (“yes” and “no”) are reversed at this time to ensure that the cross-validation reads “yes” before “no” (reverse-alphabetical order) and calculates the metrics appropriately.

ctrl <- trainControl(method = "cv", number = 100, classProbs=TRUE, summaryFunction=twoClassSummary)

#str(fishnet_final$fire_yesno)
fishnet_final$fire_yesno <- as.factor(fishnet_final$fire_yesno)
fishnet_final$fire_yesno <- fct_rev(fishnet_final$fire_yesno)

cvFit.reg1 <- train(fire_yesno ~ .,
                  data=st_drop_geometry(fishnet_final) %>% dplyr::select(-prev_fire, -uniqueID, -n_fires_int),
                method="glm", family="binomial",
                metric="ROC", trControl = ctrl)

CV Results & Goodness of Fit

As with the initial results of the model, when cross-validated, it performs better when predicting negatives than it does predicting positive occurrences of fire. While this may initially seem problematic, the ultimate use case for this information will be the allocation of evacuation preparedness kits. Looking at the benefit for the public good, the “cost” of predicting falsely is not really a cost for the community receiving the kit – preparation materials can be useful in the face of other disasters.

print(cvFit.reg1) %>% kable(caption="Cross-Validation Results") %>% kable_styling(font_size = 14, full_width = T,bootstrap_options = c("striped", "hover"))
## Generalized Linear Model
##
## 1457 samples
##   13 predictor
##    2 classes: 'yes', 'no'
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 1443, 1443, 1443, 1443, 1442, 1443, ...
## Resampling results:
##
##   ROC        Sens    Spec
##   0.8021389  0.4405  0.922
Cross-Validation Results
ROC Sens Spec
0.8021389 0.4405 0.922

These cross-validation goodness of fit metrics are shown in the faceted plots below. These include the mean for area under the curve (“ROC” in this function’s output), sensitivity (“Sens”), and specificity (“Spec”) across all of the 100 folds.

To evaluate the generalizability of the model, we look to the distributions for each of the goodness of fit metrics. Tighter distributions around the means are indicative of greater generalizability. The cross-validated model generalizes well for specificity, meaning that it correctly predicts instances of no fire occurring most cases. On the other hand, it does not generalize as well for sensitivity, or the rate at which it correctly predicts a fire occurrence.

dplyr::select(cvFit.reg1$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(cvFit.reg1$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) +
    geom_histogram(bins=35, fill = "#f7b61f") +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "red", linetype = 3, size = 1.5) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of Fit", y="Count", title="Cross-Validation Goodness of Fit Metrics",
         subtitle = "Across-fold mean reprented as dotted lines")

Costs and Benefits of Identifying False Positives and Negatives

Our model will not have perfect accuracy, and when determining the optimal threshold it is important to consider the costs and benefits of inaccurately predicting data. Keep in mind that the ultimate use case for this data is to help inform where emergency supply kits are allocated. For the sake of this section, let’s make some approximations and assume the following:

  • Emergency supply kits cost $50
  • Predictions are being made at the cell-level
  • Kits are being distributed to every household in a cell
Type of prediction What this means Total cost Total benefit
True Positives We predict a fire and there is one Households in this tract are more likely to be prioritized for a kit, costing $50/household These households will be better prepared for a future evacuation
False Positives We predict a fire and there is not one Households in this tract are more likely to be prioritized for a kit, costing $50/household No immediate benefit, although these kits do not expire after a year and these households will be better prepared for a future evacuation
True Negatives We do not predict a fire and there is not one Households here are less likely to be prioritized for a kit None
False Negatives We do not predict a fire and there is one Households here are less likely to be prioritized for a kit, and these households will be less prepared for a fire and at greater risk for evacuation None

As the table illustrates, optimizing the algorithm for more false positives will incur a greater financial cost and these supply kits may not go to immediate use. Whereas, optimizing for more false negatives is potentially leaving more households underprepared for an evacuation. For that reason, from a social and moral perspective, we believe that the ultimate cost of false positives is less than the cost of false negatives.

This is simply for illustrative purposes. As our final algorithm ends up extrapolating probabilities to the tract-level and then combining this data with the social vulnerability index, we cannot make any quantitative statements about cost-benefit at this time.

Finding the Optimal Threshold

In order to best optimize this model for use, identifying the threshold that provides the greatest accuracy is important. This is done by running the iterateThresholds function and looking at the results for all thresholds the table of thresholds 0.01 through 0.99. The optimal threshold in this case is 0.65.

# iterateThresholds function
iterateThresholds <- function(data, group) {
   group <- enquo(group)
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {

  this_prediction <-
      testProbs %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
         group_by(!!group) %>%
      dplyr::count(predOutcome, Outcome) %>%
      dplyr::summarize(sum_TN = sum(n[predOutcome==0 & Outcome==0]),
                sum_TP = sum(n[predOutcome==1 & Outcome==1]),
                sum_FN = sum(n[predOutcome==0 & Outcome==1]),
                sum_FP = sum(n[predOutcome==1 & Outcome==0]),
            total=sum(n)) %>%
    mutate(True_Positive = sum_TP / total,
         True_Negative = sum_TN / total,
         False_Negative = sum_FN / total,
         False_Positive = sum_FP / total,
         Accuracy = (sum_TP + sum_TN) / total, Threshold = x)

  all_prediction <- rbind(all_prediction, this_prediction)
  x <- x + .01
  }
return(all_prediction)
}
whichThreshold <- iterateThresholds(testProbs)

allThresholds<-kable(whichThreshold) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)%>%
  scroll_box(width = "100%", height = "500px")

allThresholds
sum_TN sum_TP sum_FN sum_FP total True_Positive True_Negative False_Negative False_Positive Accuracy Threshold
9 136 0 292 437 0.3112128 0.0205950 0.0000000 0.6681922 0.3318078 0.01
20 136 0 281 437 0.3112128 0.0457666 0.0000000 0.6430206 0.3569794 0.02
24 136 0 277 437 0.3112128 0.0549199 0.0000000 0.6338673 0.3661327 0.03
34 136 0 267 437 0.3112128 0.0778032 0.0000000 0.6109840 0.3890160 0.04
49 136 0 252 437 0.3112128 0.1121281 0.0000000 0.5766590 0.4233410 0.05
63 133 3 238 437 0.3043478 0.1441648 0.0068650 0.5446224 0.4485126 0.06
77 132 4 224 437 0.3020595 0.1762014 0.0091533 0.5125858 0.4782609 0.07
95 132 4 206 437 0.3020595 0.2173913 0.0091533 0.4713959 0.5194508 0.08
105 131 5 196 437 0.2997712 0.2402746 0.0114416 0.4485126 0.5400458 0.09
113 129 7 188 437 0.2951945 0.2585812 0.0160183 0.4302059 0.5537757 0.10
121 126 10 180 437 0.2883295 0.2768879 0.0228833 0.4118993 0.5652174 0.11
126 124 12 175 437 0.2837529 0.2883295 0.0274600 0.4004577 0.5720824 0.12
133 121 15 168 437 0.2768879 0.3043478 0.0343249 0.3844394 0.5812357 0.13
137 121 15 164 437 0.2768879 0.3135011 0.0343249 0.3752860 0.5903890 0.14
138 119 17 163 437 0.2723112 0.3157895 0.0389016 0.3729977 0.5881007 0.15
141 118 18 160 437 0.2700229 0.3226545 0.0411899 0.3661327 0.5926773 0.16
145 114 22 156 437 0.2608696 0.3318078 0.0503432 0.3569794 0.5926773 0.17
154 113 23 147 437 0.2585812 0.3524027 0.0526316 0.3363844 0.6109840 0.18
157 112 24 144 437 0.2562929 0.3592677 0.0549199 0.3295195 0.6155606 0.19
160 109 27 141 437 0.2494279 0.3661327 0.0617849 0.3226545 0.6155606 0.20
165 103 33 136 437 0.2356979 0.3775744 0.0755149 0.3112128 0.6132723 0.21
171 102 34 130 437 0.2334096 0.3913043 0.0778032 0.2974828 0.6247140 0.22
178 101 35 123 437 0.2311213 0.4073227 0.0800915 0.2814645 0.6384439 0.23
181 99 37 120 437 0.2265446 0.4141876 0.0846682 0.2745995 0.6407323 0.24
187 98 38 114 437 0.2242563 0.4279176 0.0869565 0.2608696 0.6521739 0.25
191 98 38 110 437 0.2242563 0.4370709 0.0869565 0.2517162 0.6613272 0.26
194 95 41 107 437 0.2173913 0.4439359 0.0938215 0.2448513 0.6613272 0.27
198 95 41 103 437 0.2173913 0.4530892 0.0938215 0.2356979 0.6704805 0.28
203 93 43 98 437 0.2128146 0.4645309 0.0983982 0.2242563 0.6773455 0.29
209 90 46 92 437 0.2059497 0.4782609 0.1052632 0.2105263 0.6842105 0.30
211 90 46 90 437 0.2059497 0.4828375 0.1052632 0.2059497 0.6887872 0.31
214 90 46 87 437 0.2059497 0.4897025 0.1052632 0.1990847 0.6956522 0.32
219 89 47 82 437 0.2036613 0.5011442 0.1075515 0.1876430 0.7048055 0.33
223 88 48 78 437 0.2013730 0.5102975 0.1098398 0.1784897 0.7116705 0.34
226 86 50 75 437 0.1967963 0.5171625 0.1144165 0.1716247 0.7139588 0.35
226 86 50 75 437 0.1967963 0.5171625 0.1144165 0.1716247 0.7139588 0.36
232 85 51 69 437 0.1945080 0.5308924 0.1167048 0.1578947 0.7254005 0.37
237 84 52 64 437 0.1922197 0.5423341 0.1189931 0.1464531 0.7345538 0.38
239 83 53 62 437 0.1899314 0.5469108 0.1212815 0.1418764 0.7368421 0.39
243 81 55 58 437 0.1853547 0.5560641 0.1258581 0.1327231 0.7414188 0.40
246 80 56 55 437 0.1830664 0.5629291 0.1281465 0.1258581 0.7459954 0.41
248 75 61 53 437 0.1716247 0.5675057 0.1395881 0.1212815 0.7391304 0.42
250 75 61 51 437 0.1716247 0.5720824 0.1395881 0.1167048 0.7437071 0.43
255 72 64 46 437 0.1647597 0.5835240 0.1464531 0.1052632 0.7482838 0.44
259 70 66 42 437 0.1601831 0.5926773 0.1510297 0.0961098 0.7528604 0.45
264 68 68 37 437 0.1556064 0.6041190 0.1556064 0.0846682 0.7597254 0.46
268 67 69 33 437 0.1533181 0.6132723 0.1578947 0.0755149 0.7665904 0.47
271 66 70 30 437 0.1510297 0.6201373 0.1601831 0.0686499 0.7711670 0.48
274 65 71 27 437 0.1487414 0.6270023 0.1624714 0.0617849 0.7757437 0.49
274 63 73 27 437 0.1441648 0.6270023 0.1670481 0.0617849 0.7711670 0.50
274 62 74 27 437 0.1418764 0.6270023 0.1693364 0.0617849 0.7688787 0.51
274 59 77 27 437 0.1350114 0.6270023 0.1762014 0.0617849 0.7620137 0.52
276 59 77 25 437 0.1350114 0.6315789 0.1762014 0.0572082 0.7665904 0.53
278 56 80 23 437 0.1281465 0.6361556 0.1830664 0.0526316 0.7643021 0.54
278 55 81 23 437 0.1258581 0.6361556 0.1853547 0.0526316 0.7620137 0.55
280 54 82 21 437 0.1235698 0.6407323 0.1876430 0.0480549 0.7643021 0.56
282 53 83 19 437 0.1212815 0.6453089 0.1899314 0.0434783 0.7665904 0.57
284 51 85 17 437 0.1167048 0.6498856 0.1945080 0.0389016 0.7665904 0.58
285 50 86 16 437 0.1144165 0.6521739 0.1967963 0.0366133 0.7665904 0.59
285 48 88 16 437 0.1098398 0.6521739 0.2013730 0.0366133 0.7620137 0.60
287 47 89 14 437 0.1075515 0.6567506 0.2036613 0.0320366 0.7643021 0.61
287 46 90 14 437 0.1052632 0.6567506 0.2059497 0.0320366 0.7620137 0.62
288 46 90 13 437 0.1052632 0.6590389 0.2059497 0.0297483 0.7643021 0.63
290 44 92 11 437 0.1006865 0.6636156 0.2105263 0.0251716 0.7643021 0.64
291 44 92 10 437 0.1006865 0.6659039 0.2105263 0.0228833 0.7665904 0.65
291 44 92 10 437 0.1006865 0.6659039 0.2105263 0.0228833 0.7665904 0.66
291 41 95 10 437 0.0938215 0.6659039 0.2173913 0.0228833 0.7597254 0.67
291 39 97 10 437 0.0892449 0.6659039 0.2219680 0.0228833 0.7551487 0.68
291 39 97 10 437 0.0892449 0.6659039 0.2219680 0.0228833 0.7551487 0.69
291 39 97 10 437 0.0892449 0.6659039 0.2219680 0.0228833 0.7551487 0.70
291 39 97 10 437 0.0892449 0.6659039 0.2219680 0.0228833 0.7551487 0.71
291 38 98 10 437 0.0869565 0.6659039 0.2242563 0.0228833 0.7528604 0.72
292 37 99 9 437 0.0846682 0.6681922 0.2265446 0.0205950 0.7528604 0.73
293 36 100 8 437 0.0823799 0.6704805 0.2288330 0.0183066 0.7528604 0.74
293 32 104 8 437 0.0732265 0.6704805 0.2379863 0.0183066 0.7437071 0.75
294 31 105 7 437 0.0709382 0.6727689 0.2402746 0.0160183 0.7437071 0.76
294 28 108 7 437 0.0640732 0.6727689 0.2471396 0.0160183 0.7368421 0.77
294 27 109 7 437 0.0617849 0.6727689 0.2494279 0.0160183 0.7345538 0.78
294 26 110 7 437 0.0594966 0.6727689 0.2517162 0.0160183 0.7322654 0.79
296 26 110 5 437 0.0594966 0.6773455 0.2517162 0.0114416 0.7368421 0.80
296 24 112 5 437 0.0549199 0.6773455 0.2562929 0.0114416 0.7322654 0.81
297 24 112 4 437 0.0549199 0.6796339 0.2562929 0.0091533 0.7345538 0.82
297 23 113 4 437 0.0526316 0.6796339 0.2585812 0.0091533 0.7322654 0.83
299 21 115 2 437 0.0480549 0.6842105 0.2631579 0.0045767 0.7322654 0.84
299 20 116 2 437 0.0457666 0.6842105 0.2654462 0.0045767 0.7299771 0.85
299 20 116 2 437 0.0457666 0.6842105 0.2654462 0.0045767 0.7299771 0.86
299 20 116 2 437 0.0457666 0.6842105 0.2654462 0.0045767 0.7299771 0.87
299 19 117 2 437 0.0434783 0.6842105 0.2677346 0.0045767 0.7276888 0.88
299 19 117 2 437 0.0434783 0.6842105 0.2677346 0.0045767 0.7276888 0.89
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.90
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.91
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.92
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.93
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.94
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.95
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.96
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.97
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.98
301 19 117 0 437 0.0434783 0.6887872 0.2677346 0.0000000 0.7322654 0.99

How Does Ths Model Perform Across Counties?

To assess how well this model performs across space, goodness of fit metrics are calculated across the five counties.

This visualization below shows the prediction outcomes grouped by county, at the optimal threshold identified (0.65). Accuracy rates are the highest for Los Angeles and San Diego counties. Santa Barbara has the greatest false negative rate, a noticable difference when compared to the rate for the other four counties. Due to its greater accuracy - coupled with the known condition of population density in the urban area - Los Angeles presents an compelling space to dive in further as we develop our application for evacuation kit distribution.

testProbs.thresholds <- iterateThresholds(data=testProbs, group=COUNTY_NAME)

filter(testProbs.thresholds, Threshold == "0.65")%>%
    dplyr::select(Accuracy, COUNTY_NAME, True_Positive, True_Negative, False_Negative, False_Positive) %>%
    gather(Variable, Value, -COUNTY_NAME) %>%
    ggplot(aes(Variable, Value, fill = COUNTY_NAME)) +
      geom_bar(aes(fill = COUNTY_NAME), position = "dodge", stat = "identity") +
      scale_fill_manual(values = paletteInferno5) +
      labs(title="Confusion Matrix Rates by County",
           fill="County",
           subtitle = "65% threshold", x = "Outcome",y = "Rate") +
      plotTheme() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position="right")

Here is an ROC plot showing the goodness of fit for each of the five counties evaluated. It measures trade-offs in trues positive and false positive rates for each threshold. Across all thresholds, this plot conveys that the model performs similarly for Orange and San Diego counties, the two furthest southern ones in the study area. This indicates that there may be some spatial processes at play at these local scales, or some additional features not captured in the model. The model does not perform as well for Santa Barbara (as indicated by the closer “hugging” of the line, and even dipping below the line at some thresholds). This is consistent with the confusion matrix metrics shown in the bars above. It seems that this model is the most accurate when looking at Los Angeles and Ventura.

library(pROC)

ggplot(testProbs, aes(d = as.numeric(Outcome), m = Probs, group=COUNTY_NAME, color=COUNTY_NAME)) +
  geom_roc(n.cuts = 50, labels = F,)+
  scale_color_manual(values = paletteInferno5) +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.25, color = 'grey') +
  labs(title = "ROC Curves, by County", color="County") + plotTheme() + theme(legend.position="right")

Preparing for the Use Case: From Grid Cells to Census Tracts

Now that we have an optimal threshold, we predict out the probabilities and join them back to the fishnet using each cell’s unique ID. Using this data set, we use areal weighting to make estimates from a set of source polygons (the grid cells) to the overlapping, incongruent set of target polygons (census tracts). The census tract geometries are incorporated using the US Census Bureau’s API. Finally, this dataset is exported to incorporate into the final algorithm and application development discussed in the next section.

predict_fishnet <-
  data.frame(uniqueID = fishnet_final$uniqueID,
  Outcome = fishnet_final$prev_fire,
  Probs = predict(reg.binomial, fishnet_final, type="response"))

predict_fishnet <-
  predict_fishnet %>%
  mutate(predOutcome  = as.factor(ifelse(predict_fishnet$Probs > 0.65, 1, 0)))


fishnet_final_preds <- cbind(fishnet_final,predict_fishnet)

Here is a visual representation of what this extrapolation from the grid cells to the census tracts look like for Los Angeles County:

#Code - ran separately, but this shows the process for areal-weighted interpolation
#1. fishnet_final_preds_only <- fishnet_final_preds %>% dplyr::select(uniqueID, Probs, geometry, COUNTY_NAME)

#2. interpolate_probs <-
#  dplyr::select(fishnet_final_preds, Probs) %>%
#  st_interpolate_aw(., tracts17, extensive = FALSE) %>% st_sf()


#3. interpolation_tracts <- st_join(interpolate_probs, tracts17, join=st_equals)

#4. fireProb_tracts <- interpolation_tracts %>% dplyr::select(GEOID, geometry, Probs) %>% st_sf()

#5. fireProb_tracts_nogeom <- interpolation_tracts %>% dplyr::select(GEOID, Probs) %>% st_drop_geometry()

#6. write.csv

CalKit: An Interactive Application Incorporating a Final Algorithm

We save the output from the fire probability model at the census tract level. These probabilities will be the “wildfire risk” component of the final algorithm powering our application for allocating emergency supply kits.

Assessing Population Vulnerability

Not only does our final algorithm take into account wildfire risk, but it also prioritizes census tracts to receive aid if their populations are more “vulnerable” to an emergency - like a wildfire. These populations may be less likely to be able to evacuate or less likely to be able to purchase or own an emergency supply kit on their own.

We initially explored creating our own composite score for vulnerability, but decided to use the reputable CDC Social Vulnerability Index. The CDC SVI uses variables from the US Census Bureau’s American Community Survey. Among its many uses, CDC recommends using the score to “allocate emergency preparedness funding by community need.”

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(dplyr)
palette5 <- c("#f0f9e8","#bae4bc","#7bccc4","#43a2ca","#0868ac")
projection <- "EPSG:6423"
options(scipen = 999)
vulnIndex <- st_read("data/California-SVI/SVI2018_CALIFORNIA_tract.shp")%>%
  filter(COUNTY == 'Los Angeles') %>%
  filter(FIPS != '06037599100') %>%
  filter(FIPS != '06037599000') %>%
  dplyr::select(starts_with("RPL_"),"COUNTY","FIPS","LOCATION","E_TOTPOP") %>%
  mutate_if(is.numeric, list(~na_if(., -999)))%>%
  st_transform(projection) 

The CDC Social Vulnerability Index is made up of 4 sub-themes:

knitr::include_graphics("C:/Users/jenna/Documents/GitHub/MUSA508-final/images/SVI_image.jpg")
CDC SVI Vulnerability Index

CDC SVI Vulnerability Index

We retrieved the data for the Social Vulnerability Index themes for each census tract in Los Angeles County. These plots show the tracts’ percentile rankings for the sub-themes as well as the primary composite score for social vulnerability.

svi_theme1 <- ggplot(vulnIndex) +
  geom_sf(data = st_union(vulnIndex))+
  geom_sf(aes(fill = q5(RPL_THEME1)),lwd = 0) +
  scale_fill_manual(values = palette5,
                    labels = qBr(vulnIndex %>% drop_na(RPL_THEME1), "RPL_THEME1", rnd=FALSE),
                    name = "Percentile") +
  labs(title = "Socioeconomic \nstatus") +
  mapTheme() +
  theme(plot.title = element_text(size=16))

svi_theme2 <- ggplot(vulnIndex) +
  geom_sf(data = st_union(vulnIndex))+
  geom_sf(aes(fill = q5(RPL_THEME2)),lwd = 0) +
  scale_fill_manual(values = palette5,
                    labels = qBr(vulnIndex %>% drop_na(RPL_THEME2), "RPL_THEME2", rnd=FALSE),
                    name = "Percentile") +
  labs(title = "Household composition \n& disability") +
  mapTheme() +
  theme(plot.title = element_text(size=16))

svi_theme3 <- ggplot(vulnIndex) +
  geom_sf(data = st_union(vulnIndex))+
  geom_sf(aes(fill = q5(RPL_THEME3)),lwd = 0) +
  scale_fill_manual(values = palette5,
                    labels = qBr(vulnIndex %>% drop_na(RPL_THEME3), "RPL_THEME3", rnd=FALSE),
                    name = "Percentile") +
  labs(title = "Minority status \n& language") +
  mapTheme() +
  theme(plot.title = element_text(size=16))

svi_theme4 <- ggplot(vulnIndex) +
  geom_sf(data = st_union(vulnIndex))+
  geom_sf(aes(fill = q5(RPL_THEME4)),lwd = 0) +
  scale_fill_manual(values = palette5,
                    labels = qBr(vulnIndex %>% drop_na(RPL_THEME4), "RPL_THEME4", rnd=FALSE),
                    name = "Percentile") +
  labs(title = "Housing type \n& transportation") +
  mapTheme() +
  theme(plot.title = element_text(size=16))

grid.arrange(svi_theme1, svi_theme2, svi_theme3, svi_theme4, nrow=2,ncol=2, top="Percentile Rankings, 2018 CDC Social Vulnerability Index Sub-Themes \n By Census Tract, Los Angeles County")

svi_total <- ggplot(vulnIndex) +
  geom_sf(data = st_union(vulnIndex))+
  geom_sf(aes(fill = q5(RPL_THEMES)),lwd = 0) +
  scale_fill_manual(values = palette5,
                    labels = qBr(vulnIndex %>% drop_na(RPL_THEMES), "RPL_THEMES", rnd=FALSE),
                    name = "Percentile") +
  labs(title = "CDC Social Vulnerability Index Composite", subtitle = "By Census Tract, Los Angeles County", caption="2018 CDC Social Vulnerability Index")+
  mapTheme() +
  theme(plot.title = element_text(size=16))

svi_total

Although the CDC SVI dataset includes population information for each tract, we want to be able to allocate one kit per household. We get household data from the American Community Survey and join it with our social vulnerability index information.

projection <- "EPSG:6423"

#census_api_key("e59695f18b5f5959947fd9098feba458ca285cc5", install=TRUE, overwrite=TRUE)
housingUnitsByTract <- get_acs(geography = "tract", variables = c("B07013_002E"),
            year=2018, state='CA', county=c('Los Angeles'), geometry=T, output="wide") %>%
        st_transform(projection) %>%
        dplyr::rename(HousingUnits = "B07013_002E") %>%
        dplyr::select(-starts_with("B"), -NAME)  %>%
  filter(GEOID != '06037599100') %>%
  filter(GEOID != '06037599000') %>%
  dplyr::rename(FIPS='GEOID') %>% st_drop_geometry()
vulnIndex <- vulnIndex %>% dplyr::left_join(housingUnitsByTract, left=TRUE)

We combine the Social Vulnerability Index with our fire risk for each tract.

fireProbabilities <- read.csv("fireProb_tracts_nogeom.csv") %>%
    dplyr::rename(FIPS='GEOID') %>%
    mutate(FIPS=paste('0',as.character(FIPS),sep="")) %>%
    dplyr::rename(FireProb='Probs')

fireProbabilities <- fireProbabilities %>%
    mutate(FireProbNormalized = (FireProb)/(max(fireProbabilities$FireProb)))

finalDataSet <- vulnIndex %>% dplyr::left_join(fireProbabilities, left=TRUE)

# Exporting this dataset for use with the Shiny App:
# st_write("EmergencyKitDistribution/finalDataSet-forShiny.shp")

The getPrioritizedCensusTracts() function that can be used to invoke our algorithm and output a prioritized list of census tracts. The inputs for the function are:

  1. Total allocated budget (default: $250,000)
  2. Cost of the kit (default: $20)
  3. Percentage of households in each tract to distribute a kit (default: 100%)
  4. Amount of weight to give to fire risk, vs. social vulnerability (default: 50%)

The function returns a dataset of all the Los Angeles County tracts with columns added to indicate whether or not they should be prioritized to receive kits, and how many they should receive.

addComma <- function(num){ format(num, big.mark=",")}

getPrioritizedCensusTracts <- function(budget = 250000, costOfKit = 20, pctOfTract = 1.0, weight = 0.5){

  d <- finalDataSet %>%
    mutate(isReceivingKits = 0) %>%
    mutate(totalKitsToSend = 0) %>%
    mutate(priorityRanking = FireProbNormalized*weight + RPL_THEMES*(1-weight)) %>%
    arrange(desc(priorityRanking))

   numOfKits <- (budget / costOfKit)

  budgetRemaining <- budget
  tract.i <- 1
  totalHousingUnits <- 0

  while(budgetRemaining > costOfKit) {
      housingUnits <- floor(d[tract.i,]$HousingUnits * pctOfTract)
      if (housingUnits == 0) { tract.i <- tract.i+1; next; }
      costForTract <- housingUnits*costOfKit
      if(costForTract <= budgetRemaining) {
        kitsSent <- housingUnits
      } else {
        possibleHousingUnits <- floor(budgetRemaining/costOfKit)
        kitsSent <- possibleHousingUnits
      }
      budgetRemaining <- budgetRemaining - costForTract
      d[tract.i,]$isReceivingKits <- 1
      d[tract.i,]$totalKitsToSend <- kitsSent
      tract.i <- tract.i + 1
  }
  attr(d, "budget") <- budget
  attr(d, "costOfKit") <- costOfKit
  attr(d, "pctOfTract") <- pctOfTract
  attr(d, "weight") <- weight
  return(d)
}

The getServedTractsPlot() function uses the dataframe from getPriotizedCensusTracts() to generate the tracts that can be served.

getServedTractsPlot <- function(df, ttl, st){
  allocationMap <- ggplot(df) +
    geom_sf(data = st_union(df))+
    geom_sf(aes(fill = isReceivingKits),lwd = 0) +
    labs(subtitle = st, title = ttl) +
    mapTheme() +
    theme(plot.title = element_text(size=14)) +
    theme(plot.subtitle = element_text(size=10)) +
    theme(legend.position = "none")

  allocationMap
}

We are now able to see exactly which census tracts are served. Here are a few examples where we are varying the budget and then the percentage of households in a tract that we send a kit. In each instance, the algorithm prioritizes a different set of tracts.

example1 <- getPrioritizedCensusTracts(budget = 250000, costOfKit = 20)
example2 <- getPrioritizedCensusTracts(budget = 2500000, costOfKit = 20)
example3 <- getPrioritizedCensusTracts(budget = 2500000, costOfKit = 20, pctOfTract = 0.5)

example1.plot <- getServedTractsPlot(example1, "Census Tracts Served (ex. 1)", "$250k and $20 kits")
example2.plot <- getServedTractsPlot(example2, "Census Tracts Served (ex. 2)","Increased budget to $2.5m")
example3.plot <- getServedTractsPlot(example3, "Census Tracts Served (ex. 3)","Sending kits to only 50% of households increases the number of served tracts")

grid.arrange(example1.plot, example2.plot, example3.plot, nrow=2)

We can also adjust the weights to see how different tracts would be served if we ONLY prioritized based off the social vulnerability index or ONLY prioritized based off of the probability of a fire occurring.

As these graphics makes clear, by only using wildfire tracts or only using census data, a program manager would be serving very different and distant households.

example4 <- getPrioritizedCensusTracts(budget = 2500000, costOfKit = 20, pctOfTract = 0.5, weight=0)
example5 <- getPrioritizedCensusTracts(budget = 2500000, costOfKit = 20, pctOfTract = 0.5, weight=1)

example4.plot <- getServedTractsPlot(example4, "Census Tracts Served (ex. 4)","Only prioritizing on vulnerability index")
example5.plot <- getServedTractsPlot(example5, "Census Tracts Served (ex. 5)","Only prioritizing on fire probability")

grid.arrange(example4.plot, example5.plot, nrow=1)

The Application

For our Los Angeles County program manager, we have made this algorithm available via CalKit, an interactive application. CalKit can be used at https://bit.ly/508wildfire or below. The user can adjust the inputs for the algorithm and see a live map and table update to highlight the tracts to serve.

Discussion and Future Work

We are excited for the potential of CalKit to strategically distribute emergency supply kits to the highest priority census tracts in Los Angeles County. As seen above, our algorithm helps prioritize a very different group of households compared to just using wildfire risk or just using a social vulnerability index. The interactive application gives a program manager the flexibility to explore this data in more detail to make an informed decision about how to spend their budget.

Reflecting on the fire probability algorithm, we believe that this is an instrumental part of the final product. We have used historical data to create a model that predicts with a high degree of accuracy where fires may occur. The model generalizes reasonably well to most other counties around Los Angeles. In future work, we would want to continue to improve the generalizability of the model, especially for areas like Santa Barbara. We also want to dive deeper into nuances across census tracts, running spatial cross-validation at this level and examining how well the model generalizes for different types of tracts (eg. tracts of lower and higher socioeconomic status).

We believe that CalKit can also be extended to serve other geographic areas, such as neighboring counties with decent generalizability. Perhaps even more exciting is to think about the use cases that it may have for allocating other types of emergency relief – such as financial aid or COVID-19 vaccines.

In short, we believe that this is just the beginning. We hope that 508 Analytics and the CalKit application can help communities make data-informed decisions when preparing for – and responding to – both emergency disasters and everyday problems.

— Jenna Epstein and Jeff Stern, December 2020