Motivation

According to the World Health Organization, floods are the most frequent natural disaster affecting countries across the globe. Floods have the potential to leave communities devastated not only because of the direct loss of life due to drowning and infrastructure damage, but also because of indirect impacts such as increased transmission of disease, higher risk of injury and hypothermia, disrupted or disabled infrastructure systems, and increased likelihood of causing other natural disasters. In the past ten years, climate change has exacerbated the effects of natural disasters like flooding, drought, sea level rise, and extreme precipitation and their frequency and intensity are expected to continue to rise if left unchecked.

City planners have the utmost responsibility to ensure their cities are prepared to respond to such natural disasters and minimize harm to their most vulnerable communities. The following document outlines a machine learning algorithm that predicts which areas of a city are at the highest risk of flooding disasters. An Emergency Management team can use the resulting model of this algorithm to inform their preparedness tactics to minimize damage and respond to future flooding disasters more efficiently. Ultimately, the algorithm is a helpful tool for individuals in Public Works, Public Health, City Planning, and Community-Based Organizations to proactively plan for resilience.

The model is created using past flood data from the City of Calgary in Canada to measure accuracy and then is applied to the City of Denver in the United States to measure generalizability. This document explains each step of the process such that City Planning departments can input their city’s data and accurately interpret the resulting model. We also wanted to provide transparency into the feature engineering steps of the process, which is why the procedures in ArcGIS and R are described in detail.

For audiences who would like the main takeaways, we invite you to view this three-page summary document.

Setup

The algorithm was created using ArcGIS Pro and R. First, the necessary R libraries are loaded and themes are defined for the maps and plots that will be displayed later in the process.

# SETUP

# load libraries
library(caret)
library(pscl)
library(plotROC)
library(pROC)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(FNN)
library(scales)
library(jtools)
library(viridis)
library(gridExtra)


# load themes and palettes
mapTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.text.x = element_text(size = 14))
}

plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=12),
    axis.title = element_text(size=12),
    axis.text = element_text(size=10),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "italic"),
    legend.text = element_text(colour = "black", face = "italic"),
    strip.text.x = element_text(size = 14)
  )
}

Calgary Data

Initial Data

The fishnet was generated in ArcGIS Pro using the Create Fishnet tool in order to generate a grid of the appropriate extent with 200 meter by 200 meter cells as the unit of measurement. This cell size was determined using the overall extent of the city boundary and the size of the resulting dataset. Cell size will vary depending on the city of focus and the computing power of the machine being used.

The inundation raster data was also generated in ArcGIS Pro using the Reclassify tool to indicate pixels within the flooding extent and pixels outside of the flooding extent. Inundated pixels are assigned a value of 1 and not inundated pixels are assigned a value of 0. Zonal Statistics were then used to calculate the maximum level of flooding per fishnet grid cell in order to visualize and identify which grid cells are at the highest risk of flooding.

# CALGARY: Load Calgary fishnet generated in ArcGIS Pro (200x200m cells).
calgary_fishnet <- st_read("Data_Calgary/Calgary_new/for_R/calgary_fishnet.shp") %>%
st_as_sf() %>% st_transform('EPSG:3780') %>% rename(ID_FISHNET = ID_fishnet)

calgary_fishnet$ID_FISHNET <-as.character(calgary_fishnet$ID_FISHNET)

calgary_boundary <- st_read("provided_midTermProject_Data/CALGIS_CITYBOUND_LIMIT/CALGIS_CITYBOUND_LIMIT.shp")
calgary_boundary <- calgary_boundary %>% st_transform('EPSG:3780')

#ggplot() +
#  geom_sf(data=calgary_boundary, fill="grey") +
#  geom_sf(data=calgary_fishnet, alpha=0.2) + mapTheme()

Feature Engineering

Next, data for features that contribute to flooding and flood risk is loaded into ArcGIS Pro for calculations. The following sections describe this process in brief and explains why they are included in the model or not. Once the calculations are complete, the resulting data tables are exported from ArcGIS Pro and loaded into R to inform the model. We also used a select-by-centroids approach to select fishnet grid cells that have centroids situated within the 2013 flood extent.

Knowledge about the characteristics of the land such as topography, land cover, and hydrology can indicate how susceptible the area is to floods.

# CALGARY: Inundation classification (generated by reclassifying the initial inundation raster, and calculated using zonal statistics (maximum)).
calgary_inundation <- st_read("Data_Calgary/Calgary_new/for_R/cal_inun_max.xls")  %>%
  dplyr::select(ID_FISHNET, MAX) %>%
  #dplyr::rename(ID_fishnet = FID) %>%
  dplyr::rename(inundation = MAX)

calgary_inundation$ID_FISHNET <-as.character(calgary_inundation$ID_FISHNET)

calgary_inundation_grid <- left_join(calgary_inundation, calgary_fishnet)
calgary_fishnet_new <-  calgary_inundation_grid

# CALGARY: Elevation mean (calculated using zonal statistics (mean)).
calgary_elev <- st_read("Data_Calgary/Calgary_new/for_R/cal_elev_mean.xls") %>%
  dplyr::select(ID_FISHNET, MEAN) %>%
  dplyr::rename(elevation_mean = MEAN)

calgary_elev$ID_FISHNET <-as.character(calgary_elev$ID_FISHNET)

calgary_elev_grid <- full_join(calgary_elev, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <- calgary_elev_grid

# CALGARY: Stream network, generated using the ArcHydro workflow from the week4 lab in ArcGIS Pro.
calgary_streams <- st_read("Data_Calgary/Calgary_new/for_R/Hydrology/cal_stream.shp")
calgary_streams_neardist <- st_read("Data_Calgary/Calgary_new/for_R/cal_stream_neardist.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_stream = NEAR_DIST)

## used the generate near table in ArcGIS to calculate from fishnet to stream
calgary_streams_neardist$ID_FISHNET <-as.character(calgary_streams_neardist$ID_FISHNET)

calgary_streamneardist_grid <- full_join(calgary_streams_neardist, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <-  calgary_streamneardist_grid

# CALGARY: flow accumulation max per fishnet cell. Generated using zonal statistics (maximum).
calgary_flowacc <- st_read("Data_Calgary/Calgary_new/for_R/cal_flowacc_max.xls") %>%
  dplyr::rename(flowacc_max = MAX) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_flowacc$ID_FISHNET <-as.character(calgary_flowacc$ID_FISHNET)

calgary_flowacc_grid <- full_join(calgary_flowacc, calgary_fishnet_new) %>%st_as_sf()
calgary_fishnet_new <- calgary_flowacc_grid


# CALGARY: Parks distance - from open data calgary. Used arcgis pro to create a near distance table.
calgary_parks_neardist <- st_read("Data_Calgary/Calgary_new/for_R/cal_parksfeat_near.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_parks = NEAR_DIST)

calgary_parks_neardist$ID_FISHNET <-as.character(calgary_parks_neardist$ID_FISHNET)

calgary_parks_neardist_grid <- full_join(calgary_parks_neardist, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <-  calgary_parks_neardist_grid


# CALGARY: Slope max per fishnet cell. Generated using zonal statistics (max). Ended up not using this in the model, but still explored in feature engineering.

calgary_slope <- st_read("Data_Calgary/Calgary_new/for_R/cal_slope_max.xls") %>%
  dplyr::rename(slope_max = MAX) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_slope$ID_FISHNET <-as.character(calgary_slope$ID_FISHNET)

calgary_slope_grid <- full_join(calgary_slope, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <- calgary_slope_grid

# CALGARY: Distance to steep slopes per fishnet cell. Generated using near table, near distance.
calgary_steepslope_dist <- st_read("Data_Calgary/Calgary_new/for_R/cal_steepslope_dist.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(steepslope_dist = NEAR_DIST)

calgary_steepslope_dist$ID_FISHNET <-as.character(calgary_steepslope_dist$ID_FISHNET)

calgary_steepslope_dist_grid <- full_join(calgary_steepslope_dist, calgary_fishnet_new) %>%st_as_sf()
calgary_fishnet_new <- calgary_steepslope_dist_grid

# CALGARY: Intersections with hydro features
#shapefile import
calgary_hydro <- st_read("Data_Calgary/Calgary_new/for_R/Hydrology/hydrology_features.shp") %>%
  st_transform(st_crs(calgary_fishnet_new))

#visualize hydrology features - more extensive than what the stream network from ArcGIS hydrology workflow. so it is streams + major hydrology features like lakes and ponds, etc.
#ggplot() +
#  geom_sf(data=calgary_boundary, fill="grey") +
#  #geom_sf(data=calgary_fishnet, alpha=0.2) +
#  geom_sf(data=calgary_hydro, color="blue") +mapTheme()

calgary_fishnet_new <- calgary_fishnet_new %>% mutate(n_hydro_int = lengths(st_intersects(calgary_fishnet_new, calgary_hydro)))

# CALGARY: majority land cover type per fishnet cell. Reclassified the data by land cover impervious and not impervious, converted it to a raster, and used zonal statistics (sum) for pixels. Not used in the model ultimately, but still explored in feature engineering process - worthwhile to discuss.

calgary_impervious_sum <- st_read("Data_Calgary/Calgary_new/for_R/cal_lcimp_sumt.xls") %>%
  dplyr::rename(imperv_pixels_sum = SUM) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_impervious_sum$ID_FISHNET <-as.character(calgary_impervious_sum$ID_FISHNET)
calgary_impervious_sum_grid <- full_join(calgary_impervious_sum, calgary_fishnet_new) %>% st_as_sf()
calgary_impervious_sum_grid[is.na(calgary_impervious_sum_grid)] = 0
calgary_fishnet_new <- calgary_impervious_sum_grid
# Centroid-in-polygon join to see which cells have their centroid in the calgary boundary
calgary_selectCentroids <-
  st_centroid(calgary_fishnet_new)[calgary_boundary,] %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(calgary_fishnet_new, ID_FISHNET)) %>%
  st_sf()

calgary_fishnet_final <- calgary_selectCentroids
calgary_fishnet_final <- calgary_fishnet_final %>%
  dplyr::select(-Shape_Leng, -Shape_Area)

calgary_fishnet_final$inundation <-as.numeric(as.character(calgary_fishnet_final$inundation))
calgary_fishnet_final$slope_max <-as.numeric(as.character(calgary_fishnet_final$slope_max))
calgary_fishnet_final$flowacc_max <-as.numeric(as.character(calgary_fishnet_final$flowacc_max))
calgary_fishnet_final$elevation_mean <-as.numeric(as.character(calgary_fishnet_final$elevation_mean))
calgary_fishnet_final$imperv_pixels_sum <-as.numeric(as.character(calgary_fishnet_final$imperv_pixels_sum))
calgary_fishnet_final$dist_stream <-as.numeric(as.character(calgary_fishnet_final$dist_stream))
calgary_fishnet_final$n_hydro_int <-as.numeric(as.character(calgary_fishnet_final$n_hydro_int))
calgary_fishnet_final$dist_parks <-as.numeric(as.character(calgary_fishnet_final$dist_parks))
calgary_fishnet_final$steepslope_dist <-as.numeric(as.character(calgary_fishnet_final$steepslope_dist))

Significant Features

Mean Elevation

Elevation generally describes the topography of the land. Areas at lower levels of elevation are at higher risk for floods than areas at higher elevations. We explored calculating the lowest degree of elevation per fishnet grid cell in Calgary, but found that using the mean helped the performance of our model. Mean elevation is stored in the elevation_mean feature variable, measured in meters.

To calculate our elevation_mean feature variable, a Digital Elevation Model (DEM) was loaded into ArcGIS. Zonal Statistics were used to calculate the mean per fishnet grid cell.

map.elevation <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = elevation_mean), color=NA) +
  scale_fill_viridis()+
  labs(title = "Elevation \n(Mean)", fill="Elevation \n(meters)") + mapTheme()

Distance to Nearest Stream

Understanding the structure of streams helps inform the intensity of potential flooding. If the water level of the stream exceeds the stream channel, then flooding occurs. Areas that are closer to streams are more vulnerable to floods than areas that are farther away.

The hydrology tools in ArcGIS Pro are used to turn the Calgary DEM into a stream network. First, the Fill tool was used to fill any sinks. The resulting surface is input in the Flow Direction tool to generate a raster showing the direction of flow out of each pixel. Direction is used to calculate Flow Accumulation per pixel - this step is described further in the next feature section. Assuming a drainage threshold of 25 square kilometers, Set Null is used to calculate the number of pixels that represent a drainage of this area or more, assign these cells a value of 1 and all other pixels a value of NODATA. The resulting Calgary stream network is displayed below. Lastly, the Generate Near Table tool is used to calculate the distance of each fishnet grid cell to a stream, stored as the dist_stream feature variable, measured in meters.

#visualize streams
map.streams <- ggplot() +
  geom_sf(data=calgary_boundary, fill="grey") +
  #geom_sf(data=calgary_fishnet, alpha=0.2) +
  geom_sf(data=calgary_streams, color="blue") + labs(title="Stream Network in Calgary") + mapTheme()

#distance to streams
map.diststreams <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = dist_stream), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to Nearest \nStream", fill="Distance \n(meters)") + mapTheme()

Maximum Flow Accumulation

Flooding occurs when water levels exceed the stream channel. Based on the direction of the streams calculated above, Flow Accumulation tool is used to calculate the accumulated weight of all pixels flowing into each downslope pixel in the raster. Zonal Statistics (maximum) is used to calculate the maximum flow accumulation per each fishnet grid cell. Using the maximum value allows us to evaluate at the highest risk level. In other words, the resulting model will present a “worst-case-scenario” picture enabling planners to be over-prepared rather than under-prepared. This feature variable is called flow_maxacc.

map.flowaccmax <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = flowacc_max), color=NA) +
  scale_fill_viridis(breaks = c(400000, 800000, 1200000), labels = c("400K", "800K", "1200K"))+
  labs(title = "Flow Accumulation \n(Maximum)", fill="Pixels \n") + mapTheme()

Distance to Nearest Steep Slope

Steep slopes in the land can contribute to both flow direction and flow accumulation. Additionally, a steeper slope likely accelerates the speed of flow and can intensify in the case of a flood. Thus, it is important to be aware of the location of steep slopes in the topography and their relative distance to past floods.

Slopes can be calculated using the Slope tool in ArcGIS. To distinguish which slopes are steep, Reclassify is used to assign slopes greater than or equal to 10 a value of 1 and all other slopes a value of NODATA. These steep slopes are converted from raster data to polygon features using the Raster to Polygon tool. Then, Near can be used once again to calculate the distance of each fishnet grid cell to a steep slope. this is imported into R as the dist_steepslope feature variable.

map.steepslopedist <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = steepslope_dist), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to \nNearest Steep Slope", fill="Distance \n(meters)") + mapTheme()

Distance to Nearest Park

Land cover or soil type can be indicative of areas prone to flooding. The more permeable the surface is, the more risky it is for floods. Initially, land cover data for Calgary was used to identify which fishnet grid cells were impervious based on their land cover type; however, this did not help the model perform well. Instead, distance to parks is incorporated into the model. Parks are generally covered in grass which is fairly permeable. Thus, the closer a fishnet grid cell is to a park, the more likely it is to become inundated.

After bringing in the complete parks dataset in Calgary from the city’s open data portal, the Near tool in ArcGIS is used to perform this calculation for the distance to parks feature variable.

#distance to parks
map.distparks <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = dist_parks), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to \nNearest Park", fill="Distance \n(meters)") + mapTheme()

Number of Hydrological Feature Intersections

In addition to mere distance to stream lines, the frequency of intersections with hydrological features could indicate damper soils and areas at greater risk of inundation. To calculate the number of intersections with existing hydrological features per fishnet grid cell, the lengths(st_intersects) functionality in R is used to create a count feature variable, n_hydro_int.

#number of hydrology feature intersections
map.hydroints <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = n_hydro_int), color=NA) +
  scale_fill_viridis()+
  labs(title = "Number of \nHydrology Intersections", fill="Number \n") + mapTheme()

Maps of All Significant Features included in the Model

grid.arrange(map.elevation, map.diststreams, map.flowaccmax, map.steepslopedist, map.distparks, map.hydroints,
             ncol=3,
             top = "")

Plotting Features per Inundation Outcome

These six selected significant features are shown below in the plots to show differences in across fishnet grid cells that flooded and those that did not, according to the classification from the 2013 inundation extent.

inundationPlotVariables <-
  calgary_fishnet_final %>% st_drop_geometry()

# plotting independent variables for elevation_mean and slope_max
calgary_fishnet_variables <- inundationPlotVariables %>%
  dplyr::select(inundation, elevation_mean, steepslope_dist, dist_stream, flowacc_max, n_hydro_int, dist_parks) %>%
  gather(key, value, elevation_mean:dist_parks)%>%
   mutate(value = ifelse(key == "flowacc_max", value/10, value)) %>%
  mutate(value=ifelse(key=="n_hydro_int", value*10000, value))


ggplot(calgary_fishnet_variables, aes(as.factor(inundation), value, fill=as.factor(inundation))) +
    geom_bar(stat = "identity")+
    facet_wrap(~key) +
    scale_y_continuous(labels = comma) +
  scale_fill_manual(values = c("darkgreen", "navy"),
                    labels = c("Not Flooded","Flooded"),
                    name = "")   + labs(x="Inundation: Flooding or Not?", y="Value", title= "Independent Variables used in the Model", caption="\nNumber of hydro intersections scaled up by 10,000 \nFlow Accumulation (max) scaled down by 10") + plotTheme()

Model Development & Testing on Calgary

Now that the features have been prepared, we can begin to build the predictive model. First, we split up our known data set into a training and a test set.

Iterative Development Process

A linear regression is performed on the training set to determine the correlation between the set of selected features for the training data and the inundation data for the training data. This process was repeated multiple times to determine which combination of engineered features performed the best. For instance, recall from the Distance to Parks section that an impervious surfaces feature had been engineered but ultimately was not included in the final model. Reviewing the results from multiple regressions helped us determine which combination of features would provide the best results. The results of the final model are displayed and discussed below.

set.seed(3456)
trainIndex <- createDataPartition(calgary_fishnet_final$inundation, p = .70,
                                  list = FALSE,
                                  times = 1)
calgaryTrain <- calgary_fishnet_final[ trainIndex,]
calgaryTest  <- calgary_fishnet_final[-trainIndex,]
inundationModel <- glm(inundation ~ ., family="binomial"(link="logit"),
                  data=(calgaryTrain) %>% as.data.frame() %>% dplyr::select(-ID_FISHNET, -geometry, -slope_max, -imperv_pixels_sum))
 print(inundationModel)

Results

Model Summary

The output displays various statistics that indicate the quality of the model. For instance, the p-value listed for each feature describes how significant the feature is to the outcome of whether or not a flood occurs. All of these features are statistically significant as their p-values are <0.05. Additionally, measures like the Pseudo-R^2 and the AIC score describe how closely related our features are to the outcome. This is the point at which features can be removed, added, or adjusted within the model to enhance performance.

summ(inundationModel)
Observations 12977
Dependent variable inundation
Type Generalized linear model
Family binomial
Link logit
χ²(6) 4779.80
Pseudo-R² (Cragg-Uhler) 0.64
Pseudo-R² (McFadden) 0.56
AIC 3812.46
BIC 3864.75
Est. S.E. z val. p
(Intercept) 37.84 1.89 20.06 0.00
steepslope_dist -0.00 0.00 -20.60 0.00
dist_parks -0.00 0.00 -2.95 0.00
flowacc_max 0.00 0.00 5.49 0.00
dist_stream -0.00 0.00 -2.50 0.01
elevation_mean -0.04 0.00 -20.64 0.00
n_hydro_int 3.14 0.10 32.37 0.00
Standard errors: MLE

Histogram of classProbs

Once the features have been finalized, we can run the regression model on our test data to further examine its accuracy. The results are displayed in the histogram below showing the frequency of each probability level that a flood will occur.

classProbs <- predict(inundationModel, calgaryTest, type="response")

hist((classProbs), main = paste("Histogram of classProbs"), col = "blue", xlab = "Inundation Probability") + plotTheme()

Distribution of Probabilities Visualization

The visualization below shows the distribution of outcomes for the model. It is clear that the model is better at predicting 0s (no inundation) than predicting 1s (inundation).

testProbs <- data.frame(obs = as.numeric(calgaryTest$inundation),
                        pred = classProbs,
                        ID_FISHNET = calgaryTest$ID_FISHNET)

ggplot(testProbs, aes(x = pred, fill=as.factor(obs))) + geom_density() +
  facet_grid(obs ~ .) + xlab("Probability") + geom_vline(xintercept = .38) +
  scale_fill_manual(values = c("darkgreen", "navy"),
                      labels = c("Not Flooded","Flooded"),
                                 name="") +
                      labs(title = "Distribution of Probabilities") + plotTheme()

Finding the Optimal Threshold

In order to best optimize this model for use, identifying the threshold that provides the greatest accuracy is important. This is done by running the iterateThresholds function and looking at the results for all thresholds the table of thresholds 0.01 through 0.99. The optimal threshold in this case is 0.38.

# iterateThresholds function
iterateThresholds <- function(data, group) {
   group <- enquo(group)
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {

  this_prediction <-
      testProbs %>%
      mutate(predOutcome = ifelse(pred > x, 1, 0)) %>%
         group_by(!!group) %>%
      dplyr::count(predOutcome, obs) %>%
      dplyr::summarize(sum_TN = sum(n[predOutcome==0 & obs==0]),
                sum_TP = sum(n[predOutcome==1 & obs==1]),
                sum_FN = sum(n[predOutcome==0 & obs==1]),
                sum_FP = sum(n[predOutcome==1 & obs==0]),
            total=sum(n)) %>%
    mutate(True_Positive = sum_TP / total,
         True_Negative = sum_TN / total,
         False_Negative = sum_FN / total,
         False_Positive = sum_FP / total,
         Accuracy = (sum_TP + sum_TN) / total, Threshold = x)

  all_prediction <- rbind(all_prediction, this_prediction)
  x <- x + .01
  }
return(all_prediction)
}
whichThreshold <- iterateThresholds(testProbs)

allThresholds<-kable(whichThreshold) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)%>%
  scroll_box(width = "100%", height = "500px")

allThresholds
sum_TN sum_TP sum_FN sum_FP total True_Positive True_Negative False_Negative False_Positive Accuracy Threshold
2786 580 12 2183 5561 0.1042978 0.5009890 0.0021579 0.3925553 0.6052868 0.01
3287 574 18 1682 5561 0.1032188 0.5910807 0.0032368 0.3024636 0.6942996 0.02
3576 570 22 1393 5561 0.1024996 0.6430498 0.0039561 0.2504945 0.7455494 0.03
3797 560 32 1172 5561 0.1007013 0.6827909 0.0057544 0.2107535 0.7834922 0.04
3971 556 36 998 5561 0.0999820 0.7140802 0.0064737 0.1794641 0.8140622 0.05
4080 555 37 889 5561 0.0998022 0.7336810 0.0066535 0.1598633 0.8334832 0.06
4176 552 40 793 5561 0.0992627 0.7509441 0.0071930 0.1426003 0.8502068 0.07
4245 547 45 724 5561 0.0983636 0.7633519 0.0080921 0.1301924 0.8617155 0.08
4308 543 49 661 5561 0.0976443 0.7746808 0.0088114 0.1188635 0.8723251 0.09
4368 533 59 601 5561 0.0958461 0.7854702 0.0106096 0.1080741 0.8813163 0.10
4418 529 63 551 5561 0.0951268 0.7944614 0.0113289 0.0990829 0.8895882 0.11
4486 521 71 483 5561 0.0936882 0.8066894 0.0127675 0.0868549 0.9003776 0.12
4530 513 79 439 5561 0.0922496 0.8146017 0.0142061 0.0789426 0.9068513 0.13
4572 507 85 397 5561 0.0911707 0.8221543 0.0152850 0.0713900 0.9133249 0.14
4608 501 91 361 5561 0.0900917 0.8286279 0.0163640 0.0649164 0.9187197 0.15
4644 500 92 325 5561 0.0899119 0.8351016 0.0165438 0.0584427 0.9250135 0.16
4667 494 98 302 5561 0.0888329 0.8392375 0.0176227 0.0543068 0.9280705 0.17
4695 490 102 274 5561 0.0881136 0.8442726 0.0183420 0.0492717 0.9323863 0.18
4719 488 104 250 5561 0.0877540 0.8485884 0.0187017 0.0449559 0.9363424 0.19
4733 484 108 236 5561 0.0870347 0.8511059 0.0194210 0.0424384 0.9381406 0.20
4744 482 110 225 5561 0.0866751 0.8530840 0.0197806 0.0404603 0.9397590 0.21
4759 479 113 210 5561 0.0861356 0.8557813 0.0203201 0.0377630 0.9419169 0.22
4769 477 115 200 5561 0.0857759 0.8575796 0.0206797 0.0359648 0.9433555 0.23
4779 470 122 190 5561 0.0845172 0.8593778 0.0219385 0.0341665 0.9438950 0.24
4790 468 124 179 5561 0.0841575 0.8613559 0.0222981 0.0321885 0.9455134 0.25
4803 467 125 166 5561 0.0839777 0.8636936 0.0224780 0.0298507 0.9476713 0.26
4815 463 129 154 5561 0.0832584 0.8658515 0.0231973 0.0276929 0.9491099 0.27
4824 461 131 145 5561 0.0828988 0.8674699 0.0235569 0.0260744 0.9503686 0.28
4831 454 138 138 5561 0.0816400 0.8687286 0.0248157 0.0248157 0.9503686 0.29
4839 454 138 130 5561 0.0816400 0.8701672 0.0248157 0.0233771 0.9518072 0.30
4848 450 142 121 5561 0.0809207 0.8717857 0.0255350 0.0217587 0.9527063 0.31
4854 446 146 115 5561 0.0802014 0.8728646 0.0262543 0.0206797 0.9530660 0.32
4858 442 150 111 5561 0.0794821 0.8735839 0.0269736 0.0199604 0.9530660 0.33
4862 437 155 107 5561 0.0785830 0.8743032 0.0278727 0.0192411 0.9528862 0.34
4864 433 159 105 5561 0.0778637 0.8746628 0.0285920 0.0188815 0.9525265 0.35
4866 429 163 103 5561 0.0771444 0.8750225 0.0293113 0.0185218 0.9521669 0.36
4869 426 166 100 5561 0.0766049 0.8755619 0.0298507 0.0179824 0.9521669 0.37
4873 425 167 96 5561 0.0764251 0.8762812 0.0300306 0.0172631 0.9527063 0.38
4876 421 171 93 5561 0.0757058 0.8768207 0.0307499 0.0167236 0.9525265 0.39
4876 420 172 93 5561 0.0755260 0.8768207 0.0309297 0.0167236 0.9523467 0.40
4876 417 175 93 5561 0.0749865 0.8768207 0.0314692 0.0167236 0.9518072 0.41
4878 413 179 91 5561 0.0742672 0.8771804 0.0321885 0.0163640 0.9514476 0.42
4878 410 182 91 5561 0.0737277 0.8771804 0.0327279 0.0163640 0.9509081 0.43
4879 405 187 90 5561 0.0728286 0.8773602 0.0336270 0.0161841 0.9501888 0.44
4881 401 191 88 5561 0.0721093 0.8777198 0.0343463 0.0158245 0.9498292 0.45
4883 398 194 86 5561 0.0715699 0.8780795 0.0348858 0.0154648 0.9496493 0.46
4886 390 202 83 5561 0.0701313 0.8786190 0.0363244 0.0149254 0.9487502 0.47
4890 382 210 79 5561 0.0686927 0.8793382 0.0377630 0.0142061 0.9480309 0.48
4892 377 215 77 5561 0.0677936 0.8796979 0.0386621 0.0138464 0.9474915 0.49
4893 372 220 76 5561 0.0668944 0.8798777 0.0395612 0.0136666 0.9467722 0.50
4896 368 224 73 5561 0.0661751 0.8804172 0.0402805 0.0131271 0.9465923 0.51
4897 365 227 72 5561 0.0656357 0.8805970 0.0408200 0.0129473 0.9462327 0.52
4898 355 237 71 5561 0.0638374 0.8807768 0.0426182 0.0127675 0.9446143 0.53
4899 348 244 70 5561 0.0625787 0.8809567 0.0438770 0.0125877 0.9435353 0.54
4903 341 251 66 5561 0.0613199 0.8816760 0.0451358 0.0118684 0.9429959 0.55
4904 337 255 65 5561 0.0606006 0.8818558 0.0458551 0.0116885 0.9424564 0.56
4906 333 259 63 5561 0.0598813 0.8822154 0.0465744 0.0113289 0.9420967 0.57
4906 331 261 63 5561 0.0595217 0.8822154 0.0469340 0.0113289 0.9417371 0.58
4909 329 263 60 5561 0.0591620 0.8827549 0.0472937 0.0107894 0.9419169 0.59
4913 326 266 56 5561 0.0586225 0.8834742 0.0478331 0.0100701 0.9420967 0.60
4917 321 271 52 5561 0.0577234 0.8841935 0.0487322 0.0093508 0.9419169 0.61
4917 314 278 52 5561 0.0564647 0.8841935 0.0499910 0.0093508 0.9406582 0.62
4917 311 281 52 5561 0.0559252 0.8841935 0.0505305 0.0093508 0.9401187 0.63
4918 306 286 51 5561 0.0550261 0.8843733 0.0514296 0.0091710 0.9393994 0.64
4920 301 291 49 5561 0.0541270 0.8847330 0.0523287 0.0088114 0.9388599 0.65
4920 299 293 49 5561 0.0537673 0.8847330 0.0526884 0.0088114 0.9385003 0.66
4923 291 301 46 5561 0.0523287 0.8852724 0.0541270 0.0082719 0.9376012 0.67
4923 289 303 46 5561 0.0519691 0.8852724 0.0544866 0.0082719 0.9372415 0.68
4924 287 305 45 5561 0.0516094 0.8854523 0.0548463 0.0080921 0.9370617 0.69
4924 285 307 45 5561 0.0512498 0.8854523 0.0552059 0.0080921 0.9367020 0.70
4924 282 310 45 5561 0.0507103 0.8854523 0.0557454 0.0080921 0.9361626 0.71
4927 279 313 42 5561 0.0501708 0.8859917 0.0562848 0.0075526 0.9361626 0.72
4931 271 321 38 5561 0.0487322 0.8867110 0.0577234 0.0068333 0.9354433 0.73
4932 268 324 37 5561 0.0481928 0.8868908 0.0582629 0.0066535 0.9350836 0.74
4933 262 330 36 5561 0.0471138 0.8870707 0.0593418 0.0064737 0.9341845 0.75
4934 257 335 35 5561 0.0462147 0.8872505 0.0602410 0.0062938 0.9334652 0.76
4935 253 339 34 5561 0.0454954 0.8874303 0.0609603 0.0061140 0.9329257 0.77
4936 250 342 33 5561 0.0449559 0.8876101 0.0614997 0.0059342 0.9325661 0.78
4937 248 344 32 5561 0.0445963 0.8877900 0.0618594 0.0057544 0.9323863 0.79
4939 244 348 30 5561 0.0438770 0.8881496 0.0625787 0.0053947 0.9320266 0.80
4939 235 357 30 5561 0.0422586 0.8881496 0.0641971 0.0053947 0.9304082 0.81
4940 230 362 29 5561 0.0413595 0.8883294 0.0650962 0.0052149 0.9296889 0.82
4942 225 367 27 5561 0.0404603 0.8886891 0.0659953 0.0048552 0.9291494 0.83
4942 221 371 27 5561 0.0397411 0.8886891 0.0667146 0.0048552 0.9284301 0.84
4942 213 379 27 5561 0.0383025 0.8886891 0.0681532 0.0048552 0.9269915 0.85
4942 210 382 27 5561 0.0377630 0.8886891 0.0686927 0.0048552 0.9264521 0.86
4943 201 391 26 5561 0.0361446 0.8888689 0.0703111 0.0046754 0.9250135 0.87
4943 195 397 26 5561 0.0350656 0.8888689 0.0713900 0.0046754 0.9239345 0.88
4945 184 408 24 5561 0.0330876 0.8892286 0.0733681 0.0043158 0.9223161 0.89
4945 174 418 24 5561 0.0312893 0.8892286 0.0751663 0.0043158 0.9205179 0.90
4946 164 428 23 5561 0.0294911 0.8894084 0.0769646 0.0041359 0.9188995 0.91
4947 160 432 22 5561 0.0287718 0.8895882 0.0776839 0.0039561 0.9183600 0.92
4947 153 439 22 5561 0.0275130 0.8895882 0.0789426 0.0039561 0.9171012 0.93
4948 137 455 21 5561 0.0246359 0.8897680 0.0818198 0.0037763 0.9144039 0.94
4951 121 471 18 5561 0.0217587 0.8903075 0.0846970 0.0032368 0.9120662 0.95
4956 110 482 13 5561 0.0197806 0.8912066 0.0866751 0.0023377 0.9109872 0.96
4961 98 494 8 5561 0.0176227 0.8921057 0.0888329 0.0014386 0.9097285 0.97
4962 82 510 7 5561 0.0147455 0.8922856 0.0917101 0.0012588 0.9070311 0.98
4963 72 520 6 5561 0.0129473 0.8924654 0.0935084 0.0010789 0.9054127 0.99
testProbs$predClass  = ifelse(testProbs$pred > .38 ,1,0)

xtab.regCalgary <- caret::confusionMatrix(reference = as.factor(testProbs$obs),
                       data = as.factor(testProbs$predClass),
                       positive = "1")

Confusion Matrix

There are four possible scenarios that will result from the use of this model: 1. Model predicts inundation and there is inundation (True Positive) 2. Model predicts no inundation and there is no inundation (True Negative) 3. Model predicts inundation and there is no inundation (False Positive) 4. Model predicts no inundation and there is inundation (False Negative)

The results of each scenario at the optimal threshold for accuracy (0.38) is displayed in the table below (with the confusion matrix of 0 and 1 directly below the table). These results support our above observation that the model predicts well for areas of no inundation than areas of inundation.

Type of Prediction Description Results
True Positive (TP) Model predicts inundation and there is inundation 425
True Negative (TN) Model predicts no inundation and there is no inundation 4873
False Positive (FP) Model predicts inundation and there is no inundation 167
False Negative (FN) Model predicts no inundation and there is inundation 96
as.matrix(xtab.regCalgary) %>% kable(caption = "Confusion Matrix") %>% kable_styling("striped", full_width = T, font_size = 14, position = "left")
Confusion Matrix
0 1
0 4873 167
1 96 425

Confusion Matrix - Statistics

Further statistics associated with the confusion matrix are displayed in the table below. Of particular note are the sensitivity (rate of true positives) and specificity (rate of true negatives). While the results indicate that the model performs very well for true negatives (a specificity of 98%), it also predicts the true positives fairly well (a sensitivity of ~72%).

Since it is better to be overprepared for a flooding event than underprepared, we concluded that the number of false positive predictions was not a concern, especially given the precarious and unexpected nature of extreme weather events during an era of climate change.

as.matrix(xtab.regCalgary, what="classes") %>% kable(caption = "Confusion Matrix - Statistics") %>% kable_styling(font_size = 14, full_width = T,
                bootstrap_options = c("striped", "hover"))
Confusion Matrix - Statistics
Sensitivity 0.7179054
Specificity 0.9806802
Pos Pred Value 0.8157390
Neg Pred Value 0.9668651
Precision 0.8157390
Recall 0.7179054
F1 0.7637017
Prevalence 0.1064557
Detection Rate 0.0764251
Detection Prevalence 0.0936882
Balanced Accuracy 0.8492928

Map of Predictions for Calgary Testing Set

To spatially investigate the distribution of these four predicted outcomes, the map here shows the four confusion matrix results associated with each fishnet grid cell in the testing set. We elected to only show the predicts for the testing set since the amount of observations was already at a large number and patterns are already discernible.

Note how there are few false positives throughout the city, indicating that there are few places for which the model did not accurately predict inundation. The true positives are concentrated around the area where we saw major hydrological features and the stream network.

test_predictions <- testProbs %>%
  mutate(TN = predClass==0 & obs==0,
         TP = predClass==1 & obs==1,
         FN =  predClass==0 & obs==1,
         FP = predClass==1 & obs==0)


test_predictions <- test_predictions %>%
  mutate(confResult=case_when(TN == TRUE ~ "True_Negative",
                              TP == TRUE ~ "True_Positive",
                              FN == TRUE ~ "False_Negative",
                              FP == TRUE ~ "False_Positive"))


#join with geometry for the map of prediction classes
cal_test_predictions_mapdata <- cbind(calgaryTest, test_predictions, by= "ID_FISHNET") %>% st_as_sf()


ggplot() +
    geom_sf(data=cal_test_predictions_mapdata, aes(fill=confResult), colour=NA)+
  scale_fill_discrete()+
  mapTheme() +
  labs(title="Map of Prediction Classes on Calgary Test")

ROC Curve

The Receiver Operating Characteristic (ROC) Curve for the model is shown below. This curve is a helpful goodness of fit indicator, while helping to visualize trade-offs between true positive and false positive metrics at each threshold from 0.01 to 1. A line going “over” the curve indicates a useful fit. The area under the curve (AUC) here is about 0.96, indicating a useful fit. A reasonable AUC is between 0.5 and 1.

ggplot(testProbs, aes(d = obs, m = pred)) +
  geom_roc(n.cuts = 50, labels = FALSE) +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') + labs(title="ROC Curve", subtitle="AUC ~ 0.96") + plotTheme()

auc_calgary <- auc(testProbs$obs, testProbs$pred)
print(auc_calgary)
## Area under the curve: 0.9554

Cross-Validation

While the ROC indicates a relatively good fit, cross-validation is key to ensure that the model generalizes well across contexts. A k-fold cross validation (CV) is performed here.

#convert inundation to factor
calgary_fishnet_final$inundation <-as.factor(as.character(calgary_fishnet_final$inundation))

ctrl <- trainControl(method = "cv",
                     number = 100,
                     savePredictions = TRUE)

cvFit_inundationModel <- train(as.factor(inundation) ~ ., data = calgary_fishnet_final %>%
                            as.data.frame() %>%
                            select(-ID_FISHNET, -geometry, -slope_max, -imperv_pixels_sum),
                          method="glm", family="binomial", trControl=ctrl)

cvFit_inundationModel
## Generalized Linear Model
##
## 18538 samples
##     6 predictor
##     2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 18352, 18353, 18353, 18353, 18353, 18352, ...
## Resampling results:
##
##   Accuracy  Kappa
##   0.94741   0.6813391

Accuracy and Kappa

To evaluate the algorithm further, the accuracy an kappa metrics shed light on goodness of fit. Accuracy is the percentage of instances classified correctly out of the total. While this does not provide the nuance provided in the confusion matrix (showing the breakdown of accuracy across the four classes), it is useful in defending the model’s success at a high level.

Kappa (also known as Cohen’s Kappa) is normalized accuracy. This is useful here because we have an imbalance in the 0s and 1s for no inundation and inundation, respectively. While the imbalance is expected given that the previous flood extent showed that the majority of Calgary was not inundated, Kappa is useful in its comparison of observed accuracy to expected accuracy, taking into account random chance. It is still a relatively high value with a mean of over 68%. According to best practices in statistics, a kappa value of 0.68 falls within the range of “substantial agreement.”

dplyr::select(cvFit_inundationModel$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(cvFit_inundationModel$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) +
    geom_histogram(bins=35, fill = "navy") +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "red", linetype = 3, size = 1.5) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of Fit", y="Count", title="Accuracy and Kappa",
         subtitle = "Across-fold mean represented as dotted lines") +plotTheme()

Mapping Calgary Predictions

The final predictions for inundation across Calgary are mapped for each fishnet grid cell. We elected to display the predictions on a probabilities scale to achieve a more continuous surface.

allPredictions <-
  predict(cvFit_inundationModel, calgary_fishnet_final, type="prob")[,2]

calgary_fishnet_final_preds <- calgary_fishnet_final %>%
  cbind(calgary_fishnet_final,allPredictions) %>%
  mutate(allPredictions = round(allPredictions * 100)) 
ggplot() +
    geom_sf(data=calgary_fishnet_final_preds, aes(fill=(allPredictions)), colour=NA) +
  scale_fill_viridis(name = "Probability")+
  mapTheme() +
  labs(title="Predictions for Inundation in Calgary")

This second map shows the predicted probabilities, but with the 2013 flood extent overlay. This helps to compare the predicted inundation outcomes with the actual data used to train the model in the first place.

ggplot() +
    geom_sf(data=calgary_fishnet_final_preds, aes(fill=(allPredictions)), colour=NA) +
     scale_fill_viridis(name = "Probability")+
  geom_sf(data = calgary_fishnet_final_preds %>%
          filter(inundation=="1"),
          aes(), color="transparent", fill="red", alpha=0.5)+
  mapTheme() +
  labs(title="Predictions for Inundation in Calgary", subtitle = "2013 flood extent in red overlay")

Turning to Denver: How Does This Model Perform in Another City?

Now that we have confirmed the model’s high quality performance in Calgary, let’s see if is generalizable to other cities. Denver, Colorado has a similar topography, infrastructure, and population size to Calgary. We perform the same feature engineering for the selected features of the model and run the regression for the City of Denver.

den_boundary <- st_read("Data_Denver/county_boundary/county_boundary.shp") # need full shp package

# DENVER: Load Calgary fishnet generated in ArcGIS Pro - NEED IT AS A SHAPEFILE
den_fishnet <- st_read("Data_Denver/den_fishnet/den_fishnet.shp") %>% st_as_sf() %>% st_transform(st_crs(den_boundary))

den_fishnet$ID_FISHNET <-as.character(den_fishnet$ID_FISHNET)

# Denver: Elevation mean (calculated using zonal statistics (mean)).
den_elev <- st_read("Data_Denver/den_elev_mean.xls") %>%
  dplyr::select(FID, MEAN) %>%
  dplyr::rename(elevation_mean = MEAN) %>%
  dplyr::rename(ID_FISHNET = FID)

den_elev$ID_FISHNET <-as.character(den_elev$ID_FISHNET)

den_elev_grid <- full_join(den_elev, den_fishnet) %>% st_as_sf()
den_fishnet_new <- den_elev_grid

# DENVER: Streams - distance
den_streams_neardist <- st_read("Data_Denver/den_dist_streams2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_stream = NEAR_DIST)

den_streams_neardist$ID_FISHNET <-as.character(den_streams_neardist$ID_FISHNET)
den_streams_neardist_grid <- full_join(den_streams_neardist, den_fishnet_new) %>% st_as_sf()
den_fishnet_new <-  den_streams_neardist_grid

# DENVER: flow accumulation max per fishnet cell. Generated using zonal statistics (maximum).
den_flowacc <- st_read("Data_Denver/den_fac_max.xls") %>%
  dplyr::rename(flowacc_max = MAX) %>%
  dplyr::select(-COUNT, -AREA) %>%
  dplyr::rename(ID_FISHNET = FID)

den_flowacc$ID_FISHNET <-as.character(den_flowacc$ID_FISHNET)
den_flowacc_grid <- full_join(den_flowacc, den_fishnet_new) %>%st_as_sf()
den_fishnet_new <- den_flowacc_grid

# DENVER: Parks distance - from open data calgary. Used arcgis pro to create a near distance table.
den_parks_neardist <- st_read("Data_Denver/den_dist_parks2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_parks = NEAR_DIST)

den_parks_neardist$ID_FISHNET <-as.character(den_parks_neardist$ID_FISHNET)
den_parks_neardist_grid <- full_join(den_parks_neardist, den_fishnet_new) %>% st_as_sf()
den_fishnet_new <-  den_parks_neardist_grid

# DENVER:  hydro shapefile import
den_hydro <- st_read("Data_Denver/den_water/den_water.shp") %>%
  st_transform(st_crs(den_fishnet_new)) %>% st_as_sf()

# DENVER: hydro features intersections
den_fishnet_new <- den_fishnet_new %>% mutate(n_hydro_int = lengths(st_intersects(den_fishnet_new, den_hydro)))

# DENVER: Distance to steep slopes per fishnet cell. Generated using near table, near distance.
den_steepslope_dist <- st_read("Data_Denver/den_dist_slope2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(steepslope_dist = NEAR_DIST)

den_steepslope_dist$ID_FISHNET <-as.character(den_steepslope_dist$ID_FISHNET)
den_steepslope_dist_grid <- full_join(den_steepslope_dist, den_fishnet_new) %>%st_as_sf()
den_fishnet_new <- den_steepslope_dist_grid

# Remove unnecessary columns
den_fishnet_new <- den_fishnet_new %>%
    dplyr::select(-Id, -OBJECTID) 
den_selectCentroids <-
  st_centroid(den_fishnet_new)[den_boundary,] %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(den_fishnet_new, ID_FISHNET)) %>%
  st_sf()

den_fishnet_final <- den_selectCentroids
allPredictions_den <-
  predict(inundationModel, den_fishnet_final, type="response")


den_fishnet_final_preds <- den_fishnet_final %>%
  cbind(den_fishnet_final, allPredictions_den) %>%
  mutate(allPredictions_den = round(allPredictions_den*100)) %>% st_as_sf()
ggplot() +
    geom_sf(data=den_fishnet_final_preds, aes(fill=(allPredictions_den)), colour=NA) +
  scale_fill_viridis(name = "Probability")+
  mapTheme() +
  labs(title="Predictions for Inundation in Denver")

Conclusion

The results of this model are crucial to informing planners and allowing a city to be prepared for flood disasters. Successfully leveraging the knowledge and insight gained can save lives, cities, homes, and money. There are many implementation challenges when it comes to appropriately executing emergency preparedness programs such as funding, political and community support, and collaboration across departments. However, we are confident that the story being told through these data visualizations can be an effective tool in convincing relevant decision-makers.