Motivation

According to the World Health Organization, floods are the most frequent natural disaster affecting countries across the globe. Floods have the potential to leave communities devastated not only because of the direct loss of life due to drowning and infrastructure damage, but also because of indirect impacts such as increased transmission of disease, higher risk of injury and hypothermia, disrupted or disabled infrastructure systems, and increased likelihood of causing other natural disasters. In the past ten years, climate change has exacerbated the effects of natural disasters like flooding, drought, sea level rise, and extreme precipitation and their frequency and intensity are expected to continue to rise if left unchecked.

City planners have the utmost responsibility to ensure their cities are prepared to respond to such natural disasters and minimize harm to their most vulnerable communities. The following document outlines a machine learning algorithm that predicts which areas of a city are at the highest risk of flooding disasters. An Emergency Management team can use the resulting model of this algorithm to inform their preparedness tactics to minimize damage and respond to future flooding disasters more efficiently. Ultimately, the algorithm is a helpful tool for individuals in Public Works, Public Health, City Planning, and Community-Based Organizations to proactively plan for resilience.

The model is created using past flood data from the City of Calgary in Canada to measure accuracy and then is applied to the City of Denver in the United States to measure generalizability. This document explains each step of the process such that City Planning departments can input their city’s data and accurately interpret the resulting model. We also wanted to provide transparency into the feature engineering steps of the process, which is why the procedures in ArcGIS and R are described in detail.

For audiences who would like the main takeaways, we invite you to view this three-page summary document.

Setup

The algorithm was created using ArcGIS Pro and R. First, the necessary R libraries are loaded and themes are defined for the maps and plots that will be displayed later in the process.

# SETUP

# load libraries
library(caret)
library(pscl)
library(plotROC)
library(pROC)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(FNN)
library(scales)
library(jtools)
library(viridis)
library(gridExtra)


# load themes and palettes
mapTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.text.x = element_text(size = 14))
}

plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=12),
    axis.title = element_text(size=12),
    axis.text = element_text(size=10),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "italic"),
    legend.text = element_text(colour = "black", face = "italic"),
    strip.text.x = element_text(size = 14)
  )
}

Calgary Data

Initial Data

The fishnet was generated in ArcGIS Pro using the Create Fishnet tool in order to generate a grid of the appropriate extent with 200 meter by 200 meter cells as the unit of measurement. This cell size was determined using the overall extent of the city boundary and the size of the resulting dataset. Cell size will vary depending on the city of focus and the computing power of the machine being used.

The inundation raster data was also generated in ArcGIS Pro using the Reclassify tool to indicate pixels within the flooding extent and pixels outside of the flooding extent. Inundated pixels are assigned a value of 1 and not inundated pixels are assigned a value of 0. Zonal Statistics were then used to calculate the maximum level of flooding per fishnet grid cell in order to visualize and identify which grid cells are at the highest risk of flooding.

# CALGARY: Load Calgary fishnet generated in ArcGIS Pro (200x200m cells).
calgary_fishnet <- st_read("Data_Calgary/Calgary_new/for_R/calgary_fishnet.shp") %>%
st_as_sf() %>% st_transform('EPSG:3780') %>% rename(ID_FISHNET = ID_fishnet)

calgary_fishnet$ID_FISHNET <-as.character(calgary_fishnet$ID_FISHNET)

calgary_boundary <- st_read("provided_midTermProject_Data/CALGIS_CITYBOUND_LIMIT/CALGIS_CITYBOUND_LIMIT.shp")
calgary_boundary <- calgary_boundary %>% st_transform('EPSG:3780')

#ggplot() +
#  geom_sf(data=calgary_boundary, fill="grey") +
#  geom_sf(data=calgary_fishnet, alpha=0.2) + mapTheme()

Feature Engineering

Next, data for features that contribute to flooding and flood risk is loaded into ArcGIS Pro for calculations. The following sections describe this process in brief and explains why they are included in the model or not. Once the calculations are complete, the resulting data tables are exported from ArcGIS Pro and loaded into R to inform the model. We also used a select-by-centroids approach to select fishnet grid cells that have centroids situated within the 2013 flood extent.

Knowledge about the characteristics of the land such as topography, land cover, and hydrology can indicate how susceptible the area is to floods.

# CALGARY: Inundation classification (generated by reclassifying the initial inundation raster, and calculated using zonal statistics (maximum)).
calgary_inundation <- st_read("Data_Calgary/Calgary_new/for_R/cal_inun_max.xls")  %>%
  dplyr::select(ID_FISHNET, MAX) %>%
  #dplyr::rename(ID_fishnet = FID) %>%
  dplyr::rename(inundation = MAX)

calgary_inundation$ID_FISHNET <-as.character(calgary_inundation$ID_FISHNET)

calgary_inundation_grid <- left_join(calgary_inundation, calgary_fishnet)
calgary_fishnet_new <-  calgary_inundation_grid

# CALGARY: Elevation mean (calculated using zonal statistics (mean)).
calgary_elev <- st_read("Data_Calgary/Calgary_new/for_R/cal_elev_mean.xls") %>%
  dplyr::select(ID_FISHNET, MEAN) %>%
  dplyr::rename(elevation_mean = MEAN)

calgary_elev$ID_FISHNET <-as.character(calgary_elev$ID_FISHNET)

calgary_elev_grid <- full_join(calgary_elev, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <- calgary_elev_grid

# CALGARY: Stream network, generated using the ArcHydro workflow from the week4 lab in ArcGIS Pro.
calgary_streams <- st_read("Data_Calgary/Calgary_new/for_R/Hydrology/cal_stream.shp")
calgary_streams_neardist <- st_read("Data_Calgary/Calgary_new/for_R/cal_stream_neardist.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_stream = NEAR_DIST)

## used the generate near table in ArcGIS to calculate from fishnet to stream
calgary_streams_neardist$ID_FISHNET <-as.character(calgary_streams_neardist$ID_FISHNET)

calgary_streamneardist_grid <- full_join(calgary_streams_neardist, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <-  calgary_streamneardist_grid

# CALGARY: flow accumulation max per fishnet cell. Generated using zonal statistics (maximum).
calgary_flowacc <- st_read("Data_Calgary/Calgary_new/for_R/cal_flowacc_max.xls") %>%
  dplyr::rename(flowacc_max = MAX) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_flowacc$ID_FISHNET <-as.character(calgary_flowacc$ID_FISHNET)

calgary_flowacc_grid <- full_join(calgary_flowacc, calgary_fishnet_new) %>%st_as_sf()
calgary_fishnet_new <- calgary_flowacc_grid


# CALGARY: Parks distance - from open data calgary. Used arcgis pro to create a near distance table.
calgary_parks_neardist <- st_read("Data_Calgary/Calgary_new/for_R/cal_parksfeat_near.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_parks = NEAR_DIST)

calgary_parks_neardist$ID_FISHNET <-as.character(calgary_parks_neardist$ID_FISHNET)

calgary_parks_neardist_grid <- full_join(calgary_parks_neardist, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <-  calgary_parks_neardist_grid


# CALGARY: Slope max per fishnet cell. Generated using zonal statistics (max). Ended up not using this in the model, but still explored in feature engineering.

calgary_slope <- st_read("Data_Calgary/Calgary_new/for_R/cal_slope_max.xls") %>%
  dplyr::rename(slope_max = MAX) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_slope$ID_FISHNET <-as.character(calgary_slope$ID_FISHNET)

calgary_slope_grid <- full_join(calgary_slope, calgary_fishnet_new) %>% st_as_sf()
calgary_fishnet_new <- calgary_slope_grid

# CALGARY: Distance to steep slopes per fishnet cell. Generated using near table, near distance.
calgary_steepslope_dist <- st_read("Data_Calgary/Calgary_new/for_R/cal_steepslope_dist.xls")  %>%
  dplyr::select(-Rowid, -OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(steepslope_dist = NEAR_DIST)

calgary_steepslope_dist$ID_FISHNET <-as.character(calgary_steepslope_dist$ID_FISHNET)

calgary_steepslope_dist_grid <- full_join(calgary_steepslope_dist, calgary_fishnet_new) %>%st_as_sf()
calgary_fishnet_new <- calgary_steepslope_dist_grid

# CALGARY: Intersections with hydro features
#shapefile import
calgary_hydro <- st_read("Data_Calgary/Calgary_new/for_R/Hydrology/hydrology_features.shp") %>%
  st_transform(st_crs(calgary_fishnet_new))

#visualize hydrology features - more extensive than what the stream network from ArcGIS hydrology workflow. so it is streams + major hydrology features like lakes and ponds, etc.
#ggplot() +
#  geom_sf(data=calgary_boundary, fill="grey") +
#  #geom_sf(data=calgary_fishnet, alpha=0.2) +
#  geom_sf(data=calgary_hydro, color="blue") +mapTheme()

calgary_fishnet_new <- calgary_fishnet_new %>% mutate(n_hydro_int = lengths(st_intersects(calgary_fishnet_new, calgary_hydro)))

# CALGARY: majority land cover type per fishnet cell. Reclassified the data by land cover impervious and not impervious, converted it to a raster, and used zonal statistics (sum) for pixels. Not used in the model ultimately, but still explored in feature engineering process - worthwhile to discuss.

calgary_impervious_sum <- st_read("Data_Calgary/Calgary_new/for_R/cal_lcimp_sumt.xls") %>%
  dplyr::rename(imperv_pixels_sum = SUM) %>%
  dplyr::select(-Rowid, -COUNT, -AREA, -ZONE.CODE)

calgary_impervious_sum$ID_FISHNET <-as.character(calgary_impervious_sum$ID_FISHNET)
calgary_impervious_sum_grid <- full_join(calgary_impervious_sum, calgary_fishnet_new) %>% st_as_sf()
calgary_impervious_sum_grid[is.na(calgary_impervious_sum_grid)] = 0
calgary_fishnet_new <- calgary_impervious_sum_grid

# Centroid-in-polygon join to see which cells have their centroid in the calgary boundary
calgary_selectCentroids <-
  st_centroid(calgary_fishnet_new)[calgary_boundary,] %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(calgary_fishnet_new, ID_FISHNET)) %>%
  st_sf()

calgary_fishnet_final <- calgary_selectCentroids

calgary_fishnet_final <- calgary_fishnet_final %>%
  dplyr::select(-Shape_Leng, -Shape_Area)

calgary_fishnet_final$inundation <-as.numeric(as.character(calgary_fishnet_final$inundation))
calgary_fishnet_final$slope_max <-as.numeric(as.character(calgary_fishnet_final$slope_max))
calgary_fishnet_final$flowacc_max <-as.numeric(as.character(calgary_fishnet_final$flowacc_max))
calgary_fishnet_final$elevation_mean <-as.numeric(as.character(calgary_fishnet_final$elevation_mean))
calgary_fishnet_final$imperv_pixels_sum <-as.numeric(as.character(calgary_fishnet_final$imperv_pixels_sum))
calgary_fishnet_final$dist_stream <-as.numeric(as.character(calgary_fishnet_final$dist_stream))
calgary_fishnet_final$n_hydro_int <-as.numeric(as.character(calgary_fishnet_final$n_hydro_int))
calgary_fishnet_final$dist_parks <-as.numeric(as.character(calgary_fishnet_final$dist_parks))
calgary_fishnet_final$steepslope_dist <-as.numeric(as.character(calgary_fishnet_final$steepslope_dist))

Significant Features

Mean Elevation

Elevation generally describes the topography of the land. Areas at lower levels of elevation are at higher risk for floods than areas at higher elevations. We explored calculating the lowest degree of elevation per fishnet grid cell in Calgary, but found that using the mean helped the performance of our model. Mean elevation is stored in the elevation_mean feature variable, measured in meters.

To calculate our elevation_mean feature variable, a Digital Elevation Model (DEM) was loaded into ArcGIS. Zonal Statistics were used to calculate the mean per fishnet grid cell.

map.elevation <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = elevation_mean), color=NA) +
  scale_fill_viridis()+
  labs(title = "Elevation \n(Mean)", fill="Elevation \n(meters)") + mapTheme()

Distance to Nearest Stream

Understanding the structure of streams helps inform the intensity of potential flooding. If the water level of the stream exceeds the stream channel, then flooding occurs. Areas that are closer to streams are more vulnerable to floods than areas that are farther away.

The hydrology tools in ArcGIS Pro are used to turn the Calgary DEM into a stream network. First, the Fill tool was used to fill any sinks. The resulting surface is input in the Flow Direction tool to generate a raster showing the direction of flow out of each pixel. Direction is used to calculate Flow Accumulation per pixel - this step is described further in the next feature section. Assuming a drainage threshold of 25 square kilometers, Set Null is used to calculate the number of pixels that represent a drainage of this area or more, assign these cells a value of 1 and all other pixels a value of NODATA. The resulting Calgary stream network is displayed below. Lastly, the Generate Near Table tool is used to calculate the distance of each fishnet grid cell to a stream, stored as the dist_stream feature variable, measured in meters.

#visualize streams
map.streams <- ggplot() +
  geom_sf(data=calgary_boundary, fill="grey") +
  #geom_sf(data=calgary_fishnet, alpha=0.2) +
  geom_sf(data=calgary_streams, color="blue") + labs(title="Stream Network in Calgary") + mapTheme()

#distance to streams
map.diststreams <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = dist_stream), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to Nearest \nStream", fill="Distance \n(meters)") + mapTheme()

Maximum Flow Accumulation

Flooding occurs when water levels exceed the stream channel. Based on the direction of the streams calculated above, Flow Accumulation tool is used to calculate the accumulated weight of all pixels flowing into each downslope pixel in the raster. Zonal Statistics (maximum) is used to calculate the maximum flow accumulation per each fishnet grid cell. Using the maximum value allows us to evaluate at the highest risk level. In other words, the resulting model will present a “worst-case-scenario” picture enabling planners to be over-prepared rather than under-prepared. This feature variable is called flow_maxacc.

map.flowaccmax <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = flowacc_max), color=NA) +
  scale_fill_viridis(breaks = c(400000, 800000, 1200000), labels = c("400K", "800K", "1200K"))+
  labs(title = "Flow Accumulation \n(Maximum)", fill="Pixels \n") + mapTheme()

Distance to Nearest Steep Slope

Steep slopes in the land can contribute to both flow direction and flow accumulation. Additionally, a steeper slope likely accelerates the speed of flow and can intensify in the case of a flood. Thus, it is important to be aware of the location of steep slopes in the topography and their relative distance to past floods.

Slopes can be calculated using the Slope tool in ArcGIS. To distinguish which slopes are steep, Reclassify is used to assign slopes greater than or equal to 10 a value of 1 and all other slopes a value of NODATA. These steep slopes are converted from raster data to polygon features using the Raster to Polygon tool. Then, Near can be used once again to calculate the distance of each fishnet grid cell to a steep slope. this is imported into R as the dist_steepslope feature variable.

map.steepslopedist <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = steepslope_dist), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to \nNearest Steep Slope", fill="Distance \n(meters)") + mapTheme()

Distance to Nearest Park

Land cover or soil type can be indicative of areas prone to flooding. The more permeable the surface is, the more risky it is for floods. Initially, land cover data for Calgary was used to identify which fishnet grid cells were impervious based on their land cover type; however, this did not help the model perform well. Instead, distance to parks is incorporated into the model. Parks are generally covered in grass which is fairly permeable. Thus, the closer a fishnet grid cell is to a park, the more likely it is to become inundated.

After bringing in the complete parks dataset in Calgary from the city’s open data portal, the Near tool in ArcGIS is used to perform this calculation for the distance to parks feature variable.

#distance to parks
map.distparks <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = dist_parks), color=NA) +
  scale_fill_viridis()+
  labs(title = "Distance to \nNearest Park", fill="Distance \n(meters)") + mapTheme()

Number of Hydrological Feature Intersections

In addition to mere distance to stream lines, the frequency of intersections with hydrological features could indicate damper soils and areas at greater risk of inundation. To calculate the number of intersections with existing hydrological features per fishnet grid cell, the lengths(st_intersects) functionality in R is used to create a count feature variable, n_hydro_int.

#number of hydrology feature intersections
map.hydroints <- ggplot() +
  geom_sf(data = calgary_fishnet_final, aes(fill = n_hydro_int), color=NA) +
  scale_fill_viridis()+
  labs(title = "Number of \nHydrology Intersections", fill="Number \n") + mapTheme()

Maps of All Significant Features included in the Model

grid.arrange(map.elevation, map.diststreams, map.flowaccmax, map.steepslopedist, map.distparks, map.hydroints,
             ncol=3,
             top = "")

Plotting Features per Inundation Outcome

These six selected significant features are shown below in the plots to show differences in across fishnet grid cells that flooded and those that did not, according to the classification from the 2013 inundation extent.

inundationPlotVariables <-
  calgary_fishnet_final %>% st_drop_geometry()

# plotting independent variables for elevation_mean and slope_max
calgary_fishnet_variables <- inundationPlotVariables %>%
  dplyr::select(inundation, elevation_mean, steepslope_dist, dist_stream, flowacc_max, n_hydro_int, dist_parks) %>%
  gather(key, value, elevation_mean:dist_parks)%>%
   mutate(value = ifelse(key == "flowacc_max", value/10, value)) %>%
  mutate(value=ifelse(key=="n_hydro_int", value*10000, value))


ggplot(calgary_fishnet_variables, aes(as.factor(inundation), value, fill=as.factor(inundation))) +
    geom_bar(stat = "identity")+
    facet_wrap(~key) +
    scale_y_continuous(labels = comma) +
  scale_fill_manual(values = c("darkgreen", "navy"),
                    labels = c("Not Flooded","Flooded"),
                    name = "")   + labs(x="Inundation: Flooding or Not?", y="Value", title= "Independent Variables used in the Model", caption="\nNumber of hydro intersections scaled up by 10,000 \nFlow Accumulation (max) scaled down by 10") + plotTheme()

Model Development & Testing on Calgary

Now that the features have been prepared, we can begin to build the predictive model. First, we split up our known data set into a training and a test set.

Iterative Development Process

A linear regression is performed on the training set to determine the correlation between the set of selected features for the training data and the inundation data for the training data. This process was repeated multiple times to determine which combination of engineered features performed the best. For instance, recall from the Distance to Parks section that an impervious surfaces feature had been engineered but ultimately was not included in the final model. Reviewing the results from multiple regressions helped us determine which combination of features would provide the best results. The results of the final model are displayed and discussed below.

set.seed(3456)
trainIndex <- createDataPartition(calgary_fishnet_final$inundation, p = .70,
                                  list = FALSE,
                                  times = 1)
calgaryTrain <- calgary_fishnet_final[ trainIndex,]
calgaryTest  <- calgary_fishnet_final[-trainIndex,]

inundationModel <- glm(inundation ~ ., family="binomial"(link="logit"),
                  data=(calgaryTrain) %>% as.data.frame() %>% dplyr::select(-ID_FISHNET, -geometry, -slope_max, -imperv_pixels_sum))
 print(inundationModel)

Results

Model Summary

The output displays various statistics that indicate the quality of the model. For instance, the p-value listed for each feature describes how significant the feature is to the outcome of whether or not a flood occurs. All of these features are statistically significant as their p-values are <0.05. Additionally, measures like the Pseudo-R^2 and the AIC score describe how closely related our features are to the outcome. This is the point at which features can be removed, added, or adjusted within the model to enhance performance.

summ(inundationModel)

Observations	12977
Dependent variable	inundation
Type	Generalized linear model
Family	binomial
Link	logit

χ²(6)	4779.80
Pseudo-R² (Cragg-Uhler)	0.64
Pseudo-R² (McFadden)	0.56
AIC	3812.46
BIC	3864.75

	Est.	S.E.	z val.	p
(Intercept)	37.84	1.89	20.06	0.00
steepslope_dist	-0.00	0.00	-20.60	0.00
dist_parks	-0.00	0.00	-2.95	0.00
flowacc_max	0.00	0.00	5.49	0.00
dist_stream	-0.00	0.00	-2.50	0.01
elevation_mean	-0.04	0.00	-20.64	0.00
n_hydro_int	3.14	0.10	32.37	0.00
Standard errors: MLE

Histogram of classProbs

Once the features have been finalized, we can run the regression model on our test data to further examine its accuracy. The results are displayed in the histogram below showing the frequency of each probability level that a flood will occur.

classProbs <- predict(inundationModel, calgaryTest, type="response")

hist((classProbs), main = paste("Histogram of classProbs"), col = "blue", xlab = "Inundation Probability") + plotTheme()

Distribution of Probabilities Visualization

The visualization below shows the distribution of outcomes for the model. It is clear that the model is better at predicting 0s (no inundation) than predicting 1s (inundation).

testProbs <- data.frame(obs = as.numeric(calgaryTest$inundation),
                        pred = classProbs,
                        ID_FISHNET = calgaryTest$ID_FISHNET)

ggplot(testProbs, aes(x = pred, fill=as.factor(obs))) + geom_density() +
  facet_grid(obs ~ .) + xlab("Probability") + geom_vline(xintercept = .38) +
  scale_fill_manual(values = c("darkgreen", "navy"),
                      labels = c("Not Flooded","Flooded"),
                                 name="") +
                      labs(title = "Distribution of Probabilities") + plotTheme()

Finding the Optimal Threshold

In order to best optimize this model for use, identifying the threshold that provides the greatest accuracy is important. This is done by running the iterateThresholds function and looking at the results for all thresholds the table of thresholds 0.01 through 0.99. The optimal threshold in this case is 0.38.

# iterateThresholds function
iterateThresholds <- function(data, group) {
   group <- enquo(group)
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {

  this_prediction <-
      testProbs %>%
      mutate(predOutcome = ifelse(pred > x, 1, 0)) %>%
         group_by(!!group) %>%
      dplyr::count(predOutcome, obs) %>%
      dplyr::summarize(sum_TN = sum(n[predOutcome==0 & obs==0]),
                sum_TP = sum(n[predOutcome==1 & obs==1]),
                sum_FN = sum(n[predOutcome==0 & obs==1]),
                sum_FP = sum(n[predOutcome==1 & obs==0]),
            total=sum(n)) %>%
    mutate(True_Positive = sum_TP / total,
         True_Negative = sum_TN / total,
         False_Negative = sum_FN / total,
         False_Positive = sum_FP / total,
         Accuracy = (sum_TP + sum_TN) / total, Threshold = x)

  all_prediction <- rbind(all_prediction, this_prediction)
  x <- x + .01
  }
return(all_prediction)
}

whichThreshold <- iterateThresholds(testProbs)

allThresholds<-kable(whichThreshold) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)%>%
  scroll_box(width = "100%", height = "500px")

allThresholds

sum_TN	sum_TP	sum_FN	sum_FP	total	True_Positive	True_Negative	False_Negative	False_Positive	Accuracy	Threshold
2786	580	12	2183	5561	0.1042978	0.5009890	0.0021579	0.3925553	0.6052868	0.01
3287	574	18	1682	5561	0.1032188	0.5910807	0.0032368	0.3024636	0.6942996	0.02
3576	570	22	1393	5561	0.1024996	0.6430498	0.0039561	0.2504945	0.7455494	0.03
3797	560	32	1172	5561	0.1007013	0.6827909	0.0057544	0.2107535	0.7834922	0.04
3971	556	36	998	5561	0.0999820	0.7140802	0.0064737	0.1794641	0.8140622	0.05
4080	555	37	889	5561	0.0998022	0.7336810	0.0066535	0.1598633	0.8334832	0.06
4176	552	40	793	5561	0.0992627	0.7509441	0.0071930	0.1426003	0.8502068	0.07
4245	547	45	724	5561	0.0983636	0.7633519	0.0080921	0.1301924	0.8617155	0.08
4308	543	49	661	5561	0.0976443	0.7746808	0.0088114	0.1188635	0.8723251	0.09
4368	533	59	601	5561	0.0958461	0.7854702	0.0106096	0.1080741	0.8813163	0.10
4418	529	63	551	5561	0.0951268	0.7944614	0.0113289	0.0990829	0.8895882	0.11
4486	521	71	483	5561	0.0936882	0.8066894	0.0127675	0.0868549	0.9003776	0.12
4530	513	79	439	5561	0.0922496	0.8146017	0.0142061	0.0789426	0.9068513	0.13
4572	507	85	397	5561	0.0911707	0.8221543	0.0152850	0.0713900	0.9133249	0.14
4608	501	91	361	5561	0.0900917	0.8286279	0.0163640	0.0649164	0.9187197	0.15
4644	500	92	325	5561	0.0899119	0.8351016	0.0165438	0.0584427	0.9250135	0.16
4667	494	98	302	5561	0.0888329	0.8392375	0.0176227	0.0543068	0.9280705	0.17
4695	490	102	274	5561	0.0881136	0.8442726	0.0183420	0.0492717	0.9323863	0.18
4719	488	104	250	5561	0.0877540	0.8485884	0.0187017	0.0449559	0.9363424	0.19
4733	484	108	236	5561	0.0870347	0.8511059	0.0194210	0.0424384	0.9381406	0.20
4744	482	110	225	5561	0.0866751	0.8530840	0.0197806	0.0404603	0.9397590	0.21
4759	479	113	210	5561	0.0861356	0.8557813	0.0203201	0.0377630	0.9419169	0.22
4769	477	115	200	5561	0.0857759	0.8575796	0.0206797	0.0359648	0.9433555	0.23
4779	470	122	190	5561	0.0845172	0.8593778	0.0219385	0.0341665	0.9438950	0.24
4790	468	124	179	5561	0.0841575	0.8613559	0.0222981	0.0321885	0.9455134	0.25
4803	467	125	166	5561	0.0839777	0.8636936	0.0224780	0.0298507	0.9476713	0.26
4815	463	129	154	5561	0.0832584	0.8658515	0.0231973	0.0276929	0.9491099	0.27
4824	461	131	145	5561	0.0828988	0.8674699	0.0235569	0.0260744	0.9503686	0.28
4831	454	138	138	5561	0.0816400	0.8687286	0.0248157	0.0248157	0.9503686	0.29
4839	454	138	130	5561	0.0816400	0.8701672	0.0248157	0.0233771	0.9518072	0.30
4848	450	142	121	5561	0.0809207	0.8717857	0.0255350	0.0217587	0.9527063	0.31
4854	446	146	115	5561	0.0802014	0.8728646	0.0262543	0.0206797	0.9530660	0.32
4858	442	150	111	5561	0.0794821	0.8735839	0.0269736	0.0199604	0.9530660	0.33
4862	437	155	107	5561	0.0785830	0.8743032	0.0278727	0.0192411	0.9528862	0.34
4864	433	159	105	5561	0.0778637	0.8746628	0.0285920	0.0188815	0.9525265	0.35
4866	429	163	103	5561	0.0771444	0.8750225	0.0293113	0.0185218	0.9521669	0.36
4869	426	166	100	5561	0.0766049	0.8755619	0.0298507	0.0179824	0.9521669	0.37
4873	425	167	96	5561	0.0764251	0.8762812	0.0300306	0.0172631	0.9527063	0.38
4876	421	171	93	5561	0.0757058	0.8768207	0.0307499	0.0167236	0.9525265	0.39
4876	420	172	93	5561	0.0755260	0.8768207	0.0309297	0.0167236	0.9523467	0.40
4876	417	175	93	5561	0.0749865	0.8768207	0.0314692	0.0167236	0.9518072	0.41
4878	413	179	91	5561	0.0742672	0.8771804	0.0321885	0.0163640	0.9514476	0.42
4878	410	182	91	5561	0.0737277	0.8771804	0.0327279	0.0163640	0.9509081	0.43
4879	405	187	90	5561	0.0728286	0.8773602	0.0336270	0.0161841	0.9501888	0.44
4881	401	191	88	5561	0.0721093	0.8777198	0.0343463	0.0158245	0.9498292	0.45
4883	398	194	86	5561	0.0715699	0.8780795	0.0348858	0.0154648	0.9496493	0.46
4886	390	202	83	5561	0.0701313	0.8786190	0.0363244	0.0149254	0.9487502	0.47
4890	382	210	79	5561	0.0686927	0.8793382	0.0377630	0.0142061	0.9480309	0.48
4892	377	215	77	5561	0.0677936	0.8796979	0.0386621	0.0138464	0.9474915	0.49
4893	372	220	76	5561	0.0668944	0.8798777	0.0395612	0.0136666	0.9467722	0.50
4896	368	224	73	5561	0.0661751	0.8804172	0.0402805	0.0131271	0.9465923	0.51
4897	365	227	72	5561	0.0656357	0.8805970	0.0408200	0.0129473	0.9462327	0.52
4898	355	237	71	5561	0.0638374	0.8807768	0.0426182	0.0127675	0.9446143	0.53
4899	348	244	70	5561	0.0625787	0.8809567	0.0438770	0.0125877	0.9435353	0.54
4903	341	251	66	5561	0.0613199	0.8816760	0.0451358	0.0118684	0.9429959	0.55
4904	337	255	65	5561	0.0606006	0.8818558	0.0458551	0.0116885	0.9424564	0.56
4906	333	259	63	5561	0.0598813	0.8822154	0.0465744	0.0113289	0.9420967	0.57
4906	331	261	63	5561	0.0595217	0.8822154	0.0469340	0.0113289	0.9417371	0.58
4909	329	263	60	5561	0.0591620	0.8827549	0.0472937	0.0107894	0.9419169	0.59
4913	326	266	56	5561	0.0586225	0.8834742	0.0478331	0.0100701	0.9420967	0.60
4917	321	271	52	5561	0.0577234	0.8841935	0.0487322	0.0093508	0.9419169	0.61
4917	314	278	52	5561	0.0564647	0.8841935	0.0499910	0.0093508	0.9406582	0.62
4917	311	281	52	5561	0.0559252	0.8841935	0.0505305	0.0093508	0.9401187	0.63
4918	306	286	51	5561	0.0550261	0.8843733	0.0514296	0.0091710	0.9393994	0.64
4920	301	291	49	5561	0.0541270	0.8847330	0.0523287	0.0088114	0.9388599	0.65
4920	299	293	49	5561	0.0537673	0.8847330	0.0526884	0.0088114	0.9385003	0.66
4923	291	301	46	5561	0.0523287	0.8852724	0.0541270	0.0082719	0.9376012	0.67
4923	289	303	46	5561	0.0519691	0.8852724	0.0544866	0.0082719	0.9372415	0.68
4924	287	305	45	5561	0.0516094	0.8854523	0.0548463	0.0080921	0.9370617	0.69
4924	285	307	45	5561	0.0512498	0.8854523	0.0552059	0.0080921	0.9367020	0.70
4924	282	310	45	5561	0.0507103	0.8854523	0.0557454	0.0080921	0.9361626	0.71
4927	279	313	42	5561	0.0501708	0.8859917	0.0562848	0.0075526	0.9361626	0.72
4931	271	321	38	5561	0.0487322	0.8867110	0.0577234	0.0068333	0.9354433	0.73
4932	268	324	37	5561	0.0481928	0.8868908	0.0582629	0.0066535	0.9350836	0.74
4933	262	330	36	5561	0.0471138	0.8870707	0.0593418	0.0064737	0.9341845	0.75
4934	257	335	35	5561	0.0462147	0.8872505	0.0602410	0.0062938	0.9334652	0.76
4935	253	339	34	5561	0.0454954	0.8874303	0.0609603	0.0061140	0.9329257	0.77
4936	250	342	33	5561	0.0449559	0.8876101	0.0614997	0.0059342	0.9325661	0.78
4937	248	344	32	5561	0.0445963	0.8877900	0.0618594	0.0057544	0.9323863	0.79
4939	244	348	30	5561	0.0438770	0.8881496	0.0625787	0.0053947	0.9320266	0.80
4939	235	357	30	5561	0.0422586	0.8881496	0.0641971	0.0053947	0.9304082	0.81
4940	230	362	29	5561	0.0413595	0.8883294	0.0650962	0.0052149	0.9296889	0.82
4942	225	367	27	5561	0.0404603	0.8886891	0.0659953	0.0048552	0.9291494	0.83
4942	221	371	27	5561	0.0397411	0.8886891	0.0667146	0.0048552	0.9284301	0.84
4942	213	379	27	5561	0.0383025	0.8886891	0.0681532	0.0048552	0.9269915	0.85
4942	210	382	27	5561	0.0377630	0.8886891	0.0686927	0.0048552	0.9264521	0.86
4943	201	391	26	5561	0.0361446	0.8888689	0.0703111	0.0046754	0.9250135	0.87
4943	195	397	26	5561	0.0350656	0.8888689	0.0713900	0.0046754	0.9239345	0.88
4945	184	408	24	5561	0.0330876	0.8892286	0.0733681	0.0043158	0.9223161	0.89
4945	174	418	24	5561	0.0312893	0.8892286	0.0751663	0.0043158	0.9205179	0.90
4946	164	428	23	5561	0.0294911	0.8894084	0.0769646	0.0041359	0.9188995	0.91
4947	160	432	22	5561	0.0287718	0.8895882	0.0776839	0.0039561	0.9183600	0.92
4947	153	439	22	5561	0.0275130	0.8895882	0.0789426	0.0039561	0.9171012	0.93
4948	137	455	21	5561	0.0246359	0.8897680	0.0818198	0.0037763	0.9144039	0.94
4951	121	471	18	5561	0.0217587	0.8903075	0.0846970	0.0032368	0.9120662	0.95
4956	110	482	13	5561	0.0197806	0.8912066	0.0866751	0.0023377	0.9109872	0.96
4961	98	494	8	5561	0.0176227	0.8921057	0.0888329	0.0014386	0.9097285	0.97
4962	82	510	7	5561	0.0147455	0.8922856	0.0917101	0.0012588	0.9070311	0.98
4963	72	520	6	5561	0.0129473	0.8924654	0.0935084	0.0010789	0.9054127	0.99

testProbs$predClass  = ifelse(testProbs$pred > .38 ,1,0)

xtab.regCalgary <- caret::confusionMatrix(reference = as.factor(testProbs$obs),
                       data = as.factor(testProbs$predClass),
                       positive = "1")

Confusion Matrix

There are four possible scenarios that will result from the use of this model: 1. Model predicts inundation and there is inundation (True Positive) 2. Model predicts no inundation and there is no inundation (True Negative) 3. Model predicts inundation and there is no inundation (False Positive) 4. Model predicts no inundation and there is inundation (False Negative)

The results of each scenario at the optimal threshold for accuracy (0.38) is displayed in the table below (with the confusion matrix of 0 and 1 directly below the table). These results support our above observation that the model predicts well for areas of no inundation than areas of inundation.

Type of Prediction	Description	Results
True Positive (TP)	Model predicts inundation and there is inundation	425
True Negative (TN)	Model predicts no inundation and there is no inundation	4873
False Positive (FP)	Model predicts inundation and there is no inundation	167
False Negative (FN)	Model predicts no inundation and there is inundation	96

as.matrix(xtab.regCalgary) %>% kable(caption = "Confusion Matrix") %>% kable_styling("striped", full_width = T, font_size = 14, position = "left")

Confusion Matrix
	0	1
0	4873	167
1	96	425

Confusion Matrix - Statistics

Further statistics associated with the confusion matrix are displayed in the table below. Of particular note are the sensitivity (rate of true positives) and specificity (rate of true negatives). While the results indicate that the model performs very well for true negatives (a specificity of 98%), it also predicts the true positives fairly well (a sensitivity of ~72%).

Since it is better to be overprepared for a flooding event than underprepared, we concluded that the number of false positive predictions was not a concern, especially given the precarious and unexpected nature of extreme weather events during an era of climate change.

as.matrix(xtab.regCalgary, what="classes") %>% kable(caption = "Confusion Matrix - Statistics") %>% kable_styling(font_size = 14, full_width = T,
                bootstrap_options = c("striped", "hover"))

Confusion Matrix - Statistics
Sensitivity	0.7179054
Specificity	0.9806802
Pos Pred Value	0.8157390
Neg Pred Value	0.9668651
Precision	0.8157390
Recall	0.7179054
F1	0.7637017
Prevalence	0.1064557
Detection Rate	0.0764251
Detection Prevalence	0.0936882
Balanced Accuracy	0.8492928

Map of Predictions for Calgary Testing Set

To spatially investigate the distribution of these four predicted outcomes, the map here shows the four confusion matrix results associated with each fishnet grid cell in the testing set. We elected to only show the predicts for the testing set since the amount of observations was already at a large number and patterns are already discernible.

Note how there are few false positives throughout the city, indicating that there are few places for which the model did not accurately predict inundation. The true positives are concentrated around the area where we saw major hydrological features and the stream network.

test_predictions <- testProbs %>%
  mutate(TN = predClass==0 & obs==0,
         TP = predClass==1 & obs==1,
         FN =  predClass==0 & obs==1,
         FP = predClass==1 & obs==0)


test_predictions <- test_predictions %>%
  mutate(confResult=case_when(TN == TRUE ~ "True_Negative",
                              TP == TRUE ~ "True_Positive",
                              FN == TRUE ~ "False_Negative",
                              FP == TRUE ~ "False_Positive"))


#join with geometry for the map of prediction classes
cal_test_predictions_mapdata <- cbind(calgaryTest, test_predictions, by= "ID_FISHNET") %>% st_as_sf()


ggplot() +
    geom_sf(data=cal_test_predictions_mapdata, aes(fill=confResult), colour=NA)+
  scale_fill_discrete()+
  mapTheme() +
  labs(title="Map of Prediction Classes on Calgary Test")

ROC Curve

The Receiver Operating Characteristic (ROC) Curve for the model is shown below. This curve is a helpful goodness of fit indicator, while helping to visualize trade-offs between true positive and false positive metrics at each threshold from 0.01 to 1. A line going “over” the curve indicates a useful fit. The area under the curve (AUC) here is about 0.96, indicating a useful fit. A reasonable AUC is between 0.5 and 1.

ggplot(testProbs, aes(d = obs, m = pred)) +
  geom_roc(n.cuts = 50, labels = FALSE) +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') + labs(title="ROC Curve", subtitle="AUC ~ 0.96") + plotTheme()

auc_calgary <- auc(testProbs$obs, testProbs$pred)
print(auc_calgary)

## Area under the curve: 0.9554

Cross-Validation

While the ROC indicates a relatively good fit, cross-validation is key to ensure that the model generalizes well across contexts. A k-fold cross validation (CV) is performed here.

#convert inundation to factor
calgary_fishnet_final$inundation <-as.factor(as.character(calgary_fishnet_final$inundation))

ctrl <- trainControl(method = "cv",
                     number = 100,
                     savePredictions = TRUE)

cvFit_inundationModel <- train(as.factor(inundation) ~ ., data = calgary_fishnet_final %>%
                            as.data.frame() %>%
                            select(-ID_FISHNET, -geometry, -slope_max, -imperv_pixels_sum),
                          method="glm", family="binomial", trControl=ctrl)

cvFit_inundationModel

## Generalized Linear Model
##
## 18538 samples
##     6 predictor
##     2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 18352, 18353, 18353, 18353, 18353, 18352, ...
## Resampling results:
##
##   Accuracy  Kappa
##   0.94741   0.6813391

Accuracy and Kappa

To evaluate the algorithm further, the accuracy an kappa metrics shed light on goodness of fit. Accuracy is the percentage of instances classified correctly out of the total. While this does not provide the nuance provided in the confusion matrix (showing the breakdown of accuracy across the four classes), it is useful in defending the model’s success at a high level.

Kappa (also known as Cohen’s Kappa) is normalized accuracy. This is useful here because we have an imbalance in the 0s and 1s for no inundation and inundation, respectively. While the imbalance is expected given that the previous flood extent showed that the majority of Calgary was not inundated, Kappa is useful in its comparison of observed accuracy to expected accuracy, taking into account random chance. It is still a relatively high value with a mean of over 68%. According to best practices in statistics, a kappa value of 0.68 falls within the range of “substantial agreement.”

dplyr::select(cvFit_inundationModel$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(cvFit_inundationModel$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) +
    geom_histogram(bins=35, fill = "navy") +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "red", linetype = 3, size = 1.5) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of Fit", y="Count", title="Accuracy and Kappa",
         subtitle = "Across-fold mean represented as dotted lines") +plotTheme()

Mapping Calgary Predictions

The final predictions for inundation across Calgary are mapped for each fishnet grid cell. We elected to display the predictions on a probabilities scale to achieve a more continuous surface.

allPredictions <-
  predict(cvFit_inundationModel, calgary_fishnet_final, type="prob")[,2]

calgary_fishnet_final_preds <- calgary_fishnet_final %>%
  cbind(calgary_fishnet_final,allPredictions) %>%
  mutate(allPredictions = round(allPredictions * 100))

ggplot() +
    geom_sf(data=calgary_fishnet_final_preds, aes(fill=(allPredictions)), colour=NA) +
  scale_fill_viridis(name = "Probability")+
  mapTheme() +
  labs(title="Predictions for Inundation in Calgary")

This second map shows the predicted probabilities, but with the 2013 flood extent overlay. This helps to compare the predicted inundation outcomes with the actual data used to train the model in the first place.

ggplot() +
    geom_sf(data=calgary_fishnet_final_preds, aes(fill=(allPredictions)), colour=NA) +
     scale_fill_viridis(name = "Probability")+
  geom_sf(data = calgary_fishnet_final_preds %>%
          filter(inundation=="1"),
          aes(), color="transparent", fill="red", alpha=0.5)+
  mapTheme() +
  labs(title="Predictions for Inundation in Calgary", subtitle = "2013 flood extent in red overlay")

Turning to Denver: How Does This Model Perform in Another City?

Now that we have confirmed the model’s high quality performance in Calgary, let’s see if is generalizable to other cities. Denver, Colorado has a similar topography, infrastructure, and population size to Calgary. We perform the same feature engineering for the selected features of the model and run the regression for the City of Denver.

den_boundary <- st_read("Data_Denver/county_boundary/county_boundary.shp") # need full shp package

# DENVER: Load Calgary fishnet generated in ArcGIS Pro - NEED IT AS A SHAPEFILE
den_fishnet <- st_read("Data_Denver/den_fishnet/den_fishnet.shp") %>% st_as_sf() %>% st_transform(st_crs(den_boundary))

den_fishnet$ID_FISHNET <-as.character(den_fishnet$ID_FISHNET)

# Denver: Elevation mean (calculated using zonal statistics (mean)).
den_elev <- st_read("Data_Denver/den_elev_mean.xls") %>%
  dplyr::select(FID, MEAN) %>%
  dplyr::rename(elevation_mean = MEAN) %>%
  dplyr::rename(ID_FISHNET = FID)

den_elev$ID_FISHNET <-as.character(den_elev$ID_FISHNET)

den_elev_grid <- full_join(den_elev, den_fishnet) %>% st_as_sf()
den_fishnet_new <- den_elev_grid

# DENVER: Streams - distance
den_streams_neardist <- st_read("Data_Denver/den_dist_streams2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_stream = NEAR_DIST)

den_streams_neardist$ID_FISHNET <-as.character(den_streams_neardist$ID_FISHNET)
den_streams_neardist_grid <- full_join(den_streams_neardist, den_fishnet_new) %>% st_as_sf()
den_fishnet_new <-  den_streams_neardist_grid

# DENVER: flow accumulation max per fishnet cell. Generated using zonal statistics (maximum).
den_flowacc <- st_read("Data_Denver/den_fac_max.xls") %>%
  dplyr::rename(flowacc_max = MAX) %>%
  dplyr::select(-COUNT, -AREA) %>%
  dplyr::rename(ID_FISHNET = FID)

den_flowacc$ID_FISHNET <-as.character(den_flowacc$ID_FISHNET)
den_flowacc_grid <- full_join(den_flowacc, den_fishnet_new) %>%st_as_sf()
den_fishnet_new <- den_flowacc_grid

# DENVER: Parks distance - from open data calgary. Used arcgis pro to create a near distance table.
den_parks_neardist <- st_read("Data_Denver/den_dist_parks2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(dist_parks = NEAR_DIST)

den_parks_neardist$ID_FISHNET <-as.character(den_parks_neardist$ID_FISHNET)
den_parks_neardist_grid <- full_join(den_parks_neardist, den_fishnet_new) %>% st_as_sf()
den_fishnet_new <-  den_parks_neardist_grid

# DENVER:  hydro shapefile import
den_hydro <- st_read("Data_Denver/den_water/den_water.shp") %>%
  st_transform(st_crs(den_fishnet_new)) %>% st_as_sf()

# DENVER: hydro features intersections
den_fishnet_new <- den_fishnet_new %>% mutate(n_hydro_int = lengths(st_intersects(den_fishnet_new, den_hydro)))

# DENVER: Distance to steep slopes per fishnet cell. Generated using near table, near distance.
den_steepslope_dist <- st_read("Data_Denver/den_dist_slope2.xls")  %>%
  dplyr::select(-OBJECTID, -NEAR_FID) %>%
  dplyr::rename(ID_FISHNET = IN_FID) %>%
  dplyr::rename(steepslope_dist = NEAR_DIST)

den_steepslope_dist$ID_FISHNET <-as.character(den_steepslope_dist$ID_FISHNET)
den_steepslope_dist_grid <- full_join(den_steepslope_dist, den_fishnet_new) %>%st_as_sf()
den_fishnet_new <- den_steepslope_dist_grid

# Remove unnecessary columns
den_fishnet_new <- den_fishnet_new %>%
    dplyr::select(-Id, -OBJECTID)

den_selectCentroids <-
  st_centroid(den_fishnet_new)[den_boundary,] %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(den_fishnet_new, ID_FISHNET)) %>%
  st_sf()

den_fishnet_final <- den_selectCentroids

allPredictions_den <-
  predict(inundationModel, den_fishnet_final, type="response")


den_fishnet_final_preds <- den_fishnet_final %>%
  cbind(den_fishnet_final, allPredictions_den) %>%
  mutate(allPredictions_den = round(allPredictions_den*100)) %>% st_as_sf()

ggplot() +
    geom_sf(data=den_fishnet_final_preds, aes(fill=(allPredictions_den)), colour=NA) +
  scale_fill_viridis(name = "Probability")+
  mapTheme() +
  labs(title="Predictions for Inundation in Denver")

Conclusion

The results of this model are crucial to informing planners and allowing a city to be prepared for flood disasters. Successfully leveraging the knowledge and insight gained can save lives, cities, homes, and money. There are many implementation challenges when it comes to appropriately executing emergency preparedness programs such as funding, political and community support, and collaboration across departments. However, we are confident that the story being told through these data visualizations can be an effective tool in convincing relevant decision-makers.

Forecasting Flood Inundation

Kristin Chang & Jenna Epstein

March 31, 2021