FORECASTING FLOOD INUNDATION
academic / group / 2022 / predictive modeling / programs used: ArcGIS, RStudio
Because of the worsening effects of climate change, the average 100-year floodplain in Canadian river cities is projected to expand by 45% by 2100. In 2013, Calgary experienced its worst flood since 1932, driven mainly by heavy rainfall. For this project, we built a binomial logistic regression model from data on Calgary's 2013 flood to predict inundation in comparable river cities, and used Edmonton to test how well the model performs.
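For reference, a binomial logistic regression models the probability p_i that grid cell i is inundated as a logistic function of that cell's features x_{i1}, …, x_{iK}. The notation below is ours, not the project's.

```latex
\ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}
\quad\Longleftrightarrow\quad
p_i = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{k=1}^{K} \beta_k x_{ik}\right)}
```

The coefficients β_k are estimated by maximum likelihood, and exp(β_k) is the multiplicative change in the odds of inundation for a one-unit increase in x_{ik}, holding the other features fixed.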
To build the inundation model, we first used ArcGIS to convert raster data into features on a fishnet grid of cells. We included six features: (1) distance to all water bodies, (2) distance to the stream network, (3) flow accumulation sum for each cell, (4) distance to steep slopes, (5) slope category (flat to very steep), and (6) percentage of impervious surface in each cell. The figure to the right shows the four features that proved most statistically significant.
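The feature engineering itself was done in ArcGIS. As a rough illustration of what two of these features capture, the base-R sketch below aggregates a binary raster into a coarser fishnet (percent impervious per fishnet cell) and computes each cell's straight-line distance to the nearest water cell. The function names, the toy rasters and the simple block aggregation are hypothetical, not the project's workflow. The sketch also ignores map projection and treats raster cells as squares of side `cell_size`.

```r
# Hypothetical sketch: two fishnet-style features computed from binary raster matrices.

# Percentage of raster cells equal to 1 inside each non-overlapping k x k block
# (one block = one fishnet cell). Rows/columns that do not fill a whole block are dropped.
percent_per_fishnet_cell <- function(raster, k) {
  n_row <- nrow(raster) %/% k
  n_col <- ncol(raster) %/% k
  out <- matrix(NA_real_, n_row, n_col)
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      block <- raster[((i - 1) * k + 1):(i * k), ((j - 1) * k + 1):(j * k)]
      out[i, j] <- 100 * mean(block)
    }
  }
  out
}

# Euclidean distance (in units of cell_size) from each cell to the nearest cell where target == 1.
distance_to_nearest <- function(target, cell_size = 1) {
  hits <- which(target == 1, arr.ind = TRUE)  # row/col indices of target cells
  grid <- expand.grid(row = seq_len(nrow(target)), col = seq_len(ncol(target)))
  d <- apply(grid, 1, function(p) min(sqrt((hits[, 1] - p[1])^2 + (hits[, 2] - p[2])^2)))
  matrix(d * cell_size, nrow(target), ncol(target))  # expand.grid order is column-major
}

# Toy 4 x 4 impervious raster (1 = impervious) aggregated to a 2 x 2 fishnet
impervious <- matrix(c(1, 1, 0, 0,
                       1, 1, 0, 0,
                       0, 0, 0, 1,
                       0, 0, 1, 1), nrow = 4, byrow = TRUE)
pct_impervious <- percent_per_fishnet_cell(impervious, k = 2)

# Toy 3 x 3 raster with a single water cell in the top-left corner, 10 m cells
water <- matrix(0, 3, 3)
water[1, 1] <- 1
dist_water <- distance_to_nearest(water, cell_size = 10)
```

Here `pct_impervious` is 100 for the top-left fishnet cell and 75 for the bottom-right one, and `dist_water` grows from 0 at the water cell to about 28.3 in the opposite corner.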
We then used R to train the model and to compare the cells it predicted as inundated or not inundated with the cells that actually flooded in 2013. The model's ROC curve had an area under the curve (AUC) of almost 1, and the distribution of predictions clustered around 0.9. Together, these results indicate that the model separates flooded from non-flooded cells well on the Calgary data. The full code is available on my GitHub page.
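An AUC near 1 is easiest to read as a ranking probability: it is the chance that a randomly chosen flooded cell gets a higher predicted score than a randomly chosen dry cell. Assuming outcomes coded 0/1, the minimal base-R sketch below (a hypothetical helper `roc_auc()`, not part of the project code) computes it from ranks. For the empirical ROC curve this equals the trapezoid area that packages such as pROC report.

```r
# Rank-based (Mann-Whitney) area under the ROC curve.
# obs: observed inundation coded 0/1; pred: predicted probabilities.
# Ties between a flooded and a dry cell count as one half.
roc_auc <- function(obs, pred) {
  r <- rank(pred)            # tied scores receive their average rank
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  (sum(r[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

obs  <- c(0, 0, 1, 1)
pred <- c(0.10, 0.40, 0.35, 0.80)
roc_auc(obs, pred)                     # 3 of 4 flooded/dry pairs ordered correctly: 0.75
roc_auc(obs, c(0.1, 0.2, 0.8, 0.9))    # perfect separation: 1
```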
Calgary training set results
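```r
# Building the Inundation Model
library(sf)         # calgary is an sf object of fishnet cells
library(tidyverse)  # dplyr verbs and ggplot2
library(caret)      # createDataPartition(), trainControl(), train()

# Treat slope category as a factor; the sf geometry stays attached for
# mapping and is dropped only when data are passed to the model
calgary <- calgary %>%
  mutate(slope_cat = as.factor(slope_cat))

set.seed(3456)  # fix the random seed so the split can be reproduced
# 70/30 split, sampled within each slope category
trainIndex <- createDataPartition(calgary$slope_cat, p = .70,
                                  list = FALSE, times = 1)
calgaryTrain <- calgary[ trainIndex, ]
calgaryTest  <- calgary[-trainIndex, ]
```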
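`createDataPartition()` samples within the groups of its first argument, so the training and test sets keep a similar mix of slope categories. A simplified base-R version of that idea is sketched below; `stratified_split()` is a hypothetical helper, and its rounding rule may differ slightly from caret's.

```r
# Hypothetical helper: row indices for a stratified p / (1 - p) split.
stratified_split <- function(strata, p = 0.7, seed = 3456) {
  set.seed(seed)                                  # reproducible sample
  groups <- split(seq_along(strata), strata)      # row indices per category
  picked <- lapply(groups, function(i) {
    i[sample.int(length(i), round(p * length(i)))]  # draw about p of each group
  })
  sort(unname(unlist(picked)))
}

slope_cat <- factor(rep(c("flat", "steep"), times = c(10, 20)))
train_idx <- stratified_split(slope_cat, p = 0.7)
table(slope_cat[train_idx])                       # flat 7, steep 14
test_idx  <- setdiff(seq_along(slope_cat), train_idx)
```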
Statistically Significant Variables
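```r
# Fit a binomial (logit) regression of inundation on all cell features;
# geometry is dropped because glm() needs a plain data frame
calgaryModel <- glm(inundation ~ .,
                    family = binomial(link = "logit"),
                    data = calgaryTrain %>% as.data.frame() %>% select(-geometry))
summary(calgaryModel)  # coefficient table with a p-value for each feature
```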
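To illustrate how `glm()` flags significant features, the sketch below fits the same kind of binomial logit model to simulated cells in which flooding becomes less likely farther from water. The data, variable names and coefficients are invented for the example. The p-values come from the Wald z-tests that `summary()` reports.

```r
set.seed(1)
n <- 2000
# Invented cell features
dist_water     <- runif(n, 0, 1000)   # metres to nearest water body
pct_impervious <- runif(n, 0, 100)    # percent impervious surface
# Simulated truth: log-odds fall with distance and rise with imperviousness
log_odds   <- 2 - 0.006 * dist_water + 0.01 * pct_impervious
inundation <- rbinom(n, 1, plogis(log_odds))
cells <- data.frame(inundation, dist_water, pct_impervious)

fit   <- glm(inundation ~ ., family = binomial(link = "logit"), data = cells)
coefs <- summary(fit)$coefficients      # Estimate, Std. Error, z value, Pr(>|z|)
significant <- setdiff(rownames(coefs)[coefs[, "Pr(>|z|)"] < 0.05], "(Intercept)")
odds_per_100m <- exp(coef(fit)[["dist_water"]] * 100)  # odds multiplier per extra 100 m
```

With the invented coefficient of -0.006 per metre, `odds_per_100m` should land near exp(-0.6), roughly 0.55: each extra 100 m from water roughly halves the odds of flooding in this toy setup.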
```r
# Testing the Model
# Predicted probability of inundation for each held-out cell
classProbs <- predict(calgaryModel, calgaryTest, type = "response")
testProbs <- data.frame(obs  = as.numeric(calgaryTest$inundation),
                        pred = classProbs)
# Label a cell as inundated (1) when its predicted probability exceeds 0.25
testProbs$predClass <- ifelse(testProbs$pred > .25, 1, 0)
caret::confusionMatrix(reference = as.factor(testProbs$obs),
                       data = as.factor(testProbs$predClass),
                       positive = "1")
```
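The 0.25 cut-off decides which predicted probabilities count as "inundated". A small base-R sketch with toy numbers (a hypothetical helper `threshold_metrics()`) shows how the sensitivity, specificity and accuracy that `caret::confusionMatrix()` reports shift as the threshold moves.

```r
# Confusion-matrix counts and rates for a 0/1 outcome at a chosen probability cut-off.
threshold_metrics <- function(obs, pred, threshold = 0.25) {
  pred_class <- ifelse(pred > threshold, 1, 0)
  tp <- sum(pred_class == 1 & obs == 1)  # flooded, predicted flooded
  tn <- sum(pred_class == 0 & obs == 0)  # dry, predicted dry
  fp <- sum(pred_class == 1 & obs == 0)  # dry, predicted flooded
  fn <- sum(pred_class == 0 & obs == 1)  # flooded, predicted dry
  list(
    table = table(predicted = factor(pred_class, levels = c(0, 1)),
                  observed  = factor(obs, levels = c(0, 1))),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / length(obs)
  )
}

obs  <- c(1, 1, 0, 0, 1)
pred <- c(0.90, 0.30, 0.20, 0.60, 0.10)
m25 <- threshold_metrics(obs, pred, threshold = 0.25)  # sensitivity 2/3, specificity 1/2
m50 <- threshold_metrics(obs, pred, threshold = 0.50)  # sensitivity 1/3, specificity 1/2
```

Lowering the cut-off generally labels more cells as flooded, raising sensitivity at the cost of specificity. That trade-off favours a threshold below 0.5 when missing a flooded cell is costlier than a false alarm.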
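```r
# Area under the ROC curve on the test set (assuming auc() from the pROC package)
library(pROC)
auc(testProbs$obs, testProbs$pred)

# Training the Model: 100-fold cross-validation on the full Calgary data
ctrl <- trainControl(method = "cv", number = 100, savePredictions = TRUE)
cvFit <- train(as.factor(inundation) ~ .,
               data = calgary %>% as.data.frame() %>%
                 select(-geometry, -slope_cat),  # drop geometry and the categorical variable
               method = "glm", family = "binomial",
               trControl = ctrl)
cvFit

# Probability of class "1" (second column) for every cell, attached to the
# sf object and rescaled to a 0-100 percentage for mapping
allPredictions <- predict(cvFit, calgary, type = "prob")[, 2]
calgary <- cbind(calgary, allPredictions) %>%
  mutate(allPredictions = round(allPredictions * 100))
```
Inundation prediction from trained data set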
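With `method = "cv"`, caret's `train()` repeats a fit-and-score loop over folds. The base-R sketch below spells that loop out for a logistic model on invented data. `kfold_accuracy()` is a hypothetical helper. It uses 10 folds rather than the project's 100 to keep the example small, and it scores each fold by accuracy at a 0.5 cut-off (accuracy is also part of caret's default summary for classification).

```r
# Hypothetical k-fold cross-validation of a logistic model in base R.
# Each fold is held out once; the model is refit on the other folds and
# scored by accuracy at a 0.5 cut-off on the held-out fold.
kfold_accuracy <- function(data, k = 10, seed = 3456) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  sapply(seq_len(k), function(f) {
    fit  <- glm(inundation ~ ., family = binomial, data = data[folds != f, ])
    held <- data[folds == f, ]
    p    <- predict(fit, newdata = held, type = "response")
    mean((p > 0.5) == (held$inundation == 1))
  })
}

# Simulated cells (invented data): flooding less likely farther from water
set.seed(1)
n <- 1000
cells <- data.frame(dist_water = runif(n, 0, 1000))
cells$inundation <- rbinom(n, 1, plogis(2 - 0.006 * cells$dist_water))

acc <- kfold_accuracy(cells, k = 10)
summary(acc)   # spread of the per-fold accuracies
```

Looking at the spread of per-fold scores, rather than a single train/test split, shows how stable the model is across different subsets of cells.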
```r
# Plot Map with Inundation Predictions for Calgary
library(viridis)  # scale_fill_viridis()
calgary %>%
  ggplot() +
  geom_sf(aes(fill = allPredictions), color = "transparent", alpha = 0.75) +
  geom_sf(data = calgary_boundary, fill = "transparent", color = "#474747", size = 0.5) +
  scale_fill_viridis(alpha = 1, begin = 1, end = 0.6, direction = 1,
                     discrete = FALSE, option = "A") +
  labs(title = "Inundation Prediction for Calgary",
       fill = "flooding\n probability (%)") +
  mapTheme()  # custom map theme defined in the course code (not shown)
```
Code adapted from “CPLN675: Spatial Predictive Modeling for Classification” by Ken Steif & Michael Fichman.