-
Notifications
You must be signed in to change notification settings - Fork 1
/
summer_progress.Rmd
479 lines (393 loc) · 19.7 KB
/
summer_progress.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
---
title: "Summary of sediment-storm relationship"
author: "Lindsay R.C. Platt"
date: "2023-08-22"
output:
html_document:
toc: true
toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo=FALSE,
warning=FALSE,
message=FALSE
)
```
## Background
We hypothesize that more frequent, high intensity storms are contributing to the increases in algal blooms in western arm of Lake Superior. Following the remotely sensed sediment classification workflow, we set out to start exploring the relationship between sediment presence with conditions in the contributing basins.
[This issue on GitHub](https://github.com/LimnoDataScience/plume_bloom_drivers/issues/11) chronologically captures my exploration process. However, I am summarizing our current findings and next steps in this document. The code captured here assumes that you have run the pipeline and have created the targets referenced in `tar_read()` or `tar_load()` commands.
```{r load_things}
library(cowplot)
library(ggpubr)
library(scico)
library(sf)
library(targets)
library(tidyverse)
# General info
tar_load(p1_nwis_sites)
tar_load(p2_obs_blooms_details)
# Timeseries data
rs_classified <- read_csv('GTB_by_mission_v2023-08-03.csv') # This has yet to be pulled into the pipeline and can be downloaded from https://drive.google.com/drive/folders/1DVs8635XP-zwDUuzd6P2UxfFCD5biP4K
tar_load(p2_prism_data_huc)
tar_load(p1_nwis_Q)
# Spatial elements
tar_load(p1_nwis_sites_sf)
tar_load(p1_huc10_nwis_sites)
tar_load(p2_lake_superior_watershed_dissolved)
# Helper functions/settings
site_huc_xwalk <- p1_huc10_nwis_sites %>%
st_drop_geometry() %>%
select(nwis_site, HUC10 = huc10) %>%
left_join(p1_nwis_sites, by = "nwis_site")
# Quantifying "95th percentile rainfall event" as severe storm
# https://www.epa.gov/sites/default/files/2015-08/documents/epa_swm_guidance.pdf
storm_perc <- 0.95
add_storm_ppt <- function(all_ppt, storm_percentile = 0.95) {
quantile(all_ppt, probs = storm_percentile) %>% unname()
}
# Pass in a vector of month numbers (1 thru 12) and get a
# vector of the same size with the season back.
month_to_season <- function(month_num) {
# Create vector with season as the value and month number as the name
month_season <- setNames(rep(c("Winter", "Spring", "Summer", "Fall"), each = 3), c(12, 1:11))
# Arrange so that DEC is in the 12th spot
month_season <- month_season[order(as.numeric(names(month_season)))]
# Pull out the season based on month number as the indices
season_out <- month_season[month_num]
names(season_out) <- NULL # Drop the names attribute
return(season_out)
}
order_seasons <- function(season_vec) {
ordered(season_vec, levels = c('Winter', 'Spring', 'Summer', 'Fall'))
}
```
## Area of interest
We chose four 10-digit Hydrologic Unit Codes (HUC10) in the Lake Superior watershed with a USGS National Water Information System (NWIS) stream gage. The approach for downloading and processing the climate driver and discharge data is currently automated. Adding a new basin in the future would be easy; we only need to manually identify the NWIS stream gage associated with it. Below is a map of the four basins and their NWIS stream gages (not all gages are located at the outlet of the basin into Lake Superior, some are further upstream).
```{r basin_map}
crs_to_map <- st_crs(p2_lake_superior_watershed_dissolved)
huc10_sf <- p1_huc10_nwis_sites %>%
left_join(p1_nwis_sites, by = "nwis_site") %>%
mutate(map_name = sprintf('%s (NWIS Site %s)',
river, nwis_site)) %>%
st_transform(crs_to_map)
ggplot() +
geom_sf(data = huc10_sf, aes(fill = map_name), lwd=0.75) +
geom_sf(data = p2_lake_superior_watershed_dissolved,
fill=NA, lwd=1) +
geom_sf(data = st_transform(p1_nwis_sites_sf, crs_to_map),
color = "#d8392b", size = 4, shape = 'X') +
# scale_shape(name = "NWIS Site Number") +
scale_fill_scico_d(name = "") +
theme_void() +
theme(legend.position = c(0.80,0.70),
plot.title = element_text(hjust = 0.5)) #+
# ggtitle('NWIS site locations and HUC10 watersheds')
```
## Visualizing data
### Sediment presence
The main data input is our sediment presence data. I started with the tabular output available from the [Superior-Plume-Bloom classification workflow](https://github.com/rossyndicate/Superior-Plume-Bloom). These tabular data report sediment percentages for the full Lake Superior AOI per satellite date. There is no spatial information associated with them, and I filtered to only those mission-dates where at least 50% of the AOI was able to be classified. There are now exported rasters of the classified output, but I have yet to explore them in this analysis. I will be moving onto that next; see [Next Steps](#next-steps).
<a name="sediment_ts"></a>
```{r sediment_ts, fig.width=10, fig.height=8}
sediment <- rs_classified %>%
filter(extent == 'aoi no sc') %>%
# Remove dates and missions where <50% of the AOI
# scene was able to be classified
filter(perc_extent_classified >= 50) %>%
# Keep only the percentages for sediment
select(mission, date,
cloud_pct = cloud_perc_extent_area,
light_nearshore_pct = lightNSSed_perc_extent_area,
dark_nearshore_pct = darkNSSed_perc_extent_area,
offshore_pct = offShoreSed_perc_extent_area) %>%
mutate(sediment_total_pct = light_nearshore_pct + dark_nearshore_pct + offshore_pct) %>%
mutate(year = year(date),
year_mission = sprintf('%s_%s', year, mission),
season = order_seasons(month_to_season(month(date))))
p1 <- ggplot(sediment, aes(x=date, y=sediment_total_pct)) +
geom_point(aes(alpha = cloud_pct, fill = season),
shape = 22, color = "transparent", size=2) +
theme_bw() +
theme(legend.position="none") +
facet_grid(. ~ season) +
geom_smooth(color = 'black') +
ylab('Daily % sediment') +
xlab('Date') +
ggtitle('Daily timeseries of % sediment, transparency based on cloud cover')
p2 <- sediment %>%
filter(cloud_pct < 50) %>%
ggplot(aes(x=date, y=sediment_total_pct,
color = season, fill = season)) +
geom_smooth(color = 'black') +
geom_point(shape = 22, color = "transparent", size=2) +
theme_bw() +
theme(legend.position = "none") +
facet_grid(. ~ season) +
ylab('Daily % sediment') +
xlab('Date') +
ggtitle('Daily timeseries of % sediment, only clouds below 50%')
p3 <- sediment %>%
group_by(year, season) %>%
summarize(avg_sediment = mean(sediment_total_pct)) %>%
ggplot(aes(x = year, y = avg_sediment,
color = season, fill = season)) +
geom_smooth(color = 'black') +
geom_point() +
theme_bw() +
theme(legend.position = "none") +
facet_grid(. ~ season) +
ylab('Average annual % sediment') +
xlab('Year') +
ggtitle('Annual timeseries of % sediment')
cowplot::plot_grid(p1, p2, p3, nrow = 3, ncol = 1)
```
### Precipitation
Precipitation data was downloaded from PRISM and summarized into daily totals per basin for each of the four basins (more about that process is documented in [this pull request](https://github.com/LimnoDataScience/plume_bloom_drivers/pull/8)). You'll notice a slight dip in the precip totals between 2000-2010. This happens around the same time as a dip in the sediment timeseries (see the [sediment figure](#sediment_ts)).
```{r precip_ts, fig.width=8, fig.height=10}
ppt_ready <- p2_prism_data_huc %>%
filter(variable == 'ppt') %>%
rename(ppt = value_huc, HUC10 = huc) %>%
# GROUP BY HUC AND CALC STORM VALUE
group_by(HUC10) %>%
mutate(ppt_huc_storm = add_storm_ppt(ppt, 0.95)) %>%
ungroup() %>%
mutate(season = order_seasons(season)) %>% # Add the season as a factor
left_join(site_huc_xwalk, by = "HUC10") # Add river names
# Filter data to only days with storm level rainfall
ppt_storm <- ppt_ready %>%
filter(ppt >= ppt_huc_storm)
precip_ts_p1 <- ggplot(ppt_storm, aes(x=date, y=ppt)) +
geom_point(aes(color = season), shape = 1) +
geom_smooth(color = 'black') +
theme_bw() +
theme(legend.position = "none") +
facet_grid(river ~ season, scales = 'free') +
ylab('Daily precip, mm') +
xlab('Date') +
ggtitle(sprintf('Timeseries of storm event rainfall totals (>= %sth percentile)', storm_perc*100)) #+
# theme(legend.position='none')
precip_ts_p2 <- ppt_storm %>%
group_by(river, year, season) %>%
tally() %>%
ggplot(aes(x=year, y = n, fill = season)) +
geom_bar(stat = 'identity', position = position_dodge()) +
geom_smooth(color = 'black') +
theme_bw() +
theme(legend.position = "none") +
facet_grid(river ~ season, scales = 'free') +
ylab('Number of storm events') +
xlab('Date') +
ggtitle(sprintf('Timeseries of storm event occurrence (>= %sth percentile)', storm_perc*100))
cowplot::plot_grid(precip_ts_p1, precip_ts_p2, nrow = 2, ncol = 1)
```
### NWIS discharge
Discharge was downloaded using `dataRetrieval`. Daily mean values are visualized below for each of the four NWIS sites identified. Note that the Siskiwit site is a newer site and has fewer data points. As such, it may be difficult to draw conclusions about a sediment-discharge relationship for this basin in future analyses.
Similar to the sediment and precip, we see a slight dip between 2000-2010. Interesting ...
```{r discharge_ts, fig.width=8, fig.height=6}
# Filter data to only days with storm level rainfall
Q_ready <- p1_nwis_Q %>%
mutate(year = year(date),
month = month(date)) %>%
mutate(year.month = as.numeric(sprintf('%s.%s', year, month)),
season = order_seasons(month_to_season(month))) %>%
left_join(site_huc_xwalk, by = "nwis_site") # Add river names
ggplot(Q_ready, aes(x=date, y=Q)) +
geom_point(aes(color = season), shape = 1) +
geom_smooth(color = 'black') +
theme_bw() +
theme(legend.position = "none") +
facet_grid(river ~ season, scales = 'free') +
ylab('Daily mean discharge, ft3/sec') +
xlab('Date')
```
## Building relationships
### Precip-Q
I took some time to evaluate the relationship between our weighted cumulative precip data and the discharge reported at each basin. The initial relationships were not encouraging, so we tried a few different things:
1. accounting for basin shape/size and gage placement using a lag,
1. separating out the seasons, and
1. filtering to only storm days.
A lag between discharge and precipitation for each basin was determined by using a cross correlation function and identifying the maximum correlation for each basin. This plot shows differences in how the discharge and precipitation are correlated for each basin.
```{r q_precip_lag, fig.height=6, fig.width=7}
precip_Q <- ppt_ready %>%
left_join(Q_ready, by = c("river", "nwis_site", "HUC10", "date"))
# Run a cross correlation between Q & precip for each basin
ccf_ppt_q_byriver <- precip_Q %>%
filter(!is.na(Q), !is.na(ppt)) %>%
split(.$river) %>%
purrr::map(~ccf(.x$ppt, .x$Q, plot = F, ci = 0.99))
par(mfrow = c(2,2))
for(i in 1:length(ccf_ppt_q_byriver)) {
plot(ccf_ppt_q_byriver[[i]],
main = names(ccf_ppt_q_byriver)[i])
arrows(x0 = 0, y0 = -0.1, y1 = 0,
col='red', lwd=2, length=0.1)
}
par(mfrow = c(1,1))
# Extract the maximum lag above a correlation value of 0.05
statsig_lags <- ccf_ppt_q_byriver %>%
purrr::map(function(x) {
blue_line_val <- qnorm((1 + 0.99)/2) / sqrt(x$n.used)
lags_above <- x$lag[x$acf > 0.20]
# Choose the biggest lag (negative or positive)
lag_out <- lags_above[which.max(abs(lags_above))]
return(lag_out)
})
```
Both by actual date and by using each of the individual basin lags, precipitation was matched to discharge values. All four panels show the relationship between discharge vs precipitation. It mostly makes sense but might not be as close to a 1:1 as I thought it would be. Note that you will not see a difference for Siskiwit when lag is added because the lag found from cross correlation was 0 days.
```{r precip_q, fig.height=7, fig.width=10}
# Apply the lag to the precip-Q relationship part.
ppt_q_comparison_data <- precip_Q %>%
# Need to add a column that represents the precip if you include the lag per river
# This method assumes that `split(.$river)` and `statsig_lags` have rivers in the same order
split(.$river) %>%
purrr::map2(statsig_lags, ~mutate(.x, ppt_lag = lag(ppt, abs(.y)))) %>%
bind_rows() %>%
# Change season to factor for plotting
mutate(season = ordered(month_to_season(month(date)),
levels = c('Winter', 'Spring', 'Summer', 'Fall')))
p <- ggplot(ppt_q_comparison_data, aes(x = ppt, y = Q, color = season)) +
geom_point(alpha = 0.70) +
facet_grid(river ~ season, scales="free_y") +
geom_smooth(color = 'black') +
ylab('Daily mean discharge, ft^3/sec') + xlab('Total daily precip, mm') +
theme_bw() +
theme(legend.position = "none") +
ggtitle('Q-PPT relationship')
p_storm <- ppt_q_comparison_data %>%
# Filter to only storms
filter(ppt >= ppt_huc_storm) %>%
ggplot(aes(x = ppt, y = Q, color = season)) +
geom_point(alpha = 0.70) +
facet_grid(river ~ season, scales="free_y") +
geom_smooth(color = 'black') +
ylab('Daily mean discharge, ft^3/sec') + xlab('Total daily precip, mm') +
theme_bw() +
theme(legend.position = "none") +
ggtitle('Q-PPT relationship, only storms')
p_lag <- ggplot(ppt_q_comparison_data, aes(x = ppt_lag, y = Q, color = season)) +
geom_point(alpha = 0.70) +
facet_grid(river ~ season, scales="free_y") +
geom_smooth(color = 'black') +
ylab('Daily mean discharge, ft^3/sec') + xlab('Total daily precip, mm') +
theme_bw() +
theme(legend.position = "none") +
ggtitle('Q-PPT relationship with lag')
p_lag_storm <- ppt_q_comparison_data %>%
# Filter to only storms
filter(ppt_lag >= ppt_huc_storm) %>%
ggplot(aes(x = ppt_lag, y = Q, color = season)) +
geom_point(alpha = 0.70) +
facet_grid(river ~ season, scales="free_y") +
geom_smooth(color = 'black') +
ylab('Daily mean discharge, ft^3/sec') + xlab('Total daily precip, mm') +
theme_bw() +
theme(legend.position = "none") +
ggtitle('Q-PPT relationship with lag, only storms')
cowplot::plot_grid(p, p_lag, p_storm, p_lag_storm)
```
### Precip-sediment
Our initial hypothesis was the more intense rainfall events have led to the increased algal blooms. To start to understand whether this could be true, I looked at the relationship between precipitation and sediment presence. I filtered the dataset to only those mission-dates around an observed bloom event. The precipitation was totaled per basin for 7-day ahead cumulative precipitation values using only those days that experienced "storm" (greater than the 95th percentile) precipitation. Since we have not started to use the rasterized output data for sediment, each of the relationships are using the same sediment data - daily percent of sediment in the entire classifiable AOI without shoreline contamination. For each bloom date, I calculated the sediment presence as the maximum sediment percentage using daily values from 7 days ahead PLUS 3 days after the bloom date.
```{r precip_sediment, fig.width=8, fig.height=8}
# Identify exact bloom dates
exact_bloom_dates <- unique(p2_obs_blooms_details$`Start Date`)
# Create n-day window leading up to each bloom to capture rainfall
days_ahead <- 7
window_bloom_dates_before <- map(exact_bloom_dates, function(date) {
tibble(bloom_date = date,
date = seq(date - days_ahead, date, by = 1))
}) %>% bind_rows() %>% distinct() %>%
arrange(bloom_date) %>%
# For bloom dates that may overlap, only keep the early bloom id
arrange(date) %>%
group_by(date) %>%
filter(bloom_date == min(bloom_date))
# Create n-day window after each bloom to capture sediment plumes
days_after <- 3
window_bloom_dates_after <- map(exact_bloom_dates, function(date) {
tibble(bloom_date = date,
date = seq(date, date + days_after , by = 1))
}) %>% bind_rows() %>% distinct() %>%
arrange(bloom_date) %>%
# For bloom dates that may overlap, only keep the early bloom id
arrange(date) %>%
group_by(date) %>%
filter(bloom_date == min(bloom_date))
# Combine all the dates
window_bloom_dates_all <- window_bloom_dates_before %>%
bind_rows(window_bloom_dates_after) %>%
distinct()
ppt <- ppt_ready %>%
# Filter only to storm data
filter(ppt >= ppt_huc_storm) %>%
inner_join(window_bloom_dates_before, by = 'date') %>%
# For each bloom event, total up the precipitation
group_by(HUC10, bloom_date) %>%
summarize(ppt_bloom_total = sum(ppt))
# Average the sediment percentage if more than one mission is present
sediment_onevalperday <- sediment %>%
group_by(date) %>%
summarize(sediment_total_pct_missionavg = mean(sediment_total_pct)) %>%
inner_join(window_bloom_dates_all, by = 'date') %>%
# For each bloom event, find the maximum sediment
group_by(bloom_date) %>%
summarize(sediment_pct_max = mean(sediment_total_pct_missionavg))
# Join the sediment percentage per bloom date to the precip value per bloom date
ppt_sediment <- ppt %>%
full_join(sediment_onevalperday, by = 'bloom_date') %>%
left_join(site_huc_xwalk)
# Correlation plot as fxn
correlate_sed_ppt <- function(in_data, in_title) {
ggpubr::ggscatter(
in_data,
x = "ppt_bloom_total",
y = "sediment_pct_max",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "kendall",
xlab = sprintf("%s-day cumulative storm precip\nleading up to bloom date", days_ahead),
ylab = "% sediment",
ylim = c(-50, 100),
title = in_title
)
}
huc_plot_list <- ppt_sediment %>%
split(.$river) %>%
purrr::map(~correlate_sed_ppt(.x, unique(.x$river)))
cowplot::plot_grid(plotlist=huc_plot_list)
```
### Q-sediment
Since the precip-sediment relationships were not as strong as I thought they would be, I also tried looking at discharge-sediment. These are still not very strong, though Bois Brule looks more promising, so I think trying them again but using only sediment in areas that are near the river outlets (see [Next Steps](#next-steps)) might strengthen this relationship.
For each bloom event, I calculated a matching discharge by finding the maximum discharge over the previous 7 days.
```{r q_sediment, fig.height=8, fig.width=8}
# This figure uses a lot of the R objects created in the previous chunk
Q_bloom <- Q_ready %>%
# For each bloom event, average the discharge ahead of the bloom
inner_join(window_bloom_dates_before, by = 'date') %>%
group_by(river, bloom_date) %>%
summarize(bloom_Q = max(Q))
# Join the sediment percentage per bloom date to the precip value per bloom date
discharge_sediment <- Q_bloom %>%
full_join(sediment_onevalperday, by = 'bloom_date')
# Correlation plot as fxn
correlate_sed_discharge <- function(in_data, in_title) {
ggpubr::ggscatter(
in_data,
x = "bloom_Q",
y = "sediment_pct_max",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "kendall",
xlab = sprintf("%s-day max discharge\nleading up to bloom date, cfs", days_ahead),
ylab = "% sediment",
ylim = c(-50, 100),
title = in_title
)
}
huc_plot_list <- discharge_sediment %>%
split(.$river) %>%
purrr::map(~correlate_sed_discharge(.x, sprintf('%s', unique(.x$river))))
cowplot::plot_grid(plotlist=huc_plot_list)
```
## Next steps
<a name="next-steps"></a>
At this time, I only have two ideas of what to try next. I will be interested in connecting about where we can go next with this.
1. Use [the raster sediment presence output](https://www.hydroshare.org/resource/17cd38e9ac7845c29b0f45dab15e7073/) (maybe automate the download with [this tool](https://castronova.github.io/hstools/getting-started/)) to rebuild precip-sediment and Q-sediment relationships by grouping raster cells that are near each of the basin outlets.
1. Try the discharge-sediment relationship again but use custom windows based on the identified basin lag from the cross correlation function to set the window for finding maximum discharge ahead of a bloom event.