--- title: "Weight for the death basin for the North-East America application of GR3D" author: "P. Lambert" date: "`r format(Sys.time(), '%d %B %Y')`" bibliography: biblio.bib csl: ecological-modelling.csl output: bookdown::word_document2: df_print: kable reference_docx: ref_doc.dotx # toc: yes # fig_caption: yes plots: style: Normal align: center caption: style: Image Caption pre: 'Figure ' sep: ': ' # do: default editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, include = FALSE, fig.alig = 'left') ``` ```{r library} library(dplyr) library(tidyr) library(ggplot2) library(knitr) library(flextable) library(bookdown) library(stringr) library(purrr) ``` ```{r load data} rm(list = ls()) # load list of basins and distance matrix (with "_" instead of " " and "." in basin name) distanceAA <- as.matrix(read.csv("../../data/input/atlanticarea/distanceGridAA.csv", row.names = 1, stringsAsFactors = FALSE)) distanceAA <- distanceAA %>% replace(., col(.) == row(.), NA) %>% as.data.frame() %>% mutate(destination = str_replace_all(row.names(.), "([ '\\-\\.])", "_")) %>% pivot_longer(cols = -destination, names_to = 'departure', values_to = 'distance') %>% mutate(departure = str_replace_all(departure, "([ '\\-\\.])", "_")) %>% dplyr::select(departure, destination, distance) %>% arrange(departure, distance) riverBasinsAA = read.csv("../../data/input/atlanticarea/aa_basins.csv") %>% mutate(basin_name = str_replace_all(basin_name, "([ '\\-\\.])", "_")) distanceNEA <- as.matrix(read.csv("../../data/input/northeastamerica/distanceGridNEA.csv", row.names = 1, stringsAsFactors = FALSE)) distanceNEA <- distanceNEA %>% replace(., col(.) == row(.), NA) %>% as.data.frame() %>% mutate(destination = str_replace_all(row.names(.), "([ '\\-\\.])", "_")) %>% pivot_longer(cols = -destination, names_to = 'departure', values_to = 'distance') %>% mutate(departure = str_replace_all(departure, "([ '\\-\\.])", "_")) %>% dplyr::select(departure, destination, distance) %>% arrange(departure, distance) riverBasinsNEA = read.csv("../../data/input/northeastamerica/nea_riverbasins.csv") %>% mutate(basin_name = str_replace_all(basin_name, "([ '\\-\\.])", "_")) # nameRiver = riverBasinsNEA %>% select(basin_name) %>% arrange(basin_name) # nameDistance = distanceNEA %>% distinct(departure) %>% rename(basin_name = departure) %>% arrange(basin_name) # nameRiver %>% setdiff(nameDistance) # nameDistance %>% setdiff(nameRiver) # Rougier at al 2015 application riverAARougier2015 = read.csv("basinsRougieretal2015.csv", stringsAsFactors = FALSE) %>% rename(basin_id = id, basin_name = nomBV) %>% mutate(basin_name =replace(basin_name, basin_name == "Sevre_Niortaise", "Sevre Niortaise")) distanceAARougier2015 <- as.matrix(read.csv("distanceGridRougieretal2015.csv", row.names = 1, stringsAsFactors = FALSE)) %>% as.data.frame() %>% mutate(destination = colnames(.)) %>% pivot_longer(cols = -destination, names_to = 'departure', values_to = 'distance') %>% mutate(distance = replace(distance, destination == departure, NA)) %>% arrange(departure, distance) ``` ```{r load GR3D function} source("GR3Dfunction.R") ``` # Computation of the death basin weight ## Baseline The kernel function presently used in GR3D is only based on basin accessibility (linked to distance between basins) even if a generic formulation including basin attractiveness (related to basin size) and fish ability (based on fish length) is proposed in @rougier2015. The first step is to compute for a departure $j_1$ the weight of each destination basin $j_2$ using: $$w_{j_1\rightarrow j_2} = \frac {1} {1 + e ^{\alpha_0 + \alpha_1 \cdot {\frac {( D_{j_1\rightarrow j_2} - \mu_D)} {\sigma_D} } } }$$ where $D_{j_1\rightarrow j_2}$ is the distance between the departure and destination basins, $\alpha_0$ and $\alpha_1$ are the kernel parameters, $\mu_D$ and $\sigma_D$ are the mean and standard deviation between inter basin distances. The last two parameters were introduced to standardise distance when accessibility is combining with attractiveness and ability. When only considering only accessibility, $\mu_D$ and $\sigma_D$ are simply linked to the distance strayer can reach. There is no need to be changed when the basins network (number and location of basins) changes. Definitely, the definition as mean and standard deviation of distances is confusing. The sum of weights for the departure basin $j_1$ is: $$w_{j_1} = \sum_{j_2 \neq j_1} {w_{j_1\rightarrow j_2}}$$ A strayers mortality is added by considering a "death basin" with a constant weight $w_{death}$. The probability $p_{j_1 \rightarrow j_2}$ for strayer to reach a destination basin $j_2$ from the departure basin $j_1$ is $$ p_{j_1 \rightarrow j_2} = \frac {w_{j_1 \rightarrow j_2}} {w_{death} + w_{j_1}} $$ \#\# Two metrics to qualify the straying The strayers mortality rate $sm_{j_1}$ from a departure basin calculate the portion of fish that ends in the death basin. It is given by: $$ sm_{ j_1} = \frac {w_{death}} { w_{death}+w_{j_1} }$$ The efficiency for strayers $se_{j2}$ informs on the proportion of fish that are able to reach a destination basin considering abundance in departure basins proportional to surface area of these basins. It is computed with: $$ se_{j_2} = \frac {\sum_{j_1}{A_{j_1} \cdot p_{j_1 \rightarrow j_2} }} {\sum_{j_1} {A_{j_1}}} $$ ## Atlantic area application ```{r} alpha0 = -2.9 alpha1 = 19.7 meanInterDistance = 300 standardDeviationInterDistance = 978 WDeathBasinRougier2005 = .4 ``` The kernel parameters for Atlantic Area application defined by @rougier2015 were $\alpha_0$ = `r alpha0`, $\alpha_1$ = `r alpha1`, $\mu_D$ = `r meanInterDistance` km and $\sigma_D$ = `r standardDeviationInterDistance` km. The death basin weight was $r$ = `r WDeathBasinRougier2005`. ```{r drawKernelFunctionAA, echo = FALSE, warning = FALSE, include = TRUE, fig.cap = "Kernel function for AA application"} data.frame(dist = seq(1,500, 10)) %>% mutate(W = logitKernel(dist, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) %>% ggplot(aes(x=dist, y=W)) + geom_line() + labs(x = 'distance between departure and destination basins (km)') ``` ```{r computeDeathBassinWAA} # compute weight distanceAA <- distanceAA %>% mutate(W = logitKernel(distance, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) meanW_AA <- riverBasinsAA %>% inner_join(distanceAA, by = c("basin_name" = "departure")) %>% group_by(basin_name) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop') %>% summarise(mean(sumW)) %>% unlist() %>% unname() meanW_AARouguier2015 <- riverAARougier2015 %>% inner_join(distanceAARougier2015, by = c("basin_name" = "departure")) %>% mutate(W = logitKernel(distance, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) %>% group_by(basin_name) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop') %>% summarise(mean(sumW)) %>% unlist() %>% unname() WDeathBasinAA = round(meanW_AA * WDeathBasinRougier2005 / meanW_AARouguier2015,2) ``` In Rougier et al. (2015) application, the mean of weight sums is `r round(meanW_AARouguier2015,2)` for a death basin weight of `r round(WDeathBasinRougier2005,2)`. In the present AA application, the mean of weigth sums becomes `r round(meanW_AA,2)`. To keep in concordance with the previous application, the weight of the death basin is now `r round(WDeathBasinAA, 2)`. ```{r computeWeigthAA, echo = TRUE} # add surface area for departure basins extendedDistance = distanceAA %>% inner_join(riverBasinsAA %>% select(basin_name, surface_area_drainage_basin), by = c('departure' = 'basin_name')) %>% rename(surface_departure = surface_area_drainage_basin) # calculate sum W extendedDistance <- extendedDistance %>% inner_join(extendedDistance %>% group_by(departure) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = "drop"), by = c('departure')) # compute strayer mortality and strayer efficiency resultAA <- extendedDistance %>% distinct(departure, sumW) %>% mutate(sm_departure = WDeathBasinAA / (WDeathBasinAA + sumW)) %>% select(departure, sm_departure) %>% rename(basin_name = departure) %>% inner_join( extendedDistance %>% mutate(p12 = W / (WDeathBasinAA + sumW), Ap12 = surface_departure * p12) %>% group_by(destination) %>% summarise( sumA = sum(surface_departure), sumAp12 = sum(Ap12, na.rm = TRUE), .groups = 'drop' ) %>% mutate(se_destination = sumAp12 / sumA) %>% select (destination, se_destination) %>% rename(basin_name = destination), by = 'basin_name' ) %>% inner_join(riverBasinsAA %>% select(basin_name, lat_outlet), by='basin_name') %>% rename(latitude = lat_outlet) #resultAA ``` Strayers from basins at the distribution edge experience a higher mortality rate (Figure \@ref(fig:strayerMortalityAA)) that means if these basins are colonised they will contribute less to settlements in other basins. No trend is detected in the strayer efficiency except at the extreme northern range of the distribution (Figure \@ref(fig:stayerEfficiencyAA)). More or less if all basins are populated, every basin will receive strayers. Nevertheless there is a particular spot around latitude 43°c. ```{r strayerMortalityAA, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of the mortality rate according to the latitude of the departure basin in the AA zone"} resultAA %>% ggplot(aes(x = latitude, y = sm_departure)) + geom_point() + labs(x = "departure latitude (°)", y = "strayer mortality rate") #geom_abline(slope = 0, intercept = WDeathBasin / (WDeathBasin + meanW_AA)) ``` ```{r stayerEfficiencyAA, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of the strayers' efficiency according to the latitude of the destination basin in the AA zone"} resultAA %>% ggplot(aes(x = latitude, y = se_destination)) + geom_point() + labs(x = "destination latitude (°)", y = "strayer efficiency") + geom_text(data = filter(resultAA, se_destination > .02), aes(label = basin_name), hjust = 0, nudge_x = 0.05) ``` ## North East America NEA application ```{r computeWeighthNEA} distanceNEA <- distanceNEA %>% mutate(W = logitKernel(distance, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) distanceNEA %>% group_by(departure) %>% summarise(min_interdistance = min( distance, na.rm = TRUE), .groups = 'drop') %>% mutate(pct19 = sum(min_interdistance < 19)/n(), pct111 = sum(min_interdistance < 111)/n(), median = median(min_interdistance)) distanceNEA %>% group_by(departure) %>% summarise(min_interdistance = min( distance, na.rm = TRUE), .groups = 'drop') %>% inner_join(riverBasinsNEA %>% select(basin_name, lat_outlet), by = c('departure' = 'basin_name')) %>% arrange(lat_outlet) %>% print(n = Inf) ggplot(aes(x = lat_outlet, y = min_interdistance )) +geom_point() + geom_abline(intercept = 111, slope = 0) ecdf1 <- ecdf(distanceNEA %>% group_by(departure) %>% summarise(min_interdistance = min( distance, na.rm = TRUE), .groups = 'drop') %>% select(min_interdistance) %>% unlist()) plot(ecdf1) abline(v=c(19, 111)) meanW_NEA <- riverBasinsNEA %>% inner_join(distanceNEA, by = c("basin_name" = "departure")) %>% mutate(W = logitKernel(distance, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) %>% group_by(basin_name) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop') %>% summarise(mean(sumW)) %>% unlist(use.names = FALSE) WDeathBasinNEA = round(meanW_NEA * WDeathBasinRougier2005 / meanW_AARouguier2015,2) extendedDistance = distanceNEA %>% inner_join(riverBasinsNEA %>% select(basin_name, areasqkm), by = c('departure' = 'basin_name')) %>% rename(surface_departure = areasqkm) extendedDistance <- extendedDistance %>% inner_join(extendedDistance %>% group_by(departure) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = "drop"), by = c('departure')) resultNEA <- extendedDistance %>% distinct(departure, sumW) %>% mutate(sm_departure = WDeathBasinNEA / (WDeathBasinNEA + sumW)) %>% select(departure, sm_departure) %>% rename(basin_name = departure) %>% inner_join( extendedDistance %>% mutate(p12 = W / (WDeathBasinNEA + sumW), Ap12 = surface_departure * p12) %>% group_by(destination) %>% summarise( sumA = sum(surface_departure), sumAp12 = sum(Ap12, na.rm = TRUE), .groups = 'drop' ) %>% mutate(se_destination = sumAp12 / sumA) %>% select (destination, se_destination) %>% rename(basin_name = destination), by = 'basin_name' ) %>% inner_join(riverBasinsNEA %>% select(basin_name, lat_outlet), by ='basin_name') %>% rename(latitude = lat_outlet) ``` The strayer mortality for NEA zone displayed high values for latitudes close to 37° and higher than 42.5 with literally no survival after 50° (Figure \@ref(fig:strayerMortalityNEA). Low values of efficiency are calculated at the range of the distribution (Figure \@ref(fig:strayerEfficiencyLatitudeNEA). ```{r strayerMortalityNEA, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of strayers mortality according to departure basin latitude in the NEA zone"} resultNEA %>% ggplot(aes(x = latitude, y = sm_departure)) + geom_point() + labs(x = "departure latitude (°)", y = "strayer mortality rate") ``` ```{r strayerEfficiencyLatitudeNEA, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of strayers efficiency according to destination basin latitude in the NEA zone"} resultNEA %>% ggplot(aes(x = latitude, y = se_destination)) + geom_point() + labs(x = "destination latitude (°)", y = "strayer's efficiency") # geom_text(aes(label = basin_name), hjust = 0, nudge_x = 0.5) ``` ## Virtual linear network of basin ### Full universe ```{r fakeListOfBasin} nbBasin = 150 distBetweenBasin = 10 # create a fake basin tibble basinFake <- tibble(basin_name=paste0('B', formatC(1:nbBasin,width=3, flag = "0"))) %>% mutate(latitude = row_number()) # create a fake basin-to-basin distance distanceFake <- tibble(departure = basinFake$basin_name, destination = basinFake$basin_name) %>% expand(departure, destination ) %>% arrange(departure, destination) %>% inner_join(basinFake, by=c("departure" = "basin_name")) %>% rename(latitude_departure = latitude) %>% inner_join(basinFake, by=c("destination" = "basin_name")) %>% rename(latitude_destination = latitude) %>% mutate(distance = abs(latitude_departure - latitude_destination ) * distBetweenBasin) %>% mutate(W = logitKernel(distance, alpha0, alpha1, meanInterDistance, standardDeviationInterDistance)) %>% mutate(distance = replace(distance, departure == destination, NA)) %>% arrange(departure, destination) meanW_fake <- basinFake %>% inner_join(distanceFake, by = c("basin_name" = "departure")) %>% group_by(basin_name) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop') %>% summarise(mean(sumW)) %>% unlist(use.names = FALSE) WDeathBasinFake = round(meanW_fake * WDeathBasinRougier2005 / meanW_AARouguier2015,2) # compute sumW an p12 extendedDistance = distanceFake %>% inner_join(distanceFake %>% group_by(departure ) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop'), by='departure') %>% mutate(p12 = W/(sumW + WDeathBasinFake)) ``` Let a virtual basin with `r nbBasin` basins equally spaced by `r distBetweenBasin`. Figures \@ref(fig:smFake) and \@ref(fig:seFake) present the response in terms of strayers mortality and strayers efficiency. ```{r smFake, echo =FALSE, warning = FALSE, include = TRUE, fig.cap = "Evolution of strayer mortality according to latitude departure"} extendedDistance %>% distinct(departure, latitude_departure, sumW) %>% mutate(sm_departure = WDeathBasinFake / (WDeathBasinFake + sumW)) %>% ggplot(aes(x=latitude_departure, y = sm_departure)) + geom_point() + labs(x = 'latitude rank', y = 'strayer mortality rate') + xlim(0,150) + ylim(0.0,.15) ``` ```{r seFake, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of strayer efficiency according to departure latitude " } extendedDistance %>% group_by(destination, latitude_destination) %>% summarise(se_destination = mean(p12), .groups ='drop') %>% ggplot(aes(x=latitude_destination, y = se_destination)) + geom_point() + labs(x = 'latitude rank', y = 'strayer efficiency') + xlim(0,150) + ylim(0.0,.05) ``` ### Sampled universe ```{r irregular sampling of fake universe, echo FALSE} basinSampling <- basinFake %>% sample_n(25) %>% arrange(latitude) distanceSampling <- distanceFake %>% inner_join(basinSampling %>% select(basin_name), by = c("departure" = "basin_name")) %>% inner_join(basinSampling %>% select(basin_name), by = c("destination" = "basin_name")) # compute sumW an p12 extendedDistanceSampling = distanceSampling %>% inner_join(distanceSampling %>% group_by(departure ) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop'), by ='departure') meanW_sampling <- basinSampling %>% inner_join(distanceSampling, by = c("basin_name" = "departure")) %>% group_by(basin_name) %>% summarise(sumW = sum(W, na.rm = TRUE), .groups = 'drop') %>% summarise(mean(sumW)) %>% unlist(use.names = FALSE) WDeathBasinSampling = round(meanW_sampling * WDeathBasinRougier2005 / meanW_AARouguier2015,2) extendedDistanceSampling <- extendedDistanceSampling %>% mutate(p12 = W / (sumW + WDeathBasinSampling)) ``` Now we only consider a random sampling of `r nrow(basinSampling)` basins. The consequences of this sampling are presented in figures \@ref(fig :smSampled) and \@ref(fig:seSampled). ```{r smSampled, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of strayer mortality according to latitude departure" } extendedDistanceSampling %>% distinct(departure, latitude_departure, sumW) %>% mutate(sm_departure = WDeathBasinSampling /(WDeathBasinSampling + sumW)) %>% ggplot(aes(x=latitude_departure, y = sm_departure)) + geom_point() + labs(x='latitude rank', y = 'strayer mortality rate') + xlim(0,150) + ylim(0.0,.2) ``` ```{r seSampled, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of strayer efficiency according to departure latitude with HaDiad kernel function"} extendedDistanceSampling %>% group_by(destination, latitude_destination) %>% summarise(se_destination = mean(p12), .groups ='drop') %>% ggplot(aes(x = latitude_destination, y = se_destination)) + geom_point() + labs(x = 'latitude rank', y = 'strayer efficiency') + xlim(0,150) + ylim(0.0,0.1) ``` \#Comparison with HyDiaD formulation In HyDiad the kernel function in an extended negative exponential of the distance between basins. Mortality of strayers is simulated with a mortality coefficient according to inter-basin distances (\@ref(fig:smFunctionHaDiaD). Resulting strayer mortality pattern is presented in \@ref(fig : smHaDiaD). ```{r smFunctionHaDiaD, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of mortality efficiency according to departure latitude with HaDiad kernel function"} alpha_D = 0.0608 beta_D = 0.655 m = -log(0.464)/41 tibble(distance = 0:500) %>% mutate(strayerMortalityRate = 1- strayerSurvival(distance, m)) %>% ggplot(aes(x = distance, y = strayerMortalityRate)) + geom_line() ``` ```{r smHaDiaD, echo =FALSE, warning = FALSE, include = TRUE, fig.cap="Evolution of mortality efficiency according to departure latitude with HaDiad kernel function"} HADiaD <- distanceFake %>% mutate(W = eneKernel(distance, alpha_D, beta_D), p12 = W/ sum(W, na.rm = TRUE), strayerMortalityRate = 1 - strayerSurvival(distance, m)) HADiaD %>% group_by(departure, latitude_departure) %>% summarise(sm_departure = weighted.mean(strayerMortalityRate, W, na.rm = TRUE), .groups = 'drop' ) %>% ggplot(aes(x = latitude_departure, y =sm_departure)) + geom_point() ``` ```{r seHaDiaDHaDiaD, echo =FALSE, warning = FALSE, include = TRUE} # HADiaD %>% group_by(destination, latitude_destination) %>% # summarise(se_destination = weighted.mean(strayerMortalityRate, W, , na.rm = TRUE), .groups = 'drop') %>% # ggplot(aes(x = latitude_destination, y = se_destination)) + geom_point() # ``` # Conclusion The strayer mortality increases at the edge of the distribution. The selection of basins impacts the strayers mortality. It is probably safer to consider a constant rate rather a death basin. It is a priority to improve the coverage of the area by increasing the number of basins considered. The logit function to compute the basin weights introduces a plateau for the short distance that leads to a random destination in the departure vicinity. **Steps to calibrate online the process** - determine the kernel function (to fit the median and maximum distances for strayers); - calculate the destination basins weights for each departure basin; - calculate the sum of destination basins weights for each departure basin; - fix the death basin weight to the mean of these sums. # References