Add the regularization calibration
Add the regularization calibration
It would be nice to implement the regularisation calibration proposed by de Lavenne et al. (2019) (https://doi.org/10.1029/2018WR024266). That would stabilise parameters for downstream basins.
Basically, we need to consider in the calibration process the distance between calibrated parameter values and parameter values of upstream catchments.
That would be useful for @cyril.thebault as well as Laura and other people using airGR-SD.
I would say that we need for that:
-
to have access to upstream parameter values in
Calibration_Michel
-
add a
regul
boolean in CreateCalibOptions and retrieve it inCalibration_Michel
-
either a new
Error_Crit
that allows to combine anyError_Crit
with the distance calculation, or implement directly that inCalibration_Michel
-
distance must be calculated in the transformed space of parameters to avoid having parameters with huge distance (e.g. X1) compared to smaller parameters (e.g. X4)
-
certainly other things that will come up when we try to implement it. :)
Who wants to give it a try? :)
Activity
- Thirel Guillaume added ENHANCEMENT R labels
added ENHANCEMENT R labels
- Delaigue Olivier changed title from Regularization calibration to Add the regularization calibration
changed title from Regularization calibration to Add the regularization calibration
- Developer
The regularisation of de Lavenne has 2 aspects:
- a parameter spacialisation of ungauged stations in the Intermediary Contributing Area (ICA)
- the regularisation of the parameters of the gauging station of the ICA
We choose to only implement the second aspect in airGR at first.
Theory
A priori parameter set
A priori parameter sets tested in the article are:
- parameter set of the closest calibrated station using rescaled Ghosh distance to determine which one it is
- parameter set established from physical parameters and regressions
- optimal parameters obtained by the calibration of a lumped model of the whole catchment
For airGR, given the available data already used in the package, we have decided to start by using the upstream gauging station that provides the highest contribution to the flow.
@alban.de-lavenne has begin an R-Package calculating Ghosh distance from a shape file. This one would be useful to determine which station parameter set to use as "a priori" in future developments.
Objective function
Regularisation consists in taking into account the difference between the a priori parameters and the "optimal" ones. It results in the following objective function F_{obj}, which mixes a classical criterion and the distance between a priori and optimal parameters:
F_{obj}(\theta_{opt}) = (1 - k).C_{hyd}(\theta_{opt})+k.R_{gap}(\theta_{opt})with :
- \theta_{opt} set of parameters used during optimisation in the transformed space
- k the weight given to the regularisation term relatively to the hydrological performance criterion
- C_{hyd} the hydrological performance criterion (could be KGE(Q), NSE(sqrt(Q)) or whatever)
- R_{gap} the regularisation term based on the gap between a priori and optimal parameter sets (see below)
R_{gap}(\theta_{opt}) = 1 - \sqrt{\sum_{i=1}^{n} \left( \frac{\theta_{apr}^{i} - \theta_{opt}^{i}}{\theta_{range}} \right )^2}. \max(0, C_{hyd}(\theta_{apr}))with:
- \theta_{apr}^i and \theta_{opt}^i ith item of respectively the a priori and the optimal parameter sets
- \theta_{range}^i range of the ith parameter
Implementation
Modification to
RunModel_*
airGR uses functions named
ErrorCrit_*
to calculate the objective function, which use the following arguments:-
InputsCrit
an object containing the criterion function(s) and the input data used for calculating the criteria -
OutputsModel
the object produced byRunModel_*
The parameter set \theta_{opt} is missing in these objects and the only parameter which can get it during calibration is
OutputsModel
. Amending the output ofRunModel_*
with a new itemOutputsModel$ParamR
would solve this issue.Development of an
ErrorCrit_GAPX
and calculation of the final objective functionA priori parameter set \theta_{apr} and performance criterion C_{hyd}(\theta_{apr}) of the a priori gauging station can be added to the
InputsCrit
object (in the parameters of the (see below).The range of the parameters \theta_{range}^i is dependant on the model used and this information could be produced by the
CreateRunOptions
function and provided by the argumentRunOptions
at the creation of theInputsCrit
object.@guillaume.thirel talked about using the
CalibOptions
object to store the weight parameterk
. It seems that it's not feasible because this parameter is not involved in the calculation of the performance criterion. In fact such objective function would be possible to be written as follow without adding complexity toCreateInputsCrit
:CreateInputsCrit_DeLavenne <- function(FUN_CRIT = ErrorCrit_KGE, InputsModel, RunOptions, AprParamR, AprCrit, k = 0.15, ...) { InputsCrit <- CreateInputsCrit( FUN_CRIT = list(FUN_CRIT, ErrorCrit_GAPX), InputsModel, RunOptions, Obs = list(Q = Qobs, AprParamR = AprParamR, AprCrit = AprCrit) Weights = c(1-k, k), ... }
I'm not completely sure that a priori information can be inserted in the argument
Obs
with the current code but I suggest to do it this way in order to avoid to add new parameters toCreateInputsCrit
function each time we want to add new kind of input criteria in it in the future.Reference
Lavenne, A. de, Andréassian, V., Thirel, G., Ramos, M.-H., Perrin, C., 2019. A Regularization Approach to Improve the Sequential Calibration of a Semidistributed Hydrological Model. Water Resources Research 55, 8821–8839. https://doi.org/10.1029/2018WR024266
Edited by Dorchies David - Author Owner
Thanks @david.dorchies . I amended your message with some typos.
Some comments:
-
regarding the parameter ranges. In airGR, with the GR models, these ranges are equal to 20 for all parameters as we use transformed space for parameter calibration (this is a kind of norming). We can either fix this range to 20 in the calculation, or allow them to be given in case we want to have a generic code (@olivier.delaigue can you check if in airGRplus the range is also 20 ?).
-
"@guillaume.thirel talked about using the
CalibOptions
object to store the weight parameterk
. It seems that it's not feasible because this parameter is not involved in the calculation of the performance criterion". You're right: actually my comment was correct talking about calibration only, as inCalibration_Michel
we have access toCalibOptions
. But if we want to calculate the regularised criterion outside calibration, e.g. for evaluation, that does not work anymore. I agree on providingAprParamR
,AprCrit
andk
toCreateInputsCrit
without ading arguments. In my opinion,AprParamR
,AprCrit
can be provided toObs
if we modifyCreateInputsCrit
, such as we did in the past forSCA
andSWE
.
-
- Owner
One comment about R_{gap}(\theta_{opt}): \max(0, C_{hyd}(\theta_{apr})) should only be used for KGE or NSE. It aims to reduce the influence of the regularisation when \theta_{apr} did not provide good performances: below a performance of 0, no regularization is done. If you want to use it with any criterion (such as RMSE), this part of the equation should be written more generically. It could be rescaled between 0 and 1, with 0 defining the acceptable performance threshold.
- Developer
Since the criterion of the upstream basin is directly provided by the user as a number, it's not possible to verify it's origin. However, we can add a fatal error if the criterion is more than one.
Concerning the downstream criterion, it's not actually possible to use RMSE as composed criterion: https://gitlab.irstea.fr/HYCAR-Hydro/airgr/-/blob/dev/R/UtilsErrorCrit.R#L13. So that already constrains the user possibilities.
One other thing to notice: the composed criterion F_{obj}(\theta_{opt}) is used for calibration, but for the next downstream sub-basin, we need to recompute a single criterion (KGE or whatever) for using C_{hyd}(\theta_{apr}).
Edited by Dorchies David - Owner
Maybe a safe solution would be to define this additional term \max(0, C_{hyd}(\theta_{apr})) as something optional, and let the user decide whether it makes sense to use it or not. R_{gap}(\theta_{opt}) can live without it.
- Developer
We can set
AprCrit = 1
as default parameter. That will do the trick! - Please register or sign in to reply
- Author Owner
That's a good point!
- Dorchies David created merge request !51 (merged) to address this issue
created merge request !51 (merged) to address this issue
- Dorchies David mentioned in merge request !51 (merged)
mentioned in merge request !51 (merged)
- Developer
We must also check the compatibility of the a priori model with the downstream model! It's not possible to constrain the calibration parameters of a GR4J with the ones of a GR6J for example. Can we consider GR4J and CemaNeigeGR4J as compatible on hydrologic model parameters?
What do you think about this strategy:
- add the hydrologic model name in the classes of
OutputsModel
- copy the classes of
OutputsModel
inOutputsCrit
- pass an
OutputsCrit
object in the argumentObs = list(AprCrit = OutputsCrit)
inCreateInputsCrit
- in
ErrorCrit_GAPX
checking compatibility of classes ofAprCrit
andOutputsModel
- add the hydrologic model name in the classes of
- Delaigue Olivier closed via merge request !51 (merged)
closed via merge request !51 (merged)
- Delaigue Olivier reopened
reopened
- Developer
I have an issue with the implementation.
We need to use the transformed parameters for the evaluation of GAPX. The parameters available in
RunModel
and communicable toOutputsModel
aren't transformed. So I need to do the transformation inErrorCrit_GAPX
. Unfortunately, the transformation function is only defined inCreateCalibOptions
and stored inCalibOptions
which is not available in the evaluation of the performance criterion.For the record,
ErrorCrit
functions only takeInputsCrit
,OutputsModel
, andFUN_CRIT
as arguments.I don't think that change the content of
InputsCrit
orOutputsModel
for including the parameter transformation function in them is a good idea.With the information contained in
OutputsModel
, it's possible to set the transformation function but it could be inefficient to do this at each calculation of the performance criteria.The last possible way is to directly include the transformation function in the
ErrorCrit_GAPX
function. This can be done with a function factory (also called "closure") taking the transformation as parameter and returning theErrorCrit_GAPX
function.This closure can look like this:
CreateErrorCrit_GAPX <- function(FUN_TRANSFO) { function(InputsCrit, OutputsModel) { ParamTrans <- FUN_TRANSFO(OutputsModel$Param, "RT") return(1 - sqrt(...)) } }
The user create its ErrorCrit function as follow:
ErrorCrit_GAPX <- CreateErrorCrit_GAPX(CalibOptions$FUN_TRANSFO)
And then, he can use it as any classical
ErrorCrit
function. That adds an extra step in the process but with a well explained vignette that would be OK.Edited by Dorchies David - Dorchies David created merge request !52 (closed) to address this issue
created merge request !52 (closed) to address this issue
- Dorchies David mentioned in merge request !52 (closed)
mentioned in merge request !52 (closed)
- Dorchies David mentioned in commit c7d09d67
mentioned in commit c7d09d67
- Dorchies David mentioned in commit 62d442e4
mentioned in commit 62d442e4
- Dorchies David mentioned in commit 8aa83dc0
mentioned in commit 8aa83dc0
- Dorchies David mentioned in commit 6a087e83
mentioned in commit 6a087e83
- Dorchies David mentioned in commit 797bcca2
mentioned in commit 797bcca2
- Dorchies David mentioned in commit 409d49ff
mentioned in commit 409d49ff
- Dorchies David mentioned in commit 860ab0c1
mentioned in commit 860ab0c1
- Developer
@alban.de-lavenne I have a theoretical question:
Why calculate the sum of the normalised parameter differences and not the mean?
Imagine, you want to compare GR4J and GR6J models with regularisation. Because of the sum, the 6 model parameters will be more penalised than the 4 one, even if the normalised difference between all parameters is the same. Is there a reason for that?
Edited by Dorchies David - Owner
The idea was simply to calculate an Euclidean distance between two points in the parameter space: one point is the a priori and the other point is the optimised one. It is true that this distance can increase if the number of dimensions increases. But the strength of the constraint is supposed to be adjusted by the value of k. I did not think of this situation where one would want to use the same k value for two different models. Using the mean could solve this problem, but may lead to an update of the k value.
- Dorchies David mentioned in commit 578bf7cc
mentioned in commit 578bf7cc
- Dorchies David mentioned in commit c81769fa
mentioned in commit c81769fa
- Dorchies David mentioned in commit 0829d8e4
mentioned in commit 0829d8e4
- Dorchies David mentioned in commit 734b0ef1
mentioned in commit 734b0ef1
- Dorchies David mentioned in commit 75b19ef6
mentioned in commit 75b19ef6
- Dorchies David mentioned in commit 092aab71
mentioned in commit 092aab71
- Dorchies David mentioned in commit e88b48a0
mentioned in commit e88b48a0
- Dorchies David mentioned in commit ce6b470a
mentioned in commit ce6b470a
- Dorchies David mentioned in commit 37c1d111
mentioned in commit 37c1d111
- Dorchies David mentioned in commit 0b84067a
mentioned in commit 0b84067a
- Developer
I have finished a first version of the implementation of the De Lavenne criterion.
You can install this version with:
install.packages("remotes") remotes::install_gitlab("HYCAR-Hydro/airgr@111-add-the-regularization-calibration-2", host = "gitlab.irstea.fr", dependencies = TRUE, build_vignettes = TRUE)
(On Windows, you also need to have Rtools installed for compilation: https://cran.r-project.org/bin/windows/Rtools/)
For the review of the code, clone the depo and checkout the branch
111-add-the-regularization-calibration-2
and please use the comments of the dedicated merge request: !52 (closed)The new functionalities are split into 2 parts: the GAPX criterion and the "De Lavenne" criterion which is a composed criterion (KGE+GAPX) which handles GAPX transparently. To get started, you can:
- Have a look to the documentation:
?CreateErrorCrit_GAPX
and?CreateInputsCrit_DeLavenne
- Run the examples:
example("CreateErrorCrit_GAPX")
andexample("CreateInputsCrit_DeLavenne")
- Run the vignette
V05_sd_model
which implement an example of calibration with regularisation
Let's discuss here about the user experience with this new functionalities (usage and documentation). For the code implementation and its review, please see: !52 (closed)
- Have a look to the documentation:
Thank you for this first version, I ran the examples and it seems to work !
However I have some questions on the "CreateInputsCrit_DeLavenne". Indeed, I thought this criterion was designed for calibration of the Intermediary Contributing Area (ICA) and when I check the example it looks like a validation of any catchment. However, the vignette is more explicit and gives a good understanding of the use of this criterion. Then, I was wondering if, when we have several upstream subcatchment, the choice of a priori parameters (according to their contribution to the flow at the outlet) is already implemented or not ? Finally, I tried to use the regularisation on one of my catchment (semi-distributed with 1 upstream, and 1 intermediary) but I'm working with hourly data with GR5H and I have an error when I'm doing :
InputsCrit <- CreateInputsCrit_DeLavenne(FUN_CRIT = ErrorCrit_KGE, InputsModel = InputsModel, RunOptions = RunOptions_Cal, Obs = tmp$Qmm[Ind_Run_Cal], VarObs = "Q", AprParamR = Qamont[[1]]CalibParamFinalR, AprCrit = Qamont[[1]]CalibCritFinal)
Error in FUN_GR(ParamIn[, 2:NParam], Direction) : the GR5H model requires 5 parameters
Qamont[[1]]CalibParamFinalR [1] 219.331 -0.179 289.823 5.906 0.189
Edited by Thebault Cyril- Developer
Hi @cyril.thebault,
Thanks for testing!
The downstream sub-basin is a GR5H+Lag model, so the a priori parameter set must contains 6 parameters not 5.
The a velocity is not defined in the upstream basin parameter set, so you have 2 options:
- give an a priori value such as 1 m/s
AprParamR = c(1, Qamont[[1]]$Calib$ParamFinalR)
- give no a priori:
AprParamR = c(NA, Qamont[[1]]$Calib$ParamFinalR)
I didn't test the second option and I'm afraid that the code will crash with it. If so, tell me and I will fixe it quickly.
I agree that the error message is inappropriate and should be more specific.
Edited by Dorchies David - give an a priori value such as 1 m/s
- Developer
I've just checked and the second option works like a charm.
I also checked, indeed it works well!
- Developer
I agree that the error message is inappropriate and should be more specific.
Fixed in bc3d576d.
- Developer
Question: what is the range of the transformed GR parameters?
[-10;10]
or[-20;20]
? - Developer
From
TransfoParam_Lag
, I suppose that it's[-10;10]
so the total range is 20. I have to fix that, actual code is based on 40. - Author Owner
[-10; 10], yes. (sorry, I have not read the rest of the thread)
- Dorchies David mentioned in commit 1039dc73
mentioned in commit 1039dc73
- Dorchies David mentioned in commit dd7ef429
mentioned in commit dd7ef429
- Dorchies David mentioned in commit e03b26b4
mentioned in commit e03b26b4
- Dorchies David mentioned in commit 419baf7e
mentioned in commit 419baf7e
- Dorchies David mentioned in commit 01dff066
mentioned in commit 01dff066
- Dorchies David mentioned in commit 755717fc
mentioned in commit 755717fc
- Dorchies David mentioned in commit b630f3ac
mentioned in commit b630f3ac
- Dorchies David mentioned in commit be85a67c
mentioned in commit be85a67c
- Dorchies David mentioned in commit 6acf3a56
mentioned in commit 6acf3a56
- Dorchies David mentioned in commit 1380d8ff
mentioned in commit 1380d8ff
- Dorchies David mentioned in commit c00db7b3
mentioned in commit c00db7b3
- Dorchies David mentioned in commit 3868c72b
mentioned in commit 3868c72b
- Dorchies David mentioned in commit e1bde972
mentioned in commit e1bde972
- Dorchies David mentioned in commit 8525246a
mentioned in commit 8525246a
- Dorchies David mentioned in commit e99440de
mentioned in commit e99440de