Add the regularization calibration

added ENHANCEMENT R labels

changed title from Regularization calibration to Add the regularization calibration

The regularisation of de Lavenne has 2 aspects:

a parameter spacialisation of ungauged stations in the Intermediary Contributing Area (ICA)
the regularisation of the parameters of the gauging station of the ICA

We choose to only implement the second aspect in airGR at first.

Theory

A priori parameter set

A priori parameter sets tested in the article are:

parameter set of the closest calibrated station using rescaled Ghosh distance to determine which one it is
parameter set established from physical parameters and regressions
optimal parameters obtained by the calibration of a lumped model of the whole catchment

For airGR, given the available data already used in the package, we have decided to start by using the upstream gauging station that provides the highest contribution to the flow.

@alban.de-lavenne has begin an R-Package calculating Ghosh distance from a shape file. This one would be useful to determine which station parameter set to use as "a priori" in future developments.

Objective function

Regularisation consists in taking into account the difference between the a priori parameters and the "optimal" ones. It results in the following objective function F_{obj}, which mixes a classical criterion and the distance between a priori and optimal parameters:

with :

\theta_{opt} set of parameters used during optimisation in the transformed space
k the weight given to the regularisation term relatively to the hydrological performance criterion
C_{hyd} the hydrological performance criterion (could be KGE(Q), NSE(sqrt(Q)) or whatever)
R_{gap} the regularisation term based on the gap between a priori and optimal parameter sets (see below)

with:

\theta_{apr}^i and \theta_{opt}^i i^th item of respectively the a priori and the optimal parameter sets
\theta_{range}^i range of the i^th parameter

Implementation

Modification to `RunModel_*`

airGR uses functions named ErrorCrit_* to calculate the objective function, which use the following arguments:

InputsCrit an object containing the criterion function(s) and the input data used for calculating the criteria
OutputsModel the object produced by RunModel_*

The parameter set \theta_{opt} is missing in these objects and the only parameter which can get it during calibration is OutputsModel. Amending the output of RunModel_* with a new item OutputsModel$ParamR would solve this issue.

Development of an `ErrorCrit_GAPX` and calculation of the final objective function

A priori parameter set \theta_{apr} and performance criterion C_{hyd}(\theta_{apr}) of the a priori gauging station can be added to the InputsCrit object (in the parameters of the (see below).

The range of the parameters \theta_{range}^i is dependant on the model used and this information could be produced by the CreateRunOptions function and provided by the argument RunOptionsat the creation of the InputsCrit object.

@guillaume.thirel talked about using the CalibOptions object to store the weight parameter k. It seems that it's not feasible because this parameter is not involved in the calculation of the performance criterion. In fact such objective function would be possible to be written as follow without adding complexity to CreateInputsCrit:

CreateInputsCrit_DeLavenne <- function(FUN_CRIT = ErrorCrit_KGE, 
                                       InputsModel, RunOptions, 
                                       AprParamR, AprCrit,
                                       k = 0.15, ...) {
  InputsCrit <- CreateInputsCrit(
    FUN_CRIT = list(FUN_CRIT, ErrorCrit_GAPX),
    InputsModel, RunOptions, 
    Obs = list(Q = Qobs, AprParamR = AprParamR, AprCrit = AprCrit)
    Weights = c(1-k, k), ...
}

I'm not completely sure that a priori information can be inserted in the argument Obs with the current code but I suggest to do it this way in order to avoid to add new parameters to CreateInputsCrit function each time we want to add new kind of input criteria in it in the future.

Reference

Lavenne, A. de, Andréassian, V., Thirel, G., Ramos, M.-H., Perrin, C., 2019. A Regularization Approach to Improve the Sequential Calibration of a Semidistributed Hydrological Model. Water Resources Research 55, 8821–8839. https://doi.org/10.1029/2018WR024266

Thanks @david.dorchies . I amended your message with some typos.

Some comments:

regarding the parameter ranges. In airGR, with the GR models, these ranges are equal to 20 for all parameters as we use transformed space for parameter calibration (this is a kind of norming). We can either fix this range to 20 in the calculation, or allow them to be given in case we want to have a generic code (@olivier.delaigue can you check if in airGRplus the range is also 20 ?).
"@guillaume.thirel talked about using the CalibOptions object to store the weight parameter k. It seems that it's not feasible because this parameter is not involved in the calculation of the performance criterion". You're right: actually my comment was correct talking about calibration only, as in Calibration_Michel we have access to CalibOptions. But if we want to calculate the regularised criterion outside calibration, e.g. for evaluation, that does not work anymore. I agree on providing AprParamR, AprCrit and k to CreateInputsCrit without ading arguments. In my opinion, AprParamR, AprCrit can be provided to Obs if we modify CreateInputsCrit, such as we did in the past for SCA and SWE.

One comment about R_{gap}(\theta_{opt}): \max(0, C_{hyd}(\theta_{apr})) should only be used for KGE or NSE. It aims to reduce the influence of the regularisation when \theta_{apr} did not provide good performances: below a performance of 0, no regularization is done. If you want to use it with any criterion (such as RMSE), this part of the equation should be written more generically. It could be rescaled between 0 and 1, with 0 defining the acceptable performance threshold.

Since the criterion of the upstream basin is directly provided by the user as a number, it's not possible to verify it's origin. However, we can add a fatal error if the criterion is more than one.

Concerning the downstream criterion, it's not actually possible to use RMSE as composed criterion: https://gitlab.irstea.fr/HYCAR-Hydro/airgr/-/blob/dev/R/UtilsErrorCrit.R#L13. So that already constrains the user possibilities.

One other thing to notice: the composed criterion F_{obj}(\theta_{opt}) is used for calibration, but for the next downstream sub-basin, we need to recompute a single criterion (KGE or whatever) for using C_{hyd}(\theta_{apr}).

Maybe a safe solution would be to define this additional term \max(0, C_{hyd}(\theta_{apr})) as something optional, and let the user decide whether it makes sense to use it or not. R_{gap}(\theta_{opt}) can live without it.

We can set AprCrit = 1 as default parameter. That will do the trick!

That's a good point!

created merge request !51 (merged) to address this issue

mentioned in merge request !51 (merged)

We must also check the compatibility of the a priori model with the downstream model! It's not possible to constrain the calibration parameters of a GR4J with the ones of a GR6J for example. Can we consider GR4J and CemaNeigeGR4J as compatible on hydrologic model parameters?

What do you think about this strategy:

add the hydrologic model name in the classes of OutputsModel
copy the classes of OutputsModel in OutputsCrit
pass an OutputsCrit object in the argument Obs = list(AprCrit = OutputsCrit) in CreateInputsCrit
in ErrorCrit_GAPX checking compatibility of classes of AprCrit and OutputsModel

closed via merge request !51 (merged)

reopened

I have an issue with the implementation.

We need to use the transformed parameters for the evaluation of GAPX. The parameters available in RunModel and communicable to OutputsModel aren't transformed. So I need to do the transformation in ErrorCrit_GAPX. Unfortunately, the transformation function is only defined in CreateCalibOptions and stored in CalibOptions which is not available in the evaluation of the performance criterion.

For the record, ErrorCrit functions only take InputsCrit, OutputsModel, and FUN_CRIT as arguments.

I don't think that change the content of InputsCrit or OutputsModel for including the parameter transformation function in them is a good idea.

With the information contained in OutputsModel, it's possible to set the transformation function but it could be inefficient to do this at each calculation of the performance criteria.

The last possible way is to directly include the transformation function in the ErrorCrit_GAPX function. This can be done with a function factory (also called "closure") taking the transformation as parameter and returning the ErrorCrit_GAPX function.

This closure can look like this:

CreateErrorCrit_GAPX <- function(FUN_TRANSFO) {
  function(InputsCrit, OutputsModel) {
    ParamTrans <- FUN_TRANSFO(OutputsModel$Param, "RT")
    return(1 - sqrt(...))
  }
}

The user create its ErrorCrit function as follow:

ErrorCrit_GAPX <- CreateErrorCrit_GAPX(CalibOptions$FUN_TRANSFO)

And then, he can use it as any classical ErrorCrit function. That adds an extra step in the process but with a well explained vignette that would be OK.

created merge request !52 (closed) to address this issue

mentioned in merge request !52 (closed)

mentioned in commit c7d09d67

mentioned in commit 62d442e4

mentioned in commit 8aa83dc0

mentioned in commit 6a087e83

mentioned in commit 797bcca2

mentioned in commit 409d49ff

mentioned in commit 860ab0c1

@alban.de-lavenne I have a theoretical question:

Why calculate the sum of the normalised parameter differences and not the mean?

Imagine, you want to compare GR4J and GR6J models with regularisation. Because of the sum, the 6 model parameters will be more penalised than the 4 one, even if the normalised difference between all parameters is the same. Is there a reason for that?

The idea was simply to calculate an Euclidean distance between two points in the parameter space: one point is the a priori and the other point is the optimised one. It is true that this distance can increase if the number of dimensions increases. But the strength of the constraint is supposed to be adjusted by the value of k. I did not think of this situation where one would want to use the same k value for two different models. Using the mean could solve this problem, but may lead to an update of the k value.

mentioned in commit 578bf7cc

mentioned in commit c81769fa

mentioned in commit 0829d8e4

mentioned in commit 734b0ef1

mentioned in commit 75b19ef6

mentioned in commit 092aab71

mentioned in commit e88b48a0

mentioned in commit ce6b470a

mentioned in commit 37c1d111

mentioned in commit 0b84067a

I have finished a first version of the implementation of the De Lavenne criterion.

You can install this version with:

install.packages("remotes")
remotes::install_gitlab("HYCAR-Hydro/airgr@111-add-the-regularization-calibration-2", host = "gitlab.irstea.fr", dependencies = TRUE, build_vignettes = TRUE)

(On Windows, you also need to have Rtools installed for compilation: https://cran.r-project.org/bin/windows/Rtools/)

For the review of the code, clone the depo and checkout the branch 111-add-the-regularization-calibration-2 and please use the comments of the dedicated merge request: !52 (closed)

The new functionalities are split into 2 parts: the GAPX criterion and the "De Lavenne" criterion which is a composed criterion (KGE+GAPX) which handles GAPX transparently. To get started, you can:

Have a look to the documentation: ?CreateErrorCrit_GAPX and ?CreateInputsCrit_DeLavenne
Run the examples: example("CreateErrorCrit_GAPX") and example("CreateInputsCrit_DeLavenne")
Run the vignette V05_sd_model which implement an example of calibration with regularisation

Let's discuss here about the user experience with this new functionalities (usage and documentation). For the code implementation and its review, please see: !52 (closed)

Thank you for this first version, I ran the examples and it seems to work !

However I have some questions on the "CreateInputsCrit_DeLavenne". Indeed, I thought this criterion was designed for calibration of the Intermediary Contributing Area (ICA) and when I check the example it looks like a validation of any catchment. However, the vignette is more explicit and gives a good understanding of the use of this criterion. Then, I was wondering if, when we have several upstream subcatchment, the choice of a priori parameters (according to their contribution to the flow at the outlet) is already implemented or not ? Finally, I tried to use the regularisation on one of my catchment (semi-distributed with 1 upstream, and 1 intermediary) but I'm working with hourly data with GR5H and I have an error when I'm doing :

InputsCrit <- CreateInputsCrit_DeLavenne(FUN_CRIT = ErrorCrit_KGE, InputsModel = InputsModel, RunOptions = RunOptions_Cal, Obs = tmp$Qmm[Ind_Run_Cal], VarObs = "Q", AprParamR = Qamont[[1]]CalibParamFinalR, AprCrit = Qamont[[1]]CalibCritFinal)

Error in FUN_GR(ParamIn[, 2:NParam], Direction) : the GR5H model requires 5 parameters

Qamont[[1]]CalibParamFinalR [1] 219.331 -0.179 289.823 5.906 0.189

Hi @cyril.thebault,

Thanks for testing!

The downstream sub-basin is a GR5H+Lag model, so the a priori parameter set must contains 6 parameters not 5.

The a velocity is not defined in the upstream basin parameter set, so you have 2 options:

give an a priori value such as 1 m/s AprParamR = c(1, Qamont[[1]]$Calib$ParamFinalR)
give no a priori: AprParamR = c(NA, Qamont[[1]]$Calib$ParamFinalR)

I didn't test the second option and I'm afraid that the code will crash with it. If so, tell me and I will fixe it quickly.

I agree that the error message is inappropriate and should be more specific.

I've just checked and the second option works like a charm.

I also checked, indeed it works well!

I agree that the error message is inappropriate and should be more specific.

Fixed in bc3d576d.

Question: what is the range of the transformed GR parameters? [-10;10] or [-20;20] ?

From TransfoParam_Lag, I suppose that it's [-10;10] so the total range is 20. I have to fix that, actual code is based on 40.

[-10; 10], yes. (sorry, I have not read the rest of the thread)

mentioned in commit 1039dc73

mentioned in commit dd7ef429

mentioned in commit e03b26b4

mentioned in commit 419baf7e

mentioned in commit 01dff066

mentioned in commit 755717fc

mentioned in commit b630f3ac

mentioned in commit be85a67c

mentioned in commit 6acf3a56

mentioned in commit 1380d8ff

mentioned in commit c00db7b3

mentioned in commit 3868c72b

mentioned in commit e1bde972

mentioned in commit 8525246a

mentioned in commit e99440de

Add the regularization calibration

Tasks

Activity

Theory

A priori parameter set

Objective function

Implementation

Modification to `RunModel_*`

Development of an `ErrorCrit_GAPX` and calculation of the final objective function

Reference

Menu

Add the regularization calibration

Tasks

Linked items 0

Related merge requests

Activity

Theory

A priori parameter set

Objective function

Implementation

Modification to RunModel_*

Development of an ErrorCrit_GAPX and calculation of the final objective function

Reference

Linked items
0

Modification to `RunModel_*`

Development of an `ErrorCrit_GAPX` and calculation of the final objective function