Structure of the database
Considerations on VGEST Outputs
The outputs of VGEST executions are used to produce the outputs of IRMaRA.
VGEST executions can be organized hierarchically by the following parameters:
- Input streamflows: historical, generated from historical records, generated from climate projections (2 or more)
- Reservoir local rule sets: from the current rules to less constrained rules (6)
- Target downstream stations (9)
- Objectives: low-flow and high-flow thresholds (7)
I intentionally rule out the method of distributing tasks between reservoirs as a parameter: we have already chosen the best one, and the outputs show little sensitivity to it because of local constraints (see #11 (closed)).
Each VGEST execution is a combination of the parameters above. Running all combinations leads to at least 2 × 6 × 9 × 7 = 756 executions of VGEST. The task vgest#7 (closed) should be solved before running all these simulations.
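The execution grid above can be sketched as a Cartesian product. This is illustrative only: the parameter values are placeholders, and only the counts (2, 6, 9, 7) come from this issue.

```python
from itertools import product

# Placeholder parameter values; only the counts match the issue description.
streamflows = ["historical_based", "climate_projection_based"]  # at least 2
rule_sets = [f"rules_{i}" for i in range(6)]                    # 6 rule sets
stations = [f"station_{i}" for i in range(9)]                   # 9 stations
objectives = [f"objective_{i}" for i in range(7)]               # 7 thresholds

executions = list(product(streamflows, rule_sets, stations, objectives))
print(len(executions))  # 756
```

Adding a third input streamflow set would raise the count to 3 × 6 × 9 × 7 = 1134, so the total grows quickly with the number of climate projections.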
Now the question is: how should VGEST results be organized for use in IRMaRA?
IRMaRA requirements
We should look at how IRMaRA uses the data to determine the most convenient organisation. There are two main points of view:
- the annual risk assessment for one objective at one station (#14)
- real time risk assessment for all objectives at all stations (#15)
In the first case, it is convenient to store and read data for one objective at one station, for one rule set and one input streamflow. In the second case, we need to read all stations and all objectives for a particular day.
It seems that input streamflows (for sure) and reservoir rule sets (less certain) are always treated separately, so these outputs could be stored in separate files.
If we store all stations and all objectives in one file, 5000 years of data lead to around 9 × 2 × 7 × 5000 × 365 ≈ 230 million rows. Multiplied by 16 bytes to store a volume and its associated probability of occurrence, that is more than 3 GB of data. Not all years of data are relevant (we can delete redundant probability values) and the data can also be compressed, but that still means managing files of several hundred MB. This is not sustainable in a Shiny application.
The only way to select data quickly from such a large amount is to use a database management system such as MySQL, PostgreSQL or SQLite.
Depending on the system used, all data can be stored in a single table, in a dedicated table per input streamflow (MySQL, PostgreSQL), or in a dedicated database per input streamflow (SQLite).
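As a minimal sketch of the SQLite option (one database file per input streamflow), the schema below is hypothetical: table, column, and index names are illustrative, not the project's actual design. The point is that two indexes can serve both access patterns, the annual view (one objective at one station for one rule set) and the real-time view (all stations and objectives on a given day).

```python
import sqlite3

# In practice this would be a per-streamflow file, e.g. "historical.sqlite".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE vgest_output (
    rule_set    INTEGER NOT NULL,
    station     TEXT    NOT NULL,
    objective   TEXT    NOT NULL,
    day         TEXT    NOT NULL,  -- ISO date
    volume      REAL,
    probability REAL               -- probability of occurrence
);
-- Serves the annual risk assessment: one objective at one station,
-- for one rule set, scanned over time.
CREATE INDEX idx_annual ON vgest_output (rule_set, station, objective, day);
-- Serves the real-time risk assessment: everything for a particular day.
CREATE INDEX idx_daily ON vgest_output (day);
""")

# Illustrative row and the real-time query: all stations/objectives for a day.
con.execute("INSERT INTO vgest_output VALUES (1, 'S1', 'low-flow', '2000-01-01', 12.5, 0.1)")
rows = con.execute(
    "SELECT station, objective, volume FROM vgest_output WHERE day = ?",
    ("2000-01-01",),
).fetchall()
print(rows)  # [('S1', 'low-flow', 12.5)]
```

Since an index roughly duplicates the indexed columns, indexing would add to the storage estimate above, which is another argument for splitting files by input streamflow (and possibly by rule set) to keep each database small.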