Implement automatic tests in the package
This should be done following the instructions in http://r-pkgs.had.co.nz/tests.html.
These tests will be automatically checked with the command `R CMD check`.
A first set of easy tests could check that the scripts called in the vignettes run and always give the same result.
Activity
- Dorchies David added ENHANCEMENT label
- Owner
Currently, the vignette code chunks are not evaluated, because the computation times are too long and would not pass the CRAN checks.
- Author Developer
OK, indeed, there is a time limit for the tests that are automatically executed on CRAN (see http://r-pkgs.had.co.nz/tests.html#test-cran). All tests must last less than one minute.
However, you can write these tests to run locally (and later in gitlab-ci) and indicate that they should not be run on CRAN using the `skip_on_cran()` command. For CRAN, we will only leave those tests that do not require computation, such as tests that check the error messages returned by functions when you enter incorrect data, or at least a single run of each model with a very short time series, just to make sure that the compiled functions run correctly.
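For instance, a locally-run test guarded by `skip_on_cran()` might look like this (the file name and the computation are purely illustrative, not from the package):

```r
# tests/testthat/test-long-runs.R (hypothetical file)
library(testthat)

test_that("a long model run still gives the same result", {
  skip_on_cran()                  # skipped during CRAN checks, run locally / in CI
  x <- cumsum(rep(1, 1000))       # placeholder for an expensive computation
  expect_equal(tail(x, 1), 1000)
})
```

Locally (and on gitlab-ci with `NOT_CRAN=true`), the test runs in full; during a CRAN check it is skipped.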
Edited by Dorchies David
- Owner
What I meant to say was that most of the chunks of the vignettes are not evaluated, because building the vignettes on CRAN must not take too much time. Uwe Ligges asked us to do this.
But it is of course possible to run the code outside the vignettes and automatically integrate the command lines into these vignettes.
Edited by Delaigue Olivier
- Author Developer
OK, I've just seen that, except for V01_get_started.Rmd, all the chunks have the `eval=FALSE` property.

> But it is of course possible to run the codes externally to the vignettes and integrate the command lines automatically into these vignettes.
How do you do that? Do you think that, for example, we can have some code chunks that we can both call from the tests and display in vignettes? For the moment, the only way I see to test the code chunks located in the vignettes is to duplicate these code chunks in the tests.
- Owner
This is possible by putting tags in the vignettes and pasting code instead, but this is not very clean...
In fact, a better idea is probably to do the opposite: extract the code from the vignettes with the knitr::purl() function and run it afterwards.
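A minimal, self-contained illustration of the idea (the file here is a throwaway Rmd, not one of the airGR vignettes):

```r
library(knitr)

# Write a tiny Rmd file with one evaluated chunk
rmd <- tempfile(fileext = ".Rmd")
writeLines(c("```{r}", "x <- 1 + 1", "```"), rmd)

# Extract the R code with purl() and run it
scr <- purl(rmd, output = tempfile(fileext = ".R"), quiet = TRUE)
source(scr)
x   # 2
```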
Edited by Delaigue Olivier
- Owner
This will require removing the hash signs at the beginning of the commands we want to evaluate (because the `eval = FALSE` argument turns these commands into comments), maybe using the `purl` argument in the chunks.

R Markdown source file:

````
```{r}
1 + 1
```

```{r eval=FALSE, purl=TRUE}
2 + 2
```

```{r eval=TRUE, purl=FALSE}
3 + 3
```
````

R result file:

```
## ----------------------------------
1 + 1

## ----eval=FALSE, purl=TRUE---------
## 2 + 2
```
Edited by Delaigue Olivier
- Author Developer
I didn't know the knitr::purl function; it is very promising in this case.
This is how I see things:
For the tests in general

- Put all the scripts related to the tests in the `tests` directory
- Use the testthat package and write test scripts in the `tests/testthat` directory (as required by testthat)
- Store useful functions for the tests in the `tests/R` directory

For testing the vignette chunk codes

- Write a function RunChunk(file, chunk_number) that will:
  - run `knitr::purl`
  - remove the extra `##` for the chunks where `eval=FALSE`
  - store the resulting extract in `tests/tmp` (and modify `.gitignore` accordingly)
  - run the requested chunk
- Write tests which at least run each chunk and check the inputs (are the sample data coherent with the ones used in the vignettes?) and some outputs (expected data structures and values)

It's a first proposal. Feel free to amend it.
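A rough sketch of what the proposed RunChunk() could look like (this is my reconstruction, not an actual implementation; it relies on the `## ----` chunk separators that purl writes, as shown earlier in the thread):

```r
# Hedged sketch of the proposed RunChunk() helper (hypothetical code)
RunChunk <- function(file, chunk_number) {
  # 1. Extract the vignette code; chunk headers come out as '## ----' lines
  scr <- knitr::purl(file, output = tempfile(fileext = ".R"), quiet = TRUE)
  txt <- readLines(scr)
  starts <- grep("^## ----", txt)
  from <- starts[chunk_number] + 1
  to <- if (chunk_number < length(starts)) starts[chunk_number + 1] - 1 else length(txt)
  body <- txt[from:to]
  # 2. Uncomment code coming from eval=FALSE chunks
  body <- sub("^## ", "", body)
  # 3. Run the requested chunk and return its environment for inspection
  env <- new.env(parent = globalenv())
  eval(parse(text = body), envir = env)
  env
}
```

`RunChunk("vignettes/V01_get_started.Rmd", 1)` would then evaluate the first chunk in a fresh environment.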
- Owner
I agree with all the proposals, except I think we need to use the `purl` argument in the chunks, because we're not interested in all the chunks that use `eval=FALSE`.
- Author Developer
Since I propose to write and use a function `RunChunk` that allows you to run selected chunks in an Rmd file, I don't think we need to add a `purl` argument on the chunks we never want to run in the tests.
Moreover, the function proposes to run a numbered chunk from the purl extraction. Using `purl=FALSE` will force us to count only the chunks without `purl=FALSE` in the original Rmd document in order to identify the chunk we want to run. I find that a bit tricky.
- Delaigue Olivier made the issue confidential
- Owner
We can also run the example codes of the help pages using the `example()` command.
Edited by Delaigue Olivier
- Author Developer
> We can also run the example codes of the help pages using the `example()` command.

Good idea! That seems easy to do.
I have some updates concerning the implementation proposals above.
For the tests in general, we don't need to store useful functions in a `tests/R` folder. There is already an internal mechanism in testthat that automatically sources the files named `tests/testthat/helper-*.R` before the test execution. So, I propose to write the useful functions for the tests in such files.
For testing the vignette chunk codes, after discussion with @olivier.delaigue, we're going to execute a whole vignette at a time instead of chunk by chunk, because of the complication of identifying a particular chunk in a vignette. The `eval` and `purl` arguments attached to each chunk allow us to define whether the chunk has to be run in the vignette and/or in the tests.
Automatically running vignette chunks and examples in the tests will allow us to detect as soon as possible if they lead to execution errors. The second step will be to define tests on the expected results. This part is a bit more tricky and depends on how we consider the results displayed in the vignettes.
Normally, checking the behaviour of a function should be done by a so-called "unit test". Good unit tests check the requirement specification of a function, no more, no less. For example, testing that a wrong entry returns an error message, or that a good entry returns an expected value (it is actually an automation of what you usually do when checking the code).
Defining tests with exact expected results on the vignettes could make the tests hard to maintain in the future if these results are supposed to change. So I'm not comfortable with which results I have to check in the vignettes. These tests could deal with the dimensions of a data frame, an interval in which a returned value should be contained... There are no real limits on what could be tested and how. But I think they should be limited on this part, as the main goal here is to check that vignettes and examples run without errors or real aberrations.
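As a generic illustration of such cheap, computation-free unit tests (the tested function here is base R's `log()`, purely as an example, not airGR code):

```r
library(testthat)

test_that("a wrong entry returns an error", {
  expect_error(log("not a number"))   # non-numeric input must fail
})

test_that("a good entry returns the expected value", {
  expect_equal(log(exp(1)), 1)        # known result, instant to compute
})
```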
- Author Developer
I found an interesting lead for what to test in a very generic way. A lot of chunks are not evaluated in the vignettes and their results are loaded instead. So a good thing to do is to check whether the results obtained from executing the chunks are identical to the ones stored in the package.
- Owner
I agree. We just have to manage the object names, which are identical between those of the chunks and those of the rda files.
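One way to manage the name clash is to load the stored rda file into its own environment before comparing; a self-contained sketch of the pattern (the objects here are stand-ins, not the airGR data):

```r
# Create a stand-in for the rda file shipped with the package
rda <- tempfile(fileext = ".rda")
results <- data.frame(q = c(1.2, 3.4))
save(results, file = rda, version = 2)

# Load the stored reference into a separate environment: no name clash
ref <- new.env()
load(rda, envir = ref)

# Re-run the chunk code (here simulated) and compare with the stored version
results <- data.frame(q = c(1.2, 3.4))
stopifnot(identical(results, ref$results))
```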
- Delaigue Olivier made the issue visible to everyone
- Dorchies David created merge request !8 (merged) to address this issue
- Dorchies David mentioned in merge request !8 (merged)
- Dorchies David mentioned in commit 74be387b
- Author Developer
The thing is beginning to work, and maybe too well... I've got errors when executing all the vignette chunks:
```
test-vignettes.R:9: error: V02.1_param_optim works
object 'GoF' not found
Backtrace:
  1. airGR::RunRmdChunks(...) tests/testthat/test-vignettes.R:9:2
 15. hydroPSO::hydroPSO(...) ../tmp/V02.1_param_optim.R:73:0

test-vignettes.R:14: error: V02.2_param_mcmc works
object 'ObsY' not found
Backtrace:
  1. airGR::RunRmdChunks("../../vignettes/V02.2_param_mcmc.Rmd", force.eval = TRUE) tests/testthat/test-vignettes.R:14:2
```
I tried to run them manually:
```
> optPSO <- hydroPSO::hydroPSO(fn = OptimGR4J,
+                              lower = lowerGR4J, upper = upperGR4J,
+                              control = list(write2disk = FALSE, verbose = FALSE))
[npart=40 ; maxit=1000 ; method=spso2011 ; topology=random ; boundary.wall=absorbing2011]
[ user-definitions in control: write2disk=FALSE ; verbose=FALSE ]
Error in hydroPSO::hydroPSO(fn = OptimGR4J, lower = lowerGR4J, upper = upperGR4J, :
  object 'GoF' not found
```
It seems to be an internal issue of the `hydroPSO` package?

```
> load(system.file("vignettesData/vignetteParamMCMC.rda", package = "airGR"))
> example("Calibration_Michel", local = TRUE, echo = FALSE, verbose = FALSE, ask = FALSE)
> Likelihood <- sum((ObsY - ModY)^2, na.rm = TRUE)^(-sum(!is.na(ObsY)) / 2)
Error: object 'ObsY' not found
```

No trace of `ObsY` in the `Calibration_Michel` manual example nor in the loaded data... Can you help me with these issues?
- Owner
In the `V02.2_param_mcmc` vignette, the two following command lines (chunk 3) are just written in order to explain a part of chunk 4... That is why some objects do not exist at this step.

Chunk 3

```r
Likelihood <- sum((ObsY - ModY)^2, na.rm = TRUE)^(-sum(!is.na(ObsY)) / 2)
LogLike <- -2 * log(Likelihood)
```

Chunk 4

```r
LogLikeGR4J <- function(ParamOptim) {
  ## Transformation to real space
  RawParamOptim <- airGR::TransfoParam_GR4J(ParamIn = ParamOptim, Direction = "TR")
  ## Simulation given a parameter set
  OutputsModel <- airGR::RunModel_GR4J(InputsModel = InputsModel,
                                       RunOptions = RunOptions,
                                       Param = RawParamOptim)
  ## Computation of the log-likelihood: N * log(SS)
  ObsY <- InputsCrit$Qobs
  ModY <- OutputsModel$Qsim
  LogLike <- sum(!is.na(ObsY)) * log(sum((ObsY - ModY)^2, na.rm = TRUE))
}
```
Edited by Delaigue Olivier
- Author Developer
Great! I'll use `purl=FALSE` in the chunk!
- Owner
About 'GoF', it seems to be more complicated. The hydroPSO package was updated very recently (after being removed from CRAN) and I haven't run the code since (usually I check the code when I build the airGR website, where the chunks are evaluated).
'GoF' seems to be an internal variable of the `hydroPSO()` function. I haven't been able to quickly find out where it comes from in this function...
Edited by Delaigue Olivier
- Owner
@guillaume.thirel, can you ask Mauricio?
- Owner
Ok, I just wrote to him. I can't even run the `hydroPSO::hydroPSO` example...
- Owner
Dear Guillaume
I apologise for my late reply, but adapting to "virtual teaching" is taking all my time these days....
Thanks a lot for reporting this problem. A single line was removed in the last update....
The new version will be submitted to CRAN today (including the updated GR4J vignette). Meanwhile, you can install the fixed version with:
```r
library(devtools)
install_github("hzambran/hydroPSO")
```
Bon week-end
Mauricio
@olivier.delaigue @david.dorchies let me know if that solves your issue!
Edited by Thirel Guillaume
- Dorchies David mentioned in commit 32a34205
- Author Developer
I'm happy to see that automatic testing is already proving useful.
But the series continues... The execution of V02.2_param_mcmc is still failing:
```
test-vignettes.R:25: error: V02.2_param_mcmc works
'Param' must be a vector of length 4 and contain no NA
Backtrace:
  1. airGR::RunRmdChunks("../../vignettes/V02.2_param_mcmc.Rmd") C:\DocDD\Irstea\2018 Water JPI IN-WOT\2_Data\sources\airgr/tests/testthat/test-vignettes.R:25:2
 16. global::objective(.par, ...)
 17. airGR::RunModel_GR4J(...)
```
It crashes here:
```
> optPORT <- stats::nlminb(start = c(4.1, 3.9, -0.9, -8.7),
+                          objective = LogLikeGR4J,
+                          lower = rep(-9.9, times = 4), upper = rep(9.9, times = 4),
+                          control = list(trace = 0))
Error in airGR::RunModel_GR4J(InputsModel = InputsModel, RunOptions = RunOptions, :
  'Param' must be a vector of length 4 and contain no NA
```
Edited by Delaigue Olivier
- Author Developer
Found it! There is a misspelling: `InputsCrit$Qobs` should be `InputsCrit$Obs`!
- Owner
You were faster than me! Yes, the name of the output has changed; I updated it on the website but not here. Shame on me.
Edited by Delaigue Olivier
- Dorchies David mentioned in commit f41246f6
- Dorchies David mentioned in commit 17fb0784
- Author Developer
Except for the vignette V02.1_param_optim, which is skipped for the moment because of the hydroPSO package issue, everything is running now.
On my laptop, the tests take 182.8 s.
- Author Developer
N.B. on V02.2_param_mcmc: `vignettesData/vignetteParamMCMC.rda` contains unnecessary data. `parDRAM` is calculated in this evaluated chunk:

```r
parDRAM <- ggmcmc::ggs(multDRAM) ## to convert the object for use by all ggs_* graphical functions
ggmcmc::ggs_traceplot(parDRAM)
```

@olivier.delaigue, should we correct this? Do you need a ticket issue?
- Owner
Why do you say it is not necessary?
- Author Developer
`parDRAM` is first loaded from `vignettesData/vignetteParamMCMC.rda` and then overwritten by `parDRAM <- ggmcmc::ggs(multDRAM)`. So, I don't know why it is loaded.
Is it used in another location in the project? If not, it seems to me that we don't have to store it in `vignettesData/vignetteParamMCMC.rda`.
I confess it's an insignificant detail...
- Owner
You're right. We can remove `parDRAM` from the `vignettesData/vignetteParamMCMC.rda` file. Be careful to save the new rda file using `save(..., version = 2)` so as not to depend on R version >= 3.5.0.
- Author Developer
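The removal could be done along these lines; a self-contained sketch of the pattern (a temporary file stands in for the package's rda, and the object names are taken from the discussion):

```r
# Stand-in for vignettesData/vignetteParamMCMC.rda
rda <- tempfile(fileext = ".rda")
multDRAM <- list(chains = 3)   # object to keep
parDRAM  <- "unnecessary"      # object to drop
save(multDRAM, parDRAM, file = rda, version = 2)

# Reload into a scratch environment, drop parDRAM, re-save with version = 2
e <- new.env()
load(rda, envir = e)
rm("parDRAM", envir = e)
save(list = ls(e), file = rda, envir = e, version = 2)

# Check: only multDRAM remains
e2 <- new.env()
load(rda, envir = e2)
ls(e2)   # "multDRAM"
```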
Done in commit 66edb42e
- Dorchies David mentioned in commit 66edb42e
- Dorchies David mentioned in commit c75c9fb4
- Author Developer
I'm meeting some issues when I want to run or test examples during package development. Examples are not available after a call to `devtools::load_all(".")`:

```
Restarting R session...
> library(airGR)
> example("Calibration_Michel")
Clbr_M> library(airGR)
(...) # Execution OK
> devtools::load_all(".")
Loading airGR
> example("Calibration_Michel")
Warning message:
In example("Calibration_Michel") :
  fichier d'aide introuvable pour ‘Calibration_Michel’
> library(airGR)
> example("Calibration_Michel")
Warning message:
In example("Calibration_Michel") :
  fichier d'aide introuvable pour ‘Calibration_Michel’
```

(The warning says the help file for ‘Calibration_Michel’ cannot be found.)
It seems that this is related to `devtools::load_all`, which doesn't load man pages (see https://rdrr.io/cran/devtools/man/load_all.html).
Moreover, I don't think it's necessary to test examples in the testthat framework, because the examples in man pages are already run "as part of `R CMD check`" (source: http://r-pkgs.had.co.nz/man.html). So if one of the example codes crashes, the check should fail.
In conclusion, I propose not to implement tests on examples. (It's a shame, I was proud of my little test code above.)
```r
context("Test examples in help pages")
for (file in list.files("../../man", pattern = "^\\w+\\.Rd$", full.names = TRUE)) {
  sTxt <- readLines(file)
  if (length(grep("^\\\\examples\\{", sTxt)) > 0) {
    sExample <- basename(substr(file, 1, nchar(file) - 3))
    test_that(paste("Example", sExample, "should work"), {
      expect_true(RunExample(sExample))
    })
  }
}
```
- Owner
You are right, examples are already run by `R CMD check`. So it's useless... By the way, it is maybe a good idea to check the package using the `--as-cran` option:

```
R CMD check --as-cran
```
- Author Developer
Really good advice! In fact, `R CMD check` is not even working at the moment, because the vignettes are not located in the same place when I run the tests in the development environment and in the "checking" environment.
I should consider using `system.file(..., package = "airGR")` for the chunk extraction!
Edited by Dorchies David
- Author Developer
I tried `R CMD check` with and without the `--as-cran` option. Both tests passed.
In both cases, tests with the `skip_on_cran` instruction are skipped (checking takes between 1 min 40 s and 2 min 20 s).
If you run the check from the RStudio button, all tests are always executed (it takes more than 8 min).
To force the execution of all tests with `R CMD check`, you should first define `NOT_CRAN=true` as an environment variable (on Windows: `SET NOT_CRAN=true`).
- Dorchies David mentioned in commit 9f435b1f
- Dorchies David mentioned in commit 5566d731
- Dorchies David mentioned in commit 202e83c3
- Author Developer
Well, I'm not far from passing the tests. I installed the necessary packages on the server, but I'm still stumbling over the hydroPSO package. This time, I can't even install it:
```
ERROR: dependency ‘latticeExtra’ is not available for package ‘Hmisc’
* removing ‘/home/gitlab-runner/R/x86_64-pc-linux-gnu-library/3.5/Hmisc’
ERROR: dependency ‘Hmisc’ is not available for package ‘hydroPSO’
* removing ‘/home/gitlab-runner/R/x86_64-pc-linux-gnu-library/3.5/hydroPSO’
```
And it's blocking everything...
Edited by Dorchies David
- Author Developer
It's due to `latticeExtra`'s dependency on R >= 3.6... and I have R 3.5 installed on the server.
- Author Developer
I managed to install the previous version of `latticeExtra` and then `hydroPSO`.
The pipeline finished without errors! (See https://gitlab.irstea.fr/HYCAR-Hydro/airgr/pipelines/12432.) You can see the details in the console by clicking on each job.
There was actually an error in the LaTeX generation of the manual during the check, but it is not prohibitive.
The check took only 1 min 20 s. I have a doubt whether the `NOT_CRAN` environment variable has been taken into account. I can try with `Sys.setenv(NOT_CRAN = "true")` added in `.Rprofile`.
- Author Developer
Found a solution for latex generation problem: https://stackoverflow.com/a/34524358/5300212
I used the second solution, modifying the default configuration of R, because installing 1.7 GB of packages just to get a font is insane, as said in the comments.
- Dorchies David mentioned in commit 12659d55
- Dorchies David mentioned in commit d408f217
- Author Developer
The check with the `--as-cran` option has been successfully added in commit d408f217.
This issue is close to being solved. Only the case of the `V02.1_param_optim` vignette remains; it has been excluded from the tests because of the messy `hydroPSO` dependency. By the way, I have a suggestion for the vignettes: attach the `library` call to the chunk where the library is actually used.
In the case of `hydroPSO`, and while we are waiting for its resolution by the package maintainer, I suggest this modification:
- Remove `library(hydroPSO)` from the header of the vignette
- Add `library(hydroPSO)` to the chunk where it is used, and add `purl=FALSE` besides `eval=FALSE` in the header of the chunk, in order to remove this dependency from the build, the tests and the final check.

I suggest doing that for all dependencies in vignettes, in order to be less vulnerable to dependency disorders by isolating function calls and their dependencies together. (It can also help in understanding which function depends on which library.)
Edited by Dorchies David
- Owner
For the moment, I prefer we just put a hash sign in the header before `library(hydroPSO)`. In the chunks, each function is preceded by `packageName::`; I think that is clear enough. I prefer to have all package loads in one chunk in the header.
Edited by Delaigue Olivier
- Author Developer
OK, let's go this way! I'll add `V02.1_param_optim` to the tests with this solution.
- Dorchies David mentioned in commit af8c1ae3
- Delaigue Olivier closed via merge request !8 (merged)
- Delaigue Olivier mentioned in commit 6629f998
- Owner
I have incremented the version number, even if the tests do not imply changes for users. On the other hand, it was also justified by the minor modification of a vignette, even if I forgot to mention it in the commit... Thank you David for all the work!
Edited by Delaigue Olivier
- Dorchies David mentioned in issue #55 (closed)
- Delaigue Olivier changed milestone to %v1.6.10