gather docs for pipeline, makefile #48

Commit ef021cc8 authored 11 years ago by Daniel Falster

441 additions and 45 deletions
--- a/docs/analysis.plan/Makefile
+++ b/docs/analysis.plan/Makefile
--- a/docs/analysis.outline/Makefile copy
+++ b/docs/analysis.outline/Makefile copy
+TARGETS = $(patsubst %.md,%.pdf,$(wildcard *.md))
+
+.PHONY: all clean
+all: $(TARGETS)
+
+%.pdf: %.md
+	pandoc $< -V linkcolor:black -V geometry:a4paper -V geometry:margin=1in --listings --include-in-header=../include.tex -o $@
+
+clean:
+	rm -f *.pdf
--- a/docs/analysis.plan/analysis.md
+++ b/docs/analysis.plan/analysis.md
--- a/docs/analysis.pipeline/Makefile
+++ b/docs/analysis.pipeline/Makefile
+#$(subst md,pdf,$(shell ls *.md))
+
+.PHONY: all pipeline clean
+all: pipeline
+
+pipeline: pipeline.pdf
+
+pipeline.md:  pipeline.Rmd
+	Rscript -e "library(knitr); library(pander, quietly = TRUE); knit('$<', '$@')"
+
+%.pdf: %.md
+	pandoc $< --toc -V linkcolor:black -V geometry:a4paper -V geometry:margin=1in --listings --include-in-header=../include.tex -o $@
+
+clean:
+	rm -f *.pdf pipeline.md
--- a/docs/workflow/data.format.md
+++ b/docs/workflow/data.format.md
+
+```{r, results="hide", echo=FALSE, warning=FALSE}
+library(pander)  # provides pandoc.table.return()
+opts_chunk$set(warning = FALSE)
+
+# FUNCTIONS FOR FORMATTING TABLES
+# Convert a data.frame into a left-justified multiline pandoc table (as text)
+pandify <- function(df){
+	pandoc.table.return(df, style = "multiline", split.tables = 200, split.cells = 50, justify = "left")
+}
+```
+
+\newpage
+
 % Description of the traits and tree growth data formatting for workshop on traits and competitive interactions
-% Georges Kunstler

-## Department of Biological Sciences Macquarie University, Sydney, NSW / Irstea EMGR Grenoble France <georges.kunstler@gmail.com>

 # Introduction

-This document describes the data structure and the main R functions available so far for the data formatting for the working group on traits and competition. 
+This document describes the data structure and the main R functions available so far for formatting the data for the working group on traits and competition.
+
+![Workflow](workflow.png "Workflow")
+

- 
 # Structure of data for analysis

 For the analysis we need, for each ecoregion within a country (or for each large tropical plot), a list with three elements.

-* First element is a  data.frame for individual tree data with columns
+## First element: a data.frame of individual tree data, with columns

-	- $obs.id4 a unique identifier of observqtion (if multiple observation for a same tree)
-    - $tree.id$ a unique identifier of each tree
-    - $sp$ the species code
-    - $plot$ the plot code
-    - $ecocode$ the ecoregion code (trying to merge similar ecoregion to have ecoregion with enough observation per ecoregion)
-    - $D$ diameter growth in cm
-    - $G$ the diameter growth rate in mm / yr.
-    - $dead$ a dummy variable 0 alive 1 dead
-    - $year$ the number of year for the growth measurement
-	- $census$ the name of the year of the census 1
-    - $htot$ the height of the individual (m) for the data base for which it is availble to compute max height per species
-    - $Lon$ Longitude of the plot in WGS84
-    - $Lat$ Latitude of teh plots in WGS84
-    - $perc.dead$ the percentage of dead computed on each plot to exlude plot with perturbation (equal 1 for plot with known perturbation)
-	- $weights$ the weigths of teh tree to have an estimation of basal area per m^2 (so 1/(m^2))
-	- then the potential  abiotic variables that we can use in the analysis
-
-* Second element is a data.frame competition index with columns
+```{r,results="asis",echo=FALSE}
+	mytext <- pandify(read.csv("table.tree.csv", stringsAsFactors=FALSE))
+	writeLines(mytext)
+```
+
+## Second element: a data.frame of competition indices, with columns

 - `tree.id` a unique identifier of each tree
 - `ecocode` the ecoregion code
 - one column per species, named with the species code `sp` used in the first table, containing the basal area of that species
 - `BATOT.COMPET` the sum of the basal area of all species

-* Third element is a data.frame for the species traits data with columns
+## Third element: a data.frame of species trait data, with columns

-	- $sp$ the species code as in previous table
-	- $Latin\_name$ the latin name of the species
-    - $Leaf.N.mean$ Leaf Nitrogen per mass in TRY mg/g
-	- $Seed.mass.mean$ dry mass in TRY mg
-	- $SLA.mean$ in TRY mm2 mg-1
-	- $Wood.density.mean$ in TRY mg/mm3
-	- $Max.height.mean$ from NFI data I compute the 99% quantile in m
-	- and the same columns with $sd$ instead of $mean$ with either the mean sd within species if species mean or the mean sd with genus if genus mean because no species data
-	- a dummy variable with true or false if genus mean 
+```{r,results="asis",echo=FALSE}
+	mytext <- pandify(read.csv("table.traits.csv", stringsAsFactors=FALSE))
+	writeLines(mytext)
+```
+
+The table also has the same columns with `sd` in place of `mean`: the mean within-species sd when the value is a species mean, or the mean within-genus sd when it is a genus mean (used when no species data are available), plus a dummy variable (TRUE/FALSE) indicating whether the value is a genus mean.

 # Competition index

@@ -81,5 +78,15 @@ The objective is to have a table with the species mean of the traits or the genu

 * Need to write a function to compute the mean per species for each trait, and decide whether we use the same species sd for these data sets.

-# table with data and progress in formating and work TODO
-see table.data.progress.ods
+### Ecoregions
+
+For the NFI data we will divide each data set into regions with similar ecological conditions. This will allow us to estimate the link between competitive interactions and traits within regions of similar conditions and to see how the results vary (for instance, in the US there is large variability between the north and the south). It will also make comparisons with the large tropical plots easier, and the smaller data sets will speed up the estimation. Could you either provide a source of ecoregions as a GIS layer that we can use or, better, include this variable directly in the data (at the plot level)? Similarly, for the climatic variables I was planning to use the best variables available for each data set rather than a lower-quality global database. Could you either give a link to such a data set or, better, extract the variables directly for each plot?
+I think we do not have any directly measured ecoregion information in the SFI data. However, we have joined each SFI plot with the Olson ecoregions.
+
+
+# Progress
+
+```{r,results="asis",echo=FALSE}
+	mytext <- pandify(read.csv("table.data.progress.csv", stringsAsFactors=FALSE))
+	writeLines(mytext)
+```
--- a/docs/analysis.pipeline/pipeline.md
+++ b/docs/analysis.pipeline/pipeline.md
+
+
+
+
+\newpage
+
+% Description of the traits and tree growth data formatting for workshop on traits and competitive interactions
+
+
+# Introduction
+
+This document describes the data structure and the main R functions available so far for formatting the data for the working group on traits and competition.
+
+![Workflow](workflow.png "Workflow")
+
+
+# Structure of data for analysis
+
+For the analysis we need, for each ecoregion within a country (or for each large tropical plot), a list with three elements.
+
+## First element: a data.frame of individual tree data, with columns
+
+
+----------------------------------------------------------------------------
+var       numeric   units   description                                     
+--------- --------- ------- ------------------------------------------------
+obs.id    0                 a unique identifier of observation (if multiple
+                            observations of the same tree)
+
+tree.id   0                 a unique identifier of each tree                
+
+sp        0                 the species code                                
+
+sp.name   0                                                                 
+
+cluster   0                                                                 
+
+plot      0                 the plot code                                   
+
+ecocode   0                 the ecoregion code (similar ecoregions are
+                            merged so that each ecoregion has enough
+                            observations)
+
+D         1         cm      diameter growth                                 
+
+G         1         mm/yr   the diameter growth rate                        
+
+dead      1                 a dummy variable: 0 alive, 1 dead
+
+year      1         yr      the length of the growth measurement period
+
+htot      1         m       the height of the individual, for the databases
+                            where it is available (used to compute maximum
+                            height per species)
+
+Lon       1         deg     Longitude of the plot in WGS84                  
+
+Lat       1         deg     Latitude of the plot in WGS84
+
+perc.dead 1                 the percentage of dead trees on each plot, used
+                            to exclude disturbed plots (equal to 1 for plots
+                            with known disturbance)
+
+weights   1         /mm2    the weight of each tree, used to estimate
+                            basal area per m^2                              
+
+census    1         0       the name of the year of the census 1            
+----------------------------------------------------------------------------
+
+
+## Second element: a data.frame of competition indices, with columns
+
+- `tree.id` a unique identifier of each tree
+- `ecocode` the ecoregion code
+- one column per species, named with the species code `sp` used in the first table, containing the basal area of that species
+- `BATOT.COMPET` the sum of the basal area of all species
+
+## Third element: a data.frame of species trait data, with columns
+
+
+--------------------------------------------------------------------------------------
+var                numeric   units   description                                      
+------------------ --------- ------- -------------------------------------------------
+sp                 0                 the species code used in other tables            
+
+Latin_name         0                 the Latin name of the species
+
+Leaf.N.mean        1         mg/g    leaf nitrogen per mass
+
+Seed.mass.mean     1         mg      seed mass                                        
+
+SLA.mean           1         mm2/mg  specific leaf area                               
+
+Wood.density.mean  1         mg/mm3  wood density                                     
+
+Max.height.mean    1         m       the 99% quantile of height from NFI data
+
+Leaf.N.sd          1                                                                  
+
+Seed.mass.sd       1                                                                  
+
+SLA.sd             1                                                                  
+
+Wood.density.sd    1                                                                  
+
+Max.height.sd      1                                                                  
+
+Leaf.N.exp         1                                                                  
+
+Seed.mass.exp      1                                                                  
+
+SLA.exp            1                                                                  
+
+Wood.density.exp   1                                                                  
+
+Leaf.N.genus       1                                                                  
+
+Seed.mass.genus    1                                                                  
+
+SLA.genus          1                                                                  
+
+Wood.density.genus 1                                                                  
+
+Leaf.N.nobs        1                                                                  
+
+Seed.mass.nobs     1                                                                  
+
+SLA.nobs           1                                                                  
+
+Wood.density.nobs  1                                                                  
+--------------------------------------------------------------------------------------
+
+The table also has the same columns with `sd` in place of `mean`: the mean within-species sd when the value is a species mean, or the mean within-genus sd when it is a genus mean (used when no species data are available), plus a dummy variable (TRUE/FALSE) indicating whether the value is a genus mean.
+
+# Competition index
+
+## National forest inventory type data
+
+We compute the sum of basal area (BA) per plot, in total and per species, excluding the BA of the target tree and including the weight of each tree so that BA is expressed in $m^2/ha$ (see the R function `BA.SP.FUN` in the file format.function.R).
+
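
A minimal base-R sketch of this plot-level index follows. It is not the code of `BA.SP.FUN`: the function name `ba_competition` is hypothetical, and it assumes that `D` is the stem diameter in cm and that `weights` is the factor converting a tree's basal area to a per-hectare value. It returns one row per tree with one basal-area column per species plus `BATOT.COMPET`, with the target tree's own basal area removed.

```r
# Hypothetical sketch (not BA.SP.FUN): per-plot basal area competition index.
# Assumes columns tree.id, plot, sp, D (stem diameter, cm) and
# weights (assumed per-hectare expansion factor of each tree).
ba_competition <- function(trees) {
  # Basal area of each tree in m^2/ha: D / 200 is the radius in m
  ba <- pi * (trees$D / 200)^2 * trees$weights
  # Total basal area of each species on each plot (rows: plots, columns: species)
  ba_plot_sp <- tapply(ba, list(as.character(trees$plot), as.character(trees$sp)), sum)
  ba_plot_sp[is.na(ba_plot_sp)] <- 0       # species absent from a plot
  # One row per tree, copied from the row of its plot
  out <- ba_plot_sp[as.character(trees$plot), , drop = FALSE]
  # Remove the target tree's own basal area from its own species column
  own <- cbind(seq_len(nrow(trees)), match(as.character(trees$sp), colnames(out)))
  out[own] <- out[own] - ba
  rownames(out) <- NULL
  out <- as.data.frame(out)
  out$BATOT.COMPET <- rowSums(out)         # total basal area of competitors
  data.frame(tree.id = trees$tree.id, out, check.names = FALSE)
}
```

Subtracting each tree's own basal area from the plot totals gives the same result as recomputing a separate sum for every tree, but only needs one aggregation per plot.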
+## Large plot data
+
+We need to compute the basal area ($m^2/ha$) per species in the neighborhood of each individual within a given radius $R$. The function `BA.SP.FUN.XY` in the file format.function.R should do this but has not been tested.
+
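
For the large plots, a neighborhood version can be sketched in the same way. This is a hypothetical illustration rather than `BA.SP.FUN.XY`: the name `ba_neighborhood` is made up, and it assumes tree coordinates `x` and `y` in metres and `D` (stem diameter) in cm. The basal area summed within radius `R` is converted to m^2/ha by dividing by the area of the circle.

```r
# Hypothetical sketch (not BA.SP.FUN.XY): basal area per species around each tree.
# Assumes columns tree.id, sp, D (stem diameter, cm) and x, y (coordinates, m).
ba_neighborhood <- function(trees, R) {
  ba <- pi * (trees$D / 200)^2          # basal area of each tree in m^2
  area_ha <- pi * R^2 / 10000           # area of the circular neighborhood in ha
  species <- sort(unique(as.character(trees$sp)))
  n <- nrow(trees)
  out <- matrix(0, n, length(species), dimnames = list(NULL, species))
  for (i in seq_len(n)) {
    d <- sqrt((trees$x - trees$x[i])^2 + (trees$y - trees$y[i])^2)
    nb <- d <= R & seq_len(n) != i      # neighbors within R, excluding the target tree
    ba_sp <- vapply(species, function(s) sum(ba[nb & trees$sp == s]), numeric(1))
    out[i, ] <- ba_sp / area_ha         # m^2 per ha of neighborhood
  }
  out <- as.data.frame(out)
  out$BATOT.COMPET <- rowSums(out)
  data.frame(tree.id = trees$tree.id, out, check.names = FALSE)
}
```

The loop compares every pair of trees, so its cost grows with the square of the number of trees; for a 50 ha plot a grid-based neighbor search would be faster. The sketch also ignores edge effects: trees closer than `R` to the plot boundary have part of their neighborhood outside the plot.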
+
+# Traits data
+
+The objective is to have a table with the species mean of each trait, or the genus mean when no data are available for the species.
+
+## TRY data
+
+* The TRY data are provided with one row for each variable measured on a single individual (both trait and non-trait variables). The function `fun.extract.try` (in FUN.TRY.R) extracts the trait and non-trait variables we want and creates a table with one row per individual (Observation.ID) and one column per variable.
+
+* Then, for each species (including all its potential synonyms), we compute the mean of each trait (in log10), excluding experimental data when possible and using them only when no other data are available. If no data are available for a given species, we compute the *genus* mean instead (with a dummy variable indicating that it is a genus mean). The function also computes the trait sd; see the function `fun.species.traits`. It also excludes outliers following the method used by Kattge et al. 2011 (GCB) (see the function `fun.out.TF2`).
+
+* Then I computed the mean within-species sd (assuming that the within-species sd is constant across species).
+
+* So far, for the French data, I have only built my own list of potential species synonyms. It would be better either to create a list of potential synonyms from an existing list, or to match the TRY species and the forest inventory species against the same list so that both use the same species names.
+
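
The reshaping done by `fun.extract.try` can be illustrated with a minimal base-R sketch. The column names `Observation.ID`, `DataName` and `StdValue` are assumptions about how the TRY export was read, `try_to_wide` is a hypothetical name, and the sketch handles numeric variables only.

```r
# Hypothetical sketch (not fun.extract.try): TRY long format to one row per observation.
# Assumes columns Observation.ID, DataName (variable name) and StdValue (numeric value).
try_to_wide <- function(long, vars) {
  long <- long[long$DataName %in% vars, ]   # keep only the wanted variables
  # Rows: observations; columns: variables. If a variable is recorded more than
  # once for the same observation, its values are averaged.
  wide <- tapply(long$StdValue,
                 list(long$Observation.ID, factor(long$DataName, levels = vars)),
                 mean)
  ids <- rownames(wide)
  rownames(wide) <- NULL
  data.frame(Observation.ID = ids, wide, check.names = FALSE, stringsAsFactors = FALSE)
}
```

Variables not measured on an observation come out as `NA`, so the wide table keeps every individual even when only some traits were recorded.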
+
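
The species-mean rules above can be sketched for a single trait as follows. `species_trait_means` and its column names are hypothetical, and the sketch leaves out synonym handling, the sd computation and the outlier exclusion of `fun.out.TF2`. It only shows the log10 averaging, the preference for non-experimental data, and the genus fallback with its dummy variable.

```r
# Hypothetical sketch (not fun.species.traits): species means for one trait.
# obs: one row per measurement with columns sp, genus, value, experimental (logical).
# species: the species needed, with columns sp and genus.
species_trait_means <- function(obs, species) {
  obs$log_value <- log10(obs$value)                 # traits are averaged in log10
  # TRUE for species with at least one non-experimental measurement
  has_field <- tapply(!obs$experimental, obs$sp, any)
  # Drop experimental rows for those species; keep them for the others
  obs <- obs[!obs$experimental | !has_field[obs$sp], ]
  sp_mean <- tapply(obs$log_value, obs$sp, mean)
  sp_exp <- tapply(obs$experimental, obs$sp, all)   # TRUE if only experimental data
  genus_mean <- tapply(obs$log_value, obs$genus, mean)

  m <- as.vector(sp_mean[species$sp])               # NA if the species has no data
  is_genus <- is.na(m)
  m[is_genus] <- as.vector(genus_mean[species$genus[is_genus]])
  data.frame(sp = species$sp, mean = m, genus = is_genus,
             exp = as.vector(sp_exp[species$sp]) %in% TRUE,
             stringsAsFactors = FALSE)
}
```

Species that fall back to the genus mean get `exp = FALSE` in this sketch, because the flag is only defined for species with their own data.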
+
+## Other data provided for each data
+
+* Need to write a function to compute the mean per species for each trait, and decide whether we use the same species sd for these data sets.
+
+### Ecoregions
+
+For the NFI data we will divide each data set into regions with similar ecological conditions. This will allow us to estimate the link between competitive interactions and traits within regions of similar conditions and to see how the results vary (for instance, in the US there is large variability between the north and the south). It will also make comparisons with the large tropical plots easier, and the smaller data sets will speed up the estimation. Could you either provide a source of ecoregions as a GIS layer that we can use or, better, include this variable directly in the data (at the plot level)? Similarly, for the climatic variables I was planning to use the best variables available for each data set rather than a lower-quality global database. Could you either give a link to such a data set or, better, extract the variables directly for each plot?
+I think we do not have any directly measured ecoregion information in the SFI data. However, we have joined each SFI plot with the Olson ecoregions.
+
+
+# Progress
+
+
+---------------------------------------------------------
+Data.name                  Demographic.data              
+-------------------------- ------------------------------
+BCI                        Large 50ha plot with semi     
+                           spatial localisation of tree  
+                           with multiple census          
+
+Fushan                     Large plot with spatial       
+                           localisation of tree with     
+                           multiple census               
+
+Luquillo                   Large plot with spatial       
+                           localisation of tree with     
+                           multiple census               
+
+La Chonta                  Large plot with spatial       
+                           localisation of tree with     
+                           multiple census               
+
+Paracou                    Large plot with spatial       
+                           localisation of tree with     
+                           multiple census               
+
+Mbaiki                     Large plot with spatial       
+                           localisation of tree with     
+                           multiple census               
+
+FIA                        Forest inventory plots in the 
+                           US Formatting M. Vanderwel to 
+                           be done                       
+
+Canada                     Forest inventory plots in     
+                           Canada Formatting John        
+                           Caspersen to be done          
+
+France                     Forest inventory plots        
+
+Spain                      Forest inventory plots check  
+                           with M Zaval formatting       
+                           probably done                 
+
+Sweden                     Forest inventory plots.       
+                           Formatting to be discussed
+                                                         
+
+Switzerland                Forest inventory plots.       
+                           Formatting to be discussed
+                                                         
+
+New Zealand                Forest inventory plots.       
+                           Formatting to be discussed
+                           (Coomes sub sample)           
+
+Australia NSW Kooyman      Several medium size plots.
+plots                      Formatting in progress
+                                                         
+
+CSIRO plots                Several medium size plots.    
+                           Formatting in progress        
+                                                         
+---------------------------------------------------------
+
+Table: Table continues below
+
+ 
+---------------------------------------------
+Demo.data.availability    Traits.data        
+------------------------- -------------------
+ok                        Available with data
+
+ok                        Available with data
+
+Need to contact Zimmerman Available with data
+
+no                        Available with data
+
+ok                        Available with data
+
+Waiting                   Available with data
+
+ok                        TRY                
+
+ok                        TRY                
+
+ok                        TRY                
+
+ok                        TRY                
+
+ok                        TRY                
+
+ok                        TRY                
+
+ok                        Available with data
+
+ok                        Available with data
+
+Waiting                   Available with data
+---------------------------------------------
+
+Table: Table continues below
+
+ 
+------------------------------------------------
+Traits.data.availability  Abiotic.variables
+------------------------- ----------------------
+ok                        topography and/or soil
+
+ok                        topography and/or soil
+
+Need to contact Swenson   topography and/or soil
+
+ok                        topography and/or soil
+
+ok                        topography and/or soil
+
+Waiting                   topography and/or soil
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+ok                        climate               
+
+Waiting                   climate
+------------------------------------------------
+
+ 
+---------------------------------------------------------------
+Progress.in.formatting.the.data   TODO                         
+--------------------------------- -----------------------------
+demo data ok                      compute CI and process traits
+
+demo data ok                      compute CI and process traits
+
+NO                                send email                   
+
+NO                                                             
+
+demo and competition index ok     Traits ask Ghislain to do    
+
+Waiting for Ghislain
+
+Done                              Need to add max height from  
+                                  FIA MISSING CENSUS VARIABLE  
+
+Need to update with new code      waiting for new data with
+per ecoregion                     Quebec. MISSING CENSUS
+                                  VARIABLE
+
+Done                              rewrite to format per        
+                                  ecoregion                    
+
+Demo done                         Competition index and TRY    
+
+demo ok                           missing TreeID and mortality 
+
+demo ok                           missing mortality, ecoregion 
+
+demo ok                                                        
+
+demo and competition index ok     Traits ask Ghislain to do
+
+Waiting                           Daniel to send email with
+                                  traits
+---------------------------------------------------------------
+
--- a/docs/analysis.pipeline/table.data.progress.csv
+++ b/docs/analysis.pipeline/table.data.progress.csv
+"Data name","Demographic data","Demo data availability","Traits data","Traits data availability","Abiotic variables","Progress in formatting the data","TODO"
+"BCI","Large 50ha plot with semi spatial localisation of tree with multiple census","ok","Available with data","ok","topography and/or soil","demo data ok","compute CI and process traits"
+"Fushan","Large plot with spatial localisation of tree with multiple census","ok","Available with data","ok","topography and/or soil","demo data ok","compute CI and process traits"
+"Luquillo","Large plot with spatial localisation of tree with multiple census","Need to contact Zimmerman","Available with data","Need to contact Swenson","topography and/or soil","NO","send email"
+"La Chonta","Large plot with spatial localisation of tree with multiple census","no","Available with data","ok","topography and/or soil","NO",
+"Paracou","Large plot with spatial localisation of tree with multiple census","ok","Available with data","ok","topography and/or soil","demo and competition index ok","Traits ask Ghislain to do"
+"Mbaiki","Large plot with spatial localisation of tree with multiple census","Waiting","Available with data","Waiting","topography and/or soil","Waiting for Ghislain",
+"FIA","Forest inventory plots in the US Formatting M. Vanderwel to be done","ok","TRY","ok","climate","Done","Need to add max height from FIA MISSING CENSUS VARIABLE"
+"Canada","Forest inventory plots in Canada
+Formatting John Caspersen to be done","ok","TRY","ok","climate","Need to update with new code per ecoregion","waiting for new data with Quebec MISSING CENSUS VARIABLE"
+"France","Forest inventory plots","ok","TRY","ok","climate","Done","rewrite to format per ecoregion"
+"Spain","Forest inventory plots check with M Zaval formatting probably done","ok","TRY","ok","climate","Demo done","Competition index and TRY"
+"Sweden","Forest inventory plots. Formatting to be discussed","ok","TRY","ok","climate","demo ok","missing TreeID and mortality"
+"Switzerland","Forest inventory plots. Formatting to be discussed","ok","TRY","ok","climate","demo ok","missing mortality, ecoregion"
+"New Zealand","Forest inventory plots. Formatting to be discussed (Coomes sub sample)","ok","Available with data","ok","climate","demo ok",
+"Australia NSW Kooyman plots","Several medium size plots. Formatting in progress","ok","Available with data","ok","climate","demo and competition index ok","Traits ask Ghislain to do"
+"CSIRO plots","Several medium size plots. Formatting in progress","Waiting","Available with data","Waiting","climate","Waiting","Daniel to send email with traits"
--- a/docs/workflow/cols-traits.csv
+++ b/docs/workflow/cols-traits.csv
--- a/docs/workflow/cols-tree.csv
+++ b/docs/workflow/cols-tree.csv
--- a/docs/meeting.agenda/workflow.png
+++ b/docs/meeting.agenda/workflow.png
--- a/docs/meeting.agenda/agenda.md
+++ b/docs/meeting.agenda/agenda.md
@@ -41,9 +41,9 @@ Kunstler *et al.* “Competitive Interactions Between Forest Trees Are Driven by

 # Goals

-Our main aim for the meeting is to carry out the actual analysis and to test analysis that may be too long to run during the week. So far we have formatted the data (tree demographic data and traits data) and run preliminary analysis for a subset of data. The advance in the formatting of the data is variable between data set. During the workshop we plan to progress in parallel on **1.** formatting of data not ready for analysis, **2.** run the the preliminary analysis on the newly formatted data, **3.** work on improving the preliminary analysis with *(a)* choice of the shape of function, approach to deal with missing data and write the code for this new model and *(b)* test model including traits variability to deals with missing traits values (probably to long to be run within the week of the workshop) and **4.** interpret the preliminary results and write an outline of the paper. 
+Our main aim for the meeting is to carry out the actual analysis and to test analyses that may take too long to run during the week. So far we have formatted the data (tree demographic data and trait data) and run a preliminary analysis on a subset of the data. Progress in formatting varies between data sets. During the workshop we plan to progress in parallel on **1.** formatting the data not yet ready for analysis, **2.** running the preliminary analysis on the newly formatted data, **3.** improving the preliminary analysis with *(a)* the choice of the shape of the function and of an approach to deal with missing data, and writing the code for this new model, and *(b)* testing a model including trait variability to deal with missing trait values (probably too long to run within the week of the workshop), and **4.** interpreting the preliminary results and writing an outline of the paper.

-![Workflow](workflow.png "Hotel map")
+![Workflow](../analysis.pipeline/workflow.png "Workflow")

 \pagebreak

@@ -128,7 +128,7 @@ Participants arrive in Sydney. Personal site seeing in Sydney region.

 # Directions

-Airport to accommodation: Macquarie University is accessible by train or taxi. Macquarie University is on the northern side of Sydney and a taxi ride will cost approximately $100. The Macquarie University train station is on the northern line (red). A train from the airport (green line – change at Central station to northern line) will cost approximately $17. 
+Airport to accommodation: Macquarie University is accessible by train or taxi. Macquarie University is on the northern side of Sydney and a taxi ride will cost approximately $100. The Macquarie University train station is on the northern line (red). A train from the airport (green line – change at Central station to northern line) will cost approximately $17.

 ![Map. A (red): Travellodge hotel; B (green): Department of Biological Sciences.](travellodge.png "Hotel map")


--- a/docs/metadata/Makefile
+++ b/docs/metadata/Makefile
@@ -13,7 +13,7 @@ sites/all.md: $(MD)
 sites/all.pdf: sites/all.md

 %.pdf: %.md
-	pandoc $< -V linkcolor:black -V geometry:a4paper -V geometry:margin=0.5in --listings --include-in-header=../include.tex -o $@
+	pandoc $< --toc -V linkcolor:black -V geometry:a4paper -V geometry:margin=0.5in --listings --include-in-header=../include.tex -o $@

 clean:
 	rm -f sites/*

--- a/docs/table.data.progress.ods
+++ b/docs/table.data.progress.ods
--- a/docs/workflow/calculations.md
+++ b/docs/workflow/calculations.md
-
-
-### Ecoregions
-
-For the NFI data we will divide the data set by regions with similar ecological conditions. This will allow to estimate the link between competitive interactions and traits within regions of similar conditions and see how the results vary (for instance in the US there is a large variability between the north and the south). This will allow to make comparison with large tropical plot more easy. Then this will help to have smaller data set to speed up the estimation. Please could you either provides a source of ecoregion with a GIS layer that we can use or better directly includes this variable in the data (at the plot level). Similarly in term of climatic variables I was planning to use the best variables available for each data rather than a global data base of lower quality. Could you either give the link of such a data set or better directly do get the variables for each plot.
-I think that we do not have any ecoregion information that was directly measured in the SFI data. However, we have joint each SFI plot with Olson ecoregions.