% Description of the trait and tree growth data formatting for the workshop on traits and competitive interactions
% Georges Kunstler

## Department of Biological Sciences Macquarie University, Sydney, NSW / Irstea EMGR Grenoble France <georges.kunstler@gmail.com>

# Introduction

This document describes the data structure and the main R functions available so far for formatting the data for the working group on traits and competition.

 
# Structure of data for analysis

For the analysis we need, for each ecoregion within a country (or each large tropical plot), a list with three elements.

* The first element is a data.frame of individual tree data with columns

    - $obs.id$ a unique identifier of each observation (if there are multiple observations of the same tree)
    - $tree.id$ a unique identifier of each tree
    - $sp$ the species code
    - $plot$ the plot code
    - $ecocode$ the ecoregion code (similar ecoregions are merged where possible so that each ecoregion has enough observations)
    - $D$ the diameter in cm
    - $G$ the diameter growth rate in mm / yr
    - $dead$ a dummy variable (0 alive, 1 dead)
    - $year$ the number of years over which growth was measured
    - $census$ the name (year) of the first census
    - $htot$ the height of the individual (m), for the databases in which it is available, used to compute the maximum height per species
    - $Lon$ longitude of the plot in WGS84
    - $Lat$ latitude of the plot in WGS84
    - $perc.dead$ the percentage of dead trees computed for each plot, used to exclude plots with perturbation (set to 1 for plots with a known perturbation)
    - $weights$ the weight of the tree, used to estimate basal area per $m^2$ (so in $1/m^2$)
    - then the potential abiotic variables that we can use in the analysis

* The second element is a data.frame of competition indices with columns

    - $tree.id$ a unique identifier of each tree
    - $ecocode$ the ecoregion code
    - one column per species, named with the species code $sp$ used in the previous table
    - $BATOT.COMPET$ the sum of the basal area of all species

* The third element is a data.frame of species trait data with columns

    - $sp$ the species code as in the previous tables
    - $Latin\_name$ the Latin name of the species
    - $Leaf.N.mean$ leaf nitrogen per mass from TRY, in $mg/g$
    - $Seed.mass.mean$ seed dry mass from TRY, in $mg$
    - $SLA.mean$ specific leaf area from TRY, in $mm^2/mg$
    - $Wood.density.mean$ wood density from TRY, in $mg/mm^3$
    - $Max.height.mean$ the 99% quantile of height, which I compute from the NFI data, in m
    - the same columns with $sd$ instead of $mean$, holding either the mean within-species sd (if the value is a species mean) or the mean within-genus sd (if the value is a genus mean because no species data are available)
    - a dummy variable (true or false) indicating whether the value is a genus mean
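
As an illustration, the three-element list could be assembled as in the sketch below. This is only a toy example: all values are invented, only a subset of the columns is shown, and the element names (`data.tree`, `data.compet`, `data.traits`) and the name of the genus dummy column (`Wood.density.genus`) are assumptions, not names fixed by this document.

```r
# Toy version of the three-element list for one ecoregion.
# All values are invented; element names are hypothetical.
data.tree <- data.frame(
  obs.id  = 1:4,
  tree.id = c("t1", "t2", "t3", "t4"),
  sp      = c("sp.a", "sp.a", "sp.b", "sp.b"),
  plot    = c("p1", "p1", "p1", "p2"),
  ecocode = "eco1",
  D       = c(20, 30, 40, 10),      # diameter (cm)
  G       = c(2.1, 1.4, 3.0, 0.8),  # diameter growth rate (mm / yr)
  dead    = c(0, 0, 1, 0),          # 0 alive, 1 dead
  year    = 5,                      # years between measurements
  stringsAsFactors = FALSE
)

data.compet <- data.frame(
  tree.id = data.tree$tree.id,
  ecocode = data.tree$ecocode,
  sp.a    = c(7.1, 3.1, 10.2, 0),   # basal area of species sp.a
  sp.b    = c(12.6, 12.6, 0, 0),    # basal area of species sp.b
  stringsAsFactors = FALSE
)
data.compet$BATOT.COMPET <- data.compet$sp.a + data.compet$sp.b

data.traits <- data.frame(
  sp                 = c("sp.a", "sp.b"),
  Latin_name         = c("Genus.a species.a", "Genus.b species.b"),
  Wood.density.mean  = c(0.55, 0.70),
  Wood.density.sd    = c(0.05, 0.05),
  Wood.density.genus = c(FALSE, TRUE),  # TRUE if the value is a genus mean
  stringsAsFactors = FALSE
)

list.eco1 <- list(data.tree   = data.tree,
                  data.compet = data.compet,
                  data.traits = data.traits)

# Basic consistency checks between the three tables.
stopifnot(identical(data.tree$tree.id, data.compet$tree.id))
stopifnot(all(data.tree$sp %in% data.traits$sp))
stopifnot(all(unique(data.tree$sp) %in% names(data.compet)))
```

The checks at the end are the kind of test that could be run on each formatted data set before the analysis: same trees in the first two tables, and every species code present both as a competition column and as a row of the traits table.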

# Competition index

## National forest inventory type data

We compute the sum of basal area (BA) per plot, in total and per species, excluding the BA of the target tree. Each tree is weighted so that basal area is expressed in $m^2/ha$ (see the R function `BA.SP.FUN` in the file format.function.R).
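
The computation can be sketched as follows. This is not the code of `BA.SP.FUN`; it is a minimal illustration with a hypothetical helper `ba.per.species`, assuming that $D$ is in cm and that the weight of a tree is the number of trees per hectare it represents.

```r
# Basal area of one tree (m^2) from its diameter D (cm).
ba.tree <- function(D) pi * (D / 200)^2

# Hypothetical helper (not BA.SP.FUN): basal area per species on the plot
# of each tree, excluding the target tree itself.
# Assumption: weights = number of trees per ha represented by each tree.
ba.per.species <- function(tree.id, sp, plot, D, weights) {
  ba   <- ba.tree(D) * weights          # m^2 / ha contributed by each tree
  plot <- as.character(plot)
  sp   <- as.character(sp)
  # plot x species totals; combinations that do not occur are 0
  tot <- unclass(xtabs(ba ~ plot + sp))
  # one row per tree, holding the totals of its plot
  res <- tot[plot, , drop = FALSE]
  rownames(res) <- NULL
  # remove the target tree from the column of its own species
  own <- cbind(seq_along(sp), match(sp, colnames(res)))
  res[own] <- res[own] - ba
  out <- data.frame(tree.id = tree.id, res,
                    check.names = FALSE, stringsAsFactors = FALSE)
  out$BATOT.COMPET <- rowSums(res)
  out
}

# Toy data (invented values).
compet <- ba.per.species(
  tree.id = c("t1", "t2", "t3", "t4"),
  sp      = c("sp.a", "sp.a", "sp.b", "sp.b"),
  plot    = c("p1", "p1", "p1", "p2"),
  D       = c(20, 30, 40, 10),
  weights = rep(10, 4)
)
print(compet)
```

Computing the plot totals once and subtracting the target tree avoids a loop over trees, which matters for inventories with many plots. Tree `t4` is alone on its plot, so all its competition columns are 0.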

## Large plot data

For large plots we need to compute the basal area ($m^2/ha$) per species in the neighborhood of each individual, within a given radius $R$. The function `BA.SP.FUN.XY` in the file format.function.R should do this but has not been tested.
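
A minimal sketch of this neighborhood computation is given below. It is not the code of `BA.SP.FUN.XY`: the helper `ba.neigh.species` is hypothetical, it assumes coordinates in m and $D$ in cm, and it applies no edge correction for trees closer than $R$ to the plot border.

```r
# Hypothetical helper (not BA.SP.FUN.XY): basal area per species (m^2 / ha)
# within radius R (m) of each tree, excluding the target tree.
# Assumptions: x and y in m, D in cm, no edge correction.
ba.neigh.species <- function(x, y, sp, D, R) {
  ba      <- pi * (D / 200)^2             # basal area of each tree (m^2)
  sp      <- as.character(sp)
  species <- sort(unique(sp))
  d       <- as.matrix(dist(cbind(x, y))) # pairwise distances (m)
  neigh   <- (d <= R) * 1                 # 1 if within radius, else 0
  diag(neigh) <- 0                        # a tree is not its own neighbor
  # trees x species matrix with the BA of each tree in its species column
  ba.sp <- outer(sp, species, "==") * ba
  # sum over neighbors, divided by the area of the circle in ha
  res <- (neigh %*% ba.sp) / (pi * R^2 / 10000)
  colnames(res) <- species
  res
}

# Toy data (invented values): tree 3 is too far away to have neighbors.
res <- ba.neigh.species(x = c(0, 3, 20), y = c(0, 0, 0),
                        sp = c("a", "b", "a"), D = c(20, 40, 30), R = 5)
print(res)
```

The full distance matrix is simple but grows with the square of the number of trees, so for very large plots the computation would probably need to be done by blocks or with a spatial index.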


# Traits data

The objective is to have a table with the species mean of each trait, or the genus mean if no data are available for the species.

## TRY data

* The TRY data are provided with one row for each variable measured on a single individual (trait variables and non-trait variables). The function `fun.extract.try` (in FUN.TRY.R) extracts the trait and non-trait variables that we want and creates a table with one row per individual (Observation.ID) and one column per trait or non-trait variable.

* Then we compute for each species (and all its potential synonyms) the mean of the observations of each trait (in log10), using non-experimental data where possible and experimental data only if no other data are available. If no data are available for a given species we compute the *genus* mean (with a dummy variable indicating that this is a genus mean). The function also computes the trait sd. See the function `fun.species.traits`. This function also excludes outliers based on the method used by Kattge et al. 2011 (GCB) (see the function `fun.out.TF2`).

* Then I computed the mean within-species sd (assuming that the within-species sd is constant across species).

* So far, for the French data, I have only a self-built list of potential species synonyms. It would be better either to create the list of potential synonyms from an existing list, or to match the TRY species and the forest inventory species against the same list so that both use the same species names.
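
The species mean with genus fallback described above can be sketched as follows. This is not the code of `fun.species.traits`: the helper `species.trait.mean` is hypothetical, the species names and values are invented, and synonym matching, the handling of experimental data and outlier exclusion are all left out.

```r
# Hypothetical helper (not fun.species.traits): mean of log10(trait) per
# species, falling back to the genus mean when the species has no data.
species.trait.mean <- function(species, value, sp.wanted) {
  lv    <- log10(value)
  genus <- sub(" .*", "", species)        # first word of the Latin name
  sp.mean    <- tapply(lv, species, mean)
  genus.mean <- tapply(lv, genus, mean)
  # look up a key in a named vector of means, NA if absent
  lookup <- function(means, keys) as.vector(means)[match(keys, names(means))]
  mean.sp    <- lookup(sp.mean, sp.wanted)
  mean.genus <- lookup(genus.mean, sub(" .*", "", sp.wanted))
  use.genus  <- is.na(mean.sp)
  data.frame(sp    = sp.wanted,
             mean  = ifelse(use.genus, mean.genus, mean.sp),
             genus = use.genus,           # TRUE if the value is a genus mean
             stringsAsFactors = FALSE)
}

# Toy trait observations (invented names and values).
obs.species <- c("Gen.a sp1", "Gen.a sp1", "Gen.a sp2")
obs.value   <- c(100, 1000, 10)

traits <- species.trait.mean(obs.species, obs.value,
                             sp.wanted = c("Gen.a sp1", "Gen.a sp3"))
print(traits)
```

Here `Gen.a sp3` has no observation, so it receives the mean of all `Gen.a` observations and is flagged as a genus mean. A species whose genus also has no data gets `NA`.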


## Other data provided with each data set

* We need to write a function to compute the mean per species for each trait, and to decide whether we use the same within-species sd for these data sets.

# Table of data sets, formatting progress and work to do

See table.data.progress.ods.