From 23eba576e2db8f6b4b7d3b98a321148419b1477e Mon Sep 17 00:00:00 2001
From: Fize Jacques <jacques.fize@cirad.fr>
Date: Tue, 15 Oct 2019 11:10:01 +0200
Subject: [PATCH] change Readme

---
 README.md | 50 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 175f05a..fecfa38 100644
--- a/README.md
+++ b/README.md
@@ -12,11 +12,13 @@ This repository contains two ways of executing Biotex (Python and Java). A list

 ## Requirements

- * Python 3
+ * Python 3.6+
  * Java 7-8
+ * TreeTagger (can be found [here](https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/))

+It is advised to put *TreeTagger* in `$HOME/.tree-tagger/`. If not, please use the `treetagger_src` parameter in `BiotexWrapper` to set your TreeTagger directory.

-## Install
+## Setup
 To make it works, clone this repository using :

     git clone https://gitlab.irstea.fr/jacques.fize/biotex_python.git
@@ -27,13 +29,47 @@ Then, install the module by using the following commands :

     (sudo) pip3 install .

-# Example
+# Get Started
+
+## A first run
+
+To see if everything works, use the following code.

 ```python
 from biotex import BiotexWrapper
-wrapper = BiotexWrapper(lang="fr")
-corpus= [Load your corpus here]
-wrapper.create_corpus_from_txt(corpus)
-terminology = wrap.extract_terminology("output.txt")
+wrapper = BiotexWrapper(language="french")
+corpus= ["D'avantage de lignes en commun de bus.",
+ 'Les dérèglements climatiques (crue, sécheresse)',
+ 'Protéger les captages d\'eau potable en interdisant toute activité polluante dans les "périmètres de protection rapprochée" et inciter les collectivités locales à acheter les terrains de ces périmètres. Supprimer les avantages fiscaux sur les produits pétroliers utilisés dans le transport aérien, maritime,BTP... Instaurer une taxe sur les camions traversant la France qui serait utilisée soit pour la transition écologique soit pour soigner les personnes atteintes de maladies respiratoires. Aider l\'agriculture à changer de modèle.',
+ "Je n'utilise pas la voiture pour des déplacements quotidiens"]
+wrapper.terminology(corpus)
+```
+
+## Parameters of the Biotex Wrapper class
+
+Here is the list of all the available parameters in the wrapper.
+ ```
+Parameters
+----------
+biotex_jar_path : str, optional
+    Filepath of the Biotex jar [***]
+pattern_path : str, optional
+    Directory that contains pre-defined patterns [***]
+dataset_src : str, optional
+    Filepath of datasets used by Biotex [***]
+stopwords_src : str, optional
+    Path of the directory that contains stop-words for each language [***]
+treetagger_src : str, optional
+    Path of the directory that contains TreeTagger
+type_of_terms : str, optional
+    type of terms you want to extract ("all","multi"), by default "all"
+language : str, optional
+    language of the data, by default "french"
+score : str, optional
+    score used to sort the extracted terms, by default "F-TFIDF-C_M"
+patron_number : str, optional
+    number of patterns used to extract terms, by default "3"
+[***] Only change these settings if you are familiar with the Biotex Java API.
+ ```
\ No newline at end of file
-- 
GitLab
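
The parameter list added by this patch can be exercised with a small helper that gathers the optional settings before the wrapper is built. This is only a minimal sketch: `build_wrapper_kwargs` is a hypothetical helper, not part of the `biotex` module. The keyword names and defaults are copied from the parameter list in the README, and the sketch assumes that omitting `treetagger_src` makes `BiotexWrapper` look in `$HOME/.tree-tagger/`, as the Requirements section suggests.

```python
from pathlib import Path

# Location the README advises for the TreeTagger installation.
DEFAULT_TREETAGGER_DIR = Path.home() / ".tree-tagger"


def build_wrapper_kwargs(treetagger_dir=None, language="french",
                         type_of_terms="all", score="F-TFIDF-C_M",
                         patron_number="3"):
    """Hypothetical helper: collect keyword arguments for BiotexWrapper.

    Names and defaults follow the README parameter list; the helper
    itself is an illustration and does not exist in the biotex module.
    """
    kwargs = {
        "language": language,
        "type_of_terms": type_of_terms,
        "score": score,
        "patron_number": patron_number,
    }
    # Pass treetagger_src only when TreeTagger lives outside the advised directory.
    if treetagger_dir is not None:
        resolved = Path(treetagger_dir).expanduser()
        if resolved != DEFAULT_TREETAGGER_DIR:
            kwargs["treetagger_src"] = str(resolved)
    return kwargs


if __name__ == "__main__":
    # "/opt/tree-tagger" is a made-up path used purely for illustration.
    kwargs = build_wrapper_kwargs(treetagger_dir="/opt/tree-tagger",
                                  type_of_terms="multi")
    print(kwargs)
    # With the module installed, the wrapper could then be created as:
    #     from biotex import BiotexWrapper
    #     wrapper = BiotexWrapper(**kwargs)
```

Keeping the settings in a plain dictionary makes it easy to check which values differ from the defaults before the Java side is involved.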