Midoux Cedric authored82589719
Biotex
Biotex is an Automated Term Extractor (ATE) for biomedical terms (see here for more details).
This repository provides two ways to run Biotex (Python and Java). Improvements made:
- Easy execution
- Python wrapper
- Parameters configuration without changing the source code
- ...
Installation
Requirements
- Python 3.6+
- Java 7-8
- TreeTagger (can be found here)
It is advised to put TreeTagger in $HOME/.tree-tagger/. If it is installed elsewhere, use the treetagger_src parameter of BiotexWrapper to set your TreeTagger directory.
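The choice between the default location and a custom one can be sketched as a small helper. This is only an illustration: treetagger_kwargs is a hypothetical function, not part of this package, and whether the wrapper expects a trailing slash in the path is not stated here.

```python
import os
from pathlib import Path

# Default location suggested above.
DEFAULT_TREETAGGER_DIR = Path.home() / ".tree-tagger"


def treetagger_kwargs(custom_dir=None):
    """Hypothetical helper: build keyword arguments for BiotexWrapper.

    Returns {} when TreeTagger sits in the default location, otherwise
    {"treetagger_src": <directory>}.
    """
    if custom_dir is None:
        if DEFAULT_TREETAGGER_DIR.is_dir():
            return {}  # nothing to override
        raise FileNotFoundError(
            f"TreeTagger not found in {DEFAULT_TREETAGGER_DIR}; "
            "pass its directory explicitly"
        )
    # Expand "$HOME" and "~" so shell-style paths also work.
    path = Path(os.path.expandvars(str(custom_dir))).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"{path} is not a directory")
    # Assumption: the wrapper accepts the path as a plain string.
    return {"treetagger_src": str(path)}
```

With the package installed, the result could then be passed on as BiotexWrapper(language="french", **treetagger_kwargs("/opt/tree-tagger")), where /opt/tree-tagger is only a placeholder path.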
Setup
To set it up, clone this repository using:
git clone https://gitlab.irstea.fr/jacques.fize/biotex_python.git
Then install the module with the following commands:
cd biotex_python
(sudo) pip3 install .
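After installing, a quick preflight check can catch the most common setup problems. The sketch below is an assumption-laden illustration, not part of the package: check_environment is a hypothetical helper, and it only tests that a java command is on the PATH, not that its version is 7 or 8.

```python
import importlib.util
import shutil
import sys


def check_environment(module_name="biotex", java_command="java"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The requirements list Python 3.6+.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    # Only checks that the command exists, not which Java version it is.
    if shutil.which(java_command) is None:
        problems.append(f"'{java_command}' was not found on PATH")
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module '{module_name}' is not installed")
    return problems
```

Calling check_environment() and printing each returned entry gives a short report; an empty list suggests the Python side of the setup is in place.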
Get Started
A first run
To check that everything works, run the following code.
from biotex import BiotexWrapper
wrapper = BiotexWrapper(language="french")
corpus= ["D'avantage de lignes en commun de bus.",
'Les dérèglements climatiques (crue, sécheresse)',
'Protéger les captages d\'eau potable en interdisant toute activité polluante dans les "périmètres de protection rapprochée" et inciter les collectivités locales à acheter les terrains de ces périmètres. Supprimer les avantages fiscaux sur les produits pétroliers utilisés dans le transport aérien, maritime,BTP... Instaurer une taxe sur les camions traversant la France qui serait utilisée soit pour la transition écologique soit pour soigner les personnes atteintes de maladies respiratoires. Aider l\'agriculture à changer de modèle.',
"Je n'utilise pas la voiture pour des déplacements quotidiens"]
wrapper.terminology(corpus)
Parameters of the BiotexWrapper class
Here is the list of all the available parameters in the wrapper.
Parameters
----------
biotex_jar_path : str, optional
Filepath of the Biotex jar [***]
pattern_path : str, optional
Directory that contains pre-defined patterns [***]
dataset_src : str, optional
FilePath of datasets used by Biotex [***]
stopwords_src : str, optional
Path of the directory that contains stop-words for each language [***]
treetagger_src : str, optional
Path of the directory that contains TreeTagger
type_of_terms : str, optional
type of terms you want to extract ("all", "multi"), by default "all"
language : str, optional
language of the data, by default "french"
score : str, optional
score used to sort the extracted terms, by default "F-TFIDF-C_M"
patron_number : str, optional
number of patterns used to extract terms, by default "3"
[***] Only change these settings if you are familiar with the Biotex Java API.
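The everyday parameters (those not marked [***]) can be gathered and checked before the wrapper is created. The following is a minimal sketch: build_wrapper_kwargs is a hypothetical helper, the defaults are copied from the list above, and the only validation is against the two type_of_terms values documented there.

```python
# Values listed in the parameter documentation above.
ALLOWED_TYPES_OF_TERMS = ("all", "multi")


def build_wrapper_kwargs(language="french", type_of_terms="all",
                         score="F-TFIDF-C_M", patron_number="3"):
    """Hypothetical helper: validate and collect BiotexWrapper settings."""
    if type_of_terms not in ALLOWED_TYPES_OF_TERMS:
        raise ValueError(
            f"type_of_terms must be one of {ALLOWED_TYPES_OF_TERMS}, "
            f"got {type_of_terms!r}"
        )
    return {
        "language": language,
        "type_of_terms": type_of_terms,
        "score": score,
        # Documented as a str, so integers are converted.
        "patron_number": str(patron_number),
    }
```

With the package installed, the dictionary could be unpacked into the constructor, for example BiotexWrapper(**build_wrapper_kwargs(type_of_terms="multi")).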