STR
This repository contains all the work on STR (Spatial Textual Representation). The code is organized into several modules:
- config: contains the configuration file and a dedicated class for loading and interacting with it
- gmatch4py: contains implementations of various graph matching algorithms
- helpers: contains various helper methods, e.g. for querying the geographic database (geodict) or detecting collisions between polygons
- models: contains the STR structure and its variations
- nlp: contains implementations of, or interfaces to, NLP methods such as NER, POS tagging and toponym disambiguation
Generate STR
To generate an STR, use the `generate_str.py` script:
```
usage: generate_str.py [-h] [-n {spacy,polyglot,stanford}]
                       [-d {occwiki,most_common,shareprop}] [-t {gen,ext}]
                       [-o OUTPUT]
                       input_pkl

positional arguments:
  input_pkl             Filename of your input. Must be in Pickle format with the following columns :
                          - filename : original filename that contains the text in `content`
                          - id_doc : id of your document
                          - content : text data associated to the document
                          - lang : language of your document

optional arguments:
  -h, --help            show this help message and exit
  -n {spacy,polyglot,stanford}, --ner {spacy,polyglot,stanford}
                        The Named Entity Recognizer you wish to use
  -d {occwiki,most_common,shareprop}, --disambiguator {occwiki,most_common,shareprop}
                        The Named Entity disambiguator you wish to use
  -t {gen,ext}, --transform {gen,ext}
                        Transformation to apply
  -o OUTPUT, --output OUTPUT
                        Output Filename
```
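The following is a minimal sketch of how an input file for `generate_str.py` might be prepared. It assumes the input is a pandas DataFrame saved with `to_pickle` (the help text only says "Pickle format" with the listed columns, so the exact object type is an assumption). The file names `input.pkl` and `output.pkl` and the sample document are hypothetical. The command is only assembled and printed, not executed, and uses only options listed in the help text above.

```python
# Sketch: build an input pickle for generate_str.py.
# ASSUMPTION: the script expects a pandas DataFrame with these four columns.
import shlex

import pandas as pd

# One row per document, with the columns listed in the help text.
docs = pd.DataFrame(
    [
        {
            "filename": "doc_001.txt",  # hypothetical original file name
            "id_doc": 1,
            "content": "Montpellier is a city in the south of France.",
            "lang": "en",
        },
    ],
    columns=["filename", "id_doc", "content", "lang"],
)

# Save the DataFrame in Pickle format (hypothetical file name).
docs.to_pickle("input.pkl")

# Assemble a command line using only the documented options.
cmd = [
    "python", "generate_str.py",
    "-n", "spacy",        # NER backend
    "-d", "most_common",  # disambiguator
    "-t", "gen",          # transformation
    "-o", "output.pkl",   # hypothetical output file name
    "input.pkl",          # positional input_pkl argument
]
print(shlex.join(cmd))
```

Any of the other listed choices (for example `-n polyglot`, `-d occwiki` or `-t ext`) can be substituted in the same way.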