
STR

This repository contains all the work on STR (Spatial Textual Representation). The code is organized into the following modules:

  • config, which contains the configuration file and a dedicated class for loading and interacting with it
  • gmatch4py, which contains implementations of various graph matching algorithms
  • helpers, which contains helper methods for querying the geographic database (geodict), detecting collisions between polygons, etc.
  • models, which contains the STR structure and its variations
  • nlp, which contains the implementations of, or interfaces to, NLP methods such as NER, POS tagging, and toponym disambiguation

Generate STR

To generate an STR, run the generate_str.py script.

usage: generate_str.py [-h] [-n {spacy,polyglot,stanford}]
                       [-d {occwiki,most_common,shareprop}] [-t {gen,ext}]
                       [-o OUTPUT]
                       input_pkl

positional arguments:
  input_pkl             Filename of your input. Must be in Pickle format with the following columns :
                          - filename : original filename that contains the text in `content`
                          - id_doc : id of your document
                          - content : text data associated to the document
                          - lang : language of your document

optional arguments:
  -h, --help            show this help message and exit
  -n {spacy,polyglot,stanford}, --ner {spacy,polyglot,stanford}
                        The Named Entity Recognizer you wish to use
  -d {occwiki,most_common,shareprop}, --disambiguator {occwiki,most_common,shareprop}
                        The Named Entity disambiguator you wish to use
  -t {gen,ext}, --transform {gen,ext}
                        Transformation to apply
  -o OUTPUT, --output OUTPUT
                        Output Filename
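
The input format described above can be prepared with a few lines of Python. The sketch below is only an illustration: it assumes that the "columns" mentioned in the help text refer to a pandas DataFrame saved with `to_pickle`, which the usage message does not state explicitly. The sample documents and the file names `corpus.pkl` and `str_output.pkl` are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample documents; replace them with your own corpus.
documents = [
    {
        "filename": "doc_en.txt",
        "id_doc": 0,
        "content": "Paris is the capital of France.",
        "lang": "en",
    },
    {
        "filename": "doc_fr.txt",
        "id_doc": 1,
        "content": "Montpellier est une ville du sud de la France.",
        "lang": "fr",
    },
]

# Assumption: generate_str.py expects a pickled pandas DataFrame
# with exactly the four columns listed in the help text.
corpus = pd.DataFrame(documents, columns=["filename", "id_doc", "content", "lang"])
corpus.to_pickle("corpus.pkl")

# Command line assembled from the documented options.
# The chosen option values and the output name are placeholders.
command = [
    "python", "generate_str.py",
    "-n", "spacy",
    "-d", "most_common",
    "-t", "gen",
    "-o", "str_output.pkl",
    "corpus.pkl",
]
print(" ".join(command))
```

Running this writes `corpus.pkl` to the current directory and prints a command that can then be executed from the repository root. The values passed to `-n`, `-d` and `-t` must be among the choices listed in the usage message.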