
STR

This repository contains all the work on STR (Spatial Textual Representation). The code is organized into the following modules:

  • config contains the configuration file and a dedicated class for loading and interacting with it.
  • gmatch4py contains implementations of various graph matching algorithms.
  • helpers contains helper methods for querying the geo database (geodict), testing collisions between polygons, etc.
  • models contains the STR structure and its variations.
  • nlp contains the implementations of, or interfaces to, NLP methods such as NER, POS tagging, and toponym disambiguation.

Generate STR

To generate an STR, run the generate_str.py script:

usage: generate_str.py [-h] [-n {spacy,polyglot,stanford}]
                       [-d {occwiki,most_common,shareprop}] [-t {gen,ext}]
                       [-o OUTPUT]
                       input_pkl

positional arguments:
  input_pkl             Filename of your input. Must be in Pickle format with the following columns :
                          - filename : original filename that contains the text in `content`
                          - id_doc : id of your document
                          - content : text data associated to the document
                          - lang : language of your document

optional arguments:
  -h, --help            show this help message and exit
  -n {spacy,polyglot,stanford}, --ner {spacy,polyglot,stanford}
                        The Named Entity Recognizer you wish to use
  -d {occwiki,most_common,shareprop}, --disambiguator {occwiki,most_common,shareprop}
                        The Named Entity disambiguator you wish to use
  -t {gen,ext}, --transform {gen,ext}
                        Transformation to apply
  -o OUTPUT, --output OUTPUT
                        Output Filename
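
The help text above lists the columns expected in input_pkl but not how to build the file. The following is a minimal sketch, assuming the script expects a pickled pandas DataFrame (suggested by the word "columns", but not confirmed here). The sample rows, the file name input_example.pkl, and the two-letter language codes are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

# Column names are taken from the help text; the rows are made-up sample data.
# Assumption: `lang` holds short language codes such as "en" or "fr".
documents = pd.DataFrame(
    [
        {
            "filename": "doc_0.txt",
            "id_doc": 0,
            "content": "Paris is the capital of France.",
            "lang": "en",
        },
        {
            "filename": "doc_1.txt",
            "id_doc": 1,
            "content": "Montpellier est une ville du sud de la France.",
            "lang": "fr",
        },
    ],
    columns=["filename", "id_doc", "content", "lang"],
)

# Assumption: generate_str.py loads its input as a pickled DataFrame.
# The file name is hypothetical.
input_path = os.path.join(tempfile.gettempdir(), "input_example.pkl")
documents.to_pickle(input_path)

# Reload the file to check that it round-trips with the expected columns.
reloaded = pd.read_pickle(input_path)
```

If that assumption holds, the resulting file could then be passed as the positional argument, for example `generate_str.py input_example.pkl -n spacy -d most_common -t gen -o OUTPUT`, where the option values come from the choices listed above and OUTPUT is a file name of your choosing.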