
strpython

In [1,2], we propose a matching process based on a dedicated graph structure named Spatial Textual Representation (STR). This structure is composed of spatial entities (places such as Paris) and the spatial relations that connect them. This library provides an implementation of the Spatial Textual Representation and of its extensions.
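To make the structure concrete, here is a minimal sketch of an STR as a labelled graph in plain Python. It is not strpython's STR class. The MiniSTR name, its methods and the two relation types ("inclusion" and "adjacency") are assumptions made for illustration, and the edges are written by hand instead of being derived from a gazetteer. The entity IDs reuse those from the pipeline output shown later in this README.

```python
from collections import defaultdict

# Hypothetical, minimal model of a Spatial Textual Representation:
# nodes are spatial entities (gazetteer ID -> label), edges are typed
# spatial relations. MiniSTR is NOT strpython's STR class.


class MiniSTR:
    def __init__(self):
        self.entities = {}                 # entity ID -> label
        self.relations = defaultdict(set)  # entity ID -> {(other ID, relation type)}

    def add_entity(self, entity_id, label):
        self.entities[entity_id] = label

    def add_relation(self, a, b, relation):
        # Assumed relation types: "inclusion" (a lies inside b) and
        # "adjacency" (a and b share a border); adjacency is stored both ways.
        if a not in self.entities or b not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.relations[a].add((b, relation))
        if relation == "adjacency":
            self.relations[b].add((a, relation))

    def neighbours(self, entity_id, relation=None):
        # IDs related to entity_id, optionally filtered by relation type
        return sorted(
            other for other, rel in self.relations[entity_id]
            if relation is None or rel == relation
        )


s = MiniSTR()
s.add_entity("GD2589931", "United Kingdom")
s.add_entity("GD5639369", "Northern Ireland")
s.add_entity("GD2806921", "Ireland")
s.add_entity("GD4465124", "Dublin")
s.add_relation("GD5639369", "GD2589931", "inclusion")  # Northern Ireland is part of the UK
s.add_relation("GD4465124", "GD2806921", "inclusion")  # Dublin is in Ireland
s.add_relation("GD5639369", "GD2806921", "adjacency")  # shared land border

print([s.entities[e] for e in s.neighbours("GD5639369")])
# ['United Kingdom', 'Ireland']
```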

Requirements

  • Python 3
  • Linux or Mac OS X
  • Geodict (download the ES server)
  • gazpy, installed with: sudo pip3 install git+https://github.com/Jacobe2169/gazpy.git

Installation

git clone <gitrepo>
cd str-python
(sudo) pip3 install .

How-to

Using the Python API

To generate an STR for one or more documents, you first need to define the STR pipeline. To this end, instantiate a Pipeline object.

from strpython import Pipeline, STR

pip = Pipeline(lang="fr")

You can customize the NER (named entity recognition) component by choosing Spacy, Flair or Polyglot (see strpython.nlp.ner).

from strpython import Pipeline, STR
from strpython.nlp.ner.spacy import Spacy

pip = Pipeline(lang="fr", ner=Spacy(lang="fr"))

You can also customize the disambiguation algorithm used in the geocoding step (see strpython.nlp.disambiguator).

from strpython import Pipeline, STR
from strpython.nlp.disambiguator.wikipedia_cooc import WikipediaDisambiguator

dis = WikipediaDisambiguator()
pip = Pipeline(lang="fr", disambiguator=dis)
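The pipeline treats NER and disambiguation as separate, swappable components. The sketch below imitates that plug-in design with a toy pipeline. ToyPipeline, dictionary_ner, most_common and the tiny gazetteer are all hypothetical, and the "most common" rule used here (keep the candidate with the highest score, a made-up population figure) is only one simple heuristic, not necessarily what the library's disambiguators do.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy pipeline: ToyPipeline, dictionary_ner, most_common and
# GAZETTEER are illustrative names, not part of strpython.

Candidate = Dict[str, object]

# Made-up gazetteer: toponym -> candidate places with a popularity score.
GAZETTEER: Dict[str, List[Candidate]] = {
    "Paris": [
        {"id": "P1", "label": "Paris, France", "score": 2_100_000},
        {"id": "P2", "label": "Paris, Texas", "score": 25_000},
    ],
    "Dublin": [
        {"id": "D1", "label": "Dublin, Ireland", "score": 550_000},
        {"id": "D2", "label": "Dublin, Ohio", "score": 49_000},
    ],
}


def dictionary_ner(text: str) -> List[str]:
    """Toy NER: gazetteer names found as whole words, in order of first occurrence."""
    found = []
    for name in GAZETTEER:
        match = re.search(rf"\b{re.escape(name)}\b", text)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


def most_common(toponym: str) -> Candidate:
    """Toy disambiguator: keep the candidate with the highest score."""
    return max(GAZETTEER[toponym], key=lambda c: c["score"])


@dataclass
class ToyPipeline:
    ner: Callable[[str], List[str]] = dictionary_ner
    disambiguator: Callable[[str], Candidate] = most_common

    def build(self, text: str) -> Dict[str, str]:
        # 1) extract toponyms, 2) geocode each of them, 3) keep ID -> label
        entities = {}
        for toponym in self.ner(text):
            best = self.disambiguator(toponym)
            entities[best["id"]] = best["label"]
        return entities

    def pipe_build(self, texts: List[str]) -> List[Dict[str, str]]:
        return [self.build(text) for text in texts]


pipe = ToyPipeline()
print(pipe.pipe_build(["Talks moved from Dublin to Paris."]))
# [{'D1': 'Dublin, Ireland', 'P1': 'Paris, France'}]

# Swapping in another disambiguator, much as Pipeline accepts a disambiguator argument
least = ToyPipeline(disambiguator=lambda t: min(GAZETTEER[t], key=lambda c: c["score"]))
print(least.build("Paris"))
# {'P2': 'Paris, Texas'}
```

Keeping each step behind a plain callable means a new recognizer or disambiguator can be tried without touching the pipeline code itself.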

Then, to generate the STRs, call Pipeline.pipe_build() on a list of texts; it returns a list containing one STR per text.

text = """EU looks to Northern Ireland-only backstop to break Brexit impasseEU trade commissioner says he believes ‘penny is finally dropping’ for Boris JohnsonDaniel BoffeyLast modified on Tue 10 Sep 2019 21.10 BST EU flagBrussels hopes Boris Johnson’s EU envoy, David Frost, will further pursue the idea at meetings later this week. Photograph: Clemens Bilan/EPAThe EU is pinning its hopes on British negotiators reverting to the Northern Ireland-only backstop previously rejected by Theresa May as a threat to the constitutional integrity of the UK.With Boris Johnson facing a choice between breaking his word and extending the UK’s membership of the EU beyond 31 October, or bringing back a tweaked deal for a last-gasp vote in parliament, officials and diplomats have expressed hope the prime minister will make a U-turn.EU sources insisted there was no other approach that could work and the negotiations were otherwise doomed to hit a “zombie stage” given the likelihood of an imminent general election.“We don’t know what mandate the prime minister has to propose something and obviously there is a strong division between the parliament and the government,” said Nathalie Loiseau, a former French minister for EU affairs.It is hoped in Brussels that Johnson’s EU envoy, David Frost, will further pursue a Northern Ireland-only backstop during meetings with the European commission’s Brexit taskforce on Wednesday and Friday.The newly nominated EU commissioner for trade, Phil Hogan, a former Irish minister, told the Irish Times he believed the “penny is finally dropping” in Johnson’s government over the lack of alternatives.The idea was originally rejected by May on the grounds it was unpalatable to her partners in the Democratic Unionist party, on which she relied for her working majority. 
At the time, she said “no British prime minister” could accept a regulatory border being drawn in the Irish Sea.No 10 insisted on Tuesday that Johnson was not pursuing the idea again in the hope of winning the support of more hardline Eurosceptics. “We are not seeking a Northern Ireland-only backstop,” a No 10 spokesman said.However, Arlene Foster, the DUP leader, was sufficiently alarmed to demand a private meeting with Johnson in Downing Street on Tuesday evening, which lasted an hour.Following the meeting, Foster said: “The prime minister rejected a Northern Ireland-only backstop in a letter to Donald Tusk on 19 August. It is undemocratic and unconstitutional and would place a tariff border between Northern Ireland and the rest of the United Kingdom. That would be unacceptable.“During today’s meeting, the prime minister confirmed his rejection of the Northern Ireland-only backstop and his commitment to securing a deal which works for the entire United Kingdom as well as our neighbours in the Republic of Ireland.”Johnson has not been specific about how he will get a new deal with Brussels, but before his meeting with Foster, he said “there is a way” to achieve one “but it will take a lot of hard work”, as he fought back against accusations that his five-week prorogation of parliament is anti-democratic. “Donnez-moi un break – what a load of nonsense,” he said, switching to Franglais.The prime minister has said he wants to remove the Irish backstop from the withdrawal agreement as it would tie Northern Ireland into the single market and the whole of the UK into a shared customs territory with the EU. 
He has described the arrangement as “undemocratic” and railed against signing a treaty that he says would be “inconsistent with the sovereignty of the UK”.But his proposal in recent days of a single all-Ireland agrifood zone has offered some hope in Brussels that the government may return to the initial EU suggestion of an arrangement that solely keeps Northern Ireland within the EU’s structures.Hogan said Johnson, who visited the Irish prime minister, Leo Varadkar, in Dublin on Monday, had offered some grounds for optimism in his recent talks.“Mr Johnson has made a proposal in the last few days talking about an all-Ireland food zone. That is certainly a clear indication of divergence between Northern Ireland and the Republic of Ireland/the EU and the rest of the UK,” he said.“This is the first time that this has been spoken about by a British prime minister where they are prepared to accept some level of divergence between Northern Ireland and the rest of the UK. If we can build on that, we certainly might get closer to one another in terms of a possible outcome.”Hogan warned, however, that the single agrifood zone was some distance from a solution to the Brexit impasse. “It would have to include all goods … in terms of any agreement,” he said.“I remain hopeful that the penny is finally dropping with the UK that there are pragmatic and practical solutions that can actually be introduced into the debate at this stage, albeit at the 11th hour, that may find some common ground between the EU and the UK. 
The taoiseach has indicated in the last 24 hours that the Northern Ireland-only backstop is quite an interesting idea to revisit.”Fabian Zuleeg, the chief executive of the European Policy Centre thinktank in Brussels, said the only point of the talks in Brussels would be to discuss an extension of article 50 beyond 31 October or the detail of a Northern Ireland-only arrangement.“But in reality I don’t believe that the UK government wants to go down this route,” he said. “So at the moment I don’t see anything of substance that is being discussed because nothing else can be opened.”After his nomination on Tuesday by the European commission’s president-designate, Ursula von der Leyen, Hogan is to take over any trade talks with the UK once the country leaves the bloc, with the former deputy chief EU negotiator Sabine Weyand as his director general.Hogan said the establishment of a new negotiating team “will take probably six to eight months once we know what the outcome of the present negotiations are … Then I expect it will take a number of years before we conclude the negotiations.”More people in France…... like you, are reading and supporting The Guardian’s independent, investigative journalism than ever before. And unlike many new organisations, we have chosen an approach that allows us to keep our journalism accessible to all, regardless of where they live or what they can afford. But we need your ongoing support to keep working as we do.The Guardian will engage with the most critical issues of our time – from the escalating climate catastrophe to widespread inequality to the influence of big tech on our lives. At a time when factual information is a necessity, we believe that each of us, around the world, deserves access to accurate reporting with integrity at its heart.Our editorial independence means we set our own agenda and voice our own opinions. 
Guardian journalism is free from commercial and political bias and not influenced by billionaire owners or shareholders. This means we can give a voice to those less heard, explore where others turn away, and rigorously challenge those in power.We need your support to keep delivering quality journalism, to maintain our openness and to protect our precious independence. Every reader contribution, big or small, is so valuable. Support The Guardian from as little as €1 – and it only takes a minute. Thank you."""
list_strs = pip.pipe_build([text])

str_ = list_strs[0]  # STR of the first (and only) document
str_
# Out[24]:
# STR
#	Spatial Entities : {'GD2589931': 'United Kingdom', 'GD3978256': 'Brussels', 'GD2806921': 'Ireland', 'GD4465124': 'Dublin', 'GD3117352': 'France', 'GD5639369': 'Northern Ireland'}
#	Verbose : False

STR Transformation

Two kinds of transformation are proposed: spatial-based and thematic-based.

Spatial-Based

from strpython.models.transformation.transform import Generalisation, Expansion

# Generalisation bounded at the region level
gen_r = Generalisation().transform(str_, type_trans="gen", type_gen="bounded", bound="region")
# Generalisation bounded at the country level
gen_c = Generalisation().transform(str_, type_trans="gen", type_gen="bounded", bound="country")

# Extension (n=1)
# Add n=1 entity located within 50 km of each extended entity
ext_1 = Expansion().transform(str_, type_trans="ext", adjacent_count=1, distance="50")
# Extension (n=2)
# Add n=2 entities located within 50 km of each extended entity
ext_2 = Expansion().transform(str_, type_trans="ext", adjacent_count=2, distance="50")
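Both spatial transformations can be sketched on plain Python data. The sketch assumes a hand-written hierarchy (entity → level and parent) and approximate coordinates; generalise and expand are hypothetical helpers, not the library's Generalisation and Expansion classes. Generalisation replaces each entity by its ancestor at the bound level. Expansion adds, for each entity, up to adjacent_count of its nearest entities lying within distance_km, measured with the haversine great-circle distance.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical inputs; strpython takes this information from its gazetteer.
# entity -> (administrative level, parent entity or None)
HIERARCHY: Dict[str, Tuple[str, Optional[str]]] = {
    "Dublin": ("city", "Leinster"),
    "Cork": ("city", "Munster"),
    "Leinster": ("region", "Ireland"),
    "Munster": ("region", "Ireland"),
    "Ireland": ("country", None),
}

# entity -> approximate (latitude, longitude) in degrees
COORDS: Dict[str, Tuple[float, float]] = {
    "Dublin": (53.35, -6.26),
    "Drogheda": (53.72, -6.35),
    "Wicklow": (52.98, -6.04),
    "Belfast": (54.60, -5.93),
    "Cork": (51.90, -8.47),
}


def generalise(entities: List[str], bound: str) -> List[str]:
    """Replace each entity by its ancestor at level `bound` (or its root if that level is never reached)."""
    result = set()
    for entity in entities:
        level, parent = HIERARCHY[entity]
        while level != bound and parent is not None:
            entity = parent
            level, parent = HIERARCHY[entity]
        result.add(entity)
    return sorted(result)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def expand(entities: List[str], adjacent_count: int, distance_km: float) -> List[str]:
    """Add up to `adjacent_count` nearest entities within `distance_km` of each entity."""
    result = set(entities)
    for entity in entities:
        # Candidates sorted by increasing distance to the current entity
        candidates = sorted(
            (haversine_km(COORDS[entity], COORDS[other]), other)
            for other in COORDS
            if other not in entities
        )
        result.update(other for dist, other in candidates[:adjacent_count] if dist <= distance_km)
    return sorted(result)


print(generalise(["Dublin", "Cork"], "region"))   # ['Leinster', 'Munster']
print(generalise(["Dublin", "Cork"], "country"))  # ['Ireland']
print(expand(["Dublin"], adjacent_count=1, distance_km=50))  # ['Drogheda', 'Dublin']
print(expand(["Dublin"], adjacent_count=2, distance_km=50))  # ['Drogheda', 'Dublin', 'Wicklow']
```

Taking the nearest candidates first and then filtering by radius keeps the result deterministic when more than adjacent_count entities fall inside the radius.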

Thematic-Based

from strpython.models.thematic_str import ThematicSTR
from strpython.helpers.terminology.matcher import TerminologyMatcher
from strpython.models.transformation.thematic import get_generalized_with_thematic, get_extended_with_thematic

# Thematic terms to look for in the text
term_matcher = TerminologyMatcher("EU envoy government backstop".split())

# Initialise a thematic STR from the spatial STR
t_str = ThematicSTR.from_STR(str_)

# Match the thematic terms in the text, then build the thematic STR
t_str.setup(text, term_matcher, "fr")
t_str.build()

t_str
#Out[2]:
#STR
#	Spatial Entities : {'GD2589931': 'United Kingdom', 'GD3978256': 'Brussels', 'GD2806921': 'Ireland', 'GD4465124': 'Dublin', 'GD3117352': 'France', 'GD5639369': 'Northern Ireland'}
#	Verbose : False
#Thematic : {0: 'EU', 2: 'government'}

# Apply the thematic layer to the generalised and extended versions of the STR
gen_c_t = get_generalized_with_thematic(gen_c, t_str)
gen_r_t = get_generalized_with_thematic(gen_r, t_str)

ext_1_t = get_extended_with_thematic(ext_1, t_str)
ext_2_t = get_extended_with_thematic(ext_2, t_str)
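The term matching step can be illustrated with a simplified matcher. match_terms is a hypothetical stand-in for TerminologyMatcher: it reports each term of the list, keyed by its position, when the term occurs as a whole word (case-insensitive), which mirrors the index → term format of the Thematic field above. The library's matcher may apply other rules, so this sketch is not guaranteed to return the same terms on the article.

```python
import re
from typing import Dict, List


def match_terms(text: str, terms: List[str]) -> Dict[int, str]:
    """Hypothetical matcher: map the index of each term found in `text`
    (whole word, case-insensitive) to the term itself."""
    matches = {}
    for i, term in enumerate(terms):
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            matches[i] = term
    return matches


terms = "EU envoy government backstop".split()
print(match_terms("The EU and the government met in Brussels.", terms))
# {0: 'EU', 2: 'government'}
```

The word boundaries stop a short term such as "EU" from matching inside longer words like "EUROPE".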

Plot a STR

To visualize the STR graph, four methods are implemented: an interactive map (using folium), a static map (using geopandas), a network visualization (using the networkx drawing methods) and a LaTeX TikZ output (using the tikz-network Python API).

Output type        STR class method
Interactive map    STR.to_folium()
Static map         STR.map_projection()
Network layout     STR.plot()
LaTeX (TikZ)       STR.to_latex()
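As an illustration of what a TikZ export involves, here is a minimal sketch that turns a small graph into a tikzpicture, using longitude and latitude directly as x and y coordinates. It is not the code behind STR.to_latex(), which relies on the tikz-network Python API. to_tikz, the node tuple layout, the approximate coordinates and the edge styles (dashed for adjacency, arrow for inclusion) are assumptions, and labels are not escaped for LaTeX special characters.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch, not strpython's STR.to_latex().
# nodes: entity ID -> (label, latitude, longitude); edges: (ID a, ID b, relation type)


def to_tikz(nodes: Dict[str, Tuple[str, float, float]],
            edges: List[Tuple[str, str, str]],
            scale: float = 1.0) -> str:
    lines = [r"\begin{tikzpicture}"]
    for node_id, (label, lat, lon) in nodes.items():
        # longitude -> x, latitude -> y
        lines.append(rf"  \node[draw, circle] ({node_id}) at ({lon * scale:.2f}, {lat * scale:.2f}) {{{label}}};")
    for a, b, relation in edges:
        style = "dashed" if relation == "adjacency" else "->"
        lines.append(rf"  \draw[{style}] ({a}) -- ({b});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


nodes = {
    "GD5639369": ("Northern Ireland", 54.6, -6.7),
    "GD2806921": ("Ireland", 53.0, -8.0),
}
edges = [("GD5639369", "GD2806921", "adjacency")]
print(to_tikz(nodes, edges))
```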

Using the command line

To generate STRs from the command line, use the generate_str.py script:

usage: generate_str.py [-h] [-n {spacy,polyglot,stanford}]
                       [-d {occwiki,most_common,shareprop}] [-t {gen,ext}]
                       [-o OUTPUT]
                       input_pkl

positional arguments:
  input_pkl             Filename of your input. Must be in Pickle format with the following columns :
                          - filename : original filename that contains the text in `content`
                          - id_doc : id of your document
                          - content : text data associated to the document
                          - lang : language of your document

optional arguments:
  -h, --help            show this help message and exit
  -n {spacy,polyglot,stanford}, --ner {spacy,polyglot,stanford}
                        The Named Entity Recognizer you wish to use
  -d {occwiki,most_common,shareprop}, --disambiguator {occwiki,most_common,shareprop}
                        The Named Entity disambiguator you wish to use
  -t {gen,ext}, --transform {gen,ext}
                        Transformation to apply
  -o OUTPUT, --output OUTPUT
                        Output Filename
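Preparing the positional input_pkl file can be sketched with pandas. This assumes that "Pickle format with the following columns" means a pickled pandas DataFrame and that lang takes two-letter codes such as "fr"; the file names, IDs and texts are made up.

```python
import pandas as pd

# Hypothetical input for generate_str.py: one row per document, with the
# four columns listed in the help text (filename, id_doc, content, lang).
docs = pd.DataFrame(
    [
        {"filename": "doc_001.txt", "id_doc": 1,
         "content": "Talks resumed in Dublin and Brussels.", "lang": "en"},
        {"filename": "doc_002.txt", "id_doc": 2,
         "content": "La réunion a eu lieu à Montpellier.", "lang": "fr"},
    ],
    columns=["filename", "id_doc", "content", "lang"],
)

# Serialise the DataFrame with pickle
docs.to_pickle("input.pkl")
```

The resulting file would then be passed as the positional argument, for example `generate_str.py -n spacy -d most_common input.pkl`.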

Bibliography

[1] Jacques Fize, Mathieu Roche, Maguelonne Teisseire. Matching heterogeneous textual data using spatial features. Intelligent Data Analysis (journal).

[2] Jacques Fize, Mathieu Roche, Maguelonne Teisseire. Matching heterogeneous textual data using spatial features. 13th International Workshop on Spatial and Spatiotemporal Data Mining (SSTDM-18).