Commit 694c8203 authored by Fize Jacques

ADD STR EXPERIMENTS ON PADIWEB

parent 50d2cbde
Showing with 1933 additions and 0 deletions
%% Cell type:code id: tags:
``` python
import pandas as pd
import networkx as nx
import bqplot.pyplot as plt
%matplotlib inline
```
%% Cell type:code id: tags:
``` python
data_bilan=pd.read_csv("is_bilan.csv",sep=";")
```
%% Cell type:code id: tags:
``` python
data_bilan["IS_BILAN"]=data_bilan["IS_BILAN"].apply(lambda x: "BILAN" if x ==1 else "EPIDEMIE")
```
%% Cell type:markdown id: tags:
# Analysing the structure of STRs through a case study: summary/recap of an outbreak
**Is spatiality expressed in the same way across classes or types of document?** In animal-disease surveillance based on Google News, researchers need to distinguish a summary (*bilan*) of the situation around an outbreak from the announcement of that outbreak. In this experiment, we examine whether these two document classes show distinctive characteristics in their STRs.
## Definition of the two classes
From the PadiWeb corpus, we select a sample of 100 documents and split it into two classes:
* **Bilan** (summary). A recap of an event that is over or still ongoing.
* **Épidémie** (outbreak). Its purpose is to announce the start of an outbreak (the starting point).
The size of each class is shown below.
%% Cell type:code id: tags:
``` python
data_bilan.groupby("IS_BILAN").count().plot.pie(y="ID_TEXT")
```
%% Output
<matplotlib.axes._subplots.AxesSubplot at 0x109c1af98>
%% Cell type:code id: tags:
``` python
import numpy as np


def number_of_edges(x, color=None):
    """
    Dedicated function to count edges based on their color
    """
    if not color:
        # number_of_edges() already returns an int
        return x.number_of_edges()
    cp = 0
    for ed in x.edges(data=True):
        # ed is (source, target, attributes)
        if ed[-1]["color"] == color:
            cp += 1
    return cp


def flattern(A):
    """Flatten nested lists and NumPy arrays into a single flat list."""
    rt = []
    for i in A:
        if isinstance(i, list):
            rt.extend(flattern(i))
        elif isinstance(i, np.ndarray):
            rt.extend(flattern(i.tolist()))
        else:
            rt.append(i)
    return rt


def most_common(lst):
    """Return the most frequent label ("P-PPL" for an empty list)."""
    if not lst:
        return "P-PPL"
    if len(set(lst)) > 1 and "P-PPL" in set(lst):
        # Caution: this compares against "PPL", not "P-PPL", so "P-PPL" labels are kept
        lst = [x for x in lst if x != "PPL"]
    return max(set(lst), key=lst.count)
```
%% Cell type:code id: tags:
``` python
import nxpd
nxpd.nxpdParams["show"]="ipynb"
from strpython.helpers.gazeteer_helpers import get_data


def class_graph(g):
    """Return a copy of g where each node is labelled with its most common class."""
    g2 = g.copy()
    for n in g2:
        c = get_data(n)["class"]
        g2.nodes[n]["label"] = most_common(c)
    return g2
```
%% Cell type:markdown id: tags:
To compare the STRs generated for each document class, we use several indicators:
* **Granularity** Granularity is defined as the level of an entity on the spatial scale ($village < town < country$). Here, it tells us at which level space is used to describe the situation.
* **Density** Density is defined as the ratio between the number of edges in a graph and the maximum number of edges possible between its nodes. A STR graph with a high density indicates strong cohesion between the spatial entities.
* **Ratio $Relation_i/Relation_j$** In the STR, each entity can be linked to another by two types of relation: inclusion and adjacency. With this ratio, we want to know how many $relation_j$ exist for one $relation_i$. For example, for one inclusion relation, how many adjacency relations?
* **Number of nodes (spatial entities)** Indicates whether texts are strongly spatialised.
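The relation ratio from the list above can be sketched on a single toy graph. This is a minimal illustration in plain Python, not the notebook's own code: edges are written as `(source, target, attributes)` tuples, the shape that `graph.edges(data=True)` yields in networkx, and the colours follow the notebook's convention (`"green"` for adjacency, `"red"` for inclusion). The helper names `count_edges_by_color` and `relation_ratio`, the toy edges, and the idea that each adjacency is stored as two directed edges (one possible reading of the halving applied to `NB_ED_ADJ` later in this notebook) are assumptions.

```python
def count_edges_by_color(edges, color):
    """Count edges whose attribute dict has the given 'color'."""
    return sum(1 for _, _, attrs in edges if attrs.get("color") == color)


def relation_ratio(n_i, n_j):
    """Return n_i / n_j, or 0.0 when n_j is zero (undefined ratio)."""
    return n_i / n_j if n_j else 0.0


# Hypothetical toy STR; adjacency is assumed to be stored in both directions.
toy_edges = [
    ("France", "Montpellier", {"color": "red"}),      # inclusion
    ("Montpellier", "Clapiers", {"color": "green"}),  # adjacency
    ("Clapiers", "Montpellier", {"color": "green"}),  # same adjacency, reversed
]

n_adj = count_edges_by_color(toy_edges, "green") / 2  # 2 directed edges -> 1 relation
n_inc = count_edges_by_color(toy_edges, "red")

print(relation_ratio(n_adj, n_inc))  # 1.0
print(relation_ratio(n_inc, 0))      # 0.0 (no adjacency at all)
```

Returning 0 for an undefined ratio mirrors the `fillna(0)` convention used for the `R_ADJ_INC` and `R_INC_ADJ` columns.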
### Computing the granularity of a STR
We retrieve the **classes associated** with the **entities of the STR**, then take **the most frequent class**. For example:
$STR_1$ --> France, Montpellier, Clapiers, Caen --> [A-PCLI], [P-PPL, A-ADM4], [A-ADM4], [A-ADM4]
The resulting granularity is therefore **A-ADM4**.
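The example above can be reproduced with a few lines of standard-library Python. This is a simplified sketch: the function name `granularity` and the dictionary layout are assumptions, and unlike the notebook's `most_common` helper it returns `None` for an empty STR instead of a default class.

```python
from collections import Counter


def granularity(classes_per_entity):
    """Return the most frequent class over all entities, or None if there is none."""
    labels = [label for classes in classes_per_entity for label in classes]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


# Classes of the entities in STR_1, as listed in the example above
str_1 = {
    "France": ["A-PCLI"],
    "Montpellier": ["P-PPL", "A-ADM4"],
    "Clapiers": ["A-ADM4"],
    "Caen": ["A-ADM4"],
}

print(granularity(str_1.values()))  # A-ADM4
```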
### Computing the density of a STR
The density of a STR (here, of its graph) is computed with the following formula, where $|V|$ is the number of nodes and $|E|$ the number of edges: $$\frac{2\times|E|}{|V|\times(|V|-1)}$$
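As a quick numerical check of the formula, here is a small sketch that works directly on node and edge counts. The function name `density` is illustrative, and returning 0 for graphs with fewer than two nodes is a convention chosen to avoid a division by zero.

```python
def density(n_nodes, n_edges):
    """Density 2|E| / (|V| (|V| - 1)); defined as 0.0 when |V| < 2."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


print(round(density(4, 1), 6))  # 0.166667
print(round(density(4, 2), 6))  # 0.333333
print(density(1, 0))            # 0.0
```

A density of 1 corresponds to a graph in which every pair of nodes is connected, for instance `density(4, 6)`.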
%% Cell type:code id: tags:
``` python
data_bilan["GRAPH"]=data_bilan["ID_TEXT"].apply(lambda x:nx.read_gexf("str_PADI100/{0}.gexf".format(x)))
data_bilan["GRAPH_C"]=data_bilan["GRAPH"].apply(lambda x:class_graph(x))
```
%% Cell type:code id: tags:
``` python
data_bilan["DENSITY"]=data_bilan["GRAPH"].apply(lambda x: (2*x.number_of_edges())/(x.number_of_nodes()*(x.number_of_nodes()-1)) if len(x) >1 else 0)
data_bilan["NB_NODE"]=data_bilan["GRAPH"].apply(lambda x: len(x))
data_bilan["NB_ED_ADJ"]=data_bilan["GRAPH"].apply(lambda x: number_of_edges(x,color="green"))
data_bilan["NB_ED_INC"]=data_bilan["GRAPH"].apply(lambda x: number_of_edges(x,color="red"))
data_bilan["R_ADJ_INC"]=((data_bilan["NB_ED_ADJ"]/2)/data_bilan["NB_ED_INC"]).replace([np.inf, -np.inf], np.nan).fillna(0)
data_bilan["R_INC_ADJ"]=(data_bilan["NB_ED_INC"]/(data_bilan["NB_ED_ADJ"]/2)).replace([np.inf, -np.inf], np.nan).fillna(0)
```
%% Cell type:code id: tags:
``` python
data_bilan["CLASS"]=data_bilan["GRAPH"].apply(lambda x: flattern([get_data(n)["class"] for n in list(x.nodes())]))
data_bilan["MEAN_LVL"]=data_bilan["CLASS"].apply(lambda x: most_common(x) if len(x)>0 else "")
```
%% Cell type:code id: tags:
``` python
data_bilan.head(9)
```
%% Output
ID_TEXT IS_BILAN MIXED \
0 0 BILAN 0
1 1 EPIDEMIE 0
2 2 BILAN 1
3 3 BILAN 0
4 4 EPIDEMIE 0
5 5 EPIDEMIE 1
6 6 EPIDEMIE 0
7 7 BILAN 0
8 8 BILAN 0
GRAPH \
0 (GD4103071, GD4468122, GD95073, GD791183)
1 (GD1685421)
2 (GD2032795)
3 (GD1626932, GD3274230)
4 (GD639917, GD3789919, GD1316637, GD2055944)
5 (GD639917, GD3995806, GD3789919, GD1316637, GD...
6 (GD639917, GD3789919, GD2055944)
7 (GD5526704, GD976842, GD1316637, GD2055944)
8 (GD2908705, GD1404948, GD9642903, GD3995806, G...
GRAPH_C DENSITY NB_NODE \
0 (GD4103071, GD4468122, GD95073, GD791183) 0.000000 4
1 (GD1685421) 0.000000 1
2 (GD2032795) 0.000000 1
3 (GD1626932, GD3274230) 0.000000 2
4 (GD639917, GD3789919, GD1316637, GD2055944) 0.166667 4
5 (GD639917, GD3995806, GD3789919, GD1316637, GD... 0.200000 5
6 (GD639917, GD3789919, GD2055944) 0.000000 3
7 (GD5526704, GD976842, GD1316637, GD2055944) 0.333333 4
8 (GD2908705, GD1404948, GD9642903, GD3995806, G... 0.285714 7
NB_ED_ADJ NB_ED_INC R_ADJ_INC R_INC_ADJ \
0 0 0 0.0 0.0
1 0 0 0.0 0.0
2 0 0 0.0 0.0
3 0 0 0.0 0.0
4 0 1 0.0 0.0
5 0 2 0.0 0.0
6 0 0 0.0 0.0
7 2 0 0.0 0.0
8 4 2 1.0 1.0
CLASS MEAN_LVL
0 [P-PPLA, P-PPL, P-PPLA, A-ADM1, P-PPLA] P-PPLA
1 [P-PPL] P-PPL
2 [A-PCLI] A-PCLI
3 [A-PCLI, P-PPL] P-PPL
4 [A-PCLI, A-ADM1, P-PPLA, P-PPLC, A-PCLI, A-PCLI] A-PCLI
5 [A-PCLI, P-PPL, A-ADM1, P-PPLA, P-PPLC, A-PCLI... A-PCLI
6 [A-PCLI, A-ADM1, P-PPLA, P-PPLC, A-PCLI] A-PCLI
7 [A-PCLI, A-PCLI, A-PCLI, A-PCLI] A-PCLI
8 [A-ADM1, P-PPL, P-PPL, P-PPL, A-ADM1, A-ADM1, ... A-ADM1
%% Cell type:markdown id: tags:
# Results
### Granularity of documents in the **BILAN** class
%% Cell type:code id: tags:
``` python
data_bilan[data_bilan["IS_BILAN"] == "BILAN"].groupby("MEAN_LVL").count().plot.pie(y="ID_TEXT")
```
%% Output
<matplotlib.axes._subplots.AxesSubplot at 0x10a1db828>
%% Cell type:markdown id: tags:
### Granularity of documents in the **EPIDEMIE** class
%% Cell type:code id: tags:
``` python
data_bilan[data_bilan["IS_BILAN"] == "EPIDEMIE"].groupby("MEAN_LVL").count().plot.pie(y="ID_TEXT")
```
%% Output
<matplotlib.axes._subplots.AxesSubplot at 0x10a2c9ba8>
%% Cell type:markdown id: tags:
### Mean values obtained for each indicator
%% Cell type:code id: tags:
``` python
data_bilan.groupby("IS_BILAN").mean(numeric_only=True)
```
%% Output
ID_TEXT MIXED DENSITY NB_NODE NB_ED_ADJ NB_ED_INC \
IS_BILAN
BILAN 51.588235 0.014706 0.293545 6.455882 4.117647 2.705882
EPIDEMIE 46.727273 0.030303 0.379501 4.636364 2.969697 1.303030
R_ADJ_INC R_INC_ADJ
IS_BILAN
BILAN 0.559194 1.049048
EPIDEMIE 0.398990 0.240657
%% Cell type:markdown id: tags:
## Analysis of the results
### Granularity
Looking at the two pie charts above, the granularity observed in the STRs differs with the type of text. Texts of the **épidémie** class generally sit "higher" in the spatial hierarchy, owing to the strong presence of classes such as A-PCLI ($\approx$ country) and A-ADM1 (first-level administrative division of a country, *e.g.* a region in France or a state in the United States). Texts of the **BILAN** class have a slightly finer granularity, with a wider range of classes, including T-ISL (island) and S-BLDG (building).
Based on the proposed classification, we conclude that documents of the **bilan** class are spatially "finer" than those of the **épidémie** class.
### Density / Number of nodes / Number of edges
Unfortunately, the mean density (0.29 for BILAN against 0.38 for EPIDEMIE) does not support any conclusion.
The number of nodes is higher in documents of the BILAN class (6.46 on average, against 4.64 for EPIDEMIE), which means these documents contain more spatial entities. This seems natural: unlike an outbreak announcement, a summary recaps the spread of a disease over a period of time and an area that are (often) larger.
For the numbers of adjacency and inclusion relations, we observe the same balance of "power" in both classes: there are more inclusion edges than adjacency edges.
### Adjacency/Inclusion ratio vs. Inclusion/Adjacency ratio
| CLASS    | ADJ/INC  | INC/ADJ  |
|----------|----------|----------|
| BILAN    | 0.559194 | 1.04905  |
| EPIDEMIE | 0.39899  | 0.240657 |
We return to the results for the ADJ/INC ratio (how many inclusion relations for one adjacency relation?) and the INC/ADJ ratio (the converse of ADJ/INC). These results show that the ratios are inverted! Documents of the EPIDEMIE class tend to favour inclusion relations, whereas documents of the BILAN class favour adjacency relations.
Is it because inclusion relations are favoured (high ADJ/INC ratio) that we end up with limited, hence more local, areas? That would fit the épidémie class well.
Does a high INC/ADJ ratio convey information about the spread of a disease?
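One detail worth keeping in mind when reading the table: each column is a mean of per-document ratios, with undefined ratios replaced by 0, so one column is not the reciprocal of the other and both can fall below 1 (as they do for EPIDEMIE). The sketch below illustrates this with made-up counts, in plain Python rather than pandas. The `docs` values and the `safe_ratio` helper are hypothetical.

```python
from statistics import mean


def safe_ratio(num, den):
    """Return num / den, or 0.0 when den is zero (same convention as fillna(0))."""
    return num / den if den else 0.0


# Hypothetical (adjacency, inclusion) counts per document; adjacency already halved
docs = [(1, 2), (2, 0), (0, 3), (0, 0)]

adj_inc = [safe_ratio(a, i) for a, i in docs]  # per-document ADJ/INC
inc_adj = [safe_ratio(i, a) for a, i in docs]  # per-document INC/ADJ

print(mean(adj_inc))  # 0.125
print(mean(inc_adj))  # 0.5
```

Only the first document has both kinds of relation, so it is the only one contributing a non-zero value to either mean. Documents lacking one relation type pull both means towards 0.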
%% Cell type:code id: tags:
``` python
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
```
%% Cell type:code id: tags:
``` python
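from nxpd import draw

# Class-labelled graphs of the BILAN documents only
bilan_graphs = data_bilan[data_bilan["IS_BILAN"] == "BILAN"]["GRAPH_C"]

def f(x):
    return draw(bilan_graphs.iloc[x], show="ipynb")

# Bound the slider by the number of BILAN graphs so iloc stays in range
interact(f, x=widgets.IntSlider(min=0, max=len(bilan_graphs) - 1, step=1));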
```
%% Output
%% Cell type:code id: tags:
``` python
dd=data_bilan.groupby("IS_BILAN").mean(numeric_only=True)
```
%% Cell type:code id: tags:
``` python
from tabulate import tabulate
print(tabulate(dd[["R_ADJ_INC", "R_INC_ADJ"]],tablefmt="pipe"))
```
%% Output
|:---------|---------:|---------:|
| BILAN | 0.559194 | 1.04905 |
| EPIDEMIE | 0.39899 | 0.240657 |
%% Cell type:code id: tags:
``` python
```
File added
dispersion_bilan_outbreak.png

12.7 KB

is_bilan.csv 0 → 100644
ID_TEXT;IS_BILAN;MIXED
0;1;0
1;0;0
2;1;1
3;1;0
4;0;0
5;0;1
6;0;0
7;1;0
8;1;0
9;0;0
10;0;0
11;0;0
12;0;0
13;1;0
14;1;0
15;1;0
16;1;0
17;0;0
18;0;0
19;1;0
20;1;0
21;1;0
22;1;0
23;0;0
24;1;0
25;0;0
26;0;0
27;1;0
28;1;0
29;1;0
30;1;0
31;1;0
32;1;0
33;0;0
34;0;0
35;1;0
36;1;0
37;1;0
38;1;0
39;1;0
40;1;0
41;1;0
42;0;0
43;1;0
44;1;0
45;1;0
46;1;0
47;1;0
48;1;0
49;0;0
50;1;0
51;1;0
52;1;0
53;1;0
54;1;0
55;0;0
56;1;0
57;1;0
58;1;0
59;0;0
60;1;0
61;0;0
62;0;0
63;1;0
64;1;0
65;0;0
66;0;0
67;0;0
68;0;0
69;1;0
70;1;0
71;1;0
72;1;0
73;0;0
74;0;0
75;1;0
76;1;0
77;1;0
78;1;0
79;1;0
80;1;0
81;1;0
82;1;0
83;0;0
84;1;0
85;1;0
86;1;0
87;1;0
88;1;0
89;1;0
90;1;0
91;1;0
92;0;0
93;1;0
94;1;0
95;1;0
96;0;0
97;0;0
98;1;0
99;0;0
100;0;0
\ No newline at end of file
is_bilan.xlsx 0 → 100644
File added
notebook.tex 0 → 100644
% Default to the notebook output style
% Inherit from the specified cell style.
\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
% Nicer default font (+ math font) than Computer Modern for most use cases
\usepackage{mathpazo}
% Basic figure setup, for now with no caption control since it's done
% automatically by Pandoc (which extracts ![](path) syntax from Markdown).
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
% Set max figure width to be 80% of text width, for now hardcoded.
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=.8\maxwidth]{#1}}
% Ensure that by default, figures have no caption (until we provide a
% proper Figure object with a Caption API and a way to capture that
% in the conversion process - todo).
\usepackage{caption}
\DeclareCaptionLabelFormat{nolabel}{}
\captionsetup{labelformat=nolabel}
\usepackage{adjustbox} % Used to constrain images to a maximum size
\usepackage{xcolor} % Allow colors to be defined
\usepackage{enumerate} % Needed for markdown enumerations to work
\usepackage{geometry} % Used to adjust the document margins
\usepackage{amsmath} % Equations
\usepackage{amssymb} % Equations
\usepackage{textcomp} % defines textquotesingle
% Hack from http://tex.stackexchange.com/a/47451/13684:
\AtBeginDocument{%
\def\PYZsq{\textquotesingle}% Upright quotes in Pygmentized code
}
\usepackage{upquote} % Upright quotes for verbatim code
\usepackage{eurosym} % defines \euro
\usepackage[mathletters]{ucs} % Extended unicode (utf-8) support
\usepackage[utf8x]{inputenc} % Allow utf-8 characters in the tex document
\usepackage{fancyvrb} % verbatim replacement that allows latex
\usepackage{grffile} % extends the file name processing of package graphics
% to support a larger range
% The hyperref package gives us a pdf with properly built
% internal navigation ('pdf bookmarks' for the table of contents,
% internal cross-reference links, web links for URLs, etc.)
\usepackage{hyperref}
\usepackage{longtable} % longtable support required by pandoc >1.10
\usepackage{booktabs} % table support for pandoc > 1.12.2
\usepackage[inline]{enumitem} % IRkernel/repr support (it uses the enumerate* environment)
\usepackage[normalem]{ulem} % ulem is needed to support strikethroughs (\sout)
% normalem makes italics be italics, not underlines
% Colors for the hyperref package
\definecolor{urlcolor}{rgb}{0,.145,.698}
\definecolor{linkcolor}{rgb}{.71,0.21,0.01}
\definecolor{citecolor}{rgb}{.12,.54,.11}
% ANSI colors
\definecolor{ansi-black}{HTML}{3E424D}
\definecolor{ansi-black-intense}{HTML}{282C36}
\definecolor{ansi-red}{HTML}{E75C58}
\definecolor{ansi-red-intense}{HTML}{B22B31}
\definecolor{ansi-green}{HTML}{00A250}
\definecolor{ansi-green-intense}{HTML}{007427}
\definecolor{ansi-yellow}{HTML}{DDB62B}
\definecolor{ansi-yellow-intense}{HTML}{B27D12}
\definecolor{ansi-blue}{HTML}{208FFB}
\definecolor{ansi-blue-intense}{HTML}{0065CA}
\definecolor{ansi-magenta}{HTML}{D160C4}
\definecolor{ansi-magenta-intense}{HTML}{A03196}
\definecolor{ansi-cyan}{HTML}{60C6C8}
\definecolor{ansi-cyan-intense}{HTML}{258F8F}
\definecolor{ansi-white}{HTML}{C5C1B4}
\definecolor{ansi-white-intense}{HTML}{A1A6B2}
% commands and environments needed by pandoc snippets
% extracted from the output of `pandoc -s`
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
% Additional commands for more recent versions of Pandoc
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.53,0.00,0.00}{{#1}}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.73,0.40,0.53}{{#1}}}
\newcommand{\ImportTok}[1]{{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.73,0.13,0.13}{\textit{{#1}}}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.10,0.09,0.49}{{#1}}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.40,0.40,0.40}{{#1}}}
\newcommand{\BuiltInTok}[1]{{#1}}
\newcommand{\ExtensionTok}[1]{{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.74,0.48,0.00}{{#1}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.49,0.56,0.16}{{#1}}}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
% Define a nice break command that doesn't care if a line doesn't already
% exist.
\def\br{\hspace*{\fill} \\* }
% Math Jax compatability definitions
\def\gt{>}
\def\lt{<}
% Document parameters
\title{Experiment100PadiWeb}
% Pygments definitions
\makeatletter
\def\PY@reset{\let\PY@it=\relax \let\PY@bf=\relax%
\let\PY@ul=\relax \let\PY@tc=\relax%
\let\PY@bc=\relax \let\PY@ff=\relax}
\def\PY@tok#1{\csname PY@tok@#1\endcsname}
\def\PY@toks#1+{\ifx\relax#1\empty\else%
\PY@tok{#1}\expandafter\PY@toks\fi}
\def\PY@do#1{\PY@bc{\PY@tc{\PY@ul{%
\PY@it{\PY@bf{\PY@ff{#1}}}}}}}
\def\PY#1#2{\PY@reset\PY@toks#1+\relax+\PY@do{#2}}
\expandafter\def\csname PY@tok@w\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
\expandafter\def\csname PY@tok@c\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
\expandafter\def\csname PY@tok@k\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kt\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
\expandafter\def\csname PY@tok@o\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@ow\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
\expandafter\def\csname PY@tok@nb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@nf\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@nc\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@nn\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@ne\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
\expandafter\def\csname PY@tok@nv\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@no\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@nl\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
\expandafter\def\csname PY@tok@ni\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
\expandafter\def\csname PY@tok@na\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
\expandafter\def\csname PY@tok@nt\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@nd\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
\expandafter\def\csname PY@tok@s\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sd\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@si\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
\expandafter\def\csname PY@tok@se\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
\expandafter\def\csname PY@tok@sr\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
\expandafter\def\csname PY@tok@ss\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@sx\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@m\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@gh\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@gu\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@gd\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@gi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
\expandafter\def\csname PY@tok@gr\endcsname{\def\PY@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@ge\endcsname{\let\PY@it=\textit}
\expandafter\def\csname PY@tok@gs\endcsname{\let\PY@bf=\textbf}
\expandafter\def\csname PY@tok@gp\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@go\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
\expandafter\def\csname PY@tok@gt\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
\expandafter\def\csname PY@tok@err\endcsname{\def\PY@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
\expandafter\def\csname PY@tok@kc\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kd\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kn\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kr\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@bp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@fm\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@vc\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vg\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vm\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@sa\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sc\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@dl\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@s2\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sh\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@s1\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@mb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mf\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mh\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@il\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mo\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@ch\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cm\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cpf\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@c1\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cs\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\def\PYZbs{\char`\\}
\def\PYZus{\char`\_}
\def\PYZob{\char`\{}
\def\PYZcb{\char`\}}
\def\PYZca{\char`\^}
\def\PYZam{\char`\&}
\def\PYZlt{\char`\<}
\def\PYZgt{\char`\>}
\def\PYZsh{\char`\#}
\def\PYZpc{\char`\%}
\def\PYZdl{\char`\$}
\def\PYZhy{\char`\-}
\def\PYZsq{\char`\'}
\def\PYZdq{\char`\"}
\def\PYZti{\char`\~}
% for compatibility with earlier versions
\def\PYZat{@}
\def\PYZlb{[}
\def\PYZrb{]}
\makeatother
% Exact colors from NB
\definecolor{incolor}{rgb}{0.0, 0.0, 0.5}
\definecolor{outcolor}{rgb}{0.545, 0.0, 0.0}
% Prevent overflowing lines due to hard-to-break entities
\sloppy
% Setup hyperref package
\hypersetup{
breaklinks=true, % so long urls are correctly broken across lines
colorlinks=true,
urlcolor=urlcolor,
linkcolor=linkcolor,
citecolor=citecolor,
}
% Slightly bigger margins than the latex defaults
\geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}
\begin{document}
\maketitle
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}1}]:} \PY{k+kn}{import} \PY{n+nn}{pandas} \PY{k}{as} \PY{n+nn}{pd}
\PY{k+kn}{import} \PY{n+nn}{networkx} \PY{k}{as} \PY{n+nn}{nx}
\PY{k+kn}{import} \PY{n+nn}{bqplot}\PY{n+nn}{.}\PY{n+nn}{pyplot} \PY{k}{as} \PY{n+nn}{plt}
\PY{o}{\PYZpc{}}\PY{k}{matplotlib} inline
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}2}]:} \PY{n}{data\PYZus{}bilan}\PY{o}{=}\PY{n}{pd}\PY{o}{.}\PY{n}{read\PYZus{}csv}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{is\PYZus{}bilan.csv}\PY{l+s+s2}{\PYZdq{}}\PY{p}{,}\PY{n}{sep}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{;}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}3}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{BILAN}\PY{l+s+s2}{\PYZdq{}} \PY{k}{if} \PY{n}{x} \PY{o}{==}\PY{l+m+mi}{1} \PY{k}{else} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{EPIDEMIE}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\section{Analysing the structure of STRs through a case study:
summary/recap of an
outbreak}\label{analyse-de-la-structure-des-strs-avec-un-cas-duxe9tude-bilanruxe9capitulatif-dune-uxe9piduxe9mie}
\textbf{Is spatiality expressed in the same way across classes or types
of document?} In animal-disease surveillance based on Google News,
researchers need to distinguish a summary (\emph{bilan}) of the
situation around an outbreak from the announcement of that outbreak. In
this experiment, we examine whether these two document classes show
distinctive characteristics in their STRs.
\subsection{Definition of the two
classes}\label{duxe9finition-des-deux-classes}
From the PadiWeb corpus, we select a sample of 100 documents and split
it into two classes:
\begin{itemize}
\tightlist
\item
\textbf{Bilan} (summary). A recap of an event that is over or still
ongoing.
\item
\textbf{Épidémie} (outbreak). Its purpose is to announce the start of
an outbreak (the starting point).
\end{itemize}
The size of each class is shown below.
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}4}]:} \PY{n}{data\PYZus{}bilan}\PY{o}{.}\PY{n}{groupby}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{o}{.}\PY{n}{count}\PY{p}{(}\PY{p}{)}\PY{o}{.}\PY{n}{plot}\PY{o}{.}\PY{n}{pie}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ID\PYZus{}TEXT}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{outcolor}Out[{\color{outcolor}4}]:} <matplotlib.axes.\_subplots.AxesSubplot at 0x109c1af98>
\end{Verbatim}
\begin{center}
\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{output_4_1.png}
\end{center}
{ \hspace*{\fill} \\}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}5}]:} \PY{k+kn}{import} \PY{n+nn}{numpy} \PY{k}{as} \PY{n+nn}{np}
\PY{k}{def} \PY{n+nf}{number\PYZus{}of\PYZus{}edges}\PY{p}{(}\PY{n}{x}\PY{p}{,}\PY{n}{color}\PY{o}{=}\PY{k+kc}{None}\PY{p}{)}\PY{p}{:}
\PY{l+s+sd}{\PYZdq{}\PYZdq{}\PYZdq{}}
\PY{l+s+sd}{ Dedicated function to count edges based on their color}
\PY{l+s+sd}{ \PYZdq{}\PYZdq{}\PYZdq{}}
\PY{k}{if} \PY{o+ow}{not} \PY{n}{color}\PY{p}{:}
\PY{k}{return} \PY{n+nb}{len}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{number\PYZus{}of\PYZus{}edges}\PY{p}{(}\PY{p}{)}\PY{p}{)}
\PY{n}{edges}\PY{o}{=}\PY{n+nb}{list}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{edges}\PY{p}{(}\PY{n}{data}\PY{o}{=}\PY{k+kc}{True}\PY{p}{)}\PY{p}{)}
\PY{n}{cp}\PY{o}{=}\PY{l+m+mi}{0}
\PY{k}{for} \PY{n}{ed} \PY{o+ow}{in} \PY{n}{edges}\PY{p}{:}
\PY{k}{if} \PY{n}{ed}\PY{p}{[}\PY{o}{\PYZhy{}}\PY{l+m+mi}{1}\PY{p}{]}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{color}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]} \PY{o}{==} \PY{n}{color}\PY{p}{:}
\PY{n}{cp}\PY{o}{+}\PY{o}{=}\PY{l+m+mi}{1}
\PY{k}{return} \PY{n}{cp}
\PY{k}{def} \PY{n+nf}{flattern}\PY{p}{(}\PY{n}{A}\PY{p}{)}\PY{p}{:}
\PY{n}{rt} \PY{o}{=} \PY{p}{[}\PY{p}{]}
\PY{k}{for} \PY{n}{i} \PY{o+ow}{in} \PY{n}{A}\PY{p}{:}
\PY{k}{if} \PY{n+nb}{isinstance}\PY{p}{(}\PY{n}{i}\PY{p}{,} \PY{n+nb}{list}\PY{p}{)}\PY{p}{:}
\PY{n}{rt}\PY{o}{.}\PY{n}{extend}\PY{p}{(}\PY{n}{flattern}\PY{p}{(}\PY{n}{i}\PY{p}{)}\PY{p}{)}
\PY{k}{elif} \PY{n+nb}{isinstance}\PY{p}{(}\PY{n}{i}\PY{p}{,} \PY{n}{np}\PY{o}{.}\PY{n}{ndarray}\PY{p}{)}\PY{p}{:}
\PY{n}{rt}\PY{o}{.}\PY{n}{extend}\PY{p}{(}\PY{n}{flattern}\PY{p}{(}\PY{n}{i}\PY{o}{.}\PY{n}{tolist}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{p}{)}
\PY{k}{else}\PY{p}{:}
\PY{n}{rt}\PY{o}{.}\PY{n}{append}\PY{p}{(}\PY{n}{i}\PY{p}{)}
\PY{k}{return} \PY{n}{rt}
\PY{k}{def} \PY{n+nf}{most\PYZus{}common}\PY{p}{(}\PY{n}{lst}\PY{p}{)}\PY{p}{:}
\PY{k}{if} \PY{o+ow}{not} \PY{n}{lst}\PY{p}{:}
\PY{k}{return} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{P\PYZhy{}PPL}\PY{l+s+s2}{\PYZdq{}}
\PY{k}{if} \PY{n+nb}{len}\PY{p}{(}\PY{n+nb}{list}\PY{p}{(}\PY{n+nb}{set}\PY{p}{(}\PY{n}{lst}\PY{p}{)}\PY{p}{)}\PY{p}{)} \PY{o}{\PYZgt{}}\PY{l+m+mi}{1} \PY{o+ow}{and} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{P\PYZhy{}PPL}\PY{l+s+s2}{\PYZdq{}} \PY{o+ow}{in} \PY{n+nb}{set}\PY{p}{(}\PY{n}{lst}\PY{p}{)}\PY{p}{:}
\PY{n}{lst}\PY{o}{=}\PY{p}{[}\PY{n}{x} \PY{k}{for} \PY{n}{x} \PY{o+ow}{in} \PY{n}{lst} \PY{k}{if} \PY{n}{x} \PY{o}{!=} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{PPL}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}
\PY{k}{return} \PY{n+nb}{max}\PY{p}{(}\PY{n+nb}{set}\PY{p}{(}\PY{n}{lst}\PY{p}{)}\PY{p}{,} \PY{n}{key}\PY{o}{=}\PY{n}{lst}\PY{o}{.}\PY{n}{count}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}6}]:} \PY{k+kn}{import} \PY{n+nn}{nxpd}
\PY{n}{nxpd}\PY{o}{.}\PY{n}{nxpdParams}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{show}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ipynb}\PY{l+s+s2}{\PYZdq{}}
\PY{k+kn}{from} \PY{n+nn}{strpython}\PY{n+nn}{.}\PY{n+nn}{helpers}\PY{n+nn}{.}\PY{n+nn}{gazeteer\PYZus{}helpers} \PY{k}{import} \PY{n}{get\PYZus{}data}
\PY{k}{def} \PY{n+nf}{class\PYZus{}graph}\PY{p}{(}\PY{n}{g}\PY{p}{)}\PY{p}{:}
\PY{n}{mapping}\PY{o}{=}\PY{p}{\PYZob{}}\PY{p}{\PYZcb{}}
\PY{n}{g2}\PY{o}{=}\PY{n}{g}\PY{o}{.}\PY{n}{copy}\PY{p}{(}\PY{p}{)}
\PY{k}{for} \PY{n}{n} \PY{o+ow}{in} \PY{n}{g2}\PY{p}{:}
\PY{n}{c}\PY{o}{=}\PY{n}{get\PYZus{}data}\PY{p}{(}\PY{n}{n}\PY{p}{)}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{class}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}
\PY{n}{g2}\PY{o}{.}\PY{n}{nodes}\PY{p}{[}\PY{n}{n}\PY{p}{]}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{label}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{most\PYZus{}common}\PY{p}{(}\PY{n}{c}\PY{p}{)}
\PY{k}{return} \PY{n}{g2}
\end{Verbatim}
To compare the STRs generated for each document class, we use several
indicators:
\begin{itemize}
\tightlist
\item
  \textbf{Granularity}. Granularity is defined as the level of an
  entity on the spatial scale (\(village < city < country\)). Here, it
  indicates at which level spatial information is used to describe the
  situation.
\item
  \textbf{Density}. Density is defined by the number of edges relative
  to the number of nodes in a graph. An STR graph with a high density
  indicates strong cohesion between its spatial entities.
\item
  \textbf{Ratio \(Relation_i/Relation_j\)}. In the STR, each entity can
  be linked to another by two types of relation: inclusion and
  adjacency. With this ratio, we want to know how many \(relation_j\)
  exist for one \(relation_i\). For example, for one inclusion relation,
  how many adjacency relations are there?
\item
  \textbf{Number of nodes (spatial entities)}. Indicates whether the
  texts are strongly spatialized.
\end{itemize}
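As a minimal sketch of the ratio indicator, the snippet below reproduces in plain Python the arithmetic the notebook applies with pandas: the adjacency edge count is halved, and any undefined or infinite ratio is replaced by 0. The function and parameter names are hypothetical, and the reason given for the halving (each adjacency being stored as two directed edges) is an assumption based on the sample GEXF files in this commit.

```python
def relation_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0.

    This mirrors replacing inf and NaN by 0 in the notebook.
    """
    if denominator == 0:
        return 0.0
    return numerator / denominator


def adjacency_inclusion_ratios(nb_adjacency_edges, nb_inclusion_edges):
    """Compute (ADJ/INC, INC/ADJ) from raw edge counts (hypothetical helper)."""
    # Assumption: an adjacency relation appears as two directed edges,
    # so the raw count is halved before forming the ratios.
    nb_adjacency = nb_adjacency_edges / 2
    adj_inc = relation_ratio(nb_adjacency, nb_inclusion_edges)
    inc_adj = relation_ratio(nb_inclusion_edges, nb_adjacency)
    return adj_inc, inc_adj


# Edge counts taken from rows 8, 7 and 4 of the data_bilan preview.
print(adjacency_inclusion_ratios(4, 2))  # (1.0, 1.0)
print(adjacency_inclusion_ratios(2, 0))  # (0.0, 0.0)
print(adjacency_inclusion_ratios(0, 1))  # (0.0, 0.0)
```

Because undefined ratios are set to 0 before averaging, the per-class means of the two ratios are not reciprocals of each other.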
\subsubsection{Computing the granularity of an
STR}\label{calcul-de-la-granularituxe9-dune-str}
We retrieve the \textbf{classes associated} with the different
\textbf{entities of the STR}, then take \textbf{the most frequent
class}. For example:

\(STR_1\) \(\rightarrow\) France, Montpellier, Clapiers, Caen
\(\rightarrow\) {[}A-PCLI{]}, {[}P-PPL, A-ADM4{]}, {[}A-ADM4{]},
{[}A-ADM4{]}

The granularity is therefore \textbf{A-ADM4}.
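The example above can be sketched in a few lines of Python. This is a simplified illustration, not the notebook's own helper: the function name \texttt{granularity} is hypothetical, and the special handling of P-PPL and of empty lists found in the notebook is deliberately left out.

```python
from collections import Counter


def granularity(classes_per_entity):
    """Return the most frequent class over all entities of an STR.

    classes_per_entity is a list with one list of class codes per entity.
    Returns None when no class is available (simplifying assumption).
    """
    # Flatten the per-entity lists into a single list of class codes.
    flat = [code for classes in classes_per_entity for code in classes]
    if not flat:
        return None
    # most_common(1) yields a one-element list of (class, count) pairs.
    return Counter(flat).most_common(1)[0][0]


# STR_1: France, Montpellier, Clapiers, Caen
str_1 = [["A-PCLI"], ["P-PPL", "A-ADM4"], ["A-ADM4"], ["A-ADM4"]]
print(granularity(str_1))  # A-ADM4
```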
\subsubsection{Computing the density of an
STR}\label{calcul-de-la-densituxe9-dune-str}
The density of an STR (here, of its graph) is computed with the
following formula, where \(|E|\) is the number of edges and \(|V|\) the
number of nodes: \[\frac{2\times|E|}{|V|\times(|V|-1)}\]
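A minimal sketch of this formula in plain Python is given below, assuming that a graph with fewer than two nodes gets a density of 0, as the notebook does. The function name \texttt{density} and its parameters are hypothetical; the node and edge counts in the usage lines are taken from rows 4 and 8 of the data preview shown further down.

```python
def density(nb_nodes, nb_edges):
    """Density 2|E| / (|V| (|V| - 1)); 0.0 for graphs with fewer than two nodes."""
    if nb_nodes < 2:
        # The denominator would be 0, so the density is set to 0 by convention.
        return 0.0
    return (2 * nb_edges) / (nb_nodes * (nb_nodes - 1))


print(round(density(4, 1), 6))  # 0.166667
print(round(density(7, 6), 6))  # 0.285714
print(density(1, 0))            # 0.0
```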
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}7}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ID\PYZus{}TEXT}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:}\PY{n}{nx}\PY{o}{.}\PY{n}{read\PYZus{}gexf}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{str\PYZus{}PADI100/}\PY{l+s+si}{\PYZob{}0\PYZcb{}}\PY{l+s+s2}{.gexf}\PY{l+s+s2}{\PYZdq{}}\PY{o}{.}\PY{n}{format}\PY{p}{(}\PY{n}{x}\PY{p}{)}\PY{p}{)}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH\PYZus{}C}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:}\PY{n}{class\PYZus{}graph}\PY{p}{(}\PY{n}{x}\PY{p}{)}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}8}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{DENSITY}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{p}{(}\PY{l+m+mi}{2}\PY{o}{*}\PY{n}{x}\PY{o}{.}\PY{n}{number\PYZus{}of\PYZus{}edges}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{o}{/}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{number\PYZus{}of\PYZus{}nodes}\PY{p}{(}\PY{p}{)}\PY{o}{*}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{number\PYZus{}of\PYZus{}nodes}\PY{p}{(}\PY{p}{)}\PY{o}{\PYZhy{}}\PY{l+m+mi}{1}\PY{p}{)}\PY{p}{)} \PY{k}{if} \PY{n+nb}{len}\PY{p}{(}\PY{n}{x}\PY{p}{)} \PY{o}{\PYZgt{}}\PY{l+m+mi}{1} \PY{k}{else} \PY{l+m+mi}{0}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}NODE}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{n+nb}{len}\PY{p}{(}\PY{n}{x}\PY{p}{)}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}ADJ}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{n}{number\PYZus{}of\PYZus{}edges}\PY{p}{(}\PY{n}{x}\PY{p}{,}\PY{n}{color}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{green}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}INC}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{n}{number\PYZus{}of\PYZus{}edges}\PY{p}{(}\PY{n}{x}\PY{p}{,}\PY{n}{color}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{red}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{R\PYZus{}ADJ\PYZus{}INC}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{p}{(}\PY{p}{(}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}ADJ}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{/}\PY{l+m+mi}{2}\PY{p}{)}\PY{o}{/}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}INC}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{p}{)}\PY{o}{.}\PY{n}{replace}\PY{p}{(}\PY{p}{[}\PY{n}{np}\PY{o}{.}\PY{n}{inf}\PY{p}{,} \PY{o}{\PYZhy{}}\PY{n}{np}\PY{o}{.}\PY{n}{inf}\PY{p}{]}\PY{p}{,} \PY{n}{np}\PY{o}{.}\PY{n}{nan}\PY{p}{)}\PY{o}{.}\PY{n}{fillna}\PY{p}{(}\PY{l+m+mi}{0}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{R\PYZus{}INC\PYZus{}ADJ}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{p}{(}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}INC}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{/}\PY{p}{(}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{NB\PYZus{}ED\PYZus{}ADJ}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{/}\PY{l+m+mi}{2}\PY{p}{)}\PY{p}{)}\PY{o}{.}\PY{n}{replace}\PY{p}{(}\PY{p}{[}\PY{n}{np}\PY{o}{.}\PY{n}{inf}\PY{p}{,} \PY{o}{\PYZhy{}}\PY{n}{np}\PY{o}{.}\PY{n}{inf}\PY{p}{]}\PY{p}{,} \PY{n}{np}\PY{o}{.}\PY{n}{nan}\PY{p}{)}\PY{o}{.}\PY{n}{fillna}\PY{p}{(}\PY{l+m+mi}{0}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}9}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{CLASS}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{n}{flattern}\PY{p}{(}\PY{p}{[}\PY{n}{get\PYZus{}data}\PY{p}{(}\PY{n}{n}\PY{p}{)}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{class}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]} \PY{k}{for} \PY{n}{n} \PY{o+ow}{in} \PY{n+nb}{list}\PY{p}{(}\PY{n}{x}\PY{o}{.}\PY{n}{nodes}\PY{p}{(}\PY{p}{)}\PY{p}{)}\PY{p}{]}\PY{p}{)}\PY{p}{)}
\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{MEAN\PYZus{}LVL}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{CLASS}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{apply}\PY{p}{(}\PY{k}{lambda} \PY{n}{x}\PY{p}{:} \PY{n}{most\PYZus{}common}\PY{p}{(}\PY{n}{x}\PY{p}{)} \PY{k}{if} \PY{n+nb}{len}\PY{p}{(}\PY{n}{x}\PY{p}{)}\PY{o}{\PYZgt{}}\PY{l+m+mi}{0} \PY{k}{else} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}10}]:} \PY{n}{data\PYZus{}bilan}\PY{o}{.}\PY{n}{head}\PY{p}{(}\PY{l+m+mi}{9}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{outcolor}Out[{\color{outcolor}10}]:} ID\_TEXT IS\_BILAN MIXED \textbackslash{}
0 0 BILAN 0
1 1 EPIDEMIE 0
2 2 BILAN 1
3 3 BILAN 0
4 4 EPIDEMIE 0
5 5 EPIDEMIE 1
6 6 EPIDEMIE 0
7 7 BILAN 0
8 8 BILAN 0
GRAPH \textbackslash{}
0 (GD4103071, GD4468122, GD95073, GD791183)
1 (GD1685421)
2 (GD2032795)
3 (GD1626932, GD3274230)
4 (GD639917, GD3789919, GD1316637, GD2055944)
5 (GD639917, GD3995806, GD3789919, GD1316637, GD{\ldots}
6 (GD639917, GD3789919, GD2055944)
7 (GD5526704, GD976842, GD1316637, GD2055944)
8 (GD2908705, GD1404948, GD9642903, GD3995806, G{\ldots}
GRAPH\_C DENSITY NB\_NODE \textbackslash{}
0 (GD4103071, GD4468122, GD95073, GD791183) 0.000000 4
1 (GD1685421) 0.000000 1
2 (GD2032795) 0.000000 1
3 (GD1626932, GD3274230) 0.000000 2
4 (GD639917, GD3789919, GD1316637, GD2055944) 0.166667 4
5 (GD639917, GD3995806, GD3789919, GD1316637, GD{\ldots} 0.200000 5
6 (GD639917, GD3789919, GD2055944) 0.000000 3
7 (GD5526704, GD976842, GD1316637, GD2055944) 0.333333 4
8 (GD2908705, GD1404948, GD9642903, GD3995806, G{\ldots} 0.285714 7
NB\_ED\_ADJ NB\_ED\_INC R\_ADJ\_INC R\_INC\_ADJ \textbackslash{}
0 0 0 0.0 0.0
1 0 0 0.0 0.0
2 0 0 0.0 0.0
3 0 0 0.0 0.0
4 0 1 0.0 0.0
5 0 2 0.0 0.0
6 0 0 0.0 0.0
7 2 0 0.0 0.0
8 4 2 1.0 1.0
CLASS MEAN\_LVL
0 [P-PPLA, P-PPL, P-PPLA, A-ADM1, P-PPLA] P-PPLA
1 [P-PPL] P-PPL
2 [A-PCLI] A-PCLI
3 [A-PCLI, P-PPL] P-PPL
4 [A-PCLI, A-ADM1, P-PPLA, P-PPLC, A-PCLI, A-PCLI] A-PCLI
5 [A-PCLI, P-PPL, A-ADM1, P-PPLA, P-PPLC, A-PCLI{\ldots} A-PCLI
6 [A-PCLI, A-ADM1, P-PPLA, P-PPLC, A-PCLI] A-PCLI
7 [A-PCLI, A-PCLI, A-PCLI, A-PCLI] A-PCLI
8 [A-ADM1, P-PPL, P-PPL, P-PPL, A-ADM1, A-ADM1, {\ldots} A-ADM1
\end{Verbatim}
\section{Results}\label{ruxe9sultats}
\subsubsection{\texorpdfstring{Granularity for documents of class
\textbf{BILAN}}{Granularity for documents of class BILAN}}\label{granularituxe9-sur-les-documents-de-classe-bilan}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}11}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]} \PY{o}{==} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{groupby}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{MEAN\PYZus{}LVL}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{o}{.}\PY{n}{count}\PY{p}{(}\PY{p}{)}\PY{o}{.}\PY{n}{plot}\PY{o}{.}\PY{n}{pie}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ID\PYZus{}TEXT}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{outcolor}Out[{\color{outcolor}11}]:} <matplotlib.axes.\_subplots.AxesSubplot at 0x10a1db828>
\end{Verbatim}
\begin{center}
\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{output_13_1.png}
\end{center}
{ \hspace*{\fill} \\}
\subsubsection{\texorpdfstring{Granularity for documents of class
\textbf{EPIDEMIE}}{Granularity for documents of class EPIDEMIE}}\label{granularituxe9-sur-les-documents-de-classe-epidemie}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}12}]:} \PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]} \PY{o}{==} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{EPIDEMIE}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{groupby}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{MEAN\PYZus{}LVL}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{o}{.}\PY{n}{count}\PY{p}{(}\PY{p}{)}\PY{o}{.}\PY{n}{plot}\PY{o}{.}\PY{n}{pie}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ID\PYZus{}TEXT}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{outcolor}Out[{\color{outcolor}12}]:} <matplotlib.axes.\_subplots.AxesSubplot at 0x10a2c9ba8>
\end{Verbatim}
\begin{center}
\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{output_15_1.png}
\end{center}
{ \hspace*{\fill} \\}
\subsubsection{Mean values obtained for each
indicator}\label{valeurs-moyennes-obtenues-pour-chaque-indicateur}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}13}]:} \PY{n}{data\PYZus{}bilan}\PY{o}{.}\PY{n}{groupby}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{o}{.}\PY{n}{mean}\PY{p}{(}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{outcolor}Out[{\color{outcolor}13}]:} ID\_TEXT MIXED DENSITY NB\_NODE NB\_ED\_ADJ NB\_ED\_INC \textbackslash{}
IS\_BILAN
BILAN 51.588235 0.014706 0.293545 6.455882 4.117647 2.705882
EPIDEMIE 46.727273 0.030303 0.379501 4.636364 2.969697 1.303030
R\_ADJ\_INC R\_INC\_ADJ
IS\_BILAN
BILAN 0.559194 1.049048
EPIDEMIE 0.398990 0.240657
\end{Verbatim}
\subsection{Analysis of the results}\label{analyse-des-ruxe9sultats}
\subsubsection{Granularity}\label{granularituxe9}
Looking at the two pie charts above, we notice that the granularity
observed in the STRs differs by text type. Texts of the
\textbf{EPIDEMIE} class generally sit ``higher'' in the spatial
hierarchy, owing to the strong presence of classes such as A-PCLI
(\(\approx\) country) and A-ADM1 (the first-order administrative
division of a country, \emph{i.e.} a region in France, a state in the
United States, \emph{etc.}). Those of the \textbf{BILAN} class have a
somewhat finer granularity, with a wider range of classes: T-ISL
(island), S-BLDG (building).

Based on the proposed classification, we conclude that documents of
type \textbf{BILAN} are spatially ``finer'' than those of the
\textbf{EPIDEMIE} class.
\subsubsection{Density / Number of nodes / Number of
edges}\label{densituxe9-nombre-de-noeuds-nombre-darruxeates}
Unfortunately, the mean density does not support any conclusion.

We observe that documents of class BILAN contain more nodes on average
(6.46 versus 4.64 for EPIDEMIE), which means that these documents
mention more spatial entities. This is to be expected: unlike an
outbreak declaration, a summary recaps the spread of a disease over a
period of time and over an (often) larger area.

For the numbers of adjacency and inclusion relations, we observe the
same balance of ``power'' in both classes: there are more inclusion
edges than adjacency edges.
\subsubsection{Adjacency/Inclusion ratio vs.
Inclusion/Adjacency ratio}\label{ratio-adjacenceinclusion-vs-inclusionadjacence}
\begin{longtable}[]{@{}lll@{}}
\toprule
CLASS & ADJ/INC & INC/ADJ\tabularnewline
\midrule
\endhead
BILAN & 0.559194 & 1.04905\tabularnewline
EPIDEMIE & 0.39899 & 0.240657\tabularnewline
\bottomrule
\end{longtable}
We revisit the results for the ratios ADJ/INC (how many inclusion
relations for one adjacency relation?) and INC/ADJ (the inverse of
ADJ/INC). These results show that the ratios are reversed between the
two classes: documents of class EPIDEMIE tend to favor inclusion
relations, whereas documents of class BILAN favor adjacency relations.

Is it because inclusion relations are favored (high ADJ/INC ratio) that
we end up with limited, and therefore more local, areas? That would fit
the EPIDEMIE class well.

Does a high INC/ADJ ratio convey information about the spread of a
disease?
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}29}]:} \PY{k+kn}{from} \PY{n+nn}{ipywidgets} \PY{k}{import} \PY{n}{interact}\PY{p}{,} \PY{n}{interactive}\PY{p}{,} \PY{n}{fixed}\PY{p}{,} \PY{n}{interact\PYZus{}manual}
\PY{k+kn}{import} \PY{n+nn}{ipywidgets} \PY{k}{as} \PY{n+nn}{widgets}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}15}]:} \PY{k+kn}{from} \PY{n+nn}{nxpd} \PY{k}{import} \PY{n}{draw}
\PY{k}{def} \PY{n+nf}{f}\PY{p}{(}\PY{n}{x}\PY{p}{)}\PY{p}{:}
\PY{k}{global} \PY{n}{data\PYZus{}bilan}
\PY{k}{return} \PY{n}{draw}\PY{p}{(}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{n}{data\PYZus{}bilan}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{==}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{o}{.}\PY{n}{iloc}\PY{p}{[}\PY{n}{x}\PY{p}{]}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{GRAPH\PYZus{}C}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{p}{,}\PY{n}{show}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{ipynb}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}
\PY{n}{interact}\PY{p}{(}\PY{n}{f}\PY{p}{,} \PY{n}{x}\PY{o}{=}\PY{n}{widgets}\PY{o}{.}\PY{n}{IntSlider}\PY{p}{(}\PY{n+nb}{min}\PY{o}{=}\PY{l+m+mi}{0}\PY{p}{,}\PY{n+nb}{max}\PY{o}{=}\PY{l+m+mi}{100}\PY{p}{,}\PY{n}{step}\PY{o}{=}\PY{l+m+mi}{1}\PY{p}{)}\PY{p}{)}\PY{p}{;}
\end{Verbatim}
\begin{verbatim}
interactive(children=(IntSlider(value=0, description='x'), Output()), _dom_classes=('widget-interact',))
\end{verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}16}]:} \PY{n}{dd}\PY{o}{=}\PY{n}{data\PYZus{}bilan}\PY{o}{.}\PY{n}{groupby}\PY{p}{(}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{IS\PYZus{}BILAN}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{o}{.}\PY{n}{mean}\PY{p}{(}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}28}]:} \PY{k+kn}{from} \PY{n+nn}{tabulate} \PY{k}{import} \PY{n}{tabulate}
\PY{n+nb}{print}\PY{p}{(}\PY{n}{tabulate}\PY{p}{(}\PY{n}{dd}\PY{p}{[}\PY{p}{[}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{R\PYZus{}ADJ\PYZus{}INC}\PY{l+s+s2}{\PYZdq{}}\PY{p}{,} \PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{R\PYZus{}INC\PYZus{}ADJ}\PY{l+s+s2}{\PYZdq{}}\PY{p}{]}\PY{p}{]}\PY{p}{,}\PY{n}{tablefmt}\PY{o}{=}\PY{l+s+s2}{\PYZdq{}}\PY{l+s+s2}{pipe}\PY{l+s+s2}{\PYZdq{}}\PY{p}{)}\PY{p}{)}
\end{Verbatim}
\begin{Verbatim}[commandchars=\\\{\}]
|:---------|---------:|---------:|
| BILAN | 0.559194 | 1.04905 |
| EPIDEMIE | 0.39899 | 0.240657 |
\end{Verbatim}
% Add a bibliography block to the postdoc
\end{document}
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes>
<node id="GD4103071" label="Kano" />
<node id="GD4468122" label="Lagos" />
<node id="GD95073" label="Plateau State" />
<node id="GD791183" label="Port Harcourt" />
</nodes>
<edges />
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes>
<node id="GD1685421" label="Abeokuta" />
</nodes>
<edges />
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<attributes class="edge" mode="static">
<attribute id="0" title="color" type="string" />
<attribute id="1" title="networkx_key" type="long" />
</attributes>
<nodes>
<node id="GD5400765" label="Paris" />
<node id="GD3789919" label="Seoul" />
<node id="GD1316637" label="South Korea" />
<node id="GD3949715" label="Taiwan" />
</nodes>
<edges>
<edge id="0" source="GD3789919" target="GD1316637">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
</edges>
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<attributes class="edge" mode="static">
<attribute id="0" title="color" type="string" />
<attribute id="1" title="networkx_key" type="long" />
</attributes>
<nodes>
<node id="GD2880488" label="Angola" />
<node id="GD1335306" label="Namibia" />
</nodes>
<edges>
<edge id="0" source="GD2880488" target="GD1335306">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="1" source="GD1335306" target="GD2880488">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
</edges>
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes>
<node id="GD1316637" label="South Korea" />
</nodes>
<edges />
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<attributes class="edge" mode="static">
<attribute id="0" title="color" type="string" />
<attribute id="1" title="networkx_key" type="long" />
</attributes>
<nodes>
<node id="GD1955440" label="Alberta" />
<node id="GD5063243" label="Australia" />
<node id="GD3364804" label="Canada" />
<node id="GD3995806" label="Korea" />
<node id="GD934193" label="New Zealand" />
<node id="GD2055944" label="U.S." />
</nodes>
<edges>
<edge id="0" source="GD1955440" target="GD2055944">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="1" source="GD1955440" target="GD3364804">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="2" source="GD5063243" target="GD934193">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="3" source="GD3364804" target="GD2055944">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="4" source="GD3995806" target="GD2055944">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="5" source="GD934193" target="GD5063243">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="6" source="GD2055944" target="GD1955440">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="7" source="GD2055944" target="GD3364804">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
</edges>
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<attributes class="edge" mode="static">
<attribute id="0" title="color" type="string" />
<attribute id="1" title="networkx_key" type="long" />
</attributes>
<nodes>
<node id="GD5573894" label="Bauchi" />
<node id="GD509348" label="Bauchi State" />
<node id="GD4243745" label="Katagum" />
<node id="GD1607251" label="Nigeria" />
<node id="GD2878541" label="Toro" />
</nodes>
<edges>
<edge id="0" source="GD5573894" target="GD509348">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="1" source="GD5573894" target="GD1607251">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="2" source="GD509348" target="GD1607251">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="3" source="GD4243745" target="GD509348">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="4" source="GD4243745" target="GD1607251">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
</edges>
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<attributes class="edge" mode="static">
<attribute id="0" title="color" type="string" />
<attribute id="1" title="networkx_key" type="long" />
</attributes>
<nodes>
<node id="GD639917" label="Germany" />
<node id="GD3649667" label="North Chungcheong Province" />
<node id="GD3789919" label="Seoul" />
<node id="GD753475" label="South Chungcheong Province" />
<node id="GD1316637" label="South Korea" />
<node id="GD2055944" label="United States" />
</nodes>
<edges>
<edge id="0" source="GD3649667" target="GD753475">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="1" source="GD3649667" target="GD1316637">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="2" source="GD3789919" target="GD1316637">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="3" source="GD753475" target="GD3649667">
<attvalues>
<attvalue for="0" value="green" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
<edge id="4" source="GD753475" target="GD1316637">
<attvalues>
<attvalue for="0" value="red" />
<attvalue for="1" value="0" />
</attvalues>
</edge>
</edges>
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes>
<node id="GD1184772" label="China" />
</nodes>
<edges />
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes />
<edges />
</graph>
</gexf>
<gexf version="1.1" xmlns="http://www.gexf.net/1.1draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance">
<graph defaultedgetype="directed" mode="static">
<nodes>
<node id="GD5622144" label="Asia" />
<node id="GD1184772" label="China" />
</nodes>
<edges />
</graph>
</gexf>