README.md 6.09 KB
Newer Older
Fize Jacques's avatar
Fize Jacques committed
1
![](logo2.png)
2

3

Fize Jacques's avatar
Fize Jacques committed
4
5
[![Build Status](https://travis-ci.com/Jacobe2169/GMatch4py.svg?branch=master)](https://travis-ci.com/Jacobe2169/GMatch4py)
# GMatch4py a graph matching library for Python 
6

Fize Jacques's avatar
Fize Jacques committed
7

Fize Jacques's avatar
Fize Jacques committed
8
9
GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects.
GMatch4py algorithms were implemented with Cython to enhance performance.
Fize Jacques's avatar
Fize Jacques committed
10
11

## Requirements
Fize Jacques's avatar
Fize Jacques committed
12
13
14

 * Python 3
 * Numpy and Cython installed (if not : `(sudo) pip(3) install numpy cython`)
Fize Jacques's avatar
Fize Jacques committed
15
16
17
 
## Installation

Fize Jacques's avatar
Fize Jacques committed
18
To install `GMatch4py`, run the following commands:
Fize Jacques's avatar
Fize Jacques committed
19

20
21
22
```bash
git clone https://github.com/Jacobe2169/GMatch4py.git
cd GMatch4py
23
(sudo) pip(3) install .
Fize Jacques's avatar
Fize Jacques committed
24
25
26
```

## Get Started
27
### Graph input format
Fize Jacques's avatar
Fize Jacques committed
28

29
In `GMatch4py`, algorithms manipulate `networkx.Graph`, a complete graph model that 
30
31
comes with a large spectrum of parser to load your graph from various inputs : `*.graphml,*.gexf,..` (check [here](https://networkx.github.io/documentation/stable/reference/readwrite/index.html) to see all the format accepted)

Fize Jacques's avatar
Fize Jacques committed
32
### Use GMatch4py
33
34
If you want to use algorithms like *graph edit distances*, here is an example:

35
```python
36
37
38
# Gmatch4py use networkx graph 
import networkx as nx 
# import the GED using the munkres algorithm
Fize Jacques's avatar
Fize Jacques committed
39
import gmatch4py as gm
40
41
42
```

In this example, we use generated graphs using `networkx` helpers:
43
```python
44
45
46
47
g1=nx.complete_bipartite_graph(5,4) 
g2=nx.complete_bipartite_graph(6,4)
```

Fize Jacques's avatar
Fize Jacques committed
48
All graph matching algorithms in `Gmatch4py` work this way:
49
50
51
 * Each algorithm is associated with an object, each object having its specific parameters. In this case, the parameters are the edit costs (delete a vertex, add a vertex, ...)
 * Each object is associated with a `compare()` function with two parameters. First parameter is **a list of the graphs** you want to **compare**, i.e. measure the distance/similarity (depends on the algorithm). Then, you can specify a sample of graphs to be compared to all the other graphs. To this end, the second parameter should be **a list containing the indices** of these graphs (based on the first parameter list). If you rather compute the distance/similarity **between all graphs**, just use the `None` value.

52
```python
Fize Jacques's avatar
Fize Jacques committed
53
ged=gm.GraphEditDistance(1,1,1,1) # all edit costs are equal to 1
54
55
56
57
58
result=ged.compare([g1,g2],None) 
print(result)
```

The output is a similarity/distance matrix :
59
```python
60
61
array([[0., 14.],
       [10., 0.]])
62
63
64
```
This output result is "raw", if you wish to have normalized results in terms of distance (or similarity) you can use :

65
```python
66
67
68
69
ged.similarity(result)
# or 
ged.distance(result)
```
Fize Jacques's avatar
Fize Jacques committed
70

Fize Jacques's avatar
Fize Jacques committed
71
## Exploit nodes and edges attributes
Fize Jacques's avatar
Fize Jacques committed
72

Fize Jacques's avatar
Fize Jacques committed
73
74
75
76
77
78
79
80
In this latest version, we add the possibility to exploit graph attributes ! To do so, the `base.Base` is extended with the `set_attr_graph_used(node_attr,edge_attr)` method.

```python
import networkx as nx 
import gmatch4py as gm
ged = gm.GraphEditDistance(1,1,1,1)
ged.set_attr_graph_used("theme","color") # Edge colors and node themes attributes will be used.
```
Fize Jacques's avatar
Fize Jacques committed
81
82

## List of algorithms
Fize Jacques's avatar
Fize Jacques committed
83

Fize Jacques's avatar
Fize Jacques committed
84
85
 * Graph Embedding
    * Graph2Vec [1]
Fize Jacques's avatar
Fize Jacques committed
86
 * Node Embedding
Fize Jacques's avatar
Fize Jacques committed
87
    * DeepWalk [7]
88
    * Node2vec [8]
Fize Jacques's avatar
Fize Jacques committed
89
90
91
92
93
94
95
96
97
98
99
100
 * Graph kernels
    * Random Walk Kernel (*debug needed*) [3]
        * Geometrical 
        * K-Step 
    * Shortest Path Kernel [3]
    * Weisfeiler-Lehman Kernel [4]
        * Subtree Kernel 
 * Graph Edit Distance [5]
    * Approximated Graph Edit Distance 
    * Hausdorff Graph Edit Distance 
    * Bipartite Graph Edit Distance 
    * Greedy Edit Distance
Fize Jacques's avatar
Fize Jacques committed
101
102
103
104
 * Vertex Ranking [2]
 * Vertex Edge Overlap [2]
 * Bag of Nodes (a bag of words model using nodes as vocabulary)
 * Bag of Cliques (a bag of words model using cliques as vocabulary)
Fize Jacques's avatar
Fize Jacques committed
105
106
107
108
109
 * MCS [6]
    

## Publications associated

Fize Jacques's avatar
Fize Jacques committed
110
  * [1] Narayanan, Annamalai and Chandramohan, Mahinthan and Venkatesan, Rajasekar and Chen, Lihui and Liu, Yang. Graph2vec: Learning distributed representations of graphs. MLG 2017, 13th International Workshop on Mining and Learning with Graphs (MLGWorkshop 2017).
Fize Jacques's avatar
Fize Jacques committed
111
112
113
114
115
  * [2] Papadimitriou, P., Dasdan, A., & Garcia-Molina, H. (2010). Web graph similarity for anomaly detection. Journal of Internet Services and Applications, 1(1), 19-30.
  * [3] Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., & Borgwardt, K. M. (2010). Graph kernels. Journal of Machine Learning Research, 11(Apr), 1201-1242.
  * [4] Shervashidze, N., Schweitzer, P., Leeuwen, E. J. V., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(Sep), 2539-2561.
  * [5] Fischer, A., Riesen, K., & Bunke, H. (2017). Improved quadratic time approximation of graph edit distance by combining Hausdorff matching and greedy assignment. Pattern Recognition Letters, 87, 55-62.
  * [6] A graph distance metric based on the maximal common subgraph, H. Bunke and K. Shearer, Pattern Recognition Letters, 1998  
Fize Jacques's avatar
Fize Jacques committed
116
  * [7] Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.
117
  * [8] node2vec: Scalable Feature Learning for Networks. Aditya Grover and Jure Leskovec. Knowledge Discovery and Data Mining, 2016.
Fize Jacques's avatar
Fize Jacques committed
118

Fize Jacques's avatar
Fize Jacques committed
119
120
121
## Author(s)

Jacques Fize, *jacques[dot]fize[at]cirad[dot]fr*
Fize Jacques's avatar
Fize Jacques committed
122

123
Some algorithms from other projects were integrated to Gmatch4py. **Be assured that
Fize Jacques's avatar
Fize Jacques committed
124
each code is associated with a reference to the original.**
Fize Jacques's avatar
Fize Jacques committed
125

126

127
## CHANGELOG
128
129
130
131
132
### 7.05.2019

 * Debug (problems with float edge weight)
 * Add the `AbstractEditDistance.edit_path(G,H)` method that return the edit path, the cost matrix and the selected cost index in the cost matrix
 * Add a tqdm progress bar for the `gmatch4py.helpers.reader.import_dir()` function
133

134
135
136
137
### 12.03.2019

 * Add Node2vec

Fize Jacques's avatar
Fize Jacques committed
138
139
140
141
142
143
144
145
### 05.03.2019

 * Add Graph Embedding algorithms
 * Remove depreciated methods and classes
 * Add logo
 * Update documentation


146
147
148
149
### 25.02.2019
 * Add New Graph Class. Features : Cython Extensions, precomputed values (degrees, neighbor info), hash representation of edges and nodes for a faster comparison
 * Some algorithms are parallelized such as graph edit distances or Jaccard

Fize Jacques's avatar
Fize Jacques committed
150
## TODO List
Fize Jacques's avatar
Fize Jacques committed
151

Fize Jacques's avatar
Fize Jacques committed
152
  * Debug algorithms --> Random Walk Kernel, Deltacon
153
  * Optimize algorithms --> Vertex Ranking
Fize Jacques's avatar
Fize Jacques committed
154