OpenCores
URL https://opencores.org/ocsvn/an-fpga-implementation-of-low-latency-noc-based-mpsoc/an-fpga-implementation-of-low-latency-noc-based-mpsoc/trunk

Subversion Repositories an-fpga-implementation-of-low-latency-noc-based-mpsoc

[/] [an-fpga-implementation-of-low-latency-noc-based-mpsoc/] [trunk/] [mpsoc/] [remove_cycle/] [README.md] - Blame information for rev 48

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 48 alirezamon
### Breaking Cycles in Noisy Hierarchies  - Description
2
 
3
> Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. However, in practice, they may have thousands of cycles, as they are often created in a crowd-sourced way. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this paper, we address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches for this problem either need manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies to leverage the inferred hierarchy for removing a small subset of edges to make the graph acyclic. Extensive experiments demonstrate the effectiveness of our approach.
4
 
5
>  **Keywords**:  _Directed Acyclic Graph_, _Graph Hierarchy_, _TrueSkill_, _Social Agony_, _Cycle Edges_
6
 
7
 
8
* Last update: 14, July, 2017
9
* Corresponding Paper: [Breaking Cycles in Noisy Hierarchies](https://github.com/zhenv5/breaking_cycles_in_noisy_hierarchies/tree/master/paper)
10
* [The 9th International ACM Web Science Conference 2017](http://dl.acm.org/citation.cfm?id=3091495), WebSci'17
11
* If you use this code, please consider to cite this paper: [BibTex Download](http://web.cse.ohio-state.edu/~sun.1306/Published_Works/Bib_WebSci_17_Breaking_Cycles_in_Noisy_Hierarchies.bib)
12
* [Author Homepage](http://web.cse.ohio-state.edu/~sun.1306)
13
 
14
 
15
**It's worth mentioning that this code has been used in "Project Ruprecht", developed by Wikimedia Foundation. The project aims to measure the tangledness of PHP code and provides clean codebase. Thanks for Daniel Kinzler (Principal Software Engineer, Wikimedia Foundation) for letting us know that our work can not only fix messy ontologies but also works well for cleaning up messy codebase. updated: 2019.05.10.**
16
 
17
#### 0. Requirements
18
 
19
* Python 2.7
20
* Lib: networkx 1.1/2.x
21
 
22
If you have ran errors like ```DiGraph has no attributes of nodes_iter() or edges_iter()```, please changes ```nodes_iter()``` to ```list(nodes())``` and ```edges_iter()``` to ```list(edges())```.
23
 
24
Please go to [networkx](https://networkx.github.io/) to see differences between different version of networkx.
25
 
26
#### 1. Generate Random Graphs (DAGs)
27
 
28
```
29
python generate_random_dag.py -n 300 -m 2500 -g data/gnm_300_2500.edges
30
```
31
 
32
* n: number of nodes
33
* m: number of edges
34
* g: file path used to save the generated random graph
35
 
36
#### 2. Introduce Cycles to DAG
37
 
38
```
39
python introduce_cycles_to_DAG.py -g data/gnm_300_2500.edges -k 300 -l 0
40
```
41
* -g:  target DAG edges list file
42
* -k: number of extra edges introduced to make DAG have cycles
43
* -l: threshold to control path length, if l <=0, no constraints on path length, else: path length < l (cycle length <= l)
44
 
45
output:
46
 
47
*  extra_edges_file (**Ground Truth Edges**): gnm_300_2500_extra_300_path_len_0.edges
48
*  graph_with_extra_edges_file : gnm_300_2500_graph_w_extra_300_path_len_0.edges
49
 
50
#### 3. Breaking Cycles by DFS
51
 
52
```
53
python remove_cycle_edges_by_dfs.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges
54
```
55
 
56
You can specificity a edges list of file as ground truth (edges should be removed to break cycles)  by using '-t'. It will report precision, recall and F-1 score.
57
 
58
```
59
python remove_cycle_edges_by_dfs.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -t data/gnm_300_2500_extra_300_path_len_0.edges
60
```
61
 
62
Performance will be:
63
 
64
 
65
#### 4. Breaking Cycles by MFAS
66
 
67
Local greedy implementation of Minimum feedback arc set problem.
68
 
69
```
70
python remove_cycle_edges_by_minimum_feedback_arc_set_greedy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges
71
```
72
 
73
You can specificity a edges list of file as ground truth (edges should be removed to break cycles)  by using '-t'. It will report precision, recall and F-1 score.
74
 
75
```
76
python remove_cycle_edges_by_minimum_feedback_arc_set_greedy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -t data/gnm_300_2500_extra_300_path_len_0.edges
77
```
78
 
79
#### 5. Breaking Cycles via Hierarchy, inferred by PageRank
80
 
81
Graph hierarchy is inferred from PageRank.
82
 
83
```
84
python remove_cycle_edges_by_hierarchy.py -s pagerank -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges
85
```
86
 
87
* -s pagerank: use pagerank to infer graph hierarchy
88
 
89
You can also specify the ground truth file path as ground_truth_edges_file by using '-t'
90
 
91
```
92
python remove_cycle_edges_by_hierarchy.py -s pagerank -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -t data/gnm_300_2500_extra_300_path_len_0.edges
93
```
94
 
95
 
96
#### 6. Breaking Cycles via Hierarchy, inferred by TrueSkill
97
 
98
* External Library Requirement: TrueSkill [install](http://trueskill.org/)
99
 
100
```
101
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s trueskill
102
```
103
 
104
* -s trueskill: use TrueSkill to infer graph hierarchy
105
 
106
You can also specify the ground truth file path as ground_truth_edges_file by using '-t'.
107
 
108
```
109
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s trueskill -t data/gnm_300_2500_extra_300_path_len_0.edges
110
```
111
 
112
It will report performance of TS_G, TS_B, TS_F, and TS_Voting (ensembling of TS_G, TS_B and TS_F).
113
 
114
 
115
 
116
#### 7. Breaking Cycles via Hierarchy, inferred by SocialAgony
117
 
118
 
119
The code for computing Social Agony has been put in /agony. You have to compile it first. To compile it sucessfully, you may have to install some packages. Detailes can be viewed at ```/agony/README.md```.
120
 
121
Social Agony computation code is from [Tatti](http://users.ics.aalto.fi/ntatti/software.shtml)
122
 
123
After that run:
124
 
125
```
126
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s socialagony
127
```
128
 
129
* -s socialagony: use SocialAgony to infer graph hierarchy
130
 
131
You can also specify the ground truth file path as ground_truth_edges_file by using '-t'.
132
 
133
```
134
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s socialagony -t data/gnm_300_2500_extra_300_path_len_0.edges
135
```
136
 
137
It will report performance of SA_G, SA_B, SA_F, and SA_Voting (ensembling of SA_G, SA_B and SA_F).
138
 
139
 
140
#### 8. Breaking Cycles via Hierarchy, Ensembling Approach
141
 
142
Ensembling TS_G, TS_B, TS_F, SA_G, SA_B and SA_F.
143
 
144
```
145
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s ensembling
146
```
147
 
148
* -s ensembling: ensembling all above 6 apprpaches
149
 
150
You can also specify the ground truth file path as ground_truth_edges_file by using '-t'.
151
 
152
 
153
```
154
python remove_cycle_edges_by_hierarchy.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -s ensembling -t data/gnm_300_2500_extra_300_path_len_0.edges
155
```
156
 
157
* **This can also report performances of TS_G, TS_B, TS_F, SA_G, SA_B and SA_F individually**, these performance may have slightly difference with running them individually.
158
* It can report performance of H_Voting (ensembling of SA_G, SA_B, SA_F, TS_G, TS_B and TS_F)
159
 
160
#### 9. Test on Synthetic Graphs
161
 
162
Instead of testing above methods individually, you can simply run below code to compare performance of all methods on Synthetic Graphs (random generated graphs).
163
 
164
```
165
python synthetic_performance.py --dir data/ -n 300 -m 2500 -k 300 -l 0
166
```
167
 
168
* --dir: directory to save all results, 'data/' is in current directory
169
* -n: number of nodes in the generated random graph
170
* -m: number of edges in the generated random graph
171
* -k: number of extra edges to introduce cycles to the generated random graph
172
* -l: control parameter of path length
173
 
174
It will do:
175
 
176
* generate a random DAG with n nodes and m edges
177
* introduce k extra edges to this DAG to make it have cycles (l is used to control generated cycle size, if l == 0, it has no constraints on cycle size)
178
* run all methods to break cycles and report performance
179
 
180
 
181
#### 10. Test on Real Datasets
182
 
183
If you already have a graph with cycles, you can run:
184
 
185
```
186
python break_cycles.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges
187
```
188
 
189
* -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges : to specify edgeslist file
190
 
191
And if you have ground truth, you can run below commands to report performances:
192
 
193
 
194
```
195
python break_cycles.py -g data/gnm_300_2500_graph_w_extra_300_path_len_0.edges -t data/gnm_300_2500_extra_300_path_len_0.edges
196
```
197
 
198
* -t data/gnm_300_2500_extra_300_path_len_0.edges : to specify ground truth of edges to be removed
199
 
200
 
201
#### 11. Visualization
202
 
203
Visualize peroformance of different methods.
204
 
205
```
206
python plot_recallPrecision.py --input data/performance.txt --title "RG(1.5K,15K)"
207
```
208
 
209
* --input data/performance.txt
210
 
211
file format:
212
 
213
method_name  precision recall f-1Score(optional) numEdgesRemoved(optional)
214
 
215
**(columns are separated by whitespace)**
216
 
217
First **3** columns must be: method_name, precision, and recall.
218
 
219
Other columns are optional, such as f-1score and numEdgesRemoved in data/performance.txt.
220
 
221
 
222
* --title "RG(1.5K,15K)": file will be saved at: performance_RG(1.5K,15K).pdf
223
 
224
For example, we test algorithms on a random graph (1500,15000) with 500 extra edges added, and get performance saved as:
225
 
226
 
227
```
228
DFS     0.0140  0.1820  0.0259  6516
229
PR      0.2298  0.7400  0.3507  1610
230
MFAS    0.4464  0.9320  0.6036  1044
231
SA_G    0.8993  0.9640  0.9305  536
232
SA_F    0.5946  0.9620  0.7349  809
233
SA_B    0.5975  0.9680  0.7389  810
234
SA_Voting       0.9282  0.9560  0.9419  515
235
TS_G    0.8274  0.9300  0.8757  562
236
TS_F    0.5145  0.9580  0.6695  931
237
TS_B    0.5010  0.9700  0.6608  968
238
TS_Voting       0.8654  0.9260  0.8947  535
239
H_Voting        0.9369  0.9500  0.9434  507
240
```
241
 
242
* first column, such as DFS: method name
243
* second column, such as 0.0140: precision
244
* third column, such as 0.1820: recall
245
* forth column, such as 0.0259: f-1 score (optional, will compute f-1 automatically)
246
* fifth column, such as 1044: number of edges to be removed
247
 
248
visualization will be saved as: performance_RG(1.5K,15K).pdf
249
 
250
#### Licence and Acknowledgement
251
 
252
BSD Licence.
253
 
254
 
255
This work is supported by the National Science Foundation of United States under grant CCF-1645599 and IIS-1550302. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
256
 
257
 
258
 
259
#### Cite
260
 
261
If you use this code, please consider to cite this paper:
262
 
263
```
264
 @inproceedings{Sun:2017:BCN:3091478.3091495,
265
 author = {Sun, Jiankai and Ajwani, Deepak and Nicholson, Patrick K. and Sala, Alessandra and Parthasarathy, Srinivasan},
266
 title = {Breaking Cycles In Noisy Hierarchies},
267
 booktitle = {Proceedings of the 2017 ACM on Web Science Conference},
268
 series = {WebSci '17},
269
 year = {2017},
270
 isbn = {978-1-4503-4896-6},
271
 location = {Troy, New York, USA},
272
 pages = {151--160},
273
 numpages = {10},
274
 url = {http://doi.acm.org/10.1145/3091478.3091495},
275
 doi = {10.1145/3091478.3091495},
276
 acmid = {3091495},
277
 publisher = {ACM},
278
 address = {New York, NY, USA},
279
 keywords = {cycle edges, directed acyclic graph, graph hierarchy, social agony, trueskill},
280
}
281
```

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.