Page 1: Wikimania 2010-illesfarkas-v4

Tagging structure in a protein-protein interaction network, a co-authorship network and the (English) Wikipedia

Gergely Palla, Illés J. Farkas, Péter Pollner, Imre Derényi, Tamás Vicsek

Eötvös University and Hungarian Academy of Sciences (Budapest, Hungary)

Network (nodes + links):
- Proteins: interactions (“MIPS”)
- Math co-authors: co-authorships (“MathSciNet”)
- Wikipedia articles: hyperlinks in Wikipedia

Node tags (directed acyclic graph):
- Proteins: biochemical functions (example: Growth → Cell growth)
- Math co-authors: classification of co-authored papers (example: Combinatorics → Graph theory)
- Wikipedia articles: category tree (example: FIFA World Cup → FIFA World Cup Players)

CFinder.org

Page 2: Wikimania 2010-illesfarkas-v4

Full version:

G. Palla, I. J. Farkas, P. Pollner, I. Derényi, T. Vicsek: Fundamental statistical features and self-similar properties of tagged networks. New Journal of Physics 10, 123026 (2008)

http://CFinder.org --> Publications

Page 3: Wikimania 2010-illesfarkas-v4

There is no clearly separated group of “most popular tags”.
The transition between popular and less popular tags is continuous.

[Plot: x-axis: portion (x) of all nodes in the network; y-axis: probability that a given tag and its descendants label a portion x of all nodes]
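As a rough illustration of the quantity on the horizontal axis, the following Python sketch computes, for each tag, the portion x of nodes labeled by that tag or by any of its descendants in the tag DAG. The function names, the `children` mapping and the protein identifiers `p1` to `p4` are assumptions made for this example; only the tags Growth and Cell growth appear in the slides.

```python
def descendants(tag, children):
    """Return the tag together with all tags below it in the DAG."""
    found, stack = {tag}, [tag]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def tag_coverage(node_tags, children):
    """Map each tag to the portion x of nodes labeled by it or a descendant."""
    all_tags = set(children) | {c for cs in children.values() for c in cs}
    all_tags |= {t for tags in node_tags.values() for t in tags}
    coverage = {}
    for tag in all_tags:
        family = descendants(tag, children)
        hits = sum(1 for tags in node_tags.values() if family & set(tags))
        coverage[tag] = hits / len(node_tags)
    return coverage


# Hypothetical toy data: "Cell growth" is a descendant of "Growth".
children = {"Growth": ["Cell growth"]}
node_tags = {
    "p1": {"Cell growth"},
    "p2": {"Growth"},
    "p3": set(),
    "p4": {"Cell growth"},
}
coverage = tag_coverage(node_tags, children)
print(coverage["Growth"], coverage["Cell growth"])
```

A histogram of the resulting x values over all tags would then approximate the plotted distribution.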

Page 4: Wikimania 2010-illesfarkas-v4

There is no clear group of “most heavily tagged nodes”.
The transition between strongly and less strongly tagged nodes is continuous.

[Plot: x-axis: number of tags (n) on a node (protein, author, wiki article) of the network; y-axis: probability that a node has n tags]
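The distribution described here can be estimated by counting tags per node. The Python sketch below is only an illustration: the function name `tag_count_distribution` and the article names are assumptions, and the tags are borrowed from the category example on the first slide.

```python
from collections import Counter


def tag_count_distribution(node_tags):
    """Return {n: fraction of nodes that carry exactly n tags}."""
    counts = Counter(len(tags) for tags in node_tags.values())
    total = len(node_tags)
    return {n: c / total for n, c in sorted(counts.items())}


# Hypothetical toy data: four articles with zero, one or two tags each.
article_tags = {
    "Article A": {"FIFA World Cup"},
    "Article B": {"FIFA World Cup", "FIFA World Cup Players"},
    "Article C": set(),
    "Article D": {"FIFA World Cup Players"},
}
distribution = tag_count_distribution(article_tags)
print(distribution)
```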

Page 5: Wikimania 2010-illesfarkas-v4

On the large scale there is no “link saturation” within a topic.
In fact, link density within a large topic is almost the same as outside.

[Plot: number of links among the selected nodes versus the number of Wikipedia articles selected from those labeled with “Japan” and descendant terms; reference curves: maximum possible, Wikipedia average]
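The comparison in this plot can be sketched in Python as follows. Several points are assumptions, not taken from the slides: hyperlinks are treated as directed, so the maximum among n articles is n(n-1); the “Wikipedia average” is read as the number of links expected if the selected articles were linked at the overall density of the whole network; and the function name and article identifiers are hypothetical.

```python
def link_saturation(selected, links, total_nodes):
    """Return (links inside the selection, maximum possible, expected at overall density)."""
    selected = set(selected)
    links = {(u, v) for u, v in links if u != v}  # drop duplicates and self-links
    n = len(selected)
    inside = sum(1 for u, v in links if u in selected and v in selected)
    maximum = n * (n - 1)  # assumption: directed links
    overall_density = len(links) / (total_nodes * (total_nodes - 1))
    expected = overall_density * maximum
    return inside, maximum, expected


# Hypothetical toy network with 5 articles and 4 hyperlinks.
toy_links = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "e")]
inside, maximum, expected = link_saturation(["a", "b", "c"], toy_links, total_nodes=5)
print(inside, maximum, expected)
```

Saturation would show up as `inside` approaching `maximum`; the slide reports that for a large topic the count instead stays close to the network-wide average.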

Page 6: Wikimania 2010-illesfarkas-v4

Meaningful removal of loops from the category hierarchy
How to achieve a tree structure (DAG) with the lowest number of removals

[Figure: example category graph with nodes A, B, C, D, E, F, G, H and J]

Example (Oct. 2007): Category:Urdu and Category:Hindustani, each listed as a subcategory of the other.

Finding all loops is easy.

But which of their links should be removed?

Goals:
- Remove the lowest possible number of links
- Cause the smallest “damage” to the existing category hierarchy

Page 7: Wikimania 2010-illesfarkas-v4

Meaningful removal of loops from the category hierarchy
How to achieve a tree structure (DAG) with the lowest number of removals

[Figure: example category graph with nodes A, B, C, D, E, F, G, H and J]

Example (Oct. 2007): Category:Urdu and Category:Hindustani, each listed as a subcategory of the other.

(1) Identify the “loop subgraph” by iteratively removing nodes with 1 link

(2) Iteratively remove the “least important” links from the loop subgraph

(3) Add the non-loop links again

Finding all loops is easy.

But which of their links should be removed?

Goals:
- Remove the lowest possible number of links
- Cause the smallest “damage” to the existing category hierarchy
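The three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The pruning rule used here removes nodes that have no incoming or no outgoing link, a directed analogue of removing nodes with 1 link. The slides do not define how “least important” is measured, so `importance` is a hypothetical caller-supplied function. The removal is greedy and is not guaranteed to reach the lowest possible number of removals. The categories “Languages” and “Urdu poetry” and the weights are invented for the example.

```python
from collections import defaultdict


def loop_subgraph(links):
    """Step 1: iteratively prune nodes lacking an in-link or an out-link."""
    links = set(links)
    while True:
        sources = {u for u, v in links}
        targets = {v for u, v in links}
        keep = sources & targets
        pruned = {(u, v) for u, v in links if u in keep and v in keep}
        if pruned == links:
            return links
        links = pruned


def reaches(links, start, goal):
    """Depth-first search: is there a directed path from start to goal?"""
    out = defaultdict(list)
    for u, v in links:
        out[u].append(v)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in out[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def remove_loops(links, importance):
    """Steps 2 and 3: greedily drop low-importance loop links, keep the rest."""
    all_links = set(links)
    removed = []
    core = loop_subgraph(all_links)
    while core:
        # Only links that actually lie on a directed loop are candidates.
        on_loop = [(u, v) for u, v in core if reaches(core, v, u)]
        weakest = min(on_loop, key=lambda link: (importance(link), link))
        removed.append(weakest)
        core = loop_subgraph(core - {weakest})
    return all_links - set(removed), removed


# Links are (parent, child) pairs; weights are hypothetical.
category_links = [
    ("Languages", "Hindustani"),
    ("Hindustani", "Urdu"),
    ("Urdu", "Hindustani"),
    ("Urdu", "Urdu poetry"),
]
weights = {("Urdu", "Hindustani"): 0.2}
dag, removed = remove_loops(category_links, lambda link: weights.get(link, 1.0))
print(removed)
```

In this toy case only the low-weight link closing the Urdu/Hindustani loop is dropped, and all non-loop links are retained.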

Page 8: Wikimania 2010-illesfarkas-v4

Gergely Palla, Illés Farkas, Péter Pollner, Imre Derényi, Tamás Vicsek

Page 9: Wikimania 2010-illesfarkas-v4

http://CFinder.org -- with network data and free analysis software

We gratefully acknowledge support from: