1 Complex Social Network Mining Theory, Methodologies, and Applications Jie Tang Department of Computer Science and Technology Tsinghua University Email: [email protected]


Aug 04, 2020

Page 1

Complex Social Network Mining —Theory, Methodologies, and Applications

Jie Tang

Department of Computer Science and Technology

Tsinghua University Email: [email protected]

Page 2

Social Networks

• Web 1.0 (1989): pages, hyperlinks; relevance search
• Web 2.0 (2004): social networks; blogs, micro-blogs
• Mobile Web (2008-20): connecting via mobiles…

Web-based (or mobile-based) social networks have already become a bridge connecting our real daily life and the virtual web space.

Page 3

Web-based Social Network Mining —Theory, Methodologies, and Applications

Web 1.0: Web of Pages · Web 2.0: Web of People · Web 3.0: Web of Semantics

Research topics (numbered 1 to 7 in the figure):

• Attribute/link prediction
• Search/query over networks
• Social dynamics
• Trust and privacy
• Social influence analysis
• Social data integration
• Social knowledge acquisition

Theoretical layer: social theory, learning from users, collective learning, graphical models

Page 4

Outline

• ArnetMiner: Academic Social Network

• Core Techniques

– Knowledge Acquisition

– Semantic Integration

– Heterogeneous Ranking

– Social Influence Analysis

• Demo

Page 5

Provides comprehensive researcher network analysis and mining functions

Papers published: ACM TKDD, KDD’08-10, SDM’09, ICDM’07-09, CIKM’07-09, DKE, JIS

http://arnetminer.org/

ArnetMiner.org - Academic research social network analysis and mining system

Page 6

Why Arnetminer.org?

“Academic search is treated as document search, but ignore semantics”

“The information need is not only about publication…”

Page 7

Examples – Expertise search

Researcher A

• When starting work on a new research topic;
• Or brainstorming for novel ideas.

• Who are experts in this field?
• What are the top conferences in the field?
• What are the best papers?
• What are the top research labs?

Page 8

Examples – Citation network analysis

Researcher B

• an in-depth understanding of the research field?

[Figure: citation network of the papers below, annotated with topics and citation relationship types]

Papers:

• Self-Indexing Inverted Files for Fast Text Retrieval
• Static Index Pruning for Information Retrieval Systems
• Signature files: An access Method for Documents and its Analytical Performance Evaluation
• Filtered Document Retrieval with Frequency-Sorted Indexes
• Vector-space Ranking with Effective Early Termination
• Efficient Document Retrieval in Main Memory
• A Document-centric Approach to Static Index Pruning in Text Retrieval Systems
• An Inverted Index Implementation
• Parameterised Compression for Sparse Bitmaps
• Introduction of Modern Information Retrieval
• Memory Efficient Ranking

Topics:

• Topic 31: Ranking and Inverted Index
• Topic 27: Information retrieval
• Topic 1: Theory
• Topic 21: Framework
• Topic 22: Compression
• Topic 23: Index method
• Topic 34: Parallel computing
• Other

Citation Relationship Type: Basic theory, Comparable work, Other

Page 9

Examples – Conference Suggestion

Researcher C: Which conference should we submit the paper to?

[Figure labels: authors, content]

Page 10

Examples – Reviewer Suggestion

KDD Committee: Who are the best matching reviewers for each paper?

[Figure labels: paper content, conference]

Page 11

Our Social Network is Black

Social network without role/relationship info, e.g. a company’s email network

How to infer roles such as CEO, Manager, and Employee?

Latent relationship graph

Fortunately, user interactions form implicit groups

Page 12

From BW to Color

Page 13

Page 14

Person Search

Basic Info.

Research Interests

Publications

Social Network

Citation statistics

Funding

Page 15

Expertise Search

Finding experts, expertise conferences, and expertise papers for “information retrieval”

Page 16

Course Search

Finding courses for “data mining”

Page 17

Association Search

Finding associations between persons

- high efficiency
- Top-K associations

Usage:

- to find a partner
- to find a person with the same interests

Page 18

Sub-Graph Search

Sub graphs

Page 19

Topic Browser

200 topics have been discovered automatically from the academic network

Page 20

Academic Performance Measurement

Academic Statistics

Personal Statistics

Page 21

Outline

• ArnetMiner: Academic Social Network

• Core Techniques

– Knowledge Acquisition

– Semantic Integration

– Heterogeneous Ranking

– Social Influence Analysis

• Demo

Page 22

ArnetMiner: Overview

[Figure: system architecture. Layer labels: Data Sources; Integration; Social Network Extraction; Social Network Storage; Modeling and Search Network; Social Network Analysis; Map-reduce: Distributed processing platform]

• Data sources shown: Homepage, Papers (ACM), Papers (DBLP), Papers (Libra), citation, Scholar
• Extraction components: Name Disambiguation, Profiling extraction, Homepage finding, Publication extraction
• Storage components: Storage, Indexing, Access interface, RNKB, Metadata
• Modeling: Topic model (plate diagram with symbols α, β, μ, θ, Φ, ψ, a_d, x, z, w, c, N_d, D, T, A)
• Applications shown: Academic suggestion, Expertise search, Social involution analysis, Citation tracing analysis, Social influence analysis

[Figure: example academic network of researchers (Dr. Tang, Limin, Prof. Wang, Prof. Li), papers (SVM..., Association..., Tree CRF..., Semantic..., EOS..., Annotation...), and conferences (IJCAI, ISWC, WWW), linked by write, cite, publish, coauthor, and PC member edges]

Page 23

Ruud Bolle Office: 1S-D58

Letters: IBM T.J. Watson Research Center

P.O. Box 704

Yorktown Heights, NY 10598 USA

Packages: IBM T.J. Watson Research Center

19 Skyline Drive

Hawthorne, NY 10532 USA

Email: [email protected]

Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's

Degree in Analog Electronics in 1977 and the Master's Degree in Electrical

Engineering in 1980, both from Delft University of Technology, Delft, The

Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in

1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode

Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson

Research Center in the Artificial Intelligence Department of the Computer Science

Department. In 1988 he became manager of the newly formed Exploratory Computer

Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video

processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer

Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud

M. Bolle is a Member of the IBM Academy of Technology.

DBLP: Ruud Bolle

2006

Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373

Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524

Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)

2005

Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20

Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116

...


CT1: Knowledge Acquisition from Social Web (ACM TKDD, ISWC’06, ICDM’07, ACL’07, CIKM’07-08)

Annotated blocks in the source text: Contact Information, Educational history, Academic services, Publications

[Figure: profile extracted for Ruud Bolle from the homepage text and the DBLP record]

Name: Ruud Bolle
Position: Research Staff
Affiliation: IBM T.J. Watson Research Center
Address: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Address: IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532 USA
Email: [email protected]
Homepage: http://researchweb.watson.ibm.com/ecvg/people/bolle.html
Phduniv: Brown University
Phdmajor: Electrical Engineering
Phddate: 1984
Msuniv: Delft University of Technology
Msmajor: Electrical Engineering
Msdate: 1980
Msmajor: Applied Mathematics
Bsuniv: Delft University of Technology
Bsmajor: Analog Electronics
Bsdate: 1977
Research_Interest: video database indexing; video processing; visual human-computer interaction; biometrics applications
Photo

Publication 1#
Title: Cancelable Biometrics: A Case Study in Fingerprints
Venue: ICPR
Date: 2006
Start_page: 370
End_page: 373

Publication 2#
Title: Fingerprint Representation Using Localized Texture Features
Venue: ICPR
Date: 2006
Start_page: 521
End_page: 524

. . .

[Figure: co-author network around Ruud Bolle, with co-author nodes 1 and 2 linked via Publication #3 and Publication #5; node annotations: UIUC affiliation, Professor position]

Two questions:

• How to accurately extract the researcher profile information from the Web?
• How to integrate the information from different sources?

Page 24

Researcher Network Extraction

[Figure: schema of the researcher network]

Researcher: Name, Homepage, Phone, Fax, Address, Email, Affiliation, Position, Person Photo, Research_Interest, Phduniv, Phdmajor, Phddate, Msuniv, Msmajor, Msdate, Bsuniv, Bsmajor, Bsdate

Publication: Title, Publication_venue, Start_page, End_page, Date

Relations: Authored, Coauthor

Statistics:

• 70.60% of the researchers have at least one homepage or an introducing page
• 85.6% from universities, 14.4% from companies
• 71.9% are homepages, 28.1% are introducing pages
• 60% are natural language text, 40% are in lists and tables
• 70% moved at least one time
• There are a large number of person names having the ambiguity problem
• Even 3 “Yi Li” graduated from the author’s lab
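The researcher schema on this slide can be written down as a pair of record types. The following is a minimal sketch, not the ArnetMiner implementation: the class names, the choice of which attributes to include, and the `coverage` helper are assumptions made for illustration; the attribute names and the sample values come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Publication:
    title: str
    publication_venue: str
    date: Optional[int] = None        # publication year
    start_page: Optional[int] = None
    end_page: Optional[int] = None


@dataclass
class Researcher:
    name: str
    position: Optional[str] = None
    affiliation: Optional[str] = None
    email: Optional[str] = None
    homepage: Optional[str] = None
    phduniv: Optional[str] = None
    phdmajor: Optional[str] = None
    phddate: Optional[int] = None
    addresses: List[str] = field(default_factory=list)
    research_interests: List[str] = field(default_factory=list)
    authored: List[Publication] = field(default_factory=list)
    coauthors: List[str] = field(default_factory=list)


def coverage(r: Researcher) -> float:
    """Fraction of the scalar profile attributes that were filled in (hypothetical helper)."""
    scalars = [r.position, r.affiliation, r.email, r.homepage,
               r.phduniv, r.phdmajor, r.phddate]
    return sum(v is not None for v in scalars) / len(scalars)


# Sample values taken from the Ruud Bolle example in the slides.
bolle = Researcher(
    name="Ruud Bolle",
    position="Research Staff",
    affiliation="IBM T.J. Watson Research Center",
    phduniv="Brown University",
    phdmajor="Electrical Engineering",
    phddate=1984,
)
bolle.authored.append(Publication(
    title="Cancelable Biometrics: A Case Study in Fingerprints",
    publication_venue="ICPR", date=2006, start_page=370, end_page=373,
))
print(round(coverage(bolle), 2))
```

Optional fields reflect the statistics above: not every researcher has a homepage, so any attribute other than the name may be missing after extraction.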

Page 25

Our Approach Picture – based on Markov Random Field

Researcher Profiling · Name Disambiguation

Markov Property:

$P(Y_i \mid Y_j,\ Y_j \neq Y_i) = P(Y_i \mid Y_j,\ Y_j \sim Y_i)$

Special cases:

- Conditional Random Fields
- Hidden Markov Random Fields

[Figure: Markov random field over label variables Y_a to Y_f]

[Figure: name disambiguation example with papers x1 to x11 linked by co-conference, cite, coauthor, and t-coauthor relationships, and cluster labels y1=y2=y3=y8=1, y4=y5=y6=y7=2, y9=y10=y11=3]

Page 26

CT2: Semantic Integration (IEEE TKDE, SIGMOD’09, IJCAI’09, ISWC’09)

Page 27

RiMOM-A Tool for Semantic Integration (OAEI’06-09)

[Figures: bar charts of Benchmark Results (Precision, Recall, F-measure), Anatomy Results (Precision, Recall, Recall+, F-measure), and agrafsa Subtrack Results (Precision)]

http://keg.cs.tsinghua.edu.cn/project/RiMOM/

“I’m really surprised by the good results of these years RiMOM, you can compete with the top systems that make use of such background knowledge.”

Page 28

CT3: Topic-based Heterogeneous Ranking (Machine Learn. J, KDD’08, ICDM’08, CIKM’09, DKE)

Search with keyword: modeling using VSM

Query: Data mining

[Figure: binary term vectors for the query, Doc1, Doc3, and Doc4]

Return:

• Principles of Data Mining. DJ Hand - Drug Safety, 2007 - drugsafety.adisonline.com
• Advances in Knowledge Discovery and Data Mining UM Fayyad, G Piatetsky-Shapiro, P Smyth, R…
• Data Mining: Concepts and Techniques J Han, M Kamber - 2001…

Search with semantic modeling: modeling using semantic topics

Query: Data mining

Topics: Data mining, Association Rules, Database systems, Data management, Web databases, Information systems (topic weights shown in the figure: 0.4, 0.2, 0.15, 0.1, 0.05, 0.02)

Return: experts, expertise conferences, expertise papers
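The contrast between keyword matching and topic-based matching can be illustrated with cosine similarity in two different spaces. This is a minimal sketch under stated assumptions: the vocabulary, the documents, and the topic proportions are invented for illustration and are not taken from the slides or from ArnetMiner.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical vocabulary: [data, mining, association, rules, database]
query_terms = [1, 1, 0, 0, 0]        # the query "data mining"
doc_keyword = [1, 1, 0, 0, 1]        # shares the query terms
doc_related = [0, 0, 1, 1, 0]        # about association rules, no shared term

# Hypothetical topic space: [data mining, database systems]
query_topics = [0.9, 0.1]
doc_keyword_topics = [0.6, 0.4]
doc_related_topics = [0.95, 0.05]

print("term space: ", cosine(query_terms, doc_keyword), cosine(query_terms, doc_related))
print("topic space:", cosine(query_topics, doc_keyword_topics), cosine(query_topics, doc_related_topics))
```

In the term space the association-rules document scores zero because it shares no word with the query; in the topic space it scores high because both load on the same topic. That gap is the motivation for modeling with semantic topics.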

Page 29

Challenges

1. How to model the heterogeneous academic network?

2. How to capture the link information for ranking objects in the academic network?

[Figure: heterogeneous academic network of papers, authors, and conferences, with edge types cite, write, co-write, co-author, PC member, chair, and publish]

Page 30

Modeling the Academic Network

[Figure: plate diagrams of three model variants, ACT1, ACT2, and ACT3, over authors, topics, words, and conferences; symbols shown include α, β, μ, θ, Φ, ψ, η, σ², a_d, x, z, w, c, N_d, D, T, A, and AC]

Author-Conference-Topic Model [Tang et al., 08]
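One way to read the first plate diagram (ACT1) is as a generative process: for each word position in a paper, pick one of the paper's authors, draw a topic from that author's topic mixture θ, then draw both the word (from Φ) and the conference stamp (from ψ) from that topic. The sketch below follows that reading; it is an assumption about the model, not code from the authors, and all parameter values, topic names, and venue assignments are made up. The Dirichlet priors (α, β, μ) are omitted.

```python
import random


def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate_paper(authors, n_words, theta, phi, psi, rng):
    """Generate (author, topic, word, conference) tuples for one paper."""
    tokens = []
    for _ in range(n_words):
        x = rng.choice(authors)   # author responsible for this token, chosen uniformly
        z = draw(theta[x], rng)   # topic from that author's topic mixture
        w = draw(phi[z], rng)     # word from the topic's word distribution
        c = draw(psi[z], rng)     # conference stamp from the same topic
        tokens.append((x, z, w, c))
    return tokens


# Toy, invented parameters (hypothetical authors, topics, words, and venues).
theta = {"author_1": {"ir": 0.8, "ml": 0.2}, "author_2": {"ir": 0.1, "ml": 0.9}}
phi = {"ir": {"index": 0.5, "retrieval": 0.5}, "ml": {"kernel": 0.5, "margin": 0.5}}
psi = {"ir": {"SIGIR": 1.0}, "ml": {"ICML": 1.0}}

tokens = generate_paper(["author_1", "author_2"], 6, theta, phi, psi, random.Random(0))
for token in tokens:
    print(token)
```

Inference runs in the opposite direction: given the observed words, authors, and conferences, estimate θ, Φ, and ψ. The sketch only shows the forward direction.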

Page 31

Integrating Topic Model into Random Walk

Random walk over the academic network + Modeling academic network with topics

[Figure: heterogeneous academic network (edge types cite, write, co-write, co-author, PC member, chair, publish) combined with the Author-Conference-Topic Model [Tang et al., 08]]

Page 32

Combination Method 1

Stage 1: Random walk (ranking score)

Stage 2: Topic-based relevance (topic-based relevance score)

Combination by multiplication

[Figure: random walk over three linked graphs: Paper Graph Gp (Tree CRF..., EOS..., Association...), Author Graph Ge (Prof. Wang, Prof. Tang, Jing Zhang), and Conference Graph Gc (ISWC, IJCAI, WWW), with transition weights λde, λed, λcd, λdc, λdd; a topic layer connects the query “Data mining” to the same papers, authors, and conferences]
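The two-stage combination can be sketched as a random walk score multiplied by a per-object relevance score. The code below is a simplified illustration under stated assumptions: it uses a plain PageRank-style walk with uniform transitions instead of the type-specific weights (λde, λed, λcd, λdc, λdd) in the figure, and the graph edges and relevance values are invented.

```python
def random_walk(out_links, damping=0.85, iters=50):
    """PageRank-style power iteration over a dict {node: [neighbor, ...]}."""
    nodes = list(out_links)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * score[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                for t in nodes:
                    nxt[t] += damping * score[v] / n
        score = nxt
    return score


# Hypothetical heterogeneous graph: papers, authors, and conferences.
out_links = {
    "paper:Tree CRF": ["author:Prof. Tang", "conf:WWW", "paper:EOS"],
    "paper:EOS": ["author:Prof. Wang", "conf:ISWC"],
    "author:Prof. Tang": ["paper:Tree CRF"],
    "author:Prof. Wang": ["paper:EOS"],
    "conf:WWW": ["paper:Tree CRF"],
    "conf:ISWC": ["paper:EOS"],
}

# Hypothetical topic-based relevance of each object to the query (stage 2).
relevance = {
    "paper:Tree CRF": 0.2, "paper:EOS": 0.7,
    "author:Prof. Tang": 0.3, "author:Prof. Wang": 0.6,
    "conf:WWW": 0.0, "conf:ISWC": 0.5,
}

scores = random_walk(out_links)                               # stage 1
combined = {v: scores[v] * relevance[v] for v in out_links}   # combination by multiplication
for node in sorted(combined, key=combined.get, reverse=True):
    print(node, round(combined[node], 4))
```

Multiplication means an object must be both well connected and relevant to the query to rank highly; an object with zero relevance is dropped regardless of its link-based score.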

Page 33

Combination Method 2

Query: ontology alignment

[Figure: random walk over four linked graphs: Paper Graph Gp (Tree CRF..., EOS..., Association...), Author Graph Ge (Prof. Wang, Prof. Tang, Jing Zhang), Conference Graph Gc (ISWC, IJCAI, WWW), and Hidden Theme Graph Gt (posowl, Web service), with transition weights λde, λed, λcd, λdc, λdd, λtd, λdt, λqt, λtq; labels: Ranking score, Transition probability]

Page 34

Learning to Rank Experts

• Combining more information

$\min_{w_T} \; \sum_{i=1}^{n} \left[ 1 - z_{T_i} \langle w_T,\ x_{T_i}^{a} - x_{T_i}^{b} \rangle \right]_{+} + \lambda \lVert w_T \rVert^{2}$

The first term is the empirical loss and the second term is the model penalty. $w_T$ is the feature weight vector; the features include language model, BM25, and tf*idf scores.
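The "empirical loss plus model penalty" objective on this slide can be evaluated for a given weight vector as follows. This is a minimal sketch assuming a pairwise hinge loss with a squared-norm penalty (the Ranking SVM form); the penalty coefficient `lam`, the feature values, and the function name are illustrative assumptions.

```python
def pairwise_hinge_objective(w, pairs, lam):
    """Sum of hinge losses over preference pairs plus lam * ||w||^2.

    Each pair is (x_a, x_b, z): z = +1 if x_a should rank above x_b, else -1.
    """
    loss = 0.0
    for x_a, x_b, z in pairs:
        margin = z * sum(wi * (a - b) for wi, a, b in zip(w, x_a, x_b))
        loss += max(0.0, 1.0 - margin)        # empirical loss for this pair
    penalty = lam * sum(wi * wi for wi in w)  # model penalty
    return loss + penalty


# Hypothetical two-feature candidates (for example, a BM25 score and a tf*idf score).
w = [1.0, 0.5]
pairs = [
    ([2.0, 1.0], [0.0, 1.0], +1),   # correctly ordered with margin 2: no loss
    ([0.0, 1.0], [1.0, 0.0], +1),   # mis-ordered: margin -0.5, loss 1.5
]
print(pairwise_hinge_objective(w, pairs, lam=0.1))
```

A learner would minimize this value over `w`; the sketch only evaluates it, which is enough to see how a mis-ordered pair contributes to the loss.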

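Page 35

Heterogeneous Cross-domain Ranking

Query: “data mining”

[Figure: conferences (KDD, SDM, ICDM, PAKDD), papers (Principles of Data Mining; Data Mining: Concepts and Techniques), and authors (P. Yu, ?) to be ranked for the query; question marks denote unknown ranks; domains labeled conf and author/paper]

$\min_{w_S, w_T} \; \sum_{i=1}^{n_1} \left[ 1 - z_{S_i} \langle w_S,\ x_{S_i}^{a} - x_{S_i}^{b} \rangle \right]_{+} + C \sum_{i=1}^{n_2} \left[ 1 - z_{T_i} \langle w_T,\ x_{T_i}^{a} - x_{T_i}^{b} \rangle \right]_{+} + \lambda \lVert W \rVert_{2,1}^{2}$

The first sum is the loss in one domain and the second sum is the loss in another domain, with $W = [w_S, w_T]$.

$\min_{w_S, w_T, U} \; \sum_{i=1}^{n_1} \left[ 1 - z_{S_i} \langle w_S,\ U^{T} (x_{S_i}^{a} - x_{S_i}^{b}) \rangle \right]_{+} + C \sum_{i=1}^{n_2} \left[ 1 - z_{T_i} \langle w_T,\ U^{T} (x_{T_i}^{a} - x_{T_i}^{b}) \rangle \right]_{+} + \lambda \lVert W \rVert_{2,1}^{2}$

Here $U$ maps both domains into a common feature space.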
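The cross-domain objective can be evaluated numerically to see how the pieces fit together: one hinge loss per domain, both computed after projecting feature differences through a shared matrix U, plus a penalty that couples the two weight vectors. This is a sketch under assumptions: the (2,1)-norm is taken as the sum of row-wise Euclidean norms of W = [w_S, w_T], the penalty coefficient `lam` and trade-off `C` are illustrative, and all data values are invented.

```python
import math


def hinge_sum(w, pairs, U):
    """Sum of pairwise hinge losses after projecting feature differences with U^T."""
    total = 0.0
    for x_a, x_b, z in pairs:
        diff = [a - b for a, b in zip(x_a, x_b)]
        # proj = U^T diff, where U is a list of rows (original dim x common dim)
        proj = [sum(U[i][k] * diff[i] for i in range(len(diff)))
                for k in range(len(U[0]))]
        total += max(0.0, 1.0 - z * sum(wk * pk for wk, pk in zip(w, proj)))
    return total


def norm_21(w_s, w_t):
    """(2,1)-norm of W = [w_s, w_t]: sum over features of the norm of each row."""
    return sum(math.sqrt(a * a + b * b) for a, b in zip(w_s, w_t))


def objective(w_s, w_t, U, src_pairs, tgt_pairs, C, lam):
    return (hinge_sum(w_s, src_pairs, U)
            + C * hinge_sum(w_t, tgt_pairs, U)
            + lam * norm_21(w_s, w_t) ** 2)


# Hypothetical toy setting: identity projection, one preference pair per domain.
U = [[1.0, 0.0], [0.0, 1.0]]
w_s, w_t = [3.0, 0.0], [4.0, 0.0]
src_pairs = [([1.0, 0.0], [0.0, 0.0], +1)]   # margin 3: no loss
tgt_pairs = [([0.0, 1.0], [0.0, 0.0], +1)]   # margin 0: loss 1
print(objective(w_s, w_t, U, src_pairs, tgt_pairs, C=2.0, lam=0.01))
```

The row-wise norm encourages the two domains to rely on the same few dimensions of the common space, which is how information from the source domain can help ranking in the target domain.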

Page 36

Learning Algorithm

• Equivalent objective function:

Optimize the loss function for each domain

Common space discovery

Optimize the weight via the common space

Page 37

Experimental Results

• Data sets

– Homogeneous Data

• LETOR 2.0: TREC2003, TREC2004, and OHSUMED

– Heterogeneous Data

• Academic network consisting of 14,134 authors, 10,716 papers, and 1,434 conferences.

– Heterogeneous Tasks

• Expert finding vs. Bole search

• Baselines

– RSVM

– Language model

Page 38

Results on Homogeneous Data

Page 39

Results on Heterogeneous Data

Page 40

Results on Heterogeneous Tasks

• Expert finding versus Bole search (finding the best supervisor)

• To obtain the ground truth of bole for each query

– We sent emails to 50 senior researchers and 50 junior researchers (91.6% are postdocs or graduates)

– Averaged their feedback

Page 41

CT4: Social Influence Analysis (KDD’10, KDD’10, KDD’09, ICDM’09, JIS)

• How to quantify the influence between users?

• What is the relationship between users?

• How to discover topic distribution over links?

• Can we predict the user’s actions?

[Figure: example academic network of researchers (Dr. Tang, Limin, Prof. Wang, Prof. Li), papers (SVM..., Association..., Tree CRF..., Semantic..., EOS..., Annotation...), and conferences (IJCAI, ISWC, WWW), linked by write, cite, publish, coauthor, and PC member edges]

Page 42

Topic-based Social Influence Analysis

• Social network -> Topical influence network

Page 43

Social Influence Sub-graph on “Data mining”

Page 44

Influential nodes on different topics

Page 45

CT4: Social Influence Analysis (KDD’10, KDD’10, KDD’09, ICDM’09, JIS)

• How to quantify the influence between users?

• What is the relationship between users?

• How to discover topic distribution over links?

• Can we predict the user’s actions?

[Figure: example academic network of researchers (Dr. Tang, Limin, Prof. Wang, Prof. Li), papers (SVM..., Association..., Tree CRF..., Semantic..., EOS..., Annotation...), and conferences (IJCAI, ISWC, WWW), linked by write, cite, publish, coauthor, and PC member edges]

Page 46

Mining Advisor-Advisee Relationship from Research Publication Networks

Input: Temporal collaboration network

[Figure: co-authorship network among Ada, Bob, Jerry, Ying, and Smith, with edges labeled by years from 1999 to 2004]

Output: Relationship analysis

[Figure: the same researchers with candidate advisor-advisee edges labeled (score, [start year, end year]): (0.8,[1999,2000]), (0.7,[2000,2001]), (0.65,[2002,2004]), (0.2,[2001,2003]), (0.5,[/,2000]), (0.9,[/,1998]), (0.4,[/,1998]), (0.49,[/,1999])]

Visualized chronological hierarchies

Page 47

Results

Page 48

Application: visualization

TPFG

RULE

Page 49

Bole Search In Arnetminer

An example on a real system: Arnetminer

Performance improvement

Page 50

Results (cont.)

Page 51

CT4: Social Influence Analysis (KDD’10, KDD’10, KDD’09, ICDM’09, JIS)

• How to quantify the influence between users?

• What is the relationship between users?

• How to discover topic distribution over links?

• Can we predict the user’s actions?

[Figure: example academic network of researchers (Dr. Tang, Limin, Prof. Wang, Prof. Li), papers (SVM..., Association..., Tree CRF..., Semantic..., EOS..., Annotation...), and conferences (IJCAI, ISWC, WWW), linked by write, cite, publish, coauthor, and PC member edges]

Page 52

Examples – Topic distribution analysis over citations

Researcher A

• an in-depth understanding of the research field?

Original citation network VS. Semantic citation network

[Figure: the same set of papers shown twice, first as a plain citation network and then annotated with topics and citation relationship types]

Papers:

• Self-Indexing Inverted Files for Fast Text Retrieval
• Static Index Pruning for Information Retrieval Systems
• Signature files: An access Method for Documents and its Analytical Performance Evaluation
• Filtered Document Retrieval with Frequency-Sorted Indexes
• Vector-space Ranking with Effective Early Termination
• Efficient Document Retrieval in Main Memory
• A Document-centric Approach to Static Index Pruning in Text Retrieval Systems
• An Inverted Index Implementation
• Parameterised Compression for Sparse Bitmaps
• Introduction of Modern Information Retrieval
• Memory Efficient Ranking

Topics:

• Topic 31: Ranking and Inverted Index
• Topic 27: Information retrieval
• Topic 1: Theory
• Topic 21: Framework
• Topic 22: Compression
• Topic 23: Index method
• Topic 34: Parallel computing
• Other

Citation Relationship Type: Basic theory, Comparable work, Other

Page 53

Problem: Link Semantic Analysis

Topic modeling over links

Citation context words

Link semantics

54

Pairwise Restricted Boltzmann Machines (PRBMs)

• Latent variables defined over the link to bridge the two pages

[Figure: Pairwise Restricted Boltzmann Machines (PRBMs) example, with layers labeled link context words, topic distribution, and link category.]
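
To make the idea of latent variables defined over a link more concrete, the sketch below computes hidden-unit activation probabilities for a standard restricted Boltzmann machine whose visible layer is a bag-of-words vector for the link context. This is a generic RBM step under stated assumptions, not the PRBM formulation from the slides: the function name `hidden_probs`, the toy weights, and the three-word vocabulary are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(visible, weights, hidden_bias):
    """Standard RBM conditional: p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])."""
    probs = []
    for j, c_j in enumerate(hidden_bias):
        activation = c_j + sum(v_i * weights[i][j] for i, v_i in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical vocabulary for link context words: ["index", "ranking", "compression"]
link_context = [1, 1, 0]          # binary bag of words around one citation
W = [[2.0, -1.0],                 # toy weights: 3 visible units x 2 hidden units
     [1.0, 0.0],
     [-1.0, 2.0]]
c = [0.0, 0.0]                    # hidden biases

p = hidden_probs(link_context, W, c)
print([round(x, 3) for x in p])
```

In this toy setting each hidden unit plays the role of one latent topic over the link; a real model would learn `W` and `c` from data.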

55

Accuracy of Link Categorization

• gPRBM: our approach with generative learning
• dPRBM: our approach with discriminative learning
• hPRBM: our approach with hybrid learning

Tested on Arnetminer citation data
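
Accuracy here can be read as the fraction of links whose predicted category matches the labeled one. The following is a minimal sketch of that measure, assuming hypothetical label lists; the values are made up for illustration and are not results from the slides.

```python
def accuracy(predicted, gold):
    """Fraction of links whose predicted category equals the gold category."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one labeled link")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up labels for four citation links.
gold = ["Basic theory", "Comparable work", "Other", "Basic theory"]
pred = ["Basic theory", "Other", "Other", "Basic theory"]
acc = accuracy(pred, gold)
print(acc)
```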

56

CT4: Social Influence Analysis (KDD’10, KDD’10, KDD’09, ICDM’09, JIS)

• How to quantify the influence between users?

• What is the relationship between users?

• How to discover topic distribution over links?

• Can we predict the user’s actions?

[Figure: academic network in which researchers (Dr. Tang, Limin, Prof. Wang, Prof. Li) write papers (SVM..., Association..., Tree CRF..., Semantic..., EOS..., Annotation...), papers cite one another and are published at venues (IJCAI, ISWC, WWW); further edges are labeled coauthor and PC member.]

57

What can we do in SNS?

58

Social Action

• Twitter: comment on the Haiti earthquake
• Flickr: add favorites
• KDD: publish in the KDD conference

59

Action

Attribute:
1. Always watch news
2. Enjoy sports
3. …

Action: comment on Haiti Earthquake

[Figure: the network shown at time t and at time t+1.]
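
The setup on this slide, a user's attributes and observed actions at time t used to anticipate an action at time t+1, can be sketched with a toy logistic score. This is an illustration under assumptions, not the model from the talk: `predict_action`, the feature choice (the user's own attributes plus the share of friends who acted at time t), and all weights are hypothetical.

```python
import math


def predict_action(attributes, friends_acted, attr_weights, influence_weight, bias=0.0):
    """Toy estimate of P(user acts at time t+1).

    attributes:     binary attributes of the user (e.g. "always watches news")
    friends_acted:  0/1 flags, one per friend, for who acted at time t
    """
    # Share of friends who performed the action at time t (0.0 if no friends).
    share = sum(friends_acted) / len(friends_acted) if friends_acted else 0.0
    score = bias + influence_weight * share
    score += sum(w * a for w, a in zip(attr_weights, attributes))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical user: watches news (1), does not enjoy sports (0);
# two of four friends commented on the Haiti earthquake at time t.
p_next = predict_action([1, 0], [1, 1, 0, 0],
                        attr_weights=[1.0, 0.5], influence_weight=2.0, bias=-2.0)
print(round(p_next, 3))
```

Raising `influence_weight` makes the friends' behavior at time t count for more relative to the user's own attributes.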

60

Results

61

Arnetminer Today — A brief summary

62

ArnetMiner’s History

• 2006/5, V0.1 Perl-based CGI version
– Profile extraction, person/paper/conf. search
• 2006/8, V1.0 Java (Demo @ ASWC)
– Rewrote the above functions
• 2007/7, V2.0 (Demo @ KDD, ISWC)
– New: survey search, research interest, association search
• 2008/4, V3.0 (Demo @ WWW)
– Query understanding, new search GUI, log analysis
• 2008/11, V4.0 (Demo @ KDD, ICDM)
– Graph search, topic mining, NSFC/NSF
• 2009/4, V5.0 (Demo @ KDD)
– Bole/course search, profile editing, open resources, #citation
• 2009/12, V6.0
– Academic statistics, user feedback, refined ranking
• V7.0, coming soon
– Name disambiguation, reviewer assignment, supervisor suggestion, open API

63

ArnetMiner Today

* Arnetminer data:
  >0.6M researcher profiles
  >3M papers
  >17M citation relationships
  >5K conferences
  >50M logs
* Visits come from more than 190 countries
* Steady +20% increase in visits per month
* >100,000 page views per day

Top 10 countries

1. USA
2. China
3. Germany
4. India
5. UK
6. Canada
7. Japan
8. Taiwan
9. France
10. Italy

Messages from Users

… I’ve happened to visit your Arnetminer, and shocked. It was really impressive, its usefulness and your works!!! … [from …@selab.snu.ac.kr]

…I would first of all congratulate you on the excellent work you have done in Arnetminer and I am much inspired… [from …@nu.edu.pk]

… Arnetminer is one of my favorite tools to find folk and academic relatives… [from …@qlink.com]

Dear Dr. Jie Tang, Can you include our papers (http://www.waset.org) in your Arnetminer?... [from …@waset.org]

.. top top! I am very interested in your ArnetMiner. Is that possible give me a bit of your social network data… [from …@cse.ust.hk]

From a survey by the University of Southampton, UK

Title: Semantic Technologies for Learning and Teaching in Web 2.0. Authors: Thanassis Tiropanis, Hugh Davis, Dave Millard, Mark Weal

…Exposing the expertise of the institution to the outside world in order to attract funding and students. ArnetMiner is the most representative example of such tools at the moment…

Contextualised queries and searches, searches across repositories potentially in different departments or institutions, and matching of people for collaborative activities. Best example of the surveyed technologies to this end is ArnetMiner.

64

Opportunity: exploiting semantic web and social networks in the real world

[Figure: application areas built on data mining and social network techniques, each shown with its partner.]

• Social search & mining: social network extraction, social network mining (IBM)
• Scientific literature: users cover >180 countries, >600K researchers, >3M papers (Arnetminer.org)
• Advertisement: advertisement recommendation (Sohu)
• Mobile context: mobile search & recommendation (Nokia)
• Large-scale mining: scalable algorithms for message tagging and community discovery (Google)
• Energy trend analysis: energy product evolution, techniques trend (Oil Company)
• Content monitoring and analysis service platform for science and technology information resources (Information and Intelligence Research Institute, Ministry of Science and Technology of China)

Services: search, browsing, complex query, integration, collaboration, trustable analysis, decision support, intelligent services

Data: Web, relational data, ontological data, social data

Foundation: Data Mining and Social Network techniques

65

Arnetminer

PatentMiner CheMiner PubmedMiner ScopusMiner

66

Representative Publications

• Jie Tang, Jing Zhang, Ruoming Jin, Zi Yang, Keke Cai, Li Zhang, and Zhong Su. Topic Level Expertise Search over Heterogeneous Networks. Machine Learning Journal.
• Jie Tang, Limin Yao, Duo Zhang, and Jing Zhang. A Combination Approach to Web User Profiling. ACM TKDD, 2010.
• Juanzi Li, Jie Tang, Yi Li, Qiong Luo. RiMOM: A Dynamic Multi-Strategy Ontology Alignment Framework. IEEE TKDE, 2009.
• Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social Action Tracking via Noise Tolerant Time-varying Factor Graphs. KDD’10.
• Chi Wang, Jiawei Han, Yuntao Jia, Duo Zhang, Yintao Yu, Jie Tang, Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD’10.
• Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. KDD’09.
• Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08.
• Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang. Email Data Cleaning. KDD’05.
• Jie Tang, Ho-fung Leung, Qiong Luo, Dewei Chen, and Jibin Gong. Towards Ontology Learning from Folksonomies. IJCAI’09.
• Qian Zhong, Hanyu Li, Juanzi Li, Guotong Xie, Jie Tang, Lizhu Zhou. A Gauss Function based Approach for Unbalanced Ontology Matching. SIGMOD’09.
• Feng Shi, Juanzi Li, Jie Tang. Actively Learning Ontology Matching via User Interaction. ISWC’09.
• Chonghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, and Tiejun Zhao. A Unified Tagging Approach to Text Normalization. ACL’07.

Others: ICDM’07-09, CIKM’07-09, SDM’09, ISWC’06, DKE, JIS, etc.

67

Demo: http://arnetminer.org

Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/

Thanks!