CLUSTERING NETWORKED DATA BASED ON LINK AND SIMILARITY IN ACTIVE LEARNING Advisor : Sing Ling Lee Student : Yi Ming Chang Speaker : Yi Ming Chang

Dec 30, 2015


CLUSTERING NETWORKED DATA BASED ON LINK AND

SIMILARITY IN ACTIVE LEARNING

Advisor : Sing Ling Lee

Student : Yi Ming Chang

Speaker : Yi Ming Chang

OUTLINE

Introduction
    Active Learning
    Networked data

Related Work
    Newman’s Modularity
    Collective Classification (ICA)
    ALFNET

CLAL

Experimental Results

Conclusion

PASSIVE LEARNING

[Figure: a classifier is trained on the labeled instances (training data) and then classifies the unlabeled instances (testing data). Wrong : 5]

ACTIVE LEARNING

[Figure: the classifier is trained on the labeled nodes (training data), queries the labels of a batch of unlabeled nodes, and then classifies the testing data. EX : Query batch number = 3. Wrong : 2]
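
The query loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-D data, the `oracle` function, the midpoint-of-means `train` classifier, and the "closest to the boundary" query rule are all hypothetical stand-ins, not the classifier or the selection strategy used in the talk. Only the batch size of 3 comes from the slide.

```python
def train(labeled):
    """Toy classifier: decision boundary halfway between the two class means."""
    pos = [x for x, y in labeled if y == '+']
    neg = [x for x, y in labeled if y == '-']
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def active_learning(pool, oracle, seed, batch=3, budget=8):
    """Query labels in batches until the labeling budget is spent."""
    labeled = [(x, oracle(x)) for x in seed]      # seed must cover both classes
    unlabeled = [x for x in pool if x not in seed]
    while len(labeled) < budget and unlabeled:
        boundary = train(labeled)
        # Assumed query rule: ask about the points nearest the current boundary.
        unlabeled.sort(key=lambda x: abs(x - boundary))
        take = min(batch, budget - len(labeled))
        queried, unlabeled = unlabeled[:take], unlabeled[take:]
        labeled += [(x, oracle(x)) for x in queried]
    return train(labeled), labeled


# Hypothetical data: points 0..19, true boundary at 6.5.
oracle = lambda x: '+' if x >= 6.5 else '-'
boundary, labeled = active_learning(list(range(20)), oracle, seed=[0, 19])
```

With this toy setup the second batch of queries lands on the points around the true boundary, which is the intuition behind querying instead of labeling at random.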

NETWORK DATA

[Figure: a network of labeled and unlabeled nodes; the classifier is trained on the labeled nodes and classifies the unlabeled nodes.]

NEWMAN’S MODULARITY FOR CLUSTERING

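$m = 5$, $A_{ij}$ : Real edge, $k_i$ : Degree of node, $s_i \in \{1, -1\}$ : Group of node

$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$

Terms for node 1: $B_{12} s_1 s_2 + B_{13} s_1 s_3 + B_{14} s_1 s_4 + B_{15} s_1 s_5$

$B_{12} = (1 - 2 \cdot 2 / 10)$
$B_{13} = (0 - 2 \cdot 2 / 10)$
$B_{14} = (1 - 2 \cdot 3 / 10)$
$B_{15} = (0 - 2 \cdot 1 / 10)$

[Figure: example graph with nodes 1 to 5.]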

NEWMAN’S MODULARITY FOR CLUSTERING

Example :

$B_{12} = (1 - 5 \cdot 2 / 16) = 0.375$
$B_{13} = (0 - 5 \cdot 3 / 16) = -0.9375$
$B_{21} = (1 - 2 \cdot 5 / 16) = 0.375$
$B_{23} = (1 - 2 \cdot 3 / 16) = 0.625$
$B_{31} = (0 - 3 \cdot 5 / 16) = -0.9375$
$B_{32} = (1 - 3 \cdot 2 / 16) = 0.625$

Sum: $B_{12} s_1 s_2 + B_{13} s_1 s_3 + B_{21} s_2 s_1 + B_{23} s_2 s_3 + B_{31} s_3 s_1 + B_{32} s_3 s_2$

$0.625 + 0.625 > 0.375 + 0.375$
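
The arithmetic on the two modularity slides can be checked with a short script. It is a sketch, assuming the standard definition of Newman's modularity matrix entry (edge indicator minus k_i * k_j / 2m). The degrees are read off the products shown on the slides, the edge set {1-2, 2-3} is inferred from the leading 1s and 0s, and the function name is hypothetical.

```python
def modularity_entry(a_ij, k_i, k_j, m):
    """B_ij = A_ij - k_i * k_j / (2m)."""
    return a_ij - k_i * k_j / (2 * m)


# First slide: m = 5, node 1 has degree 2; (A_1j, k_j) for j = 2, 3, 4, 5.
first = [modularity_entry(a, 2, k_j, 5) for a, k_j in [(1, 2), (0, 2), (1, 3), (0, 1)]]

# Second slide: 2m = 16, degrees k_1 = 5, k_2 = 2, k_3 = 3.
k = {1: 5, 2: 2, 3: 3}
edges = {(1, 2), (2, 3)}  # inferred from the A_ij values in the example
second = {
    (i, j): modularity_entry(1 if (i, j) in edges or (j, i) in edges else 0, k[i], k[j], 8)
    for i in k for j in k if i != j
}

# Node 2 is tied more strongly to node 3 than to node 1.
prefers_node_3 = second[(2, 3)] + second[(3, 2)] > second[(1, 2)] + second[(2, 1)]
```

Here `second[(2, 3)]` evaluates to 0.625 and `second[(1, 2)]` to 0.375, matching the comparison at the bottom of the slide.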

NEWMAN’S MODULARITY FOR CLUSTERING

Maximizing $u^{T} B u$

Example: $u = (0.3, 0.1, -0.5)$ gives $s = (1, 1, -1)$
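
To make the maximization concrete, the sketch below scores every two-way split of a tiny graph. It assumes Newman's two-group modularity, Q = (1/4m) * sum over i, j of B_ij * s_i * s_j. The six-node graph (two triangles joined by one edge) is a hypothetical example, and the brute-force search stands in for the eigenvector-based step, which the extracted slide text does not spell out.

```python
from itertools import product


def q_value(adj, s):
    """Q = (1 / 4m) * sum_ij (A_ij - k_i k_j / 2m) * s_i * s_j."""
    n = len(adj)
    k = [sum(row) for row in adj]   # node degrees
    two_m = sum(k)                  # each edge is counted twice
    total = sum(
        (adj[i][j] - k[i] * k[j] / two_m) * s[i] * s[j]
        for i in range(n) for j in range(n)
    )
    return total / (2 * two_m)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edge_list = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edge_list:
    adj[i][j] = adj[j][i] = 1

# Exhaustive search over all sign vectors; only feasible for very small graphs.
best = max(product((1, -1), repeat=6), key=lambda s: q_value(adj, s))
```

The best split separates the two triangles. The exhaustive search is exponential in the number of nodes, which is why a spectral relaxation is normally used instead.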

COLLECTIVE CLASSIFICATION (ICA)

Iterative Classification Algorithm (ICA)

1. Train the Content-Only (CO) learner on the labeled nodes.

2. Compute the neighbor feature using CO, then train the Collective (CC) learner.

3. Iteration 1, 2, 3, ... : compute the neighbor feature using CC.

4. Repeat until stable or until the threshold number of iterations has elapsed.

CO input: feature (1 0 0 1 0 … 1)

CC input: feature (1 0 0 1 0 … 1) + neighbor feature (3/5 2/5 ..)

[Figure: a network of labeled (+, -) and unlabeled (?) nodes passed through the CO and CC learners.]
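
The ICA loop can be sketched as follows. This is a minimal illustration, not the learners used in the talk: `co_predict` and `cc_predict` are hypothetical stand-ins for trained CO and CC classifiers, the five-node graph is invented, and the neighbor feature is taken to be the fraction of neighbors carrying each label, as the 3/5 2/5 example suggests.

```python
from collections import Counter


def neighbor_feature(node, graph, labels, classes):
    """Fraction of the node's neighbors that currently carry each class label."""
    counts = Counter(labels[n] for n in graph[node])
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in classes]


def ica(graph, known, co_predict, cc_predict, classes, max_iter=10):
    """Bootstrap labels with CO, then refine with CC until stable or max_iter."""
    unknown = [n for n in graph if n not in known]
    labels = dict(known)
    for n in unknown:                       # bootstrap with content-only guesses
        labels[n] = co_predict(n)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        changed = False
        for n in unknown:
            new = cc_predict(n, neighbor_feature(n, graph, labels, classes))
            if new != labels[n]:
                labels[n], changed = new, True
        if not changed:                     # stable: a full pass changed nothing
            break
    return labels, iteration


# Hypothetical example: a and e are labeled, b, c, d are not.
graph = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c', 'e'], 'e': ['d']}
known = {'a': '+', 'e': '-'}
co_guess = {'b': '-', 'c': '+', 'd': '-'}   # pretend output of a CO classifier


def cc_predict(node, nf):
    """Toy CC rule: follow the neighbor majority, fall back to CO on a tie."""
    if nf[0] != nf[1]:
        return '+' if nf[0] > nf[1] else '-'
    return co_guess[node]


labels, iterations = ica(graph, known, co_guess.get, cc_predict, classes=['+', '-'])
```

In this toy run node b flips from - to + in the first pass and the second pass changes nothing, so the loop stops on stability rather than on the iteration threshold.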

CC PROBLEM

How to set threshold?

Infer neighbor feature (proportion of + and - neighbors) :

Iteration    Node 1 (+ -)    Node 2 (+ -)    Node 3 (+ -)
1            2/5 3/5         3/5 2/5         0/1 1/1
2            3/5 2/5         2/5 3/5         1/1 0/1
3            2/5 3/5         4/5 1/5         0/1 1/1
4            3/5 2/5         2/5 3/5         1/1 0/1
5            2/5 3/5         4/5 1/5         0/1 1/1

[Figure: a network of labeled and unlabeled nodes; unlabeled nodes 1, 2 and 3 change their predicted labels from one iteration to the next.]

ALFNET

1. Cluster the data into at least k clusters.

2. Pick k clusters based on size and initialize the Content-Only (CO) classifier.

[Figure: k of the clusters are selected to train the CO classifier (SVM).]

ALFNET

3. while (labeled nodes < budget)

3.1 Re-train the CO and CC classifiers

3.2 Pick k clusters based on score :

[Figure: k of the clusters are selected; the training set is used to train CO and CC.]

3.3 Pick an item from each cluster based on

[Figure: the training set is used to train CO and CC.]

ALFNET

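Three predictions per node: CO, CC, Main Label

Predicted category: Class A, Class B, Class C, Class D

Score: entropy over the proportion of the three classifiers that predicted each category

All three predict different categories: entropy(1/3) + entropy(1/3) + entropy(1/3) = 0.3662 * 3

Two of the three agree: entropy(2/3) + entropy(1/3) = 0.2703 + 0.3662

All three agree: entropy(3/3) = 0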
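
The numbers on this slide correspond to the term -p * ln(p) with a natural logarithm, which the sketch below reproduces. The function names are hypothetical, and the three entries of each list stand for the CO, CC and Main Label predictions.

```python
import math
from collections import Counter


def entropy_term(p):
    """Single entropy term -p * ln(p); defined as 0 for p = 0."""
    return -p * math.log(p) if p > 0 else 0.0


def disagreement(predictions):
    """Sum of entropy terms over the share of predictors voting for each class."""
    counts = Counter(predictions)
    n = len(predictions)
    return sum(entropy_term(c / n) for c in counts.values())


all_differ = disagreement(['A', 'B', 'C'])   # entropy(1/3) * 3
two_agree = disagreement(['A', 'A', 'B'])    # entropy(2/3) + entropy(1/3)
all_agree = disagreement(['A', 'A', 'A'])    # entropy(3/3)
```

A higher value means the three predictions disagree more, so the corresponding item is a more informative one to query.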

MODULARITY AND SIMILARITY

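EX:

Node 1 : 1 1 0 0
Node 2 : 1 0 0 0
Node 3 : 1 1 0 0
Node 4 : 0 0 1 1

[Equation: worked similarity terms for the feature vectors above; the fractions (denominators 4 and 16) are not recoverable from the extracted text.]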
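
The similarity measure used on this slide is not recoverable from the extracted text, so the sketch below swaps in cosine similarity purely as an assumption, to show how the four example feature vectors could be compared. The function name and the choice of measure are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Feature vectors of the four nodes in the example.
nodes = {1: [1, 1, 0, 0], 2: [1, 0, 0, 0], 3: [1, 1, 0, 0], 4: [0, 0, 1, 1]}

sim_13 = cosine(nodes[1], nodes[3])  # identical features
sim_12 = cosine(nodes[1], nodes[2])  # one shared feature
sim_14 = cosine(nodes[1], nodes[4])  # no shared feature
```

Under this assumed measure, nodes 1 and 3 are the most similar pair and nodes 1 and 4 share nothing, which is the kind of signal that can be combined with link structure.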

MAXIMUM Q

Maximizing

CLAL

[Figure: two groups of labeled and unlabeled nodes, each with a "training CO" step followed by "Query & classify".]

Until Labeled node > budget

TUNING AND GREEDY MECHANISM

[Figure: clusters of labeled and unlabeled (?) nodes, each with its own CO classifier. Steps shown: training CO, Query & classify, Retrain & Move.]

Move : Out-link > In-link

Reserve the greater COs

Moving priority : Out-link - In-link (3 -> 2 -> 1 -> 1)

Clustering priority : Low accuracy -> High accuracy
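
The moving rule can be sketched as a ranking of nodes by how many more links they have to other clusters than to their own. This is a sketch under assumptions: the graph, the cluster assignment and the function name are hypothetical, and the retraining and target-cluster choice that the slide hints at are left out.

```python
def move_candidates(graph, cluster_of):
    """Nodes with Out-link > In-link, ordered by (Out-link - In-link) descending."""
    candidates = []
    for node, neighbors in graph.items():
        in_links = sum(1 for n in neighbors if cluster_of[n] == cluster_of[node])
        out_links = len(neighbors) - in_links
        if out_links > in_links:
            candidates.append((out_links - in_links, node))
    # Largest gap first; ties broken by node name to keep the order deterministic.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return candidates


# Hypothetical graph: x sits in cluster 0 but links only into cluster 1.
graph = {
    'a': ['b', 'c', 'y'], 'b': ['a', 'c', 'y'], 'c': ['a', 'b'],
    'x': ['p', 'q', 'r'],
    'p': ['q', 'r', 'x', 'y'], 'q': ['p', 'r', 'x'], 'r': ['p', 'q', 'x'],
    'y': ['a', 'b', 'p'],
}
cluster_of = {'a': 0, 'b': 0, 'c': 0, 'x': 0, 'p': 1, 'q': 1, 'r': 1, 'y': 1}

order = move_candidates(graph, cluster_of)
```

Node x (gap 3) is moved before node y (gap 1), mirroring the 3 -> 2 -> 1 -> 1 priority on the slide.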

BACKGROUND

Networked data

Citation network : each node is a paper (Paper NO.) whose attributes are its words (word, word, …, word); a link means "cite".

Social network : each node is a person (Person name) whose attributes are its features (feature, feature, …, feature); a link means "friend".

APPENDIX

SVM

Training data sets : $\{(x_i, y_i)\},\ i = 1, 2, \dots, n,\quad x_i \in R^d,\ y_i \in \{1, -1\}$

[Figure: + and - training points separated by a hyperplane, shown with its margin for two candidate separations.]

CHALLENGE

Query efficiency from discriminative feature

[Table: papers (Paper name) described by word features, with per-word counts for Class 1 and Class 2 and the sum of the 2 classes. Values shown on the slide: 510, 400, 250, 250, 260, 180, 220, 100, 150; the cell layout is not recoverable.]

CC PROBLEM : HOW TO SET TERMINAL CONDITION?

Different iterations will obtain diverse results.

[Figure: nodes of classes A and B marked as CO predicted label, true label, or labeled. Each node has a local feature (F1, F2, … = 0, 1, 0, …) and an inferred neighbor feature (NF_A, NF_B), e.g. 3/5 2/5. The neighbor features passed to the CC classifier are shown for Iteration 1 and Iteration 2 (values on the slide: 2/3 1/3, 1/3 2/3, 4/5 1/5).]

ALFNET

[Flowchart boxes: Query and training CO; Query and training classifier; Compute; Compute; Iteration > ? (Y / N); Labeled node > Budget? (Y / N); Output]

REPRESENTATION AND CHALLENGE

In a citation network

[Figure: nodes connected by links.]

How to use link information?