Index Driven Selective Sampling for CBR Nirmalie Wiratunga Susan Craw Stewart Massie THE ROBERT GORDON UNIVERSITY ABERDEEN School of Computing

Mar 28, 2015


Page 1

Index Driven Selective Sampling for CBR

Nirmalie Wiratunga, Susan Craw, Stewart Massie

The Robert Gordon University, Aberdeen

School of Computing

Page 2

Overview

Selective sampling

Cluster creation using an index

Cluster and case utility scores

Evaluation

Page 3

Selective Sampling

[Diagram labels: unlabelled cases (pool); select interesting cases; selected cases; labelled cases; index; case-base]

• Relevance feedback
• Distance learning
• Patient monitoring

Page 4

Uncertainty and Representativeness

[Figure: labelled cases (+, -) and unlabelled cases (?)]

Page 5

Sampling Procedure

L = set of labelled cases
U = set of unlabelled cases
LOOP
    model <= create-domain-model(L)
    clusters <= create-clusters(model, L, U)
    k-clusters <= select-clusters(k, clusters, L, U)
    FOR 1 to Max-Batch-Size
        case <= select-case(k-clusters, L, U)
        L <= L ∪ get-label(case, oracle)
        U <= U \ case
UNTIL stopping-criterion
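The sampling procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the domain model is folded into the create_clusters callable, the stopping criterion is replaced by a fixed iteration limit, and every function name, parameter and the toy data in demo() are hypothetical rather than taken from the slides.

```python
from typing import Callable, Dict, Hashable, List, Optional, Set

Case = Hashable
Cluster = List[Case]


def selective_sampling(
    labelled: Dict[Case, str],
    unlabelled: Set[Case],
    oracle: Callable[[Case], str],
    create_clusters: Callable[[Dict[Case, str], Set[Case]], List[Cluster]],
    select_clusters: Callable[[int, List[Cluster]], List[Cluster]],
    select_case: Callable[[List[Cluster], Set[Case]], Optional[Case]],
    k: int = 1,
    max_batch_size: int = 2,
    max_iterations: int = 3,
) -> Dict[Case, str]:
    """Grow the labelled set L by querying the oracle for selected cases."""
    labelled = dict(labelled)      # work on copies of L and U
    unlabelled = set(unlabelled)
    # Assumption: a fixed iteration limit stands in for the stopping criterion.
    for _ in range(max_iterations):
        if not unlabelled:
            break
        # Assumption: create_clusters builds the domain model internally.
        clusters = create_clusters(labelled, unlabelled)
        k_clusters = select_clusters(k, clusters)
        for _ in range(max_batch_size):
            case = select_case(k_clusters, unlabelled)
            if case is None:       # nothing left to pick in these clusters
                break
            labelled[case] = oracle(case)   # L <= L ∪ get-label(case, oracle)
            unlabelled.discard(case)        # U <= U \ case
    return labelled


def demo() -> Dict[Case, str]:
    """Toy run: integer cases, a parity oracle, one cluster holding the pool."""
    return selective_sampling(
        labelled={0: "even"},
        unlabelled={1, 2, 3, 4, 5},
        oracle=lambda c: "even" if c % 2 == 0 else "odd",
        create_clusters=lambda lab, unl: [sorted(unl)],
        select_clusters=lambda k, clusters: clusters[:k],
        select_case=lambda ks, unl: next(
            (c for cluster in ks for c in cluster if c in unl), None
        ),
        k=1,
        max_batch_size=2,
        max_iterations=2,
    )
```

Passing the clustering and selection steps in as callables keeps the loop itself independent of how clusters and cases are ranked.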

Page 6

Overview

Selective sampling

Cluster creation using an index

Cluster and case utility scores

Evaluation

Page 7

Forming Clusters

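[Figure: an index over features f1, f2 and f3 (one split labelled < N / >= N) partitions the cases into five clusters, a to e. Cluster contents shown, in the order extracted:]

5 labelled (4X, 1Y), 6 unlabelled
0 labelled, 6 unlabelled
5 labelled (2X, 2Z, 1Y), 0 unlabelled
5 labelled (2X, 2Y, 1Z), 6 unlabelled
5 labelled (4Y, 1Z), 0 unlabelled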
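One way to picture cluster formation is to group labelled and unlabelled cases by the index partition they fall into. The sketch below assumes the index can be treated as a function from a case to a partition name; toy_index, its threshold N and its feature tests are hypothetical stand-ins, not the index shown on the slide.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Case = Dict[str, Any]

N = 10  # hypothetical numeric threshold for the "< N / >= N" style split


def toy_index(case: Case) -> str:
    """Hypothetical index: route a case to a named partition."""
    if case["f1"] == "yes":
        return "a" if case["f2"] < N else "b"
    return "c"


def form_clusters(
    index_leaf: Callable[[Case], str],
    labelled: List[Tuple[Case, str]],
    unlabelled: List[Case],
) -> Dict[str, Dict[str, list]]:
    """Group labelled and unlabelled cases by the partition they reach."""
    clusters: Dict[str, Dict[str, list]] = defaultdict(
        lambda: {"labelled": [], "unlabelled": []}
    )
    for case, label in labelled:
        clusters[index_leaf(case)]["labelled"].append((case, label))
    for case in unlabelled:
        clusters[index_leaf(case)]["unlabelled"].append(case)
    return dict(clusters)
```

Each resulting cluster records both its labelled cases with their classes and its unlabelled cases, which is the information the figure reports per cluster.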

Page 8

Analysing Clusters

[Figure: class labels (X, Y, Z) of the labelled cases within each cluster]

Page 9

Overview

Selective sampling

Cluster creation

Cluster and case utility scores

Evaluation

Page 10

Ranking Clusters - Cluster Utility Score
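The scoring formula for this slide is not part of the transcript. The conclusions say only that the cluster utility score captures uncertainty within clusters and uses entropy to weight it. As an illustration of that one ingredient, and not of the score itself, the sketch below computes the Shannon entropy of the class labels among a cluster's labelled cases; the function name is hypothetical, and the two example clusters reuse label counts from the Forming Clusters slide.

```python
import math
from collections import Counter
from typing import Iterable


def class_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the class labels in one cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # convention chosen here for a cluster with no labels
    return sum(
        (n / total) * math.log2(total / n) for n in counts.values()
    )


# Label counts taken from the Forming Clusters slide.
mostly_x = ["X"] * 4 + ["Y"]              # 5 labelled (4X, 1Y)
mixed = ["X"] * 2 + ["Z"] * 2 + ["Y"]     # 5 labelled (2X, 2Z, 1Y)

h_mostly_x = class_entropy(mostly_x)
h_mixed = class_entropy(mixed)
```

A cluster dominated by one class has lower entropy than a cluster with an even mix of classes, so entropy gives a simple measure of how uncertain the labelling inside a cluster is.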

Page 11

Ranking Cases - Case Utility Score

Page 12

Overview

Selective sampling

Cluster creation

Cluster and case utility scores

Evaluation

Page 13

Evaluation

Selection Heuristics
    Rnd: randomly select cluster and cases
    Rnd-Cluster: random cluster with highest ranked cases
    Rnd-Case: highest ranked cluster, random cases
    Informed-S: highest ranked cluster and cases
    Informed-M: highest ranked clusters and case

UCI ML (6 datasets)
    smaller data sets (Zoo, Iris, Lymph, Hep)
    medium data sets (house votes, breast cancer)
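The five selection heuristics differ only in whether the cluster and the cases are chosen by rank or at random. The sketch below makes that explicit under some assumptions: clusters arrive already ranked best-first, each cluster lists its cases best-first, and Informed-M is read as taking the top case from each of the highest ranked clusters. The function name and that reading of Informed-M are assumptions, not details confirmed by the slides.

```python
import random
from typing import List, Optional, Sequence


def select_batch(
    heuristic: str,
    ranked_clusters: Sequence[Sequence[str]],
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to batch_size cases according to the named heuristic."""
    rng = rng or random.Random(0)
    if heuristic == "Informed-M":
        # Assumed reading: top case from each of the top-ranked clusters.
        return [c[0] for c in ranked_clusters[:batch_size] if c]
    if heuristic in ("Rnd", "Rnd-Cluster"):
        cluster = rng.choice(list(ranked_clusters))   # random cluster
    elif heuristic in ("Rnd-Case", "Informed-S"):
        cluster = ranked_clusters[0]                  # highest ranked cluster
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    size = min(batch_size, len(cluster))
    if heuristic in ("Rnd", "Rnd-Case"):
        return rng.sample(list(cluster), size)        # random cases
    return list(cluster[:size])                       # highest ranked cases


# Hypothetical ranked clusters, best cluster first, best case first.
example_clusters = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
```

With example_clusters and a batch size of 2, Informed-S returns the two best cases of the first cluster, while Informed-M returns the best case of each of the first two clusters.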

Page 14

Experimental Design

[Diagram labels: index; case-base; sampling pool with increments (Inc, 2Inc, 3Inc, 4Inc, 5Inc); test set; kNN accuracy]

case base size = L + selected cases

selected cases = sampling iterations * Max-Batch-Size
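The two size relations and the kNN accuracy measurement can be illustrated with a small sketch. It assumes numeric feature vectors, squared Euclidean distance, majority voting and a non-empty case base; the slides do not state the distance measure or the value of k, so those choices and both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def case_base_size(initial_labelled: int, iterations: int, max_batch_size: int) -> int:
    """case base size = L + sampling iterations * Max-Batch-Size."""
    return initial_labelled + iterations * max_batch_size


def knn_accuracy(case_base: List[Example], test_set: List[Example], k: int = 1) -> float:
    """Fraction of test cases whose kNN majority vote matches the true label."""
    correct = 0
    for features, true_label in test_set:
        # Assumption: squared Euclidean distance over numeric features.
        neighbours = sorted(
            case_base,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], features)),
        )[:k]
        votes = Counter(label for _, label in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == true_label
    return correct / len(test_set)


# Hypothetical one-feature data for illustration only.
toy_case_base = [((0.0,), "X"), ((1.0,), "Y")]
toy_test_set = [((0.1,), "X"), ((0.9,), "Y"), ((0.4,), "Y")]
```

For example, 10 initial labelled cases followed by 3 sampling iterations with a batch size of 5 give a case base of 25 cases.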

Page 15

Results I

[Plots: accuracy on test set against sampling pool size (50 to 150). Panels: Zoo (accuracy axis 70 to 90) and Iris (accuracy axis 80 to 95).]

Legend: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S

Zoo (7C, 18F, A, P9)
Iris (3C, 4F, #+A, P3)

Page 16

Results II

[Plots: accuracy on test set against sampling pool size (50 to 150). Panels: Lymphography (accuracy axis 65 to 80) and Hepatitis (accuracy axis 80 to 84).]

Legend: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S

Lymphography (4C, 19F, #+A, P9)
Hepatitis (2C, 20F, A+?, P7)

Page 17

Results III

[Plots: accuracy on test set against sampling pool size (150 to 350). Panels: House Votes (accuracy axis 80 to 92) and Breast Cancer (accuracy axis 62 to 69).]

Legend: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S

House (2C, 16F, A+?, P3)
Breast (2C, 9F, A+?, P7)

Page 18

Conclusions

Developed a case selection mechanism exploiting case base partitions

Utility scores to rank clusters and cases
    ClUS captures uncertainty within clusters and uses entropy to further weight this score
    CaUS captures the impact on other cases

Significant improvement with informed selection on 6 data sets

The influence of votes, partitions and entropy needs further investigation