Contrast Data Mining: Methods and Applications. James Bailey, NICTA Victoria Laboratory and The University of Melbourne; Guozhu Dong, Wright State University. Presented at the IEEE International Conference on Data Mining (ICDM), October 28-31, 2007. An up-to-date version of this tutorial is available at http://www.csse.unimelb.edu.au/~jbailey/contrast

Emerging patterns based classifier

May 10, 2015


Education

Pavan Kumar

It gives several insights into designing robust, fast and accurate classifiers based on emerging patterns.
Transcript
Page 1: Emerging patterns based classifier

Contrast Data Mining: Methods and Applications

James Bailey, NICTA Victoria Laboratory and The University of Melbourne
Guozhu Dong, Wright State University
Presented at the IEEE International Conference on Data Mining (ICDM), October 28-31, 2007
An up-to-date version of this tutorial is available at http://www.csse.unimelb.edu.au/~jbailey/contrast

Page 2: Emerging patterns based classifier

IEEE ICDM 28-31 Oct. 07, Contrast Data Mining: Methods and Applications, James Bailey and Guozhu Dong

Contrast data mining - What is it?

Contrast - "To compare or appraise in respect to differences" (Merriam Webster Dictionary)
Contrast data mining - The mining of patterns and models contrasting two or more classes/conditions.

Page 3: Emerging patterns based classifier

Contrast Data Mining - Why?

"Sometimes it's good to contrast what you like with something else. It makes you appreciate it even more"
Darby Conley, Get Fuzzy, 2001

Page 4: Emerging patterns based classifier

What can be contrasted?

Objects at different time periods
  "Compare ICDM papers published in 2006-2007 versus those in 2004-2005"
Objects for different spatial locations
  "Find the distinguishing features of location x for human DNA, versus location x for mouse DNA"
Objects across different classes
  "Find the differences between people with brown hair, versus those with blonde hair"

Page 5: Emerging patterns based classifier

What can be contrasted? Cont.

Objects within a class
  "Within the academic profession, there are few people older than 80" (rarity)
  "Within the academic profession, there are no rich people" (holes)
  "Within computer science, most of the papers come from USA or Europe" (abundance)
Object positions in a ranking
  "Find the differences between high and low income earners"
Combinations of the above

Page 6: Emerging patterns based classifier

Alternative names for contrast data mining

Contrast = {change, difference, discriminator, classification rule, …}
Contrast data mining is related to topics such as:
  Change detection, class based association rules, contrast sets, concept drift, difference detection, discriminative patterns, (dis)similarity index, emerging patterns, gradient mining, high confidence patterns, (in)frequent patterns, top k patterns, …

Page 7: Emerging patterns based classifier

Characteristics of contrast data mining

Applied to multivariate data
Objects may be relational, sequential, graphs, models, classifiers, combinations of these
Users may want either
  To find multiple contrasts (all, or top k)
  A single measure for comparison
    • "The degree of difference between the groups (or models) is 0.7"

Page 8: Emerging patterns based classifier

Contrast characteristics Cont.

Representation of contrasts is important. Needs to be
  Interpretable, non redundant, potentially actionable, expressive
  Tractable to compute
Quality of contrasts is also important. Need
  Statistical significance, which can be measured in multiple ways
  Ability to rank contrasts is desirable, especially for classification

Page 9: Emerging patterns based classifier

How is contrast data mining used?

Domain understanding
  "Young children with diabetes have a greater risk of hospital admission, compared to the rest of the population"
Used for building classifiers
  Many different techniques - to be covered later
  Also used for weighting and ranking instances
Used in construction of synthetic instances
  Good for rare classes
Used for alerting, notification and monitoring
  "Tell me when the dissimilarity index falls below 0.3"

Page 10: Emerging patterns based classifier

Goals of this tutorial

Provide an overview of contrast data mining
Bring together results from a number of disparate areas.
  Mining for different types of data
    • Relational, sequence, graph, models, …
  Classification using discriminating patterns

Page 11: Emerging patterns based classifier

By the end of this tutorial you will be able to

Understand some principal techniques for representing contrasts and evaluating their quality
Appreciate some mining techniques for contrast discovery
Understand techniques for using contrasts in classification

Page 12: Emerging patterns based classifier

Don't have time to cover ..

String algorithms
Connections to work in inductive logic programming
Tree-based contrasts
Changes in data streams
Frequent pattern algorithms
Connections to granular computing
…

Page 13: Emerging patterns based classifier

Outline of the tutorial

Basic notions and univariate contrasts
Pattern and rule based contrasts
Contrast pattern based classification
Contrasts for rare class datasets
Data cube contrasts
Sequence based contrasts
Graph based contrasts
Model based contrasts
Common themes + open problems + summary

Page 14: Emerging patterns based classifier

Basic notions and univariate case

Feature selection and feature significance tests can be thought of as a basic contrast data mining activity.
  "Tell me the discriminating features"
    • Would like a single quality measure
    • Useful for feature ranking
Emphasis is less on finding the contrast and more on evaluating its power

Page 15: Emerging patterns based classifier

Sample Feature-Class Dataset

ID      Height (cm)   Class
1005    200           Sad
9006    137           Happy ☺
4327    120           Happy ☺
3325    …             …
9004    150           Happy ☺

Page 16: Emerging patterns based classifier

Discriminative power

Can assess discriminative power of Height feature by
  Information measures (signal to noise, information gain ratio, …)
  Statistical tests (t-test, Kolmogorov-Smirnov, Chi squared, Wilcoxon rank sum, …). Assessing whether
    • The mean of each class is the same
    • The samples for each class come from the same distribution
    • How well a dataset fits a hypothesis
No single test is best in all situations!

Page 17: Emerging patterns based classifier

Example Discriminative Power Test - Wilcoxon Rank Sum

Suppose n1 happy, and n2 sad instances
Sort the instances according to height value:
  h_1 <= h_2 <= h_3 <= … <= h_(n1+n2)
Assign a rank to each instance, indicating how many instances in the other class are less. For x in class A
  Rank(x) = |{y : class(y) <> A and height(y) < height(x)}|
For each class
  Compute the Ranksum = Sum(ranks of all its instances)
Null Hypothesis: The instances are from the same distribution
Consult statistical significance table to determine whether value of Ranksum is significant

Page 18: Emerging patterns based classifier

Rank Sum Calculation Example

ID     Height (cm)   Class      Rank
324    220           Happy ☺    3
481    210           Sad        2
660    190           Sad        2
321    177           Happy ☺    1
415    150           Sad        1
816    120           Happy ☺    0

Happy: RankSum = 3+1+0 = 4
Sad: RankSum = 2+2+1 = 5
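The rank definition and the two rank sums above can be checked with a few lines of Python. This is only a sketch: the heights and classes are the six instances of the example, and the function names `rank` and `rank_sums` are illustrative, not taken from the tutorial.

```python
# Heights (cm) and classes of the six instances in the rank sum example.
instances = [
    (220, "Happy"),
    (210, "Sad"),
    (190, "Sad"),
    (177, "Happy"),
    (150, "Sad"),
    (120, "Happy"),
]


def rank(height, label, data):
    """Number of instances in the other class with a strictly smaller height."""
    return sum(1 for h, c in data if c != label and h < height)


def rank_sums(data):
    """Sum the ranks of the instances of each class."""
    sums = {}
    for h, c in data:
        sums[c] = sums.get(c, 0) + rank(h, c, data)
    return sums


sums = rank_sums(instances)
print(sums)  # {'Happy': 4, 'Sad': 5}
```

Deciding whether a rank sum is significant still requires a significance table (or a statistics library), which this sketch does not attempt.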

Page 19: Emerging patterns based classifier

Wilcoxon Rank Sum Test Cont.

Non parametric (no normal distribution assumption)
Requires an ordering on the attribute values
Scaled value of Ranksum is equivalent to area under ROC curve for using the selected feature as a classifier
  Scaled value = Ranksum / (n1*n2)
[Figure: ROC curve; axes True Positive Rate (0% to 100%) and False Positive Rate (0% to 100%)]
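A small sketch of the scaling, assuming the six heights of the earlier Happy/Sad example and no tied values. Dividing a class's Ranksum by n1*n2 gives the fraction of (happy, sad) pairs in which the instance of that class is taller. This is the usual pairwise reading of the area under the ROC curve when "taller" is used to predict that class. The function name `scaled_ranksum` is illustrative.

```python
from fractions import Fraction

# Heights (cm) from the earlier example, split by class.
happy = [220, 177, 120]
sad = [210, 190, 150]


def scaled_ranksum(pos, neg):
    """Ranksum of `pos` divided by n1*n2: fraction of pairs where pos is taller."""
    ranksum = sum(1 for p in pos for n in neg if n < p)
    return Fraction(ranksum, len(pos) * len(neg))


auc_happy = scaled_ranksum(happy, sad)
auc_sad = scaled_ranksum(sad, happy)
print(auc_happy, auc_sad)  # 4/9 5/9
```

With no ties the two scaled values sum to 1; a value near 0.5 means the feature separates the classes poorly.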

Page 20: Emerging patterns based classifier

Discriminating with attribute values

Can alternatively focus on significance of attribute values, with either
1) Frequency/infrequency (high/low counts)
  Frequent in one class and infrequent in the other.
    • There are 50 happy people of height 200cm and only 2 sad people of height 200cm
2) Ratio (high ratio of support)
  Appears X times more in one class than the other
    • There are 25 times more happy people of height 200cm than sad people of height 200cm

Page 21: Emerging patterns based classifier

Attribute/Feature Conversion

Possible to form a new binary feature based on attribute value and then apply feature significance tests
Blur distinction between attribute and attribute value

150cm   200cm   …   Class
Yes     No      …   Happy ☺
No      Yes     …   Sad

Page 22: Emerging patterns based classifier

Discriminating Attribute Values in a Data Stream

Detecting changes in attribute values is an important focus in data streams
  Often focus on univariate contrasts for efficiency reasons
  Finding when change occurs (non stationary stream).
  Finding the magnitude of the change. E.g. How big is the distance between two samples of the stream?
  Useful for signaling necessity for model update or an impending fault or critical event

Page 23: Emerging patterns based classifier

Odds ratio and Risk ratio

Can be used for comparing or measuring effect size
Useful for binary data
Well known in clinical contexts
Can also be used for quality evaluation of multivariate contrasts (will see later)
A simple example given next

Page 24: Emerging patterns based classifier

Odds and risk ratio Cont.

ID   Gender (feature)   Exposed (event)
1    Male               Yes
2    Female             No
3    Male               No
4    …                  …

Page 25: Emerging patterns based classifier

Odds Ratio Example

Suppose we have 100 men and 100 women, and 70 men and 10 women have been exposed
Odds of exposure (male) = 0.7/0.3 = 2.33
Odds of exposure (female) = 0.1/0.9 = 0.11
Odds ratio = 2.33/0.11 = 21.2 (exactly 21 if the two odds are not rounded first)
Males have about 21 times the odds of exposure of females
Indicates exposure is much more likely for males than for females
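The arithmetic of this example, as a minimal sketch working from the raw counts; the function names are illustrative. Computing from counts avoids rounding the two odds before dividing, so the ratio comes out as 21 rather than 21.2.

```python
def odds(exposed, total):
    """Odds of exposure p / (1 - p), computed from counts."""
    return exposed / (total - exposed)


def odds_ratio(exposed_a, total_a, exposed_b, total_b):
    """Odds of exposure in group A divided by the odds in group B."""
    return odds(exposed_a, total_a) / odds(exposed_b, total_b)


male_odds = odds(70, 100)      # 70 exposed vs 30 not exposed
female_odds = odds(10, 100)    # 10 exposed vs 90 not exposed
ratio = odds_ratio(70, 100, 10, 100)
print(round(male_odds, 2), round(female_odds, 2), round(ratio, 2))
```

The sketch assumes neither group is fully exposed; `odds` divides by zero when `exposed == total`.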

Page 26: Emerging patterns based classifier

Relative Risk Example

Suppose we have 100 men and 100 women, and 70 men and 10 women have been exposed
Risk of exposure (male) = 70/100 = 0.7
Risk of exposure (female) = 10/100 = 0.1
The relative risk = 0.7/0.1 = 7
Men are 7 times more likely to be exposed than women
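The same counts give the relative risk directly. A minimal sketch with illustrative function names; the risk difference is included only because it is the quantity the later "risk difference patterns" are built on.

```python
def risk(exposed, total):
    """Risk of exposure: the fraction of the group that was exposed."""
    return exposed / total


def relative_risk(exposed_a, total_a, exposed_b, total_b):
    """Risk in group A divided by risk in group B."""
    return risk(exposed_a, total_a) / risk(exposed_b, total_b)


def risk_difference(exposed_a, total_a, exposed_b, total_b):
    """Risk in group A minus risk in group B."""
    return risk(exposed_a, total_a) - risk(exposed_b, total_b)


rr = relative_risk(70, 100, 10, 100)
rd = risk_difference(70, 100, 10, 100)
print(round(rr, 2), round(rd, 2))  # 7.0 0.6
```

For these counts the relative risk (7) is much smaller than the odds ratio (21); the two measures only agree closely when the event is rare in both groups.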

Page 27: Emerging patterns based classifier

Pattern/Rule Based Contrasts

Overview of "relational" contrast pattern mining
Emerging patterns and mining
  Jumping emerging patterns
  Computational complexity
  Border differential algorithm
    • Gene club + border differential
    • Incremental mining
  Tree based algorithm
  Projection based algorithm
  ZBDD based algorithm
Bioinformatic application: cancer study on microarray gene expression data

Page 28: Emerging patterns based classifier

Overview

Class based association rules (Cai et al 90, Liu et al 98, ...)
Version spaces (Mitchell 77)
Emerging patterns (Dong+Li 99) – many algorithms (later)
Contrast set mining (Bay+Pazzani 99, Webb et al 03)
Odds ratio rules & delta discriminative EP (Li et al 05, Li et al 07)
MDL based contrast (Siebes, KDD07)
Using statistical measures to evaluate group differences (Hilderman+Peckham 05, Webb 07)
Spatial contrast patterns (Arunasalam et al 05)
…… see references

Page 29: Emerging patterns based classifier

Classification/Association Rules

Classification rules -- special association rules (with just one item – class -- on RHS):
  X → C (s, c)
    • X is a pattern,
    • C is a class,
    • s is support,
    • c is confidence

Page 30: Emerging patterns based classifier

Version Space (Mitchell)

Version space: the set of all patterns consistent with given (D+, D-) – patterns separating D+, D-.
  The space is delimited by a specific & a general boundary.
  Useful for searching the true hypothesis, which lies somewhere b/w the two boundaries.
  Adding +ve examples to D+ makes the specific boundary more general; adding -ve examples to D- makes the general boundary more specific.
Common pattern/hypothesis language operators: conjunction, disjunction
Patterns/hypotheses are crisp; need to be generalized to deal with percentages; hard to deal with noise in data

Page 31: Emerging patterns based classifier

STUCCO, MAGNUM OPUS for contrast pattern mining

STUCCO (Bay+Pazzani 99)
  Mining contrast patterns X (called contrast sets) between k >= 2 groups: |supp_i(X) – supp_j(X)| >= minDiff
  Use Chi2 to measure statistical significance of contrast patterns
    • significance cut-off thresholds change, based on the level of the node and the local number of contrast patterns
  Max-Miner like search strategy, plus some pruning techniques
MAGNUM OPUS (Webb 01)
  An association rule mining method, using Max-Miner like approach (proposed before, and independently of, Max-Miner)
  Can mine contrast patterns (by limiting RHS to a class)
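The two tests applied to a candidate contrast set can be sketched as follows. This is not STUCCO itself: it omits the search, the pruning and the level-dependent significance cut-offs, and the counts and the `min_diff` value are made up for illustration. The chi-square statistic is the plain Pearson statistic for the 2 x k table of "X present / X absent" by group.

```python
from itertools import combinations


def supports(counts, sizes):
    """Support of X in each group: fraction of the group's rows containing X."""
    return [c / n for c, n in zip(counts, sizes)]


def is_large_difference(counts, sizes, min_diff):
    """True if some pair of groups differs in support by at least min_diff."""
    supp = supports(counts, sizes)
    return any(abs(a - b) >= min_diff for a, b in combinations(supp, 2))


def chi_square(counts, sizes):
    """Pearson chi-square statistic for the 2 x k table (X present / absent by group)."""
    total = sum(sizes)
    present = sum(counts)
    stat = 0.0
    for c, n in zip(counts, sizes):
        for observed, row_total in ((c, present), (n - c, total - present)):
            expected = row_total * n / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: X occurs in 70 of 100 rows of group 1 and 10 of 100 rows of group 2.
counts, sizes = [70, 10], [100, 100]
large = is_large_difference(counts, sizes, min_diff=0.3)
stat = chi_square(counts, sizes)
print(large, round(stat, 2))  # True 75.0
```

The statistic would then be compared against a chi-square critical value with k - 1 degrees of freedom; the sketch assumes X is present in some rows and absent in others, otherwise an expected count is zero.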

Page 32: Emerging patterns based classifier

Contrast patterns vs decision tree based rules

It has been recognized by several authors (e.g. Bay+Pazzani 99) that
  rules generated from decision trees can be good contrast patterns,
  but may miss many good contrast patterns.
Different contrast set mining algorithms have different thresholds
  Some have min support threshold
  Some have no min support threshold; low support patterns may be useful for classification etc

Page 33: Emerging patterns based classifier

Emerging Patterns

Emerging Patterns (EPs) are contrast patterns between two classes of data whose support changes significantly between the two classes. Change significance can be defined by:
  big support ratio: supp2(X) / supp1(X) >= minRatio
    (similar to RiskRatio; +: allowing patterns with small overall support)
  big support difference: |supp2(X) – supp1(X)| >= minDiff (as defined by Bay+Pazzani 99)
If supp2(X) / supp1(X) = infinity, then X is a jumping EP.
  A jumping EP occurs in some members of one class but never occurs in the other class.
Conjunctive language; extension to disjunctive EP later

Page 34: Emerging patterns based classifier

A typical EP in the Mushroom dataset

The Mushroom dataset contains two classes: edible and poisonous.
Each data tuple has several features such as: odor, ring-number, stalk-surface-below-ring, etc.
Consider the pattern
  {odor = none, stalk-surface-below-ring = smooth, ring-number = one}
Its support increases from 0.2% in the poisonous class to 57.6% in the edible class (a growth rate of 288).
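The growth rate quoted here is just the support ratio, 57.6% / 0.2% = 288. A minimal sketch of the ratio test follows; the handling of zero supports (0 when both are zero, infinity when only the first is zero) follows the usual convention for emerging patterns, and `min_ratio=10` is an arbitrary illustrative threshold.

```python
import math


def growth_rate(supp1, supp2):
    """Support ratio supp2/supp1, with conventions for zero supports."""
    if supp1 == 0:
        return 0.0 if supp2 == 0 else math.inf
    return supp2 / supp1


def classify(supp1, supp2, min_ratio):
    """Label a pattern as a jumping EP, an EP, or neither, for a given minRatio."""
    rate = growth_rate(supp1, supp2)
    if math.isinf(rate):
        return "jumping EP"
    return "EP" if rate >= min_ratio else "not an EP"


# Supports of the Mushroom pattern: 0.2% (poisonous) and 57.6% (edible).
rate = growth_rate(0.002, 0.576)
print(round(rate))                               # 288
print(classify(0.002, 0.576, min_ratio=10))      # EP
print(classify(0.0, 0.5, min_ratio=10))          # jumping EP
```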

Page 35: Emerging patterns based classifier

Example EP in microarray data for cancer

Jumping EP: Patterns w/ high support ratio b/w data classes
E.G. {g1=L, g2=H, g3=L}; suppN=50%, suppC=0

Normal Tissues (binned data)
g1   g2   g3   g4
L    H    L    H
L    H    L    L
H    L    L    H
L    H    H    L

Cancer Tissues (binned data)
g1   g2   g3   g4
H    H    L    H
L    H    H    H
L    L    L    H
H    H    H    L
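The supports quoted for {g1=L, g2=H, g3=L} can be recomputed from the binned rows. The sketch below assumes four normal and four cancer tissues over genes g1..g4, with row values as read from the slide's two tables; the helper name `support` is illustrative.

```python
# Binned expression levels (L = low, H = high) for four genes per tissue.
normal = [
    {"g1": "L", "g2": "H", "g3": "L", "g4": "H"},
    {"g1": "L", "g2": "H", "g3": "L", "g4": "L"},
    {"g1": "H", "g2": "L", "g3": "L", "g4": "H"},
    {"g1": "L", "g2": "H", "g3": "H", "g4": "L"},
]
cancer = [
    {"g1": "H", "g2": "H", "g3": "L", "g4": "H"},
    {"g1": "L", "g2": "H", "g3": "H", "g4": "H"},
    {"g1": "L", "g2": "L", "g3": "L", "g4": "H"},
    {"g1": "H", "g2": "H", "g3": "H", "g4": "L"},
]


def support(pattern, rows):
    """Fraction of rows that agree with every gene=level pair in the pattern."""
    matches = sum(1 for row in rows if all(row[g] == v for g, v in pattern.items()))
    return matches / len(rows)


pattern = {"g1": "L", "g2": "H", "g3": "L"}
supp_n = support(pattern, normal)
supp_c = support(pattern, cancer)
is_jumping = supp_n > 0 and supp_c == 0
print(supp_n, supp_c, is_jumping)  # 0.5 0.0 True
```

The pattern is a jumping EP for the normal class because it matches some normal tissues and no cancer tissue.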

Page 36: Emerging patterns based classifier

Top support minimal jumping EPs for colon cancer

Colon Cancer EPs
{1+ 4- 112+ 113+} 100%
{1+ 4- 113+ 116+} 100%
{1+ 4- 113+ 221+} 100%
{1+ 4- 113+ 696+} 100%
{1+ 108- 112+ 113+} 100%
{1+ 108- 113+ 116+} 100%
{4- 108- 112+ 113+} 100%
{4- 109+ 113+ 700+} 100%
{4- 110+ 112+ 113+} 100%
{4- 112+ 113+ 700+} 100%
{4- 113+ 117+ 700+} 100%
{1+ 6+ 8- 700+} 97.5%

Colon Normal EPs
{12- 21- 35+ 40+ 137+ 254+} 100%
{12- 35+ 40+ 71- 137+ 254+} 100%
{20- 21- 35+ 137+ 254+} 100%
{20- 35+ 71- 137+ 254+} 100%
{5- 35+ 137+ 177+} 95.5%
{5- 35+ 137+ 254+} 95.5%
{5- 35+ 137+ 419-} 95.5%
{5- 137+ 177+ 309+} 95.5%
{5- 137+ 254+ 309+} 95.5%
{7- 21- 33+ 35+ 69+} 95.5%
{7- 21- 33+ 69+ 309+} 95.5%
{7- 21- 33+ 69+ 1261+} 95.5%

EPs from Mao+Dong 2005 (gene club + border-diff).
Colon cancer dataset (Alon et al, 1999 (PNAS)): 40 cancer tissues, 22 normal tissues. 2000 genes
These EPs have 95%--100% support in one class but 0% support in the other class.
Minimal: Each proper subset occurs in both classes.
Very few 100% support EPs.
There are ~1000 items with supp >= 80%.

Page 37: Emerging patterns based classifier

A potential use of minimal jumping EPs

Minimal jumping EPs for normal tissues
  → Properly expressed gene groups important for normal cell functioning, but destroyed in all colon cancer tissues
  → Restore these → cure colon cancer?
Minimal jumping EPs for cancer tissues
  → Bad gene groups that occur in some cancer tissues but never occur in normal tissues
  → Disrupt these → cure colon cancer?
→ Possible targets for drug design?
Li+Wong 2002 proposed "gene therapy using EP" idea: therapy aims to destroy bad JEP & restore good JEP

Page 38: Emerging patterns based classifier

Usefulness of Emerging Patterns

EPs are useful
  for building highly accurate and robust classifiers, and for improving other types of classifiers
  for discovering powerful distinguishing features between datasets.
Like other patterns composed of conjunctive combination of elements, EPs are easy for people to understand and use directly.
EPs can also capture patterns about change over time.
Papers using EP techniques in Cancer Cell (cover, 3/02).
Emerging Patterns have been applied in medical applications for diagnosing acute Lymphoblastic Leukemia.

Page 39: Emerging patterns based classifier

The landscape of EPs on the support plane, and challenges for mining

Landscape of EPs
[Figure: the support plane, with axes Sup D1(X) and Sup D2(X) each running from the origin O to 1, and regions labelled A, B and C]

Challenges for EP mining
  • EP minRatio constraint is neither monotonic nor anti-monotonic (but exceptions exist for special cases)
  • Requires smaller support thresholds than those used for frequent pattern mining

Page 40: Emerging patterns based classifier

Odds Ratio and Relative Risk Patterns [Li and Wong PODS06]

May use odds ratio/relative risk to evaluate compound factors as well
  May be no single factor with high relative risk or odds ratio, but a combination of factors
    • Relative risk patterns - Similar to emerging patterns
    • Risk difference patterns - Similar to contrast sets
    • Odds ratio patterns

Page 41: Emerging patterns based classifier

Mining Patterns with High Odds Ratio or Relative Risk

The spaces of odds ratio patterns and relative risk patterns are not convex in general
They can become convex if stratified into plateaus, based on support levels

Page 42: Emerging patterns based classifier

EP Mining Algorithms

Complexity result (Wang et al 05)
Border-differential algorithm (Dong+Li 99)
Gene club + border differential (Mao+Dong 05)
Constraint-based approach (Zhang et al 00)
Tree-based approach (Bailey et al 02, Fan+Kotagiri 02)
Projection based algorithm (Bailey et al 03)
ZBDD based method (Loekito+Bailey 06).

Page 43: Emerging patterns based classifier

Complexity result

The complexity of finding emerging patterns (even those with the highest frequency) is MAX SNP-hard.
This implies that polynomial time approximation schemes do not exist for the problem unless P=NP.

Page 44: Emerging patterns based classifier

Borders are concise representations of convex collections of itemsets

< minB={12,13}, maxB={12345,12456} >
represents the collection
  12, 13
  123, 124, 125, 126, 134, 135
  1234, 1235, 1245, 1246, 1256, 1345
  12345, 12456

A collection S is convex:
  If for all X, Y, Z (X in S, Y in S, X subset Z subset Y) then Z in S.
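To see how compact the border is, the collection it stands for can be enumerated by brute force. The sketch below is illustrative only (the function name `expand_border` is not from the tutorial) and treats each itemset as a set of single-character items.

```python
from itertools import combinations


def expand_border(min_b, max_b):
    """All itemsets Z with X subset-of Z subset-of Y for some X in min_b, Y in max_b."""
    collection = set()
    for low in min_b:
        for high in max_b:
            if not low <= high:
                continue  # this pair bounds nothing
            free = sorted(high - low)
            for size in range(len(free) + 1):
                for extra in combinations(free, size):
                    collection.add(low | frozenset(extra))
    return collection


min_b = [frozenset("12"), frozenset("13")]
max_b = [frozenset("12345"), frozenset("12456")]
collection = expand_border(min_b, max_b)
print(len(collection))  # 16
print(sorted("".join(sorted(s)) for s in collection if len(s) == 3))
# ['123', '124', '125', '126', '134', '135']
```

Four border itemsets stand for sixteen itemsets here; the saving grows exponentially with the gap between the two bounds, which is why the mining algorithms work on borders rather than on the expanded collection.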

Page 45: Emerging patterns based classifier

Border-Differential Algorithm

<{{}},{1234}> - <{{}},{23,24,34}> = <{1,234},{1234}>

Lattice of subsets of 1234:
{}
1, 2, 3, 4
12, 13, 14, 23, 24, 34
123, 124, 134, 234
1234

Good for: Jumping EPs; EPs in “rectangle regions,” …

Algorithm:
• Use iterations of expansion & minimization of “products” of differences
• Use tree to speed up minimization
• Find minimal subsets of 1234 that are not subsets of 23, 24, 34.
• {1,234} = min ({1,4} X {1,3} X {1,2})

Iterative expansion & minimization can be viewed as optimized Berge hypergraph transversal algorithm
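The expansion and minimization steps can be sketched in a few lines of Python. This is a plain illustration of the idea under the assumption that itemsets are `frozenset`s; it omits the tree-based speed-up mentioned on the slide, and the names `minimize` and `border_differential` are hypothetical.

```python
def minimize(sets):
    """Keep only the sets that have no proper subset in the collection."""
    sets = set(sets)
    return {s for s in sets if not any(t < s for t in sets)}

def border_differential(top, others):
    """Minimal subsets of `top` that are not subsets of any set in `others`."""
    # A qualifying itemset must contain an item of every difference top - n.
    differences = [top - n for n in others]
    partial = {frozenset()}
    for diff in differences:
        # Expansion: extend each partial choice by one item of this difference.
        expanded = {p | {item} for p in partial for item in diff}
        # Minimization: discard non-minimal candidates before the next round.
        partial = minimize(expanded)
    return partial

top = frozenset({1, 2, 3, 4})
others = [frozenset({2, 3}), frozenset({2, 4}), frozenset({3, 4})]
left_bound = border_differential(top, others)
print(left_bound)
```

On the slide's example the differences are {1,4}, {1,3} and {1,2}, and the loop ends with the left bound {1} and {2,3,4}. Minimizing after every expansion keeps the intermediate product small.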

Page 46: Emerging patterns based classifier

Gene club + Border Differential

Border-differential can handle up to 75 attributes (using 2003 PC)
For microarray gene expression data, there are thousands of genes.
(Mao+Dong 05) used border-differential after finding many gene clubs -- one gene club per gene.
A gene club is a set of k genes strongly correlated with a given gene and the classes.
Some EPs discovered using this method were shown earlier. Discovered more EPs with near 100% support in cancer or normal, involving many different genes. Much better than earlier results.

Page 47: Emerging patterns based classifier

Tree-based algorithm for JEP mining

Use tree to compress data and patterns.
Tree is similar to FP tree, but it stores two counts per node (one per class) and uses different item ordering
Nodes with non-zero support for positive class and zero support for negative class are called base nodes.
For every base node, the path’s itemset is a potential JEP. Gather negative data containing root item and item for base nodes on the path. Call border differential.
Item ordering is important. Hybrid (support ratio ordering first for a percentage of items, frequency ordering for other items) is best.

Page 48: Emerging patterns based classifier

Projection based algorithm

Form dataset H containing the differences {p-ni | i=1…k}.
p is a positive transaction, n1, …, nk are negative transactions.
Find minimal transversals of hypergraph H, i.e. the smallest sets intersecting every edge (equivalent to the smallest subsets of p not contained in any ni).
Let x1<…<xm be increasing item frequency (in H) ordering.
For i=1 to m
  let Hxi be H with all items y > xi projected out & all transactions containing xi removed (data projection).
  remove non minimal transactions in Hxi.
  if Hxi is small, apply border differential
  Otherwise, apply the algorithm on Hxi.

Let H be:
a b c d (edge 1)
b e d (edge 2)
b c e (edge 3)
c d e (edge 4)
Item ordering: a < b < c < d < e
Ha is H with all items > a (red items) projected out and also edge with a removed, so Ha={}.

Page 49: Emerging patterns based classifier

ZBDD based algorithm to mine disjunctive emerging patterns

Disjunctive Emerging Patterns: allowing disjunction as well as conjunction of simple attribute conditions.
e.g. Precipitation = ( gt-norm OR lt-norm ) AND Internal discoloration = ( brown OR black )
Generalization of EPs
ZBDD based algorithm uses Zero Suppressed Binary Decision Diagram for efficiently mining disjunctive EPs.

Page 50: Emerging patterns based classifier

Binary Decision Diagrams (BDDs)

Popular in boolean SAT solvers and reliability eng.
Canonical DAG representations of boolean formulae
Node sharing: identical nodes are shared
Caching principle: past computation results are automatically stored and can be retrieved
Efficient BDD implementations available, e.g. CUDD (U of Colorado)

[Figure: BDD for f = (c ∧ a) ∨ (d ∧ a), over the nodes c, d, a with terminals 1 and 0. A dotted (or 0) edge means: don’t link the nodes (in formulae).]

Page 51: Emerging patterns based classifier

ZBDD Representation of Itemsets

Zero-suppressed BDD, ZBDD: A BDD variant for manipulation of item combinations
E.g. Building a ZBDD for {{a,b,c,e},{a,b,d,e},{b,c,d}}
Ordering: c < d < a < e < b
Uz = ZBDD set-union

[Figure: ZBDDs combined step by step: {{a,b,c,e}} Uz {{a,b,d,e}} = {{a,b,c,e},{a,b,d,e}}, then Uz {{b,c,d}} = {{a,b,c,e},{a,b,d,e},{b,c,d}}.]

Page 52: Emerging patterns based classifier

ZBDD based mining example

Use solid paths in ZBDD(Dn) to generate candidates, and use Bitmap of Dp to check frequency support in Dp.

Ordering: a<c<d<e<b<f<g<h

Dp:
A1  A2  A3
a   e   g
a   d   i
b   f   h
c   e   h

Dn:
A1  A2  A3
a   f   g
b   d   h
b   f   h
c   e   g

Bitmap:
     a b c d e f g h i
P1:  1 0 0 0 1 0 1 0 0
P2:  1 0 0 1 0 0 0 0 1
P3:  0 1 0 0 0 1 0 1 0
P4:  0 0 1 0 1 0 0 1 0
N1:  1 0 0 0 0 1 1 0 0
N2:  0 1 0 1 0 0 0 1 0
N3:  0 1 0 0 0 1 0 1 0
N4:  0 0 1 0 1 0 1 0 0

[Figure: ZBDD(Dn) over the nodes a, c, d, e, b, f, g, h with terminal 1.]
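The bitmap check on its own can be sketched as follows. This is only an illustration of support counting with bit masks, using the transactions read off the bitmap rows of the example; candidate generation from ZBDD paths is not shown, and the helper names are assumptions.

```python
ITEMS = "abcdefghi"

def to_bitmap(transaction):
    """Encode a transaction as an integer with one bit per item."""
    return sum(1 << ITEMS.index(item) for item in transaction)

def support(candidate, bitmaps):
    """Count the transactions whose bitmap contains every bit of the candidate."""
    mask = to_bitmap(candidate)
    return sum(1 for row in bitmaps if row & mask == mask)

# Rows P1..P4 and N1..N4 of the example, written as item strings.
dp = [to_bitmap(t) for t in ("aeg", "adi", "bfh", "ceh")]
dn = [to_bitmap(t) for t in ("afg", "bdh", "bfh", "ceg")]

print(support("ae", dp), support("ae", dn))
```

A candidate such as {a,e} is then tested with one AND and one comparison per transaction, which is why a bitmap suits the frequency check.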

Page 53: Emerging patterns based classifier

Contrast pattern based classification -- history

Contrast pattern based classification: Methods to build or improve classifiers, using contrast patterns
CBA (Liu et al 98)
CAEP (Dong et al 99)
Instance based method: DeEPs (Li et al 00, 04)
Jumping EP based (Li et al 00), Information based (Zhang et al 00), Bayesian based (Fan+Kotagiri 03), improving scoring for >=3 classes (Bailey et al 03)
CMAR (Li et al 01)
Top-ranked EP based PCL (Li+Wong 02)
CPAR (Yin+Han 03)
Weighted decision tree (Alhammady+Kotagiri 06)
Rare class classification (Alhammady+Kotagiri 04)
Constructing supplementary training instances (Alhammady+Kotagiri 05)
Noise tolerant classification (Fan+Kotagiri 04)
EP length based 1-class classification of rare cases (Chen+Dong 06)
…
Most follow the aggregating approach of CAEP.

Page 54: Emerging patterns based classifier

EP-based classifiers: rationale

Consider a typical EP in the Mushroom dataset, {odor = none, stalk-surface-below-ring = smooth, ring-number = one}; its support increases from 0.2% in “poisonous” to 57.6% in “edible” (growth rate = 288).
Strong differentiating power: if a test T contains this EP, we can predict T as edible with high confidence 99.6% = 57.6/(57.6+0.2)
A single EP is usually sharp in telling the class of a small fraction (e.g. 3%) of all instances.
Need to aggregate the power of many EPs to make the classification.
EP based classification methods often outperform state-of-the-art classifiers, including C4.5 and SVM. They are also noise tolerant.

Page 55: Emerging patterns based classifier

CAEP (Classification by Aggregating Emerging Patterns)

Given a test case T, obtain T’s scores for each class, by aggregating the discriminating power of EPs contained by T; assign the class with the maximal score as T’s class.
The discriminating power of EPs is expressed in terms of supports and growth rates. Prefer large supRatio, large support
The contribution of one EP X (support weighted confidence):
strength(X) = sup(X) * supRatio(X) / (supRatio(X)+1)
Given a test T and a set E(Ci) of EPs for class Ci, the aggregate score of T for Ci is
score(T, Ci) = Σ strength(X) (over X of Ci matching T)
For each class, using median (or 85%) aggregated value to normalize to avoid bias towards class with more EPs
Compare CMAR: Chi2 weighted Chi2

Page 56: Emerging patterns based classifier

How CAEP works? An example

Given a test T={a,d,e}, how to classify T?

[Table: transactions of Class 1 (D1) and Class 2 (D2) over the items a to e.]

● T contains EPs of class 1: {a,e} (50%:25%) and {d,e} (50%:25%), so Score(T, class1) = 0.5*[2/(2+1)] + 0.5*[2/(2+1)] = 0.67
● T contains EPs of class 2: {a,d} (25%:50%), so Score(T, class 2) = 0.33;
● T will be classified as class 1 since Score1>Score2
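The arithmetic of this example can be reproduced with a short Python sketch of the CAEP score. The function names and the tuple layout (itemset, support in the EP's own class, support in the other class) are assumptions made for illustration; treating a zero opposing support as an infinite growth rate with strength equal to the support is also an assumption, and the per-class normalization step is left out.

```python
def strength(support, growth_rate):
    """Support-weighted confidence of one EP: sup * ratio / (ratio + 1)."""
    if growth_rate == float("inf"):
        return support  # assumption: ratio / (ratio + 1) is taken as 1 here
    return support * growth_rate / (growth_rate + 1)

def caep_score(test, eps):
    """Sum the strengths of the EPs that are contained in the test instance."""
    total = 0.0
    for itemset, sup_home, sup_other in eps:
        if itemset <= test:
            ratio = float("inf") if sup_other == 0 else sup_home / sup_other
            total += strength(sup_home, ratio)
    return total

test = frozenset("ade")
eps_class1 = [(frozenset("ae"), 0.50, 0.25), (frozenset("de"), 0.50, 0.25)]
eps_class2 = [(frozenset("ad"), 0.50, 0.25)]

score1 = caep_score(test, eps_class1)
score2 = caep_score(test, eps_class2)
print(round(score1, 2), round(score2, 2))
```

Each of the three EPs has support 0.5 in its own class and growth rate 2, so each contributes 0.5 * 2/3; class 1 wins because two of its EPs match T.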

Page 57: Emerging patterns based classifier

DeEPs (Decision-making by Emerging Patterns)

An instance based (lazy) learning method, like k-NN; but does not use normal distance measure.
For a test instance T, DeEPs
First project each training instance to contain only items in T
Discover EPs from the projected data
Then use these EPs to select training data that match some discovered EPs
Finally, use the proportional size of matching data in a class C as T’s score for C
Advantage: disallow similar EPs to give duplicate votes!

Page 58: Emerging patterns based classifier

DeEPs: Play-Golf example (data projection)

Test = {sunny, mild, high, true}

Original data
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
rain      cool         normal    true   N
sunny     mild         high      false  N
rain      mild         high      true   N
overcast  hot          high      FALSE  P
rain      mild         high      FALSE  P
rain      cool         normal    FALSE  P
overcast  cool         normal    TRUE   P
sunny     cool         normal    FALSE  P
rain      mild         normal    FALSE  P
sunny     mild         normal    TRUE   P
overcast  mild         high      TRUE   P
overcast  hot          normal    FALSE  P

Projected data (a dash marks a value projected out)
Outlook   Temperature  Humidity  Windy  Class
sunny     -            high      -      N
sunny     -            high      true   N
-         -            -         true   N
sunny     mild         high      -      N
-         mild         high      true   N
-         -            high      -      P
-         mild         high      -      P
-         -            -         -      P
-         -            -         TRUE   P
sunny     -            -         -      P
-         mild         -         -      P
sunny     mild         -         TRUE   P
-         mild         high      TRUE   P
-         -            -         -      P

Discover EPs and derive scores using the projected data
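The projection step of DeEPs can be sketched as below. This covers only the first step (restricting each training instance to the values it shares with the test instance); EP discovery and scoring are not shown. The function name `project` and the use of `None` for a projected-out value are illustrative assumptions, and only four of the Play-Golf rows are included.

```python
def project(instance, test):
    """Keep a training value only where it equals the test instance's value."""
    return tuple(v if v == t else None for v, t in zip(instance, test))

# Attribute order: Outlook, Temperature, Humidity, Windy.
test = ("sunny", "mild", "high", "true")
training = [
    (("sunny", "hot", "high", "false"), "N"),
    (("rain", "mild", "high", "true"), "N"),
    (("overcast", "hot", "high", "false"), "P"),
    (("sunny", "mild", "normal", "true"), "P"),
]

projected = [(project(x, test), label) for x, label in training]
for row in projected:
    print(row)
```

Because projection happens per test instance, the search for EPs afterwards runs over a much smaller item space than the full training data.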

Page 59: Emerging patterns based classifier

PCL (Prediction by Collective Likelihood)

Let X1,…,Xm be the m (e.g. 1000) most general EPs in descending support order.
Given a test case T, consider the list of all EPs that match T. Divide this list by EP’s class, and list them in descending support order:
P class: Xi1, …, Xip
N class: Xj1, …, Xjn
Use k (e.g. 15) top ranked matching EPs to get score for T for the P class (similarly for N):
Score(T,P) = Σ (t=1 to k) suppP(Xit) / suppP(Xt)
where suppP(Xt) is the normalizing factor
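A small sketch of the PCL score follows. It reads the formula as: the t-th best matching EP of a class is divided by the t-th best EP of that class overall. That reading, the function name and all support values below are assumptions chosen for illustration, not figures from the tutorial.

```python
def pcl_score(matching_supports, top_supports, k):
    """Sum of supp(X_it) / supp(X_t) over the first k ranked EPs.

    matching_supports: supports of the class's EPs that match T, descending.
    top_supports: supports of the class's top-ranked EPs overall, descending.
    """
    pairs = list(zip(matching_supports, top_supports))[:k]
    return sum(match / top for match, top in pairs)

# Hypothetical supports, each list already in descending order.
score_p = pcl_score([0.9, 0.6, 0.3], [0.9, 0.8, 0.6], k=3)
score_n = pcl_score([0.4, 0.2], [0.8, 0.5, 0.4], k=3)
print(score_p, score_n)
```

Dividing by the top-ranked supports puts both classes on a comparable scale, so a class is not favoured merely because its EPs have higher supports in general.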

Page 60: Emerging patterns based classifier

Emerging pattern selection factors

There are many EPs, can’t use them all. Should select and use a good subset.
EP selection considerations include
Keep minimal (shortest, most general) ones
Remove syntactically similar ones
Use support/growth rate improvement (between superset/subset pairs) to prune
Use instance coverage/overlap to prune
Using only infinite growth rate ones (JEPs)
…

Page 61: Emerging patterns based classifier

Why EP-based classifiers are good

Use the discriminating power of low support EPs, together with high support ones
Use multi-feature conditions, not just single-feature conditions
Select from larger pools of discriminative conditions
Compare: Search space of patterns for decision trees is limited by early greedy choices.
Aggregate/combine discriminating power of a diversified committee of “experts” (EPs)
Decision is highly explainable

Page 62: Emerging patterns based classifier

Some other works

CBA (Liu et al 98) uses one rule to make a classification prediction for a test
CMAR (Li et al 01) uses aggregated (Chi2 weighted) Chi2 of matching rules
CPAR (Yin+Han 03) uses aggregation by averaging: it uses the average accuracy of top k rules for each class matching a test case
…

Page 63: Emerging patterns based classifier

Aggregating EPs/rules vs bagging (classifier ensembles)

Bagging/ensembles: a committee of classifiers vote
Each classifier is fairly accurate for a large population (e.g. >51% accurate for 2 classes)
Aggregating EPs/rules: matching patterns/rules vote
Each pattern/rule is accurate on a very small population, but inaccurate if used as a classifier on all data; e.g. 99% accurate on 2% of data, but <2% accurate on all data

Page 64: Emerging patterns based classifier

Using contrasts for rare class data [Al Hammady and Ramamohanarao 04,05,06]

Rare class data is important in many applications
Intrusion detection (1% of samples are attacks)
Fraud detection (1% of samples are fraud)
Customer click thrus (1% of customers make a purchase)
…..

Page 65: Emerging patterns based classifier

Rare Class Datasets

Due to the class imbalance, can encounter some problems
Few instances in the rare class, difficult to train a classifier
Few contrasts for the rare class
Poor quality contrasts for the majority class
Need to either increase the instances in the rare class or generate extra contrasts for it

Page 66: Emerging patterns based classifier

Synthesising new contrasts (new emerging patterns)

Synthesising new emerging patterns by superposition of high growth rate items
Suppose that attribute A2=‘a’ has high growth rate and that {A1=‘x’, A2=‘y’} is an emerging pattern. Then create a new emerging pattern {A1=‘x’, A2=‘a’} and test its quality.
A simple heuristic, but can give surprisingly good classification performance

Page 67: Emerging patterns based classifier

Synthesising new data instances

Can also use previously found contrasts as the basis for constructing new rare class instances
Combine overlapping contrasts and high growth rate items
Main idea - intersect & ‘cross product’ the emerging patterns & high growth rate (support ratio) items
Find emerging patterns
Cluster emerging patterns into groups that cover all the attributes
Combine patterns within each group to form instances

Page 68: Emerging patterns based classifier

Synthesising new instances

E1{A1=1, A2=X1}, E2{A5=Y1,A6=2,A7=3}, E3{A2=X2,A3=4,A5=Y2} - this is a group
V4 is a high growth item for A4
Combine E1+E2+E3+{A4=V4} to get four synthetic instances.

A1  A2  A3  A4  A5  A6  A7
1   X1  4   V4  Y1  2   3
1   X1  4   V4  Y2  2   3
1   X2  4   V4  Y1  2   3
1   X2  4   V4  Y2  2   3
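The combination step can be illustrated with a cross product over the values each attribute takes within the group. This is a sketch of the idea only: the function name `synthesize` and the dictionary encoding of patterns are assumptions, and it presumes the group covers every attribute.

```python
from itertools import product

def synthesize(patterns, attributes):
    """Cross product of the values each attribute takes across the patterns."""
    values = {a: [] for a in attributes}
    for pattern in patterns:
        for attr, value in pattern.items():
            if value not in values[attr]:
                values[attr].append(value)
    return [dict(zip(attributes, combo))
            for combo in product(*(values[a] for a in attributes))]

attributes = ["A1", "A2", "A3", "A4", "A5", "A6", "A7"]
group = [
    {"A1": "1", "A2": "X1"},                # E1
    {"A5": "Y1", "A6": "2", "A7": "3"},     # E2
    {"A2": "X2", "A3": "4", "A5": "Y2"},    # E3
    {"A4": "V4"},                           # high growth rate item for A4
]
instances = synthesize(group, attributes)
for inst in instances:
    print(inst)
```

Only A2 and A5 receive two values each, so the product yields exactly four instances, matching the example.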

Page 69: Emerging patterns based classifier

Measuring instance quality using emerging patterns [Al Hammady and Ramamohanarao 07]

Classifiers usually assume that data instances are related to only a single class (crisp assignments).
However, real life datasets suffer from noise.
Also, when experts assign an instance to a class, they first assign scores to each class and then assign the class with the highest score.
Thus, an instance may in fact be related to several classes

Page 70: Emerging patterns based classifier

Measuring instance quality Cont.

For each instance i, assign a weight for its strength of membership in each class.
Can use emerging patterns to determine appropriate weights for instances
Use these weights in a modified version of classifier, e.g. a decision tree
Modify information gain calculation to take weights into account
Weight(i) = aggregation of EPs divided by mean value for instances in that class

Page 71: Emerging patterns based classifier

Using EPs to build Weighted Decision Trees

Instead of crisp class membership, let instances have weighted class membership, then build weighted decision trees, where probabilities are computed from the weighted membership.
DeEPs and other EP based classifiers can be used to assign weights.
An instance Xi’s membership in k classes: (Wi1,…,Wik)

$$\hat{P}(T) = \left(\hat{p}_1(T), \ldots, \hat{p}_k(T)\right), \qquad \hat{p}_j(T) = \frac{\sum_{i \in T} W_{ij}}{|T|}$$

$$\mathrm{Info}_{WDT}(\hat{P}(T)) = -\sum_{j=1}^{k} \hat{p}_j(T) \cdot \log_2\left(\hat{p}_j(T)\right)$$

$$\mathrm{Info}_{WDT}(A, T) = \sum_{l=1}^{m} \frac{|T_l|}{|T|} \cdot \mathrm{Info}_{WDT}(\hat{P}(T_l))$$
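The weighted information measure can be sketched in Python as follows. The sketch assumes the formulas read as: class probabilities are the averages of the membership weights over the node, the node's information is the entropy of those probabilities, and a split is scored by the size-weighted average over its partitions. The function names and the sample weights are hypothetical.

```python
from math import log2

def class_probabilities(weights):
    """Average the membership weights (W_i1, ..., W_ik) over the instances of T."""
    k = len(weights[0])
    return [sum(w[j] for w in weights) / len(weights) for j in range(k)]

def weighted_info(weights):
    """Entropy of the weighted class distribution of a node."""
    return -sum(p * log2(p) for p in class_probabilities(weights) if p > 0)

def split_info(partitions):
    """Size-weighted average of the entropies of the partitions of T."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * weighted_info(part) for part in partitions)

# Hypothetical two-class weights for four instances.
node = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0), (0.5, 0.5)]
print(weighted_info(node))
print(split_info([node[:2], node[2:]]))
```

With crisp weights such as (1, 0) and (0, 1) this reduces to the usual entropy used for information gain, so the weighted tree generalizes the standard one.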

Page 72: Emerging patterns based classifier

Measuring instance quality by emerging patterns Cont.

More effective than k-NN techniques for assigning weights
Less sensitive to noise
Not dependent on distance metric
Takes into account all instances, not just close neighbors

Page 73: Emerging patterns based classifier

Data cube based contrasts (Conditional Contrasts)

Gradient (Dong et al 01), cubegrade (Imielinski et al 02 – TR published in 2000):
Mining syntactically similar cube cells, having significantly different measure values
Syntactically similar: ancestor-descendant or sibling-sibling pair
Can be viewed as “conditional contrasts”: two neighboring patterns with big difference in performance/measure
Data cubes useful for analyzing multi-dimensional, multi-level, time-dependent data.
Gradient mining useful for MDML analysis in marketing, business decisioning, medical/scientific studies

Page 74: Emerging patterns based classifier

Decision support in data cubes

Used for discovering patterns captured in consolidated historical data for a company/organization: rules, anomalies, unusual factor combinations
Focus on modeling & analysis of data for decision makers, not daily operations.
Data organized around major subjects or factors, such as customer, product, time, sales.
Cube “contains” huge number of MDML “segment” or “sector” summaries at different levels of details
Basic OLAP operations: Drill down, roll up, slice and dice, pivot

Page 75: Emerging patterns based classifier

Data Cubes: Base Table & Hierarchies

Base table stores sales volume (measure), a function of product, time, & location (dimensions)

[Figure: cube with axes Product, Location and Time; one small cube is marked as a base cell.]

Hierarchical summarization paths
Product:  Industry, Category, Product
Location: Region, Country, City, Office
Time:     Year, Quarter, Month / Week, Day
*: all (as top of each dimension)

Page 76: Emerging patterns based classifier

Data Cubes: Derived Cells

[Figure: cube with axes Time (1Qtr, 2Qtr, 3Qtr, 4Qtr), Product (TV, VCR, PC) and Location (U.S.A, Canada, Mexico), with a “sum” margin on each axis; the cell (TV,*,Mexico) is marked.]

Measures: sum, count, avg, max, min, std, …
Derived cells, different levels of details

Page 77: Emerging patterns based classifier

Data Cubes: Cell Lattice

(*,*,*)
(a1,*,*) (*,b1,*) (a2,*,*) …
(a1,b2,*) (a1,b1,*) (a2,b1,*) …
(a1,b2,c1) (a1,b1,c1) (a1,b1,c2)

Compare: cuboid lattice

Page 78: Emerging patterns based classifier

Gradient mining in data cubes

Users want: more powerful (OLAM) support: Find potentially interesting cells from the billions!
OLAP operations used to help users search in huge space of cells
Users do: mousing, eye-balling, memoing, decisioning, …
Gradient mining: Find syntactically similar cells with significantly different measure values
(teen clothing, California, 2006), total profit = 100K vs (teen clothing, Pennsylvania, 2006), total profit = 10K
A specific OLAM task

Page 79: Emerging patterns based classifier

LiveSet-Driven Algorithm for constrained gradient mining

Set-oriented processing; traverse the cube while carrying the live set of cells having potential to match descendants of the current cell as gradient cells
A gradient compares two cells; one is the probe cell, & the other is a gradient cell. Probe cells are ancestor or sibling cells
Traverse the cell space in a coarse-to-fine manner, looking for matchable gradient cells with potential to satisfy gradient constraint
Dynamically prune the live set during traversal
Compare: Naïve method checks each possible cell pair

Page 80: Emerging patterns based classifier

Pruning probe cells using dimension matching analysis

Defn: Probe cell p=(a1,…,an) is matchable with gradient cell g=(b1,…,bn) iff
No solid-mismatch, or
Only one solid-mismatch but no *-mismatch
A solid-mismatch: if aj≠bj + none of aj or bj is *
A *-mismatch: if aj=* and bj≠*
Thm: cell p is matchable with cell g iff p may make a probe-gradient pair with some descendant of g (using only dimension value info)

p=(00, Tor, *, *) : 1 solid
g=(00, Chi, *, PC) : 1 *
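The matchability test is simple enough to write down directly. The sketch below follows the definition as stated (cells are tuples of dimension values, with "*" for an aggregated dimension); the function names are assumptions.

```python
def mismatches(probe, gradient):
    """Count solid-mismatches and *-mismatches between two cells."""
    solid = star = 0
    for a, b in zip(probe, gradient):
        if a != b and a != "*" and b != "*":
            solid += 1          # both values are concrete and differ
        elif a == "*" and b != "*":
            star += 1           # probe is aggregated where gradient is not
    return solid, star

def matchable(probe, gradient):
    """No solid-mismatch, or exactly one solid-mismatch and no *-mismatch."""
    solid, star = mismatches(probe, gradient)
    return solid == 0 or (solid == 1 and star == 0)

p = ("00", "Tor", "*", "*")
g = ("00", "Chi", "*", "PC")
print(mismatches(p, g), matchable(p, g))
```

For the pair above, the second dimension is a solid-mismatch (Tor vs Chi) and the fourth is a *-mismatch (* vs PC), so under this definition p is not matchable with g and can be pruned from the live set.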

Page 81: Emerging patterns based classifier

Sequence based contrasts

We want to compare sequence datasets:
bioinformatics (DNA, protein), web log, job/workflow history, books/documents
e.g. compare protein families; compare bible books/versions
Sequence data are very different from relational data
order/position matters
unbounded number of “flexible dimensions”
Sequence contrasts in terms of 2 types of comparison:
Dataset based: Positive vs Negative
• Distinguishing sequence patterns with gap constraints (Ji et al 05, 07)
• Emerging substrings (Chan et al 03)
Site based: Near marker vs away from marker
• Motifs
May also involve data classes
Roughly: A site is a position in a sequence where a special marker/pattern occurs

Page 82: Emerging patterns based classifier

Example sequence contrasts

When comparing the two protein families zf-C2H2 and zf-CCHC, we discovered a protein MDS CLHH appearing as a subsequence in 141 of 196 protein sequences of zf-C2H2 but never appearing in the 208 sequences in zf-CCHC.
When comparing the first and last books from the Bible, we found the subsequences (with gaps) “having horns”, “face worship”, “stones price” and “ornaments price” appear multiple times in sentences in the Book of Revelation, but never in the Book of Genesis.

Page 83: Emerging patterns based classifier

Sequence and sequence pattern occurrence

A sequence S = e_1 e_2 e_3 … e_n is an ordered list of items over a given alphabet.

E.G. “AGCA” is a DNA sequence over the alphabet {A, C, G, T}. “AC” is a subsequence of “AGCA” but not a substring; “GCA” is a substring.

Given sequence S and a subsequence pattern S’, an occurrence of S’ in S consists of the positions of the items from S’ in S.

EG: consider S = “ACACBCB”

<1,5>, <1,7>, <3,5>, <3,7> are occurrences of “AB”

<1,2,5>, <1,2,7>, <1,4,5>, … are occurrences of “ACB”
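The definitions above can be checked mechanically. The following is a minimal Python sketch, not part of the original tutorial; the function names `occurrences`, `is_subsequence` and `is_substring` are illustrative assumptions. It enumerates occurrences by brute force, which is only practical for short sequences.

```python
from itertools import combinations


def occurrences(pattern, seq):
    """Return every occurrence of `pattern` in `seq` as a tuple of 1-based positions."""
    found = []
    # Try every increasing choice of len(pattern) positions in seq.
    for positions in combinations(range(len(seq)), len(pattern)):
        if all(seq[i] == ch for i, ch in zip(positions, pattern)):
            found.append(tuple(i + 1 for i in positions))
    return found


def is_subsequence(pattern, seq):
    """A pattern is a subsequence if it has at least one occurrence."""
    return bool(occurrences(pattern, seq))


def is_substring(pattern, seq):
    """A substring is a subsequence whose items are contiguous."""
    return pattern in seq
```

For S = “ACACBCB”, `occurrences("AB", "ACACBCB")` yields the four occurrences listed on the slide, and `occurrences("ACB", "ACACBCB")` starts with <1,2,5>, <1,2,7>, <1,4,5>.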

Page 84: Emerging patterns based classifier

Maximum-gap constraint satisfaction

A (maximum) gap constraint: specified by a positive integer g.

Given S & an occurrence os = <i_1, …, i_m>, if i_{k+1} - i_k <= g + 1 for all 1 <= k < m, then os fulfills the g-gap constraint.

If a subsequence S’ has one occurrence fulfilling a gap constraint, then S’ satisfies the gap constraint.

The <3,5> occurrence of “AB” in S = “ACACBCB” satisfies the maximum gap constraint g=1.

The <3,4,5> occurrence of “ACB” in S = “ACACBCB” satisfies the maximum gap constraint g=1.

The <1,2,5>, <1,4,5>, <3,4,5> occurrences of “ACB” in S = “ACACBCB” satisfy the maximum gap constraint g=2.

One sequence contributes at most one to the count.
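As a hedged illustration of the gap constraint (a sketch only; the names `fulfills_gap`, `gap_occurrences` and `count` are assumptions, not from the tutorial), the check below tests i_{k+1} - i_k <= g + 1 on 1-based positions and counts each sequence at most once.

```python
from itertools import combinations


def fulfills_gap(occurrence, g):
    """True if consecutive positions differ by at most g + 1."""
    return all(b - a <= g + 1 for a, b in zip(occurrence, occurrence[1:]))


def gap_occurrences(pattern, seq, g):
    """All occurrences (1-based positions) of pattern in seq that fulfill the g-gap constraint."""
    result = []
    for pos in combinations(range(1, len(seq) + 1), len(pattern)):
        matches = all(seq[i - 1] == ch for i, ch in zip(pos, pattern))
        if matches and fulfills_gap(pos, g):
            result.append(pos)
    return result


def count(pattern, sequences, g):
    """Number of sequences in which pattern satisfies the gap constraint.

    Each sequence contributes at most one, however many occurrences it has.
    """
    return sum(1 for s in sequences if gap_occurrences(pattern, s, g))
```

With S = “ACACBCB”, `gap_occurrences("AB", S, 1)` returns only <3,5>, and the three g=2 occurrences of “ACB” named on the slide are among those returned by `gap_occurrences("ACB", S, 2)`.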

Page 85: Emerging patterns based classifier

g-MDS Mining Problem

Given two sets pos & neg of sequences, two support thresholds minp & minn, & a maximum gap g, a pattern p is a Minimal Distinguishing Subsequence with g-gap constraint (g-MDS), if these conditions are met:

1. Frequency condition: supp_pos(p,g) >= minp;

2. Infrequency condition: supp_neg(p,g) <= minn;

3. Minimality condition: There is no subsequence of p satisfying 1 & 2.

Given pos, neg, minp, minn and g, the g-MDS mining problem is to find all the g-MDSs.

Page 86: Emerging patterns based classifier

Example g-MDS

Given minp=1/3, minn=0, g=1,

pos = {CBAB, AACCB, BBAAC},

neg = {BCAB, ABACB}

1-MDS are: BB, CC, BAA, CBA

“ACC” is frequent in pos & non-occurring in neg, but it is not minimal (its subsequence “CC” meets the first two conditions).
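The example can be reproduced with a brute-force search. This is a minimal sketch under stated assumptions: it enumerates every candidate string up to the longest positive sequence and tests the three conditions directly, so it is not the ConSGapMiner algorithm and scales only to toy inputs. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def occurs(pattern, seq, g):
    """True if pattern has at least one occurrence in seq fulfilling the g-gap constraint."""
    # ends: 0-based positions where a gap-respecting occurrence of the current prefix can end
    ends = {i for i, ch in enumerate(seq) if ch == pattern[0]}
    for ch in pattern[1:]:
        ends = {j for j, c in enumerate(seq)
                if c == ch and any(0 < j - i <= g + 1 for i in ends)}
    return bool(ends)


def support(pattern, seqs, g):
    """Fraction of sequences in which pattern satisfies the gap constraint."""
    return Fraction(sum(occurs(pattern, s, g) for s in seqs), len(seqs))


def is_distinguishing(p, pos, neg, minp, minn, g):
    """Conditions 1 and 2: frequent in pos, infrequent in neg."""
    return support(p, pos, g) >= minp and support(p, neg, g) <= minn


def proper_subsequences(p):
    """All non-empty subsequences of p that are shorter than p."""
    return {"".join(p[i] for i in idx)
            for k in range(1, len(p))
            for idx in combinations(range(len(p)), k)}


def mine_gmds(pos, neg, minp, minn, g):
    """Brute-force g-MDS mining: conditions 1 and 2 plus minimality (condition 3)."""
    alphabet = sorted(set("".join(pos + neg)))
    result = []
    for n in range(1, max(map(len, pos)) + 1):
        for letters in product(alphabet, repeat=n):
            p = "".join(letters)
            if is_distinguishing(p, pos, neg, minp, minn, g) and not any(
                    is_distinguishing(q, pos, neg, minp, minn, g)
                    for q in proper_subsequences(p)):
                result.append(p)
    return result
```

Called as `mine_gmds(["CBAB", "AACCB", "BBAAC"], ["BCAB", "ABACB"], Fraction(1, 3), 0, 1)`, the sketch returns the four 1-MDSs listed on the slide. Using `Fraction` avoids floating-point surprises when a support equals the threshold exactly.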

Page 87: Emerging patterns based classifier

g-MDS mining: Challenges

The min support thresholds in mining distinguishing patterns need to be lower than those used for mining frequent patterns.

Min supports offer very weak pruning power on the large search space.

Maximum gap constraint is neither monotone nor anti-monotone.

Gap checking requires clever handling.

Page 88: Emerging patterns based classifier

ConSGapMiner

The ConSGapMiner algorithm works in three steps:

1. Candidate Generation: Candidates are generated without duplication. Efficient pruning strategies are employed.

2. Support Calculation and Gap Checking: For each generated candidate c, supp_pos(c,g) and supp_neg(c,g) are calculated using bitset operations.

3. Minimization: Remove all the non-minimal patterns (using pattern trees).

Page 89: Emerging patterns based classifier

ConSGapMiner: Candidate Generation

ID  Sequence  Class
1   CBAB      pos
2   AACCB     pos
3   BBAAC     pos
4   BCAB      neg
5   ABACB     neg

{ }
  A (3, 2)
    AA (2, 1)
      AAA (0, 0)
      AAB (0, 1)
      AAC (2, 1)
        AACA (0, 0)
        AACB (1, 1)
          AACBA (0, 0)
          AACBB (0, 0)
          AACBC (0, 0)
        AACC (1, 0)
    ……
  B (3, 2)
  C (3, 2)

• DFS tree

• Two counts per node/pattern

• Don’t extend pos-infrequent patterns

• Avoid duplicates & certain non-minimal g-MDS (e.g. don’t extend g-MDS)

Page 90: Emerging patterns based classifier

Use Bitset Operation for Gap Checking

We encode the occurrences’ ending positions into a bitset and use a series of bitwise operations to generate a new candidate sequence’s bitset.

Storing projected suffixes and performing scans is expensive.

e.g. Given a sequence ACTGTATTACCAGTATCG, to check whether AG is a subsequence for g=1:

Projections with prefix A:

ACTGTATTACCAGTATCG
ATTACCAGTATCG
ACCAGTATCG
AGTATCG
ATCG

Projections with AG obtained from the above:

AGTATCG

Page 91: Emerging patterns based classifier

ConSGapMiner: Support & Gap Checking (1)

Initial Bitset Array Construction: For each item x, construct an array of bitsets to describe where x occurs in each sequence from pos and neg.

Dataset and Initial Bitset Array (single-item A):

ID  Sequence  Class  A
1   CBAB      pos    0010
2   AACCB     pos    11000
3   BBAAC     pos    00110
4   BCAB      neg    0010
5   ABACB     neg    10100

Page 92: Emerging patterns based classifier

ConSGapMiner: Support & Gap Checking (2)

EG: generate mask bitset for X = “A” in sequence 5 (with max gap g = 1):

ID  Sequence  Class
1   CBAB      pos
2   AACCB     pos
3   BBAAC     pos
4   BCAB      neg
5   ABACB     neg

1 0 1 0 0  >>  0 1 0 1 0
0 1 0 1 0  >>  0 0 1 0 1
OR
Mask bitset for X: 0 1 1 1 1

Mask bitset: all the legal positions in the sequence at most (g+1)-positions away from tail of an occurrence of the (maximum prefix of the) pattern.

Two steps: (1) g+1 right shifts; (2) OR them
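The two steps can be sketched with Python integers standing in for bitsets. This is an illustrative sketch, not the authors' implementation; `item_bitset`, `mask_bitset` and `show` are hypothetical helper names. The leftmost printed bit corresponds to the first position of the sequence, so a right shift moves occurrences towards later positions and drops anything shifted past the end.

```python
def item_bitset(seq, item):
    """Bitset with a 1 at every position of seq holding item (leftmost bit = position 1)."""
    bits = 0
    for ch in seq:
        bits = (bits << 1) | int(ch == item)
    return bits


def mask_bitset(bits, g):
    """OR together the g + 1 right shifts of bits (shifts by 1, 2, ..., g + 1)."""
    mask = 0
    for shift in range(1, g + 2):
        mask |= bits >> shift
    return mask


def show(bits, width):
    """Render a bitset as a fixed-width string of 0s and 1s."""
    return format(bits, f"0{width}b")
```

For sequence 5 (“ABACB”) and X = “A”, `show(mask_bitset(item_bitset("ABACB", "A"), 1), 5)` gives `01111`, matching the slide.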

Page 93: Emerging patterns based classifier

ConSGapMiner: Support & Gap Checking (3)

EG: Generate bitset array (ba) for X’ = “BA” from X = ‘B’ (g = 1)

1. Get ba for X = ‘B’

2. Shift ba(X) to get mask for X’ = ‘BA’ (2 shifts plus OR)

3. AND ba(‘A’) and mask(X’) to get ba(X’)

ID  Sequence  Class  ba(X)  mask(X’)  ba(‘A’)  ba(X’)
1   CBAB      pos    0101   0011      0010     0010
2   AACCB     pos    00001  00000     11000    00000
3   BBAAC     pos    11000  01110     00110    00110
4   BCAB      neg    1001   0110      0010     0010
5   ABACB     neg    01001  00110     10100    00100

Number of arrays with some 1 = count
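The three steps can be traced end to end in a few lines. The sketch below is an assumption-laden illustration (integers as bitsets, hypothetical names `extend` and `class_counts`), not the ConSGapMiner source; it rebuilds ba(“BA”) from ba(‘B’) and ba(‘A’) and then counts the sequences whose bitset is non-zero.

```python
SEQUENCES = ["CBAB", "AACCB", "BBAAC", "BCAB", "ABACB"]
LABELS = ["pos", "pos", "pos", "neg", "neg"]


def item_bitset(seq, item):
    """Bitset with a 1 wherever seq holds item (leftmost bit = first position)."""
    bits = 0
    for ch in seq:
        bits = (bits << 1) | int(ch == item)
    return bits


def extend(ba_prefix, ba_item, g):
    """Bitset array of prefix+item: AND each item bitset with the mask of the prefix."""
    extended = []
    for prefix_bits, item_bits in zip(ba_prefix, ba_item):
        mask = 0
        for shift in range(1, g + 2):  # g + 1 right shifts, OR-ed together
            mask |= prefix_bits >> shift
        extended.append(mask & item_bits)
    return extended


def class_counts(ba, labels):
    """(pos, neg) counts: a sequence counts once if its bitset has some 1."""
    pos = sum(1 for bits, lab in zip(ba, labels) if bits and lab == "pos")
    neg = sum(1 for bits, lab in zip(ba, labels) if bits and lab == "neg")
    return pos, neg


ba_B = [item_bitset(s, "B") for s in SEQUENCES]
ba_A = [item_bitset(s, "A") for s in SEQUENCES]
ba_BA = extend(ba_B, ba_A, g=1)
```

Formatting `ba_BA` to each sequence's length reproduces the ba(X’) column of the slide, and `class_counts(ba_BA, LABELS)` gives the two counts that would be attached to the node “BA” in the candidate tree.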

Page 94: Emerging patterns based classifier

Execution time performance on protein families

Pos (#)       Neg (#)           Avg. Len. (Pos, Neg)
DUF1694 (16)  DUF1695 (5)       (123, 186)
TatC (74)     TatD_DNase (119)  (205, 262)

[Figure: for each dataset, two plots of running time (sec): runtime vs minimal support, for g = 5; and runtime vs maximal gap, for α = 0.3125 (5) and for α = 0.27 (20).]

Page 95: Emerging patterns based classifier

Pattern Length Distribution -- Protein Families

The length and frequency distribution of patterns: TatC vs TatD_DNase, g = 5, α = 13.5%.

[Figure: Length distribution: # 5-MDS vs length of patterns (3 to 11). Frequency distribution: # 5-MDS vs frequency count (1~10, 11~20, 21~30, 31~40, 41~50, >50).]

Page 96: Emerging patterns based classifier

Bible Books Experiment

New Testament (Matthew, Mark, Luke and John) vs Old Testament (Genesis, Exodus, Leviticus and Numbers):

#Pos  #Neg  Alphabet  Avg. Len.  Max. Len.
3768  4893  3344      7          25

[Figure: running time (sec) vs minimal support, for g = 6; running time (sec) vs maximal gap, for α = 0.0013.]

Some interesting terms found from the Bible books (New Testament vs Old Testament):

Substrings (count)   Subsequences (count)
eternal life (24)    seated hand (10)
good news (23)       answer truly (10)
Forgiveness in (22)  Question saying (13)
Chief priests (53)   Truly kingdom (12)

Page 97: Emerging patterns based classifier

Extensions

Allowing min gap constraint

Allowing max window length constraint

Considering different minimization strategies:

• Subsequence-based minimization (described on previous slides)

• Coverage (matching tidset containment) + subsequence based minimization

• Prefix based minimization

Page 98: Emerging patterns based classifier

Motif mining

Find sequence patterns frequent around a site marker, but infrequent elsewhere

Can also consider two classes:

Find patterns frequent around site marker in +ve class, but infrequent at other positions, and infrequent around site marker in –ve class

Often, biological studies use background probabilities instead of a real -ve dataset

Popular concept/tool in biological studies

Page 99: Emerging patterns based classifier

Contrasts for Graph Data

Can capture structural differences

Subgraphs appearing in one class but not in the other class

• Chemical compound analysis

• Social network comparison

Page 100: Emerging patterns based classifier

Contrasts for graph data Cont.

Standard frequent subgraph mining

Given a graph database, find connected subgraphs appearing frequently

Contrast subgraphs particularly focus on discrimination and minimality

Page 101: Emerging patterns based classifier

Minimal contrast subgraphs [Ting and Bailey 06]

A contrast graph is a subgraph appearing in one class of graphs and never in another class of graphs

Minimal if none of its subgraphs are contrasts

May be disconnected

• Allows succinct description of differences

• But requires larger search space

Will focus on one versus one case

Page 102: Emerging patterns based classifier

Contrast subgraph example

[Figure: five labelled graphs; only vertex and edge labels are recoverable.]

Graph A (Positive): vertices v0(a), v1(a), v2(a), v3(c); edges e0(a), e1(a), e2(a), e3(a), e4(a)

Graph B (Negative): vertices v0(a), v1(a), v2(a), v3(a), v4(a); edges e0(a), e1(a), e2(a), e3(a), e4(a)

Graph C (Contrast): vertices v0(a), v1(a), v2(a); edges e0(a), e1(a), e2(a)

Graph D (Contrast): vertices v0(a), v1(a), v3(c); edge e0(a)

Graph E (Contrast): vertex v3(c)

Page 103: Emerging patterns based classifier

Minimal contrast subgraphs

Minimal contrast graphs are of two types

• Those with only vertices (a vertex set)

• Those without isolated vertices (edge sets)

Can prove that for 1-1 case, the minimal contrast subgraphs are the union of

Min. Con. Vertex Sets + Min. Con. Edge Sets

Page 104: Emerging patterns based classifier

Mining contrast subgraphs

Main idea

Find the maximal common edge sets

• These may be disconnected

Apply a minimal hypergraph transversal operation to derive the minimal contrast edge sets from the maximal common edge sets

Must compute minimal contrast vertex sets separately and then minimal union with the minimal contrast edge sets
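One way to read the transversal step: an edge set of the positive graph is a contrast exactly when it is not contained in any maximal common edge set, that is, when it hits the complement of each one. The sketch below illustrates this under simplifying assumptions: edges are plain identifiers (no isomorphism handling), the search is brute force, and the names and the example data are hypothetical.

```python
from itertools import combinations


def minimal_transversals(hyperedges):
    """All minimal sets that intersect every hyperedge (brute force, smallest first)."""
    vertices = sorted(set().union(*hyperedges))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in combinations(vertices, size):
            c = set(cand)
            if any(t <= c for t in found):
                continue  # a smaller transversal is already inside: not minimal
            if all(c & h for h in hyperedges):
                found.append(c)
    return found


def minimal_contrast_edge_sets(positive_edges, maximal_common_edge_sets):
    """Complement each maximal common edge set, then take minimal transversals."""
    complements = [set(positive_edges) - set(m) for m in maximal_common_edge_sets]
    return minimal_transversals(complements)


# Hypothetical example: five edges in the positive graph, two maximal common edge sets.
example = minimal_contrast_edge_sets(
    {"e0", "e1", "e2", "e3", "e4"},
    [{"e0", "e1", "e2"}, {"e0", "e3"}],
)
```

Here the complements are {e3, e4} and {e1, e2, e4}, so the minimal contrast edge sets are {e4}, {e1, e3} and {e2, e3}. Enumerating subsets is exponential; practical miners use dedicated hypergraph transversal algorithms.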

Page 105: Emerging patterns based classifier

Contrast graph mining workflow

[Diagram, top to bottom:]

Positive Graph Gp, compared with Negative Graph Gn1, Negative Graph Gn2, Negative Graph Gn3

Maximal Common Edge Sets 1, 2, 3 (Maximal Common Vertex Sets 1, 2, 3), one per negative graph

Maximal Common Edge Sets (Maximal Common Vertex Sets)

Complement: Complements of Maximal Common Edge Sets (Complements of Maximal Common Vertex Sets)

Minimal Transversals: Minimal Contrast Edge Sets (Minimal Vertex Sets)

Page 106: Emerging patterns based classifier

Using discriminative graphs for containment search and indexing [Chen et al 07]

Given a graph database and a query q. Find all graphs in the database contained in q.

Applications

Querying image databases represented as attributed relational graphs. Efficiently find all objects from the database contained in a given scene (query).

[Figure: model graph database D; query graph q; models contained by q]

Page 107: Emerging patterns based classifier

Discriminative graphs for indexing Cont.

Main idea:

Given a query graph q and a database graph g

• If a feature f is not contained in q and f is contained in g, then g is not contained in q

Also exploit similarity between graphs.

If f is a common substructure between g1 and g2, then if f is not contained in the query, both g1 and g2 are not contained in the query

Page 108: Emerging patterns based classifier

Graph Containment Example [From Chen et al 07]

[Figure: A Sample Database with graphs (ga), (gb), (gc); Features (f1), (f2), (f3), (f4)]

     ga  gb  gc
f1   1   1   1
f2   1   1   0
f3   1   1   0
f4   1   0   0
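The pruning rule can be sketched directly on the feature-graph matrix above. This is an illustrative sketch only: the function name `candidate_graphs` is hypothetical, and which features a query lacks is an assumed input here (in practice it would come from subgraph tests of the features against the query).

```python
# Feature-graph matrix from the example: which database graphs contain each feature.
FEATURE_INDEX = {
    "f1": {"ga", "gb", "gc"},
    "f2": {"ga", "gb"},
    "f3": {"ga", "gb"},
    "f4": {"ga"},
}
ALL_GRAPHS = {"ga", "gb", "gc"}


def candidate_graphs(all_graphs, feature_index, features_missing_from_query):
    """Graphs that may still be contained in the query after feature-based pruning.

    If feature f is not contained in q but is contained in g, then g cannot be
    contained in q, so every graph indexed under a missing feature is pruned.
    """
    pruned = set()
    for f in features_missing_from_query:
        pruned |= feature_index[f]
    return set(all_graphs) - pruned
```

For instance, if a hypothetical query does not contain f2, both ga and gb are pruned with a single feature test and only gc still needs a full containment check.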

Page 109: Emerging patterns based classifier

Discriminative graphs for indexing

Aim to select the ``contrast features’’ that have the most pruning power (save most isomorphism tests)

These are features that are contained by many graphs in the database, but are unlikely to be contained by a query graph.

Generate lots of candidates using frequent subgraph mining and then filter output graphs for discriminative power

Page 110: Emerging patterns based classifier

Generating the Index

After the contrast subgraphs have been found, select a subset of them

Use a set cover heuristic to select a set that ``covers’’ all the graphs in the database, in the context of a given query q

For multiple queries, use a maximum coverage with cost approach
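A generic greedy set cover heuristic gives the flavour of this selection step. The sketch below is a textbook greedy cover under assumed inputs, not the specific procedure of Chen et al; the feature names, graph names and the function `greedy_cover` are all hypothetical. Each feature is mapped to the set of database graphs it would prune for the query at hand.

```python
def greedy_cover(universe, candidate_sets):
    """Greedy set cover: repeatedly pick the set covering the most uncovered items.

    Returns the chosen set names in order and any items that no set can cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidate_sets, key=lambda name: len(candidate_sets[name] & uncovered))
        gain = candidate_sets[best] & uncovered
        if not gain:
            break  # remaining items cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical data: graphs each feature would prune for one query.
PRUNES = {
    "fa": {"g1", "g2", "g3"},
    "fb": {"g3", "g4"},
    "fc": {"g4", "g5"},
    "fd": {"g1"},
}
selected, left_over = greedy_cover({"g1", "g2", "g3", "g4", "g5"}, PRUNES)
```

With this data the heuristic picks fa first (three graphs) and then fc (the two remaining graphs), so two features suffice. Greedy selection is not guaranteed to be optimal, but it is a common approximation for set cover.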

Page 111: Emerging patterns based classifier

Contrasts for trees

Special case of graphs

Lower complexity

Lots of activity in the document/XML area, for change detection.

Notions such as edit distance more typical for this context

Page 112: Emerging patterns based classifier

Contrasts of models

Models can be clusterings, decision trees, …

Why is contrasting useful here?

Contrast/compare a user generated model against a known reference model, to evaluate accuracy/degree of difference.

May wish to compare degree of difference between one algorithm using varying parameters

Eliminate redundancy among models by choosing dissimilar representatives

Page 113: Emerging patterns based classifier

Contrasts of models Cont.

Isn’t this just a dissimilarity measure? Like Euclidean distance?

Similar, but operating on more complex objects, not just vectors

Difficulties are

For rule based classifiers, can’t just report on number of different rules

Page 114: Emerging patterns based classifier

Clustering comparison

Popular clustering comparison measures

Rand index and Jaccard index

• Measure the proportion of point pairs on which the two clusterings agree

Mutual information

• How much information one clustering gives about the other

Clustering error

• Classification error metric

Page 115: Emerging patterns based classifier

Clustering Comparison Measures

Nearly all techniques use a ‘Confusion Matrix’ of two clusterings.

Example: Let C = {c1, c2, c3} and C’ = {c’1, c’2, c’3}

m_ij = |c_i ∩ c’_j|

m     c1  c2  c3
c’1   5   14  1
c’2   10  2   8
c’3   8   7   5

Page 116: Emerging patterns based classifier

Pair counting

Considers the number of pairs of points on which two clusterings agree or disagree. Each pair falls into one of four categories

N11: number of pairs of points which are in the same cluster in both C and C’

N00: number of pairs of points which are in different clusters in both C and C’

N10: number of pairs of points which are in the same cluster in C but not in C’

N01: number of pairs of points which are in the same cluster in C’ but not in C

N: total number of pairs of points

Page 117: Emerging patterns based classifier

Pair Counting

Two popular indexes - Rand and Jaccard

Rand(C,C’) = (N11 + N00) / N

Jaccard(C,C’) = N11 / (N11 + N01 + N10)
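Both indexes can be computed from a confusion matrix alone, because the number of pairs inside a cell, a row or a column is a binomial coefficient. The sketch below is a hedged illustration (function names are assumptions) applied to the 3 x 3 confusion matrix from the Clustering Comparison Measures slide, with rows c’1..c’3 and columns c1..c3.

```python
from math import comb

# Confusion matrix from the earlier example: rows c'1..c'3, columns c1..c3.
M = [[5, 14, 1],
     [10, 2, 8],
     [8, 7, 5]]


def pair_counts(m):
    """Return (N11, N00, N10, N01, N) for a confusion matrix m."""
    n_points = sum(map(sum, m))
    total = comb(n_points, 2)                                 # N: all pairs of points
    n11 = sum(comb(x, 2) for row in m for x in row)           # same cluster in both
    same_c_prime = sum(comb(sum(row), 2) for row in m)        # same cluster in C' (rows)
    same_c = sum(comb(sum(col), 2) for col in zip(*m))        # same cluster in C (columns)
    n10 = same_c - n11
    n01 = same_c_prime - n11
    n00 = total - n11 - n10 - n01
    return n11, n00, n10, n01, total


def rand_index(m):
    n11, n00, _, _, total = pair_counts(m)
    return (n11 + n00) / total


def jaccard_index(m):
    n11, _, n10, n01, _ = pair_counts(m)
    return n11 / (n11 + n01 + n10)
```

The Jaccard index ignores N00, so it is not inflated by the many pairs that both clusterings separate; this is why it is typically lower than the Rand index on the same matrix.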

Page 118: Emerging patterns based classifier

Clustering Error Metric (Classification Error Metric)

An injective mapping of C = {1,…,K} into C’ = {1,…,K’}. Need to find maximum intersection for all possible mappings.

m     c1  c2  c3
c’1   5   14  1
c’2   10  2   8
c’3   8   7   5

Best match is {c2, c’1}, {c1, c’2}, {c3, c’3}

Clustering error = (14+10+5)/60 = 0.483
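The arithmetic on this slide can be reproduced by summing the confusion-matrix cells selected by a mapping. The sketch below is illustrative only (names are assumptions); it evaluates a given injective mapping rather than searching for the maximizing one, which for larger K would call for an assignment algorithm.

```python
# Confusion matrix from the slide: rows c'1..c'3, columns c1..c3.
M = {
    "c'1": {"c1": 5, "c2": 14, "c3": 1},
    "c'2": {"c1": 10, "c2": 2, "c3": 8},
    "c'3": {"c1": 8, "c2": 7, "c3": 5},
}

# The matching listed on the slide, written as c -> c'.
SLIDE_MATCH = {"c2": "c'1", "c1": "c'2", "c3": "c'3"}


def matched_points(m, mapping):
    """Total size of the intersections selected by an injective mapping c -> c'."""
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be injective")
    return sum(m[c_prime][c] for c, c_prime in mapping.items())


def total_points(m):
    """Number of points in the data set (sum of all cells)."""
    return sum(sum(row.values()) for row in m.values())


fraction = matched_points(M, SLIDE_MATCH) / total_points(M)
```

For the slide's matching this gives 29 matched points out of 60. Note that some definitions of the classification error report one minus this matched fraction.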

Page 119: Emerging patterns based classifier

Clustering Comparison Difficulties

[Figure: three clusterings (a), (b), (c); (a) is the Reference]

Which most similar to clustering (a)?

Rand(a,b) = Rand(a,c)

Jaccard(a,b) = Jaccard(a,c) !

Page 120: Emerging patterns based classifier

Comparing datasets via induced models

Given two datasets, we may compare their difference, by considering the difference or deviation between the models that can be induced from them

Models here can refer to decision trees, frequent itemsets, emerging patterns, etc

May also compare an old model to a new dataset

How much does it misrepresent?

Page 121: Emerging patterns based classifier

The FOCUS Framework [Ganti et al 02]

Develops a single measure for quantifying the difference between the interesting characteristics in each dataset.

Key Idea: ``A model has a structural component that identifies interesting regions of the attribute space … each such region is summarized by one (or several) measure(s)’’

Difference between two classifiers is measured by amount of work needed to change them into some common specialization

Page 122: Emerging patterns based classifier

Focus Framework Cont.

For comparing two models, divide the models each into regions and then compare the regions individually

For a decision tree, compare leaf nodes of each model

Aggregate the pairwise differences between each of the regions

Page 123: Emerging patterns based classifier

Decision tree example [Taken from Ganti et al 02]

[Figure: three decision trees over Age and Salary]

T1: D1, splits at Age 30 and Salary 100K, leaves (class1, class2): (0.0,0.3), (0.1,0.0), (0.05,0.55)

T2: D2, splits at Age 50 and Salary 80K, leaves (class1’, class2’): (0.1,0.52), (0.0,0.1), (0.18,0.1)

T3: GCR of T1 and T2 (just for class1), splits at Age 30, Age 50, Salary 80K and Salary 100K, regions (class1-class1’): [0.0-0.0], [0.0-0.04], [0.1-0.14], [0.0-0.0], [0.0-0.0], [0.05-0.1]

Difference(D1,D2) = |0.0-0.0| + |0.0-0.04| + |0.1-0.14| + |0.0-0.0| + |0.0-0.0| + |0.05-0.1| = 0.13
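The aggregation step in this example is a sum of absolute differences over the regions of the common refinement. The sketch below is a minimal illustration of that arithmetic only; the name `deviation` is an assumption, and FOCUS itself supports other difference and aggregation functions.

```python
# (class1 measure under D1, class1 measure under D2) for each region of the GCR,
# taken from the example above.
GCR_REGIONS = [
    (0.0, 0.0),
    (0.0, 0.04),
    (0.1, 0.14),
    (0.0, 0.0),
    (0.0, 0.0),
    (0.05, 0.1),
]


def deviation(region_measures):
    """Sum of absolute differences between the two measures of each region."""
    return sum(abs(a - b) for a, b in region_measures)


difference = deviation(GCR_REGIONS)
```

Rounded to two decimals, `difference` reproduces the value 0.13 given on the slide (floating-point addition may leave a tiny rounding residue).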

Page 124: Emerging patterns based classifier

Correspondence Tracing of Changes [Wang et al 03]

Correspondence tracing aims to make change between the two models understandable by explicitly describing changes and then ranking them

Page 125: Emerging patterns based classifier

Correspondence Tracing Example [Taken from Wang et al 03]

Consider old and new rule based classifiers

Old (rule, then ID’s of instances classified)

O1: If A4=1 then C3 [0,2,7,9,13,15,17]

O2: If A3=1 and A4=2 then C2 [1,4,6,10,12,16]

O3: If A3=2 and A4=2 then C1 [3,5,8,11,14]

New

N1: If A3=1 and A4=1 then C3 [0,9,15]

N2: If A3=1 and A4=2 then C2 [1,4,6,10,12,16]

N3: If A3=2 and A4=1 then C2 [2,7,13,17]

N4: If A3=2 and A4=2 then C1 [3,5,8,11,14]

Page 126: Emerging patterns based classifier

Correspondence Example cont.

Rules N1 and N3 classify the examples that were classified by rule O1.

So the changes for the subpopulation covered by O1 can be described as <O1,N1> and <O1,N3>

Changes <O2,N2> and <O3,N4> are trivial because the old and new rules are identical.
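The correspondences in this example can be recovered by intersecting the instance IDs covered by each old and new rule. The following sketch is an illustration under that assumption (the data structures and function names are hypothetical, and it does not compute the quantitative change used for ranking).

```python
# rule name -> (condition, class, IDs of instances classified), from the example.
OLD = {
    "O1": ("A4=1", "C3", {0, 2, 7, 9, 13, 15, 17}),
    "O2": ("A3=1 and A4=2", "C2", {1, 4, 6, 10, 12, 16}),
    "O3": ("A3=2 and A4=2", "C1", {3, 5, 8, 11, 14}),
}
NEW = {
    "N1": ("A3=1 and A4=1", "C3", {0, 9, 15}),
    "N2": ("A3=1 and A4=2", "C2", {1, 4, 6, 10, 12, 16}),
    "N3": ("A3=2 and A4=1", "C2", {2, 7, 13, 17}),
    "N4": ("A3=2 and A4=2", "C1", {3, 5, 8, 11, 14}),
}


def correspondences(old, new):
    """Pairs <O, N> whose rules classify at least one common instance."""
    pairs = []
    for o_name, (_, _, o_ids) in old.items():
        for n_name, (_, _, n_ids) in new.items():
            shared = o_ids & n_ids
            if shared:
                pairs.append((o_name, n_name, sorted(shared)))
    return pairs


def nontrivial(old, new):
    """Corresponding pairs whose condition or class differs between old and new."""
    return [(o, n) for o, n, _ in correspondences(old, new)
            if old[o][:2] != new[n][:2]]
```

On this data `correspondences(OLD, NEW)` yields <O1,N1>, <O1,N3>, <O2,N2> and <O3,N4>, and `nontrivial(OLD, NEW)` keeps only the two pairs involving O1.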

Page 127: Emerging patterns based classifier

Rule Accuracy Increase.

The quantitative change Q of <O,N> is the estimated accuracy increase (+ or -) due to the change from O to N.

Changes are ranked according to quantitative change Q and then presented to the user

Page 128: Emerging patterns based classifier

Common themes for contrast mining

Different representations

• Minimality is the most common

• Support/ratio constraints quite popular, though not necessarily the best

• Conjunctions most popular for relational case

Large number of contrast patterns are output

Page 129: Emerging patterns based classifier

Recommendations to Practitioners

Some important points are

• Contrast patterns can capture distinguishing patterns between classes

• Contrast patterns can be used to build high quality classifiers

• Contrast patterns can capture useful patterns for detecting/treating diseases, or other events/conditions

Page 130: Emerging patterns based classifier

IEE

E IC

DM

28-

31 O

ct. 0

7C

ontr

ast D

ata

Min

ing:

Met

hods

and

App

licat

ions

Ja

mes

Bai

ley

and

Guo

zhu

Don

g13

0

Open Problems in Contrast Data Mining
How to meaningfully assess quality of contrasts, especially for non-relational data.
How to explain the semantics of contrasts
Mining of contrasts using user-specified domain knowledge
Highly expressive contrasts (first order..)
Develop new ways to build contrast based classifiers and finding the highest impact contrasts
Rare class classification and contrasts still an unsettled issue
Discovery of contrasts in massive datasets.
Efficiently mine contrasts when there are thousands of attributes, such as in medical domains
Efficient mining of top-k contrast patterns
Are there meaningful approximations (e.g. sampling)?
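To make the top-k open problem concrete, the sketch below ranks short conjunctions by support difference between two classes using exhaustive enumeration. It is a naive baseline under stated assumptions, not an efficient algorithm and not a method from the tutorial: the scoring function (support difference), the length cap `max_len`, the name `top_k_contrasts`, and the toy data are all hypothetical choices made for illustration.

```python
import heapq
from itertools import combinations
from typing import FrozenSet, List, Tuple


def top_k_contrasts(target: List[FrozenSet[str]],
                    background: List[FrozenSet[str]],
                    k: int,
                    max_len: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return the k item conjunctions with the largest support difference.

    Both datasets are assumed to be non-empty. Every conjunction of up to
    `max_len` items is scored, so the cost grows combinatorially with the
    number of distinct items.
    """
    items = sorted(set().union(*target, *background))
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s_target = sum(pattern <= t for t in target) / len(target)
            s_background = sum(pattern <= t for t in background) / len(background)
            scored.append((s_target - s_background, combo))
    return heapq.nlargest(k, scored)


# Hypothetical toy data: two classes of item sets.
positive = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"])]
negative = [frozenset(t) for t in (["a"], ["b"], ["c"], ["a", "b"])]

for score, combo in top_k_contrasts(positive, negative, k=3):
    print(combo, score)
```

With thousands of attributes this enumeration becomes infeasible even for short patterns, which is exactly the difficulty the slide points to; pruning or sampling would have to replace the exhaustive loop.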

Page 131: Emerging patterns based classifier


Summary
We have given a wide survey of contrast mining. It should now be clearer
Why contrast data mining is important and when it can be used
How it can be used for very powerful classifiers
What algorithms can be used for contrast data mining

Page 132: Emerging patterns based classifier


Acknowledgements
We are grateful to the following people for their helpful comments or materials for this tutorial
Eric Bae
Jiawei Han
Xiaonan Ji
Rao Kotagiri
Jinyan Li
Elsa Loekito
Katherine Ramsay
Limsoon Wong

Page 133: Emerging patterns based classifier


Bibliography
This bibliography contains three sections:
Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns
Emerging/Contrast Pattern Based Classification
Other Applications of Emerging Patterns
An up-to-date version of this bibliography is available at http://www.cs.wright.edu/~gdong/EPC.html
Please let us know of any extra references to include!

Page 134: Emerging patterns based classifier


Bibliography (Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns)
Arunasalam, Bavani and Chawla, Sanjay and Sun, Pei. Striking Two Birds with One Stone: Simultaneous Mining of Positive and Negative Spatial Patterns. In Proceedings of the Fifth SIAM International Conference on Data Mining, April 21-23, pp, Newport Beach, CA, USA, SIAM 2005
Bavani Arunasalam, Sanjay Chawla: CCCS: a top-down associative classifier for imbalanced class distribution. KDD 2006: 517-522
Eric Bae, James Bailey, Guozhu Dong: Clustering Similarity Comparison Using Density Profiles. Australian Conference on Artificial Intelligence 2006: 342-351
James Bailey, Thomas Manoukian, Kotagiri Ramamohanarao: Fast Algorithms for Mining Emerging Patterns. PKDD 2002: 39-50.
J. Bailey and T. Manoukian and K. Ramamohanarao: A Fast Algorithm for Computing Hypergraph Transversals and its Application in Mining Emerging Patterns. Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM). Pages 485-488. Florida, USA, November 2003.
Stephen D. Bay, Michael J. Pazzani: Detecting Change in Categorical Data: Mining Contrast Sets. KDD 1999: 302-306.
Stephen D. Bay, Michael J. Pazzani: Detecting Group Differences: Mining Contrast Sets. Data Min. Knowl. Discov. 5(3): 213-246 (2001)
Cristian Bucila, Johannes Gehrke, Daniel Kifer, Walker M. White: DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. Data Min. Knowl. Discov. 7(3): 241-272 (2003)
Yandong Cai, Nick Cercone, Jiawei Han: An Attribute-Oriented Approach for Learning Classification Rules from Relational Databases. ICDE 1990: 281-288
Sarah Chan, Ben Kao, Chi Lap Yip, Michael Tang: Mining Emerging Substrings. DASFAA 2003.
Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah, Jianyong Wang: Online Analytical Processing Stream Data: Is It Feasible? DMKD 2002
Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, Dong-Qing Zhang, Xiaohui Gu: Towards Graph Containment Search and Indexing. VLDB 2007: 926-937

Page 135: Emerging patterns based classifier


Bibliography (Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns)
Graham Cormode, S. Muthukrishnan: What's new: finding significant differences in network data streams. IEEE/ACM Trans. Netw. 13(6): 1219-1232 (2005)
Luc De Raedt, Albrecht Zimmermann: Constraint-Based Pattern Set Mining. SDM 2007
Luc De Raedt: Towards Query Evaluation in Inductive Databases Using Version Spaces. Database Support for Data Mining Applications 2004: 117-134
Luc De Raedt, Stefan Kramer: The Levelwise Version Space Algorithm and its Application to Molecular Fragment Finding. IJCAI 2001: 853-862
Guozhu Dong, Jinyan Li: Efficient Mining of Emerging Patterns: Discovering Trends and Differences. KDD 1999: 43-52.
Guozhu Dong, Jinyan Li: Mining border descriptions of emerging patterns from dataset pairs. Knowl. Inf. Syst. 8(2): 178-202 (2005).
Dong, G. and Han, J. and Lakshmanan, L.V.S. and Pei, J. and Wang, H. and Yu, P.S. Online Mining of Changes from Data Streams: Research Problems and Preliminary Results, Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003
Guozhu Dong, Jiawei Han, Joyce M. W. Lam, Jian Pei, Ke Wang, Wei Zou: Mining Constrained Gradients in Large Databases. IEEE Trans. Knowl. Data Eng. 16(8): 922-938 (2004).
Johannes Fischer, Volker Heun, Stefan Kramer: Optimal String Mining Under Frequency Constraints. PKDD 2006: 139-150
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: A Framework for Measuring Changes in Data Characteristics. PODS 1999: 126-137
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Wei-Yin Loh: A Framework for Measuring Differences in Data Characteristics. J. Comput. Syst. Sci. 64(3): 542-578 (2002)
Garriga, G.C. and Kralj, P. and Lavrac, N. Closed Sets for Labeled Data?, PKDD, 2006
Hilderman, R.J. and Peckham, T. A Statistically Sound Alternative Approach to Mining Contrast Sets, Proceedings of the 4th Australasian Data Mining Conference, 2005 (pp. 157-172)

Page 136: Emerging patterns based classifier


Bibliography (Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns)
Hui-jing Huang, Yongsong Qin, Xiaofeng Zhu, Jilian Zhang, and Shichao Zhang. Difference Detection Between Two Contrast Sets. Proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery (DaWak), 2006.
Imberman, S.P. and Tansel, A.U. and Pacuit, E. An Efficient Method For Finding Emerging Frequent Itemsets, 3rd International Workshop on Mining Temporal and Sequential Data, pp. 112-121, 2004
Tomasz Imielinski, Leonid Khachiyan, Amin Abdulghani: Cubegrades: Generalizing Association Rules. Data Min. Knowl. Discov. 6(3): 219-257 (2002)
Inakoshi, H. and Ando, T. and Sato, A. and Okamoto, S. Discovery of emerging patterns from nearest neighbors, International Conference on Machine Learning and Cybernetics, 2002.
Xiaonan Ji, James Bailey, Guozhu Dong: Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. ICDM 2005: 194-201.
Xiaonan Ji, James Bailey, Guozhu Dong: Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. Knowl. Inf. Syst. 11(3): 259-286 (2007).
Daniel Kifer, Shai Ben-David, Johannes Gehrke: Detecting Change in Data Streams. VLDB 2004: 180-191
P Kralj, N Lavrac, D Gamberger, A Krstacic. Contrast Set Mining for Distinguishing Between Similar Diseases. LNCS Volume 4594, 2007.
Sau Dan Lee, Luc De Raedt: An Efficient Algorithm for Mining String Databases Under Constraints. KDID 2004: 108-129
Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap-Peng Tan: Relative risk and odds ratio: a data mining perspective. PODS 2005: 368-377
Jinyan Li, Guimei Liu and Limsoon Wong. Mining Statistically Important Equivalence Classes and delta-Discriminative Emerging Patterns. KDD 2007.
Jinyan Li, Thomas Manoukian, Guozhu Dong, Kotagiri Ramamohanarao: Incremental Maintenance on the Border of the Space of Emerging Patterns. Data Min. Knowl. Discov. 9(1): 89-116 (2004).

Page 137: Emerging patterns based classifier


Bibliography (Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns)
Jinyan Li and Qiang Yang. Strong Compound-Risk Factors: Efficient Discovery through Emerging Patterns and Contrast Sets. IEEE Transactions on Information Technology in Biomedicine. To appear.
Lin, J. and Keogh, E. Group SAX: Extending the Notion of Contrast Sets to Time Series and Multimedia Data. Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin, Germany, September, 2006.
Bing Liu, Ke Wang, Lai-Fun Mun, Xin-Zhi Qi: Using Decision Tree Induction for Discovering Holes in Data. PRICAI 1998: 182-193
Bing Liu, Liang-Ping Ku, Wynne Hsu: Discovering Interesting Holes in Data. IJCAI (2) 1997: 930-935
Bing Liu, Wynne Hsu, Yiming Ma: Discovering the set of fundamental rule changes. KDD 2001: 335-340.
Elsa Loekito, James Bailey: Fast Mining of High Dimensional Expressive Contrast Patterns Using Zero-suppressed Binary Decision Diagrams. KDD 2006: 307-316.
Yu Meng, Margaret H. Dunham: Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment. PAKDD 2006: 750-754
Tom M. Mitchell: Version Spaces: A Candidate Elimination Approach to Rule Learning. IJCAI 1977: 305-310
Amit Satsangi, Osmar R. Zaiane, Contrasting the Contrast Sets: An Alternative Approach, Eleventh International Database Engineering and Applications Symposium (IDEAS 2007), Banff, Canada, September 6-8, 2007
Michele Sebag: Delaying the Choice of Bias: A Disjunctive Version Space Approach. ICML 1996: 444-452
Michele Sebag: Using Constraints to Building Version Spaces. ECML 1994: 257-271
Arnaud Soulet, Bruno Crémilleux, François Rioult: Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures. KDID 2004: 173-190
Pawel Terlecki, Krzysztof Walczak: On the relation between rough set reducts and jumping emerging patterns. Inf. Sci. 177(1): 74-83 (2007).

Page 138: Emerging patterns based classifier


Bibliography (Mining of Emerging Patterns, Change Patterns, Contrast/Difference Patterns)
Roger Ming Hieng Ting, James Bailey: Mining Minimal Contrast Subgraph Patterns. SDM 2006.
V. S. Tseng, C. J. Chu, and Tyne Liang, An Efficient Method for Mining Temporal Emerging Itemsets From Data Streams, International Computer Symposium, Workshop on Software Engineering, Databases and Knowledge Discovery, 2006
J. Vreeken, M. van Leeuwen, A. Siebes: Characterising the Difference. KDD 2007.
Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han: Mining concept-drifting data streams using ensemble classifiers. KDD 2003: 226-235
Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi: On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams. ICDM 2005: 474-481
Lusheng Wang, Hao Zhao, Guozhu Dong, Jianping Li: On the complexity of finding emerging patterns. Theor. Comput. Sci. 335(1): 15-27 (2005).
Ke Wang, Senqiang Zhou, Ada Wai-Chee Fu, Jeffrey Xu Yu: Mining Changes of Classification by Correspondence Tracing. SDM 2003.
Geoffrey I. Webb: Discovering Significant Patterns. Machine Learning 68(1): 1-33 (2007)
Geoffrey I. Webb, Songmao Zhang: K-Optimal Rule Discovery. Data Min. Knowl. Discov. 10(1): 39-79 (2005)
Geoffrey I. Webb, Shane M. Butler, Douglas A. Newlands: On detecting differences between groups. KDD 2003: 256-265.
Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets. KDD 2000: 310-314.
Lizhuang Zhao, Mohammed J. Zaki, Naren Ramakrishnan: BLOSOM: a framework for mining arbitrary boolean expressions. KDD 2006: 827-832

Page 139: Emerging patterns based classifier


Bibliography (Emerging/Contrast Pattern Based Classification)
Hamad Alhammady, Kotagiri Ramamohanarao: The Application of Emerging Patterns for Improving the Quality of Rare-Class Classification. PAKDD 2004: 207-211
Hamad Alhammady, Kotagiri Ramamohanarao: Using Emerging Patterns and Decision Trees in Rare-Class Classification. ICDM 2004: 315-318
Hamad Alhammady, Kotagiri Ramamohanarao: Expanding the Training Data Space Using Emerging Patterns and Genetic Methods. SDM 2005
Hamad Alhammady, Kotagiri Ramamohanarao: Using Emerging Patterns to Construct Weighted Decision Trees. IEEE Trans. Knowl. Data Eng. 18(7): 865-876 (2006).
Hamad Alhammady, Kotagiri Ramamohanarao: Mining Emerging Patterns and Classification in Data Streams. Web Intelligence 2005: 272-275
James Bailey, Thomas Manoukian, Kotagiri Ramamohanarao: Classification Using Constrained Emerging Patterns. WAIM 2003: 226-237
Guozhu Dong, Xiuzhen Zhang, Limsoon Wong, Jinyan Li: CAEP: Classification by Aggregating Emerging Patterns. Discovery Science 1999: 30-42.
Hongjian Fan, Kotagiri Ramamohanarao: An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification. PAKDD 2002: 456-462
Hongjian Fan, Kotagiri Ramamohanarao: Efficiently Mining Interesting Emerging Patterns. WAIM 2003: 189-201
Hongjian Fan, Kotagiri Ramamohanarao: Noise Tolerant Classification by Chi Emerging Patterns. PAKDD 2004: 201-206
Hongjian Fan, Ming Fan, Kotagiri Ramamohanarao, Mengxu Liu: Further Improving Emerging Pattern Based Classifiers Via Bagging. PAKDD 2006: 91-96
Hongjian Fan, Kotagiri Ramamohanarao: A weighting scheme based on emerging patterns for weighted support vector machines. GrC 2005: 435-440

Page 140: Emerging patterns based classifier


Bibliography (Emerging/Contrast Pattern Based Classification)
Hongjian Fan, Kotagiri Ramamohanarao: Fast Discovery and the Generalization of Strong Jumping Emerging Patterns for Building Compact and Accurate Classifiers. IEEE Trans. Knowl. Data Eng. 18(6): 721-737 (2006)
Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Instance-Based Classification by Emerging Patterns. PKDD 2000: 191-200
Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. PAKDD 2000: 220-232
Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. Knowl. Inf. Syst. 3(2): 131-145 (2001)
Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong: Emerging Patterns and Classification. ASIAN 2000: 15-32
Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao, Limsoon Wong: DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54(2): 99-124 (2004).
Wenmin Li, Jiawei Han, Jian Pei: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM 2001: 369-376
Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong: Combining the Strength of Pattern Frequency and Distance for Classification. PAKDD 2001: 455-466
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. KDD 1998: 80-86
Kotagiri Ramamohanarao, James Bailey: Discovery of Emerging Patterns and Their Use in Classification. Australian Conference on Artificial Intelligence 2003: 1-12
Ramamohanarao, K. and Bailey, J. and Fan, H. Efficient Mining of Contrast Patterns and Their Applications to Classification, Third International Conference on Intelligent Sensing and Information Processing, 2005 (39-47).
Ramamohanarao, K. and Fan, H. Patterns Based Classifiers, World Wide Web 2007: 10(71-83).
Qun Sun, Xiuzhen Zhang, Kotagiri Ramamohanarao: Noise Tolerance of EP-Based Classifiers. Australian Conference on Artificial Intelligence 2003: 796-806

Page 141: Emerging patterns based classifier


Bibliography (Emerging/Contrast Pattern Based Classification)
Xiaoxin Yin, Jiawei Han: CPAR: Classification based on Predictive Association Rules. SDM 2003
Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Information-Based Classification by Aggregating Emerging Patterns. IDEAL 2000: 48-53
Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao: Building Behaviour Knowledge Space to Make Classification Decision. PAKDD 2001: 488-494
Zhou Wang, Hongjian Fan, Kotagiri Ramamohanarao: Exploiting Maximal Emerging Patterns for Classification. Australian Conference on Artificial Intelligence 2004: 1062-1068

Page 142: Emerging patterns based classifier


Bibliography (Other Applications of Emerging Patterns)
Anne-Laure Boulesteix, Gerhard Tutz, Korbinian Strimmer: A CART-based approach to discover emerging patterns in microarray data. Bioinformatics 19(18): 2465-2472 (2003).
Lijun Chen, Guozhu Dong: Masquerader Detection Using OCLEP: One Class Classification Using Length Statistics of Emerging Patterns. Proceedings of International Workshop on INformation Processing over Evolving Networks (WINPEN), 2006.
Guozhu Dong, Kaustubh Deshpande: Efficient Mining of Niches and Set Routines. PAKDD 2001: 234-246
Grandinetti, W.M. and Chesnevar, C.I. and Falappa, M.A. Enhanced Approximation of the Emerging Pattern Space using an Incremental Approach, Proceedings of VII Workshop of Researchers in Computer Sciences, Argentine, pp. 263-267, 2005
Jinyan Li, Huiqing Liu, See-Kiong Ng, Limsoon Wong. Discovery of Significant Rules for Classifying Cancer Diagnosis Data. Bioinformatics. 19 (suppl. 2): ii93-ii102. (This paper was also presented in the 2003 European Conference on Computational Biology, Paris, France, September 26-30.)
Jinyan Li, Huiqing Liu, James R. Downing, Allen Eng-Juh Yeoh, Limsoon Wong. Simple Rules Underlying Gene Expression Profiles of More than Six Subtypes of Acute Lymphoblastic Leukemia (ALL) Patients. Bioinformatics. 19:71-78, 2003.
Jinyan Li, Limsoon Wong: Emerging patterns and gene expression data. Genome Informatics, 2001: 12(3-13).
Jinyan Li, Limsoon Wong: Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics 18(5): 725-734 (2002)
Jinyan Li, Limsoon Wong. Geography of Differences Between Two Classes of Data. Proceedings 6th European Conference on Principles of Data Mining and Knowledge Discovery, pages 325-337, Helsinki, Finland, August 2002.
Jinyan Li and Limsoon Wong. Structural Geography of the space of emerging patterns. Intelligent Data Analysis (IDA): An International Journal, Volume 9, pages 567-588, November 2005.
Jinyan Li, Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao, Qun Sun: Efficient Mining of High Confidence Association Rules without Support Thresholds. PKDD 1999: 406-411

Page 143: Emerging patterns based classifier


Bibliography (Other Applications of Emerging Patterns)
Shihong Mao, Guozhu Dong: Discovery of Highly Differentiative Gene Groups from Microarray Gene Expression Data Using the Gene Club Approach. J. Bioinformatics and Computational Biology 3(6): 1263-1280 (2005).
Podraza, R. and Tomaszewski, K. KTDA: Emerging Patterns Based Data Analysis System, Proceedings of XXI Fall Meeting of Polish Information Processing Society, pp. 213-221, 2005
Rioult, F. Mining strong emerging patterns in wide SAGE data, Proceedings of the ECML/PKDD Discovery Challenge Workshop, Pisa, Italy, pp. 127-138, 2004
Eng-Juh Yeoh, Mary E. Ross, Sheila A. Shurtleff, W. Kent William, Divyen Patel, Rami Mahfouz, Fred G. Behm, Susana C. Raimondi, Mary V. Reilling, Anami Patel, Cheng Cheng, Dario Campana, Dawn Wilkins, Xiaodong Zhou, Jinyan Li, Huiqing Liu, Chin-Hon Pui, William E. Evans, Clayton Naeve, Limsoon Wong, James R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1:133-143, March 2002.
Yoon, H.S. and Lee, S.H. and Kim, J.H. Application of Emerging Patterns for Multi-source Bio-Data Classification and Analysis, Lecture Notes in Computer Science Vol 3610, 2005.
Yu, L.T.H. and Chung, F. and Chan, S.C.F. and Yuen, S.M.C. Using emerging pattern based projected clustering and gene expression data for cancer detection, Proceedings of the second conference on Asia-Pacific bioinformatics, pp. 75-84, 2004.
Zhang, X. and Dong, G. and Wong, L. Using CAEP to predict translation initiation sites from genomic DNA sequences, TR2001/22, CSSE, Univ. of Melbourne, 2001.