Similarity Evaluation Techniques for Filtering Problems

Vagan Terziyan, University of Jyvaskyla, vagan@it.jyu.fi

Mar 29, 2015

Page 1:

Similarity Evaluation Techniques for Filtering Problems

Vagan Terziyan

University of Jyvaskyla

vagan@it.jyu.fi

Page 2:

Evaluating Distance between Various Domain Objects and Concepts: one of the basic abilities of an intelligent agent

Are these two the same?

… No! The difference is equal to 0.234

Page 3:

Contents

• Goal

• Basic Concepts

• External Similarity Evaluation

• An Example

• Internal Similarity Evaluation

• Conclusions

Page 4:

Reference

Puuronen S., Terziyan V., A Similarity Evaluation Technique for Data Mining with an Ensemble of Classifiers, In: A.M. Tjoa, R.R. Wagner and A. Al-Zobaidie (Eds.), Proc. of the 11th Intern. Workshop on Database and Expert Systems Applications, IEEE CS Press, Los Alamitos, California, 2000, pp. 1155-1159. http://dlib.computer.org/conferen/dexa/0680/pdf/06801155.pdf

Page 5:

Goal

The goal of this research is to develop a simple similarity evaluation technique to be used for social filtering.

The result of social filtering here is a prediction of a customer’s evaluation of a certain product, based on known opinions about this product from other customers.

Page 6:

Basic Concepts: Virtual Training Environment (VTE)

VTE is a quadruple <D, C, S, P>:

• D is the set of goods D1, D2, ..., Dn in the VTE;

• C is the set of evaluation marks C1, C2, ..., Cm that are used to rank the products;

• S is the set of customers S1, S2, ..., Sr who select evaluation marks to rank the products;

• P is the set of semantic predicates that define relationships between D, C and S.

Page 7:

Basic Concepts: Semantic Predicate P

$$
P(D_i, C_j, S_k) =
\begin{cases}
1, & \text{if the customer } S_k \text{ selects mark } C_j \text{ to evaluate the product } D_i; \\
-1, & \text{if } S_k \text{ refuses to select } C_j \text{ to evaluate } D_i; \\
0, & \text{if } S_k \text{ does not select or refuse to select } C_j \text{ to evaluate } D_i.
\end{cases}
$$
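As a minimal sketch of how the predicate could be stored, assume the votes are kept in a nested list indexed as P[i][k][j] (good, customer, mark). This layout and the helper name `check_predicate` are illustrative choices, not something the slides prescribe.

```python
ALLOWED_VALUES = {1, -1, 0}  # select, refuse, no opinion

def check_predicate(P):
    """Validate P[i][k][j] and return its shape (n goods, r customers, m marks)."""
    n, r, m = len(P), len(P[0]), len(P[0][0])
    for table in P:
        if len(table) != r:
            raise ValueError("every good needs one row per customer")
        for row in table:
            if len(row) != m:
                raise ValueError("every row needs one value per mark")
            if any(value not in ALLOWED_VALUES for value in row):
                raise ValueError("P takes only the values 1, -1 and 0")
    return n, r, m

# Hypothetical toy data (not from the slides): one good, two customers, three marks.
P = [[[1, -1, 0],
      [0, 0, 1]]]
shape = check_predicate(P)
```

Keeping one table per good, with one row per customer, mirrors the way the evaluation tables of the example are laid out.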

Page 8:

Problem 1: Deriving External Similarity Values

[Figure: the sets D, C and S, with elements Di, Cj and Sk linked pairwise by the external relations DCi,j, SCk,j and SDk,i]

Page 9:

External Similarity Values

External Similarity Values (ESV): binary relations DC, SC, and SD between the elements of (sub)sets of D and C; S and C; and S and D.

ESV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote).

[Figure: the external relations DCi,j, SCk,j and SDk,i between the elements Di, Cj and Sk of the sets D, C and S]

Page 10:

Problem 2: Deriving Internal Similarity Values

[Figure: the sets D, C and S, with the internal relations DDi’,i’’ between goods Di’ and Di’’, CCj’,j’’ between marks Cj’ and Cj’’, and SSk’,k’’ between customers Sk’ and Sk’’]

Page 11:

Internal Similarity Values

Internal Similarity Values (ISV): binary relations between two subsets of D, two subsets of C and two subsets of S.

ISV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote).

[Figure: the internal relations DDi’,i’’, CCj’,j’’ and SSk’,k’’ inside the sets D, C and S]

Page 12:

Why We Need Similarity Values (or a Distance Measure)

• The distance between products is used to advertise a new product to customers based on the evaluation of already known similar products;

• the distance between evaluations is necessary to estimate the evaluation error when needed, e.g. when adaptive filtering technologies are used;

• the distance between customers is useful to evaluate the weights of all customers when needed, e.g. to be able to integrate their opinions by weighted voting.

Page 13:

Deriving External Relation DC: How well evaluation fits the product

$$DC_{i,j} = CD_{j,i} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \quad \forall D_i \in D,\ \forall C_j \in C$$

[Figure: customers Sk1, Sk2 and Sk3 all connect the product Di with the evaluation mark Cj, giving DCi,j = 3]
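The sum over customers can be sketched in a few lines. The layout P[i][k][j] (good, customer, mark) and the function name `dc_relation` are assumptions made for this illustration, and the toy data is hypothetical.

```python
def dc_relation(P):
    """DC[i][j] = sum over all customers k of P[i][k][j]."""
    n, r, m = len(P), len(P[0]), len(P[0][0])
    return [[sum(P[i][k][j] for k in range(r)) for j in range(m)]
            for i in range(n)]

# Hypothetical toy data: one good, three customers, two marks.
# All three customers select the first mark, so its total support is 3,
# as in the figure; the second mark gets -1 + 0 + 1 = 0.
P = [[[1, -1],
      [1, 0],
      [1, 1]]]
DC = dc_relation(P)
```

A selection and a refusal cancel each other, so DC measures the net agreement of the customers rather than the number of votes cast.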

Page 14:

Deriving External Relation SC: Measures customer’s competence in the use of evaluation marks

The value of the relation (Sk,Cj) represents, in a way, the total support that the customer Sk obtains when selecting (or refusing to select) the mark Cj to evaluate all the products.

$$SC_{k,j} = CS_{j,k} = \sum_{i=1}^{n} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall C_j \in C$$

Page 15:

Example of SC Relation

[Figure: customer Sk, evaluation mark Cj and products D1 to D4, with CDj,1 = -3, CDj,2 = 6, CDj,3 = 0 and CDj,4 = 1, giving SCk,j = 4]

Page 16:

Deriving External Relation SD: Measures customer’s competence in the products

The value of the relation (Sk,Di) represents the total support that the customer Sk receives when selecting (or refusing to select) all the evaluation marks to evaluate the product Di.

$$SD_{k,i} = DS_{i,k} = \sum_{j=1}^{m} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall D_i \in D$$

Page 17:

Example of SD Relation

[Figure: customer Sk, product Di and evaluation marks C1 and C2, with CD1,i = -3 and CD2,i = 5, giving SDk,i = 2]
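The number in this example can be reproduced with a short sketch. The votes of Sk are not shown on the slide; the values used below (both marks selected) are an inference, being the only combination of 1, -1 and 0 for which -3·P1 + 5·P2 equals 2. The function name `weighted_support` is hypothetical.

```python
def weighted_support(dc_values, votes):
    """Sum of DC values weighted by one customer's votes (1, -1 or 0)."""
    if len(dc_values) != len(votes):
        raise ValueError("need one vote per DC value")
    return sum(dc * vote for dc, vote in zip(dc_values, votes))

# DC values of the marks C1 and C2 for the good Di, as given in the example.
cd_values = [-3, 5]
# Assumed votes of Sk (inferred, not shown on the slide): both marks selected.
votes = [1, 1]
sd_ki = weighted_support(cd_values, votes)
```

Selecting C1 against the majority costs the customer 3 points of support, while selecting C2 with the majority earns 5.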

Page 18:

Normalizing External Relations to the Interval [0,1]

$$\text{normalized value} = \frac{\text{value} - \min(\text{value})}{\max(\text{value}) - \min(\text{value})}$$

$$DC_{i,j} = CD_{j,i} = \frac{DC_{i,j} + r}{2\,r}$$

$$SC_{k,j} = CS_{j,k} = \frac{SC_{k,j} + n\,(r-2)}{2\,n\,(r-1)}$$

$$SD_{k,i} = DS_{i,k} = \frac{SD_{k,i} + m\,(r-2)}{2\,m\,(r-1)}$$

(In each of the last three formulas the left-hand side denotes the normalized value of the relation on the right.)

n is the number of products

m is the number of evaluation marks

r is the number of customers
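A small sketch shows how the three relation-specific formulas can be read as instances of min-max normalization. The bounds are assumptions of this sketch: DC is taken to range over [-r, r], and SC and SD over [-n(r-2), n·r] and [-m(r-2), m·r], which requires more than one customer. The function names are hypothetical.

```python
def min_max(value, lowest, highest):
    """Generic min-max normalization onto [0, 1]."""
    return (value - lowest) / (highest - lowest)

def normalize_dc(dc, r):
    # Assumed bounds: -r (all customers refuse) to r (all customers select).
    return min_max(dc, -r, r)

def normalize_sc(sc, n, r):
    # Assumed bounds: -n*(r-2) to n*r; the denominator is 2*n*(r-1), so r > 1.
    return min_max(sc, -n * (r - 2), n * r)

def normalize_sd(sd, m, r):
    # Same bounds as SC with the number of marks m in place of n.
    return min_max(sd, -m * (r - 2), m * r)
```

The lower bound for SC and SD is not symmetric to the upper one because a customer's own vote is included in DC: a customer who stands alone against the other r - 1 customers still contributes 1 to the product, so the worst case per term is 2 - r rather than -r.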

Page 19:

Competence of a customer

[Figure: a customer is linked to the good Di through the conceptual pattern of goods’ features (competence in the goods) and to the evaluation mark Cj through the conceptual pattern of evaluation marks definitions (competence in the evaluation marks)]

Page 20:

Customer’s Evaluation: competence quality in Products

$$Q^{D}(S_k) = \frac{1}{n} \sum_{i=1}^{n} SD_{k,i}$$

Page 21:

Customer’s Evaluation: competence quality in evaluation marks use

$$Q^{C}(S_k) = \frac{1}{m} \sum_{j=1}^{m} SC_{k,j}$$

Page 22:

Quality Balance Theorem

$$Q^{D}(S_k) = Q^{C}(S_k)$$

The evaluation of a customer’s competence (ranking, weighting, quality evaluation) does not depend on the competence area (the “virtual world of products” or the “conceptual world of evaluation marks”), because both competence values are always equal.
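The theorem can be checked numerically with a minimal sketch. It assumes the layout P[i][k][j] (good, customer, mark) and the normalizations (SD + m(r-2)) / (2m(r-1)) and (SC + n(r-2)) / (2n(r-1)); the random votes and the function name `competence_qualities` are purely illustrative.

```python
import random

def competence_qualities(P):
    """Return one (Q_D, Q_C) pair per customer for votes P[i][k][j]."""
    n, r, m = len(P), len(P[0]), len(P[0][0])
    dc = [[sum(P[i][k][j] for k in range(r)) for j in range(m)]
          for i in range(n)]
    result = []
    for k in range(r):
        sd = [sum(dc[i][j] * P[i][k][j] for j in range(m)) for i in range(n)]
        sc = [sum(dc[i][j] * P[i][k][j] for i in range(n)) for j in range(m)]
        # Average the normalized SD over goods and the normalized SC over marks.
        q_d = sum((v + m * (r - 2)) / (2 * m * (r - 1)) for v in sd) / n
        q_c = sum((v + n * (r - 2)) / (2 * n * (r - 1)) for v in sc) / m
        result.append((q_d, q_c))
    return result

# Random votes for 3 goods, 4 customers and 5 marks (seeded for repeatability).
rng = random.Random(0)
P = [[[rng.choice((1, -1, 0)) for _ in range(5)] for _ in range(4)]
     for _ in range(3)]
qualities = competence_qualities(P)
```

Both averages reduce to the same double sum of DC·P over all goods and marks, which is why the two values agree up to floating-point rounding whatever the votes are.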

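Page 23:

Proof

$$
\begin{aligned}
Q^{D}(S_k) &= \frac{1}{n}\sum_{i=1}^{n} SD_{k,i}
 = \frac{1}{n}\sum_{i=1}^{n}\frac{SD_{k,i} + m\,(r-2)}{2\,m\,(r-1)} \\
&= \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j=1}^{m}\bigl(DC_{i,j}\cdot P(D_i,C_j,S_k)\bigr) + m\,(r-2)}{2\,m\,(r-1)} = \dots \\
\dots &= \frac{1}{m}\sum_{j=1}^{m}\frac{\sum_{i=1}^{n}\bigl(DC_{i,j}\cdot P(D_i,C_j,S_k)\bigr) + n\,(r-2)}{2\,n\,(r-1)} \\
&= \frac{1}{m}\sum_{j=1}^{m}\frac{SC_{k,j} + n\,(r-2)}{2\,n\,(r-1)}
 = \frac{1}{m}\sum_{j=1}^{m} SC_{k,j} = Q^{C}(S_k).
\end{aligned}
$$

(The first SD and the last SC in the chain denote the normalized values; the fractions contain the values before normalization.)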

Page 24:

An Example

Let us suppose that four customers have to evaluate three products from a virtual shop using the five different evaluation marks available.

The customers should define their selection of an appropriate mark for every product.

The final goal is to obtain a cooperative evaluation result of all the customers concerning the quality of the products.

Page 25:

C set (evaluation marks) in the Example

Evaluation marks Notation

Nicely designed C1

Expensive C2

Easy to use C3

Reliable C4

Safe C5

Page 26:

S (customers) Set in the Example

Customers IDs Notation

Fox S1

Wolf S2

Cat S3

Hare S4

Page 27:

D (products) Set in the Example

D1 - Ultra Cast Spinning Reel

D2 - Nokia Communicator 9110

D3 - iGrafx Process Management Software

Page 28:

Evaluations Made for the Good “Reel”

D1

P(D,C,S) C1 C2 C3 C4 C5

S1 1 -1 -1 0 -1

S2 0+ -1** 0++ 1* -1***

S3 0 0 -1 1 0

S4 1 -1 0 0 1

Customer Wolf (S2) prefers to select the mark Reliable* to evaluate “Reel” and refuses to select Expensive** or Safe***. Wolf neither uses nor refuses to use the Nicely designed+ or Easy to use++ marks for evaluation.

Page 29:

Evaluations Made for the Good “Communicator”

D2

P C1 C2 C3 C4 C5

S1 -1 0 -1 0 1

S2 1 -1 -1 0 0

S3 1 -1 0 1 1

S4 -1 0 0 1 0

Page 30:

Evaluations Made for the Good “Software”

D3

P C1 C2 C3 C4 C5

S1 1 0 1 -1 0

S2 0 1 0 -1 1

S3 -1 -1 1 -1 1

S4 -1 -1 1 -1 1

Page 31:

Example: Calculating Value DC3,4

D3

P C1 C2 C3 C4 C5

S1 1 0 1 -1 0

S2 0 1 0 -1 1

S3 -1 -1 1 -1 1

S4 -1 -1 1 -1 1

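$$DC_{i,j} = CD_{j,i} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \quad \forall D_i \in D,\ \forall C_j \in C$$

$$DC_{3,4} = \sum_{k=1}^{4} P(D_3, C_4, S_k) = (-1) + (-1) + (-1) + (-1) = -4$$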

Page 32:

Resulting DC relation

DC C1 C2 C3 C4 C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3
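The whole DC table can be recomputed from the three evaluation tables of the example. The sketch below copies those tables into a nested list P[i][k][j] (good, customer, mark); the layout and the function name are choices made for this illustration.

```python
# Votes copied from the evaluation tables for "Reel", "Communicator" and "Software".
# Each inner list is one customer's row (S1..S4) over the marks C1..C5.
P = [
    [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1], [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],   # D1
    [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0], [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],    # D2
    [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1], [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],  # D3
]

def dc_relation(P):
    """DC[i][j] = sum over all customers k of P[i][k][j]."""
    n, r, m = len(P), len(P[0]), len(P[0][0])
    return [[sum(P[i][k][j] for k in range(r)) for j in range(m)]
            for i in range(n)]

DC = dc_relation(P)
for name, row in zip(("D1", "D2", "D3"), DC):
    print(name, row)
```

Each entry is simply a column sum of one evaluation table, so the result can be checked by hand against the table above.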

Page 33:

Normalized and “Thresholded” DC relation

[DC] C1 C2 C3 C4 C5

D1 0.75 0.125 0.25 0.75 0.375

D2 0.5 0.25 0.25 0.75 0.75

D3 0.375 0.375 0.875 0 0.875

[DC]^0.75 C1 C2 C3 C4 C5

D1 1 -1 -1 1 0

D2 0 -1 -1 1 1

D3 0 0 1 -1 1

[Scale: normalized values from 0 to 0.25 map to -1, values between 0.25 and 0.75 map to 0, and values from 0.75 to 1 map to 1]
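Both tables can be reproduced from the DC relation with a short sketch. The normalization (DC + r) / (2r) with r = 4 customers is used; the thresholding rule (at least 0.75 gives 1, at most 0.25 gives -1, anything else gives 0) is inferred from the two tables and is an assumption of this sketch, as are the function names.

```python
def normalize_dc(dc, r):
    """Map a DC value from [-r, r] onto [0, 1]."""
    return (dc + r) / (2 * r)

def threshold(value, t=0.75):
    """Map a normalized value to 1, -1 or 0 around the threshold t (assumed rule)."""
    if value >= t:
        return 1
    if value <= 1 - t:
        return -1
    return 0

# DC relation of the example (rows D1..D3, columns C1..C5).
DC = [[2, -3, -2, 2, -1],
      [0, -2, -2, 2, 2],
      [-1, -1, 3, -4, 3]]
r = 4  # number of customers

normalized = [[normalize_dc(value, r) for value in row] for row in DC]
thresholded = [[threshold(value) for value in row] for row in normalized]
```

With a threshold of 0.75 and four customers, a mark counts as accepted or rejected for a good only when the net support is at least two votes in one direction.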

Page 34:

Result of Cooperative Goods Evaluation Based on DC Relation

D1 is nicely designed, reliable, not expensive, but not easy to use

D2 is reliable, safe, not expensive, but not easy to use

D3 is easy to use, safe, but not reliable
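These verbal summaries follow mechanically from the thresholded DC relation, as the sketch below illustrates. The wording rules (list the marks with value 1, then the marks with value -1 prefixed by "not", and skip zeros) and the function name `describe` are assumptions of this illustration.

```python
MARKS = ["nicely designed", "expensive", "easy to use", "reliable", "safe"]

# Thresholded DC relation of the example (threshold 0.75).
THRESHOLDED_DC = {
    "D1": [1, -1, -1, 1, 0],
    "D2": [0, -1, -1, 1, 1],
    "D3": [0, 0, 1, -1, 1],
}

def describe(name, row):
    """Turn one thresholded DC row into a sentence; 0 entries are skipped."""
    positive = [mark for mark, value in zip(MARKS, row) if value == 1]
    negative = ["not " + mark for mark, value in zip(MARKS, row) if value == -1]
    return name + " is " + ", ".join(positive + negative)

summaries = [describe(name, row) for name, row in THRESHOLDED_DC.items()]
```

Marks with value 0, such as "safe" for D1, are left out because the customers' support for them is too weak in either direction.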

Page 35:

An Example: Calculating Value SD1,1

$$SD_{k,i} = DS_{i,k} = \sum_{j=1}^{m} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall D_i \in D$$

D1

P C1 C2 C3 C4 C5

S1 1 -1 -1 0 -1

S2 0 -1 0 1 -1

S3 0 0 -1 1 0

S4 1 -1 0 0 1

DC C1 C2 C3 C4 C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3

$$SD_{1,1} = \sum_{j=1}^{5} DC_{1,j} \cdot P(D_1, C_j, S_1) = 2 \cdot 1 + (-3) \cdot (-1) + (-2) \cdot (-1) + 2 \cdot 0 + (-1) \cdot (-1) = 8$$

Page 36:

An Example: Calculating Value SC4,4

$$SC_{k,j} = CS_{j,k} = \sum_{i=1}^{n} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall C_j \in C$$

DC C1 C2 C3 C4 C5

D1 2 -3 -2 2 -1

D2 0 -2 -2 2 2

D3 -1 -1 3 -4 3

P (customer S4) C1 C2 C3 C4 C5

D1 1 -1 0 0 1

D2 -1 0 0 1 0

D3 -1 -1 1 -1 1

$$SC_{4,4} = \sum_{i=1}^{3} DC_{i,4} \cdot P(D_i, C_4, S_4) = 2 \cdot 0 + 2 \cdot 1 + (-4) \cdot (-1) = 6$$

Page 37:

Resulting SD and SC relations

SD D1 D2 D3

S1 8 4 6

S2 6 4 6

S3 4 6 12

S4 4 2 12

SC C1 C2 C3 C4 C5

S1 1 3 7 4 3

S2 0 4 2 6 4

S3 1 3 5 8 5

S4 3 4 3 6 2
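Both tables can be recomputed from the example's votes in a few lines. The sketch assumes the votes are stored as P[i][k][j] (good, customer, mark), which is a layout chosen for this illustration.

```python
# Votes copied from the evaluation tables of the example:
# one block per good (D1..D3), one row per customer (S1..S4), one value per mark (C1..C5).
P = [
    [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1], [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],
    [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0], [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],
    [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1], [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],
]
n, r, m = len(P), len(P[0]), len(P[0][0])

# DC[i][j]: total support for connecting good i with mark j.
DC = [[sum(P[i][k][j] for k in range(r)) for j in range(m)] for i in range(n)]

# SD[k][i]: support customer k receives over all marks for good i.
SD = [[sum(DC[i][j] * P[i][k][j] for j in range(m)) for i in range(n)]
      for k in range(r)]

# SC[k][j]: support customer k receives over all goods for mark j.
SC = [[sum(DC[i][j] * P[i][k][j] for i in range(n)) for j in range(m)]
      for k in range(r)]
```

Note that the row sums of SD and SC coincide for every customer (18, 16, 22 and 18), which is the un-normalized form of the Quality Balance Theorem.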

Page 38:

Normalized and “Thresholded” SD relation

[SD]^0.75 D1 D2 D3

S1 (Fox) 1 -1 1

S2 (Wolf) 1 -1 1

S3 (Cat) -1 1 1

S4 (Hare) -1 -1 1

Evaluations obtained from the customer Fox should be accepted if he evaluates goods similar to “Reel” or similar to “Software”.

Fox’s evaluations should be rejected if they concern goods similar to “Communicator”.

Page 39:

Normalized and “Thresholded” SD relation

[SD]^0.75 D1 D2 D3

S1 (Fox) 1 -1 1

S2 (Wolf) 1 -1 1

S3 (Cat) -1 1 1

S4 (Hare) -1 -1 1

Only the evaluation from the customer Cat can be accepted if it concerns goods similar to “Communicator”.

All four customers are expected to give acceptable evaluations concerning “Software”-related goods.

Page 40:

Normalized and “Thresholded” SC relation

[SC]^0.75 C1 (Nicely designed) C2 (Expensive) C3 (Easy to use) C4 (Reliable) C5 (Safe)

S1 (Fox) -1 0 1 1 0

S2 (Wolf) -1 1 -1 1 1

S3 (Cat) -1 0 1 1 1

S4 (Hare) 0 1 0 1 -1

Evaluations obtained from the customer Fox should be accepted if they concern the usability (easy to use) or the reliability of a good.

Fox’s evaluations should be rejected if they concern the design of goods.

Page 41:

Problem 2: Deriving Internal Similarity Values

[Figure: the sets D, C and S, with the internal relations DDi’,i’’ between goods Di’ and Di’’, CCj’,j’’ between marks Cj’ and Cj’’, and SSk’,k’’ between customers Sk’ and Sk’’]

Page 42:

Internal Similarity Values

Internal Similarity Values (ISV): binary relations between two subsets of D, two subsets of C and two subsets of S.

ISV are based on total support among all the customers for voting for the appropriate connection (or refusal to vote).

[Figure: the internal relations DDi’,i’’, CCj’,j’’ and SSk’,k’’ inside the sets D, C and S]

Page 43:

Deriving Internal Similarity Values

[Figure: deriving the internal similarity between the elements A’ and A’’ of a set A. a) Via one intermediate set I: the relations A’I and IA’’ give A’A’’ with superscript I. b) Via two intermediate sets I and J: the relations A’I, IJ and JA’’ give A’A’’ with superscript IJ.]

Page 44:

Internal Similarity for Customers: Goods-based Similarity

$$\forall S' \in S,\ \forall S'' \in S: \quad S'S''^{D} = S'D \times DS''$$

[Figure: customers S’ and S’’ are connected through the set of goods D by the relations S’D and DS’’, giving the goods-based similarity S’S’’ with superscript D]
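A minimal sketch of this derivation follows, assuming that the relations through the intermediate set D are combined by an ordinary matrix product, and using the un-normalized SD values of the example. How the result should then be normalized and thresholded is not covered by this sketch; the function names are hypothetical.

```python
def transpose(matrix):
    """Swap rows and columns, e.g. turn SD (customers x goods) into DS."""
    return [list(column) for column in zip(*matrix)]

def compose(left, right):
    """Ordinary matrix product of two relation matrices (assumed composition)."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]

# Un-normalized SD relation of the example (rows S1..S4, columns D1..D3).
SD = [[8, 4, 6],
      [6, 4, 6],
      [4, 6, 12],
      [4, 2, 12]]

# Goods-based customer-to-customer values via the intermediate set D.
SS_D = compose(SD, transpose(SD))
```

Under this reading the result is symmetric, and two customers score high when they receive strong support on the same goods, as Cat and Hare do on “Software”.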

Page 45:

Internal Similarity for Customers: Evaluation marks-Based Similarity

$$\forall S' \in S,\ \forall S'' \in S: \quad S'S''^{C} = S'C \times CS''$$

[Figure: customers S’ and S’’ are connected through the set of evaluation marks C by the relations S’C and CS’’, giving the evaluation marks-based similarity S’S’’ with superscript C]

Page 46:

Internal Similarity for Customers: Evaluation marks-Goods-Based Similarity

$$\forall S' \in S,\ \forall S'' \in S: \quad S'S''^{CD} = S'C \times CD \times DS''$$

[Figure: customers S’ and S’’ are connected through the sets of evaluation marks C and goods D by the relations S’C, CD and DS’’, giving the similarity S’S’’ with superscript CD]

Page 47:

Internal Similarity for Evaluation Marks

[Figure, three panels: customers-based similarity, C’C’’ with superscript S, via the relations C’S and SC’’; goods-based similarity, C’C’’ with superscript D, via the relations C’D and DC’’; goods-customers-based similarity, C’C’’ with superscript DS, via the relations C’D, DS and SC’’]

Page 48:

Internal Similarity for Goods

[Figure, three panels: customers-based similarity, D’D’’ with superscript S, via the relations D’S and SD’’; evaluation marks-based similarity, D’D’’ with superscript C, via the relations D’C and CD’’; evaluation marks-customers-based similarity, D’D’’ with superscript CS, via the relations D’C, CS and SD’’]

Page 49:

Normalized and “Thresholded” DD^C relation

[CD]^0.75 D1 D2 D3

C1 1 0 0

C2 -1 -1 0

C3 -1 -1 1

C4 1 1 -1

C5 0 1 1

[DC]^0.75 C1 C2 C3 C4 C5

D1 1 -1 -1 1 0

D2 0 -1 -1 1 1

D3 0 0 1 -1 1

[DD]^0.75 D1 D2 D3

D1 1 1 -1

D2 1 1 0

D3 -1 0 1

[Legend: 1 = similar, 0 = neutral, -1 = different]

Page 50:

Conclusion

We discussed methods of deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported goods evaluation and to rank the customers according to their competence.

We also discussed relations between elements taken from the same set: goods, evaluation marks, or customers. This can be used, for example, to divide customers into groups of similar competence relative to the goods evaluation environment.