A Ranking Algorithm for Semantic search engine – spam and fake detection case study By: Soheila Dehghanzadeh. Web technology lab weekly seminars.

Dec 17, 2015

Page 1:

A Ranking Algorithm for Semantic search engine – spam and fake detection case study
By: Soheila Dehghanzadeh
Web technology lab weekly seminars

Page 2:

Agenda:

• Web spam definition
• A brief overview of search engines
• Search engine phases:
– Crawling
– Indexing
– Index lookup
– Ranking lookup results
• My proposed ranking algorithm

Page 3:

Web spam and fake:

• In the web of data, anyone can say anything about anything.

• Low-quality data should not appear in the top search results.

Page 4:

A Search Engine:

Page 5:

A Search Engine:

Page 6:

Web of data vs. web of documents

• Web of documents (WODoc): no link type and no trustworthiness (just popularity).

• Web of data (WOData): should consider link type and link context (for provenance and proof of trust).

Page 7:

Crawling & Indexing phase…

• Using ldspider to crawl linked data.
• Using hexastore to fully index the crawled data. Special thanks to Panagiotis Karras for providing a hexastore implementation in Python.

Page 8:

• Index lookup results are extended: some results may not include the keyword but still have high quality and relevance.
• Result expansion hides the locality effect: some sites are referred to many times, but in this particular context the lookup results from other, professional sites are of more interest.

[Figure: web of data result]

Page 9:

HexaStore

• The index structure that we use in our search engine.
• Each RDF element type deserves to have a special index structure built around it.
• Every possible ordering of the importance or precedence of the three elements in the indexing scheme is materialized.
• Each index structure in a hexastore centers around one RDF element and defines a prioritization between the other two elements.
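
The six orderings can be illustrated with a toy in-memory structure. This is only a minimal sketch, not the hexastore implementation mentioned on the previous slide: the class name `HexastoreSketch`, its methods, and the `ex:` identifiers are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


class HexastoreSketch:
    """Toy hexastore: one nested index per ordering of (s, p, o)."""

    # spo, sop, pso, pos, osp, ops
    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        # index[order][first][second] -> set of third elements
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        """Insert one triple into all six indexes."""
        triple = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            a, b, c = (triple[k] for k in order)
            self.index[order][a][b].add(c)

    def lookup(self, order, first, second=None):
        """Return what is stored under `first` (and optionally `second`)."""
        level = self.index[order].get(first, {})
        if second is None:
            return {k: set(v) for k, v in level.items()}
        return set(level.get(second, set()))


# Hypothetical triples, used only to exercise the sketch.
store = HexastoreSketch()
store.add("ex:soheila", "ex:studiedIn", "ex:FUM")
store.add("ex:soheila", "ex:workedAt", "ex:GAS")

objects = store.lookup("spo", "ex:soheila", "ex:studiedIn")
subjects = store.lookup("pos", "ex:studiedIn", "ex:FUM")
```

Because every ordering is materialized, a lookup that fixes any one or two elements can be answered from the index whose leading elements match the bound ones, at the cost of storing each triple six times.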

Page 10:

Sample spo indexing in a hexastore

Si
  P(i,1): O(i1,1), O(i1,2), …, O(i1,ki1)
  P(i,2): O(i2,1), O(i2,2), …, O(i2,ki2)
  …
  P(i,Ni): O(iNi,1), O(iNi,2), …, O(iNi,kiNi)

Space complexity: spo + sp + pso + so + pos + po

Page 11:

My idea!

• Import the base result set into Jena and extend it.
• Extend the base set with ontology reasoning rules, so that extra resources and relations are added through those rules.
• The added resources
• An added relation has no context, so its trustworthiness is an aggregation function on the (x, y, rule) relations.
• Resources will be added only through the sameAs predicate.
• Resources will be ranked according to their relevance to the query terms (using ontobroker, PageRank, ObjectRank, TripleRank, HITS, …).
• Query:
– Keyword query
– Structured query
– Ontology-based query (using an interface to get the query): ontobroker
• Relations (properties) will be ranked according to their contexts (provenance), using relation ranking methods such as semRank, or by looking at the context's PageRank.
• Note that we rank resources first and relations second. However, whether the user is looking for relations or resources depends on the query.

Page 12:

• Lookup on quads for keyword (Soheila)
• Q1: http://um.s11, givenname, "Soheila", UM
• Q2: http://NIOC/p25, fullname, "Soheila Dehghan", NIOC
• Q3: http://nigc-khrz/e66, firstname, "Soheila", NIGC
• Q4: http://fake/f4, name, "Soheila", fake

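[Figure: graph around the lookup results. Nodes: Q1, Q2, Q3, Q4, http://facebook/u122, http://isport/us122, http://linkedIn/u12, Cheese, B.Gates, Scott. Edge labels, with the context in parentheses: SA(UM), SA(FB), SA(UM), SA(NIGC), SA(LI), SA(FK) (three times), Dancewith(FK), Meet(CNN), Buy(Spam).]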

Page 13:

Result set expansion methods:

• Step 1: use the sameAs predicate on the found quads and extend the result set to Q1,…,Qr.
• Index lookup:
– Q1(S),SameAs,?,? … Qr(S),SameAs,?,?
– ?,SameAs,Q1(S),? … ?,SameAs,Qr(S),?
• (Q1,…,Q4 → Q1,…,Q4,FBURI,LinkedInURI,isportURI) in our case.
• Apply PageRank (PR) on the extended graph, in which each sameAs link is replaced with the PR weight of its sameAs context (to know the trustworthiness of each context).
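
The two index lookups of step 1 can be sketched over a plain list of quads. This is an illustration under stated assumptions, not the engine's code: the function `expand_with_same_as`, the `owl:sameAs` spelling of the predicate, and the two sameAs quads are hypothetical (the slides do not list which sameAs links exist).

```python
SAME_AS = "owl:sameAs"

# (subject, predicate, object, context). The first two quads come from the
# keyword lookup example; the two sameAs quads are assumed for illustration.
quads = [
    ("http://um.s11", "givenname", "Soheila", "UM"),
    ("http://fake/f4", "name", "Soheila", "fake"),
    ("http://um.s11", SAME_AS, "http://facebook/u122", "UM"),
    ("http://linkedIn/u12", SAME_AS, "http://um.s11", "LI"),
]


def expand_with_same_as(seeds, quads):
    """One-hop expansion: follow sameAs in both directions from each seed.

    Covers both lookup patterns: (Qi(S), sameAs, ?, ?) and (?, sameAs, Qi(S), ?).
    Returns the expanded resource set and the sameAs links with their context.
    """
    expanded = set(seeds)
    links = []
    for s, p, o, c in quads:
        if p == SAME_AS and (s in seeds or o in seeds):
            expanded.update((s, o))
            links.append((s, o, c))
    return expanded, links


seeds = {"http://um.s11", "http://fake/f4"}
expanded, links = expand_with_same_as(seeds, quads)
```

Keeping the context of every sameAs link is what later allows the link to be weighted by the PageRank of that context.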

Page 14:

Result set expansion methods:

• Step 2: look up all properties of Q1(s),…,Qr(s):
– Q1(s),?,?,? and ?,?,Q1(s),?
– …
– Qr(s),?,?,? and ?,?,Qr(s),?
• Step 3: add inferred relations using the domain ontology (the context is composed of the ontology plus the inference process).
• Step 4: rank Q1,…,Qr according to their TpageRank (computed online from the graph of step 1), and rank relations according to their context PageRank (which is computed by Google offline).
• Note: contexts whose PR is lower than a threshold won't be mentioned. They may be spam or fake sites.
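
A rough sketch of the ranking and filtering idea follows. It is an assumption-laden illustration rather than the proposed algorithm itself: the context PageRank values, the threshold, the link list, and the helper `weighted_pagerank` are all hypothetical, and a plain power-iteration PageRank stands in for whatever ranking the engine finally uses.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target, weight) edges."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for s, t, w in edges:
            if out_weight[s] > 0.0:
                new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


# Assumed offline PageRank per context and an assumed cut-off.
context_pr = {"UM": 0.30, "FB": 0.25, "LI": 0.20, "FK": 0.01}
THRESHOLD = 0.05

# Hypothetical sameAs links: (source, target, context).
same_as_links = [
    ("Q1", "http://facebook/u122", "FB"),
    ("Q3", "Q1", "UM"),
    ("http://linkedIn/u12", "Q1", "LI"),
    ("Q4", "Q1", "FK"),
]

# Drop links from low-PR contexts, then weight the rest by the context PR.
trusted = [
    (s, t, context_pr[c]) for s, t, c in same_as_links if context_pr[c] >= THRESHOLD
]
ranks = weighted_pagerank(trusted)
```

In this toy input, Q4 is only reachable through the low-ranked FK context, so it drops out of the ranked graph entirely.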

Page 15:

Structured query on quads indexes

• Single pivot:
– (s,?,?,?), (?,p,?,?), (?,?,o,?), (?,?,?,c)
• Double pivot:
– (s,p,?,?), (s,?,o,?), (s,?,?,c), (?,p,o,?), (?,p,?,c), (?,?,o,c)
• Triple pivot:
– (s,p,o,?), (s,p,?,c), (s,?,o,c), (?,p,o,c)
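
The pattern families above are simply the ways of binding one, two, or three of the four quad positions. A small sketch, with hypothetical helper names `pivot_patterns` and `match`, enumerates them and evaluates a pattern by a linear scan (a real engine would pick a matching index instead of scanning):

```python
from itertools import combinations

POSITIONS = ("s", "p", "o", "c")


def pivot_patterns(k):
    """All quad patterns with exactly k bound positions; '?' marks a free one."""
    return [
        tuple(pos if pos in bound else "?" for pos in POSITIONS)
        for bound in combinations(POSITIONS, k)
    ]


def match(pattern, quads):
    """Return quads that agree with every bound (non-None) position."""
    return [
        q
        for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]


# Two of the quads from the keyword lookup example.
quads = [
    ("http://um.s11", "givenname", "Soheila", "UM"),
    ("http://fake/f4", "name", "Soheila", "fake"),
]

single = pivot_patterns(1)
double = pivot_patterns(2)
triple = pivot_patterns(3)

by_object = match((None, None, "Soheila", None), quads)
by_object_and_context = match((None, None, "Soheila", "UM"), quads)
```

Here `None` plays the role of `?` in a concrete query, while `pivot_patterns` only lists which positions are bound.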

Page 16:

• Step 1: if the specified parts are URIs, the search engine performs a direct lookup. Otherwise, if the user has specified a keyword for a part, a keyword search is done first, and then a lookup is performed for each resulting URI.
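
The branching in step 1 might look roughly like the sketch below, restricted to the subject position for brevity. The helpers `is_uri`, `keyword_search`, and `lookup_subject` are hypothetical, and treating any string that starts with `http://` or `https://` as a URI is a simplifying assumption.

```python
def is_uri(term):
    # Simplifying assumption: anything starting with http:// or https:// is a URI.
    return term.startswith(("http://", "https://"))


def keyword_search(keyword, quads):
    """Subjects of quads whose object contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted({s for s, _, o, _ in quads if kw in o.lower()})


def lookup_subject(term, quads):
    """Direct lookup for a URI; otherwise keyword search, then per-URI lookup."""
    uris = [term] if is_uri(term) else keyword_search(term, quads)
    return [q for q in quads if q[0] in uris]


# Quads Q1..Q4 from the keyword lookup example.
quads = [
    ("http://um.s11", "givenname", "Soheila", "UM"),
    ("http://NIOC/p25", "fullname", "Soheila Dehghan", "NIOC"),
    ("http://nigc-khrz/e66", "firstname", "Soheila", "NIGC"),
    ("http://fake/f4", "name", "Soheila", "fake"),
]

direct = lookup_subject("http://NIOC/p25", quads)
via_keyword = lookup_subject("Dehghan", quads)
all_soheila = lookup_subject("Soheila", quads)
```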

Page 17:

Lookup on quads for ontological queries

Page 18:

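[Figure: example graph for an ontological query. Nodes: FUM, GAS, NIOC Team. Labels, with the context in parentheses: Soheila(FUM), Sally(NIOC), DehghanZadeh(GAS), Kahani(FUM), Studied in(FUM), Worked at(GAS), played in(NIOC), Supervisor(FUM), Owl:sameAs(NIOC), Owl:sameAs(FUM).]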

Page 19:

Related works for ranking web of data…

• ObjectRank
• Ding
• Sindice tf-idf
• EntityRank
• semRank
• ReConRank
• ontobroker…

Page 20:

Proof of trust

• Jena's inference explanation will be used as a proof of trust.

Page 21:

Evaluation …

• Compare spam ranks
• Compare query time
• Compare index size

Page 22:

Any questions?

Page 23:

The best things in life are free.
Thanks for your attention.