Using BM25F for Semantic Search

Nov 01, 2014

Information Retrieval (IR) approaches to Semantic Web search engines have become very popular in recent years. The spread of IR libraries such as Lucene, which provide IR implementations almost out of the box, has made it easier to integrate IR into Semantic Web search engines. However, one of the most important features of Semantic Web documents is their structure, since this structure allows us to represent semantics in a machine-readable format. In this paper we analyze the specific problems of structured IR and how to adapt weighting schemes for semantic document retrieval.
Transcript
Page 1: Using BM25F for Semantic Search

Introduction
Indexing RDF using inverted indexes
Ranking based retrieval for RDF objects based on structured IR
A Semantic Search evaluation framework
The case study: Lucene Vs BM25F
Conclusions and Future Work

Using BM25F for Semantic Search

Jose R. Perez-Aguera, Javier Arroyo, Jane Greenberg, Joaquin Perez-Iglesias, Victor Fresno

Metadata Research Center, UNC, UCM, UNED

April 26, 2010

Page 2: Using BM25F for Semantic Search

Outline

1 Introduction

2 Indexing RDF using inverted indexes
   Indexing based on links

3 Ranking based retrieval for RDF objects based on structured IR
   Ranking for structured documents
   Dangers to combine scores from different document fields

4 A Semantic Search evaluation framework

5 The case study: Lucene Vs BM25F
   Results and discussion

6 Conclusions and Future Work

Page 3: Using BM25F for Semantic Search

Keyword-based Semantic Search

The development of keyword-based Semantic Web search engines has become a major research area, garnering much attention in the Semantic Web community over the last seven years.

Just for the sake of curiosity

Is it possible to improve result quality, in terms of relevance, by applying just classical IR approaches to the RDF semantic structure?

Page 4: Using BM25F for Semantic Search

Two main problems

Indexing RDF triples using inverted indexes

Ranking based retrieval for RDF objects

Page 7: Using BM25F for Semantic Search

Indexing based on links

Problem

How to store RDF triples in inverted indexes, or how to represent subject, predicate, and object information in an n × m matrix.

Solutions

SIREN (based on XML indexing techniques)

SEMPLORE model (based on the idea of artificial documents with fields)

Page 10: Using BM25F for Semantic Search

We follow the SEMPLORE model, with some changes.

Index Structure based on SEMPLORE model

FIELD     CONTENT
text      plain text
title     keywords from the URI
obj       objects
inlinks   incoming link defined by a predicate
type      rdf:type

Table: Fields used to represent RDF structure in the inverted index.

Page 11: Using BM25F for Semantic Search

Two DBpedia entries

The Godfather: http://dbpedia.org/page/The_Godfather

Francis Ford Coppola: http://dbpedia.org/page/Francis_Ford_Coppola

Triple

The Godfather (Subject) - dbpprop:director (Predicate) - Francis Ford Coppola (Object)

Using inlink text to index the landing URL

The word "director" can be used as a keyword to index the entry described by the URI http://dbpedia.org/page/Francis_Ford_Coppola.
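
As a rough illustration of the field layout and the inlink idea, the following Python sketch groups a few triples into one artificial document per entity. The sample triples, the `build_documents` helper and the URI tokenizer are assumptions made for this example; they are not taken from SEMPLORE or from the authors' indexer.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def uri_keywords(uri):
    """Split the last segment of a URI (or prefixed name) into lowercase keywords."""
    local = uri.rstrip("/").rsplit("/", 1)[-1]
    if ":" in local:  # prefixed names such as dbpprop:director
        local = local.split(":", 1)[1]
    return local.replace("_", " ").lower().split()

def build_documents(triples):
    """Group (subject, predicate, object, object_is_uri) tuples into fielded documents."""
    docs = defaultdict(lambda: {"text": [], "title": [], "obj": [], "inlinks": [], "type": []})
    for s, p, o, is_uri in triples:
        doc = docs[s]
        if not doc["title"]:
            doc["title"] = uri_keywords(s)
        if p == RDF_TYPE:
            doc["type"] += uri_keywords(o)
        elif is_uri:
            doc["obj"] += uri_keywords(o)
            # The predicate plays the role of anchor text for the landing entity.
            target = docs[o]
            if not target["title"]:
                target["title"] = uri_keywords(o)
            target["inlinks"] += uri_keywords(p)
        else:
            doc["text"] += o.lower().split()
    return dict(docs)

# Hypothetical sample data, loosely modelled on the Godfather example.
GODFATHER = "http://dbpedia.org/resource/The_Godfather"
COPPOLA = "http://dbpedia.org/resource/Francis_Ford_Coppola"
triples = [
    (GODFATHER, "dbpprop:director", COPPOLA, True),
    (GODFATHER, "rdf:type", "dbpedia-owl:Film", True),
    (GODFATHER, "rdfs:comment", "A crime film released in 1972", False),
]
docs = build_documents(triples)
print(docs[COPPOLA]["inlinks"])  # ['director']
```

With this layout, a keyword query for "director" can match the Coppola entry through its `inlinks` field even though the word never appears in that entry's own text.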

Page 15: Using BM25F for Semantic Search

Ranking for structured documents
Dangers to combine scores from different document fields

Classical IR

For a long time, search engines have dealt with flat documents, that is, documents without structure.

Consequence

The main consequence of this approach is that all terms within a document are considered to have the same relevance (or value), regardless of their role in the document.

Simplification

This assumption implies a simplified relevance model based on a bag of words, and therefore useful information is lost.

Page 18: Using BM25F for Semantic Search

Structured IR

Structured IR uses the document’s structure to identify where the most representative terms of the document are (e.g. title, abstract, HTML or XML tags, etc.).

Boost factors are used to modify the impact of each term in the ranking function in order to take the document’s structure into account.

Page 19: Using BM25F for Semantic Search

Ranking functions

State-of-the-art models have been adapted to this situation.

BM25F

LM (language models) for structured documents

... but this adaptation has some tricks.
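
For readers who have not met BM25F, here is a minimal Python sketch of its commonly cited form: per-field length normalization, a boosted sum of field term frequencies, then a single saturation step. The parameter values, the non-negative IDF variant and the toy document are assumptions chosen for illustration; this is not the authors' Lucene implementation.

```python
import math

def bm25f_score(query_terms, doc, boosts, b, avg_len, df, n_docs, k1=1.2):
    """Score one fielded document; `doc` maps field name -> list of tokens."""
    score = 0.0
    for term in query_terms:
        weighted_tf = 0.0
        for field, tokens in doc.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Per-field length normalization, controlled by b[field].
            norm = 1.0 - b[field] + b[field] * len(tokens) / avg_len[field]
            weighted_tf += boosts[field] * tf / norm
        if weighted_tf == 0.0:
            continue
        # Assumed IDF variant that never goes negative.
        idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Saturation is applied once, after the fields have been combined.
        score += idf * weighted_tf / (k1 + weighted_tf)
    return score

# Hypothetical toy document and collection statistics.
doc = {"title": ["godfather"], "text": ["a", "crime", "film", "by", "coppola"]}
boosts = {"title": 2.0, "text": 1.0}
b = {"title": 0.5, "text": 0.75}
avg_len = {"title": 2.0, "text": 5.0}
df = {"godfather": 1, "coppola": 2}

s = bm25f_score(["godfather"], doc, boosts, b, avg_len, df, n_docs=10)
print(s)
```

The field boosts enter before the saturation step, so raising the `title` boost increases the score but can never push a single term's contribution above its IDF.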

Page 20: Using BM25F for Semantic Search

The Problem

A linear combination of the weights for each field of the document is not enough if a saturation function, such as $\log(tf)$ or $\sqrt{tf}$, is used in the TF function.

Figure: Source: Robertson et al., 2004
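
A small numeric sketch may make the problem concrete. It compares two orders of operation using $\sqrt{tf}$ as the saturation function: saturating each field and then summing, versus summing the weighted frequencies and then saturating once. The four-field layout and the unit weights are assumptions made for the example.

```python
import math

def sum_of_saturated(field_tfs, weights):
    """Saturate each field's tf, then combine linearly (the problematic order)."""
    return sum(w * math.sqrt(tf) for tf, w in zip(field_tfs, weights))

def saturated_sum(field_tfs, weights):
    """Combine the weighted tfs first, then saturate once."""
    return math.sqrt(sum(w * tf for tf, w in zip(field_tfs, weights)))

weights = [1.0, 1.0, 1.0, 1.0]
one_field = [4, 0, 0, 0]    # four occurrences, all in one field
four_fields = [1, 1, 1, 1]  # the same four occurrences spread over four fields

print(sum_of_saturated(one_field, weights))    # 2.0
print(sum_of_saturated(four_fields, weights))  # 4.0
print(saturated_sum(one_field, weights))       # 2.0
print(saturated_sum(four_fields, weights))     # 2.0
```

With per-field saturation, the same four occurrences score twice as high when they are spread across fields, so the term frequency effectively grows linearly again. Saturating after the combination gives both documents the same value.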

Page 21: Using BM25F for Semantic Search

Lucene’s ranking function

The method Lucene uses to compute the score of a structured document is based on a linear combination of the scores for each field of the document:

$$\mathrm{score}(q, d) = \sum_{c \in d} \mathrm{score}(q, c) \qquad (1)$$

where

$$\mathrm{score}(q, c) = \sum_{t \in q} tf_c(t, d) \cdot idf(t) \cdot w_c \qquad (2)$$

and

$$tf_c(t, d) = \sqrt{\mathrm{freq}(t)} \qquad (3)$$
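
Equations (1) to (3) can be transcribed almost literally into Python. The sketch below mirrors only the simplified formulas on this slide, reading freq(t) as the number of occurrences of t in field c; it is not Lucene's actual scoring code, and the IDF values, field weights and toy document are made up for the example.

```python
import math

def field_score(query_terms, tokens, idf, weight):
    """Eq. (2): sum over query terms of sqrt(freq) * idf * field weight."""
    return sum(math.sqrt(tokens.count(t)) * idf.get(t, 0.0) * weight for t in query_terms)

def lucene_like_score(query_terms, doc, idf, weights):
    """Eq. (1): linear combination of the per-field scores."""
    return sum(field_score(query_terms, tokens, idf, weights[f]) for f, tokens in doc.items())

# Hypothetical toy document: one match in the title, four in the text.
doc = {
    "title": ["godfather"],
    "text": ["godfather", "godfather", "godfather", "godfather", "film"],
}
idf = {"godfather": 2.0}
weights = {"title": 1.0, "text": 1.0}

print(lucene_like_score(["godfather"], doc, idf, weights))  # 6.0
```

The square root is taken inside each field before the sum, so every additional field that contains the term adds a fresh, unsaturated contribution to the score.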

Page 23: Using BM25F for Semantic Search

The collection

With small changes, the INEX evaluation framework fits the goal of evaluating Semantic Search systems well enough

We have mapped DBpedia to the Wikipedia version used in the INEX contest

DBpedia entries contain semantic information drawn from Wikipedia pages

Page 24: Using BM25F for Semantic Search

DBpedia entries: a kind of structured document

http://dbpedia.org/resource/The_Lord_of_the_Rings

http://dbpedia.org/resource/Berlin

http://dbpedia.org/resource/Semantic_Web

Page 25: Using BM25F for Semantic Search

Statistics of the collections

DBpedia currently contains almost three million entries, and the INEX Wikipedia collection contains 2,666,190 documents. Our corpus therefore only takes into account the 2,233,718 documents or entities that lie in the intersection of both collections.

Topics (Queries)

Given this corpus, the INEX 2009 topics and assessments are adapted to the intersection. The result is a set of 68 topics and a modified assessments file.

Page 27: Using BM25F for Semantic Search

Results and discussion

Semantic Search Engines

Sindice

Watson

Falcon

SEMPLORE

Everybody is using Lucene, but are they using Lucene’s ranking function? I don’t know.

Page 28: Using BM25F for Semantic Search

Using TITLE, DESCRIPTION and NARRATIVE from Topics

          MAP    P@5    P@10   GMAP   R-Prec
Lucene    .1560  .4147  .3368  .0957  .2100
LuceneF   .1200  .3971  .2971  .0578  .1632
BM25      .1746  .4735  .3868  .1081  .2257
BM25F     .1822  .4647  .3824  .1170  .2262

Table: MAP, P@5, P@10, GMAP, R-Prec for long queries. All these measures range from 0 to 1.
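
For reference, the measures in this table can be computed from a ranked result list and a set of relevant documents per topic, as in the Python sketch below. The two toy topics are invented, and the small `eps` added inside the geometric mean is an assumed convention to avoid taking the logarithm of zero; the paper's own evaluation tooling is not shown in the slides.

```python
import math

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_ap(aps):
    """MAP: arithmetic mean of per-topic average precision."""
    return sum(aps) / len(aps)

def geometric_mean_ap(aps, eps=1e-5):
    """GMAP: geometric mean of per-topic average precision (eps is an assumption)."""
    return math.exp(sum(math.log(ap + eps) for ap in aps) / len(aps))

# Hypothetical runs: topic -> (ranked result list, set of relevant documents).
runs = {
    "q1": (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    "q2": (["d5", "d6", "d7", "d8"], {"d6"}),
}
aps = [average_precision(ranking, rel) for ranking, rel in runs.values()]
print(mean_ap(aps), geometric_mean_ap(aps))
```

GMAP rewards systems that avoid very poor topics, which is why it can rank systems differently from MAP.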

Page 29: Using BM25F for Semantic Search

        te     te+ti  te+in  te+ob  te+ty  all
MAP     .1756  .1867  .1760  .1749  .1750  .1822
GMAP    .1084  .1190  .1098  .1080  .1080  .1170
P@5     .4529  .4559  .4500  .4500  .4559  .4746
P@10    .3882  .3941  .3897  .3853  .3853  .3824

Table: Sensitivity test for BM25F. All these measures range from 0 to 1. te = text, ti = title, in = inlinks, ob = obj, ty = type, all = all fields

Page 30: Using BM25F for Semantic Search

[Plot: kernel density estimate, density.default(x = data4$map), N = 69, Bandwidth = 0.03185; x-axis from 0.0 to 0.6, y-axis "Density" from 0 to 5]

Figure: Density of the MAP values for different ranking approaches (BM25 = blue, BM25F = red, Lucene = yellow, Lucene multifield = black)

Page 32: Using BM25F for Semantic Search

Conclusions

Lucene's ranking function hurts retrieval performance when we work with structured information, while BM25F does not, which is very important for Semantic Search.

IR ranking functions are not able to take advantage of the semantic information contained in the fields with less text.

Page 33: Using BM25F for Semantic Search

Future work

More work is needed on how to adapt IR ranking functions to Semantic Search

It is not trivial to use semantic information from the Web of data to improve keyword-based retrieval on the Web

It is important to identify what kinds of information needs can be solved using semantic information

Page 34: Using BM25F for Semantic Search

BM25F implementation for Lucene is available

Joaquín Perez-Iglesias, Jose R. Perez-Aguera, Víctor Fresno, Yuval Z. Feinstein: Integrating the Probabilistic Models BM25/BM25F into Lucene. CoRR abs/0911.5046 (2009)
http://nlp.uned.es/~jperezi/Lucene-BM25/
