Chances and Challenges in Comparing Cross-Language Retrieval Tools

Presentation at the IRF Symposium 2010, Vienna, June 3, 2010

Chances and Challenges in Comparing Cross-Language Retrieval Tools

Giovanna Roda, Vienna, Austria

IRF Symposium 2010, June 3, 2010

CLEF-IP: the Intellectual Property track at CLEF

CLEF-IP is an evaluation track within the Cross-Language Evaluation Forum (CLEF).¹

organized by the IRF

first track ran in 2009

running this year for the second time

¹ http://www.clef-campaign.org


What is an evaluation track?

An evaluation track in Information Retrieval is a cooperative action aimed at comparing different techniques on a common retrieval task.

produces experimental data that can be analyzed and used to improve existing systems

fosters exchange of ideas and cooperation

produces a reusable test collection, sets milestones

Test collection

A test collection consists traditionally of target data, a set of queries, and relevance assessments for each query.
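
To make this concrete, a test collection can be represented as three pieces of data. The following is a minimal sketch only; the class and field names are illustrative assumptions, not part of any CLEF-IP tooling:

    from dataclasses import dataclass, field

    @dataclass
    class TestCollection:
        """Minimal test-collection layout: target data, queries (topics),
        and per-query relevance assessments."""
        documents: dict[str, str]  # doc_id -> document text (target data)
        queries: dict[str, str]    # query_id -> query text
        qrels: dict[str, set[str]] = field(default_factory=dict)  # query_id -> relevant doc_ids

        def relevant(self, query_id: str) -> set[str]:
            # Relevance assessments for a single query
            return self.qrels.get(query_id, set())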


CLEF-IP 2009: the task

The main task in the CLEF-IP track was to find prior art for a given patent.

Prior art search

Prior art search consists in identifying all information (including non-patent literature) that might be relevant to a patent’s claim of novelty.


Participants - 2009 track

1 Tech. Univ. Darmstadt, Dept. of CS, Ubiquitous Knowledge Processing Lab (DE)

2 Univ. Neuchatel - Computer Science (CH)

3 Santiago de Compostela Univ. - Dept. Electronica y Computacion (ES)

4 University of Tampere - Info Studies (FI)

5 Interactive Media and Swedish Institute of Computer Science (SE)

6 Geneva Univ. - Centre Universitaire d’Informatique (CH)

7 Glasgow Univ. - IR Group Keith (UK)

8 Centrum Wiskunde & Informatica - Interactive Information Access (NL)


Participants - 2009 track

9 Geneva Univ. Hospitals - Service of Medical Informatics (CH)

10 Humboldt Univ. - Dept. of German Language and Linguistics (DE)

11 Dublin City Univ. - School of Computing (IE)

12 Radboud Univ. Nijmegen - Centre for Language Studies & Speech Technologies (NL)

13 Hildesheim Univ. - Information Systems & Machine Learning Lab (DE)

14 Technical Univ. Valencia - Natural Language Engineering (ES)

15 Al. I. Cuza University of Iasi - Natural Language Processing (RO)


Participants - 2009 track

15 participants

48 experiments submitted for the main task

10 experiments submitted for the language tasks


2009-2010: participants

2009-2010: evolution of the CLEF-IP track

2009 | 2010
1 task: prior art search | prior art candidate search and classification task
targeting granted patents | patent applications
15 participants | 20 participants
all from academia | 4 industrial participants
families and citations | include forward citations
manual assessments | expanded lists of relevant docs
standard evaluation measures | new measure: PRES, more recall-oriented
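
The new recall-oriented measure is PRES (Patent Retrieval Evaluation Score; Magdy & Jones, SIGIR 2010). Below is a minimal sketch of the published formula, assuming the paper's convention that relevant documents missing from the top Nmax results are assigned ranks immediately after Nmax; the function and variable names are mine:

    def pres(relevant: set[str], ranked: list[str], n_max: int) -> float:
        """PRES = 1 - (avg_rank - (n + 1) / 2) / n_max, where n is the number
        of relevant documents and unretrieved relevant documents are assigned
        ranks n_max + 1, n_max + 2, ..."""
        n = len(relevant)
        if n == 0:
            raise ValueError("no relevance assessments for this topic")
        # 1-based ranks of relevant documents retrieved within the cutoff
        found = [i + 1 for i, doc in enumerate(ranked[:n_max]) if doc in relevant]
        # Relevant documents not retrieved are placed just after the cutoff
        ranks = found + [n_max + i for i in range(1, n - len(found) + 1)]
        return 1.0 - (sum(ranks) / n - (n + 1) / 2) / n_max

    # 1.0 when all relevant docs top the ranking, 0.0 when none are retrieved:
    assert pres({"a", "b"}, ["a", "b", "c"], n_max=100) == 1.0
    assert pres({"a", "b"}, ["x", "y"], n_max=100) == 0.0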

What are relevance assessments?

A test collection (also known as a gold standard) consists of a target dataset, a set of queries, and relevance assessments corresponding to each query.

The CLEF-IP test collection:

target data: 2 million EP patents

queries: full-text patents (without images)

relevance assessments: extended citations


Relevance assessments

We used patents cited as prior art as relevance assessments.

Sources of citations:

1 applicant’s disclosure: the USPTO requires applicants to disclose all known relevant publications

2 patent office search report: each patent office will do a search for prior art to judge the novelty of a patent

3 opposition procedures: patents cited to prove that a granted patent is not novel


Extended citations as relevance assessments

direct citations and their families

direct citations of family members ... and their families
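
As a sketch of how such an extended citation set could be assembled; direct_citations and family stand for lookups into citation and family data and are assumptions, not actual CLEF-IP tooling:

    def extended_citations(patent, direct_citations, family):
        """Extended citations: direct citations and their families, plus the
        direct citations of family members and their families.

        family(p) is assumed to return the family of p, including p itself."""
        extended = set()
        for member in family(patent):
            for cited in direct_citations(member):
                extended.add(cited)             # a direct citation ...
                extended.update(family(cited))  # ... and its family
        return extended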

Patent families

A patent family consists of patents granted by different patent authorities but related to the same invention.

simple family: all family members share the same priority number

extended family: there are several definitions; in the INPADOC database, all documents which are directly or indirectly linked via a priority number belong to the same family
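
Under the INPADOC definition, a family is in effect a connected component of the graph linking documents that share a priority number. A minimal sketch, assuming priority data is available as a mapping (the data layout is an assumption):

    from collections import defaultdict

    def inpadoc_families(priorities: dict[str, set[str]]) -> list[set[str]]:
        """Group documents into extended (INPADOC) families: all documents
        directly or indirectly linked via a priority number.

        priorities maps doc_id -> the set of priority numbers it claims."""
        by_priority = defaultdict(set)
        for doc, prios in priorities.items():
            for p in prios:
                by_priority[p].add(doc)

        seen, families = set(), []
        for start in priorities:
            if start in seen:
                continue
            family, queue = set(), [start]
            while queue:  # traverse documents linked by shared priorities
                doc = queue.pop()
                if doc in family:
                    continue
                family.add(doc)
                for p in priorities[doc]:
                    queue.extend(by_priority[p] - family)
            seen |= family
            families.append(family)
        return families

Grouping documents by identical priority sets instead would yield the simple families described above.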


Patent families

Patent documents are linked by priorities; these links define the INPADOC family.

CLEF-IP uses simple families.

Relevance assessments 2010

Expanding the 2009 extended citations:

1 include citations of forward citations ...

2 ... and their families

This is apparently a well-known method among patent searchers.

Zig-zag search?
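
A sketch of the 2010 expansion, reusing the hypothetical extended_citations and family lookups from the earlier sketches; cited_by (forward citations) is likewise an assumed lookup:

    def expanded_assessments_2010(patent, direct_citations, cited_by, family):
        """2010 relevance set: the 2009 extended citations, plus the
        citations of forward citations and their families."""
        relevant = extended_citations(patent, direct_citations, family)
        for forward in cited_by(patent):  # patents that cite the topic patent
            for cited in direct_citations(forward):
                relevant.add(cited)             # a citation of a forward citation ...
                relevant.update(family(cited))  # ... and its family
        return relevant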


How good are the CLEF-IP relevance assessments?

CLEF-IP uses families + citations:

how complete are extended citations as relevance assessments?

will every prior art patent be included in this set?

and if not, what percentage of prior art items are captured by extended citations?

when considering forward citations, how good are extended citations as a prior art candidate set?

Feedback from patent experts needed

The quality of prior art candidate sets has to be assessed; the know-how of patent search experts is needed.

at CLEF-IP 2009, 7 patent search professionals assessed 12 search results

the task was not well defined and there were misunderstandings on the concept of relevance

the amount of data was not sufficient to draw conclusions

Some initiatives associated with CLEF-IP

The results of evaluation tracks are mostly useful for the research community.

This community often produces prototypes that are of little interest to the end-user.

Next I’d like to present two concrete outcomes - not of CLEF-IP directly, but arising from work in patent retrieval evaluation.

Soire

developed at Matrixware

service-oriented architecture - available as a Web service

allows replicating IR experiments based on the classical evaluation model

tested on the CLEF-IP data

customized for the evaluation of machine translation

Spinque

a spin-off (2010) from CWI (the Dutch National Research Center in Computer Science and Mathematics)

introduces search-by-strategy

provides optimized strategies for patent search - tested on CLEF-IP data

transparency: understand your search results to improve strategy

CLEF-IP 2009 learnings

The Humboldt University implemented a model for patent search that produced the best results.

The model combined several strategies:

using metadata (IPC, ECLA)

indexes built at lemma level

an additional phrase index for English

crosslingual concept index (multilingual terminological database)
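
The slides do not spell out how these strategies were combined; one common approach is weighted score fusion across the separate indexes, sketched below purely as an illustration (the index names, weights, and classification filter are assumptions, not the Humboldt system's actual method):

    def fuse_scores(query, indexes, weights, class_filter=None):
        """Weighted-sum fusion of per-index retrieval scores (illustrative).

        indexes: name -> search function returning {doc_id: score};
        weights: name -> fusion weight;
        class_filter: optional set of doc_ids sharing an IPC/ECLA class
        with the topic patent (metadata-based filtering)."""
        fused = {}
        for name, search in indexes.items():
            for doc_id, score in search(query).items():
                fused[doc_id] = fused.get(doc_id, 0.0) + weights[name] * score
        if class_filter is not None:
            # Use classification metadata to restrict the candidate set
            fused = {d: s for d, s in fused.items() if d in class_filter}
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

    # e.g. indexes = {"lemma": lemma_search, "phrase": phrase_search,
    #                 "concept": concept_search}   (all hypothetical)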

Some additional investigations

Some citations were hard to find. Citations were classified by the percentage of runs (x) that retrieved them:

% of runs | class
x ≤ 5 | hard
5 < x ≤ 10 | very difficult
10 < x ≤ 50 | difficult
50 < x ≤ 75 | medium
75 < x ≤ 100 | easy
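
A small sketch of the binning behind the table, assuming the percentage of runs that retrieved each citation is known:

    def difficulty_class(pct_runs: float) -> str:
        """Map the percentage of runs that found a citation to the
        difficulty classes in the table above."""
        if pct_runs <= 5:
            return "hard"
        if pct_runs <= 10:
            return "very difficult"
        if pct_runs <= 50:
            return "difficult"
        if pct_runs <= 75:
            return "medium"
        return "easy"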

Some additional investigations

We looked at the content of citations and citing patents. These investigations are ongoing.

Thank you for your attention.
