ntcir5-clef 2005- 09-22 Noriko kando 1 What’s happening at NTCIR Noriko Kando National Institute of Informatics http://research.nii.ac.jp/ntcir/ kando (at) nii. ac. jp

Jan 02, 2016


Eugenia Cole
Transcript
Page 1:

What’s happening at NTCIR

Noriko Kando
National Institute of Informatics
http://research.nii.ac.jp/ntcir/
kando (at) nii. ac. jp

ntcir5-clef 2005-09-22

Page 2:

NTCIR Workshop is:

A series of evaluation workshops designed to enhance research in information access technologies by providing infrastructure for large-scale evaluation.

Project started late 1997; held about once every 1½ years
1st: Nov. 1, 1998 - Sept. 1, 1999
2nd: June 2000 - March 2001
3rd: Sept. 2001 - Oct. 2002
4th: Apr. 2003 - June 2004
5th: Oct. 2004 - Dec. 2005

* NII Test Collection for Information Retrieval systems
* Co-sponsored by NII and MEXT Grant-in-Aid on Informatics

Page 3:

Focus of NTCIR

Lab-type IR Test

New Challenges

Forum for Researchers

Asian Languages/cross-language

Variety of Genre

Parallel/comparable Corpus

Intersection of IR + NLP

To make information in the documents more usable for users!

Realistic eval/user task

Idea Exchange

Discussion/Investigation on Evaluation methods/metrics

Page 4:

Tasks (Research Areas) of NTCIR Workshops

Tasks
• Japanese IR
• Cross-lingual IR
• Patent Retrieval
• Web Retrieval
• Term Extraction/Role Analysis
• Question Answering
• Cross-Language Question Answering
• Text Summarization
• [Pilot] Trend Info

[Chart: tasks offered at the 1st through 5th workshops, on a timeline from Nov 98 to Sept. 2004. About once per 1½ years; project started late 1997.]

Page 5:

NTCIR-5 (Mtg: Dec. 6-9, 2005)
• CLIR: focus: NE, OOV, news docs 2000-2001, CJK
• CLQA: E-C, C-C, E-J (Pilot, New)
• Patent Retrieval:
  – Invalidity Search, 10 yr patent fulltext, ca. 90GB
  – Text Categorization to F-terms (good granularity for patent map axis)
• QAC: Series of Questions (J-J)
• WEB: Navigational Retrieval, New 1.5TB docs
• [Pilot] MuST: MUltimodal Summarization for Trend information; extract numeric information from a set of documents and visualize it to show trends

You are most welcome!

Page 6:

Schedule for NTCIR-5

[TASK]
Dec 2004: Document Release
April-July, 2005: Formal Run
1 Sept 2005: Evaluation Results Return (except CLIR)
15 Oct 2005: Paper Submission
6-9 Dec 2005: Conference, at NII, Tokyo, Japan
* Proceedings will be published at the Conference.

[Open Submission]
1 Oct 2005: Paper Due
1 Nov 2005: Late Breaking Short Paper Due
15 Nov 2005: Notification

Page 7:

NTCIR workshop: Number of Participating Groups

102 groups from 15 countries registered

[Bar chart: # of groups and # of countries for the 1st, 2nd, 3rd, 4th and 5th workshops, axis 0 to 100. Data labels as extracted: 102, 65, 36, 28, 12, 9, 8, 6, 1074]

Page 8:

Number of Participants by Tasks

Registered for NTCIR-5: 102 groups from 15 countries registered

[Stacked bar chart: # of participating groups (0 to 120) for the 1st (1998-9), 2nd (2000-1), 3rd (2001-2), 4th (2003-4) and 5th (2005-6) workshops, broken down by task: Japanese IR, CLIR, Non-Japanese IR, Patent Retrieval, Web Retrieval, Term Extraction, Summarization, QA, CLQA, Trend Info. Bar annotations: Chinese; JE; JE, EJ, EC; xCJEK]

Page 9:

Number of Participants by Tasks

Submitted Results: 77 Active Participants from 15 Countries

[Stacked bar chart: # of participating groups (0 to 100) for the 1st (1998-9), 2nd (2000-1), 3rd (2001-2), 4th (2003-4) and 5th (2005-6) workshops, broken down by task: Japanese IR, CLIR, Non-Japanese IR, Patent Retrieval, Web Retrieval, Term Extraction, Summarization, QA, CLQA, Trend Info. Bar annotations: Chinese; JE; JE, EJ, EC; xCJEK]

Page 10:

Geographical Distribution of Participants

Finland, Germany, Ireland, Netherlands, Spain, Switzerland

Canada, USA

Australia

China PRC, Hong Kong, Japan, Korea, Singapore, Taiwan ROC

Page 11:

NTCIR Workshop 5 (2004-2005) Organizers

+CLIR
Hsin-Hsi Chen, NTU
Kuang-hua Chen, NTU
Kazuaki Kishida, Surugadai U
Kazuko Kuriyama, Shirayuri U
Sukhoon Lee, NCU
Sung Hyon Myaeng, IIU
Noriko Kando, NII

+CLQA
Kuang-hua Chen, NTU
Chuan-Jie Lin, Nat Taiwan Ocean U
Yutaka Sasaki, ATR

+PATENT
Atsushi Fujii, Tsukuba U
Makoto Iwayama, Hitachi/TITEC
Noriko Kando, NII

+QA
Junichi Fukumoto, Ritsumeikan U
Tsuneaki Kato, U Tokyo
Fumito Masui, Mie U

+WEB
Keizo Oyama, NII
Masao Takaku, NII

+Trend Info [Pilot]

Program chair: Noriko Kando, NII


Page 13:

NTCIR test collections

Collection      Task  Genre     Size               Doc lang  Topic/Q lang  Relevance/Answer
NTCIR-1         IR    Academic  577MB              JE        J             3
CIRB010         IR    News      132MB              Ct        CtE           4
NTCIR-2         IR    Academic  800MB              JE        JE            4
NTCIR-2 Summ    Summ  News      180 docs           J         J
NTCIR-3 CLIR    IR    News      884MB              CtKJE     CtKJE         4
NTCIR-3 PATENT  IR    Patent    18GB(+5GB)         J(JE)     CsCtKJE       3
NTCIR-3 QA      QA    News      282MB              J         J(E)          exact
NTCIR-3 Summ    Summ  News      60 docs + 50 sets  J         -
NTCIR-3 WEB     IR    WEB       100GB              Multiple  J(E)          4+relative
NTCIR-4 CLIR    IR    News      ca 3GB             CtKJE     CtKJE         4
NTCIR-4 PATENT  IR    Patent    45GB               J(JE)     CsCtKJE       3
NTCIR-4 QA      QA    News      776MB              J         J(E)          4
NTCIR-4 Summ    Summ  News      30 sets            J         -
NTCIR-4 WEB     IR    WEB       100GB              Multiple  J(E)

Ct: Traditional Chinese, Cs: Simplified Chinese, K: Korean, J: Japanese, E: English

Page 14:

Situation on the Data Distribution of Research Purpose Use of NTCIR Test Collections

[Bar chart: numbers (0 to 200) for NTCIR-1, NTCIR-2 and NTCIR-3 by user type: university, private enterprise, others. Data labels as extracted: 58 2714; 57 2212; 125 2822]

[Bar chart: numbers (0 to 300) for NTCIR-1, NTCIR-2 and NTCIR-3 by location: foreign countries, domestic. Data labels as extracted: 43 96; 25 69; 70 153]

[Bar chart: numbers (0 to 250) for NTCIR-1, NTCIR-2 and NTCIR-3]

Page 15:

What’s New to NTCIR-4

- Open Submission Session

- ACM-TALIP Special Issue Recommendation

- Open Attendance

- Research Purpose Use of the Submission Raw Data

Started with NTCIR-3 CLIR, and then will enlarge

- Online Working Notes and Slides

Page 16:

What’s New to NTCIR-5

- Open Submission >>>>

- ACM-TALIP Special Issue Recommendation (the strategy needs changing), but

Special Issue on Patent at IP&M

- Open Attendance >>>>

- Research Purpose Use of the Submission Raw Data >>>>

- Online Working Notes and Slides >>>>

Proceedings at Conference Only (No working notes)

- Pilot tasks and feasibility studies using different funding schemes, ex. Multimodal trend information [co-funding NTT, Tokyo U], “why” questions w/ automatic “pyramid” evaluation [w: ISI/USC]

Page 17:

Acknowledgment

• Central Daily News
• China Daily News
• China Times Inc.
• Chosunilbo
• Hankooki.com
• Industrial Property Cooperation Center
• Japan Patent Office
• Japan Patent Information Organization
• Korea Economic Daily
• Linguistic Data Consortium
• Mainichi Newspaper
• Nippon Database Kaihatsu, Co. Ltd.
• NTT
• NRI Cyber Patent
• PATOLIS
• the Sing Tao Group
• Taiwan News
• Tokyo Univ
• UDN.COM
• Wisers Information Ltd.
• Yomiuri Shinbun

Page 18:

Cross-Language Information Retrieval (CLIR) Task

Task Organizers: Kazuaki Kishida*, Kuang-hua Chen, Sukhoon Lee, Hsin-Hsi Chen, Koji Eguchi, Noriko Kando, Kazuko Kuriyama, Sung Hyon Myaeng

Page 19:

NTCIR-5 CLIR

Documents: Korean, Japanese, English, Chinese (trad)
1.6 M docs, 3.3 GB
Published in 1998-1999

50 topics
• Short Q: D-only and T-only are mandatory
• Background info of search requests
• Balance btw topic-types:
  – specific (ex. particular event) vs generic
  – proper nouns vs without PN
  – domestic/regional/international

[Diagram: topic-language and document-language pairs among J, E, C and K]

Page 20:

Design of CLIR Task

• Subtasks
  – Multilingual CLIR (MLIR): e.g., C - CJKE
  – Bilingual CLIR (BLIR): e.g., C - J
  – Single Language IR (SLIR): e.g., C - C
  – Pivot Bilingual CLIR (PLIR): e.g., C - E - J
• Languages
  – Chinese (C), Japanese (J), Korean (K), English (E)
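The subtask naming above follows directly from the shape of the language-pair string. As a rough illustration only, the sketch below maps strings such as "C - CJKE" to a subtask label; the function name `classify_run` and the validation rules are assumptions made for this example, not part of the NTCIR task definition.

```python
# Hypothetical helper: derive the CLIR subtask from a language-pair string.
LANGS = set("CJKE")  # Chinese, Japanese, Korean, English


def classify_run(pair: str) -> str:
    """Return MLIR, BLIR, SLIR or PLIR for strings like 'C - J'."""
    parts = [p.strip() for p in pair.split("-")]
    if not all(p and set(p) <= LANGS for p in parts):
        raise ValueError(f"unknown language code in {pair!r}")
    if len(parts) == 3 and all(len(p) == 1 for p in parts):
        return "PLIR"  # topic - pivot - document language
    if len(parts) != 2 or len(parts[0]) != 1:
        raise ValueError(f"unsupported pair format: {pair!r}")
    topic, docs = parts
    if len(docs) > 1:
        return "MLIR"  # several document languages searched at once
    return "SLIR" if topic == docs else "BLIR"


for example in ["C - CJKE", "C - J", "C - C", "C - E - J"]:
    print(example, "->", classify_run(example))
```

The four printed examples are the ones given on the slide.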

Page 21:

Documents for CLIR at NTCIR

Every language is multi-sources.

[Diagram: document collections per language (Chinese (trad), Japanese, English, Korean) for NTCIR-3, NTCIR-4 and NTCIR-5. Labels as extracted:
NTCIR-3 and NTCIR-4: 23K doc, 66K doc, 380K doc, 250K doc, 590K doc, 350K doc, 220K doc, 250K doc; published in 1998-1999; published in 1994; 870MB; 3.3GB
NTCIR-5: 901K doc, 220K doc, 858K doc, 259K doc; published in 2000-2001]

Page 22:

Test Collection

• Queries – 50 topics
• Relevance Judgments – 4 grades
  – Highly Relevant (S), Relevant (A), Partially Relevant (B), Non-Relevant (C)
• Mandatory Runs
  – TITLE-only run, DESC-only run

Page 23:

Result Submission

• 24 groups submitted results
  – From Australia, Canada, China PRC, Finland, Germany, Hong Kong, Japan, Korea, Netherlands, Singapore, Spain, Switzerland, Taiwan, USA (14 countries and areas)

Page 24:

Techniques Used (NTCIR-4)
• Indexing, Stop Words, Decompounding
• Mostly “Query Trans”, but one “Bi-Directional”
• Query and Document translation
  – MT, MRD, Parallel corpora
• Translation disambiguation
• Out-of-vocabulary (OOV) problem
  – Use of Web resources
  – Transliteration - Cognate
• Query expansion techniques
  – Pseudo-relevance feedback, FPRF
  – Use of Knowledge ontology
• Merging strategies

Page 25:

Homework from NTCIR-4: Best SLIR and BLIR runs

(D-run, Rigid) MAP and % of monolingual MAP

Run  MAP    % mono    Run  MAP    % mono
C-C  .3255            J-J  .3804
J-C  .0548  16.8%     C-J  .2309  60.7%
K-C  .1447  44.5%     K-J  .2935  77.2%
E-C  .0663  20.4%     E-J  .2674  70.3%

K-K  .4685            E-E  .3469
C-K  .3973  84.8%     C-E  .2238  64.5%
J-K  .3984  85.0%     J-E  .3340  96.2%
E-K  .3249  69.3%     K-E  .2250  64.9%

Extremely high!
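The "% to Monolingual" figures appear to be the bilingual MAP divided by the monolingual MAP for the same document language. A minimal sketch of that arithmetic, using the MAP values from the slide, is given below; the names `MONOLINGUAL_MAP` and `percent_of_monolingual` are made up for this example.

```python
# Monolingual (SLIR) MAP per document language, as listed on the slide.
MONOLINGUAL_MAP = {"C": 0.3255, "J": 0.3804, "K": 0.4685, "E": 0.3469}


def percent_of_monolingual(run: str, bilingual_map: float) -> float:
    """run is 'topic-doc' (e.g. 'J-C'); compare against the doc-language baseline."""
    doc_lang = run.split("-")[1]
    return 100.0 * bilingual_map / MONOLINGUAL_MAP[doc_lang]


for run, score in [("J-C", 0.0548), ("K-J", 0.2935), ("C-K", 0.3973)]:
    print(run, round(percent_of_monolingual(run, score), 1))
```

With the slide's values, J-C, K-J and C-K come out at 16.8, 77.2 and 84.8 percent, matching the table.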

Page 26:

Patent Retrieval Task

Task Organizers: Atsushi Fujii (Univ of Tsukuba), Makoto Iwayama (TIT/Hitachi), Noriko Kando (NII)

Page 27:

Patent Retrieval Tasks: situation & users’ information seeking task

NTCIR-3 PATENT (2001-2002)
Technological Survey: search patents by newspaper
End user: non-experts (ex. business manager)

NTCIR-4, -5 PATENT (2003-2004) (2004-2005)
From a claim of a new patent application, search patents that can invalidate the new patent application.
User: patent experts
NTCIR-4: 5 yrs, 45GB; NTCIR-5: 10 yrs, 90GB

[Diagram labels: Patent Claims; Patent Applications; Newspaper]

Page 28:

NTCIR-4 Patent (2003-2004)

DOCUMENTS (ca. 45GB)
  – Japanese: full text with author’s abstract (1993-1997), ca. 3.5 M docs
  – English: abstract (1993-1997), 3.5 million docs
  – Diagram labels: Translation; By professional abstractors

TOPICS (34 manual + 69 automatic)
  – Patents (claims)
  – Japanese, English, Chinese (trad), Chinese (simp), Korean

1993-97 are used for evaluation

Main: Search patents by patent - text retrieval + relevant passage pinpointing

Feasibility: patent map automatic creation - make a table from a set of relevant patents on a topic (more than 100 patents), to see the tech trends (text mining), 3 year task

Page 29:

NTCIR-5 Patent (2004-2005)

DOCUMENTS (ca. 90GB)
  – Japanese: full text with author’s abstract (1993-2002), ca. 7 M docs
  – English: abstract (1993-2002), 7 million docs.5 GB
  – Diagram labels: Translation; By professional abstractors

TOPICS (34 manual + 1200-11 automatic)
  – Patents (claims)

Subtasks
  – Search patents by patent - text retrieval + relevant passage pinpointing
  – Passage Retrieval
  – F-term Classification

Page 30:

Search topics
• Japanese patent applications rejected by the Japanese Patent Office (JPO)
• 34 main topics: selected and judged by human patent experts of the “Japan Intellectual Property Association” (JIPA) (created at NTCIR-4)
• 1189 additional topics: applications rejected by JPO; evaluated by using the citations only
• Quite few relevant documents

Page 31:

Example search topic

<TOPIC>
<NUM>008</NUM>
<LANG>EN</LANG>
<FDATE>19960527</FDATE>
<CLAIM>(Claim 1) A sensor device, characterized in that an open recessed part is formed on a box-shaped forming base, a conductive film of a designated pattern is formed on the surface of the forming base including the inner surface of the recessed part, an element for a sensor is bonded to the recessed part, and the forming base is closed with a cover.</CLAIM>
...
</TOPIC>

Callouts: <CLAIM> is the target for invalidation; <FDATE> is the date of filing.

Relevant documents must be prior art, which had been open to the public before the topic patent was filed
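To make the prior-art constraint concrete, the sketch below parses a shortened copy of the topic and keeps only candidate documents published before the filing date. The candidate list, the abbreviated claim text and the function names are invented for illustration; only the tag names and the 19960527 date come from the slide.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Shortened stand-in for the slide's topic; the claim text is abbreviated.
SAMPLE_TOPIC = (
    "<TOPIC><NUM>008</NUM><LANG>EN</LANG><FDATE>19960527</FDATE>"
    "<CLAIM>(Claim 1) A sensor device, characterized in that ...</CLAIM></TOPIC>"
)


def parse_topic(xml_text: str) -> dict:
    """Extract the topic number, language, filing date and claim text."""
    root = ET.fromstring(xml_text)
    filed = datetime.strptime(root.findtext("FDATE"), "%Y%m%d").date()
    return {
        "num": root.findtext("NUM"),
        "lang": root.findtext("LANG"),
        "filed": filed,
        "claim": root.findtext("CLAIM"),
    }


def prior_art_only(candidates, filed: date) -> list:
    """Keep documents made public strictly before the topic's filing date."""
    return [doc_id for doc_id, published in candidates if published < filed]


topic = parse_topic(SAMPLE_TOPIC)
# Hypothetical candidates: (document ID, publication date).
candidates = [
    ("doc-a", date(1995, 3, 1)),
    ("doc-b", date(1996, 5, 27)),
    ("doc-c", date(1997, 1, 15)),
]
print(prior_art_only(candidates, topic["filed"]))
```

Whether a document published on the filing date itself should count is a policy detail the slide does not state; the strict comparison here is one possible reading.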

Page 32:

Relevance judgment

• Document-based relevance judgment
  – A: patent that can invalidate the topic claim
  – B: patent that can invalidate the topic claim, when used with other patents
• Passage-based relevance judgment:
  – combinational relevance
• Submitted runs were evaluated by mean average precision (MAP)
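As a reminder of how MAP interacts with the A/B grades, here is a minimal sketch that computes MAP once counting only grade A as relevant and once counting A and B. The data layout (dicts keyed by topic ID) and the toy judgments are assumptions for this example, not the official evaluation tool.

```python
def average_precision(ranked, relevant):
    """Uninterpolated average precision of one ranked list."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels, grades):
    """runs: topic -> ranked doc IDs; qrels: topic -> {doc ID: grade}."""
    scores = []
    for topic, judged in qrels.items():
        relevant = {d for d, g in judged.items() if g in grades}
        if relevant:  # topics without relevant docs are skipped
            scores.append(average_precision(runs.get(topic, []), relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for illustration.
runs = {"t1": ["d1", "d2", "d3", "d4"]}
qrels = {"t1": {"d1": "A", "d3": "B"}}
print(mean_average_precision(runs, qrels, {"A"}))
print(mean_average_precision(runs, qrels, {"A", "B"}))
```

In the toy data, counting B as relevant lowers the score because the B document sits at rank 3.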

Page 33:

Two stage refinement

NTCIR-4

Page 34:

NTCIR-5 DocIR AB

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 16. Legend as extracted:]
d0038 (16.84) E
d0077 (16.54) B
d0027 (15.73) A
d0055 (15.66) D
d0047 (14.47) C
d0011 (13.96) F
d0044 (7.57) G
d0001 (6.75) J
d0040 (5.48) I
d0043 (4.13) H

Page 35:

NTCIR-4 DocIR AB

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 60. Legend as extracted:]
d0025 (25.06) A
d0061 (23.69) B
d0050 (21.66) D
d0046 (20.35) C
d0036 (18.23) E
d0044 (15.73) G
d0012 (14.89) F
d0032 (11.13) J
d0043 (8.52) H
d0041 (7.72) I

NTCIR-5 DocIR AB, Manual Queries

Page 36:

Passage Retrieval

• Provide Topics and Relevant documents
  – NTCIR-4 Topics: 41 (Dry runs 7, Formal runs 34)
  – Relevant Docs: 378
  – Sort the passages in the relevant docs

Page 37:

Ex. Results file (passage retrieval)

Fields, in order: Topic ID, Always 0, Passage ID, rank, score, Run ID

0001 0 1993-123456-5 1 9999 ntc1
0001 0 1993-123456-3 2 9999 ntc1
0001 0 1993-123456-0 3 9999 ntc1
0002 0 1994-000002-3 4 9999 ntc1
0002 0 1994-000002-1 5 9999 ntc1
...
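A results line in this layout can be read with a few lines of code. The sketch below assumes whitespace-separated fields in the order shown on the slide (Topic ID, a constant 0, Passage ID, rank, score, Run ID); the `PassageResult` type and the function name are made up for the example.

```python
from typing import NamedTuple


class PassageResult(NamedTuple):
    topic_id: str
    passage_id: str
    rank: int
    score: float
    run_id: str


def parse_result_line(line: str) -> PassageResult:
    """Parse one 'topic 0 passage rank score run' line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    topic_id, zero, passage_id, rank, score, run_id = fields
    if zero != "0":
        raise ValueError("second field is expected to be the constant 0")
    # IDs stay strings so that leading zeros such as '0001' are preserved.
    return PassageResult(topic_id, passage_id, int(rank), float(score), run_id)


print(parse_result_line("0001 0 1993-123456-5 1 9999 ntc1"))
```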

Page 38:

Evaluation of passage Retrieval

• MAP
  – See both Recall and Precision
• Expected Search Length (ESL)
  – See Precision

Page 39:

Ex. evaluation by ESL

[Diagram: relevant docs, each with its ranked passages and the relevant passages marked]

• evaluate by the number of passages (search length) that the user reads until he/she obtains sufficient evidence
• average the search length over the relevant docs

Search length = 5
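One way to read the search-length idea is sketched below: walk down the ranked passages of a relevant document and stop as soon as some group of passages that together count as sufficient evidence has been fully seen. The notion of "evidence groups", the fallback when evidence is never completed, and all names are assumptions for illustration; the sketch also assumes a strict ranking and does not model the tie handling of the classical expected search length.

```python
def search_length(ranked_passages, evidence_groups):
    """Passages read, top-down, until one evidence group is fully covered.

    Falls back to the full list length when no group is ever completed
    (an assumption made for this sketch).
    """
    seen = set()
    for count, passage_id in enumerate(ranked_passages, start=1):
        seen.add(passage_id)
        if any(group <= seen for group in evidence_groups):
            return count
    return len(ranked_passages)


def mean_search_length(per_document):
    """Average the search length over (ranking, evidence groups) pairs."""
    lengths = [search_length(r, g) for r, g in per_document]
    return sum(lengths) / len(lengths)


# Toy example: two relevant documents with invented passage IDs.
doc1 = (["p3", "p1", "p5", "p2", "p4"], [{"p5", "p4"}])  # needs both p5 and p4
doc2 = (["p3", "p1", "p5", "p2", "p4"], [{"p1"}])        # p1 alone suffices
print(search_length(*doc1), search_length(*doc2), mean_search_length([doc1, doc2]))
```

A shorter search length means the user reaches sufficient evidence sooner.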

Page 40:

Passage IR A A

PR-curve by Macro average
Baseline: in order of the passage ID

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 70. Legend as extracted:]
d0004 (54.41) A
d0011 (52.23) I
d0010 (50.72) E
d0016 (48.27) B
Base1 (33.61)

Page 41:

Passage IR A B

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 70. Legend as extracted:]
d0004 (50.41) A
d0011 (47.81) I
d0010 (47.13) E
d0016 (46.62) B
Base1 (34.51)

Page 42:

Passage IR B A

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 70. Legend as extracted:]
d0004 (55.53) A
d0010 (48.91) E
d0011 (48.75) I
d0012 (44.49) B
Base1 (37.00)

Page 43:

Passage IR B B

[Recall-precision curves: recall (%) 0 to 100, precision (%) 0 to 80. Legend as extracted:]
d0004 (52.46) A
d0010 (46.36) E
d0011 (46.10) I
d0012 (43.59) B
Base1 (37.17)

Page 44:

NTCIR-4 Feasibility Study: automatic patent map generation

[Diagram labels: search topic; retrieval; documents; classification; multi-dimensional matrix; visualization; topics and documents in NTCIR-3 collection (application, JAPIO abst, PAJ)]

Page 45:

Example (blue light-emitting diode)

[Patent map matrix. Columns (problems to be solved): crystalline, reliability, long operating life, emission stability, emission intensity. Rows (solutions) with the patent numbers extracted for each row; column positions were lost in extraction:]
structure of active layer: 1998-145000, 1998-233554
electrode composition: 1998-107318, 1998-190063, 1998-209498, 1998-209495
electrode arrangement: 1998-215034, 1998-223930, 1998-242518, 1998-173230, 1998-209499, 1998-256602, 1998-242515, 1998-270757
structure of light emitting element: 1998-135516, 1998-242586, 1998-247761, 1998-135514, 1998-256668, 1998-012923, 1998-247745, 1998-256597

Annotations: given; participants identify lines and columns

Page 46:

NTCIR-4 FS (patent map): Lessons learned

• Classification (Clustering): very good

• Labeling the clusters: future work

• “Solution” only

• Too small # of topics

• Evaluation: insufficient
  – Cannot do cross-system evaluation

Page 47:

NTCIR-5 F term classification

• Use existing Classification (F terms)
  – Many topics
  – Cross-system Evaluation
• F term: multi-perspective classification
  – Can be used for Patent Map Automatic Creation

Page 48:

Tasks

• Topic classification
  – Provide a Topic for each patent or abstract
• F term classification
  – Provide F terms for patents (or abstracts) in a specific topic

Page 49:

Purpose of the task
• Topic classification:
  – Classification of the structured documents
• F term classification:
  – Multi-perspective classification

Page 50:

Question Answering Challenge

Task Organizers: Jun'ichi FUKUMOTO, Tsuneaki KATO, Fumito MASUI

Page 51:

Question Answering Challenge at NTCIR

Subtask 3: A series of questions.

Report writing task: topic centered vs browsing, Eval by F-measure

-Exact Answers - Return in 48 hours

-Doc IDs are required as support information

Same as the subtask-3 at NTCIR-4

Page 52:

Series of Questions: Situation Settings (User’s Task)

1. Collecting information about a particular topic
  – One (hidden) global topic and a series of Qs on subtopics of the global topic
2. Browsing along transitive interests
  – Topic or focus of the Qs shifts through the interaction of the user and system.
  – Local coherence with the previous Q only

Page 53:

Relation to Multi-Doc Summarization

Answering a series of Qs has a close relation with Multi-Doc Summarization:
  – A series of Qs covers subtopics that should be contained in a summary; it can be used as “quality questions”
  – Summarization as pre-processing of QA?
  – QA for pre-processing of Abstract-type summary generation?

Page 54:

Example of Series of Questions (hidden global Q = Seiji Ozawa)

• When was Seiji Ozawa born?
• Where was he born?
• Which university did he graduate from?
• Who did he study under?
• Who recognized him?
• Which orchestra was he conducting in 1998?
• Which orchestra will he begin to conduct in 2002?

Series 14: Strictly Gathering Type

Page 55:

Example of Series of Questions (Browsing type Q = topics shifting)

• Which stadium is home to the New York Yankees?
• When was it built?
• How many persons' monuments have been displayed there?
• Whose monument was displayed in 1999?
• When did he come to Japan on honeymoon?
• Who was the bride at that time?
• Who often draws pop art using her as a motif?
• What company's can did he often draw also?

Series 22: Browsing Type

Page 56:

NTCIR-4: Evaluation by MMF

[Bar chart: MMF (0.0 to 0.6) per run, with series Total, First and Rest. Runs: CRL2, CRL1, TKBQ1, TKBQ2, TKBQ3, CRL3, RDNDC2, RITSE, RDNDC1, SMLAB, MAIQA2, MAIQA1, RITSN, OKI]

Page 57:

NTCIR-4: Differences on Series Type

[Bar chart: score (0.0 to 0.4) per run, with series S-Gathering, Gathering and Browsing. Runs: CRL2, CRL1, TKBQ1, TKBQ2, TKBQ3, CRL3, RDNDC2, RITSE, RDNDC1, SMLAB, MAIQA2, MAIQA1, RITSN, OKI]

Page 58:

Homework from NTCIR-4: Problems on Evaluation

One set of all the answers == F-measure
• Multiple answers and context
  Ex.
  – Q1 - Countries in East Asia? Ans - PRC, ROC, N Korea, S Korea, UK
  – Q2 - Capitals of these countries? Ans - Beijing, Taipei, Pyongyang, Seoul, Tokyo
• Expression diversity and identification of the same answers
  Ex. Are A and B the same or not? The # of total correct answers and the recall value depend on such decisions
• Major and minor answers

Callouts: “UK” is a wrong answer. Tokyo is not the capital of UK; it is the correct answer for Q2, but this system produced a wrong answer for Q1.
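The "one set of all the answers" evaluation can be illustrated with a small F-measure computation over answer sets. The gold set used below for Q1 is an assumption made only for this example (the slide shows the system's answers, not the reference list), and the function name is hypothetical.

```python
def answer_set_f_measure(system_answers, gold_answers):
    """Balanced F-measure between a system's answer set and a gold answer set."""
    system, gold = set(system_answers), set(gold_answers)
    correct = len(system & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(system)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# System answers for Q1 as on the slide; the gold set is assumed for illustration.
system_q1 = ["PRC", "ROC", "N Korea", "S Korea", "UK"]
gold_q1 = ["PRC", "ROC", "N Korea", "S Korea", "Japan"]
print(round(answer_set_f_measure(system_q1, gold_q1), 3))
```

Exact string matching is used here, which is where the slide's "expression diversity" problem comes in: deciding whether two differently worded answers are the same changes both the number of correct answers and the recall.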

Page 59:

Cross-Language Question Answering

Task Organizers: Kuang-hua Chen, NTU; Chuan-Jie Lin, Nat Taiwan Ocean U; Yutaka Sasaki, ATR

Page 60:

NTCIR Cross-Lingual Question Answering (CLQA1)

Organizers:
Japanese: Yutaka Sasaki (ATR)
Chinese: Hsin-Hsi Chen, Kuang-hua Chen, Chuan-Jie Lin (NTU)

Language pairs: J->E, E->J, C->E, C->C, E->C
Target: Questions about named entities (PERSON, DATE, SPEED …)

[Diagram: a question in J, E or C is translated and answered from newspaper articles in J, E or C.
Question: “Who is Japanese Prime Minister?”
Newspaper articles: “小泉は…” (Koizumi …)
Answer: “小泉” (Koizumi)]

Page 61:

Traditional QA

Question: 「Who is the Prime Minister of Japan?」

Q analysis: Q type = PERSON
Doc Retrieval: Docs
  “日本の小泉首相がブッシュ大統領と…” (Japan's Prime Minister Koizumi, with President Bush, ...)
  “森前首相は訪問先の…” (Former Prime Minister Mori, at the place he was visiting, ...)
Answer candidate = Mori, Koizumi, Bush
Select Answers: 1. Koizumi 2. Mori

Rank the answer candidates according to the relationship in the documents

Page 62:

QA by Machine Learning

Question: 「Who is the Prime Minister of Japan?」

Q analysis: Q type
Doc Retrieval: Docs
  “日本の小泉首相がブッシュ大統領と…” (Japan's Prime Minister Koizumi, with President Bush, ...)
  “森前首相は訪問先の…” (Former Prime Minister Mori, at the place he was visiting, ...)
Answer candidate: NE
Select Answers: 1. Koizumi 2. Mori

Answer classification: Q, NE, Answer
Classification by ML

Page 63:

Question Biased Term Extraction
Question + Term Extraction = QA

QBTE: term extraction biased to question

Question: 「Who is the Prime Minister of Japan?」

Doc Ret: Doc
  “日本の小泉首相がブッシュ大統領と…” (Japan's Prime Minister Koizumi, with President Bush, ...)
  “森前首相は訪問先の…” (Former Prime Minister Mori, at the place he was visiting, ...)
Answer extraction: 1. 小泉首相 (Prime Minister Koizumi) 2. 森 (Mori)

Page 64

NTCIR-5 WEB

Navigational Retrieval Task 2
• Goal: “Known Item Search”.
  – To search for one or more representative Web pages on a known item.
• Data set: NW1000G-04 (1.36 TB, or 1.5×10^12 bytes, of Web pages crawled in 2004)
• Topics: 400+800
  – Only the TITLE part (1-3 search terms) is mandatory.
  – Analyzing the relationship among search techniques, topic types, search item categories and relevant page styles.
• Submitted runs: 35 (+28 by organizers) (+3 with trouble)
• Relevance judgment: relevant, partially relevant, non-relevant. “Representativeness” was judged based on all available information, e.g., provider of the page, content (text, images, etc.), URL, in/out-linked pages.
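The two sizes quoted for NW1000G-04 agree if the "TB" figure is read in binary units. That reading is an assumption made here, not something the slide states; a quick check:

```python
size_bytes = 1.5e12        # byte count quoted on the slide
tib = size_bytes / 2**40   # binary terabytes: 1 TiB = 2^40 bytes
tb = size_bytes / 10**12   # decimal terabytes: 1 TB = 10^12 bytes
print(f"{tib:.2f} TiB, {tb:.2f} TB")  # 1.36 TiB, 1.50 TB
```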

Page 65

NTCIR-5 WEB

Navigational Retrieval Task 2
• Evaluation measures: DCG and MRR at the top-10 document level
• Evaluation result: tendency on MRR & DCG
  – Several anchor-based systems performed best.
  – Link-based or URL-based methods made no contribution to anchor-based systems.
  – Several link-based systems performed fairly well.
  – Content-based systems performed poorly.
• Future work:
  – Evaluate systems considering duplication of relevant/partially-relevant documents
  – Verify stability of evaluation measures
  – Check comprehensiveness of assessment results
  – Study evaluation measures reflecting users’ overall cost
  – Analyze topic-by-topic behavior of each system
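For readers unfamiliar with the two measures named on this slide, here is a minimal sketch of MRR and DCG at the top-10 level. The gain values in `GAIN` and the `log2(rank + 1)` discount are assumptions chosen for illustration (a common textbook formulation); the exact gains and discount used by the task are not given on the slide.

```python
import math

# Hypothetical gain values; the gains used by the task are not given here.
GAIN = {"relevant": 2.0, "partial": 1.0, "non": 0.0}


def dcg_at_k(judgments, k=10):
    """DCG over the top-k results with a gain / log2(rank + 1) discount."""
    return sum(GAIN[j] / math.log2(rank + 1)
               for rank, j in enumerate(judgments[:k], start=1))


def reciprocal_rank(judgments, k=10, rigid=True):
    """1 / rank of the first hit in the top k, or 0.0 if there is none.
    rigid=True counts only 'relevant'; rigid=False also counts 'partial'."""
    hits = {"relevant"} if rigid else {"relevant", "partial"}
    for rank, j in enumerate(judgments[:k], start=1):
        if j in hits:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, k=10, rigid=True):
    """MRR: the reciprocal rank averaged over topics."""
    return sum(reciprocal_rank(r, k, rigid) for r in runs) / len(runs)


# Hypothetical judged rankings for two topics.
topic_a = ["non", "partial", "relevant"]
topic_b = ["relevant", "non"]
print(mean_reciprocal_rank([topic_a, topic_b]))               # rigid
print(mean_reciprocal_rank([topic_a, topic_b], rigid=False))  # relaxed
print(dcg_at_k(topic_a))
```

The `rigid` flag mirrors the three-level relevance judgment of the task: a rigid score counts only fully relevant pages, while a relaxed score also accepts partially relevant ones.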

[Figure: Performance of typical runs. Bar chart of MRR (rigid), axis from 0 to 0.7, by Run ID: TNT-1, K3100-9, ORGREF-DR-LF2, OKSAT-WEB-F-07, ORGREF-DR-LB2, TNT-3, K3100-7, ORGREF-C20-P2, OKSAT-WEB-F-05, ORGREF-GC1-LF1, ORGREF-GC1-LB2, JSWEB-4, TNT-5, ORGREF-GC1, OKSAT-WEB-F-00, JSWEB-3. Legend groups: Anchor+Link, Anchor, Link, Content only.]

Page 66

Homework

Funding

Organization

Community Crisis!!!

How to Select New tasks.

How to Terminate Old Tasks.

Advertisements

Divided into “Providers” and “users”

How to make the case for the importance of evaluation.

- Main achievement?

- publication?

- Effect and visibility?

Combine multiple fundings

Pilot tasks by NII’s Open call collaborative research grant

Results Analysis

Submission raw data.

Let’s work together!

Page 67

Contact Info & Online Proceedings

The documents used are in Asian languages, but participation from all over the world is more than welcome!!

Open Submission Session for NTCIR-5

Inquiries: Noriko Kando at kando (at) nii.ac.jp

Online proceedings, application & other info: http://research.nii.ac.jp/ntcir/

Page 68

Thanks Merci Danke schön Grazie Gracias Ta! Tack Köszönöm Kiitos Terima Kasih Khap Khun Ahsante Tak 謝謝 ありがとう

http://research.nii.ac.jp/ntcir/

Page 69

Details of relevant documents (A: rigid relevant)

[Figure: Venn diagram of the A-relevant documents found by three sources: citation, JIPA (= ISJ*), and System (= Pooling). The region counts appear to be 19, 17, 25, 58, 40, 0, and 0, which sum to 159; their assignment to regions is not recoverable from the extraction.]

Total number of A-rel documents is 159.

*ISJ = Interactive Search and Judgment

Page 70

Results: Subtask 1

MRR, ratio of questions with a correct answer at rank 1, and ratio with a correct answer within the top 5

[Figure: bar chart with three series per run (MRR, Ratio.1st, Ratio.5) on an axis from 0 to 0.8. Run labels as extracted: NCQAL, CRL-2, CRL-1, TKBQ-2, Forest-S, TKBQ-1, Forest-F, TSB-A, TB-B, GDQA, NSATD, RDNDC-1, NYUCRL, RDNDC-2, smlab, NAIST, RitsQA-E, nak, OKI-1, OKI-2, MAIQA-1, RitsQA-N, KLE, MAIQA-2, NUT.]

Page 71

Results: Subtask 2

Average F-measure, Precision, and Recall over all Qs

[Figure: bar chart with three series per run (MF, MP, MR) on an axis from 0.0 to 0.5. Run labels as extracted: CRL-1, TKBQ-1, TKBQ-2, CRL-2, RDNDC-1, NAIST, RDNDC-2, smlab, RitsQA-E, MAIQA, OKI-1, OKI-2, RitsQA-N, iwate.]
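A minimal sketch of how per-question precision, recall, and F-measure can be averaged over all questions. It assumes that MF, MP, and MR in the chart legend stand for mean F-measure, mean precision, and mean recall, and that each question has a set of returned answers scored against a set of gold answers; the data below are hypothetical.

```python
def prf(returned, gold):
    """Precision, recall and F-measure for one question."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    p = correct / len(returned) if returned else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


def mean_prf(results):
    """Average P, R and F over all questions (MP, MR, MF)."""
    scores = [prf(returned, gold) for returned, gold in results]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical system output and gold answers for two questions.
results = [
    (["Koizumi", "Bush"], ["Koizumi", "Mori"]),  # P = 0.5, R = 0.5, F = 0.5
    (["Tokyo"], ["Tokyo"]),                      # P = R = F = 1.0
]
mp, mr, mf = mean_prf(results)
print(mp, mr, mf)  # 0.75 0.75 0.75
```

Here the mean F-measure is the average of the per-question F values, which in general differs from the F-measure computed from the mean precision and mean recall.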

Page 72

NTCIR-4 WEB
(A) Informational Retrieval Task
(B) Navigational Retrieval Task
[Pilot] (C) Geographical Task
[Pilot] (D) Topical Classification Task
    retrieval result classification, e.g., using clustering

Documents:
– ‘NW100G-01’ (100 GB of Web pages crawled in 2001 from “*.jp”) for Subtasks A and B
– ‘Target data’ (subset of the NW100G-01) for Subtasks C and D.