PASCAL CHALLENGE ON INFORMATION EXTRACTION & MACHINE LEARNING

Designing Knowledge Management using Adaptive Information Extraction from Text

PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning

Call for participation:

Evaluating Machine Learning for Information Extraction

July 2004 - November 2004

The Dot.Kom European project and the Pascal Network of Excellence invite you to participate in the Challenge on Evaluation of Machine Learning for Information Extraction from Documents. The goal of the challenge is to assess the current state of Machine Learning (ML) algorithms for Information Extraction (IE), to identify future challenges and to foster additional research in the field. Given a corpus of annotated documents, participants will be expected to perform a number of tasks, each examining a different aspect of the learning process.

Corpus: A standardised corpus of 1100 Workshop Calls for Papers (CFP) will be provided. 600 of these documents will be annotated with 12 tags relating to pertinent information (names, locations, dates, etc.). Of the annotated documents, 400 will be provided to the participants as a training set; the remaining 200 will form the unseen test set used in the final evaluation. All documents will be pre-processed to include tokenisation, part-of-speech and named-entity information.

Tasks

Full scenario (Task 1): the only mandatory task is learning to annotate implicit information: given the 400 training documents, learn the textual patterns necessary to extract the annotated information. Each participant provides the results of a four-fold cross-validation experiment, using the same document partitions, for pre-competitive tests. A final test will be performed on the 200 unseen documents.

Active learning (Task 2), learning to select documents: the 400 training documents will be divided into fixed subsets of increasing size (e.g. 10, 20, 30, 50, 75, 100, 150 and 200). Training on these subsets will show the effect of limited resources on the learning process. In addition, starting from each subset, participants can select the documents to add in order to reach the next size (i.e. 10 to 20, 20 to 30, etc.), showing their ability to select the most suitable set of documents to annotate.

Enriched scenario (Task 3): the same procedure as Task 1, except that participants can also use the unannotated part of the corpus (500 documents). This will show how unsupervised or semi-supervised methods can improve the results of supervised approaches. An interesting variant of this task could use unlimited resources, such as the Web.

Participation: Participants from different fields, such as machine learning, text mining and natural language processing, are welcome. Participation in the challenge is free. After registration, participants will receive the corpus of documents to train on and precise instructions on the tasks to be performed. By an established date, participants will be required to submit their systems’ answers via a Web portal, and an automatic scorer will compute the accuracy of extraction. Each participant will also have to write a paper describing the system and the results obtained. The results of the challenge will be discussed at a dedicated workshop.

Timetable:
• 5th July 2004: Formal definition of the tasks, annotated corpus and evaluation server
• 15th October 2004: Formal evaluation
• November 2004: Presentation of evaluation at Pascal workshop

Organizers: Fabio Ciravegna, University of Sheffield, UK (coordinator); Mary Elaine Califf, Illinois State University, USA; …

Neil Ireson

Local Challenge Coordinator

Web Intelligent Group, Department of Computer Science, University of Sheffield

Organisers

• Sheffield – Fabio Ciravegna

• UCD Dublin – Nicholas Kushmerick

• ITC-IRST – Alberto Lavelli

• Illinois State University – Mary-Elaine Califf

• FairIsaac – Dayne Freitag

Website

• http://tyne.shef.ac.uk/Pascal


Outline

• Challenge Goals

• Data

• Tasks

• Participants

• Results on Each Task

• Conclusion


Goal: Provide a testbed for comparative evaluation of ML-based IE

• Standardised data
• Partitioning
• Same set of features
– Corpus preprocessed using GATE
– No features allowed other than the ones provided
• Explicit tasks
• Standard evaluation
– Provided independently by a server
• For future use
– Available for further tests with the same or new systems
– Possible to publish new corpora or tasks

Data (Workshop CFP)

[Figure: timeline marked 1993, 2000 and 2005, showing Training Data (400 Workshop CFP) and Testing Data (200 Workshop CFP)]

[Figure, continued: the 400 training documents are divided into four sets, Set0–Set3, and each set is further divided into ten subsets, 0–9]
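The slides do not say how documents were assigned to sets and subsets. A minimal Python sketch of the structure, assuming an even, in-order split (so each set holds 100 documents and each subset 10) and hypothetical integer document IDs: under that assumption, taking subsets 0..k from all four sets gives 40, 80, 120, ... training documents, the same sizes as the Subset0 (40) and Subset0,1 (80) training sets in the active-learning slides.

```python
from typing import Dict, List, Tuple

N_SETS = 4       # Set0..Set3
N_SUBSETS = 10   # subsets 0..9 inside each set


def partition(doc_ids: List[int]) -> Dict[Tuple[int, int], List[int]]:
    """Map (set index, subset index) to document IDs.

    Assumption: documents are split evenly and in order (no shuffling);
    len(doc_ids) must be divisible by N_SETS * N_SUBSETS.
    """
    if len(doc_ids) % (N_SETS * N_SUBSETS):
        raise ValueError("document count must divide evenly into sets and subsets")
    per_set = len(doc_ids) // N_SETS
    per_subset = per_set // N_SUBSETS
    parts: Dict[Tuple[int, int], List[int]] = {}
    for s in range(N_SETS):
        for k in range(N_SUBSETS):
            start = s * per_set + k * per_subset
            parts[(s, k)] = doc_ids[start:start + per_subset]
    return parts


def cumulative_subset(parts: Dict[Tuple[int, int], List[int]], k: int) -> List[int]:
    """Union of subsets 0..k taken from every set (k=1 gives 'Subset0,1')."""
    return [d for s in range(N_SETS) for j in range(k + 1) for d in parts[(s, j)]]


if __name__ == "__main__":
    parts = partition(list(range(400)))
    for k in range(N_SUBSETS):
        # fraction of the training data, number of training documents
        print(f"{(k + 1) / N_SUBSETS:.1f}", len(cumulative_subset(parts, k)))
```

Whether the learning-curve points (0.1 to 0.9 in the Task2a charts) correspond to these cumulative subsets is an assumption of this sketch, not something the slides state.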

[Figure, continued: additional unannotated data: Enrich Data 1 (250 Workshop CFP), Enrich Data 2 (250 Conference CFP) and the WWW]

Preprocessing

• GATE
– Tokenisation
– Part-Of-Speech
– Named-Entities: Date, Location, Person, Number, Money

Annotation Exercise

• 4+ months
• Initial consultation
• 40 documents – 2 annotators
• Second consultation
• 100 documents – 4 annotators
• Determine annotation disagreement
• Full annotation – 10 annotators

Annotators: Christopher Brewster, Sam Chapman, Fabio Ciravegna, Claudio Giuliano, Jose Iria, Ashred Khan, Vita Lanfranchi, Alberto Lavelli, Barry Norton

Annotation Slots

Slot                                      Training Corpus    Test Corpus
workshop name                             543 (11.8%)        245 (10.8%)
workshop acronym                          566 (12.3%)        243 (10.7%)
workshop homepage                         367 (8.0%)         215 (9.5%)
workshop location                         457 (10.0%)        224 (9.9%)
workshop date                             586 (12.8%)        326 (14.3%)
workshop paper submission date            590 (12.9%)        316 (13.9%)
workshop notification of acceptance date  391 (8.5%)         190 (8.4%)
workshop camera-ready copy date           355 (7.7%)         163 (7.2%)
conference name                           204 (4.5%)         90 (4.0%)
conference acronym                        420 (9.2%)         187 (8.2%)
conference homepage                       104 (2.3%)         75 (3.3%)
Total                                     4583 (100.0%)      2274 (100.0%)
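Each percentage in the slot table is that slot's share of its corpus total. A short Python check, with the counts copied from the table, recomputes the two totals and the rounded shares:

```python
from typing import Dict, Tuple

# (training count, test count) per slot, copied from the Annotation Slots table
COUNTS: Dict[str, Tuple[int, int]] = {
    "workshop name": (543, 245),
    "workshop acronym": (566, 243),
    "workshop homepage": (367, 215),
    "workshop location": (457, 224),
    "workshop date": (586, 326),
    "workshop paper submission date": (590, 316),
    "workshop notification of acceptance date": (391, 190),
    "workshop camera-ready copy date": (355, 163),
    "conference name": (204, 90),
    "conference acronym": (420, 187),
    "conference homepage": (104, 75),
}


def total(column: int) -> int:
    """Sum of a column: 0 = training corpus, 1 = test corpus."""
    return sum(counts[column] for counts in COUNTS.values())


def shares(column: int) -> Dict[str, float]:
    """Each slot's percentage of the column total, rounded to one decimal."""
    n = total(column)
    return {slot: round(100 * counts[column] / n, 1) for slot, counts in COUNTS.items()}


if __name__ == "__main__":
    print(total(0), total(1))  # 4583 2274
    for slot, pct in shares(0).items():
        print(f"{slot:42} {pct:5.1f}%  {shares(1)[slot]:5.1f}%")
```

Dividing the totals by the corpus sizes (4583 / 400 and 2274 / 200) gives a little over 11 slot annotations per document in both the training and the test corpus.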

Evaluation Tasks

• Task1 - ML for IE: Annotating implicit information
– 4-fold cross-validation on 400 training documents
– Final test on 200 unseen test documents
• Task2a - Learning Curve
– Effect of increasing amounts of training data on learning
• Task2b - Active Learning: Learning to select documents
– Given seed documents, select the documents to add to the training set
• Task3a - Enriched Data
– Same as Task1 but can use the 500 unannotated documents
• Task3b - Enriched & WWW Data
– Same as Task1 but can use all available unannotated documents
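Task1's 4-fold cross-validation rotates which partition is held out. A minimal Python sketch, assuming the four fixed partitions Set0–Set3 of the training documents (here hypothetical lists of document IDs): each fold trains on three sets and tests on the fourth, so every training document is tested exactly once.

```python
from typing import Iterator, List, Sequence, Tuple


def four_fold_splits(sets: Sequence[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) document lists, holding out one set per fold."""
    if len(sets) != 4:
        raise ValueError("expected exactly four partitions (Set0..Set3)")
    for held_out in range(4):
        train = [doc for i, s in enumerate(sets) if i != held_out for doc in s]
        yield train, list(sets[held_out])


if __name__ == "__main__":
    # 400 hypothetical training-document IDs split into four sets of 100
    sets = [list(range(i * 100, (i + 1) * 100)) for i in range(4)]
    for fold, (train, test) in enumerate(four_fold_splits(sets)):
        print(f"fold {fold}: train={len(train)} test={len(test)}")
```

The final test is a separate step: systems are scored on the 200 unseen test documents rather than on a held-out set, presumably after training on all 400 training documents.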

Evaluation

• Precision / Recall / F1 measure

• MUC Scorer

• Automatic Evaluation Server

• Exact matching

• Extract every slot occurrence
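Under exact matching, an extracted slot is correct only if it coincides exactly with an annotated occurrence, and every occurrence has to be extracted; precision is correct / extracted, recall is correct / annotated, and F1 = 2PR / (P + R). The Python sketch below is a simplified stand-in for the MUC scorer, not the scorer itself: it assumes a hypothetical (document, slot, start, end) representation and micro-averages over all occurrences.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical answer format: (document id, slot name, start offset, end offset)
Span = Tuple[str, str, int, int]


def precision_recall_f1(gold: Iterable[Span],
                        predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span matching.

    Counters keep repeated occurrences, so each slot occurrence is scored
    separately rather than once per distinct value.
    """
    g, p = Counter(gold), Counter(predicted)
    correct = sum((g & p).values())  # multiset intersection: exact matches only
    n_pred, n_gold = sum(p.values()), sum(g.values())
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("d1", "workshop name", 0, 5), ("d1", "workshop date", 10, 20),
            ("d2", "workshop name", 3, 9), ("d2", "workshop location", 30, 40)]
    # The date span ends one position early, so it does not count as a match
    pred = [("d1", "workshop name", 0, 5), ("d1", "workshop date", 10, 19),
            ("d2", "workshop location", 30, 40)]
    print(precision_recall_f1(gold, pred))  # P = 2/3, R = 1/2, F1 = 4/7
```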

Participants

Participant                  ML
Amilcare (Sheffield, UK)     LP2
Bechet (Avignon, France)     HMM
Canisius (Netherlands)       SVM, IBL
Finn (Dublin, Ireland)       SVM
Hachey (Edinburgh, UK)       MaxEnt, HMM
ITC-IRST (Italy)             SVM
Kerloch (France)             HMM
Sigletos (Greece)            LP2, BWI, ?
Stanford (USA)               CRF
TRex (Sheffield, UK)         SVM
Yaoyong (Sheffield, UK)      SVM


Task1

Information Extraction with all the available data

Task1: Test Corpus

[Figure: Precision vs. Recall (axes 0–1) for Amilcare, Stanford, Yaoyong, ITC-IRST, Sigletos, Canisius, Trex, Bechet, Finn and Kerloch]

Task1: Test Corpus

[Figure: the same Precision vs. Recall plot, axes 0.2–0.9]

Task1: 4-Fold Cross-validation

[Figure: Precision vs. Recall (axes 0.2–0.9) for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn and Kerloch]

Task1: 4-Fold & Test Corpus

[Figure: Precision vs. Recall (axes 0.2–0.9) on the 4-fold cross-validation and the test corpus for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn and Kerloch]

Task1: Slot FMeasure

[Figure: mean and maximum F-measure (0–1) per slot: workshop noti, workshop pape, workshop came, workshop home, workshop date, workshop loca, workshop name, conference name, workshop acro, conference acro, conference home]

Best Slot FMeasures Task1: Test Corpus

Slot             Amilcare1  Yaoyong1   Stanford1  Yaoyong2   ITC-IRST2
workshop name    0.352      0.58       0.596      0.542      0.66
workshop acro    0.865      0.612      0.496      0.6        0.383
workshop date    0.694      0.731      0.752      0.69       0.589
workshop home    0.721      0.748      0.671      0.705      0.516
workshop loca    0.488      0.641      0.647      0.66       0.542
workshop pape    0.864      0.74       0.712      0.696      0.712
workshop noti    0.889      0.843      0.819      0.856      0.853
workshop came    0.87       0.75       0.784      0.747      0.783
conference name  0.551      0.503      0.493      0.477      0.481
conference acro  0.905      0.445      0.491      0.387      0.348
conference home  0.393      0.149      0.151      0.116      0.119
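The best-run table makes the slot-by-slot variation concrete. A small Python helper over exactly these five runs (it ignores the other submissions, so it is not expected to reproduce the Mean/Max chart) reports the best run per slot and the gap between the best and worst of the five:

```python
from typing import Dict, List, Tuple

SYSTEMS = ["Amilcare1", "Yaoyong1", "Stanford1", "Yaoyong2", "ITC-IRST2"]

# Best slot F-measures on the Task1 test corpus, copied from the table above
F_MEASURE: Dict[str, List[float]] = {
    "workshop name":   [0.352, 0.58, 0.596, 0.542, 0.66],
    "workshop acro":   [0.865, 0.612, 0.496, 0.6, 0.383],
    "workshop date":   [0.694, 0.731, 0.752, 0.69, 0.589],
    "workshop home":   [0.721, 0.748, 0.671, 0.705, 0.516],
    "workshop loca":   [0.488, 0.641, 0.647, 0.66, 0.542],
    "workshop pape":   [0.864, 0.74, 0.712, 0.696, 0.712],
    "workshop noti":   [0.889, 0.843, 0.819, 0.856, 0.853],
    "workshop came":   [0.87, 0.75, 0.784, 0.747, 0.783],
    "conference name": [0.551, 0.503, 0.493, 0.477, 0.481],
    "conference acro": [0.905, 0.445, 0.491, 0.387, 0.348],
    "conference home": [0.393, 0.149, 0.151, 0.116, 0.119],
}


def best_run(slot: str) -> Tuple[str, float]:
    """Run with the highest F-measure on a slot, and that F-measure."""
    scores = F_MEASURE[slot]
    i = max(range(len(scores)), key=scores.__getitem__)
    return SYSTEMS[i], scores[i]


def spread(slot: str) -> float:
    """Gap between the best and worst of these five runs on a slot."""
    scores = F_MEASURE[slot]
    return max(scores) - min(scores)


if __name__ == "__main__":
    for slot in F_MEASURE:
        name, f = best_run(slot)
        print(f"{slot:16} best={name:10} F={f:.3f} spread={spread(slot):.3f}")
```

On this table Amilcare1 is best on seven of the eleven slots (both acronyms, the three deadline dates, conference name and conference homepage), and the widest spread is on the conference acronym (0.905 vs. 0.348).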

Slot Recall: All Participants

[Figure: recall (0–1) for participants 1–10 on each slot: Workshop name, acro, date, home, loca, pape, noti, came; Conference name, acro, home]


Task 2a

Learning Curve

Task2a: Learning Curve FMeasure

[Figure: F-measure (0.2–0.8) against the proportion of training data (0.1–0.9) for Amilcare, Yaoyong1, Yaoyong2, Yaoyong3, ITC-IRST1, Bechet1, Bechet2, Kerloch2, Kerloch3, Hachey and the MEAN]

Task2a: Learning Curve Precision

[Figure: precision (0–0.9) against the proportion of training data (0.1–0.9) for the same runs and the MEAN]

Task2a: Learning Curve Recall

[Figure: recall (0–0.7) against the proportion of training data (0.1–0.9) for the same runs and the MEAN]


Task 2b

Active Learning

Active Learning (1)

[Figure: 400 potential training documents and 200 test documents; 40 training documents are selected (Select) and the system is tested on the 200 test documents (Test)]

Active Learning (2)

[Figure: the system trained on Subset0 (40 training documents) performs extraction (Extract); 40 further training documents are then selected and the system is tested on the 200 test documents]

Active Learning (3)

[Figure: the same cycle repeated with Subset0,1 (80 training documents)]
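The cycle in these diagrams (train on the current subset, extract, select 40 more documents, test on the 200 test documents) can be sketched as a pool-based active-learning loop. In this Python sketch the scoring function `score(train, doc)` is hypothetical and stands in for whatever criterion a system uses; in practice it would run a model trained on `train` over `doc`. A random scorer gives the random-selection baseline against which the Task2b gains are measured. Evaluation on the test documents after each round is left out.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical selection criterion: higher score = more useful to annotate next
Scorer = Callable[[List[int], int], float]


def active_learning_rounds(pool: Sequence[int], seed: List[int], batch: int,
                           rounds: int, score: Scorer) -> List[List[int]]:
    """Return the training set after the seed and after each selection round."""
    train = list(seed)
    history = [list(train)]
    for _ in range(rounds):
        in_train = set(train)
        candidates = [d for d in pool if d not in in_train]
        # sorted() evaluates the key once per candidate
        ranked = sorted(candidates, key=lambda d: score(train, d), reverse=True)
        train.extend(ranked[:batch])
        history.append(list(train))
    return history


def random_scorer(seed: int) -> Scorer:
    """Baseline: random scores, i.e. random selection of the next batch."""
    rng = random.Random(seed)
    return lambda train, doc: rng.random()


if __name__ == "__main__":
    pool = list(range(400))   # 400 potential training documents
    seed_docs = pool[:40]     # e.g. Subset0
    history = active_learning_rounds(pool, seed_docs, 40, 2, random_scorer(0))
    print([len(h) for h in history])  # [40, 80, 120]
```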

Task2b: Active Learning

• Amilcare
– Maximum divergence from the expected number of tags
• Hachey
– Maximum divergence between two classifiers built on different feature sets
• Yaoyong (Gram-Schmidt)
– Maximum divergence between example subsets
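Two of these criteria can be illustrated with simple scores. The Python sketch below is only an illustration under stated assumptions, not the participants' implementations: the "expected number of tags" is taken as a given number (the slot table works out to a little over 11 annotations per training document), and the divergence between two classifiers is measured as one minus the Jaccard overlap of their extracted spans, a measure chosen here for simplicity. The Gram-Schmidt criterion is not sketched.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format within one document: (slot name, start, end)
Span = Tuple[str, int, int]


def tag_count_divergence(predicted_tags: int, expected_tags: float) -> float:
    """Amilcare-style criterion (simplified): distance between the number of
    tags the current model puts in a document and the number expected."""
    return abs(predicted_tags - expected_tags)


def classifier_disagreement(a: Set[Span], b: Set[Span]) -> float:
    """Hachey-style criterion (simplified): divergence between the outputs of
    two classifiers trained on different feature sets, as 1 - Jaccard overlap."""
    union = a | b
    if not union:
        return 0.0  # both classifiers extract nothing: full agreement
    return 1.0 - len(a & b) / len(union)


def pick_top(scores: Dict[str, float], k: int) -> List[str]:
    """Select the k documents with the largest divergence."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    a = {("workshop name", 0, 5), ("workshop date", 10, 20)}
    b = {("workshop name", 0, 5), ("workshop location", 30, 40)}
    print(classifier_disagreement(a, b))  # 1 - 1/3
    print(tag_count_divergence(5, 11.5))  # 6.5
```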

Task2b: Active Learning: Increased FMeasure over random selection

[Figure: F-measure gain over random selection (-0.03 to 0.05) against the proportion of training data (0.1–0.9) for Amilcare, Yaoyong1, Yaoyong2, Yaoyong3 and Hachey]

Task 3

Semi-supervised learning

(insufficient participation)

Conclusions (Task1)

• The top three (or four) systems use different algorithms
– Amilcare: Rule Induction
– Yaoyong: SVM
– Stanford: CRF
– Hachey: HMM

Conclusions (Task1: Test Corpus)

• The same algorithm (SVM) produced different results

[Figure: Precision vs. Recall (axes 0.2–0.9) for the SVM-based systems Yaoyong, ITC-IRST, Canisius, Trex and Finn]

Conclusions (Task1: 4-fold Corpus)

• The same algorithm (SVM) produced different results

[Figure: Precision vs. Recall (axes 0.2–0.9) for the SVM-based systems Yaoyong, ITC-IRST, Canisius and Finn]

Conclusions (Task1)

• Task 1
– Large variation in performance across slots
• Good performance on:
– “Important” dates and Workshop homepage
– Acronyms (for Amilcare)
• Poor performance on:
– Workshop name and location
– Conference name and homepage

Conclusion (Task2 & Task3)

• Task 2a: Learning Curve
– Systems’ performance is largely as expected
• Task 2b: Active Learning
– Two approaches, Amilcare and Hachey, showed benefits
• Task 3: Enriched Data
– Insufficient participation to evaluate the use of enriched data

Future Work

• Performance differences:
– Systems: what determines good/bad performance
– Slots: different systems were better/worse at identifying different slots
• Combine approaches
• Active Learning
• Enriched data
– Overcoming the need for annotated data
• Extensions
– Data: use different data sets and other features, such as (HTML) structured data
– Tasks: Relation extraction


Why is Amilcare Good?

Contextual Rules

[Figure: precision (PRE), recall (REC) and F-measure (FME), 0–0.9, over x-axis values 1–9, first without contextual rules (no context) and then with contextual rules added (context)]

Rule Redundancy

[Figure: Number of Rules (0–20000) against FMeasure (0.3–0.9) for each slot (Slots), with a linear trend line (Linear (Slots))]
