Ontology-Aware Information Extraction http://gate.ac.uk/ Hamish Cunningham, Kalina Bontcheva Department of Computer Science, University of Sheffield OntoWeb 4, SIG 5, 2002


Ontology-Aware Information Extraction

http://gate.ac.uk/

Hamish Cunningham, Kalina Bontcheva

Department of Computer Science, University of Sheffield

OntoWeb 4, SIG 5, 2002


GATE, a General Architecture for Text Engineering

GATE is...

• An architecture: a macro-level organisational picture for LE software systems.

• A framework: for programmers, GATE is an object-oriented class library that implements the architecture.

• A development environment: for language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing e.g. Information Extraction.

• Free software (LGPL). Mature, robust software (in development since 1995). Download at http://gate.ac.uk/download

Comes with...

• Some free components, and wrappers for other people's components

• Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.

Applications; languages

GATE has been used for a variety of applications, including:

• MUMIS: automatic creation of semantic indexes for multimedia programme material

• MUSE: a multi-genre IE system

• EMILLE: a 70 million word corpus of Indic languages

• Metadata for Medline (at Merck)

• Creation of metadata for Semantic Web Services; documentation using NLG

• HSE: summarisation of health and safety information from company reports

• OldBaileyIE: NE recognition on 17th century Old Bailey Court reports.

• AKT: language technology in knowledge management

• AMITIES: call centre automation

• Digital libraries / e-philology for ancient languages researchers

• Various Medical Informatics and database technology projects

• IE in Romanian, Bulgarian, Greek, Bengali, Spanish, Swedish, German, Italian, and French (Arabic, Chinese and Russian next year)


Some users...

At the time of writing, a representative sample of GATE users includes:

• Longman Pearson publishing, UK

• BT Exact Technologies, UK

• Merck KGaA, Germany

• Canon Europe, UK

• Knight Ridder (the second biggest US news publisher)

• BBN Technologies, US

• Sirma AI Ltd., Bulgaria

• Resco AB, Sweden/Finland/Germany

• GlaxoSmithKline plc: drug-based navigation of Medline abstracts

• Master Foods NV: extraction of commodities events from news

• the American National Corpus project, US

• Imperial College, London, the University of Manchester, Queen Mary College, UMIST, the University of Karlsruhe, Vassar College, ISI / the University of Southern California and a large number of other UK, US and EU universities

• the Perseus Digital Library project, Tufts University, US.


Scientific method and HLT

• How do we really know that this stuff works?!

• Open source systems make experimental repeatability easier and therefore cut down on site-specific skew effects.

• GATE's IE tools have competed in MUC, TREC (QA), ACE, and DUC. TIDES Surprise Language exercise next year.

• GATE includes markup and automated evaluation tools: easier quantitative evaluation.
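Quantitative evaluation of this kind usually means comparing system output against hand-annotated key data. The following is a minimal Python sketch of strict precision, recall and F1 over annotation spans. It is not GATE's evaluation tool or API: the `Annotation` type, the `evaluate` function and the sample offsets are all hypothetical, and only exact matches on span and type count as correct.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A typed span over the text (hypothetical structure, not GATE's)."""
    start: int
    end: int
    type: str


def evaluate(key, response):
    """Strict scoring: a response annotation is correct only if its
    start, end and type all match a key annotation exactly."""
    key_set, response_set = set(key), set(response)
    correct = len(key_set & response_set)
    precision = correct / len(response_set) if response_set else 0.0
    recall = correct / len(key_set) if key_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: three key annotations, two system annotations,
# one of which has the right span but the wrong type.
key = [
    Annotation(0, 8, "Person"),
    Annotation(20, 29, "Organization"),
    Annotation(40, 46, "Location"),
]
response = [
    Annotation(0, 8, "Person"),
    Annotation(20, 29, "Location"),
]
precision, recall, f1 = evaluate(key, response)
```

Real evaluation tools often also give partial credit for overlapping spans; this sketch leaves that out to keep the arithmetic visible.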


Collaboration opportunities

• Interoperation, integration, not re-invention: collaboration not competition

• Take the code, do what you like with it, perhaps contribute something back

• Involve us in your 6th Framework projects

• Join KITShare: a network of excellence in Knowledge and Interface Tool Sharing.


The Holy Grail

• Problem: gap between many current IE tools and SemWeb needs


What is needed?

• Content, not Information Extraction

– Identify the ontological reference, not just the class

– Maintain referential integrity (coreference)

• Ontology-aware IE tools

– Use instances already in the ontology

– React to changes in the ontology

• Support experienced users in changing the IE tools


GATE and Content Extraction

ANNIE - Open-source IE system in GATE, providing modules needed for content extraction

• Pre-processing

• Named entity recognition

• Coreference resolution

– ANNIE handles proper names, pronouns, and nominals

• Easy-to-use pattern-action rule language to enable customisation and postprocessing of the IE results
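To illustrate the general idea of a pattern-action rule, here is a small Python sketch: a pattern selects a piece of text, and an action turns the match into an annotation. This is not the syntax or API of GATE's rule language; the `Rule` and `Annotation` classes, the `apply_rules` function and the title-plus-surname rule are hypothetical, and plain regular expressions stand in for patterns over annotations.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Annotation:
    """A typed span produced by a rule (hypothetical structure)."""
    start: int
    end: int
    type: str
    text: str


@dataclass
class Rule:
    """Pattern side: a compiled regex. Action side: a function that
    turns a match into an annotation."""
    pattern: "re.Pattern[str]"
    action: Callable[[re.Match], Annotation]


def apply_rules(text: str, rules: List[Rule]) -> List[Annotation]:
    """Run every rule over the text and collect the annotations
    created by the rule actions, ordered by start offset."""
    annotations = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            annotations.append(rule.action(match))
    return sorted(annotations, key=lambda a: a.start)


# Hypothetical rule: a title followed by a capitalised word is a Person.
title_person = Rule(
    pattern=re.compile(r"\b(?:Mr|Mrs|Dr|Prof)\. ([A-Z][a-z]+)"),
    action=lambda m: Annotation(m.start(1), m.end(1), "Person", m.group(1)),
)

text = "Dr. Bontcheva met Prof. Cunningham in Sheffield."
result = apply_rules(text, [title_person])
```

Customisation then amounts to adding or editing rules rather than changing the engine that applies them.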


Populating Ontologies with ANNIE


Ontologies as explicit IE resources

• Reuse, not reinvention:

– Protégé for ontology maintenance

– Sesame/KAON for storage and reasoning

• Ontology-aware gazetteers

– Provide the ontological class of each entry

– Use instances from the ontology for IE
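A rough Python sketch of an ontology-aware gazetteer: the lookup list is built from the instances in the ontology, and each hit carries both the instance identifier and its class. Everything here is an assumption made for illustration: the `ex:` identifiers, the in-memory dictionary standing in for an ontology store, and the `build_gazetteer` and `annotate` functions. Matching is a simple case-insensitive substring search with no word-boundary check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lookup:
    """A gazetteer hit linked back to the ontology (hypothetical structure)."""
    start: int
    end: int
    instance: str
    ontology_class: str


# Toy stand-in for an ontology: instance id -> (class, surface labels).
ONTOLOGY_INSTANCES = {
    "ex:UniversityOfSheffield": (
        "ex:University",
        ["University of Sheffield", "Sheffield University"],
    ),
    "ex:Sheffield": ("ex:City", ["Sheffield"]),
}


def build_gazetteer(instances):
    """Map each lower-cased label to its (instance id, class) pair.
    Rebuilding after an ontology change picks up new instances."""
    gazetteer = {}
    for instance_id, (cls, labels) in instances.items():
        for label in labels:
            gazetteer[label.lower()] = (instance_id, cls)
    return gazetteer


def annotate(text, gazetteer):
    """Find labels in the text, longest label first, skipping any hit
    that overlaps a span already claimed by a longer label."""
    lowered = text.lower()
    found = []
    for label in sorted(gazetteer, key=len, reverse=True):
        pos = lowered.find(label)
        while pos != -1:
            end = pos + len(label)
            if not any(pos < f.end and f.start < end for f in found):
                instance_id, cls = gazetteer[label]
                found.append(Lookup(pos, end, instance_id, cls))
            pos = lowered.find(label, pos + 1)
    return sorted(found, key=lambda f: f.start)


text = "The University of Sheffield is in Sheffield."
result = annotate(text, build_gazetteer(ONTOLOGY_INSTANCES))
```

Because the gazetteer is derived from the instance table, adding an instance and calling `build_gazetteer` again is enough for the new entry to be recognised.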


Ontology-aware IE

• The IE tools can use available formal knowledge and reasoning

• Ontology-based anaphora resolution

– G. Bush, G. Brown, the president

• The correct ontological classes are assigned to the recognised entities

• Changes in the ontology are available to the IE tools
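The "G. Bush, G. Brown, the president" example can be sketched in Python as follows. The class hierarchy, the `ex:` identifiers, the class assigned to each person and the `resolve` function are all assumptions made for illustration, not the method used in GATE. The point is only that class membership in the ontology, rather than recency alone, selects the antecedent.

```python
# Toy ontology (hypothetical): class hierarchy and instance typing.
SUBCLASS_OF = {
    "ex:President": "ex:Person",
    "ex:Minister": "ex:Person",
}
INSTANCE_OF = {
    "ex:GBush": "ex:President",
    "ex:GBrown": "ex:Minister",
}
# Class that each nominal expression refers to (hypothetical mapping).
NOMINAL_CLASS = {
    "the president": "ex:President",
    "the minister": "ex:Minister",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a descendant of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def resolve(nominal, antecedents):
    """Return the most recent antecedent whose ontological class is
    compatible with the nominal, or None if there is none.
    `antecedents` lists instance ids, most recently mentioned first."""
    target = NOMINAL_CLASS.get(nominal.lower())
    if target is None:
        return None
    for instance in antecedents:
        if is_a(INSTANCE_OF.get(instance), target):
            return instance
    return None


# Mention order in the text: G. Bush, then G. Brown, then "the president".
antecedents = ["ex:GBrown", "ex:GBush"]  # most recent first
resolved = resolve("the president", antecedents)
```

A recency-only heuristic would link "the president" to the closer mention, G. Brown; the class check skips him and selects G. Bush instead.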