Issues in Learning an Ontology from Text

May 21, 2015

robertstevens65

Talk at bio-ontologies SIG at ISMB Toronto, 2008
Page 1: Issues in Learning an Ontology from Text

Issues in Learning an Ontology from Text

Christopher Brewster, Simon Jupp, Joanne Luciano, David Shotton, Robert Stevens, and Ziqi Zhang

Page 2: Issues in Learning an Ontology from Text

The Use Case: Animal Behaviour

• Animal behaviour community recognises the need for an ontology, e.g. for video annotation/retrieval

• The community created an “Animal Behaviour Ontology” - 339 terms

• Can we (semi-)automatically build such an ontology from text?

Page 3: Issues in Learning an Ontology from Text

Some Questions

• Do we get a “good ontology”?

• If not, is it useful?

• Is it low-effort?

• Should the result be “tidied up” or used as a donor?

Page 4: Issues in Learning an Ontology from Text

Methodology: Dataset

• Journal “Animal Behaviour” from Elsevier

• 623 articles from Vol 71 (2006) - Vol 74 (2007)

• 2.2 million words

• Various formats - most usefully xml

Page 5: Issues in Learning an Ontology from Text

We Want an Ontology of Green

• An ontology of “animal behaviours”

• Not an ontology of the corpus

We want the green terms in the ontology

Page 6: Issues in Learning an Ontology from Text

Processing Steps (1)

1. Text extracted from XML - excluding affiliations, acknowledgements, bibliography except for title etc.

2. Noise removed - person names, animal names, place names

3. Lemmatiser used to reduce data sparsity

4. Term extraction applied
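Step 1 above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the element names (`affiliation`, `acknowledgement`, `bibliography`) and the sample document are hypothetical, since the slides do not describe the journal's XML schema. Noise removal, lemmatisation and term extraction (steps 2 to 4) are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real schema is not given in the slides.
SKIP_TAGS = {"affiliation", "acknowledgement", "bibliography"}

def extract_text(elem):
    """Collect text from elem, skipping any subtree whose tag is in SKIP_TAGS."""
    if elem.tag in SKIP_TAGS:
        return []
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.extend(extract_text(child))
        # Tail text belongs to the parent, so keep it even if the child was skipped.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts

# Made-up sample article for illustration only.
sample = """<article>
  <title>Mate selection in reed buntings</title>
  <affiliation>University of Somewhere</affiliation>
  <body><p>Females inspect nests before <i>mate selection</i> occurs.</p></body>
  <bibliography><ref>Smith 2005</ref></bibliography>
</article>"""

text = " ".join(extract_text(ET.fromstring(sample)))
print(text)
```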

Page 7: Issues in Learning an Ontology from Text

Processing Steps (2)

5. Term selection

Regular expression used to select terms ending in behaviour, display, construction, inspection plus generic -ing, -ism, etc.

Build hierarchies using String Inclusion

6. Top level terms filtered using “Hearst Patterns” to test if X ISA behaviour/activity/etc.

Walking, Running, Jumping, Hunting, Pecking, Reed Bunting, Corn Bunting, Herring, Courtship, Studentship, Cannibalism, Dimorphism
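A suffix-based selection of this kind might look like the sketch below. The exact regular expression used in the study is not given on the slide, so the pattern here is an assumption; in particular, the `-ship` suffix is inferred from the example terms, and the candidate list is made up for illustration. It shows why such a filter is noisy: terms such as "herring" and "reed bunting" pass it alongside genuine behaviours.

```python
import re

# Head words named on the slide.
HEAD_WORDS = ["behaviour", "display", "construction", "inspection"]
# Generic suffixes; "ship" is an assumption based on the example terms.
GENERIC_SUFFIXES = ["ing", "ism", "ship"]

# Match any term that ends in one of the head words or suffixes.
pattern = re.compile(
    r"(?:" + "|".join(HEAD_WORDS + GENERIC_SUFFIXES) + r")$",
    re.IGNORECASE,
)

# Illustrative candidates, not data from the study.
candidates = [
    "walking", "pecking", "reed bunting", "herring",
    "courtship", "studentship", "cannibalism", "dimorphism",
    "nest inspection", "territorial display", "female", "nest site",
]

selected = [t for t in candidates if pattern.search(t)]
rejected = [t for t in candidates if not pattern.search(t)]
print("selected:", selected)
print("rejected:", rejected)
```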

Page 8: Issues in Learning an Ontology from Text

Applying String Inclusion / Rules to Terms

C

BC, AC

ABC

Selection

Mate Selection

Natural Selection

Female Mate Selection
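One possible reading of string inclusion is sketched below: a term's parent is the longest other term that forms a word-level suffix of it. That rule is an assumption made for illustration (the slide does not spell out the exact rule), and `build_hierarchy` is a hypothetical helper name.

```python
def build_hierarchy(terms):
    """Map each term to its parent: the longest other term that is a
    word-level suffix of it. None marks a top-level term."""
    tokenised = {t: t.lower().split() for t in terms}
    parents = {}
    for term, words in tokenised.items():
        best = None
        for other, other_words in tokenised.items():
            n = len(other_words)
            # Proper suffix only: the candidate parent must be strictly shorter.
            if n < len(words) and words[-n:] == other_words:
                if best is None or n > len(tokenised[best]):
                    best = other
        parents[term] = best
    return parents

terms = ["selection", "mate selection", "natural selection", "female mate selection"]
parents = build_hierarchy(terms)
for term, parent in parents.items():
    print(f"{term!r} -> {parent!r}")
```

Under this rule "female mate selection" attaches to "mate selection" rather than directly to "selection", because the longest matching suffix wins.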

Page 9: Issues in Learning an Ontology from Text

Lexico-Syntactic Patterns

• X such as P, Q, R; X is a Y

• Grooming is a behaviour

• Copulation is an activity

• Dimorphism is a behaviour

• Calls such as trills, whistles, grunts
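The two patterns above can be approximated with plain regular expressions, as in the sketch below. The expressions and the `extract_isa` helper are illustrative assumptions, not the patterns used in the study; they only handle single-word terms and do no lemmatisation, so "calls" stays plural.

```python
import re

# "X is a Y" / "X is an Y"
IS_A = re.compile(r"\b(\w+) is an? (\w+)", re.IGNORECASE)
# "X such as P, Q, R" (optionally ", and R" / ", or R")
SUCH_AS = re.compile(r"\b(\w+) such as (\w+(?:, (?:and |or )?\w+)*)", re.IGNORECASE)

def extract_isa(sentence):
    """Return (child, parent) pairs suggested by the two patterns."""
    pairs = []
    for m in IS_A.finditer(sentence):
        pairs.append((m.group(1).lower(), m.group(2).lower()))
    for m in SUCH_AS.finditer(sentence):
        parent = m.group(1).lower()
        for child in re.split(r",\s*(?:and |or )?", m.group(2)):
            pairs.append((child.lower(), parent))
    return pairs

for s in ["Grooming is a behaviour",
          "Copulation is an activity",
          "Calls such as trills, whistles, grunts"]:
    print(s, "->", extract_isa(s))
```

A top-level filter could then keep a term only if some sentence yields a pair whose parent is "behaviour" or "activity". As the "Dimorphism is a behaviour" example suggests, a pattern match alone does not guarantee that the relation is correct.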

Page 10: Issues in Learning an Ontology from Text

Results

• 64,000 terms extracted

• The regexp selected 10,335 terms

• Step 6a resulted in an ontology with 17,776 classes and 1,295 top level classes

• Step 6b resulted in an ontology with 13,058 classes and 912 top level classes

Page 11: Issues in Learning an Ontology from Text

Results (2) - Copulation Sub-tree

Page 12: Issues in Learning an Ontology from Text

Results (3)

• Evaluation of terms excluded by regexp:

• 56,000 terms excluded

• Random sample of 3,140 terms evaluated by hand

• 7 verbs and 42 nouns should not have been excluded

• E.g., “interaction”

• A recall of 0.905
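The slide does not say how the recall figure was derived, so the sketch below does not try to reproduce it. It only works out the miss rate in the hand-checked sample and, assuming the sample is representative of all excluded terms, the number of wrongly excluded terms that rate would imply.

```python
excluded_total = 56_000      # terms excluded by the regexp
sample_size = 3_140          # excluded terms checked by hand
wrongly_excluded = 7 + 42    # verbs + nouns that should have been kept

# Fraction of the sample that the regexp should not have excluded.
miss_rate = wrongly_excluded / sample_size

# Extrapolation to all excluded terms; valid only if the sample is representative.
estimated_missed = miss_rate * excluded_total

print(f"sample miss rate: {miss_rate:.4f}")                            # 0.0156
print(f"extrapolated wrongly excluded terms: {estimated_missed:.0f}")  # 874
```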

Page 13: Issues in Learning an Ontology from Text

Discussion: The problem of focus

Page 14: Issues in Learning an Ontology from Text

Other Issues

• More a vocabulary than an ontology

• SKOS-like rather than OWL-like

• Can deal with “selection”, “mate selection” and “natural selection”

• Highly compositional terms, e.g. “Adult male grooming behaviour”

• Cleanish list of top level terms: cannibalism, copulation, eating, foraging, fighting, grooming

Page 15: Issues in Learning an Ontology from Text

Discussion: Is it useful?

• Answers to the earlier questions: no (not a “good ontology”), yes (useful), yes (low-effort), use as a donor

• Useful ontological fragments

• Bringing ontology to ontology learning is the research challenge

• Limitations: noise; the problem of focus; only taxonomic relations

• Advantages: speed; ease; a step towards formal ontologies