I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez


Jan 02, 2016



I2B2 Shared Task 2011

Coreference Resolution in Clinical Text

David Hinote Carlos Ramirez


What is coreference resolution?

• Nouns, pronouns, and phrases that refer to the same object, person, or idea are coreferent.
  o Example: "Alexander was playing soccer yesterday. He fell and broke his knee."
  o "Alexander", "He", and "his" refer to the same person, so they are said to be coreferent.
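
The example above can be written down as data. The following is a minimal sketch, not the authors' Java code: the `Mention` class and the `find_mention` helper are hypothetical names introduced only for illustration. A coreference chain is simply the list of mentions that refer to the same entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A span of text in a document (hypothetical structure for illustration)."""
    text: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def find_mention(document: str, text: str) -> Mention:
    """Locate the first occurrence of `text` and wrap it as a Mention."""
    start = document.index(text)
    return Mention(text, start, start + len(text))


DOCUMENT = "Alexander was playing soccer yesterday. He fell and broke his knee."

# All three mentions refer to the same person, so they form one chain.
chain = [
    find_mention(DOCUMENT, "Alexander"),
    find_mention(DOCUMENT, "He"),
    find_mention(DOCUMENT, "his"),
]

for mention in chain:
    print(mention.text, mention.start, mention.end)
```

Storing offsets rather than bare strings keeps two occurrences of the same word distinct, which matters when only one of them belongs to the chain.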


The i2b2 Challenge

I2B2: "Informatics for Integrating Biology and the Bedside"

• This program has issued an NLP challenge involving coreference resolution.

• The challenge is to find coreferential relations within a given medical document.

• The concepts that can corefer are all annotated.

• There are 5 classes of concepts:
  o Problem
  o Person
  o Treatment
  o Test
  o Pronoun


Concept Mentions

People
• Any mention that refers to a person or group of people
  o Dr. Lightman, the patient, cardiology

Problems
• A mention that refers to the reason the subject of the document is in the hospital
  o Heart attack, blood pressure, broken leg

Tests
• Tests performed by doctors
  o EKG, temperature, CAT scan

Treatments
• Solutions to the problem mentions, or work performed to cure patients
  o Brain surgery, ice pack, Tylenol

Pronouns can refer to any of the four other types of mentions.
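
The five classes suggest a simple type check before any two mentions are compared. The sketch below is illustrative Python, not the authors' code. It encodes the statement that pronouns can refer to any of the other four types. The rule that non-pronoun mentions only link within their own class is an assumption made here for the example.

```python
from enum import Enum


class ConceptType(Enum):
    PERSON = "person"
    PROBLEM = "problem"
    TEST = "test"
    TREATMENT = "treatment"
    PRONOUN = "pronoun"


def may_corefer(a: ConceptType, b: ConceptType) -> bool:
    """Return True if two mentions of these types are allowed to be linked.

    Pronouns may pair with any type. Requiring all other pairs to share
    the same type is an assumption of this sketch.
    """
    if ConceptType.PRONOUN in (a, b):
        return True
    return a == b


print(may_corefer(ConceptType.PRONOUN, ConceptType.TEST))
print(may_corefer(ConceptType.PERSON, ConceptType.TREATMENT))
```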


Approaches for Competing 

• Using tools already made and publicly available
  o Stanford NLP
  o BART Coreference
  o LingPipe
  o CherryPicker
  o Reconcile
  o ARKRef
  o Apache OpenNLP

• Coding our own Coreference Tool


Other Coreference Tools

• We obtained versions of other Coreference tools and tested them on our data.

• All the tools we found were either still in early development or were built for one specific purpose (e.g. coreference on the MUC datasets) and left alone afterward.

• Our testing shows that, even at best, the other tools we found do not perform acceptably on our data.

• After attempts to train other tools using our data failed, we felt it best to code our own approach.


Other Tools Statistics


Algorithm

• Because the data we are working on is so specific, we chose a rule-based approach to coreference resolution.

• This means that we try to learn the characteristics of each coreferent link ourselves, and program a method for the link manually.

• We examine concepts in a file, and if they meet our criteria, we create a method to link them.

• The idea is to create rules that are specific, yet general enough to apply to similar mentions in all documents.
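
A rule-based linker of the kind described above can be sketched as a list of hand-written rule functions applied to every pair of mentions. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the `Mention` class, the determiner list, and the single `same_text_rule` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    concept_type: str  # e.g. "test", "problem" (hypothetical labels)


Rule = Callable[[Mention, Mention], bool]

# Hypothetical list of leading words to ignore when comparing mentions.
DETERMINERS = {"the", "a", "an", "his", "her", "their"}


def content_words(mention: Mention) -> Tuple[str, ...]:
    """Lowercase the mention and drop determiners."""
    return tuple(w for w in mention.text.lower().split() if w not in DETERMINERS)


def same_text_rule(a: Mention, b: Mention) -> bool:
    """Link two mentions of the same type whose content words are identical."""
    words = content_words(a)
    return bool(words) and a.concept_type == b.concept_type and words == content_words(b)


def link_mentions(mentions: List[Mention], rules: List[Rule]) -> List[Tuple[Mention, Mention]]:
    """Return every pair of mentions accepted by at least one rule."""
    return [(a, b) for a, b in combinations(mentions, 2)
            if any(rule(a, b) for rule in rules)]


mentions = [
    Mention("a chest x-ray", "test"),
    Mention("Tylenol", "treatment"),
    Mention("the chest x-ray", "test"),
]
links = link_mentions(mentions, [same_text_rule])
print([(a.text, b.text) for a, b in links])
```

Keeping each rule as a separate function makes it easy to add a new rule for a newly observed kind of link without touching the pairing loop.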

   


Our Application

• To help visualize coreferent links and see what links our program detects, we use a GUI created with Java.

• We develop the program using the Mercurial version control system, which lets us keep each other's code up to date.

• The application uses our coded algorithms to determine coreferent links between the given concept mentions.

• It displays coreferent links as lines.
  o Blue for true links.
  o Red for links that are detected by our algorithm.


Our Application

Programmed in Java, our application can use databases and the internet to gather information about the concept mentions being tested.

• We have set up a database to hold data that gives meaning to the concept mentions being tested, or to certain key words in a sentence that contains a mention. If words or phrases meet our criteria, they can be added to the appropriate table straight from the program window.

• For each mention, the program also extracts information from Google.com searches, which can give it a wealth of information about the mentions.
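
The database idea can be illustrated with a small lookup table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The slides do not describe the real schema, so the table name `person_terms`, its columns, and the stored terms are all hypothetical.

```python
import sqlite3
from typing import Optional


def build_vocabulary(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table mapping key words to a person role."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person_terms ("
        "term TEXT PRIMARY KEY, role TEXT NOT NULL)"
    )


def add_term(conn: sqlite3.Connection, term: str, role: str) -> None:
    """Add or update a key word, as a user might do from the program window."""
    conn.execute(
        "INSERT OR REPLACE INTO person_terms (term, role) VALUES (?, ?)",
        (term.lower(), role),
    )


def lookup_role(conn: sqlite3.Connection, mention_text: str) -> Optional[str]:
    """Return the role of the first word in the mention found in the table."""
    for word in mention_text.lower().split():
        row = conn.execute(
            "SELECT role FROM person_terms WHERE term = ?", (word,)
        ).fetchone()
        if row is not None:
            return row[0]
    return None


conn = sqlite3.connect(":memory:")
build_vocabulary(conn)
add_term(conn, "patient", "patient")
add_term(conn, "dr.", "clinician")

print(lookup_role(conn, "Dr. Lightman"))
print(lookup_role(conn, "The patient"))
```

A rule could then refuse to link two person mentions whose roles differ, for example a clinician and the patient.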


Sample file

• Viewing Concepts & I2B2 Chain


File with both UHD and I2B2 Links Shown


Statistics for our System


Progress

• We are currently at an F1 score of around 75% (averaged over all test files).

• Most algorithms for resolving coreference tend to have accuracy in the 60% range.

• With the time we have left, we will definitely increase this score.

• We still haven't added detection for "Treatment" type concepts, which constitute a significant percentage of the concepts not found when computing our F1 score.

• Detection for "Test" type concepts still needs work.
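
For reference, F1 is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The sketch below computes it over sets of links and averages it across files. Scoring pairwise links is an assumption of this example. The slides do not say which coreference metric the 75% figure uses, and the data in the example is made up.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def precision_recall_f1(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Score predicted links against the gold links for one file."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_f1(files: Iterable[Tuple[Set[Link], Set[Link]]]) -> float:
    """Average the per-file F1 over all (gold, predicted) pairs."""
    scores = [precision_recall_f1(gold, predicted)[2] for gold, predicted in files]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up example: one of two predicted links is correct, one gold link is missed.
gold_links = {("the patient", "he"), ("an x-ray", "the x-ray")}
predicted_links = {("the patient", "he"), ("Tylenol", "it")}

precision, recall, f1 = precision_recall_f1(gold_links, predicted_links)
print(precision, recall, f1)
```

Links that the system never proposes, such as those involving an unsupported concept type, lower recall and therefore F1 even when every proposed link is correct.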


Current Work

Test Mentions

• Precision on "Test" type concepts is relatively low (30%).

• This is mainly because many of the tests involve specific body parts (e.g. "chest x-ray" and "chest CT" are sometimes linked by our rules).

• Tests also often involve times (e.g. "an x-ray was performed on 5 Aug." would link with "the x-ray on... December 10, 2010").

• They also involve position (e.g. "x-ray on left lung" and "x-ray on right lung").


Current Work

Problem Mentions

• Work on these mentions is about 50% complete.

• To finish, a few more database tables will need to be set up, and certain types of medical vocabulary loaded into them.

• We will also need a system for finding phrases that are made of different words but mean the same thing, i.e. a thesaurus.
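
The thesaurus requirement can be sketched as groups of phrases that share a meaning, indexed so that two mentions can be compared in constant time. This is a minimal Python illustration with a tiny hand-written vocabulary. The real system would presumably load such groups from its database tables, which the slides do not describe.

```python
from typing import Dict, Iterable, Set


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups are consistent."""
    return " ".join(phrase.lower().split())


def build_index(groups: Iterable[Set[str]]) -> Dict[str, int]:
    """Map every phrase to the id of the synonym group it belongs to."""
    index: Dict[str, int] = {}
    for group_id, group in enumerate(groups):
        for phrase in group:
            index[normalize(phrase)] = group_id
    return index


def same_meaning(a: str, b: str, index: Dict[str, int]) -> bool:
    """True if the phrases are identical or belong to the same synonym group."""
    a, b = normalize(a), normalize(b)
    if a == b:
        return True
    group_a = index.get(a)
    return group_a is not None and group_a == index.get(b)


# Small illustrative vocabulary, not taken from the authors' database.
SYNONYM_GROUPS = [
    {"heart attack", "myocardial infarction"},
    {"high blood pressure", "hypertension"},
]
INDEX = build_index(SYNONYM_GROUPS)

print(same_meaning("Heart attack", "myocardial infarction", INDEX))
print(same_meaning("heart attack", "hypertension", INDEX))
```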


Possible future problems

• The main risk with a rule-based approach is that our rules might be too specific to work with the contest data once it is distributed.

• Given the execution speed of our program, we should have enough time to do any necessary modifications in the three days between contest data being sent and results submitted.

• A smaller problem is that our application is made for a very specific purpose and is probably hard to generalize beyond medical documents.

• Most coreference resolution tools are this way, though.

• Not being able to code fast enough!


Future Necessities

• A reliable way to find the temporal setting of a particular sentence.
  o Did an injury described happen 20 years ago, or is the doctor giving instructions for a future case? These are not coreferent even though they may use the same word.

• Thesaurus work
  o Finding phrases that mean the same thing but use completely different words

• Output
  o The program does not yet output files in the I2B2 competition format; we will add this feature as the competition deadline draws near.
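
As a starting point for the temporal-setting item above, a sentence can be tagged with a coarse setting using cue phrases. The sketch below is a deliberately naive Python heuristic. The cue lists are hypothetical, and a reliable solution would need far more than keyword matching.

```python
import re

# Hypothetical cue phrases; both lists are illustrative, not exhaustive.
PAST_CUES = re.compile(
    r"\b(\d+\s+years?\s+ago|history of|previously|in the past)\b", re.IGNORECASE
)
FUTURE_CUES = re.compile(r"\b(if|should|will|in case of)\b", re.IGNORECASE)


def temporal_setting(sentence: str) -> str:
    """Classify a sentence as 'past', 'future', or 'current' by cue phrases."""
    if PAST_CUES.search(sentence):
        return "past"
    if FUTURE_CUES.search(sentence):
        return "future"
    return "current"


def settings_agree(sentence_a: str, sentence_b: str) -> bool:
    """Mentions from sentences with different settings should not be linked."""
    return temporal_setting(sentence_a) == temporal_setting(sentence_b)


print(temporal_setting("The patient broke his leg 20 years ago."))
print(temporal_setting("Return to the clinic if the leg is injured again."))
print(temporal_setting("The patient presents with a broken leg."))
```

With such a tag available, a linking rule could skip pairs like the old injury and the hypothetical future one even though both mention the same word.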