Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities projects 2014

Nov 15, 2014

Lora Aroyo

This presentation was given at the NL eScience Center during the "De Geest Uit De Fles" event for the kick-off of the eHumanities projects in 2014:

http://esciencecenter.nl/agenda/703-26-may-de-geest-uit-de-fles/
Transcript
Page 1: Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities projects 2014

Crowds & Niches Teaching Machines to Diagnose

Lora Aroyo

Page 2

•  Open Domain Question-Answering Machine, that given
   –  Rich Natural Language Questions
   –  Over a Broad Domain of Knowledge
•  Won a 2-game Jeopardy match against the all-time winners
   –  viewed by over 50,000,000

Page 3

Watson MD

•  Adapt Watson to Medical QA
•  Mainly an NLP task
•  Cognitive computing systems need human-annotated data for training, testing, evaluation

the human annotation task is one of semantic interpretation

Now answering medical questions!

Page 4

NLP Tasks

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.

Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
Factor Typing: find the type of each mention

[Slide figure: NER type labels placed over the mentions in the passage: substance, disorder, disorder, disorder, treatment]

Page 5

NLP Tasks

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.

Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned factor in a knowledge-base

[Slide figure: knowledge-base ids placed over the mentions in the passage: C0016911, C1408325, C0035078, C1619692, C0019004]

Page 6

NLP Tasks

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.

Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned factor in a knowledge-base
Relation detection: find relations that are expressed in a passage between factors.

[Slide figure: relation labels drawn between the mentions in the passage: cause, treats, treats, contra-indicates]

Page 7

NLP Tasks

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.

Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned factor in a knowledge-base
Relation detection: find relations that are expressed in a passage between factors.
Coreference: find the mentions in a sentence that refer to the same factor.
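
One way to picture the first tasks is as a record per mention that is filled in step by step: mention detection supplies the (begin, end) span, factor typing supplies the type, and factor identification would later supply a knowledge-base id. The Python sketch below is only an illustration under stated assumptions: the `Mention` class and `find_mention` helper are hypothetical, spans are character offsets with an exclusive end, the detector is a toy exact-string match, and the pairing of terms with type labels is assumed rather than taken from the slides. Because the transcript does not show which id belongs to which mention, `kb_id` is left empty.

```python
from dataclasses import dataclass
from typing import Optional

PASSAGE = (
    "Gadolinium agents are useful for patients with renal impairment, "
    "but in patients with severe renal failure requiring dialysis it "
    "presents a risk of nephrogenic systemic fibrosis."
)


@dataclass
class Mention:
    """Hypothetical record for one detected medical term (factor)."""
    begin: int                   # offset of the first character
    end: int                     # offset one past the last character
    factor_type: str             # result of factor typing
    kb_id: Optional[str] = None  # result of factor identification, if any

    def text(self, passage: str) -> str:
        return passage[self.begin:self.end]


def find_mention(passage: str, term: str, factor_type: str) -> Mention:
    """Toy mention detector: locate a known term by exact string match."""
    begin = passage.index(term)  # raises ValueError if the term is absent
    return Mention(begin, begin + len(term), factor_type)


# The term/type pairing below is an illustrative assumption.
mentions = [
    find_mention(PASSAGE, "Gadolinium agents", "substance"),
    find_mention(PASSAGE, "renal impairment", "disorder"),
    find_mention(PASSAGE, "severe renal failure", "disorder"),
    find_mention(PASSAGE, "dialysis", "treatment"),
    find_mention(PASSAGE, "nephrogenic systemic fibrosis", "disorder"),
]

for m in mentions:
    print(m.begin, m.end, m.factor_type, m.text(PASSAGE))
```

A real system would replace the exact-match lookup with a trained model, which is where the human-annotated training data comes in.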

Page 8

Gold Standard Assumption

•  Cognitive systems need to be told what is right & what is wrong
   •  A gold standard or ground truth
•  Performance is measured on test sets vetted by human experts → never perfect, always improving against test data
•  Historically, gold standards are created assuming that for each annotated instance there is a single right answer
•  Gold standard quality is measured in inter-annotator agreement → does not account for perspectives, for reasonable alternative interpretations
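
To make the last point concrete, here is a small Python sketch of one common inter-annotator agreement statistic, Cohen's kappa for two annotators. The slides do not say which statistic is meant, so this choice is an assumption, and the two label sequences are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators always used one and the same label
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six sentences by two annotators.
annotator_1 = ["cause", "treats", "cause", "treats", "cause", "treats"]
annotator_2 = ["cause", "treats", "side-effect", "treats", "cause", "cause"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

The statistic reduces every mismatch to a single penalty, so a "cause" versus "side-effect" difference, which may be a reasonable alternative reading, counts the same as a plain error.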

Page 9

but people don’t always agree…

Page 10

Disagreement

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.

cause

Page 11

Disagreement

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.

side-effect

The human annotation task is one of semantic interpretation

Page 12

Position

maybe this disagreement is a signal and not noise?

can we harness it?

Page 13

Key Question

How do we represent & measure disagreement in a way that it can be harnessed?

Page 14

Crowd Truth

Annotator disagreement is signal, not noise.

It is indicative of the variation in human semantic interpretation of signs, and can indicate ambiguity, vagueness, over-generality, etc.

http://www.freefoto.com/preview/01-47-44/Flock-of-Birds

Page 15

Position

symbiosis between humans & machines

machines learn from humans & machines help humans

Page 16

Crowd Truth Framework

Page 17

Human-Machine Workflows

Page 18

Relation Extraction Crowdsourcing Ground Truth Data: CrowdTruth

Relations overlap in meaning
Sentences are vague and ambiguous
Experts have different interpretations

Page 19

Page 20

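Representation: Worker Vector

Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.

[Slide figure: one worker's annotation of the sentence shown as a vector over the relations, with three entries set to 1]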

Page 21

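Representation: Sentence Vector

[Slide figure: the individual worker vectors for the sentence, stacked row by row above their per-relation sum]

Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0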
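
The representation can be sketched in a few lines of Python: each worker's annotation of a sentence becomes a binary vector over the relations, and the sentence vector is the element-wise sum of those worker vectors. The function names and the worker selections below are hypothetical; the individual worker vectors from the slide are not recoverable from the transcript, so this example does not reproduce the slide's numbers. Only the vector length of 12 is taken from the slide.

```python
NUM_RELATIONS = 12  # the sentence vector on the slide has 12 components


def worker_vector(selected, size=NUM_RELATIONS):
    """Binary vector with a 1 at each (1-indexed) relation the worker chose."""
    vec = [0] * size
    for relation in selected:
        if not 1 <= relation <= size:
            raise ValueError(f"relation index out of range: {relation}")
        vec[relation - 1] = 1
    return vec


def sentence_vector(worker_vectors):
    """Element-wise sum of all worker vectors for one sentence."""
    return [sum(column) for column in zip(*worker_vectors)]


# Hypothetical selections by four workers for the same sentence.
selections = [{2, 6, 10}, {6, 7}, {10}, {3, 10}]
workers = [worker_vector(s) for s in selections]
sent = sentence_vector(workers)
print(sent)
```

Each component of the result counts how many workers selected that relation, so the spread of the counts carries the disagreement.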

Page 22

Disagreement for Sentence Clarity

Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid.

Is CHEST related to PALPATION?

Sentence Vector: 0 0 0 2 3 0 0 0 1 0 0 4 4 1
Relations selected by workers: diagnose, location, associated with, is_a, other, part_of

Unclear relationship between the two arguments reflected in the disagreement

Page 23

Disagreement for Sentence Clarity

Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.

Is HYPERAEMIA related to CONJUNCTIVITIS?

Sentence Vector: 0 0 0 1 0 0 0 0 13 0 0 0 0 0
Relations selected by workers: symptom, cause

Clearly expressed relation between the two arguments reflected in the agreement

Page 24

Sentence-Relation Score

Measures how clearly a sentence expresses a relation

Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0

Cosine (Sentence Vector, unit vector for relation R6) = .55
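
The .55 can be checked directly. The Python sketch below computes the cosine between the sentence vector from the slide and a unit vector for one relation. It assumes that R6 is the sixth component of the vector, which is consistent with the reported value; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit_vector(relation_index, size):
    """Vector with a single 1 at the (1-indexed) relation position."""
    vec = [0] * size
    vec[relation_index - 1] = 1
    return vec


def sentence_relation_score(sentence_vec, relation_index):
    """Cosine between the sentence vector and the relation's unit vector."""
    return cosine(sentence_vec, unit_vector(relation_index, len(sentence_vec)))


SENTENCE_VECTOR = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]  # from the slide

score_r6 = sentence_relation_score(SENTENCE_VECTOR, 6)
print(round(score_r6, 2))  # 0.55
```

Because the second vector is a unit vector, the score reduces to the count for that relation divided by the length of the sentence vector, here 4 / sqrt(53).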

Page 25

Worker Disagreement

Measured per worker

Worker-sentence disagreement: AVG (Cosine) of the worker's sentence vector and the Sentence Vector

Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
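
A minimal Python sketch of this per-worker measure, under stated assumptions: for every sentence a worker annotated, take the cosine between the worker's own vector and the sentence vector, then average over those sentences. The slide does not say whether the worker's own vote is first subtracted from the sentence vector, or how the average is turned into a disagreement score, so this sketch leaves the vote in and reports the plain average. The function names and all data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_worker_cosine(worker_vectors, sentence_vectors):
    """Mean cosine between a worker's vectors and the matching sentence vectors."""
    if len(worker_vectors) != len(sentence_vectors) or not worker_vectors:
        raise ValueError("need one sentence vector per worker vector")
    scores = [cosine(w, s) for w, s in zip(worker_vectors, sentence_vectors)]
    return sum(scores) / len(scores)


# Hypothetical data: two sentences, four relations.
sentence_vectors = [[1, 4, 1, 0], [3, 1, 0, 2]]
typical_worker = [[0, 1, 0, 0], [1, 0, 0, 0]]  # picks the majority relation
outlier_worker = [[1, 0, 0, 0], [0, 1, 0, 0]]  # picks a rarely chosen relation

typical_score = average_worker_cosine(typical_worker, sentence_vectors)
outlier_score = average_worker_cosine(outlier_worker, sentence_vectors)
print(round(typical_score, 3), round(outlier_score, 3))
```

A low average means the worker tends to choose relations that few others chose for the same sentences, which can flag either a low-quality worker or a consistently different interpretation.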

Page 26

Crowd Truth Metrics Relation Extraction

Three parts to understand human interpretations:
§  Sentence
   •  How good is a sentence for the relation extraction task?
§  Workers
   •  How well does a worker understand the sentence?
§  Relations
   •  Is the meaning of the relation clear?
   •  How ambiguous/confusable is it?

Page 27

Human-Machine Workflows

Page 28

Crowdtruth.org

Page 29

Crowdtruth.org

Page 30

Provenance of Crowdsourcing

Page 31

Watson MD

Crowd vs. Experts

•  Not every task is suitable for a lay crowd; some require domain expertise
•  Domain experts are busy
   •  How to get them motivated to perform annotation tasks?
   •  How to make it efficient for them and effective for annotations?

Page 32

Dr. Watson Experts Game

Page 33

Dr. Watson Experts Game

Page 34

Dr. Watson Experts Game

Page 35

Dr. Watson Experts Game

Page 36

Dr. Watson Experts Game

Page 37

Dr. Watson Experts Game

Page 38

Domain Independent

•  Experimenting with:
   •  different domains, e.g. art, history, news
   •  different formats, e.g. text, images, videos
   •  different annotation tasks, e.g.
      •  medical factors, relations, synonyms, negation
      •  events, event types, participants, locations
      •  flowers, birds
•  Integrating crowds from mTurk and CrowdFlower with domain experts from Dr. Detective, Waisda? and Accurator

Page 39

The Crew

•  Lora Aroyo (VU)
•  Chris Welty (IBM)
•  Robert-Jan Sips (IBM)
•  Anca Dumitrache (VU)
•  Oana Inel (VU)
•  Khalid Khamkham (VU)
•  Tatiana Cristea (VU)
•  Rens v. Honschooten (VU)
•  Benjamin Timmermans (VU)
•  Harriëtte Smook (VU)
•  Arne Rutjes (IBM)
•  Jelle van der Ploeg (IBM)

Page 40

http://crowdtruth.org