Building pipeline-based NLP systems for your applications

Hua Xu

School of Biomedical Informatics, University of Texas Health Science Center at Houston

Oct 15, 2020

Disclosure

•  I receive grant funding from:
– NIH: NLM, NIGMS, NCI
– CPRIT (Cancer Prevention and Research Institute of Texas)

•  I have been a consultant for:
– Hebta LLC

What Is NLP?

•  Broad Definition – any system that manipulates text or speech. It could involve various degrees of linguistic knowledge.

•  NLP Systems
–  Natural language understanding
–  Natural language extraction
–  Natural language generation
–  Machine translation
–  NLP-based information retrieval
–  NLP interfaces

Study of Natural Language

•  Human language (vs. formal and computer language)

•  Linguistics - a description of language - used by theoretical linguists.

•  Psycholinguistics - a cognitive model of how people understand and generate language.

•  Computational linguistics - build computational models to understand and generate language.

Computational Linguistics

♦ An interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective
– Driven by the need to process natural language: convert it to a structured form for further computerized processing
– The computational model is not necessarily the same as the human model; we don't understand much about the human language faculty

Overview of Linguistic Levels

•  Phonology: units of sound combine to produce words (will not cover)
•  Morphology: basic units combine to produce words
•  Lexicography: syntactic (part of speech) and semantic categories of words
•  Syntax: structures combine to produce sentences
•  Semantics: meaning/interpretations
•  Discourse: previous information affects the interpretation of the current information
•  Pragmatics: context or world knowledge affects the interpretation of meaning

Morphology

•  Definition: The study of how words are composed from smaller, meaning-bearing units (morphemes)
§  Inflection: Word stem + grammatical morpheme
○  like → likes, liked, liking
§  Derivation: Word stem + syntactic/grammatical morpheme
○  generalize → generalization
§  Compounding: Two base forms join to form a new word
○  bedtime

•  Application: spell checking, stemming, POS tagging, speech recognition
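
A minimal sketch of how inflection, derivation, and compounding can drive stemming, assuming a hypothetical toy lexicon (`LEXICON`) and suffix list (`SUFFIXES`); real stemmers and lemmatizers rely on much larger rule sets and dictionaries.

```python
# Hypothetical toy lexicon of known stems; a real system would use a full dictionary.
LEXICON = {"like", "walk", "generalize", "bed", "time"}

# Suffixes tried in order: the derivational "ation" before inflectional endings.
SUFFIXES = ["ation", "ing", "ed", "es", "s"]


def find_stem(word):
    """Strip one known suffix and return a stem found in LEXICON, else the word."""
    w = word.lower()
    if w in LEXICON:
        return w
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) > len(suffix):
            base = w[: -len(suffix)]
            # Also try restoring a silent final "e" (lik -> like).
            for candidate in (base, base + "e"):
                if candidate in LEXICON:
                    return candidate
    return w


def split_compound(word):
    """Split a compound into two lexicon words (bedtime -> bed + time), if possible."""
    w = word.lower()
    for i in range(1, len(w)):
        if w[:i] in LEXICON and w[i:] in LEXICON:
            return (w[:i], w[i:])
    return None


for form in ["likes", "liked", "liking", "generalization"]:
    print(form, "->", find_stem(form))
print(split_compound("bedtime"))
```

Checking the stripped form against a lexicon avoids over-stemming (e.g. "walks" stays "walk" rather than losing more letters), at the cost of missing words the lexicon does not contain.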

Lexicography - Words

♦ Recognize word – Tokenization (determine the word boundary)

♦ Identify word – Lookup (map to dictionary entry)

♦ Categorize word – Tagging
– Syntactic: assign part-of-speech tags
– Semantic: assign semantic categories
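
The three steps (recognize, identify, categorize) can be sketched as a regex tokenizer followed by a dictionary lookup. The token pattern, the `DICTIONARY` entries, and the tag names are hypothetical illustrations, not part of any specific system.

```python
import re

# Hypothetical tokenizer pattern: numbers (with optional decimals), words
# (with an optional apostrophe suffix), or any single punctuation mark.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:'[a-z]+)?|[^\sA-Za-z\d]")

# Hypothetical toy dictionary: lowercased word -> (POS tag, semantic category).
DICTIONARY = {
    "the": ("DT", None),
    "patient": ("NN", None),
    "denies": ("VBZ", None),
    "fever": ("NN", "symptom"),
    "abdomen": ("NN", "body location"),
}


def tokenize(text):
    """Recognize words: split text at word boundaries."""
    return TOKEN_RE.findall(text)


def tag(tokens):
    """Identify and categorize words: look up each token, attach POS and semantic tags."""
    return [(tok, *DICTIONARY.get(tok.lower(), ("UNK", None))) for tok in tokens]


print(tag(tokenize("The patient denies fever.")))
```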

Syntax - Sentences

♦ Definition: the study of the structure of a sentence
–  Categories combine with others to produce a well-formed structure with underlying relations

♦ Difficulties: ambiguous, nested, and omitted structures
–  pain in (hands and feet) vs. (pain in hands) and fever

♦ Parsing – determining syntax
–  Formalisms: regular expressions vs. context-free grammar
–  Partial vs. full parsing
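
Partial parsing with regular expressions can be sketched as noun-phrase chunking over POS tags. This assumes the sentence is already tagged; the one-letter `TAG_CODES` mapping and the `D?J*N+` pattern are illustrative choices. A chunker like this leaves the attachment of "in lower extremities" undecided, which is what full parsing must resolve.

```python
import re

# Hypothetical mapping from POS tags to one-letter codes, so that each token
# becomes exactly one character and regex offsets equal token indices.
TAG_CODES = {"DT": "D", "JJ": "J", "NN": "N", "NNS": "N"}

# Noun phrase pattern: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def chunk_noun_phrases(tagged):
    """Partial parse: return noun-phrase chunks from (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(codes)
    ]


sentence = [("The", "DT"), ("patient", "NN"), ("had", "VBD"), ("pain", "NN"),
            ("in", "IN"), ("lower", "JJ"), ("extremities", "NNS")]
print(chunk_noun_phrases(sentence))
```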

Semantics

♦ Lexical level – determine the meaning of a word
♦ Semantic categories of a word
•  Abdomen – body location
•  Fever – symptom
•  pt – lab test (prothrombin time assay) vs. treatment (physical therapy)
♦ Word sense disambiguation

♦ Grammatical level – word senses in a structure combine to form a meaning of the whole structure
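
Word sense disambiguation for "pt" can be sketched as scoring each sense by its overlap with context cue words. The cue sets in `SENSE_CUES` are hypothetical; practical systems usually learn such cues from annotated text.

```python
import re

# Hypothetical cue words for each sense of the abbreviation "pt".
SENSE_CUES = {
    "prothrombin time assay": {"inr", "ptt", "coagulation", "warfarin", "sec", "seconds"},
    "physical therapy": {"ambulate", "gait", "exercise", "rehab", "strength", "session"},
    "patient": {"states", "reports", "denies", "complains"},
}


def disambiguate_pt(sentence):
    """Return the sense whose cue words overlap most with the sentence, or None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate_pt("PT 13.2 sec, INR 1.1"))
print(disambiguate_pt("Referred to PT for gait training."))
print(disambiguate_pt("Pt denies chest pain."))
```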

Discourse

♦ Previous information in text affects current text
– Correct reference for pronouns, definite noun phrases, and bridging noun phrases
•  Mass noted in left upper lobe. It was well-marginated.
– Time of events
– Determining topic
– Coherence of text
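
A crude recency heuristic for the pronoun example: link "It" to the most recent preceding finding noun. `FINDING_NOUNS` is a hypothetical list, and real coreference resolution also draws on syntax, agreement, and semantics.

```python
import re

# Hypothetical list of finding nouns that can serve as antecedents.
FINDING_NOUNS = {"mass", "nodule", "lesion", "effusion"}
PRONOUNS = {"it"}


def resolve_pronouns(text):
    """Link each pronoun to the most recent preceding finding noun (recency heuristic)."""
    links = []
    antecedent = None
    for m in re.finditer(r"[A-Za-z]+", text):
        word = m.group().lower()
        if word in FINDING_NOUNS:
            antecedent = m.group()
        elif word in PRONOUNS and antecedent is not None:
            # (pronoun, character offset of pronoun, antecedent)
            links.append((m.group(), m.start(), antecedent))
    return links


print(resolve_pronouns("Mass noted in left upper lobe. It was well-marginated."))
```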

Pragmatics

♦ Context affects meaning
– Domain: A mass was observed
– Section of report: past history vs. hospital course
– Prior information

♦ World knowledge affects interpretation
– He couldn't do any trading on the past Monday. (The market was closed on Presidents' Day, a Monday.)
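
The "section of report" context can be sketched by tagging each line of a note with the most recent section header, so that a mention under past history is read as historical. The header list and `HISTORICAL_SECTIONS` are hypothetical; note templates differ across institutions.

```python
import re

# Hypothetical header list; real note templates vary by institution.
SECTION_RE = re.compile(
    r"^(past medical history|history of present illness|hospital course)\s*:\s*",
    re.IGNORECASE,
)

# Hypothetical: findings in these sections describe the past, not the current stay.
HISTORICAL_SECTIONS = {"past medical history"}


def assign_sections(note):
    """Pair each non-empty line of text with the most recent section header."""
    current = None
    result = []
    for line in note.splitlines():
        line = line.strip()
        m = SECTION_RE.match(line)
        if m:
            current = m.group(1).lower()
            line = line[m.end():]
        if line:
            result.append((current, line))
    return result


def is_historical(section):
    return section in HISTORICAL_SECTIONS


note = "Past Medical History: Breast mass, excised.\nHospital Course: A mass was observed on CT."
for section, line in assign_sections(note):
    status = "historical" if is_historical(section) else "current"
    print(f"{status}: {line}")
```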

It’s all about Ambiguity!

•  POS tagging - saw (noun vs. verb)
•  Semantic tagging - pt (patient, physical therapy, prothrombin time assay)
•  Syntactic parsing - The patient had pain in lower extremities. vs. The patient had pain in emergency room.

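Parse trees for the two sentences (first: the PP attaches inside the noun phrase "pain in lower extremities"; second: the PP attaches to the verb phrase):

(S (np (det the) (n patient)) (vp (v had) (np (n pain) (pp (p in) (np (adj lower) (n extremities))))))

(S (np (det the) (n patient)) (vp (v had) (n pain) (pp (p in) (np (n emergency) (n room)))))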
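
A minimal sketch of CKY parse counting with a hypothetical toy grammar in Chomsky normal form shows why these sentences are hard: the grammar licenses both attachments for each sentence, so syntax alone cannot choose the intended tree. The rules and lexicon are illustrative assumptions; words like "pain" are listed as both N and NP as a shortcut for the unary rule NP → N.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "Adj", "N"),
    ("NP", "N", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "the": {"Det"}, "patient": {"N"}, "had": {"V"},
    "pain": {"N", "NP"}, "in": {"P"}, "lower": {"Adj"},
    "extremities": {"N", "NP"}, "emergency": {"N"}, "room": {"N", "NP"},
}


def count_parses(words, start="S"):
    """CKY chart where chart[i][j][cat] counts the trees of category cat over words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("the patient had pain in lower extremities".split()))
print(count_parses("the patient had pain in emergency room".split()))
```

Both sentences receive the same number of parses, so picking the right attachment needs semantic or world knowledge (pain is located in extremities, while a patient is located in an emergency room).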

Most current clinical NLP systems are information extraction systems

•  General-purpose
– MedLEE
– MetaMap
– cTAKES
– KnowledgeMap Concept Identifier
– …

•  Specific-purpose
– MIST: the MITRE Identification Scrubber Toolkit
– MedEx: medication information extraction
– …

Pipeline-based architecture

Figure: cTAKES (clinical Text Analysis and Knowledge Extraction System) UIMA (Unstructured Information Management Architecture) annotation flow of the side effect pipeline.

Source: Sohn et al., JAMIA, 2011
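
A pipeline like the one in the figure can be sketched as a shared document object that a sequence of annotators reads from and writes to. This is a simplified, hypothetical Python analogue of the idea, not the UIMA or cTAKES API; the class names, annotator functions, and side-effect term list are all illustrative.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int
    end: int
    kind: str


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered_text(self, ann):
        return self.text[ann.start:ann.end]


def sentence_detector(doc):
    # Naive splitter: a sentence ends at ., ! or ? (or at the end of the text).
    for m in re.finditer(r"\S[^.!?]*[.!?]?", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "Sentence"))


def tokenizer(doc):
    # Reads Sentence annotations written by the previous stage.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation(sent.start + m.start(), sent.start + m.end(), "Token"))


SIDE_EFFECT_TERMS = {"nausea", "rash", "dizziness"}  # hypothetical dictionary


def side_effect_tagger(doc):
    # Reads Token annotations and adds SideEffect annotations.
    for tok in doc.select("Token"):
        if doc.covered_text(tok).lower() in SIDE_EFFECT_TERMS:
            doc.annotations.append(Annotation(tok.start, tok.end, "SideEffect"))


def run_pipeline(text, annotators):
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)  # each stage builds on annotations from earlier stages
    return doc


doc = run_pipeline("Started lisinopril. Patient reports dizziness.",
                   [sentence_detector, tokenizer, side_effect_tagger])
print([doc.covered_text(a) for a in doc.select("SideEffect")])
```

Because every stage communicates only through the shared annotations, a stage can be swapped (for example, a machine-learning tagger in place of the dictionary tagger) without changing the others.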

Demo of building clinical NLP pipelines using CLAMP

•  Clinical Language Annotation, Modeling, and Processing Toolkit (CLAMP)

•  Demo 1 – determine smoking status using rule-based approaches

•  Demo 2 – extract lab names using a hybrid approach that combines machine learning and rules

Introduction to CLAMP

•  A general-purpose clinical NLP system built on proven methods

•  An IDE (integrated development environment) for building customized clinical NLP pipelines via GUIs
–  Annotating/analyzing clinical text
–  Training of ML-based modules
–  Specifying rules

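NLP Task | Challenge | Ranking
Named entity recognition | 2009 i2b2, medication | #2
Named entity recognition | 2010 i2b2, problem, treatment, test | #2
Named entity recognition | 2013 SHARe/CLEF, abbreviation | #1
UMLS encoding | 2014 SemEval, disorder | #1
Relation extraction | 2012 i2b2, temporal | #1
Relation extraction | 2015 SemEval, disease-modifier | #1
Relation extraction | 2015 BioCREATIVE, chemical-induced disease | #1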

What does CLAMP address?

•  The Transportability Problem of NLP
– From one type of clinical note to another
– From one institution to another
– From one application to another

•  Need a solution for non-NLP experts to efficiently build high-performance NLP modules for individual applications!

CLAMP Demo 1

•  Build a rule-based system to extract smoking status from clinical text

•  Input: sentences containing patient smoking information

•  Output: three types of status for each smoking mention:
–  Current Smoker: She is continuing to smoke
–  Past Smoker: She has a prior history of smoking although not currently
–  Non-Smoker: She denies any tobacco use, alcohol use
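
A rule-based classifier for Demo 1 might look like the following sketch. The regular expressions are hypothetical, not CLAMP's actual rules; rule order matters because "prior history of smoking" also contains a generic smoking cue.

```python
import re

# Hypothetical rules, checked in order; the first matching rule decides.
# Negation and past-tense cues are tested before the generic "current" rule.
RULES = [
    ("Non-Smoker", re.compile(
        r"\b(denies|never)\b[^.]*\b(smok\w*|tobacco|cigarettes?)\b|\bnon-?smoker\b",
        re.IGNORECASE)),
    ("Past Smoker", re.compile(
        r"\b(quit|former|ex-smoker|prior history of smoking|not currently)\b",
        re.IGNORECASE)),
    ("Current Smoker", re.compile(
        r"\b(smok\w*|tobacco|cigarettes?)\b",
        re.IGNORECASE)),
]


def smoking_status(sentence):
    """Return the label of the first rule whose pattern occurs in the sentence."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "Unknown"


for s in ["She is continuing to smoke",
          "She has a prior history of smoking although not currently",
          "She denies any tobacco use, alcohol use"]:
    print(smoking_status(s), "|", s)
```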

CLAMP Demo 2

•  Build a hybrid (machine learning + rules) system for extracting lab test concepts from clinical text

•  Input: discharge summaries

•  Output: lab test concepts mentioned in the text with attributes of:
– Offsets
– Negation
– UMLS CUIs
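
A hybrid extractor producing the three output attributes can be sketched as below. To keep the sketch self-contained, the machine-learning tagger is replaced by a plain dictionary match (`LAB_TERMS`), negation uses NegEx-style trigger rules, and the CUI values are placeholders rather than real UMLS codes; all of these names are hypothetical.

```python
import re
from dataclasses import dataclass

# Stand-in for the machine-learning tagger: a hypothetical list of lab test names.
LAB_TERMS = ["hemoglobin", "platelet count", "creatinine", "potassium", "inr"]

# Hypothetical placeholder codes, NOT real UMLS CUIs.
CUI_TABLE = {term: f"CUI_PLACEHOLDER_{i}" for i, term in enumerate(LAB_TERMS)}

# Longer terms first, so multi-word names win over any shorter overlapping term.
TERM_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(LAB_TERMS, key=len, reverse=True))) + r")\b",
    re.IGNORECASE)

# Rule component: a trigger earlier in the same sentence negates the mention.
NEG_TRIGGER_RE = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)

# Sentence boundary: ., ! or ? followed by whitespace (so "9.8" is not a boundary).
BOUNDARY_RE = re.compile(r"[.!?]\s")


@dataclass
class LabMention:
    text: str
    start: int   # character offset of the first character
    end: int     # character offset just past the last character
    negated: bool
    cui: str


def sentence_start(text, pos):
    """Offset where the sentence containing position pos begins."""
    starts = [m.end() for m in BOUNDARY_RE.finditer(text, 0, pos)]
    return starts[-1] if starts else 0


def extract_lab_tests(text):
    mentions = []
    for m in TERM_RE.finditer(text):
        window = text[sentence_start(text, m.start()):m.start()]
        mentions.append(LabMention(
            text=m.group(),
            start=m.start(),
            end=m.end(),
            negated=bool(NEG_TRIGGER_RE.search(window)),
            cui=CUI_TABLE[m.group().lower()],
        ))
    return mentions


for mention in extract_lab_tests("Hemoglobin 9.8, creatinine 1.2. No potassium was drawn."):
    print(mention)
```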

CLAMP Availability

•  CLAMP is available in two versions:
–  CLAMP CMD (free)
–  CLAMP GUI (depends on the license)
https://sbmi.uth.edu/ccb/resources/clamp.htm

•  It is not open-source software, but the source code is available to collaborators with appropriate licenses.

•  We are looking for collaborators to co-develop the system! If interested, please contact: [email protected]

Thank you! Questions?

[email protected]
