Deep Distillation from Text, Naveen Ashish, University of Southern California & Cognie Inc., March 18th 2014

Deep Distillation from Natural Language Text





Invited Talk at Text Analytics World TAW 2014
Transcript
Page 1: Deep Distillation from Natural Language Text

Deep Distillation from Text
Naveen Ashish

University of Southern California & Cognie Inc.,

March 18th 2014

Page 2: Deep Distillation from Natural Language Text

This is about ….. “DEEP TEXT DISTILLATION”

The hard nut of having computers “understand” natural language (text) …. pushing the boundaries of what we can achieve ….

"It's (the problem of computers understanding natural language) ambitious ... in fact there's no more important project than understanding intelligence and recreating it." - Ray Kurzweil (2013)

Alan Turing based the Turing Test entirely on written language …. To really master natural language … that’s the key to the Turing Test … to a human requires the full scope of human intelligence. … So the point is that natural language is a very profound domain to do artificial intelligence in. - Ray Kurzweil (2013)

Page 3: Deep Distillation from Natural Language Text

Why ….

The problem is far from solved !
Unstructured data everywhere: 95% !

search

text analytics

big data analytics

health informatics

social-media intelligence

Page 4: Deep Distillation from Natural Language Text

Introduction

About myself
Associate Professor (Informatics), Keck School of Medicine, University of Southern California
Cognie Inc.
Work leverages information extraction work and systems developed at UC Irvine: XAR, UCI-PEP
Advisory consulting engagements with several companies and start-ups

Page 5: Deep Distillation from Natural Language Text

Outline

Deep distillation: what it is and why
State-of-the-art
Fundamentals
Approach details: expressions, entities, sentiment
Case studies: retail, health, risk assessment
Conclusions

Page 6: Deep Distillation from Natural Language Text

What is “Deep” text distillation ?

Page 7: Deep Distillation from Natural Language Text

Data

Abstract
This paper describes the results of a study investigating …. ….. We conclude that salt and diabetes are largely unrelated.

Page 8: Deep Distillation from Natural Language Text

Deep Distillation

The abstract, not explicitly mentioned !
What falls in this category:
Expressions
Contextual sentiment
Aspect classification

I think you need better chefs SUGGESTION

The mocha is too sweet NEGATIVE

I used to take Lipitor for … PERSONAL EXPERIENCE

The dim lights have a cozy effect …. AMBIENCE

Page 9: Deep Distillation from Natural Language Text

A Common Intersection

Distill at sentence level
Aggregate to entire feedback, post, comment or thread
Three primary elements:
Expression/Intent
Entities/Aspects (and Classes)
Sentiment

Page 10: Deep Distillation from Natural Language Text

Why Deeper ?

Goal: get actionable insights from data !
Hypothesis: deeper extraction → better insights !

The top advice items for skin rash are aloe vera, vitamin E oil and oatmeal

Complaints comprise 36% of the overall feedback with top issues being slow service, drinks and coffee

Page 11: Deep Distillation from Natural Language Text

Context

COGNIE™: A PLATFORM for text analytics

Built on XAR and UCI-PEP

Applications: SHIP SURVEY ANALYTICS, RETAIL ANALYTICS, RISK ASSESSMENT

Page 12: Deep Distillation from Natural Language Text

Expressions

Beyond entities and sentiment: EXPRESSIONS

Introduced in [Ashish et al, 2011]

Page 13: Deep Distillation from Natural Language Text

Expressions

You should try Vitamin E oil … ADVICE

..I have had arthritis since 1991… EXPERIENCE

HEALTH

..for me lipitor worked like a charm… OUTCOME

Page 14: Deep Distillation from Natural Language Text

Expressions

…showers had no hot water !… COMPLAINT

..you should have more veggie options… SUGGESTION

RETAIL/ENTERPRISE

..meats on special this weekend… ANNOUNCEMENT

..this is the best store on the west side… ADVOCACY

There is hardly any evidence to suggest a link between salt and diabetes -

These results confirm that high intake of salt leads to an increase in BP +

RISK ASSESSMENT

Page 15: Deep Distillation from Natural Language Text

The Landscape

Page 16: Deep Distillation from Natural Language Text

Text Analytics Spectrum

Wide offering of text analytics engines
Text analysis tools (many open-source)
Largely still for “spotting things”: entities, concepts, sentiment, topics, emotions ….
Going deeper: Luminoso, Attensity (Intents)
Deep Learning for Sentiment: Stanford Recursive Neural Networks

Page 17: Deep Distillation from Natural Language Text

Approach

Page 18: Deep Distillation from Natural Language Text

Approach

natural language processing

machine learning

semantics

Page 19: Deep Distillation from Natural Language Text

Architecture: COGNIE TM Platform

NLP: Segmentation, POS Tagging, Entity extraction, Anaphora, Parsing, Gram analysis

Knowledge Engineering: Existing ontologies (DMOZ, SNOMED, UMLS), Creation, Declarative

Machine Learning: Naïve-Bayes, MaxEnt, TFIDF, CRF, RNN Deep Learning, combined as an ENSEMBLE

Page 20: Deep Distillation from Natural Language Text

The Indicators: “Give Aways”

A combination of multiple types of elements !

…showers had no hot water !… COMPLAINT

(You) should have more veggie options… SUGGESTION

..i have been on lipitor… EXPERIENCE

..this is the best store on the west side… ADVOCACY

Page 21: Deep Distillation from Natural Language Text

Approach: Given Indicators

NLP
Identification of individual elements: unsupervised
Relationships between elements

Semantics
Identification of individual elements: knowledge driven

Machine Learning
Classification: combine elements, classify

Page 22: Deep Distillation from Natural Language Text

Natural Language Processing

UIMA and GATE
Stanford NLP Tools: POS tagging, Parsing, NE Recognizer, Geo-tagger, ….

Page 23: Deep Distillation from Natural Language Text

Natural Language Processing

Text Segmentation
In many cases the “unit” of distillation is a sentence
Segmentation: UIMA (or GATE), custom
Complex sentence segmentation: break up into individual clauses
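As a concrete illustration of this step, here is a minimal sketch of sentence and clause segmentation in plain Python. The actual platform uses UIMA/GATE annotators; the punctuation and conjunction rules below are illustrative assumptions, not the real segmenter.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break on ., !, ? followed by whitespace.
    A toy stand-in for a UIMA/GATE segmenter."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def split_clauses(sentence):
    """Break a complex sentence into clauses on commas and coordinating
    conjunctions, so each clause can be distilled separately."""
    parts = re.split(r',\s*|\s+(?:and|but)\s+', sentence)
    return [p.strip() for p in parts if p.strip()]
```

Splitting “food was awesome, service needs improvement” into two clauses lets a positive and a negative aspect be distilled from the same sentence.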

Page 24: Deep Distillation from Natural Language Text

NLP

Part-of-speech tags are key indicators for expression distillation
Entity extraction: Names, Locations, Organizations
Parsing: if required
Anaphora

Page 25: Deep Distillation from Natural Language Text

NGram Analysis

Unigram and bigram analysis
Obtain: gram frequency, entropy
Grams of tokens as well as POS patterns (e.g. VB VBD)
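The gram frequency and entropy statistics can be sketched as follows; this is a toy stand-in for the platform’s gram-analysis component, and works the same over token grams or POS-tag grams.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token (or POS-tag) list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def gram_stats(sentences, n=2):
    """Frequency of each n-gram across tokenized sentences, plus the
    Shannon entropy of the resulting gram distribution."""
    counts = Counter()
    for toks in sentences:
        counts.update(ngrams(toks, n))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return counts, entropy
```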

Page 26: Deep Distillation from Natural Language Text

Before Automated Classification: Manual Patterns
SoL: Sequences of Labels

Labels:
LEX-FOODADJ: spicy
LEX-EXCESS: too, very
ONT-FOOD
POS-NOUN

Sequences (Patterns):
ANY LEX-EXCESS LEX-FOODADJ
ANY POS-VB POS-MD ….
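A minimal sketch of this SoL matching follows. The label names mirror the slide (LEX-EXCESS, LEX-FOODADJ, the ANY wildcard); the word lists are illustrative, not the actual lexicons.

```python
# Illustrative lexicons keyed by the label names from the slide.
LEXICONS = {
    "LEX-EXCESS": {"too", "very"},
    "LEX-FOODADJ": {"spicy", "sweet", "salty"},
}

def label_token(token):
    """All lexicon labels whose word list contains this token."""
    return {name for name, words in LEXICONS.items()
            if token.lower() in words}

def matches(pattern, tokens):
    """True if the label sequence matches the tokens position by
    position; the wildcard ANY matches any single token."""
    if len(pattern) != len(tokens):
        return False
    return all(lab == "ANY" or lab in label_token(tok)
               for lab, tok in zip(pattern, tokens))
```

The pattern ANY LEX-EXCESS LEX-FOODADJ fires on “mocha too sweet” but not on “mocha is sweet”.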

Page 27: Deep Distillation from Natural Language Text

Classification: Machine Learning

Classification tasks: Expression, (Contextual) Sentiment, Aspect category
Frameworks: Weka, Mallet

Page 28: Deep Distillation from Natural Language Text

Baseline Classifiers

Mallet and Weka: NaiveBayes, MaxEnt, CRF
Gram-based: uni-, bi- and trigram features
Baseline: ~10% accuracy
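The slides use Mallet/Weka (Java) for these baselines; as a hedged stand-in, here is a tiny pure-Python multinomial Naive Bayes over unigram features. The expression labels and training examples are illustrative, not the actual corpus.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesExpressions:
    """Multinomial Naive Bayes with unigram features and Laplace
    smoothing; a sketch of a gram-based baseline classifier."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (token_list, label)
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            lp = math.log(n_docs / total_docs)  # class prior
            n_words = sum(self.word_counts[label].values())
            for t in tokens:
                # Laplace smoothing so unseen words don't zero a class
                lp += math.log((self.word_counts[label][t] + 1)
                               / (n_words + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```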

Page 29: Deep Distillation from Natural Language Text

Expression Classification: Features

Features:
Polar words
Punctuation
Ngrams
POS patterns
Length !
Beginning
Ontology
…
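The feature families above can be sketched as a simple extractor; the polar-word list below is an illustrative assumption, and the opening token is included because, for instance, “you” often starts a suggestion.

```python
# Illustrative polar-word list, not the actual lexicon.
POLAR_WORDS = frozenset({"best", "worst", "awesome", "bad"})

def expression_features(sentence):
    """Extract a few of the slide's feature families: polar words,
    punctuation, length, and the beginning (opening token)."""
    tokens = sentence.lower().rstrip(".!?").split()
    return {
        "has_polar": any(t in POLAR_WORDS for t in tokens),
        "has_exclaim": sentence.endswith("!"),
        "length": len(tokens),
        "first_token": tokens[0] if tokens else "",
    }
```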

Page 30: Deep Distillation from Natural Language Text

Classifiers

Trees: Decision Tree (J48)
Functions: Logistic Regression, SVM
Sequence Tagging: CRF (Conditional Random Fields)

Page 31: Deep Distillation from Natural Language Text

Expression Classification: Results

Have achieved 75% precision and recall for all expressions considered

Factors: feature engineering, classifier selection, knowledge engineering

Page 32: Deep Distillation from Natural Language Text

Contextual Sentiment

(Just) polar words can be misleading !
Polar words may not be present at all !
Combination of elements

The mocha is too sweet

Wait time is over an hour

Aisles are too narrow

Service is slow
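A minimal sketch of this idea: combine an aspect word with a descriptor that is negative only in that aspect’s context, so sentences with no polar word still get the right sentiment. The aspect/descriptor pairs below are illustrative assumptions built from the slide’s examples.

```python
# Illustrative aspect -> context-negative descriptors, from the
# slide's examples; not the actual Cognie resources.
CONTEXT_NEGATIVE = {
    "service": {"slow"},
    "wait": {"hour", "long"},
    "mocha": {"sweet"},
    "aisles": {"narrow"},
}

def contextual_sentiment(tokens):
    """NEGATIVE when an aspect word co-occurs with a descriptor that
    is negative in that aspect's context ('sweet' is only bad when
    said of a drink, as in 'the mocha is too sweet')."""
    toks = {t.lower() for t in tokens}
    for aspect, descriptors in CONTEXT_NEGATIVE.items():
        if aspect in toks and toks & descriptors:
            return "NEGATIVE"
    return "NEUTRAL"
```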

Page 33: Deep Distillation from Natural Language Text

Semantics: Ontologies

Health: Drugs, Conditions, Procedures, Symptoms, …

Retail (Dining): Food/Entrees, Service, Ambience, ….

Page 34: Deep Distillation from Natural Language Text

Leverage Existing Knowledge Sources

Health informatics: UMLS, NCI Thesaurus, SNOMED
Retail: DMOZ
Many others: Freebase, Wikipedia, DBPedia, OpenData (data.gov)

Page 35: Deep Distillation from Natural Language Text

Knowledge Engineering Tools

“Mini” ontology creation
API access: Freebase, BioPortal
Wrappers: DMOZ, ….

Page 36: Deep Distillation from Natural Language Text

Practical Requirements

Confidence Measures: below threshold, routed to manual transcription teams
Polarity
Snippets
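The confidence-based routing requirement can be sketched in a few lines; the 0.8 threshold here is an illustrative value, not the platform’s actual setting.

```python
def route(prediction, confidence, threshold=0.8):
    """Report a classification automatically only when its confidence
    clears the threshold; otherwise route it to manual review, as the
    practical requirement above describes."""
    if confidence >= threshold:
        return ("AUTO", prediction)
    return ("MANUAL_REVIEW", prediction)
```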

Page 37: Deep Distillation from Natural Language Text

Open-Source Leverage

Page 38: Deep Distillation from Natural Language Text

COGNIE™: Open Source Tools
Framework: UIMA
Classification: Weka, Mallet
NLP: Stanford tools
Indexing: Lucene
Databases: MySQL, MongoDB
Knowledge Engineering: Protégé

Page 39: Deep Distillation from Natural Language Text

Select Case Studies

Page 40: Deep Distillation from Natural Language Text

Case Study: Health Informatics

Page 41: Deep Distillation from Natural Language Text

Distillation

Page 42: Deep Distillation from Natural Language Text

Case Study: Retail & Survey Analytics

Feedback: direct, device collected, social-media
Typically short, a few sentences
Strong requirement for aspect classification: [Food, Service, Ambience, Pricing, Other]
Negative: “Immediate” vs “Long Term” classification

…food was awesome, service needs improvement ….

you need to be open longer !

Page 43: Deep Distillation from Natural Language Text

Case Study: Risk Assessment

Biomedical Literature Abstracts: correlation direction (+ -), Subject, Article type
Features: Clauses, Negation and Triggers, Semantic Heterogeneity

Page 44: Deep Distillation from Natural Language Text

Performance

Page 45: Deep Distillation from Natural Language Text

MapReduce

Throughput can be an issue:
Complex language processing algorithms
Large ontologies in some cases

Hadoop MapReduce [Kahn and Ashish, 2014]
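As an in-process sketch of the map/reduce pattern the slide cites (the cited work runs on Hadoop): map each document’s sentences to (expression_class, 1) pairs, then reduce by summing counts. The classify() function is a hypothetical stand-in for the real distiller.

```python
from collections import defaultdict

def classify(sentence):
    """Hypothetical stand-in for the expression distiller."""
    return "SUGGESTION" if "should" in sentence else "OTHER"

def map_phase(documents):
    """Map: emit one (expression_class, 1) pair per sentence."""
    for doc in documents:
        for sentence in doc.split("."):
            if sentence.strip():
                yield classify(sentence), 1

def reduce_phase(pairs):
    """Reduce: sum counts per expression class."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)
```

On Hadoop, the two phases run as distributed mapper and reducer tasks, which is what makes the heavy language-processing and large-ontology workloads tractable.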

Page 46: Deep Distillation from Natural Language Text

Conclusions

Page 47: Deep Distillation from Natural Language Text

Conclusions

Deeper distillation from text is important
Can be achieved by:
Detecting and combining multiple elements in text
Feature engineering
Knowledge engineering
Classifier selection
Does not have to be perfect
Every domain, dataset has its nuances