SEASR and UIMA National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Mike Haberman [email protected]
Page 1: SEASR and UIMA

SEASR and UIMA

National Center for Supercomputing Applications University of Illinois at Urbana-Champaign

Mike Haberman [email protected]

Page 2: SEASR and UIMA

UIMA

Unstructured Information Management Architecture

Page 3: SEASR and UIMA

UIMA to SEASR

SEASR

Page 4: SEASR and UIMA

UIMA + P.O.S. tagging

Four analysis engines process the document and record part-of-speech (POS) information:

OpenNLP Tokenizer

OpenNLP PosTagger

OpenNLP SentenceDetector

POSWriter

Serialization of the UIMA CAS

Page 5: SEASR and UIMA

UIMA Structured data

•  POSWriter is a CAS Consumer

–  Extracted data from the CAS

–  Ready for import into SEASR

Page 6: SEASR and UIMA

UIMA + P.O.S. tagging: step 1

Page 7: SEASR and UIMA

UIMA + P.O.S. tagging: step 2

Page 8: SEASR and UIMA

UIMA + P.O.S. tagging: step 3

Page 9: SEASR and UIMA

UIMA + P.O.S. tagging: step 4

Page 10: SEASR and UIMA

UIMA Structured data

•  Two SEASR examples using UIMA POS data

–  Frequent patterns (association rules) on nouns (fpgrowth)

–  Sentiment analysis on adjectives

Page 11: SEASR and UIMA

UIMA to SEASR: Experiment I

•  Finding patterns

Page 12: SEASR and UIMA

SEASR + UIMA: Frequent Patterns

Frequent Pattern Analysis on nouns

•  Goal:

–  Discover a cast of characters within the text

–  Discover nouns that frequently occur together

•  character relationships

Page 13: SEASR and UIMA

Frequent Patterns: nouns

•  Use of item sets in fpgrowth

•  What’s new:

–  handling sparse item sets

TransactionId  ItemA  ItemB  ItemC
1              0      1      1
2              1      1      1
3              1      0      1
4              1      0      0
...

Page 14: SEASR and UIMA

Frequent Patterns: nouns

•  What’s new:

–  handling sparse item sets

Transaction

{A,B,C}

{X,Y}

{F,E,A,C,E}

{A,Z,X,U,I,O}
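The difference between the two representations can be sketched in a few lines of Python. This is only an illustration, not the SEASR component itself: the function name `to_item_sets` is hypothetical, and the data is the small 0/1 table from the earlier slide. Each dense row becomes a set holding only the items that are present, so transactions no longer need one column per possible item.

```python
# Dense representation: one 0/1 column per item (values from the slide's table).
ITEMS = ["ItemA", "ItemB", "ItemC"]
DENSE = {
    1: [0, 1, 1],
    2: [1, 1, 1],
    3: [1, 0, 1],
    4: [1, 0, 0],
}


def to_item_sets(dense, items):
    """Convert 0/1 rows into sparse item sets keyed by transaction id.

    Hypothetical helper for illustration; not part of SEASR or UIMA.
    """
    return {
        tid: {name for name, flag in zip(items, row) if flag}
        for tid, row in dense.items()
    }


sparse = to_item_sets(DENSE, ITEMS)
for tid, item_set in sorted(sparse.items()):
    print(tid, sorted(item_set))
```

With nouns as items, the vocabulary is large and each sentence contains only a few of them, which is why the sparse form is the practical one.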

Page 15: SEASR and UIMA

Frequent Patterns: nouns

•  SEASR Flow (similar to fpgrowth demo)

–  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl

•  Reads UIMA’s CAS consumer output

–  URL of the UIMA data source: http://repository.seasr.org/Datasets/POS/

–  tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is

•  Enter number of sentences to group

•  Enter support: 10%

•  Sample item sets:

{word=tom}

{word=answer}

{word=tom}

{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,word=pair,word=pride,word=heart,word=style,word=service,word=pair,word=stove-lids,word=moment,word=furniture}

{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}

{word=aunt,word=polly,word=moment,word=laugh}

{word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=can't,word=dog,word=tricks,word=goodness,word=days,word=body,word=dander,word=minute,word=lick,word=duty,word=boy,word=lord,word=truth,word=goodness,word=spare,word=rod,word=child,word=good,word=book,word=sin,word=suffering,word=old,word=scratch,word=laws-a-me,word=sister,word=boy,word=thing,word=heart,word=conscience,word=heart,word=breaks,word=well-a-well,word=man,word=woman,word=days,word=trouble,word=scripture,word=hookey,word=evening,word=southwestern,word=afternoon,word=saturdays,word=boys,word=holiday,word=work,word=anything,word=duty,word=ruination,word=child}
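A rough Python sketch of what the two inputs (sentences per group, support) control. Everything here is an assumption for illustration: the helper names are hypothetical, the counting is brute force rather than FP-growth, and the support is set to 50% instead of 10% because the sample is only six item sets from the slide.

```python
from collections import Counter
from itertools import combinations


def parse_item_set(line):
    """Parse '{word=a,word=b}' into the set {'a', 'b'}."""
    body = line.strip().strip("{}")
    return {field.split("=", 1)[1] for field in body.split(",") if "=" in field}


def group(item_sets, window):
    """Merge every `window` consecutive item sets into one transaction."""
    return [
        set().union(*item_sets[i:i + window])
        for i in range(0, len(item_sets), window)
    ]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for item sets of up to max_size items.

    A stand-in for FP-growth: same output on small data, far less efficient.
    """
    counts = Counter()
    for transaction in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(transaction), size))
    total = len(transactions)
    return {
        items: n / total for items, n in counts.items() if n >= min_support * total
    }


# First six item sets shown on the slide (one per sentence).
lines = [
    "{word=tom}",
    "{word=answer}",
    "{word=tom}",
    "{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,"
    "word=pair,word=pride,word=heart,word=style,word=service,word=pair,"
    "word=stove-lids,word=moment,word=furniture}",
    "{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}",
    "{word=aunt,word=polly,word=moment,word=laugh}",
]

sentences = [parse_item_set(line) for line in lines]
transactions = group(sentences, window=2)  # "number of sentences to group"
result = frequent_itemsets(transactions, min_support=0.5)
print(result)
```

A larger window merges more sentences into each transaction, so more nouns co-occur; a lower support keeps rarer patterns.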

Page 16: SEASR and UIMA

Frequent Patterns: visualization

Analysis of Tom Sawyer

•  10-paragraph window

•  Support set to 10%

Page 17: SEASR and UIMA

Frequent Patterns: nouns

•  Recap: SEASR flow information

•  The repository location is:

–  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl

•  Reads UIMA’s CAS consumer output

–  Select file/url of the UIMA data source

–  http://repository.seasr.org/Datasets/POS

–  tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is

•  Similar to fpgrowth demo

Page 18: SEASR and UIMA

UIMA + SEASR: Frequent Patterns

•  Extensions

–  Analysis for separate chapters

•  Discover new relationships that occur over small windows

–  Adjectives, Adverbs

•  Common, repeating word usage, phrases

–  Entity Extraction: Dates, Locations, Geo

Page 19: SEASR and UIMA

UIMA to SEASR: Experiment II

•  Sentiment Analysis

Page 20: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Classifying text based on its sentiment

–  Determining the attitude of a speaker or a writer

–  Determining whether a review is positive/negative

Page 21: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Ask: What emotion is being conveyed within a body of text?

–  Look at only adjectives (UIMA POS)

•  Lots of issues, challenges, and “but …” caveats

Page 22: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Need to Answer:

–  What emotions to track?

–  How to measure/classify an adjective to one of the selected emotions?

–  How to visualize the results?

Page 23: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Which emotions:

–  http://en.wikipedia.org/wiki/List_of_emotions

–  http://changingminds.org/explanations/emotions/basic%20emotions.htm

–  http://www.emotionalcompetency.com/recognizing.htm

•  Parrott’s classification (2001)

–  six core emotions

–  Love, Joy, Surprise, Anger, Sadness, Fear

Page 24: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

Page 25: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  How to classify adjectives:

–  Lots of metrics we could use …

•  Lists of adjectives already classified

–  http://www.derose.net/steve/resources/emotionwords/ewords.html

–  Need a “nearness” metric for missing adjectives

–  How about the thesaurus game?

Page 26: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Using only a thesaurus, find a path between two words

–  no antonyms

–  no colloquialisms or slang

Page 27: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  How to get from delightful to rainy?

–  ['delightful', 'fair', 'balmy', 'moist', 'rainy']

•  sexy to joyless?

–  ['sexy', 'provocative', 'blue', 'joyless']

•  bitter to lovable?

–  ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
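The thesaurus game amounts to a shortest-path search over a synonym graph. A minimal Python sketch follows, assuming a toy graph built only from the three paths on this slide and assuming that every synonym link can be walked in both directions (a real thesaurus may not be symmetric). The function names are hypothetical.

```python
from collections import deque

# The three example paths from the slide; they are the only data in this toy graph.
PATHS = [
    ["delightful", "fair", "balmy", "moist", "rainy"],
    ["sexy", "provocative", "blue", "joyless"],
    ["bitter", "acerbic", "tangy", "sweet", "lovable"],
]


def build_graph(paths):
    """Build an undirected synonym graph from example paths (assumed symmetric)."""
    graph = {}
    for path in paths:
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of words, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(PATHS)
print(shortest_path(graph, "delightful", "rainy"))
print(shortest_path(graph, "sexy", "rainy"))  # no link between the two chains
```

Breadth-first search returns a path with the fewest hops, which matches the idea of using path length as a distance.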

Page 28: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Use this game as a metric for measuring the distance from a given adjective to each of the six emotions.

•  Assume the longer the path, the “farther away” the two words are.

•  Address some of the issues

Page 29: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  SynNet: a traversable graph of synonyms (adjectives)

Page 30: SEASR and UIMA

SynNet: rainy to pleasant

Page 31: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  SynNet Metrics

•  Common nodes

•  Path length

•  Symmetry: a->b->c vs. c->b->a

•  Link strength:

•  tangy->sweet

•  sweet->lovable

•  Use of slang or informal usage
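Two of these metrics, common nodes and symmetry, can be sketched on a small directed graph. The link data below is hypothetical (the words come from the slides, but which entries list which synonyms is invented for illustration), and the function names are not SynNet's actual API.

```python
# Hypothetical directed links: word -> synonyms listed under that word's entry.
SYNONYMS = {
    "tangy": {"sweet", "acerbic"},
    "sweet": {"lovable"},
    "lovable": {"sweet"},
    "acerbic": {"tangy", "bitter"},
    "bitter": {"acerbic"},
}


def neighborhood(graph, word, depth):
    """All words reachable from `word` in at most `depth` hops (including itself)."""
    frontier, seen = {word}, {word}
    for _ in range(depth):
        frontier = {n for w in frontier for n in graph.get(w, ())} - seen
        seen |= frontier
    return seen


def common_nodes(graph, a, b, depth):
    """Words that lie within `depth` hops of both a and b."""
    return neighborhood(graph, a, depth) & neighborhood(graph, b, depth)


def is_symmetric(graph, a, b):
    """True when a lists b as a synonym and b lists a."""
    return b in graph.get(a, ()) and a in graph.get(b, ())


print(common_nodes(SYNONYMS, "bitter", "tangy", depth=1))
print(is_symmetric(SYNONYMS, "tangy", "sweet"))
print(is_symmetric(SYNONYMS, "sweet", "lovable"))
```

In this toy data the tangy->sweet link exists in one direction only, which is one way a weak link could be detected; how SynNet actually scores link strength is not stated on the slide.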

Page 32: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Common Nodes

•  Depth of common nodes

Page 33: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Symmetry of path in common nodes

Page 34: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Find the shortest path between adjective and each emotion:

•  ['delightful', 'beatific', 'joyful']

•  ['delightful', 'ineffable', 'unspeakable', 'fearful']

•  Pick the emotion with shortest path length

•  tie breaking procedures
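The labeling step can be sketched as follows. This is a hedged illustration, not the SynNet implementation: the graph contains only the two paths from this slide, the mapping from emotions to target adjectives (`TARGETS`) is an assumption, and because the slide does not say how ties are broken, the sketch simply returns every emotion tied for the shortest path.

```python
from collections import deque

# Undirected toy graph built from the two paths on the slide.
PATHS = [
    ["delightful", "beatific", "joyful"],
    ["delightful", "ineffable", "unspeakable", "fearful"],
]
GRAPH = {}
for path in PATHS:
    for a, b in zip(path, path[1:]):
        GRAPH.setdefault(a, set()).add(b)
        GRAPH.setdefault(b, set()).add(a)

# Assumed target adjective for each emotion (only two emotions in this toy data).
TARGETS = {"joy": "joyful", "fear": "fearful"}


def path_length(graph, start, goal):
    """Number of hops on a shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        word, hops = queue.popleft()
        if word == goal:
            return hops
        for nxt in graph[word]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def classify(graph, adjective, targets):
    """Return the emotions with the shortest path; several if tied, none if no path."""
    distances = {}
    for emotion, word in targets.items():
        hops = path_length(graph, adjective, word)
        if hops is not None:
            distances[emotion] = hops
    if not distances:
        return []
    best = min(distances.values())
    return sorted(e for e, d in distances.items() if d == best)


print(classify(GRAPH, "delightful", TARGETS))
```

Returning the full list of tied emotions leaves the tie-breaking policy to the caller, where rules such as common nodes or link strength could be applied.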

Page 35: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Not a perfect solution

–  still need context to get quality

•  Vain

–  ['vain', 'insignificant', 'contemptible', 'hateful']

–  ['vain', 'misleading', 'puzzling', 'surprising']

•  Animal

–  ['animal', 'sensual', 'pleasing', 'joyful']

–  ['animal', 'bestial', 'vile', 'hateful']

–  ['animal', 'gross', 'shocking', 'fearful']

–  ['animal', 'gross', 'grievous', 'sorrowful']

•  Negation

–  “My mother was not a hateful person.”

Page 36: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  A word about WordNet

•  http://wordnetweb.princeton.edu/

•  English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)

Page 37: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Adjective islands

•  There is no path from delightful to happy

•  happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
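Islands are the connected components of the synonym graph: two words in different components have no path between them. A small Python sketch, assuming a toy graph in which "happy" is linked to the words listed on this slide and "delightful" is linked along the delightful/beatific/joyful chain shown earlier. The helper names are hypothetical.

```python
# Words listed for "happy" on the slide.
HAPPY_SET = [
    "beaming", "beamy", "effulgent", "felicitous", "glad",
    "happy", "radiant", "refulgent", "well-chosen",
]


def link(graph, a, b):
    """Add an undirected synonym link (symmetry is an assumption)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


graph = {}
for word in HAPPY_SET:
    if word != "happy":
        link(graph, "happy", word)
link(graph, "delightful", "beatific")
link(graph, "beatific", "joyful")


def components(graph):
    """Return the islands (connected components) as a list of word sets."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, island = [start], set()
        while stack:
            word = stack.pop()
            if word in island:
                continue
            island.add(word)
            stack.extend(graph[word] - island)
        seen |= island
        result.append(island)
    return result


islands = components(graph)
for island in islands:
    print(sorted(island))
```

An adjective whose island contains none of the target emotion words cannot be labeled by path length alone.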

Page 38: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Process Overview

•  Extract the adjectives (UIMA POS analysis)

•  Read in adjectives (SEASR library)

•  Label each adjective (SynNet)

•  Summarize windows of adjectives

•  lots of experimentation here

•  Visualize the windows
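The "summarize windows" step might look like the sketch below: the sequence of emotion labels (one per adjective, in document order) is cut into fixed-size windows, and each window is reduced to label counts. The label sequence, the window size, and the choice of "most frequent label" as the summary are all assumptions; the slide notes that this step involved a lot of experimentation.

```python
from collections import Counter


def summarize(labels, window):
    """Count emotion labels in consecutive, non-overlapping windows."""
    return [
        Counter(labels[i:i + window])
        for i in range(0, len(labels), window)
    ]


# Hypothetical label sequence, one label per adjective in reading order.
labels = ["joy", "joy", "fear", "anger", "joy", "sadness", "fear", "fear"]

summary = summarize(labels, window=4)
dominant = [counts.most_common(1)[0][0] for counts in summary]

for index, counts in enumerate(summary):
    print(index, dict(counts))
print(dominant)
```

The per-window counts are the kind of sequential data a viewer could plot along the length of the text.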

Page 39: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Visualization

•  New SEASR visualization component

•  Based on flare ActionScript Library

•  http://flare.prefuse.org/

•  Still in development

•  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html

Page 40: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

Page 41: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Extensions

•  Adverbs, nouns, verbs

•  Analysis of metrics, etc.

•  Goal and Relevancy

•  Two new components

•  SynNet

•  Flash-based visualization of sequential data