
SEASR and UIMA

National Center for Supercomputing Applications University of Illinois at Urbana-Champaign

Mike Haberman mikeh@ncsa.uiuc.edu

UIMA

Unstructured Information Management Architecture

UIMA to SEASR

SEASR

UIMA + P.O.S. tagging

Four analysis engines analyze a document and record its part-of-speech (POS) information:

•  OpenNLP Tokenizer

•  OpenNLP PosTagger

•  OpenNLP SentenceDetector

•  POSWriter

Serialization of the UIMA CAS

UIMA Structured data

•  POSWriter is a CAS Consumer

–  Extracted data from the CAS

–  Ready for import into SEASR

UIMA + P.O.S. tagging: step 1

UIMA + P.O.S. tagging: step 2

UIMA + P.O.S. tagging: step 3

UIMA + P.O.S. tagging: step 4

UIMA Structured data

•  Two SEASR examples using UIMA POS data

–  Frequent patterns (association rules) on nouns (fpgrowth)

–  Sentiment analysis on adjectives

UIMA to SEASR: Experiment I

•  Finding patterns

SEASR + UIMA: Frequent Patterns

Frequent Pattern Analysis on nouns

•  Goal:

–  Discover a cast of characters within the text

–  Discover nouns that frequently occur together

•  character relationships

Frequent Patterns: nouns

•  Use of item sets in fpgrowth

•  What’s new:

–  handling sparse item sets

TransactionId  ItemA  ItemB  ItemC
1              0      1      1
2              1      1      1
3              1      0      1
4              1      0      0
…

Frequent Patterns: nouns

•  What’s new:

–  handling sparse item sets

Transaction

{A,B,C}

{X,Y}

{F,E,A,C,E}

{A,Z,X,U,I,O}
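As a rough illustration of the two layouts, the Python sketch below stores the dense 0/1 table from the previous slide and converts it to the sparse form, where each transaction lists only the items it contains. The names `DENSE`, `SPARSE`, and `to_sparse` are hypothetical; this is not SEASR or fpgrowth code.

```python
# Item columns of the dense table (one column per possible item)
ITEMS = ["ItemA", "ItemB", "ItemC"]

# Dense layout: transaction id -> one 0/1 flag per item column
DENSE = {
    1: [0, 1, 1],
    2: [1, 1, 1],
    3: [1, 0, 1],
    4: [1, 0, 0],
}

# Sparse layout: each transaction stores only the items it contains.
# The repeated "E" from the slide collapses, because a set holds each item once.
SPARSE = [
    {"A", "B", "C"},
    {"X", "Y"},
    {"F", "E", "A", "C", "E"},
    {"A", "Z", "X", "U", "I", "O"},
]


def to_sparse(dense, items):
    """Convert dense 0/1 rows into {transaction id: set of present items}."""
    return {
        tid: {item for item, flag in zip(items, row) if flag}
        for tid, row in dense.items()
    }


if __name__ == "__main__":
    for tid, present in sorted(to_sparse(DENSE, ITEMS).items()):
        print(tid, sorted(present))
```

With a vocabulary of thousands of nouns, a dense row would be almost entirely zeros, which is why the sparse form matters for text.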

•  Reads UIMA’s CAS consumer output

–  URL of the UIMA data source: http://repository.seasr.org/Datasets/POS/ (tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is)

Frequent Patterns: nouns

SEASR Flow: http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl (similar to fpgrowth demo)

{word=tom}
{word=answer}
{word=tom}
{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,word=pair,word=pride,word=heart,word=style,word=service,word=pair,word=stove-lids,word=moment,word=furniture}
{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}
{word=aunt,word=polly,word=moment,word=laugh}
{word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=can't,word=dog,word=tricks,word=goodness,word=days,word=body,word=dander,word=minute,word=lick,word=duty,word=boy,word=lord,word=truth,word=goodness,word=spare,word=rod,word=child,word=good,word=book,word=sin,word=suffering,word=old,word=scratch,word=laws-a-me,word=sister,word=boy,word=thing,word=heart,word=conscience,word=heart,word=breaks,word=well-a-well,word=man,word=woman,word=days,word=trouble,word=scripture,word=hookey,word=evening,word=southwestern,word=afternoon,word=saturdays,word=boys,word=holiday,word=work,word=anything,word=duty,word=ruination,word=child}

Enter number of sentences to group

Enter support: 10%
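The two prompts above control how transactions are formed and which itemsets count as frequent. A minimal Python sketch, assuming each sentence has already been reduced to its set of nouns: every `n` consecutive sentences become one transaction, and an itemset is kept if it occurs in at least `min_support` of the transactions (0.10 for the 10% setting). The function names are hypothetical, and the counting is brute force rather than an actual FP-tree.

```python
from collections import Counter
from itertools import combinations


def group_sentences(sentences, n):
    """Merge every n consecutive sentences (each a set of nouns) into one transaction."""
    return [set().union(*sentences[i:i + n]) for i in range(0, len(sentences), n)]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items whose support
    (fraction of transactions containing the itemset) is at least min_support.

    Brute-force counting, used only as a simple stand-in for FP-growth; unlike
    FP-growth it enumerates every combination and is capped at max_size items.
    """
    if not transactions:
        return {}
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


if __name__ == "__main__":
    # Toy nouns per sentence (not real output from the Tom Sawyer data set)
    sentences = [{"tom"}, {"answer"}, {"tom"}, {"aunt", "polly"}, {"tom", "polly"}, {"boy"}]
    transactions = group_sentences(sentences, 2)
    for itemset, support in frequent_itemsets(transactions, 0.5).items():
        print(sorted(itemset), round(support, 2))
```

A larger group size puts more nouns into each transaction, so co-occurrences over longer stretches of text can reach the support threshold.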

Frequent Patterns: visualization

Analysis of Tom Sawyer: 10-paragraph window, support set to 10%

Frequent Patterns: nouns

•  Recap: SEASR flow information

•  The repository location is:

–  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl

•  Reads UIMA’s CAS consumer output

–  Select the file or URL of the UIMA data source

–  http://repository.seasr.org/Datasets/POS (tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is)

•  Similar to fpgrowth demo

UIMA + SEASR: Frequent Patterns

•  Extensions

–  Analysis of separate chapters

•  Discover new relationships that occur over small windows

–  Adjectives, Adverbs

•  Common, repeated word usage and phrases

–  Entity Extraction: Dates, Locations, Geo

UIMA to SEASR: Experiment II

•  Sentiment Analysis

UIMA + SEASR: Sentiment Analysis

•  Classifying text based on its sentiment

–  Determining the attitude of a speaker or a writer

–  Determining whether a review is positive/negative

UIMA + SEASR: Sentiment Analysis

•  Ask: What emotion is being conveyed within a body of text?

–  Look only at adjectives (UIMA POS)

•  Lots of issues, challenges, and “but”s (“but …”)

UIMA + SEASR: Sentiment Analysis

•  Need to Answer:

–  What emotions to track?

–  How to measure or classify an adjective as one of the selected emotions?

–  How to visualize the results?

UIMA + SEASR: Sentiment Analysis

•  Which emotions:

–  http://en.wikipedia.org/wiki/List_of_emotions

–  http://changingminds.org/explanations/emotions/basic%20emotions.htm

–  http://www.emotionalcompetency.com/recognizing.htm

•  Parrott’s classification (2001)

–  six core emotions

–  Love, Joy, Surprise, Anger, Sadness, Fear

UIMA + SEASR: Sentiment Analysis


•  How to classify adjectives:

–  Lots of metrics we could use …

•  Lists of adjectives already classified

–  http://www.derose.net/steve/resources/emotionwords/ewords.html

–  Need a “nearness” metric for missing adjectives

–  How about the thesaurus game?

UIMA + SEASR: Sentiment Analysis

•  Using only a thesaurus, find a path between two words

–  no antonyms

–  no colloquialisms or slang

UIMA + SEASR: Sentiment Analysis

•  How to get from delightful to rainy?

['delightful', 'fair', 'balmy', 'moist', 'rainy']

•  sexy to joyless?

['sexy', 'provocative', 'blue', 'joyless']

•  bitter to lovable?

['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
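The game is a shortest-path search over a synonym graph. A minimal Python sketch using breadth-first search; the `THESAURUS` dictionary is a hypothetical toy containing only the links needed for the three example paths, and the game's rules (no antonyms, no slang) are enforced simply by leaving such links out of the graph.

```python
from collections import deque

# Hypothetical toy thesaurus: word -> synonyms its entry lists.
# Only the links from the three example paths are included.
THESAURUS = {
    "delightful": ["fair"],
    "fair": ["delightful", "balmy"],
    "balmy": ["fair", "moist"],
    "moist": ["balmy", "rainy"],
    "rainy": ["moist"],
    "sexy": ["provocative"],
    "provocative": ["sexy", "blue"],
    "blue": ["provocative", "joyless"],
    "joyless": ["blue"],
    "bitter": ["acerbic"],
    "acerbic": ["bitter", "tangy"],
    "tangy": ["acerbic", "sweet"],
    "sweet": ["tangy", "lovable"],
    "lovable": ["sweet"],
}


def thesaurus_path(graph, start, goal):
    """Breadth-first search: shortest synonym chain from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the parent links back to start, then reverse
            path = []
            while word is not None:
                path.append(word)
                word = parents[word]
            return path[::-1]
        for synonym in graph.get(word, []):
            if synonym not in parents:
                parents[synonym] = word
                queue.append(synonym)
    return None


if __name__ == "__main__":
    print(thesaurus_path(THESAURUS, "delightful", "rainy"))
```

Breadth-first search is used because it returns a path with the fewest hops first, which matches the "shortest path" reading of the game.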

UIMA + SEASR: Sentiment Analysis

•  Use this game as a metric for measuring the distance from a given adjective to each of the six emotions.

•  Assume the longer the path, the “farther away” the two words are.

•  Address some of the issues

UIMA + SEASR: Sentiment Analysis

•  SynNet: a traversable graph of synonyms (adjectives)

SynNet: rainy to pleasant

UIMA + SEASR: Sentiment Analysis

•  SynNet Metrics

•  Common nodes

•  Path length

•  Symmetry: a->b->c vs. c->b->a

•  Link strength:

•  tangy->sweet

•  sweet->lovable

•  Use of slang or informal usage
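Two of these metrics can be sketched in Python, under the assumption that a thesaurus behaves like a directed graph (an entry for a word may list a synonym whose own entry does not list it back). A path is treated as symmetric if every hop can also be walked in reverse, and the fraction of reciprocated hops is used as one possible, hypothetical proxy for link strength. The graph and function names are illustrative only.

```python
# Hypothetical directed synonym graph: word -> synonyms listed in that word's entry
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": [],  # d's entry does not list c back
}


def hops(path):
    """Consecutive (from, to) pairs along a synonym chain."""
    return list(zip(path, path[1:]))


def path_length(path):
    """Number of hops in the chain; longer means 'farther away'."""
    return len(path) - 1


def is_symmetric(graph, path):
    """True if the reversed chain (c->b->a for a->b->c) is also a valid walk."""
    return all(src in graph.get(dst, []) for src, dst in hops(path))


def reciprocity(graph, path):
    """Fraction of hops whose reverse link also exists (a hypothetical strength proxy)."""
    pairs = hops(path)
    if not pairs:
        return 1.0
    return sum(src in graph.get(dst, []) for src, dst in pairs) / len(pairs)


if __name__ == "__main__":
    print(is_symmetric(GRAPH, ["a", "b", "c"]), reciprocity(GRAPH, ["a", "b", "c", "d"]))
```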

UIMA + SEASR: Sentiment Analysis

•  Common Nodes

•  depth of common
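One reading of the common-nodes metric is: expand both words outward through the synonym graph to a fixed depth and look for words reached from both sides. A minimal Python sketch under that assumption; the toy graph reuses the delightful-to-rainy chain and, like the function names, is hypothetical.

```python
from collections import deque

# Hypothetical undirected toy graph built from the delightful -> rainy example
GRAPH = {
    "delightful": ["fair"],
    "fair": ["delightful", "balmy"],
    "balmy": ["fair", "moist"],
    "moist": ["balmy", "rainy"],
    "rainy": ["moist"],
}


def within_depth(graph, start, max_depth):
    """Map each word reachable from start in <= max_depth hops to its hop count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if depth[word] == max_depth:
            continue  # do not expand beyond the depth limit
        for synonym in graph.get(word, []):
            if synonym not in depth:
                depth[synonym] = depth[word] + 1
                queue.append(synonym)
    return depth


def common_nodes(graph, w1, w2, max_depth):
    """Words reached from both w1 and w2 within max_depth hops,
    mapped to (depth from w1, depth from w2)."""
    d1 = within_depth(graph, w1, max_depth)
    d2 = within_depth(graph, w2, max_depth)
    return {w: (d1[w], d2[w]) for w in d1.keys() & d2.keys()}


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, common_nodes(GRAPH, "delightful", "rainy", depth))
```

The depth at which the first common node appears gives a second view of how close two adjectives are, alongside the raw path length.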

UIMA + SEASR: Sentiment Analysis

•  Symmetry of path in common nodes

UIMA + SEASR: Sentiment Analysis

•  Find the shortest path between adjective and each emotion:

•  ['delightful', 'beatific', 'joyful']

•  ['delightful', 'ineffable', 'unspeakable', 'fearful']

•  Pick the emotion with shortest path length

•  tie breaking procedures
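The labeling step can be sketched in Python: run a breadth-first search from the adjective, look up the distance to each emotion's target word, and keep the closest. The mapping from each of the six emotions to a target adjective is an assumption (the example paths end in words such as joyful, fearful, hateful, and sorrowful), the toy graph holds only the two example paths, and ties simply go to the emotion listed first, a placeholder for the real tie-breaking procedures.

```python
from collections import deque

# Assumed target adjective for each of Parrott's six emotions (hypothetical mapping)
EMOTION_TARGETS = {
    "Love": "lovable",
    "Joy": "joyful",
    "Surprise": "surprising",
    "Anger": "hateful",
    "Sadness": "sorrowful",
    "Fear": "fearful",
}

# Hypothetical toy synonym links: only the two example paths above
EDGES = [
    ("delightful", "beatific"), ("beatific", "joyful"),
    ("delightful", "ineffable"), ("ineffable", "unspeakable"),
    ("unspeakable", "fearful"),
]


def build_graph(edges):
    """Undirected adjacency sets from (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, start):
    """BFS hop count from start to every reachable word."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt not in dist:
                dist[nxt] = dist[word] + 1
                queue.append(nxt)
    return dist


def classify(graph, adjective):
    """Return (emotion, path length) for the nearest emotion target, or None if
    no target is reachable. Ties keep the emotion listed first (placeholder rule)."""
    dist = distances_from(graph, adjective)
    best = None
    for emotion, target in EMOTION_TARGETS.items():
        if target in dist and (best is None or dist[target] < best[1]):
            best = (emotion, dist[target])
    return best


if __name__ == "__main__":
    print(classify(build_graph(EDGES), "delightful"))
```

Here delightful reaches joyful in 2 hops and fearful in 3, so the sketch labels it Joy, matching the choice on the slide.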

UIMA + SEASR: Sentiment Analysis

•  Not a perfect solution

–  Still need context to get quality results

•  Vain

–  ['vain', 'insignificant', 'contemptible', 'hateful']

–  ['vain', 'misleading', 'puzzling', 'surprising']

•  Animal

–  ['animal', 'sensual', 'pleasing', 'joyful']

–  ['animal', 'bestial', 'vile', 'hateful']

–  ['animal', 'gross', 'shocking', 'fearful']

–  ['animal', 'gross', 'grievous', 'sorrowful']

•  Negation

–  “My mother was not a hateful person.”

UIMA + SEASR: Sentiment Analysis

•  A word about WordNet

•  http://wordnetweb.princeton.edu/

•  English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)

UIMA + SEASR: Sentiment Analysis

•  Adjective islands

•  There is no path from delightful to happy

•  happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
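An island is a connected component of the synonym graph, so a search between words on different islands always fails. A minimal Python sketch that splits a toy undirected graph into its islands; the edges joining happy to each word in the set above, and the small delightful cluster, are hypothetical stand-ins for the real WordNet-derived data.

```python
def build_graph(edges):
    """Undirected adjacency sets from (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def islands(graph):
    """Split the graph into connected components ('adjective islands')."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        island, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            word = stack.pop()
            if word not in island:
                island.add(word)
                stack.extend(graph[word] - island)
        seen |= island
        result.append(island)
    return result


HAPPY = ["beaming", "beamy", "effulgent", "felicitous", "glad",
         "radiant", "refulgent", "well-chosen"]
# Hypothetical edges: happy linked to each word in its set, plus a separate delightful cluster
EDGES = [("happy", w) for w in HAPPY] + [("delightful", "fair"), ("fair", "balmy")]

if __name__ == "__main__":
    for island in islands(build_graph(EDGES)):
        print(sorted(island))
```

Computing the islands once up front makes it cheap to detect adjectives that cannot reach any emotion target, instead of running a full search that is bound to fail.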

UIMA + SEASR: Sentiment Analysis

•  Process Overview

•  Extract the adjectives (UIMA POS analysis)

•  Read in adjectives (SEASR library)

•  Label each adjective (SynNet)

•  Summarize windows of adjectives

•  lots of experimentation here

•  Visualize the windows
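The summarizing step is where most of the experimentation happens. One simple option, sketched in Python under the assumption that each adjective already carries an emotion label (or None when no emotion was reachable): count labels in fixed-size, optionally overlapping windows and report the most frequent one. The window size, step, and function names are hypothetical choices.

```python
from collections import Counter


def summarize_windows(labels, window, step=None):
    """Count emotion labels in windows of `window` consecutive adjectives.

    labels: one emotion label per adjective, in text order (None = no label).
    step:   how far each window advances; defaults to `window` (no overlap).
    """
    step = step or window
    return [
        Counter(label for label in labels[start:start + window] if label is not None)
        for start in range(0, len(labels), step)
    ]


def dominant(summary):
    """Most frequent emotion in one window, or None if the window has no labels."""
    return summary.most_common(1)[0][0] if summary else None


if __name__ == "__main__":
    # Toy labels, as if produced by the SynNet labeling step
    labels = ["Joy", "Joy", "Fear", None, "Sadness", "Fear"]
    for summary in summarize_windows(labels, 3):
        print(dict(summary), dominant(summary))
```

Each window's counts can then be handed to the visualization as one step in a sequence.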

UIMA + SEASR: Sentiment Analysis

•  Visualization

•  New SEASR visualization component

•  Based on the Flare ActionScript library

•  http://flare.prefuse.org/

•  Still in development

•  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html

UIMA + SEASR: Sentiment Analysis


•  Extensions

•  Adverbs, nouns, verbs

•  Analysis of metrics, etc.

•  Goal and Relevancy

•  Two new components

•  SynNet

•  Flash-based visualization of sequential data
