Introduction to Artificial Intelligence
CoreNLP, Semantic Analysis, Naive Bayes Classifier
Janyl Jumadinova
November 18, 2016


CoreNLP

• Reference: http://stanfordnlp.github.io/CoreNLP/
• Package available in /opt/corenlp/
• Run:
  java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt
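The run command can also be assembled from a script, which helps when the annotator list changes often. The sketch below only builds and prints the argument list; the helper name corenlp_command and its heap parameter are made up for this example, and the path is the one given on the slide.

```python
import shlex

# Path taken from the slide; adjust it for your own installation.
CORENLP_CLASSPATH = "/opt/corenlp/stanford-corenlp-3.7.0/*"


def corenlp_command(annotators, input_file, heap="2g"):
    """Build the argument list for a CoreNLP command-line run (hypothetical helper)."""
    return [
        "java",
        "-cp", CORENLP_CLASSPATH,      # java expands the trailing * itself
        f"-Xmx{heap}",
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", ",".join(annotators),
        "-file", input_file,
    ]


cmd = corenlp_command(["tokenize", "ssplit", "pos", "lemma", "ner"], "input.txt")
print(shlex.join(cmd))
# To actually run it (requires Java and CoreNLP to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the * in the classpath.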


CoreNLP Annotators

http://stanfordnlp.github.io/CoreNLP/annotators.html

• tokenize: Splits the input text into tokens.
• ssplit: Splits a sequence of tokens into sentences.
• pos: Assigns part-of-speech (POS) tags to tokens.
• ner: Performs named entity recognition (NER) on tokens.
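To make the first two annotators concrete, here is a toy illustration of what tokenize and ssplit produce. It is not how CoreNLP does it: the regular expression and the sentence-end rule are simplifying assumptions chosen for this sketch.

```python
import re

# Toy stand-ins for the tokenize and ssplit steps; CoreNLP's real
# annotators use far more elaborate rules than these.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def ssplit(tokens):
    """Group a token sequence into sentences at ., ! or ?."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


tokens = tokenize("Stanford is in California. It was founded in 1885!")
for sentence in ssplit(tokens):
    print(sentence)
```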


CoreNLP Annotators

http://stanfordnlp.github.io/CoreNLP/annotators.html

• lemma: Creates word lemmas for tokens.
  - The goal of lemmatization (as of stemming) is to reduce related forms of a word to a common base form.
  - Lemmatization usually uses a vocabulary and morphological analysis of words to:
    - remove inflectional endings only, and
    - return the base or dictionary form of a word, which is known as the lemma.
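A toy contrast between stemming and lemmatization may help here. The suffix list and the tiny lemma dictionary below are invented for the example; a real lemmatizer such as CoreNLP's relies on a full vocabulary and morphological analysis.

```python
# Hypothetical suffix list and lemma dictionary, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMA_DICT = {"was": "be", "are": "be", "better": "good", "mice": "mouse",
              "running": "run", "studies": "study"}


def stem(word):
    """Crude stemmer: chop the first matching suffix, whatever is left."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    return LEMMA_DICT.get(word, word)


for w in ["running", "studies", "mice", "was"]:
    print(w, stem(w), lemmatize(w))
```

The stemmer returns strings such as "runn" and "studi" that are not words, while the dictionary lookup returns the lemmas "run" and "study".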



Sentiment Analysis

• https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
• http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
• www.sentiment140.com


Sentiment analysis has many other names

• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis


Sentiment analysis is the detection of attitudes

• “enduring, affectively colored beliefs, dispositions towards objects or persons”


Attitudes

• Holder (source) of attitude
• Target (aspect) of attitude
• Type of attitude
  - From a set of types: like, love, hate, value, desire, etc.
  - Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
• Text containing the attitude
  - Sentence or entire document
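The four components above map naturally onto a small record type. The class below is one possible sketch; the field names and the 0.0 to 1.0 strength scale are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attitude:
    holder: str                 # who holds the attitude (source)
    target: str                 # what it is about (aspect)
    polarity: str               # "positive", "negative" or "neutral"
    strength: float             # hypothetical scale: 0.0 (weak) to 1.0 (strong)
    text: str                   # sentence or document containing the attitude
    attitude_type: Optional[str] = None  # e.g. "like", "love", "hate"


review = Attitude(holder="reviewer", target="battery life",
                  polarity="negative", strength=0.8,
                  text="The battery life is terrible.",
                  attitude_type="hate")
print(review.target, review.polarity)
```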


Sentiment analysis

• Simplest task: Is the attitude of this text positive or negative?
• More complex: Rank the attitude of this text from 1 to 5
• Advanced: Detect the target, source, or complex attitude types


Baseline Algorithm

• Tokenization
• Feature Extraction
• Classification using different classifiers
  - Naive Bayes
  - MaxEnt
  - SVM


Sentiment Tokenization Issues

• Deal with HTML and XML markup
• Twitter/Facebook/... mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
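One way to handle the issues listed above is a single regular expression whose alternatives are tried in order. The patterns below are a rough sketch covering only a few emoticons and one phone and date format each; they are assumptions for illustration, and dropping the markup tags is just one possible policy.

```python
import re

# Alternatives are tried left to right, so the more specific ones come first.
TOKEN_RE = re.compile(r"""
    <[^>]+>                         # HTML/XML tags
  | [@#]\w+                         # Twitter user names and hash tags
  | [:;=][\-o*']?[)(\]\[dDpP/\\]    # a few Western-style emoticons
  | \d{3}-\d{3}-\d{4}               # phone numbers like 555-123-4567
  | \d{1,2}/\d{1,2}/\d{2,4}         # dates like 11/18/2016
  | \w+(?:'\w+)?                    # words, with an optional apostrophe part
  | [^\w\s]                         # any other single punctuation mark
""", re.VERBOSE)


def tokenize(text):
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("<") and tok.endswith(">"):
            continue                      # drop markup
        is_word = tok[0].isalpha()
        if is_word and not (tok.isupper() and len(tok) > 1):
            tok = tok.lower()             # keep ALL-CAPS words as they are
        tokens.append(tok)
    return tokens


text = "<b>GREAT</b> Phone :) call 555-123-4567 by 11/18/2016 @Bob #Win"
print(tokenize(text))
```

Words written in all caps keep their case because the emphasis may carry sentiment; other words are lowercased.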


Extracting Features for Sentiment Classification

• How to handle negation: "I didn’t like this movie" vs. "I really like this movie"
• Which words to use?
  - Only adjectives
  - All words
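The two word-selection options can be compared on a hand-tagged sentence. The tags below were written by hand for this example; in practice they would come from a POS tagger such as the CoreNLP pos annotator, which uses Penn Treebank tags where adjectives start with JJ.

```python
# Hand-tagged example sentence (Penn Treebank tags: JJ* = adjective).
tagged = [("The", "DT"), ("plot", "NN"), ("was", "VBD"), ("dull", "JJ"),
          ("but", "CC"), ("the", "DT"), ("acting", "NN"), ("was", "VBD"),
          ("great", "JJ")]


def all_words(tagged_tokens):
    """Use every word as a feature."""
    return [word.lower() for word, _ in tagged_tokens]


def adjectives_only(tagged_tokens):
    """Use only words tagged as adjectives (JJ, JJR, JJS)."""
    return [word.lower() for word, tag in tagged_tokens if tag.startswith("JJ")]


print(all_words(tagged))
print(adjectives_only(tagged))
```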


Negation

Add NOT to every word between negation and following punctuation
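A minimal sketch of this rule, assuming the input is already tokenized. The NOT_ prefix spelling, the short negation word list, and the punctuation set are choices made for this example.

```python
NEGATIONS = {"not", "no", "never"}        # small illustrative list
PUNCTUATION = set(".,!?;:")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation and the next punctuation mark."""
    marked, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False              # punctuation ends the negated span
            marked.append(tok)
        elif negating:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
                negating = True
    return marked


print(mark_negation("i didn't like this movie , but i liked the cast".split()))
```

After marking, "like" and "NOT_like" are distinct features, so a classifier can learn opposite weights for them.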


Naive Bayes Algorithm

• Simple (“naive”) classification method based on Bayes rule
• Relies on very simple representation of document:
  - Bag of words
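A bag of words keeps only which words occur and how often, and discards their order. The sketch below uses Python's Counter; lowercasing the tokens is an assumption of this example.

```python
from collections import Counter


def bag_of_words(tokens):
    """Map each word to how often it occurs; word order is discarded."""
    return Counter(tok.lower() for tok in tokens)


doc_a = "I love this movie I really love it".split()
doc_b = "love this movie I really love it I".split()   # same words, shuffled
print(bag_of_words(doc_a))
print(bag_of_words(doc_a) == bag_of_words(doc_b))
```

The two documents differ in word order but get the same representation.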


For a document d and a class c, Bayes rule gives P(c | d) = P(d | c) P(c) / P(d)
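A minimal sketch of a multinomial Naive Bayes classifier over bag-of-words counts. It assumes add-one (Laplace) smoothing and log probabilities, neither of which is stated in the surviving slide text, and the four training documents are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """labeled_docs: list of (tokens, class_label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log P(c) + sum over words of log P(w | c), with add-one smoothing
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:             # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("fantastic great movie".split(), "pos"),
    ("great acting loved it".split(), "pos"),
    ("boring dull movie".split(), "neg"),
    ("terrible acting hated it".split(), "neg"),
]
model = train(training)
print(classify("great fantastic acting".split(), model))
print(classify("dull terrible movie".split(), model))
```

P(d) is the same for every class, so the classifier only compares P(d | c) P(c). The "naive" assumption is that words are independent given the class, which turns P(d | c) into a product of per-word probabilities.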


Binarized (Boolean feature) Multinomial Naive Bayes

Intuition:
• Word occurrence may matter more than word frequency
• The occurrence of the word fantastic tells us a lot
• The fact that it occurs 5 times may not tell us much more.

Boolean Multinomial Naive Bayes
• Clips all the word counts in each document at 1
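Clipping can be sketched by turning each document into a set of words before counting. The two example documents are invented; the rest of training would proceed as for ordinary multinomial Naive Bayes.

```python
from collections import Counter


def binarized_counts(docs):
    """Count each word at most once per document before summing over documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))       # set() clips per-document counts at 1
    return counts


docs = ["fantastic fantastic fantastic fantastic fantastic plot".split(),
        "fantastic cast".split()]
raw = Counter(tok for tokens in docs for tok in tokens)
print(raw["fantastic"], binarized_counts(docs)["fantastic"])
```

The raw count of fantastic is 6, while the binarized count is 2, one per document that contains the word.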


Neural Networks and Deep Learning: Next!

• http://nlp.stanford.edu/sentiment/
• java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt
