Aug 06, 2015

Transcript
Page 1: Dcnn for text

DCNN for text

B01902004 蔡捷恩

Page 2: Dcnn for text

A CNN for modeling Sentences

Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A convolutional neural network for modelling sentences."arXiv:1404.2188 (2014).

Page 3: Dcnn for text

Sentence model

• Sentence -> feature vector, that’s all!
• However, it is the core of:
  sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval …

Page 4: Dcnn for text

contribution

• Does not rely on a parse tree
• Easily applicable to any language?

Page 5: Dcnn for text

How to model a sentence?

• Composition-based methods
  – Need human knowledge to compose
• Automatically extracted logical forms
  – Ex. RNN, TDNN

Page 6: Dcnn for text

Brief network structure

• Interleaving k-max pooling & 1-dim convolution + TDNN => generates a feature graph over the sentence

A kind of syntax tree?

Page 7: Dcnn for text

NN sentence model with syntax tree (Recursive NN, RecNN)

References a syntax tree while training

Shares weights and stacks up to form the network

Page 8: Dcnn for text

RNN for sentence model
Linear “structure”

Page 9: Dcnn for text

Back to DCNN

• Convolution
• TDNN
• K-max pooling (dynamic k-max pooling)

Page 10: Dcnn for text

Convolution
Narrow type, win=5

Wide type, win=5 (zero-padding)
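The narrow/wide distinction above can be sketched with numpy: narrow convolution produces s − m + 1 outputs, wide (zero-padded) convolution produces s + m − 1. The function name and toy values are illustrative, not from the paper.

```python
import numpy as np

def conv1d(s, w, mode):
    """1-D convolution of sequence s with filter w.
    mode='narrow' -> len(s) - len(w) + 1 outputs
    mode='wide'   -> len(s) + len(w) - 1 outputs (zero-padded ends)."""
    return np.convolve(s, w, mode='valid' if mode == 'narrow' else 'full')

s = np.arange(7, dtype=float)   # a "sentence" of length 7
w = np.ones(5)                  # filter window of width 5

print(conv1d(s, w, 'narrow').shape)  # (3,)
print(conv1d(s, w, 'wide').shape)    # (11,)
```

Wide convolution is what the DCNN uses, so filters can also respond to words at the sentence boundaries.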

Page 11: Dcnn for text

Max-TDNN
GOAL: recognize features independent of time shift (i.e. sequence position)
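A minimal sketch of the Max-TDNN idea: convolve each embedding row along time, then take the max over time so the resulting feature vector is position-independent. Shapes and names here are assumptions for illustration.

```python
import numpy as np

def max_tdnn(S, W):
    """Max-TDNN sketch. S: (d, s) sentence matrix of d-dim embeddings,
    W: (d, m) filter. Narrow 1-D convolution along time per row, then
    max over time -> one value per row, independent of position."""
    d, _ = S.shape
    conv = np.array([np.convolve(S[i], W[i], mode='valid') for i in range(d)])
    return conv.max(axis=1)  # shape (d,)

S = np.random.randn(4, 9)   # 4-dim embeddings, sentence length 9
W = np.random.randn(4, 3)   # filter of width 3
print(max_tdnn(S, W).shape)  # (4,)
```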

Page 12: Dcnn for text

Take a look at DCNN

Need to be optimized during training

If we use Max-TDNN

Page 13: Dcnn for text

K-max pooling

• Given k, no matter how many dimensions the input has, pool the top-k values as output; “the order of the output corresponds to their order in the input”

• Better than max-TDNN because it:
  – Preserves the order of features
  – Discerns more finely how highly activated features react

• Guarantees the length of the input to the FC layer is independent of sentence length

Page 14: Dcnn for text

Only the fully connected layer needs a fixed length

• Intermediate layers can be more flexible
• Dynamic k-max pooling!

Page 15: Dcnn for text

Dynamic k-max Pooling

• k is a function of the input sentence length and the depth of the network:

k_l = max( k_top, ⌈ (L − l) / L · s ⌉ )

where k_l is the k of the currently concerned layer l, k_top is the fixed k of the k-max pooling at the top, L is the total number of conv. layers in the network (the depth), and s is the input sentence length.
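The paper’s rule k_l = max(k_top, ⌈(L − l)/L · s⌉) can be computed directly; the example values (s = 18, L = 3, k_top = 3) are the ones used in the paper’s illustration.

```python
import math

def dynamic_k(layer, total_layers, sent_len, k_top):
    """k for conv layer `layer` (1-indexed) out of `total_layers`:
    k_l = max(k_top, ceil((L - l) / L * s))."""
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sent_len))

# s = 18, L = 3, k_top = 3
print(dynamic_k(1, 3, 18, 3))  # 12
print(dynamic_k(2, 3, 18, 3))  # 6
print(dynamic_k(3, 3, 18, 3))  # 3  (top layer falls back to k_top)
```

So pooling widths shrink gradually with depth, and long sentences keep proportionally more activations in the lower layers.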

Page 16: Dcnn for text

Folding

• Feature detectors in different rows are independent of each other until the top fully connected layer

• Folding simply sums every pair of adjacent rows
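The row-pair sum can be sketched in a couple of lines; folding halves the number of rows while letting pairs of feature dimensions interact without extra parameters.

```python
import numpy as np

def fold(F):
    """Folding: sum every pair of adjacent rows of feature map F,
    halving the number of rows (row count must be even)."""
    d = F.shape[0]
    assert d % 2 == 0, "folding needs an even number of rows"
    return F[0:d:2] + F[1:d:2]

F = np.arange(12.).reshape(4, 3)
print(fold(F))  # rows 0+1 and 2+3: [[3. 5. 7.] [15. 17. 19.]]
```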


Page 18: Dcnn for text

Properties

• Sensitive to the order of words
• Filters of the first layer model n-grams, n ≤ m
• Invariance to absolute position is captured by upper-layer convolutions
• Induces a feature-graph property

Page 19: Dcnn for text

Experiments: Sentiment analysis
Stanford Sentiment Treebank: movie reviews, 5 classes, and binary +/- labels

Page 20: Dcnn for text

Experiments: Question type prediction on TREC

Page 21: Dcnn for text

Experiments: Twitter sentiment dataset, binary labels

Page 22: Dcnn for text

Experiments

• Visualizing feature detectors

Page 23: Dcnn for text

Think about it

• Can this kind of k-max pooling apply to image tasks ?

Page 24: Dcnn for text

A CNN for matching natural language sentences

Hu, Baotian, et al. "Convolutional neural network architectures for matching natural language sentences." Advances in Neural Information Processing Systems. 2014

Page 25: Dcnn for text

Why a convolutional approach?

• No prior knowledge needed

Page 26: Dcnn for text

Contribution

• Hierarchical sentence modeling

• The capturing of rich matching patterns at different levels of abstraction

Page 27: Dcnn for text

Convolutional Sentence Modeling

Word2vec pre-trained embeddings

2-window max pooling
Fixed input length

Page 28: Dcnn for text

A trick on zero-padding

• The variable length of sentences may span a fairly broad range

• Introduce a gate operation

• g(z) = <0> when z = <0>; otherwise <1>
• No bias!
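The gate on zero-padded windows can be sketched as below: when the input window is all zeros (padding), the gate zeroes the unit’s output, and with no bias term the padding contributes exactly nothing upstream. The ReLU nonlinearity and shapes here are illustrative assumptions.

```python
import numpy as np

def gated_conv_unit(z, W):
    """One convolution unit with the zero-padding gate:
    g(z) = 0 if window z is all zeros (padding), else 1.
    No bias, so padded windows yield exactly zero output."""
    g = 0.0 if not np.any(z) else 1.0
    return g * np.maximum(W @ z, 0.0)   # ReLU activation, no bias term

W = np.random.randn(4, 6)
print(gated_conv_unit(np.zeros(6), W))        # [0. 0. 0. 0.]
print(gated_conv_unit(np.ones(6), W).shape)   # (4,)
```

Without the gate (or with a bias), padded positions would leak nonzero activations into the pooled features.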

Page 29: Dcnn for text

Conv + max pool
Composition

Page 30: Dcnn for text

RNN vs ConvNet (W = wins, L = loses)

                              ConvNet   RNN
Hierarchical structure        W         L
Parallelism                   W         L
Capture far-away information  -         -
Explainable                   W         L
Variety                       L         W

Page 31: Dcnn for text

Architecture-I

• Drawback: in the forward phase, the representation of each sentence is built without knowledge of the other

Page 32: Dcnn for text

Architecture-II

• Builds directly on the interaction space between the 2 sentences
• From 1-D to 2-D convolution

Good trick at pooling

Page 33: Dcnn for text

2D max-pooling
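A minimal sketch of non-overlapping 2-D max pooling over the sentence-by-sentence matching map, assuming (for simplicity) dimensions divisible by the pool size:

```python
import numpy as np

def max_pool_2d(M, pool=2):
    """Non-overlapping 2-D max pooling over matching feature map M.
    Assumes both dimensions are divisible by `pool`."""
    h, w = M.shape
    return M.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

M = np.arange(16.).reshape(4, 4)
print(max_pool_2d(M))  # [[ 5.  7.] [13. 15.]]
```

Each pooled cell keeps the strongest local interaction between a span of one sentence and a span of the other.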

Page 34: Dcnn for text

Model Generality

• Arc-II subsumes Arc-I as a special case

Page 35: Dcnn for text

Cost function

• Large margin objective:

e(x, y⁺, y⁻) = max( 0, 1 + s(x, y⁻) − s(x, y⁺) )
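The large margin (ranking) objective can be sketched as follows, with the margin set to 1 and `s(·)` reduced to two precomputed scores for illustration:

```python
def large_margin_loss(score_pos, score_neg, margin=1.0):
    """Ranking loss: max(0, margin + s(x, y-) - s(x, y+)).
    Zero when the positive pair outscores the negative by >= margin."""
    return max(0.0, margin + score_neg - score_pos)

print(large_margin_loss(2.0, 0.5))  # 0.0  (positive wins by >= margin)
print(large_margin_loss(0.5, 0.3))  # 0.8  (margin violated)
```

Training then pushes matched sentence pairs to score at least a margin above mismatched ones.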

Page 36: Dcnn for text

Experiment – Sentence Completion

Page 37: Dcnn for text

Experiment – Matching Response to Tweet

Page 38: Dcnn for text

Experiment – Paraphrase Identification

• Determine whether two sentences have the same meaning

Page 39: Dcnn for text

Discussion

• Sequence is important

Page 40: Dcnn for text

Text Understanding from Scratch

Zhang, Xiang, and Yann LeCun. "Text Understanding from Scratch." arXiv preprint arXiv:1502.01710 (2015).

Page 41: Dcnn for text

Contribution

• Character-level input
• No OOV
• Works for both English and Chinese

Page 42: Dcnn for text

The model

Character encoding space
A character not in the alphabet, or a space => all-zero vector

Fixed-length window

H e l l o w o r l
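The encoding can be sketched as below. The alphabet here (lowercase letters plus digits) is a simplification for illustration; the paper uses a larger alphabet that also includes punctuation.

```python
import numpy as np

def encode(text, alphabet="abcdefghijklmnopqrstuvwxyz0123456789", length=1014):
    """Character-level encoding: each in-alphabet character becomes a
    one-hot column; out-of-alphabet characters (including spaces) and
    padding beyond the text become all-zero columns."""
    index = {c: i for i, c in enumerate(alphabet)}
    M = np.zeros((len(alphabet), length))
    for pos, ch in enumerate(text.lower()[:length]):
        if ch in index:
            M[index[ch], pos] = 1.0
    return M

M = encode("Hello world")
print(M.shape)        # (36, 1014)
print(M[:, 5].sum())  # 0.0 -- the space maps to an all-zero column
```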

Page 43: Dcnn for text

More detail

Page 44: Dcnn for text

What about various input length?

• Set to the longest sentence we expect to see (1014 characters used in their experiments)

Page 45: Dcnn for text

Data augmentation - Thesaurus

• Thesaurus: “a book that lists words in groups of synonyms and related concepts”

• http://www.libreoffice.org/

Page 46: Dcnn for text

Comparison models

• Bag-of-words: 5000 most frequent words

• Bag-of-centroids: 5000-means clustering of word vectors trained on the Google News corpus

Page 47: Dcnn for text

DBpedia Ontology Classification

Page 48: Dcnn for text

DBpedia Ontology Classification

Page 49: Dcnn for text

Amazon review sentiment analysis

• Scores of 1–5 indicate the user’s subjective rating of a product

• Collected by SNAP project

Page 50: Dcnn for text

Amazon review sentiment analysis

Page 51: Dcnn for text

Amazon review sentiment analysis

Page 52: Dcnn for text

Yahoo! Answer Topic Classification

Page 53: Dcnn for text

Yahoo! Answer Topic Classification

Page 54: Dcnn for text

News Categorization in English

Page 55: Dcnn for text

News Categorization in English

Page 56: Dcnn for text

News Categorization in Chinese

• SogouCA and SogouCS
• pypinyin package + jieba Chinese segmentation system

Page 57: Dcnn for text

News Categorization in Chinese

Page 58: Dcnn for text

Conclusion

• We can play a lot of tricks with pooling

Page 59: Dcnn for text

Thank you