Page 1: Deep Learning for Information Retrieval

Roelof Pieters (@graphific)

Guest Lecture: Deep Learning for Information Retrieval, 28 April 2015

www.csc.kth.se/~roelof/ [email protected]

[email protected]

Gve Systems Graph Technologies R&D

DD2476 Search Engines and Information Retrieval Systems
https://www.kth.se/social/course/DD2476/

slides online at http://www.slideshare.net/roelofp/deep-learning-for-information-retrieval

Page 2: Deep Learning for Information Retrieval

About Me

• (-10y) CS dropout (Amsterdam Technical Univ.)
• (2y) MSc Social Anthropology, Stockholm University
• Current: PhD candidate at KTH/CSC with focus on:
  • Deep Learning for Natural Language Processing (Distributed Semantics)
  • Graph-based approaches for Knowledge Representation
  • Multi-modal models
• Current: Data Science Consultant at Graph Technologies R&D & Gve-Systems
  • Recommender Systems
  • Deep Learning
  • Realtime Graph-based Search Engines

Page 3: Deep Learning for Information Retrieval


Information Retrieval (IR)

- Hedvig Kjellström, lecture 1

Page 4: Deep Learning for Information Retrieval

Data landscape is changing

1. Amount of digital data is growing at an increasing rate (IoT, digitalization, wearables, phones/tablets)

2. Data types are shifting as well:

1. from text to audio-visual

2. from professional to personal/social (social media)

3. from semi-structured to unstructured

Page 5: Deep Learning for Information Retrieval

[Jussi Karlgren, NLP Sthlm Meetup 2014]

Page 6: Deep Learning for Information Retrieval


Data landscape is changing

Triple V's of Big Data:
1. Volume
2. Velocity
3. Variety

Page 7: Deep Learning for Information Retrieval

Making sense of Data: Typical ML Regression

Page 8: Deep Learning for Information Retrieval

Making sense of Data: Typical ML Regression vs. Neural Net

Page 9: Deep Learning for Information Retrieval

Degrees of Complexity

perceptron demo
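In the spirit of the demo, here is a minimal perceptron sketch in numpy (toy AND data, purely illustrative):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # toy inputs
    y = np.array([0, 0, 0, 1])                      # AND labels

    w, b = np.zeros(2), 0.0
    for epoch in range(10):
        for xi, target in zip(X, y):
            pred = 1 if xi.dot(w) + b > 0 else 0
            w += (target - pred) * xi               # perceptron update rule
            b += (target - pred)

    print([1 if xi.dot(w) + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]

A single perceptron can only learn linearly separable functions; that limitation motivates the multilayer networks on the following slides.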

Page 10: Deep Learning for Information Retrieval

Neural Net


(figure from Lior Rokach, Ben-Gurion University)

Page 11: Deep Learning for Information Retrieval

Neural Net


(figure from Lior Rokach, Ben-Gurion University)

Page 12: Deep Learning for Information Retrieval

Neural Net


(figure from Lior Rokach, Ben-Gurion University)

Page 13: Deep Learning for Information Retrieval

Neural Net


(figure from Lior Rokach, Ben-Gurion University)

Page 14: Deep Learning for Information Retrieval

Neural Net


(figure from Lior Rokach, Ben-Gurion University)

Page 15: Deep Learning for Information Retrieval

Neural Net


multilayer nn demo

Page 16: Deep Learning for Information Retrieval

Deep Learning ??


Page 17: Deep Learning for Information Retrieval

Deep Learning ??

• Learning multiple layers
• "Backpropagation"
• Can "theoretically" learn any function!

Prior to 2006:
• Very slow and inefficient
• SVMs, random forests, etc. were state of the art (SOTA)
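A minimal backpropagation sketch in numpy (toy XOR data, illustrative only): one hidden layer learns a function that no single-layer perceptron can represent.

    import numpy as np

    rng = np.random.RandomState(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR

    W1, b1 = rng.randn(2, 4), np.zeros(4)                 # input -> hidden
    W2, b2 = rng.randn(4, 1), np.zeros(1)                 # hidden -> output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(5000):
        h = sigmoid(X @ W1 + b1)                          # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)               # backward pass:
        d_h = (d_out @ W2.T) * h * (1 - h)                # propagate error gradients
        W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0)
        W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0)

    print(out.round().ravel())                            # typically -> [0. 1. 1. 0.]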

Page 18: Deep Learning for Information Retrieval


2006+: the 3 Deep Learning Conspirators

Page 19: Deep Learning for Information Retrieval


Page 20: Deep Learning for Information Retrieval


Page 21: Deep Learning for Information Retrieval

Deep Learning: Why?

"I've worked all my life in Machine Learning, and I've never seen one algorithm knock over benchmarks like Deep Learning."

— Andrew Ng

Page 22: Deep Learning for Information Retrieval

Different Levels of Abstraction


Page 23: Deep Learning for Information Retrieval

Different Levels of Abstraction: Feature Representation

Hierarchical Learning
• Natural progression from low-level to high-level structure, as seen in natural complexity

Page 24: Deep Learning for Information Retrieval

Different Levels of Abstraction: Feature Representation

Hierarchical Learning
• Natural progression from low-level to high-level structure, as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces

Page 25: Deep Learning for Information Retrieval

Different Levels of Abstraction: Feature Representation

Hierarchical Learning
• Natural progression from low-level to high-level structure, as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
• A good lower-level representation can be used for many distinct tasks

Page 26: Deep Learning for Information Retrieval

Different Levels of Abstraction: Feature Representation

Hierarchical Learning
• Natural progression from low-level to high-level structure, as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
• A good lower-level representation can be used for many distinct tasks

Page 27: Deep Learning for Information Retrieval

Classic Deep Architecture

Input layer

Hidden layers

Output layer


Page 28: Deep Learning for Information Retrieval

Modern Deep Architecture

Input layer

Hidden layers

Output layer

movie time: http://www.cs.toronto.edu/~hinton/adi/index.htm

Page 29: Deep Learning for Information Retrieval

[Kudos to Richard Socher, for this eloquent summary :) ]

• Manually designed features are often over-specified, incomplete and take a long time to design and validate

• Learned Features are easy to adapt, fast to learn

• Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual and linguistic information.

• Deep learning can learn unsupervised (from raw text/audio/images/whatever content) and supervised (with specific labels like positive/negative)

Why Deep Learning?

Page 30: Deep Learning for Information Retrieval

Word Embeddings


Page 31: Deep Learning for Information Retrieval

What about NLP?

Some of the challenges in Language Understanding:

1. Language is ambiguous: every sentence has many possible interpretations.

2. Language is productive: we will always encounter new words or new constructions.

3. Language is culturally specific.

Page 32: Deep Learning for Information Retrieval

Language Representation

• NLP (at least in rule-based/statistical approaches) mainly treats words as atomic symbols: Love, Candy, Store

• or, in vector space, as a "one-hot" representation:

Candy = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]

• Its problem?

Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
AND Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …]
= 0 !

Every pair of distinct one-hot vectors is orthogonal, so no two words are ever "similar".
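A numpy sketch of the problem (toy three-word vocabulary, purely illustrative): the dot product between any two distinct one-hot vectors is zero, so the representation carries no notion of similarity.

    import numpy as np

    vocab = ["love", "candy", "store"]                       # toy vocabulary
    one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

    print(one_hot["candy"].dot(one_hot["store"]))   # 0.0: related words don't match
    print(one_hot["candy"].dot(one_hot["candy"]))   # 1.0: only identity matches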

Page 33: Deep Learning for Information Retrieval

Language Representation


- Johan Boye, lecture 2

Term-document matrix = Sparse!
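For the same reason the term-document matrix is huge and mostly zeros; a quick scikit-learn sketch (toy documents, illustrative) shows why it is stored sparsely:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the candy store", "the book store",
            "deep learning for information retrieval"]       # toy documents
    X = CountVectorizer().fit_transform(docs)                # scipy sparse matrix

    print(X.shape)                                           # (docs, vocabulary terms)
    print(X.nnz, "nonzero entries of", X.shape[0] * X.shape[1])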

Page 34: Deep Learning for Information Retrieval

Distributional representations

“You shall know a word by the company it keeps” (J. R. Firth 1957)

One of the most successful ideas of modern statistical NLP!

(figure: the surrounding context words are what represent "banking")

• Hard (class-based) clustering models
• Soft clustering models

Page 35: Deep Learning for Information Retrieval

Distributional hypothesis

He filled the wampimuk, passed it around and we all drunk some

We found a little, hairy wampimuk sleeping behind the tree

(McDonald & Ramscar 2001)

Page 36: Deep Learning for Information Retrieval

Distributional semantics

Landauer and Dumais (1997), Turney and Pantel (2010), …

Page 37: Deep Learning for Information Retrieval

Distributional semantics

Distributional meaning as co-occurrence vector:

Page 38: Deep Learning for Information Retrieval

Distributional representations

• Taking it further:

• Continuous word embeddings

• Combine vector space semantics with the prediction of probabilistic models

• Words are represented as a dense vector:

Candy = (figure: a dense, low-dimensional, real-valued vector)
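A sketch of what this buys us (the vector values below are made up for illustration): with dense vectors, similarity becomes a graded quantity rather than an all-or-nothing match.

    import numpy as np

    candy = np.array([0.82, -0.11, 0.45, 0.30])   # hypothetical embedding values
    store = np.array([0.75, -0.05, 0.40, 0.18])
    piano = np.array([-0.40, 0.90, 0.02, -0.33])

    def cosine(a, b):
        return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(candy, store))   # high: related words end up close together
    print(cosine(candy, piano))   # low: unrelated words end up far apart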

Page 39: Deep Learning for Information Retrieval

Word Embeddings: Vector Space Model (Socher)

adapted from Bengio, "Representation Learning and Deep Learning", July 2012, UCLA

In a perfect world:

Page 40: Deep Learning for Information Retrieval

Word Embeddings: Vector Space Model (Socher)

adapted from Bengio, "Representation Learning and Deep Learning", July 2012, UCLA

In a perfect world:

the country of my birth
the place where I was born

Page 41: Deep Learning for Information Retrieval

• Can theoretically (given enough units) approximate “any” function

• and fit to “any” kind of data

• Efficient for NLP: hidden layers can be used as word lookup tables

• Dense distributed word vectors + efficient NN training algorithms:

• Can scale to billions of words!

Why Neural Networks for NLP?

Page 42: Deep Learning for Information Retrieval

Word Embeddings: Vector Space Model (Socher)

Figure (edited) from Bengio, "Representation Learning and Deep Learning", July 2012, UCLA

In a perfect world:

the country of my birth
the place where I was born ?

Page 43: Deep Learning for Information Retrieval

Compositionality

Principle of compositionality:

The "meaning (vector) of a complex expression (sentence) is determined by:

- the meanings of its constituent expressions (words), and

- the rules (grammar) used to combine them"

— Gottlob Frege (1848 - 1925)

Page 44: Deep Learning for Information Retrieval

• How do we handle the compositionality of language in our models?


Compositionality

Page 45: Deep Learning for Information Retrieval

• How do we handle the compositionality of language in our models?

• Recursion: the same operator (same parameters) is applied repeatedly on different components

Compositionality

Page 46: Deep Learning for Information Retrieval

• How do we handle the compositionality of language in our models?

• Option 1: Recurrent Neural Networks (RNN)

RNN 1: Recurrent Neural Networks

(we ignore recurrent NNs for this talk)

Page 47: Deep Learning for Information Retrieval

• How do we handle the compositionality of language in our models?

• Option 2: Recursive Neural Networks (also sometimes called RNN)

RNN 2: Recursive Neural Networks
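The core idea in one numpy sketch (not Socher's exact model; dimensions and vectors are toy values): a single shared weight matrix composes two child vectors into a parent vector, applied recursively over the parse tree.

    import numpy as np

    d = 4                                           # toy embedding dimensionality
    rng = np.random.RandomState(0)
    W, b = rng.randn(d, 2 * d) * 0.1, np.zeros(d)   # shared composition parameters

    def compose(c1, c2):
        # parent = f(W [c1; c2] + b): the same operator at every tree node
        return np.tanh(W @ np.concatenate([c1, c2]) + b)

    the, movie, was, great = (rng.randn(d) for _ in range(4))   # toy word vectors
    sentence = compose(compose(the, movie), compose(was, great))
    print(sentence.shape)   # (4,): phrase vectors live in the same space as words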

Page 48: Deep Learning for Information Retrieval

Recursive Neural Tensor Network


Page 49: Deep Learning for Information Retrieval

Recursive Neural Tensor Network

code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks

Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011). Parsing Natural Scenes and Natural Language with Recursive Neural Networks

Page 50: Deep Learning for Information Retrieval

Recursive NN for Vector Space

(figure: parse tree with phrase nodes NP, PP/IN, NP over leaves DT NN PRP$ NN)

Page 51: Deep Learning for Information Retrieval

Recursive NN: Compositionality

(figure: the same parse tree; word vectors at the leaves IN DT NN PRP NN are the inputs to composition)

Page 52: Deep Learning for Information Retrieval

Recursive NN: Compositionality

(figure: composition continues bottom-up; DT NN and PRP NN are merged into NP nodes)

Page 53: Deep Learning for Information Retrieval

Recursive NN: Compositionality

(figure: the tree is composed up to NP, PP, and the root S; the grammar supplies the "rules", the vectors the "meanings")

Page 54: Deep Learning for Information Retrieval

Vector Space + Word Embeddings: Socher

Page 55: Deep Learning for Information Retrieval

Vector Space + Word Embeddings: Socher

Page 56: Deep Learning for Information Retrieval

Word Embeddings: Turian (2010)

Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning

code & info: http://metaoptimize.com/projects/wordreprs/

Page 57: Deep Learning for Information Retrieval

Word Embeddings: Turian (2010)

Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning

code & info: http://metaoptimize.com/projects/wordreprs/

Page 58: Deep Learning for Information Retrieval

Word Embeddings: Demo

Page 59: Deep Learning for Information Retrieval

Word Embeddings: Collobert & Weston (2011)

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011) . Natural Language Processing (almost) from Scratch


Page 60: Deep Learning for Information Retrieval

Polysemous-embeddings: Stanford (2012)

Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012). Improving Word Representations via Global Context and Multiple Word Prototypes

Page 61: Deep Learning for Information Retrieval

Linguistic Regularities: Mikolov (2013)

Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations

code & info: https://code.google.com/p/word2vec/
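The regularity in code, via gensim's word2vec interface (a sketch; it assumes the pre-trained GoogleNews vectors have been downloaded separately):

    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # vector('king') - vector('man') + vector('woman') lands closest to 'queen'
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=1))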

Page 62: Deep Learning for Information Retrieval

Word Embeddings for MT: Mikolov (2013)

Mikolov, T., Le, Q. V., Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation

Page 63: Deep Learning for Information Retrieval

Word Embeddings for MT: Kiros (2014)

Kiros, R., Zemel, R. S., Salakhutdinov, R. (2014). A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

Page 64: Deep Learning for Information Retrieval

Recursive Deep Models & Sentiment: Socher (2013)

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

code & demo: http://nlp.stanford.edu/sentiment/index.html

Page 65: Deep Learning for Information Retrieval

Paragraph Vectors: Le & Mikolov (2014)

Le, Q., Mikolov, T. (2014). Distributed Representations of Sentences and Documents

• add context (sentence, paragraph, document) to word vectors during training
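A minimal gensim Doc2Vec sketch of the idea (toy corpus; argument names follow recent gensim versions): each document gets its own trainable vector alongside the word vectors.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(["deep", "learning", "for", "retrieval"], ["d0"]),
            TaggedDocument(["sparse", "term", "document", "matrix"], ["d1"])]

    model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
    print(model.dv["d0"])                                # learned paragraph vector
    print(model.infer_vector(["deep", "retrieval"]))     # embed an unseen document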


Results on Stanford Sentiment Treebank dataset:

Page 66: Deep Learning for Information Retrieval

Paragraph Vectors: Dai et al. (2014)

Dai, A., Olah, C., Le, Q., Corrado, G. (2014). Document Embedding with Paragraph Vectors

Page 67: Deep Learning for Information Retrieval

Paragraph Vectors: Dai et al. (2014)

Dai, A., Olah, C., Le, Q., Corrado, G. (2014). Document Embedding with Paragraph Vectors

Page 68: Deep Learning for Information Retrieval

Paragraph Vectors: Dai et al. (2014)

Dai, A., Olah, C., Le, Q., Corrado, G. (2014). Document Embedding with Paragraph Vectors

Nearest neighbours to the machine learning paper “Distributed Representations of Sentences and Documents” in arXiv.

Page 69: Deep Learning for Information Retrieval

Joint Image-Word Embeddings


Page 70: Deep Learning for Information Retrieval

1. Multimodal representation learning

2. Generating descriptions of images

3. Ranking images and captions (“image-sentence ranking”)

Some Current Approaches


Page 71: Deep Learning for Information Retrieval

Bags of Visual Words

Source credit: K. Grauman, B. Leibe

Page 72: Deep Learning for Information Retrieval

Bags of Visual Words (Sivic & Zisserman 2003)

Standard BoW has issues, however.

What we get: an orderless histogram of visual-word counts.

But we want:
• visual word order/relations
• location
• scale/viewpoint invariance
• …
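A bag-of-visual-words sketch with scikit-learn (random arrays stand in for SIFT-like descriptors): k-means defines the visual vocabulary, and an image becomes a histogram of nearest visual words, discarding all spatial layout.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.RandomState(0)
    train_desc = rng.randn(1000, 128)            # stand-in for local descriptors
    codebook = KMeans(n_clusters=50, random_state=0).fit(train_desc)

    image_desc = rng.randn(200, 128)             # descriptors from one image
    words = codebook.predict(image_desc)         # nearest visual word per descriptor
    bow = np.bincount(words, minlength=50)       # the image's BoW histogram
    print(bow.shape, bow.sum())                  # (50,) 200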

Page 73: Deep Learning for Information Retrieval

Zero-shot Learning

• skip-gram text model on wikipedia corpus of 5.7 million documents (5.4 billion words) - approach from (Mikolov et al. ICLR 2013)

Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., Ranzato, M.A. (2013). DeViSE: A Deep Visual-Semantic Embedding Model

DeViSE model
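The zero-shot mechanism in a numpy sketch (all arrays are random stand-ins for trained components): project CNN image features into the word-embedding space, then pick the nearest label embedding, which may be a label never seen during visual training.

    import numpy as np

    rng = np.random.RandomState(0)
    M = rng.randn(300, 4096) * 0.01        # stand-in for learned visual->semantic map
    img = rng.randn(4096)                  # stand-in for CNN features of one image
    labels = {l: rng.randn(300) for l in ["dog", "cat", "zebra", "okapi"]}

    v = M @ img
    v /= np.linalg.norm(v)                 # the image as a point in word-vector space
    scores = {l: v.dot(e / np.linalg.norm(e)) for l, e in labels.items()}
    print(max(scores, key=scores.get))     # nearest label wins, seen or unseen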

Page 74: Deep Learning for Information Retrieval

Encoder: A deep convolutional network (CNN) and long short-term memory recurrent network (LSTM) for learning a joint image-sentence embedding. Decoder: A new neural language model that combines structure and content vectors for generating words one at a time in sequence.

Encoder-Decoder pipeline

Kiros, R., Salakhutdinov, R., Zemel, R. S. (2014). Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

(Kiros et al. 2014)
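A sketch of the scoring side of such a joint embedding (random vectors stand in for CNN image codes and LSTM sentence codes): matching image-sentence pairs are scored by cosine, and training pushes them above mismatched pairs by a margin, a pairwise ranking loss.

    import numpy as np

    def cosine(a, b):
        return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def ranking_loss(img, sent, wrong_sent, margin=0.2):
        # max(0, margin - s(img, sent) + s(img, wrong_sent))
        return max(0.0, margin - cosine(img, sent) + cosine(img, wrong_sent))

    rng = np.random.RandomState(0)
    img, sent, wrong = rng.randn(300), rng.randn(300), rng.randn(300)  # stand-ins
    print(ranking_loss(img, sent, wrong))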

Page 75: Deep Learning for Information Retrieval

• captures multimodal linguistic regularities

Encoder-Decoder pipeline

Page 76: Deep Learning for Information Retrieval

• captures multimodal linguistic regularities

Encoder-Decoder pipeline

(PCA projection of (300-dimensional) word and image representations)

Page 77: Deep Learning for Information Retrieval


Vinyals, O., Toshev, A., Bengio, S., Erhan. D. (2015) Show and Tell: A Neural Image Caption Generator

Joint Visual-Semantic embedding

Karpathy, A., Fei Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions

CNN+LSTM

CNN+RNN

Page 78: Deep Learning for Information Retrieval


Karpathy, A., Fei Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions

Joint Visual-Semantic embedding

Page 79: Deep Learning for Information Retrieval


Joint Visual-Semantic embedding

Karpathy, A., Fei Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions

Page 80: Deep Learning for Information Retrieval


Joint Visual-Semantic embedding

Karpathy, A., Fei Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions

Page 81: Deep Learning for Information Retrieval

Joint Visual-Semantic embedding


Karpathy, A., Fei Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions

demo

Page 82: Deep Learning for Information Retrieval

Any Questions?

Page 83: Deep Learning for Information Retrieval

Wanna Play? Code!

Download example code samples from https://github.com/graphific/DL-Meetup-intro

git clone --recursive https://github.com/graphific/DL-Meetup-intro.git

(more at http://deeplearning.net/ )

Page 84: Deep Learning for Information Retrieval

• Theano - CPU/GPU symbolic expression compiler in Python (from the LISA lab at University of Montreal); see the minimal sketch after this list. http://deeplearning.net/software/theano/

• Pylearn2 - library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/

• Torch - Matlab-like environment for state-of-the-art machine learning algorithms in Lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/

• more info: http://deeplearning.net/software_links/

Wanna Play? General Deep Learning
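A minimal Theano sketch, as promised above: define a symbolic expression, compile it, and let Theano derive the gradient automatically (the same machinery that powers backpropagation).

    import theano
    import theano.tensor as T

    x = T.dscalar("x")
    y = x ** 2 + 3 * x                       # a symbolic expression
    f = theano.function([x], y)              # compile it for CPU/GPU
    df = theano.function([x], T.grad(y, x))  # automatic differentiation

    print(f(2.0))    # 10.0
    print(df(2.0))   # 7.0  (dy/dx = 2x + 3)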

Page 85: Deep Learning for Information Retrieval

• RNNLM (Mikolov) http://rnnlm.org

• NB-SVM https://github.com/mesnilgr/nbsvm

• Word2Vec (skip-gram/CBOW): https://code.google.com/p/word2vec/ (original), http://radimrehurek.com/gensim/models/word2vec.html (Python)

• GloVe: http://nlp.stanford.edu/projects/glove/ (original), https://github.com/maciejkula/glove-python (Python)

• Socher et al. / Stanford RNN Sentiment code: http://nlp.stanford.edu/sentiment/code.html

• Deep Learning without Magic Tutorial: http://nlp.stanford.edu/courses/NAACL2013/

Wanna Play? NLP

Page 86: Deep Learning for Information Retrieval

• cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/CUDA, optimized for GTX 580) https://code.google.com/p/cuda-convnet2/

• Caffe (Berkeley) (C++/CUDA, with Python and MATLAB bindings): http://caffe.berkeleyvision.org/

• OverFeat (NYU): http://cilvr.nyu.edu/doku.php?id=code:start

Wanna Play? Computer Vision

Page 87: Deep Learning for Information Retrieval


Page 88: Deep Learning for Information Retrieval

Impact on Computer Vision


Page 89: Deep Learning for Information Retrieval

Impact on Computer Vision

(from Clarifai)

Page 90: Deep Learning for Information Retrieval

Impact on Audio Processing: Speech Recognition

Page 91: Deep Learning for Information Retrieval

Impact on Audio Processing: TIMIT Speech Recognition

(from: Clarifai)

Page 92: Deep Learning for Information Retrieval

Impact on Natural Language Processing

(table: POS tagging, prior state of the art Toutanova et al. 2003 vs. C&W 2011; NER, prior state of the art Ando & Zhang 2005 vs. C&W 2011)

Page 93: Deep Learning for Information Retrieval

Impact on Natural Language Processing

Named Entity Recognition:
