
Issuu Talk on Topic Models and Recommendation Systems

Mar 31, 2016


Issuu gave a talk at the Data Science and Machine Learning Meetup in Copenhagen, Nov. 2013.
Transcript
Page 1: Issuu Talk on Topic Models and Recommendation Systems

Topic Models Recommendations

Morten Arngren, Senior Data Scientist

Page 2: Issuu Talk on Topic Models and Recommendation Systems

About

Topic Modelling

Recommendations

Page 3: Issuu Talk on Topic Models and Recommendation Systems

“…YouTube for Publications…”

Page 4: Issuu Talk on Topic Models and Recommendation Systems

Started in 2006 by 5 dudes.

2013:
15M publications (free)
7.5B page views / month
340M pages (25 km²)
83M unique visitors / month

Page 5: Issuu Talk on Topic Models and Recommendation Systems

Data Science Team (Copenhagen)

12x 2.6GHz

96GB RAM

2TB SSD

2TB hard drive

Morten Arngren
Ph.D. in Machine Learning and AI (2011)
M.Sc.A.M. (2007)
B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)

Andrius Butkus
Ph.D. in Digital Media Personalisation (2009)
M.Sc.E.E. (2004)
B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)

Amazon Web Services

ML Gadgets

Page 6: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 7: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 8: Issuu Talk on Topic Models and Recommendation Systems

Layout (quantify text and image boxes)
Article Extraction
OCR

Image
Cover Analysis
Explicit Detection
Doc. Type Classification

Text
Detect Language (56)
Translate to English (from 24 languages)
LDA Topics

Page Content
DB
40k Pubs / Day

Page 9: Issuu Talk on Topic Models and Recommendation Systems

Reader Activity

[Figure: reader activity over time for the example reader “Birdie Nam Nam”, grouped into sessions and stored in the DB]

200GB / Day

Page 10: Issuu Talk on Topic Models and Recommendation Systems

Topic Modelling

Page 11: Issuu Talk on Topic Models and Recommendation Systems

LATENT DIRICHLET ALLOCATION

150 topics (preset parameter)

Topic model based on Bag-of-Words Data

http://radimrehurek.com/gensim/

Wikipedia Training Data ~4.5M Single Articles

(Pure Topics)

[Figure: Topic Distribution over topics 1 to 150 from LDA; example topic labels: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals]

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
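The bag-of-words input that the topic model is based on can be illustrated with a minimal sketch. It uses only the Python standard library rather than gensim, and the tokenizer and the names `tokenize` and `bag_of_words` are assumptions made for illustration, not the pipeline used in the talk.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs; word order is discarded."""
    return Counter(tokenize(text))


# Hypothetical example document.
doc = "Hotels on the islands: the best hotels for island food."
bow = bag_of_words(doc)
print(bow)
```

A topic model such as LDA only sees these counts, so two documents with the same words in a different order are treated as identical.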

Page 12: Issuu Talk on Topic Models and Recommendation Systems


LATENT DIRICHLET ALLOCATION

Properties: each topic weight ∈ [0, 1] ∧ Σ (all topic weights) = 1

[Figure: Issuu Publications plotted in LDA Space]
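The two properties of a topic distribution (every weight between 0 and 1, and all weights summing to 1) can be checked with a few lines of code. This is a sketch under assumptions: the function names and the tolerance are hypothetical, and the example vector has 3 entries instead of 150 to keep it readable.

```python
import math


def is_topic_distribution(weights, tol=1e-9):
    """Return True if every weight lies in [0, 1] and the weights sum to 1."""
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    return in_range and math.isclose(sum(weights), 1.0, abs_tol=tol)


def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("need at least one positive score")
    return [s / total for s in scores]


# Hypothetical raw topic scores for one publication.
theta = normalize([5, 4, 1])
print(theta, is_topic_distribution(theta))
```

Because of these constraints every publication lies on a simplex, which is why it makes sense to talk about positions and distances in LDA space.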

Page 13: Issuu Talk on Topic Models and Recommendation Systems

TOPIC CATEGORIES

(Learning from Wikipedia Dataset)

~4.5 Mio. vs. ~9 Mio.: density distribution not the same.

Empty locations in LDA space.

Example categories: Travel, Cocktails, Chemistry, Botanics, Drinks, Dancing

Example mixture: 0.5 Travel, 0.4 Sports, 0.1

Page 14: Issuu Talk on Topic Models and Recommendation Systems

Recommendation System

Page 15: Issuu Talk on Topic Models and Recommendation Systems


READER ACTIVITY

No Explicit Rating…

Extract Implicit Rating…?

[Figure: timeline of publications read by “Birdie Nam Nam”]

Page 16: Issuu Talk on Topic Models and Recommendation Systems

Session {
  UserName: ‘Birdie-Nam-Nam’
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}

Pages: [1,2,3,6,7]
ReadTime: 25789 ms.
TimeStamp: 1378935850

Browsing or Reading?
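One way to separate browsing from reading in a session like the one above is to look at the dwell times per page. The sketch below is an assumption, not the rule used by Issuu: the function name `pages_read` and the 500 ms threshold are hypothetical, and the threshold was picked only because it reproduces the page list shown on the slide. The dwell-time lists are truncated to the values visible in the transcript.

```python
def pages_read(page_times_ms, min_dwell_ms=500):
    """Return the pages that have at least one dwell time >= min_dwell_ms.

    Pages that only ever got short visits are treated as browsed, not read.
    The threshold is a hypothetical value, not taken from the talk.
    """
    return sorted(
        page
        for page, times in page_times_ms.items()
        if any(t >= min_dwell_ms for t in times)
    )


# Dwell times in ms per page, as far as they are visible in the session record.
session_pages = {
    1: [250, 725, 569, 134],
    2: [1056, 1259],
    3: [1056, 1259],
    4: [102, 356, 208, 438],
    5: [102, 356, 208, 438],
    6: [5250, 3567, 809],
    7: [5250, 3567, 809],
}

print(pages_read(session_pages))
```

Pages 4 and 5 drop out because none of their visits lasted long enough, which matches the reduced record with pages 1, 2, 3, 6 and 7.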

[Figure: Readers × Publications matrix]

Page 17: Issuu Talk on Topic Models and Recommendation Systems

Item2Item Matrix

Reader indexed learning

Pages: [1,6,7,10,11]
ReadTime: 11250 ms.
TimeStamp: 1385437850

Decay function: decay per week (time in weeks)

[Figure: reader timeline with read times per publication feeding the Item2Item Matrix]
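Reader-indexed learning of an item-to-item matrix with a time decay could look roughly like the sketch below. Everything specific here is an assumption: the exponential form of the decay, the rate of 0.1 per week, the use of the older read in a pair to compute the age, and all function and document names. The slide only states that a decay function in weeks is applied.

```python
import math
from collections import defaultdict
from itertools import combinations

SECONDS_PER_WEEK = 7 * 24 * 3600


def decay(age_weeks, rate=0.1):
    """Hypothetical exponential decay: weight 1.0 at age 0, shrinking with age."""
    return math.exp(-rate * age_weeks)


def build_item2item(reads_by_reader, now, rate=0.1):
    """Accumulate decayed co-read weights for every pair of documents per reader.

    reads_by_reader maps a reader id to a list of (doc_id, unix_timestamp).
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for reads in reads_by_reader.values():
        for (doc_a, ts_a), (doc_b, ts_b) in combinations(reads, 2):
            if doc_a == doc_b:
                continue
            # Age of the pair is taken from the older read (an assumption).
            age_weeks = (now - min(ts_a, ts_b)) / SECONDS_PER_WEEK
            weight = decay(age_weeks, rate)
            matrix[doc_a][doc_b] += weight
            matrix[doc_b][doc_a] += weight
    return matrix


# Timestamps reuse the two values from the slides; ids are hypothetical.
now = 1385437850
reads = {
    "reader-1": [("doc-a", 1378935850), ("doc-b", 1385437850)],
    "reader-2": [("doc-a", 1385437850), ("doc-c", 1385437850)],
}
item2item = build_item2item(reads, now)
print(item2item["doc-a"]["doc-b"], item2item["doc-a"]["doc-c"])
```

Pairs that were read together recently contribute close to 1.0, while older co-reads fade out instead of dominating the matrix forever.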

Page 18: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Item2Item Matrix

Item Matrix Weight Mapping Function

Inputs per reader:
Read History (over time)
Likes
Stacks
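Turning a reader's history into candidate weights via the item-to-item matrix can be sketched as a weighted lookup. This is an illustration under assumptions: the function name `item_weights`, the nested-dict matrix layout, the linear combination, and the example numbers are all hypothetical; the actual weight mapping function from the talk is not recoverable from the slide.

```python
from collections import defaultdict


def item_weights(item2item, history):
    """Score candidate documents from a reader's history.

    history is a list of (doc_id, history_weight) pairs, e.g. from reads,
    likes or stacks. Each history item spreads its weight to its neighbours
    in the item-to-item matrix; documents already in the history are skipped.
    """
    seen = {doc for doc, _ in history}
    weights = defaultdict(float)
    for doc, hist_weight in history:
        for neighbour, similarity in item2item.get(doc, {}).items():
            if neighbour not in seen:
                weights[neighbour] += hist_weight * similarity
    return dict(weights)


# Hypothetical matrix rows and history weights.
matrix = {
    "A": {"C": 0.5, "D": 0.25},
    "B": {"C": 0.5, "A": 1.0},
}
history = [("A", 1.0), ("B", 0.5)]
candidates = item_weights(matrix, history)
print(candidates)
```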

Page 19: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Item Matrix Weight Mapping Function

Item Weights

Weighted Sampling
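Weighted sampling from item weights can be sketched with the standard library. The version below draws distinct items one at a time, each with probability proportional to its remaining weight; sampling without replacement, the function name `weighted_sample`, and the example weights are assumptions, since the slide only names the technique.

```python
import random


def weighted_sample(weights, k, rng=random):
    """Draw up to k distinct items, each pick proportional to its weight.

    Items with zero or negative weight are never picked.
    """
    pool = {item: w for item, w in weights.items() if w > 0}
    picked = []
    while pool and len(picked) < k:
        items = list(pool)
        choice = rng.choices(items, weights=[pool[i] for i in items], k=1)[0]
        picked.append(choice)
        del pool[choice]  # remove so the same item is not recommended twice
    return picked


# Hypothetical item weights for one reader.
weights = {"C": 0.75, "D": 0.25, "E": 0.0}
recommendations = weighted_sample(weights, 2, random.Random(0))
print(recommendations)
```

Compared with always taking the top-ranked items, sampling keeps high-weight items likely while still letting lower-weight items surface from time to time.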

Page 20: Issuu Talk on Topic Models and Recommendation Systems

Max. Rank

Page 21: Issuu Talk on Topic Models and Recommendation Systems

Tuned Parameters

Page 22: Issuu Talk on Topic Models and Recommendation Systems

Master Student Project

Deep Belief Network Model
Bag-of-Words model, Training Data
2000, 500, 20, 2
Lars Maaløe

Collaborative Filtering Using Social Media Knowledge
Kasper Johansen

Page 23: Issuu Talk on Topic Models and Recommendation Systems

Master Student Project

Morten Arngren

Senior Data Scientist