Topic Models Recommendations
Morten Arngren, Senior Data Scientist
Mar 31, 2016
About
Topic Modelling
Recommendations
“…YouTube for Publications…”
Started in 2006 by 5 dudes.
15M publications (free)
7.5B page views / month
340M pages (25 km²)
2013
83M unique visitors / month
Data Science Team (Copenhagen)
12 x 2.6 GHz
96 GB RAM
2 TB SSD
2 TB hard drive
Morten Arngren
Ph.D. in Machine Learning and AI (2011)
M.Sc.A.M. (2007)
B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)
Andrius Butkus
Ph.D. in Digital Media Personalisation (2009)
M.Sc.E.E. (2004)
B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)
Amazon Web Services
ML Gadgets
[Diagram: Page Content, 40k Pubs / Day over time, with extracted Data stored in the DB]
Layout: Quantify text and image boxes
Article Extraction
OCR
Image: Cover Analysis, Explicit Detection, Doc. Type Classification
Text: Detect Language (56), Translate to English (from 24 languages), LDA Topics
Reader Activity
[Diagram: reader sessions (N Session) over time, e.g. for “Birdie Nam Nam”, stored in the DB]
200GB / Day
Topic Modelling
LATENT DIRICHLET ALLOCATION
150 topics (preset parameter)
Topic model based on Bag-of-Words Data
http://radimrehurek.com/gensim/
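As a rough illustration of the Bag-of-Words data an LDA model is trained on, the sketch below turns text into sparse (token id, count) pairs using only the Python standard library. The helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the sample sentences are made up for this example; the pair format is similar in spirit to what gensim's `Dictionary.doc2bow` produces, but this is not the pipeline used in the talk.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Return sorted (token_id, count) pairs; word order is discarded."""
    counts = Counter(tokenize(text))
    return sorted((vocab[t], n) for t, n in counts.items() if t in vocab)


# Illustrative documents, not taken from the Wikipedia training set.
docs = ["Hotels on the islands", "Plants and animals on the islands"]
vocab = build_vocabulary(docs)
bow = bag_of_words("islands islands hotels", vocab)
```

Tokens that are not in the vocabulary are silently dropped here, which is one common design choice; a real pipeline would typically also remove stop words and very rare terms.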
Wikipedia Training Data: ~4.5M Single Articles (Pure Topics)
Example topics: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals
[Figure: LDA Topic Distribution over topics 1 to 150]
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
LATENT DIRICHLET ALLOCATION
Properties: every topic weight is in [0:1] and the weights sum to 1 (Σ = 1)
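The two properties can be checked mechanically. The following is a minimal sketch, assuming a topic distribution is simply a list of floats (150 entries in the setup described here, 3 in the toy example); `normalise` and `is_topic_distribution` are hypothetical helper names, not part of any library.

```python
import math


def normalise(raw_weights):
    """Scale raw weights so that they sum to 1."""
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in raw_weights]


def is_topic_distribution(dist, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(0.0 <= w <= 1.0 for w in dist)
    return in_range and math.isclose(sum(dist), 1.0, abs_tol=tol)


# Toy example with three topics instead of 150.
dist = normalise([5, 4, 1])
```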
[Figure: LDA Space (axis label: PC 4), Issuu Publications]
TOPIC CATEGORIES
~4.5 Mio.
Density distribution not the same
~9 Mio.
Empty locations in LDA space.
Topic categories (Learning from Wikipedia Dataset): Travel, Cocktails, Chemistry, Botanics, Drinks, Dancing
Example: 0.5 Travel, 0.4 Sports, 0.1
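One way to read the "0.5 Travel, 0.4 Sports" example is that labelled topics let a publication be described by its strongest categories. The sketch below assumes a hypothetical mapping from topic index to category name (`TOPIC_LABELS`) and a toy three-topic distribution; how the labels were actually derived from Wikipedia is not specified on the slide.

```python
from collections import defaultdict

# Hypothetical topic-index -> category labels, for illustration only.
TOPIC_LABELS = {0: "Travel", 1: "Sports", 2: "Drinks"}


def category_mix(topic_dist, labels, top_n=2):
    """Sum topic weights per category and return the top_n categories."""
    totals = defaultdict(float)
    for index, weight in enumerate(topic_dist):
        totals[labels.get(index, "Unlabelled")] += weight
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


mix = category_mix([0.5, 0.4, 0.1], TOPIC_LABELS)
```

Several topics may share one label, which is why the weights are summed per category before ranking.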
Recommendation System
READER ACTIVITY
No Explicit Rating…
Extract Implicit Rating…?
Time
“Birdie Nam Nam”
Session {
  UserName: “Birdie-Nam-Nam”
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}
Pages: [1,2,3,6,7]
ReadTime: 25789 ms.
TimeStamp: 1378935850
Browsing or Reading?
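The step from a raw session to a "Pages / ReadTime" summary could be sketched as follows. This is only one possible heuristic: it assumes the per-page numbers are dwell times in milliseconds and treats a page as "read" rather than "browsed" when its total dwell time reaches a threshold. The function name `summarise_session`, the 1500 ms threshold, and the shortened sample data are all assumptions made for illustration.

```python
def summarise_session(pages, min_read_ms=1500):
    """Collapse per-page dwell times into the pages counted as read and their total time.

    pages maps page number -> list of dwell times in ms (assumed meaning).
    min_read_ms is a hypothetical browsing-vs-reading threshold.
    """
    read_pages = []
    read_time = 0
    for page in sorted(pages):
        dwell = sum(pages[page])
        if dwell >= min_read_ms:
            read_pages.append(page)
            read_time += dwell
    return {"Pages": read_pages, "ReadTime": read_time}


# Illustrative values loosely based on the session record; elided entries are left out.
session_pages = {
    1: [250, 725, 569, 134],
    2: [1056, 1259],
    4: [102, 356, 208, 438],
    6: [5250, 3567, 809],
}
summary = summarise_session(session_pages)
```

With this threshold, page 4 (about 1.1 s in total) is treated as browsing and dropped, while the other pages count as reading.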
[Figure: Readers x Publications matrix]
Item2Item Matrix
Reader indexed learning
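A minimal sketch of building an item-to-item matrix by looping over readers: every pair of publications read by the same reader gets its co-occurrence count increased. The plain co-occurrence count, the function name `build_item2item`, and the sample readers are assumptions; the slide does not state which similarity measure is actually used.

```python
from collections import defaultdict
from itertools import combinations


def build_item2item(reads_by_reader):
    """Count, for every pair of publications, how many readers read both."""
    matrix = defaultdict(lambda: defaultdict(float))
    for pubs in reads_by_reader.values():
        # Each reader contributes once per unordered pair of distinct publications.
        for a, b in combinations(sorted(set(pubs)), 2):
            matrix[a][b] += 1.0
            matrix[b][a] += 1.0
    return matrix


# Hypothetical read histories.
reads = {
    "reader_1": ["pub_a", "pub_b", "pub_c"],
    "reader_2": ["pub_a", "pub_b"],
    "reader_3": ["pub_c"],
}
item2item = build_item2item(reads)
```

Iterating per reader keeps the work proportional to the squared size of each read history rather than to the full publication catalogue.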
To
Pages: [1,6,7,10,11]
ReadTime: 11250 ms.
TimeStamp: 1385437850
[Figure: Decay function over Time in weeks; decay per week = 850]
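The exact shape of the decay function is not recoverable from the slide. One possible reading of "decay per week = 850" is a linear decay that subtracts a fixed amount of read time for every week of age, sketched below. The function name, the linear form, and the unit (milliseconds) are assumptions.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def decayed_read_time(read_time_ms, event_ts, now_ts, decay_per_week=850):
    """Reduce a read time linearly with its age in weeks, never going below zero."""
    age_weeks = max(0.0, (now_ts - event_ts) / SECONDS_PER_WEEK)
    return max(0.0, read_time_ms - decay_per_week * age_weeks)


# The earlier session (25789 ms) as seen from the time of the later one,
# using the two timestamps that appear in the session summaries.
older = decayed_read_time(25789, event_ts=1378935850, now_ts=1385437850)
```

The two timestamps are roughly 10.75 weeks apart, so under this reading the older session loses a bit more than 9000 ms of weight.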
RECOMMENDING
[Figure: Item2Item Matrix; Item Matrix Weight Mapping Function over Time]
Read History
Likes
Stacks
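How a reader's history might be mapped through the item-to-item matrix into candidate weights can be sketched as below. It assumes the history is a dictionary of publication -> strength (for instance a decayed read time, or a fixed value for a like or a stack), and that the matrix is a nested dictionary; `item_weights` and the sample values are hypothetical.

```python
def item_weights(history, item2item):
    """Score unseen publications by summing strength * similarity over the history."""
    weights = {}
    for seen_item, strength in history.items():
        for candidate, similarity in item2item.get(seen_item, {}).items():
            if candidate in history:
                continue  # do not recommend what the reader already knows
            weights[candidate] = weights.get(candidate, 0.0) + strength * similarity
    return weights


# Hypothetical matrix and history.
matrix = {
    "a": {"b": 2.0, "c": 1.0},
    "b": {"a": 2.0, "d": 3.0},
}
history = {"a": 1.0, "b": 0.5}
weights = item_weights(history, matrix)
```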
RECOMMENDING
[Figure: Item Matrix Weight Mapping Function producing Item Weights]
Weighted Sampling
Max. Rank
Tuned Parameters
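Weighted sampling with a maximum rank could look like the sketch below: only the top `max_rank` candidates are kept, and recommendations are then drawn without replacement with probability proportional to their weight. The function name and the parameters `k` and `max_rank` are assumptions standing in for the tuned parameters mentioned on the slide.

```python
import random


def sample_recommendations(item_weights, k, max_rank, rng=None):
    """Draw up to k distinct items from the max_rank highest-weighted candidates."""
    rng = rng or random.Random()
    ranked = sorted(item_weights.items(), key=lambda kv: kv[1], reverse=True)
    pool = [(item, w) for item, w in ranked[:max_rank] if w > 0]
    picks = []
    while pool and len(picks) < k:
        items, weights = zip(*pool)
        choice = rng.choices(items, weights=weights, k=1)[0]
        picks.append(choice)
        # Remove the chosen item so it cannot be drawn twice.
        pool = [(item, w) for item, w in pool if item != choice]
    return picks


# Hypothetical weights; "d" falls outside max_rank=3 and is never sampled.
candidates = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 0.5}
picks = sample_recommendations(candidates, k=2, max_rank=3, rng=random.Random(0))
```

Sampling instead of always taking the top items gives returning readers some variety while still favouring the strongest candidates.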
Deep Belief Network Model
Master Student Project
Lars Maaløe
[Figure: Deep Belief Network on Bag-of-Words model Training Data, labelled 2000, 500, 20, 2]

Collaborative Filtering Using Social Media Knowledge
Master Student Project
Kasper Johansen