Old and New Building Blocks Come Together For Big Data
gotocon.com/dl/goto-amsterdam-2013/slides/Ted... · ©MapR Technologies · Confidential

Jul 16, 2020

dariahiddleston

§ Contact: [email protected] @ted_dunning
§ Slides and such: http://slideshare.net/tdunning
§ Hash tags: #mapr #goto #d3 #node

Embarrassment of Riches

§ d3.js allows really pretty pictures
§ node.js allows simple (not just web) servers
§ Storm does real-time
§ Hadoop does big data
§ d3 allows very cool visualizations

D3 demo!

node demo!

Hadoop demo!

But …

§ Web camp: everything is a service with a URL or a DOM
§ Big data camp: non-traditional file systems
§ Everybody else: files and databases
§ They don’t like to talk to each other

Why Not Tiered Architectures?

§ Tiered architectures
  – translations between services and cultures
  – standard corporate answer
§ Feels like molasses

The Vision

§ Integrate
  – multiple computing paradigms
  – many computing communities
§ How?
  – common storage, queuing and data platforms

For Example, …

§ Incoming documents with text
  – store in file-based queues
  – index in real-time using Storm and Solr
  – add initial engagement class, “don’t-know”
§ Search for documents using original text
  – add random noise, small for well understood docs, large for “don’t-know” docs
§ Record engagement

Add Analysis

§ Process engagement logs
  – item-item cooccurrence
  – user-item histories
§ Update search index
  – indicator items
  – decrease uncertainty on well understood docs
§ Update user profile
  – item history

Search Again

§ Now searches use recent views + text
  – recent views query indicator fields
  – text queries normal text data
  – add noise as appropriate

And Draw a Picture

§ Searches and clicks can be logged
  – real-time metrics
  – real-time trending topics
§ What’s hot, what’s not
§ Popular searches
§ Document clusters
§ Word clouds

In Pictures

[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index]

In Pictures

[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine]

In Pictures

[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis]

In Pictures

[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis, Usage analysis, Rendering, Admin queries]

Which Technology?

[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis, Usage analysis, Admin queries, Rendering]
[Legend: Storm/node, Solr, MapR, D3/node, Other]

Yeah, But …

§ This isn’t as easy as it looks
§ Take the real-time / long-time part

Hadoop is Not Very Real-time

[Timeline figure: time axis t ending at "now". Labels: Fully processed; Latest full period; Unprocessed data; "Hadoop job takes this long for this data"]

Real-time and Long-time together

[Timeline figure: time axis t ending at "now". Labels: "Hadoop works great back here"; "Storm works here"; Blended view]
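
One way to picture the blended view is as batch totals plus whatever has arrived since the batch job's cutoff. The sketch below is only an illustration under that assumption: the function name `blended_counts`, the `(timestamp, key)` event shape and the `batch_horizon` parameter are hypothetical and do not come from the slides.

```python
from collections import Counter


def blended_counts(batch_counts, realtime_events, batch_horizon):
    """Combine batch totals with counts of events newer than the batch horizon."""
    blended = Counter(batch_counts)
    for timestamp, key in realtime_events:
        # Events at or before the horizon are already inside the batch totals.
        if timestamp > batch_horizon:
            blended[key] += 1
    return blended


# Hypothetical data: totals from the last completed batch job (the Hadoop side)
# and recent events seen by the real-time path (the Storm side).
batch = {"doc-a": 120, "doc-b": 45}
recent = [(99, "doc-a"), (101, "doc-b"), (105, "doc-c"), (107, "doc-b")]

view = blended_counts(batch, recent, batch_horizon=100)
print(dict(view))
```

The event at timestamp 99 is skipped because the batch job is assumed to have counted it already; only newer events are added on top.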

[Diagram labels: Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexing, Solr Indexer (two instances), Index shards]

[Diagram labels: User history, Web tier, Item meta-data, Solr search, Solr Indexer (two instances), Index shards]

[Diagram labels: Users, Web-server, http, Catcher, Topic Queue, Storm, Web Data, MapR]

Closer Look – Catcher Protocol

[Diagram labels: Data Sources, Catcher Cluster]

The data sources and catchers communicate with a very simple protocol.

  Hello() => list of catchers
  Log(topic, message) => (OK|FAIL, redirect-to-catcher)

Closer Look – Catcher Queues

[Diagram labels: Catcher Cluster, Topic File]

The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop.

Each topic file is appended by exactly one catcher.

Topic files are kept in shared file storage.
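
The Hello/Log exchange and the forwarding rule can be sketched in a few lines. This is a toy model under stated assumptions, not the actual catcher implementation: the `Catcher` class, the CRC-based choice of owner, the in-memory `peers` map and the dict standing in for shared file storage are all hypothetical, and the FAIL path is left out.

```python
import zlib


class Catcher:
    """Toy catcher: each topic is appended by exactly one owning catcher."""

    def __init__(self, name, cluster, storage):
        self.name = name
        self.cluster = cluster    # ordered list of all catcher names
        self.storage = storage    # dict standing in for shared file storage

    def hello(self):
        # Hello() => list of catchers
        return list(self.cluster)

    def owner_of(self, topic):
        # Assumption: ownership is a stable hash of the topic name.
        index = zlib.crc32(topic.encode("utf-8")) % len(self.cluster)
        return self.cluster[index]

    def log(self, topic, message, peers):
        # Log(topic, message) => (status, catcher to use next time)
        owner = self.owner_of(topic)
        if owner != self.name:
            # Forward to the owner and tell the client where to go directly.
            status, _ = peers[owner].log(topic, message, peers)
            return status, owner
        self.storage.setdefault(topic, []).append(message)
        return "OK", self.name


names = ["catcher-0", "catcher-1"]
shared = {}
peers = {n: Catcher(n, names, shared) for n in names}

status, owner = peers["catcher-0"].log("clicks", "user=1 doc=3", peers)
# The client follows the redirect, so the second call avoids the extra hop.
status2, owner2 = peers[owner].log("clicks", "user=2 doc=4", peers)
print(status, owner, shared["clicks"])
```

Because only the owner ever appends to a topic, the records for that topic stay in arrival order without any locking between catchers.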

Closer Look – ProtoSpout

[Diagram labels: Topic File, ProtoSpout]

The ProtoSpout tails the topic files, parses log records into tuples and injects them into the Storm topology.

Last fully acked position stored in shared file system.
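
The tail-and-checkpoint idea can be sketched without Storm at all. The code below is a minimal illustration, assuming newline-terminated, tab-separated records and a small offset file kept next to the topic file; the function names `read_new_records` and `commit` are hypothetical and are not part of the ProtoSpout.

```python
import os
import tempfile


def read_new_records(topic_path, offset_path):
    """Return tuples parsed from records appended since the stored offset."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    tuples = []
    with open(topic_path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break  # partial record still being written; pick it up later
            fields = line.decode("utf-8").rstrip("\n").split("\t")
            tuples.append(tuple(fields))
            offset += len(line)
    return tuples, offset


def commit(offset_path, offset):
    """Persist the position once every tuple before it has been acked."""
    with open(offset_path, "w") as f:
        f.write(str(offset))


workdir = tempfile.mkdtemp()
topic = os.path.join(workdir, "clicks.log")
position = os.path.join(workdir, "clicks.offset")

with open(topic, "ab") as f:
    f.write(b"u1\tdoc3\nu2\tdoc4\n")
first, pos = read_new_records(topic, position)
commit(position, pos)

with open(topic, "ab") as f:
    f.write(b"u3\tdoc2\n")
second, pos = read_new_records(topic, position)
print(first, second)
```

If the reader crashes before `commit`, the next run starts again from the last stored offset, so records are replayed rather than lost.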

Yeah, But …

§ What was that about adding noise in scoring?
§ Why would I do that??
§ Is there a simple answer?

Thompson Sampling

§ Select each shell according to the probability that it is the best
§ Probability that it is the best can be computed using posterior

$$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D)\, d\theta$$

§ But I promised a simple answer

Thompson Sampling – Take 2

§ Sample θ

$$\theta \sim P(\theta \mid D)$$

§ Pick i to maximize reward

$$i = \arg\max_j E[r_j \mid \theta]$$

§ Record result from using i
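
The three steps can be sketched for the simplest case. The code below assumes Bernoulli rewards with a uniform Beta prior per arm, which is an illustrative choice rather than anything the slides specify; the names `thompson_step` and `run_bandit` and the true rates are hypothetical.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample theta from the posterior, then pick the arm that looks best."""
    # One Beta draw per arm; Beta(1, 1) is the assumed uniform prior.
    theta = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # For Bernoulli rewards E[r_j | theta] = theta_j, so take the largest draw.
    return max(range(len(theta)), key=theta.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        i = thompson_step(successes, failures, rng)
        # Record the result from using arm i.
        if rng.random() < true_rates[i]:
            successes[i] += 1
        else:
            failures[i] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = run_bandit([0.2, 0.5, 0.8], rounds=2000)
print(pulls)  # number of times each arm was tried
```

Early on every arm gets tried because the posteriors are wide; as evidence accumulates the draws concentrate and the best arm takes most of the traffic.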

Nearly Forgotten until Recently

§ Citations for Thompson sampling

Bayesian Bandit for the Search

§ Compute distributions based on data so far
§ Sample scores s_1, s_2, …
  – based on actual score
  – plus per doc noise from these distributions
§ Rank docs by s_i
§ Lemma 1: The probability of showing doc i at first position will match the probability it is the best
§ Lemma 2: This is as good as it gets
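
Sampling scores and ranking by them might look like the sketch below. It assumes Gaussian noise with a per-document standard deviation, small for well understood docs and large for "don't-know" docs; the noise model, the function name `noisy_ranking` and the example scores are assumptions for illustration only.

```python
import random


def noisy_ranking(docs, rng):
    """Rank docs by sampled score; docs is a list of (doc_id, score, sigma)."""
    sampled = [(rng.gauss(score, sigma), doc_id) for doc_id, score, sigma in docs]
    sampled.sort(reverse=True)
    return [doc_id for _, doc_id in sampled]


rng = random.Random(42)
docs = [
    ("well-known-a", 2.0, 0.0),   # well understood: no noise added
    ("well-known-b", 1.0, 0.0),
    ("dont-know", 0.5, 3.0),      # uncertain: large noise
]

rankings = [noisy_ranking(docs, rng) for _ in range(1000)]
top_counts = sum(1 for r in rankings if r[0] == "dont-know")
print(top_counts, "of 1000 rankings put the uncertain doc first")
```

The uncertain document reaches the top position some of the time, which is how it collects the engagement data needed to shrink its noise later.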

And it works!

[Plot: regret (0 to 0.12) versus n (0 to 1100), comparing ε-greedy with ε = 0.05 against a Bayesian Bandit with Gamma-Normal]

Yeah, But …

§ Aren’t recommendations complicated?
§ How can I implement this?

Recommendation Basics

§ History:

  User   Thing
  1      3
  2      4
  3      4
  2      3
  3      2
  1      1
  2      1

Recommendation Basics

§ History as matrix:

        t1   t2   t3   t4
  u1    1    0    1    0
  u2    1    0    1    1
  u3    0    1    0    1

§ (t1, t3) cooccur 2 times,
§ (t1, t4) once,
§ (t2, t4) once,
§ (t3, t4) once

A Quick Simplification

§ Users who do h

$$A h$$

§ Also do r

$$A^T (A h)$$   User-centric recommendations

$$(A^T A)\, h$$   Item-centric recommendations
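
The regrouping from user-centric to item-centric is just associativity of matrix multiplication, and it can be checked on the small history from the earlier slides. The sketch below uses plain Python lists to stay dependency-free; the helper names and the choice of h (a user who has only seen t1) are illustrative.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# (user, thing) history pairs from the slides.
history = [(1, 3), (2, 4), (3, 4), (2, 3), (3, 2), (1, 1), (2, 1)]

# A has one row per user (u1..u3) and one column per thing (t1..t4).
A = [[0] * 4 for _ in range(3)]
for user, thing in history:
    A[user - 1][thing - 1] = 1

At = transpose(A)
cooccurrence = matmul(At, A)           # item-item cooccurrence counts

h = [1, 0, 0, 0]                       # hypothetical history: only t1
user_centric = matvec(At, matvec(A, h))    # A^T (A h)
item_centric = matvec(cooccurrence, h)     # (A^T A) h

print(user_centric, item_centric)
```

Both groupings give the same vector, but the item-centric form lets the cooccurrence matrix be computed offline, leaving only one small product per request.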

Recommendation Basics

§ Cooccurrence

        t1   t2   t3   t4
  t1    2    0    2    1
  t2    0    1    0    1
  t3    2    0    2    1
  t4    1    1    1    2

Problems with Raw Cooccurrence

§ Very popular items co-occur with everything
  – Welcome document
  – Elevator music
§ That isn’t interesting
  – We want anomalous cooccurrence

Recommendation Basics

§ Cooccurrence

        t1   t2   t3   t4
  t1    2    0    2    1
  t2    0    1    0    1
  t3    2    0    2    1
  t4    1    1    1    2

            t3   not t3
  t1        2    1
  not t1    1    1

Spot the Anomaly

§ Root LLR is roughly like standard deviations

            A       not A
  B         13      1000
  not B     1000    100,000
  Root LLR: 0.44

            A       not A
  B         1       0
  not B     0       2
  Root LLR: 0.98

            A       not A
  B         1       0
  not B     0       10,000
  Root LLR: 2.26

            A       not A
  B         10      0
  not B     0       100,000
  Root LLR: 7.15

Root LLR Details

§ In R

    entropy = function(k) {
      -sum(k*log((k==0)+(k/sum(k))))
    }
    rootLLr = function(k) {
      sign = …
      sign * sqrt(
        (entropy(rowSums(k))+entropy(colSums(k))
          - entropy(k))/2)
    }

§ Like sqrt(mutual information * N/2)

See http://bit.ly/16DvLVK
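
For readers who do not use R, the same computation can be sketched in Python. The magnitude follows the slide's formula as written, including the division by 2. The slide elides how the sign is computed, so the rule used here (negative when A and B occur together less often than expected) is an assumption and is marked as such in the code.

```python
import math


def entropy(counts):
    """Unnormalized entropy, -sum(k * log(k / sum(k))), skipping zero counts."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)


def root_llr(k11, k12, k21, k22):
    """Signed root LLR score for a 2x2 table of counts, per the slide's formula."""
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    cells = [k11, k12, k21, k22]
    value = (entropy(rows) + entropy(cols) - entropy(cells)) / 2
    magnitude = math.sqrt(max(value, 0.0))  # guard against tiny negative rounding
    # ASSUMPTION: the sign is elided on the slide. Here the score is negative
    # when k11 is a smaller share of its row than k21 is of its row.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        return -magnitude
    return magnitude


for table in [(1, 0, 0, 2), (1, 0, 0, 10000), (10, 0, 0, 100000)]:
    print(table, round(root_llr(*table), 2))
```

Run on the last three tables of the previous slide, this gives roughly 0.98, 2.26 and 7.15, matching the scores shown there.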

Threshold by Score

§ Cooccurrence

        t1   t2   t3   t4
  t1    2    0    2    1
  t2    0    1    0    1
  t3    2    0    2    1
  t4    1    1    1    2

Threshold by Score

§ Significant cooccurrence => Indicators

        t1   t2   t3   t4
  t1    1    0    0    1
  t2    0    1    0    1
  t3    0    0    1    1
  t4    1    0    0    1

Yeah, But …

§ Why go to all this trouble?
§ Does it really help?

Real-life example

The Real Life Issues

§ Exploration
§ Diversity
§ Speed
§ Not the last percent

The Second Page

[Plot: click-through rate (ctr, 0 to 0.12) versus rank (0 to 80)]

Make it Worse to Make It Better

§ Add noise to rank

  1 2 8 7 6 3 5 4 10 13 21 18 12 9 14 24 34 28 32 17 11 27 40 30 41 49 16 15 35 23 19 22 26 31 20 43 25 29 33 62 38 60 74 53 36 37 39 70 45 44 46 71 42 69 47 63 52 57 51 48

§ Results are worse today
§ But better tomorrow
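
A reordering like the one listed above can be produced by perturbing each item's rank and re-sorting. The slide does not say how the noise is applied, so the sketch below makes an explicit assumption: Gaussian noise is added to the logarithm of the rank, which keeps the top positions fairly stable while letting deeper items move a lot. The function name `dither` and the value of `sigma` are hypothetical.

```python
import math
import random


def dither(ranked_items, sigma, rng):
    """Reorder items by log(rank) plus Gaussian noise (assumed noise model)."""
    keyed = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    keyed.sort()
    return [item for _, item in keyed]


rng = random.Random(7)
original = list(range(1, 61))
shuffled = dither(original, sigma=0.5, rng=rng)
print(shuffled[:10])
```

With `sigma=0` the original order comes back unchanged; larger values pull more items from the second page into view, which is the source of tomorrow's training data.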

Anti-Flood

§ 200 of the same result is no better than 2
§ The recommender list is a portfolio of results
  – If probability of success is highly correlated, then probability of at least one success is much lower
§ Suppressing items similar to higher ranking items helps
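
Suppression can be as simple as capping how many results from one group of similar items may appear. The sketch below assumes similarity is captured by a grouping key supplied by the caller; the function name `anti_flood`, the `group_of` callback and the cap of 2 are illustrative choices, not something the slides prescribe.

```python
def anti_flood(ranked, group_of, max_per_group=2):
    """Keep ranked order but drop items once their group has appeared enough."""
    seen = {}
    kept = []
    for item in ranked:
        group = group_of(item)
        seen[group] = seen.get(group, 0) + 1
        if seen[group] <= max_per_group:
            kept.append(item)
    return kept


# Hypothetical results where the first letter marks near-duplicate items.
ranked = ["a1", "a2", "a3", "b1", "a4", "c1"]
diverse = anti_flood(ranked, group_of=lambda item: item[0])
print(diverse)
```

Higher ranking items always win within a group, so the list keeps its best representative of each cluster and frees the remaining slots for different results.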

The Punchline

§ Hybrid systems really can work today
§ Middle tiers aren’t as interesting as they used to be
  – No need for Flume … queue directly in big data system
  – No need for external queues, tail the data directly with Storm
  – No need for query systems for presentation data … read it directly with node
§ Absolutely require common frameworks and standard interfaces
§ You can do this today!
