Mahout and Recommendations

May 10, 2015

Ted Dunning

These are the slides from my talk at DFW Big Data. This includes the first version of the Mahout dog and pony show.
Transcript
Page 1: Mahout and Recommendations

1©MapR Technologies 2013- Confidential

Introduction to Mahout
And How To Build a Recommender

Page 2: Mahout and Recommendations

Me, Us

Ted Dunning, Chief Application Architect, MapR
– Committer, PMC member: Mahout, Zookeeper, Drill
– Bought the beer at the first HUG

MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry standard APIs

Tonight
– Hash tag: #dfwbd #mapr
– See also: @ApacheMahout @ApacheDrill

@ted_dunning and @mapR

Page 3: Mahout and Recommendations

Requested Topic For Tonight

What is Mahout?
What makes it different?
How can big data technology solve impossible problems?
How is big data affecting the world?

Page 4: Mahout and Recommendations

Also

What is MapR?
What is MapR doing?
How does MapR’s technology work?
How are customers making use of MapR?
How can anyone make use of MapR to solve problems?

Page 5: Mahout and Recommendations

Oh … Also This

Detailed break-down of a live machine learning system running with Mahout on MapR

With code examples

Page 6: Mahout and Recommendations

I may have to summarize

Page 7: Mahout and Recommendations

I may have to summarize

just a bit

Page 8: Mahout and Recommendations

Part 1: 5 minutes of math

Page 9: Mahout and Recommendations

Part 2: 12 minutes: I want a pony

Page 10: Mahout and Recommendations

Part 3: A working example

Page 11: Mahout and Recommendations

What Does Machine Learning Look Like?

Page 12: Mahout and Recommendations

What Does Machine Learning Look Like?

$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small k, high quality
$O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger k, looser quality

But tonight we’re going to show you how to keep it simple yet powerful…

Page 13: Mahout and Recommendations

Comparison of Three Main ML Topics

Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts

Page 14: Mahout and Recommendations

Page 15: Mahout and Recommendations

Page 16: Mahout and Recommendations

Part 1: A bit of math

(the math of bits)

Page 17: Mahout and Recommendations

Mahout Math

Goals are
– basic linear algebra,
– and statistical sampling,
– and good clustering,
– decent speed,
– extensibility,
– especially for sparse data

But not
– totally badass speed
– comprehensive set of algorithms
– optimization, root finders, quadrature

Page 18: Mahout and Recommendations

Matrices and Vectors

At the core:
– DenseVector, RandomAccessSparseVector
– DenseMatrix, SparseRowMatrix

Highly composable API

Important ideas:
– view*, assign and aggregate
– iteration

m.viewDiagonal().assign(v)

Page 19: Mahout and Recommendations

Assign? View?

Why assign?
– Copying is the major cost for naïve matrix packages
– In-place operations critical to reasonable performance
– Many kinds of updates required, so functional style very helpful

Why view?
– In-place operations often required for blocks, rows, columns or diagonals
– With views, we need #assign + #views methods
– Without views, we need #assign × #views methods

Synergies
– With both views and assign, many loops become single line
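To make the counting argument concrete, here is a minimal sketch in plain Java rather than Mahout's own classes. The names CellView, ViewsAndAssign, row, diagonal, assign and sum are hypothetical and exist only for this illustration. Each view is written once, each operation is written once against the view interface, and any operation then works on any view.

```java
/** A mutable window onto some cells of a matrix; writes go through to the matrix. */
interface CellView {
    int size();
    double get(int i);
    void set(int i, double value);
}

final class ViewsAndAssign {
    /** View of row r. */
    static CellView row(final double[][] m, final int r) {
        return new CellView() {
            public int size() { return m[r].length; }
            public double get(int i) { return m[r][i]; }
            public void set(int i, double value) { m[r][i] = value; }
        };
    }

    /** View of the main diagonal. */
    static CellView diagonal(final double[][] m) {
        return new CellView() {
            public int size() { return m.length == 0 ? 0 : Math.min(m.length, m[0].length); }
            public double get(int i) { return m[i][i]; }
            public void set(int i, double value) { m[i][i] = value; }
        };
    }

    /** Sets every cell of the view to a constant, in place. */
    static void assign(CellView v, double value) {
        for (int i = 0; i < v.size(); i++) v.set(i, value);
    }

    /** Replaces every cell x of the view with f(x), in place. */
    static void assign(CellView v, java.util.function.DoubleUnaryOperator f) {
        for (int i = 0; i < v.size(); i++) v.set(i, f.applyAsDouble(v.get(i)));
    }

    /** Sums the cells of the view. */
    static double sum(CellView v) {
        double total = 0;
        for (int i = 0; i < v.size(); i++) total += v.get(i);
        return total;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        System.out.println("trace = " + sum(diagonal(m)));
        assign(diagonal(m), 0);           // zero the diagonal without copying
        assign(row(m, 0), x -> x * 10);   // scale row 0 in place
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```

Adding a column view would take one more method, and every existing operation would work on it unchanged, which is the "#assign + #views" point above.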

Page 20: Mahout and Recommendations

Examples

double alpha;
a.assign(alpha);

a.assign(b, Functions.chain(Functions.plus(beta), Functions.times(alpha)));

Page 24: Mahout and Recommendations

Examples

The trace of a matrix

m.viewDiagonal().zSum()

Set diagonal to zero

m.viewDiagonal().assign(0)

Set diagonal to negative of row sums excluding the diagonal

Vector diag = m.viewDiagonal().assign(0);
diag.assign(m.rowSums().assign(Functions.NEGATE));

Page 25: Mahout and Recommendations

Clustering and Such

Streaming k-means and ball k-means
– streaming reduces very large data to a cluster sketch
– ball k-means is a high quality k-means implementation
– the cluster sketch is also usable for other applications
– single machine threaded and map-reduce versions available

SVD and friends
– stochastic SVD has in-memory, single machine out-of-core and map-reduce versions
– good for reducing very large sparse matrices to tall skinny dense ones

Spectral clustering
– based on SVD, allows massive dimensional clustering

Page 26: Mahout and Recommendations

Mahout Math Summary

Matrices, Vectors
– views
– in-place assignment
– aggregations
– iterations

Functions
– lots built-in
– cooperate with sparse vector optimizations

Sampling
– abstract samplers
– samplers as functions

Other stuff … clustering, SVD

Page 27: Mahout and Recommendations

Part 2: How recommenders work

(I still want a pony)

Page 28: Mahout and Recommendations

Recommendations

Behavior of a crowd helps us understand what individuals will do

Page 29: Mahout and Recommendations

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Alice

Charles

Page 30: Mahout and Recommendations

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Bob got an apple

Alice

Bob

Charles

Page 31: Mahout and Recommendations

Recommendations

What else would Bob like??

Alice

Bob

Charles

Page 32: Mahout and Recommendations

Recommendations

What if everybody gets a pony?

Now what does Bob want??

Alice

Bob

Charles

Page 33: Mahout and Recommendations

Log Files

[Figure: log file with one entry per user action, in the order Alice, Bob, Charles, Alice, Bob, Charles, Alice]

Page 34: Mahout and Recommendations

Log Files

[Figure: log entries as id columns. Users: u1, u3, u2, u1, u3, u2, u1. Things: t1, t2, t3, t4, t3, t3, t1]

Page 35: Mahout and Recommendations

Log Files and Dimensions

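[Figure: log entries as id columns (users: u1, u3, u2, u1, u3, u2, u1; things: t1, t2, t3, t4, t3, t3, t1) mapped onto two dimensions. Things: t1, t2, t3, t4. Users: u1, u3, u2 (Alice, Bob, Charles)]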

Page 36: Mahout and Recommendations

History Matrix

Alice    ✔ ✔ ✔
Bob      ✔ ✔
Charles  ✔ ✔

[Figure residue: the column positions of the check marks were lost.]

Page 37: Mahout and Recommendations

Cooccurrence Matrix

[Figure: things × things cooccurrence matrix. Counts as extracted: 1 2; 1 1; 1; 1; 2 1. Cell positions were lost.]
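A minimal sketch of how such a matrix can be computed, assuming the history from the earlier slides after everybody gets a pony (Alice: apple, puppy, pony; Bob: apple, pony; Charles: bicycle, pony). The column order and the class name Cooccurrence are assumptions for illustration, and plain arrays stand in for Mahout's sparse matrices. The cooccurrence matrix is the transposed history matrix multiplied by the history matrix itself.

```java
final class Cooccurrence {
    /** Returns X^T * Y for two matrices that share the same rows (users). */
    static int[][] transposeTimes(int[][] x, int[][] y) {
        int xItems = x[0].length;
        int yItems = y[0].length;
        int[][] c = new int[xItems][yItems];
        for (int u = 0; u < x.length; u++) {
            for (int i = 0; i < xItems; i++) {
                if (x[u][i] == 0) continue;   // skip items this user never touched
                for (int j = 0; j < yItems; j++) {
                    c[i][j] += x[u][i] * y[u][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Rows: Alice, Bob, Charles. Columns (assumed order): apple, puppy, bicycle, pony.
        int[][] history = {
            {1, 1, 0, 1},
            {1, 0, 0, 1},
            {0, 0, 1, 1},
        };
        int[][] cooccurrence = transposeTimes(history, history);
        for (int[] row : cooccurrence) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

In this toy data the pony row has a nonzero count against every other item, simply because everybody has a pony.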

Page 38: Mahout and Recommendations

Indicator Matrix

Page 39: Mahout and Recommendations

Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
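One way to read this record: the indicators field holds the ids of items whose presence in a user's history suggests this item. A rough sketch of that matching step follows, with a plain overlap count standing in for a search engine's scoring. The class IndicatorSearch, the second item t2 and the example history are hypothetical.

```java
final class IndicatorSearch {
    /** Counts how many of an item's indicator ids appear in the user's history. */
    static int score(String[] indicators, String[] history) {
        int hits = 0;
        for (String indicator : indicators) {
            for (String seen : history) {
                if (indicator.equals(seen)) {
                    hits++;
                    break;   // count each indicator at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] puppyIndicators = {"t1"};   // from the record above: indicators: (t1)
        String[] otherIndicators = {"t3"};   // hypothetical second item, t2
        String[] history = {"t1"};           // hypothetical user who has only t1

        System.out.println("t4 (puppy) score: " + score(puppyIndicators, history));
        System.out.println("t2 score: " + score(otherIndicators, history));
    }
}
```

A real search engine would also weight and rank the matches rather than just count them.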

Page 40: Mahout and Recommendations

Problems with Raw Cooccurrence

Very popular items co-occur with everything
– Welcome document
– Elevator music

That isn’t interesting
– We want anomalous cooccurrence

Page 41: Mahout and Recommendations

Recommendation Basics

Cooccurrence

          t3    not t3
t1         2       1
not t1     1       1

Page 42: Mahout and Recommendations

Spot the Anomaly

Root LLR is roughly like standard deviations

Table 1 (value 0.44)
          A       not A
B         13      1000
not B     1000    100,000

Table 2 (value 0.98)
          A       not A
B         1       0
not B     0       2

Table 3 (value 2.26)
          A       not A
B         1       0
not B     0       10,000

Table 4 (value 7.15)
          A       not A
B         10      0
not B     0       100,000
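For readers who want to try this on their own counts, here is a sketch of the log-likelihood ratio for a 2×2 table in its usual entropy form, with the square root taken at the end. The class and method names are assumptions for illustration and this is not Mahout's own implementation. This sketch is unsigned, whereas some implementations give the root a sign depending on whether the top-left count is above or below expectation. The slide's figures may use a different constant scale, so compare the ordering of the tables rather than the absolute values.

```java
final class RootLlr {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a set of counts: N ln N minus the sum of k ln k. */
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long k : counts) {
            sum += k;
            parts += xLogX(k);
        }
        return xLogX(sum) - parts;
    }

    /** Log-likelihood ratio (G^2) for the table [[k11, k12], [k21, k22]]. */
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    static double rootLlr(long k11, long k12, long k21, long k22) {
        return Math.sqrt(llr(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        System.out.println(rootLlr(13, 1000, 1000, 100000));
        System.out.println(rootLlr(1, 0, 0, 2));
        System.out.println(rootLlr(1, 0, 0, 10000));
        System.out.println(rootLlr(10, 0, 0, 100000));
    }
}
```

The large but unremarkable overlap of 13 scores lowest, while a small overlap that is surprising against a big background scores much higher, which is the sense in which the measure picks out anomalous cooccurrence.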

Page 43: Mahout and Recommendations

A Quick Simplification

Users who do h (a vector of things a user has done)

Also do r

User-centric recommendations (transpose translates back to things)

Item-centric recommendations (change the order of operations)

A translates things into users
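The formulas that went with these captions did not survive in the transcript. A plausible reconstruction, assuming A is the users × things history matrix and h is a column vector over things, is:

```latex
\begin{aligned}
A h &\quad \text{users who do } h \\
r = A^{T} (A h) &\quad \text{user-centric recommendations} \\
r = (A^{T} A)\, h &\quad \text{item-centric recommendations}
\end{aligned}
```

The two forms give the same r because matrix multiplication is associative. The item-centric form lets $A^{T} A$ be computed ahead of time, independently of any particular user's h.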

Page 44: Mahout and Recommendations

Symmetry Gives Cross Recommendations

Conventional recommendations with off-line learning

Cross recommendations

Page 45: Mahout and Recommendations

For example

Users enter queries (A)
– (actor = user, item = query)

Users view videos (B)
– (actor = user, item = video)

$A^T A$ gives query recommendation
– “did you mean to ask for”

$B^T B$ gives video recommendation
– “you might like these videos”

Page 46: Mahout and Recommendations

The punch-line

$B^T A$ recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
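A small sketch of the cross-occurrence product, using made-up data: four users, two queries and three videos. The class CrossRecommendation, its helper names and the matrices are hypothetical. Raw counts are used here for brevity, whereas the earlier slides argue for keeping only anomalous cooccurrences.

```java
final class CrossRecommendation {
    /** Returns X^T * Y for two matrices that share the same rows (users). */
    static int[][] transposeTimes(int[][] x, int[][] y) {
        int[][] c = new int[x[0].length][y[0].length];
        for (int u = 0; u < x.length; u++) {
            for (int i = 0; i < x[0].length; i++) {
                for (int j = 0; j < y[0].length; j++) {
                    c[i][j] += x[u][i] * y[u][j];
                }
            }
        }
        return c;
    }

    /** Index of the row with the largest count in the given column. */
    static int bestRow(int[][] m, int col) {
        int best = 0;
        for (int r = 1; r < m.length; r++) {
            if (m[r][col] > m[best][col]) best = r;
        }
        return best;
    }

    public static void main(String[] args) {
        // A: users x queries (q0, q1). B: users x videos (v0, v1, v2).
        int[][] a = {{1, 0}, {1, 0}, {0, 1}, {0, 1}};
        int[][] b = {{1, 0, 0}, {1, 1, 0}, {0, 0, 1}, {0, 1, 1}};

        int[][] videosByQuery = transposeTimes(b, a);   // B^T A: videos x queries
        System.out.println(java.util.Arrays.deepToString(videosByQuery));
        System.out.println("top video for q0: v" + bestRow(videosByQuery, 0));
        System.out.println("top video for q1: v" + bestRow(videosByQuery, 1));
    }
}
```

Nothing in this computation looks at video titles or query text. The link between a query and a video comes only from the users who did both.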

Page 47: Mahout and Recommendations

Real-life example

Query: “Paco de Lucia”

Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Page 48: Mahout and Recommendations

Real-life example

Page 49: Mahout and Recommendations

Hypothetical Example

Want a navigational ontology?

Just put labels on a web page with traffic
– This gives A = users x label clicks

Remember viewing history
– This gives B = users x items

Cross recommend
– B’A = label to item mapping

After several users click, results are whatever users think they should be

Page 50: Mahout and Recommendations

Nice. But can we do better?

Page 51: Mahout and Recommendations

[Figure: matrix of users × things]

Page 52: Mahout and Recommendations

[Figure: users × thing type 1 alongside users × thing type 2]

Page 53: Mahout and Recommendations

Page 54: Mahout and Recommendations

Part 3: What about that worked example?

Page 55: Mahout and Recommendations

http://bit.ly/18vbbaT

Page 56: Mahout and Recommendations

Analyze with Map-Reduce

[Diagram labels: complete history; cooccurrence (Mahout); item meta-data; Solr indexing; Solr indexers; index shards]

Page 57: Mahout and Recommendations

Deploy with Conventional Search System

[Diagram labels: user history; web tier; Solr search; item meta-data; Solr indexers; index shards]

Page 58: Mahout and Recommendations

Objective Results

At a very large credit card company

History is all transactions

Development time to minimal viable product about 4 months

General release 2-3 months later

Search-based recs at or equal in quality to other techniques

Page 59: Mahout and Recommendations

Summary

Input: Multiple kinds of behavior on one set of things

Output: Recommendations for one kind of behavior with a different set of things

Cross recommendation is a special case

Page 61: Mahout and Recommendations

Me, Us

Ted Dunning, Chief Application Architect, MapR
– Committer, PMC member: Mahout, Zookeeper, Drill
– Bought the beer at the first HUG
– tdunning@{apache.org,maprtech.com}

MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry standard APIs

Tonight
– Hash tag: #dfwbd #mapr
– See also: @ApacheMahout @ApacheDrill

@ted_dunning and @mapR