Big Data @ Hippo - GetTogether 2014

Jan 28, 2018

Transcript
Page 1: Big Data @ Hippo - GetTogether 2014

follow the Hippo trail

Hippo GetTogether 2014

Big Data @ Hippo

Hippo GetTogether 2014 - Trouw

Frank van Lankvelt

Page 2: Big Data @ Hippo - GetTogether 2014

Co-occurrence

Relating Attributes

Page 3: Big Data @ Hippo - GetTogether 2014

Scary Math

Page 4: Big Data @ Hippo - GetTogether 2014

Contingency Table

Documents A, B

            A         not A      total
B           x         20 - x     20
not B       40 - x    140 + x    180
total       40        160        200

total # visitors: 200

visitors of A: 40

visitors of B: 20

visitors of A & B: x

P(x >= 8) ≈ 3%
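The tail probability for this table can be sketched in a few lines. The sketch assumes the 20 visitors of B are a random draw from the 200 visitors (a hypergeometric model, as in Fisher's exact test); the slide does not say which distribution was used, and the function name `tail_probability` is made up for illustration. Under that assumption the expected overlap is 4 visitors, and the tail P(x >= 8) comes out at a few percent, the same order as the ≈ 3% on the slide.

```python
from math import comb

def tail_probability(total, visitors_a, visitors_b, observed):
    """P(X >= observed), where X is the number of B-visitors that also
    visited A, assuming the B-visitors are drawn at random from all
    visitors (hypergeometric model; an assumption, not from the slide)."""
    upper = min(visitors_a, visitors_b)
    ways_total = comb(total, visitors_b)
    ways_tail = sum(
        comb(visitors_a, k) * comb(total - visitors_a, visitors_b - k)
        for k in range(observed, upper + 1)
    )
    return ways_tail / ways_total

# Margins taken from the contingency table: 200 visitors, 40 of A, 20 of B.
p = tail_probability(total=200, visitors_a=40, visitors_b=20, observed=8)
print(f"P(x >= 8) = {p:.3f}")
```

A small tail probability means that an overlap of 8 or more is unlikely to be chance, so documents A and B can be treated as related.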

Page 5: Big Data @ Hippo - GetTogether 2014

Co-occurrence Insights

Insight: a high cohesion of page visits in the partner section, standing out from the regular ‘.com’ visitor cluster, suggests that visitors looking for a partner go through every single page and probably can’t find what they’re looking for.

Action: Hippo suggests improving navigation, search or filtering.

[Figure: relatedness of attributes / URLs, with clusters labelled find partner, /fr, .com, .org, generic, release notes]

Page 6: Big Data @ Hippo - GetTogether 2014

Recommendations

user - item (rating): collaborative filtering

users: Alice, Bob, Charlie

Star Wars: 3, 4

Finding Nemo: 3, 4

Sound of Music: 5, 1, 2

content: (meta) data

                  genre        stars
Star Wars         sci-fi       Portman
Finding Nemo      animation    DeGeneres
Sound of Music    musical      Andrews

which documents are interesting for ME?

find docs similar to visited documents

find docs co-occurring with visited documents

Page 7: Big Data @ Hippo - GetTogether 2014

Implementation

combine in search index:

Recommendation Query

Content-based: (meta) data

Collaborative Filtering: co-occurrence

Page 9: Big Data @ Hippo - GetTogether 2014

Recommended For You

1. Collect ID of viewed content

2. Calculate co-occurrences

3. Index, along with content, IDs of co-viewed documents

4. Search with recent IDs, similarity

5. Repeat with other collected data
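The first four steps can be sketched as follows. This is a minimal illustration, not the Hippo implementation: a plain dictionary stands in for the search index, raw co-view counts stand in for whatever relatedness score the index uses, and the visit log, document IDs and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Step 1 (hypothetical data): visitor -> IDs of viewed content.
visits = {
    "v1": ["doc-a", "doc-b", "doc-c"],
    "v2": ["doc-a", "doc-b"],
    "v3": ["doc-b", "doc-c"],
    "v4": ["doc-a", "doc-d"],
}

def co_occurrences(visits):
    """Step 2: count how often two documents are viewed by the same visitor."""
    counts = defaultdict(Counter)
    for docs in visits.values():
        for a, b in combinations(sorted(set(docs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(index, recent_ids, limit=3):
    """Step 4: score documents by co-views with the recently viewed IDs."""
    scores = Counter()
    for doc_id in recent_ids:
        for other, n in index.get(doc_id, {}).items():
            if other not in recent_ids:
                scores[other] += n
    return [doc for doc, _ in scores.most_common(limit)]

# Step 3: a dict stands in for the search index (an assumption).
index = co_occurrences(visits)
recommendations = recommend(index, ["doc-a"])
print(recommendations)
```

In a real index the raw counts would likely be replaced by a significance score, for instance one based on the contingency table shown earlier, so that very popular documents do not dominate every recommendation.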

Page 10: Big Data @ Hippo - GetTogether 2014

Patterns

Beyond Co-occurrence

Page 11: Big Data @ Hippo - GetTogether 2014

Patterns in the Data

customers that buy diapers often buy beer as well

(young dads rewarding themselves?)

Page 12: Big Data @ Hippo - GetTogether 2014

Itemsets Rules

Find the patterns (association rule mining):

1. sets of items that are bought together (support): $P(\text{beer}, \text{diapers}) > 1\%$

2. subsets that are good predictors (lift): $\frac{P(\text{beer}, \text{diapers})}{P(\text{beer})\,P(\text{diapers})} > 4$
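Support and lift are simple ratios of counts. The sketch below computes both on a handful of made-up shopping baskets; the data and the function names `support` and `lift` are illustrative assumptions, and the thresholds of 1% and 4 from the slide are not applied here.

```python
# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"bread"},
    {"milk", "bread", "beer", "diapers"},
    {"chips"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def lift(a, b, baskets):
    """P(a, b) / (P(a) * P(b)); values above 1 mean the items co-occur
    more often than they would if bought independently."""
    joint = support({a, b}, baskets)
    return joint / (support({a}, baskets) * support({b}, baskets))

beer_diapers_support = support({"beer", "diapers"}, baskets)
beer_diapers_lift = lift("beer", "diapers", baskets)
print(beer_diapers_support, beer_diapers_lift)
```

With these toy baskets, beer and diapers appear together in 3 of 8 baskets and each appears in 4 of 8, so the lift is 0.375 / 0.25 = 1.5.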

Page 13: Big Data @ Hippo - GetTogether 2014

http://www.onehippo.com/en/thankyou - Thank You

Beer? Diapers? Conversions!!!

Page 14: Big Data @ Hippo - GetTogether 2014

http://www.onehippo.com/en/thankyou

will a visitor go there? P(conversion | request log)

what are the relevant “signals”?

which configuration performs best?

Page 15: Big Data @ Hippo - GetTogether 2014

Patterns For Conversion

single item: referrer www.google.com

pattern/itemset: visited demo, 2014 week 4

correlations

Page 16: Big Data @ Hippo - GetTogether 2014

Scary Data Structure

Page 17: Big Data @ Hippo - GetTogether 2014

Finding Frequent Itemsets

1. Build Frequent Prefix Tree (FPGrowth)

2. Extract patterns relevant for conversion (using contingencies)
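The first step, building the prefix tree that FP-growth mines, can be sketched as below. Only tree construction is shown; the recursive mining of conditional trees and the conversion-relevance step are left out. The class and function names and the sample request logs are hypothetical.

```python
from collections import Counter

class Node:
    """One item on a path of the prefix tree, with a transaction count."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_prefix_tree(transactions, min_count):
    """Insert each transaction's frequent items, most frequent first,
    so that transactions sharing common items share a path."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in freq.items() if c >= min_count}
    root = Node()
    for t in transactions:
        # Ties in frequency are broken by name to keep the order stable.
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def path_count(root, path):
    """Number of transactions whose ordered frequent items start with `path`."""
    node = root
    for item in path:
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count

# Hypothetical request logs, reduced to sets of items.
logs = [
    {"visited demo", "2014 week 4", "referrer www.google.com"},
    {"visited demo", "2014 week 4"},
    {"visited demo"},
    {"referrer www.google.com"},
]
tree = build_prefix_tree(logs, min_count=2)
print(path_count(tree, ["visited demo", "2014 week 4"]))
```

Because frequent items sit near the root, many request logs collapse onto a few shared paths, which is what keeps the structure compact.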

Page 18: Big Data @ Hippo - GetTogether 2014

Pattern Contingency Table

                          converted    not converted
pattern matches
pattern does not match

converted: visited /thankyou

sample pattern: visited demo, in 2014 week 4

Page 19: Big Data @ Hippo - GetTogether 2014

Sub-Pattern Filtering

Problem: when pattern (A, B, C) is relevant, patterns (A), (B), (C), (A, B), (A, C), (B, C) (likely) also match. E.g. with C meta-data on page B.

Solution: test for independence using contingency!
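One way to run such an independence test is Pearson's chi-square on a 2x2 contingency table. The slide does not name the test, so this choice is an assumption; with small counts Fisher's exact test would be the safer option. The counts below are invented, and the reading shown (does adding C to (A, B) change the conversion rate?) is just one possible interpretation of the filtering step.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts among visitors matching (A, B):
#                      converted   not converted
# also matches C           30            70
# does not match C         10           190
stat, p = chi_square_2x2(30, 70, 10, 190)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
```

A small p-value rejects independence, so the longer pattern carries information beyond its sub-pattern; a large one suggests the extra item is redundant and one of the two patterns can be dropped.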

Page 20: Big Data @ Hippo - GetTogether 2014

Actionable Insights?

The found itemsets are quite numerous and seem to contain a lot of redundancy.

But they are certainly interesting, e.g. for a periodic evaluation.

Page 21: Big Data @ Hippo - GetTogether 2014

Personalization

Putting Patterns to Use

Page 22: Big Data @ Hippo - GetTogether 2014

Naive A/B Testing

The naive solution:

route some traffic to alternative configuration

A (old config): 80%

B (new config): 20%

run for some time

see if B has relatively more conversions
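The naive procedure fits in a few lines. The sketch below simulates it; the traffic volume, the "true" conversion rates and the function names are made up for illustration.

```python
import random

def assign(rng, share_b=0.2):
    """Route a visitor to the new configuration B with probability share_b."""
    return "B" if rng.random() < share_b else "A"

def conversion_rates(events):
    """events: iterable of (variant, converted) pairs."""
    totals = {"A": [0, 0], "B": [0, 0]}  # [visitors, conversions]
    for variant, converted in events:
        totals[variant][0] += 1
        totals[variant][1] += int(converted)
    return {v: (c / n if n else 0.0) for v, (n, c) in totals.items()}

rng = random.Random(42)
# Made-up true conversion rates, used only to simulate traffic.
true_rate = {"A": 0.02, "B": 0.03}
events = []
for _ in range(20000):
    variant = assign(rng)
    events.append((variant, rng.random() < true_rate[variant]))

rates = conversion_rates(events)
share_b = sum(1 for v, _ in events if v == "B") / len(events)
print(rates, share_b)
```

Nothing in this loop decides when the difference is large enough to trust, and every visitor gets the same split regardless of context.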

Page 23: Big Data @ Hippo - GetTogether 2014

Problems With Naive Solution

if B is drastically worse, 20% of traffic is LOST

marketer must regularly check and decide: when has a new config PROVEN itself?

number of concurrent experiments is LOW

no user context

Page 24: Big Data @ Hippo - GetTogether 2014

Scary Math

Page 25: Big Data @ Hippo - GetTogether 2014

Predict Conversion

Conversion rate depends on context:

x: the patterns

w: the “weights”

ϕ: CDF of the normal distribution
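The formula itself is not in the transcript. A common model built from exactly these three ingredients is probit regression, $P(\text{conversion} \mid x) = \phi(w^\top x)$; the sketch below assumes that form. The bias term, the weight values and the function names are assumptions for illustration, not taken from the talk.

```python
from math import erf, sqrt

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def predict_conversion(x, w, bias=0.0):
    """Probit prediction (assumed model form).
    x: 0/1 indicators, one per pattern, for whether the visitor matches it.
    w: one weight per pattern. bias: hypothetical intercept."""
    return normal_cdf(bias + sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical weights for three patterns and a low base rate.
weights = [1.2, 0.4, -0.3]
base = predict_conversion([0, 0, 0], weights, bias=-2.5)
matched = predict_conversion([1, 1, 0], weights, bias=-2.5)
print(base, matched)
```

Positive weights push the predicted conversion probability up when the visitor matches the pattern; weights near zero mean the pattern barely matters.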

Page 26: Big Data @ Hippo - GetTogether 2014

Experimental Setup

Split data set (.org + .com)

1. training set: 189660 visitors, 435 conversions

2. test set: 27013 visitors, 40 conversions

Page 27: Big Data @ Hippo - GetTogether 2014

Can We Predict Conversion?

1260 itemsets

ROC curve: TPR versus FPR

@ false positive rate 10%: 96% true positive rate
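An ROC curve is obtained by sweeping a threshold over the predicted scores and recording the false and true positive rates at each step. The sketch below does this on ten made-up visitors; the scores, labels and function names are hypothetical and unrelated to the 96% figure above.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Visitors with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def tpr_at_fpr(points, max_fpr):
    """Best true positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

# Hypothetical predicted conversion scores and outcomes (1 = converted).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
points = roc_points(scores, labels)
print(tpr_at_fpr(points, 0.10))
```

Reading the curve at a fixed false positive rate, as the slide does at 10%, gives a single number that is easy to compare between models.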

Page 28: Big Data @ Hippo - GetTogether 2014

Towards Actionable Insights

Use Automatic Relevance Determination (ARD) to prune the patterns (optimize the prior)

[Figure: prior over the weights (w), with mean μ and width σ, distinguishing relevant from irrelevant patterns]

Page 29: Big Data @ Hippo - GetTogether 2014

Top 20 Patterns For Conversion

[Slide list, flattened: the items below make up the patterns, one item per line; pattern boundaries are not recoverable]

referer.go.onehippo.com
.pathInfo./resources/whitepapers/forrester-market-overview-web-content-management-systems.html
.pathInfo./resources/whitepapers/cms---a-critical-solution-for-todays-ecommerce.html
.pathInfo./resources/whitepapers/hippo-cms-for-the-enterprise.html
.pathInfo./resources/whitepapers/web-content-management-in-the-cloud.html
.collectorData.channel.One Hippo English Site
.collectorData.audience.terms.
referer.www.onehippo.com
.collectorData.categories.terms.cms
.pathInfo./mobile-cms
.collectorData.channel.One Hippo English Site
.pathInfo./ressourcen/demo
.pathInfo./resources/videos/hippo-cms-grand-tour.html
.collectorData.channel.One Hippo English Site
.collectorData.audience.terms.
.collectorData.categories.terms.cms
.pathInfo./ressources/demo
.pathInfo./what_to_buy/compare.html
referer.www.cmswire.com
.pathInfo./resources/demo
.collectorData.categories.terms.mobile
.pathInfo./resources/whitepapers/understanding-hippo-cms-7-software-architecture.html
.pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
.collectorData.channel.One Hippo English Site
referer.www.google.nl
referer.www.onehippo.com
.pathInfo./resources/videos/a-quick-overview-of-hippo-cms-in-just-under-3-minutes.html
.collectorData.categories.terms.repository
.pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
.collectorData.categories.terms.
.collectorData.categories.terms.relevance

Page 30: Big Data @ Hippo - GetTogether 2014

Actionable Insights!

we can find a small model that can be used for human interpretation and automated personalization

Page 31: Big Data @ Hippo - GetTogether 2014

Product Challenge

KISS: # parameters should be minimal

Page 32: Big Data @ Hippo - GetTogether 2014

Parameters

Recommendations: 1 hyper-param

Personalization: idem

NICE!

Page 33: Big Data @ Hippo - GetTogether 2014

Questions?