CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
Jan 15, 2015
Agenda
• What are Product Recommenders
• Introducing Recommenders
• A Simple Example
• Recommender Evaluation
• How do they work?
• Machine learning tool – Apache Mahout
Namics Conference 2012
September 1, 2012
About Sentric
• Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop, HBase and Solr
• Objective: Transforming data into insights
Intro
CC 2.0 by Dennis Wong | http://flic.kr/p/6C3RuV
The Patterns
• Each day we form opinions about things we like, don’t like, and don’t even care about.
• People tend to like things …
  • that similar people like
  • that are similar to other things they like
• These patterns can be used to predict such likes and dislikes.
Introducing Recommenders
Strategies for Discovering New Things
user-based – Look to what people with similar tastes seem to like
item-based – Figure out what items are like the ones you already like (again by looking to others’ apparent preferences)
content-based – Suggest items based on a particular attribute of the items themselves, rather than on others’ apparent preferences
The Definition of Recommendation
Recommendation is all about predicting patterns of taste, and using them to discover new and desirable things you didn’t already know about.
Collaborative Filtering – Producing recommendations based on, and only based on, knowledge of users’ relationships to items.
Recommenders: user-based, item-based, content-based
CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8
The Workflow
• Let’s start with a simple example
A Simple user-based Example
• Create Input Data
• Create a Recommender
• Analyse the Output
Input Data
• Recommendations are based on input data
• Data takes the form of preferences – associations from users to items
Example: These values might be ratings on a scale of 1 to 5, where 1 indicates items the user can’t stand, and 5 indicates favorites.
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
Each line reads user, item, preference. For example, user 1 has a preference of 3.0 for item 102.
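To make the input format concrete, here is a minimal plain-Java sketch that reads such lines into a nested map. The class `PreferenceParser` and its method are hypothetical helpers for illustration only and are not part of Mahout.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Mahout class): parses "userID,itemID,preference" lines.
class PreferenceParser {

  // Returns a map of userID -> (itemID -> preference value).
  static Map<Long, Map<Long, Float>> parse(String csv) {
    Map<Long, Map<Long, Float>> prefs = new TreeMap<>();
    for (String line : csv.split("\n")) {
      line = line.trim();
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      String[] fields = line.split(",");
      long userId = Long.parseLong(fields[0]);
      long itemId = Long.parseLong(fields[1]);
      float value = Float.parseFloat(fields[2]);
      prefs.computeIfAbsent(userId, id -> new TreeMap<>()).put(itemId, value);
    }
    return prefs;
  }

  public static void main(String[] args) {
    String csv = "1,101,5.0\n1,102,3.0\n1,103,2.5\n2,101,2.0\n";
    Map<Long, Map<Long, Float>> prefs = parse(csv);
    // User 1's preference for item 102
    System.out.println(prefs.get(1L).get(102L)); // prints 3.0
  }
}
```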
Trend Visualization
• Trend visualization for positive user preferences (shown in petrol)
• All other preferences are treated as negative – the user doesn’t seem to like the item that much (shown red and dotted)
[Figure: users 1 to 5 connected to items 101 to 107 by their preferences]
• Users 1 and 5 seem to have similar tastes. Both like 101, like 102 a little less, and like 103 less still.
• Users 1 and 4 seem to have similar tastes. Both seem to like 101 and 103 identically.
• Users 1 and 2 have tastes that seem to run counter to each other.
Analyzing the Output
So what product might be recommended to user 1?
Obviously not 101, 102 or 103. User 1 already knows about these.
The output could be: [item:104, value:4.257081]
The recommender engine recommended item 104 because it estimated user 1’s preference for it to be about 4.3, and that was the highest among all the items eligible for recommendation.
Questions:
• Is this the best recommendation for user 1?
• What exactly is a good recommendation?
CC 2.0 by larsaaboe | http://flic.kr/p/7nJpV8
Evaluating a Recommender
Goal: Evaluate how closely the estimated preferences match the actual preferences.
How?
• Prepare: a reasonable data set
• Split: 70% for training, 30% for test
• Run: produce estimated preferences with the training data
• Analyse: compare estimates with the test data → calculate a score
• Experiment with other recommenders
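The split step can be sketched in plain Java as a shuffle followed by a cut at 70%. This is only an illustration under assumptions: `TrainTestSplit` is a hypothetical helper, and an evaluation framework such as Mahout's may partition the data differently (for example per user).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: shuffles the records and cuts them into training and test parts.
class TrainTestSplit {

  // Returns [training, test]; trainingFraction is e.g. 0.7 for a 70/30 split.
  static <T> List<List<T>> split(List<T> records, double trainingFraction, long seed) {
    List<T> shuffled = new ArrayList<>(records);
    Collections.shuffle(shuffled, new Random(seed)); // fixed seed keeps runs repeatable
    int cut = (int) Math.round(shuffled.size() * trainingFraction);
    List<List<T>> parts = new ArrayList<>();
    parts.add(new ArrayList<>(shuffled.subList(0, cut)));
    parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    return parts;
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList(
        "1,101,5.0", "1,102,3.0", "1,103,2.5", "2,101,2.0", "2,102,2.5",
        "2,103,5.0", "2,104,2.0", "3,101,2.5", "3,104,4.0", "3,105,4.5");
    List<List<String>> parts = split(records, 0.7, 42L);
    System.out.println("training: " + parts.get(0).size() + ", test: " + parts.get(1).size());
  }
}
```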
Example evaluation output for a particular recommender engine
Note: A score of 0.0 would mean perfect estimation
            Item 1   Item 2   Item 3
Actual      3.0      5.0      4.0
Estimate    3.5      2.0      5.0
Difference  0.5      3.0      1.0
Average distance = (0.5 + 3.0 + 1.0) / 3 = 1.5
Root-mean-square = √((0.5² + 3.0² + 1.0²) / 3) = 1.8484
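The two scores can be reproduced with a few lines of plain Java. This is a sketch of the arithmetic only; the class and method names are made up for illustration and are not Mahout's evaluator API.

```java
// Illustrative helper: computes the two scores for the three items in the example.
class EvaluationMetrics {

  // Mean of the absolute differences between actual and estimated preferences.
  static double averageAbsoluteDifference(double[] actual, double[] estimate) {
    double sum = 0.0;
    for (int i = 0; i < actual.length; i++) {
      sum += Math.abs(actual[i] - estimate[i]);
    }
    return sum / actual.length;
  }

  // Square root of the mean of the squared differences.
  static double rootMeanSquare(double[] actual, double[] estimate) {
    double sumOfSquares = 0.0;
    for (int i = 0; i < actual.length; i++) {
      double diff = actual[i] - estimate[i];
      sumOfSquares += diff * diff;
    }
    return Math.sqrt(sumOfSquares / actual.length);
  }

  public static void main(String[] args) {
    double[] actual = {3.0, 5.0, 4.0};
    double[] estimate = {3.5, 2.0, 5.0};
    System.out.println(averageAbsoluteDifference(actual, estimate)); // 1.5
    System.out.println(rootMeanSquare(actual, estimate));            // about 1.8484
  }
}
```

The root-mean-square score penalises the large miss on item 2 more heavily than the average distance does, which is why it comes out higher.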
CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej
Apache Mahout
• Mahout …
  • Open-source machine learning library from Apache (Java)
  • Can be used for large data collections – it’s scalable, built upon Apache Hadoop
  • Implements algorithms such as Classification, Recommenders, Clustering
  • Incubates a number of techniques and algorithms
• ML is hyped! But …
In a Nutshell
Create a Recommender
A Simple Recommender
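import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class RecommenderExample {
  public static void main(String[] args) throws Exception {
    // Load the preference data (one "user,item,preference" record per line)
    DataModel model = new FileDataModel(new File("example.csv"));
    // Pearson correlation as the user-to-user similarity metric
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhood of the 2 users most similar to the target user
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Ask for 1 recommendation for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}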
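To see where an estimate such as 4.257081 for item 104 could come from, the following plain-Java sketch assumes the recommender takes a similarity-weighted average of the neighbors' preferences. The two neighbors (users 4 and 5) and their similarities to user 1 are worked out by hand from the sample data with Pearson correlation; both the averaging rule and these values are assumptions for illustration, not a statement about Mahout's internals.

```java
// Illustrative sketch: similarity-weighted average of neighbors' preferences.
class WeightedEstimate {

  // similarities[i] is neighbor i's similarity to the target user,
  // preferences[i] is that neighbor's preference for the item being estimated.
  static double estimate(double[] similarities, double[] preferences) {
    double weightedSum = 0.0;
    double totalSimilarity = 0.0;
    for (int i = 0; i < similarities.length; i++) {
      weightedSum += similarities[i] * preferences[i];
      totalSimilarity += similarities[i];
    }
    return weightedSum / totalSimilarity;
  }

  public static void main(String[] args) {
    // Assumed Pearson similarities to user 1: user 4 -> 1.0, user 5 -> 2.5 / sqrt(7), about 0.9449
    double[] similarities = {1.0, 2.5 / Math.sqrt(7.0)};
    // Users 4 and 5 rated item 104 with 4.5 and 4.0
    System.out.println(estimate(similarities, new double[] {4.5, 4.0})); // about 4.2571
    // Users 4 and 5 both rated item 106 with 4.0
    System.out.println(estimate(similarities, new double[] {4.0, 4.0})); // about 4.0
  }
}
```

Under these assumptions item 104 scores higher than item 106, which is consistent with the output shown on the earlier slide.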
Component Interaction
A user-based Recommender
• Application
• <<interface>> Recommender
• <<interface>> UserSimilarity
• <<interface>> UserNeighborhood
• <<interface>> DataModel
UserNeighborhood
• NearestNUserNeighborhood: a neighborhood around user 1 is chosen to consist of the three most similar users: 5, 4, and 2
• ThresholdUserNeighborhood: defines a neighborhood of most-similar users with a similarity threshold
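The difference between the two neighborhood strategies can be sketched in plain Java. The class `NeighborhoodSketch` is hypothetical and does not use Mahout; the similarity values to user 1 are rounded Pearson correlations worked out by hand from the sample data, and the 0.7 threshold is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the two neighborhood strategies (not Mahout code).
class NeighborhoodSketch {

  // The n users with the highest similarity, most similar first.
  static List<Long> nearestN(Map<Long, Double> similarities, int n) {
    List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
    entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
    List<Long> neighbors = new ArrayList<>();
    for (int i = 0; i < Math.min(n, entries.size()); i++) {
      neighbors.add(entries.get(i).getKey());
    }
    return neighbors;
  }

  // All users whose similarity is at least the given threshold.
  static List<Long> threshold(Map<Long, Double> similarities, double minSimilarity) {
    List<Long> neighbors = new ArrayList<>();
    for (Map.Entry<Long, Double> entry : similarities.entrySet()) {
      if (entry.getValue() >= minSimilarity) {
        neighbors.add(entry.getKey());
      }
    }
    return neighbors;
  }

  public static void main(String[] args) {
    // Assumed similarities of other users to user 1 (user 3 shares too few items to compare)
    Map<Long, Double> similarities = new LinkedHashMap<>();
    similarities.put(2L, -0.76);
    similarities.put(4L, 1.0);
    similarities.put(5L, 0.94);
    System.out.println(nearestN(similarities, 3));    // [4, 5, 2]
    System.out.println(threshold(similarities, 0.7)); // [4, 5]
  }
}
```

A fixed-size neighborhood can pull in a dissimilar user such as user 2, while a threshold keeps only users that are actually similar but may return very few of them.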
Algorithms
User Similarity
Implementations of this interface define a notion of similarity between two users. Implementations should return values in the range -1.0 to 1.0, with 1.0 representing perfect similarity.
<<interface>> UserSimilarity, with implementations such as:
• LogLikelihoodSimilarity
• TanimotoCoefficientSimilarity
• EuclideanDistanceSimilarity
• PearsonCorrelationSimilarity
• UncenteredCosineSimilarity
• ...
Similarity between data objects can be represented in a variety of ways:
• Distance between data objects is the sum of the distances of each attribute of the data objects (e.g. Euclidean distance)
• Measuring how the attributes of both data objects change with respect to the variation of the mean value for the attributes (Pearson correlation coefficient)
• Using the word frequencies for each document, the normalized dot product of the frequencies can be used as a measure of similarity (cosine similarity)
• And a few more ...
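As an illustration of the cosine measure from the list above, here is a small plain-Java sketch. The word-frequency vectors are made-up example data and the class name is hypothetical.

```java
// Illustrative sketch: cosine similarity as the normalized dot product of two vectors.
class CosineSimilaritySketch {

  static double cosine(double[] a, double[] b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Hypothetical word counts for three documents over the same three words
    double[] doc1 = {2.0, 1.0, 0.0};
    double[] doc2 = {4.0, 2.0, 0.0}; // same word proportions as doc1
    double[] doc3 = {0.0, 0.0, 3.0}; // shares no words with doc1
    System.out.println(cosine(doc1, doc2)); // close to 1.0
    System.out.println(cosine(doc1, doc3)); // 0.0
  }
}
```

Because the dot product is normalized by the vector lengths, a document that is simply twice as long as another with the same word proportions still counts as nearly identical.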
Euclidean Distance
Similarity between two data objects:
Mathematically & Plot
[Plot: users 1 to 5 positioned by their preferences for items 101 and 102, both axes from 0 to 5]
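A minimal plain-Java sketch of a distance-based similarity follows. It uses the preferences of users 1 and 5 for items 101 and 102 from the sample data. Mapping the distance d to a similarity with 1 / (1 + d) is one common choice and is an assumption here; Mahout's EuclideanDistanceSimilarity may normalize differently.

```java
// Illustrative sketch: Euclidean distance turned into a similarity in (0, 1].
class EuclideanSimilaritySketch {

  // Straight-line distance between two preference vectors of equal length.
  static double distance(double[] a, double[] b) {
    double sumOfSquares = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sumOfSquares += diff * diff;
    }
    return Math.sqrt(sumOfSquares);
  }

  // Assumed mapping: identical vectors give 1.0, distant vectors approach 0.
  static double similarity(double[] a, double[] b) {
    return 1.0 / (1.0 + distance(a, b));
  }

  public static void main(String[] args) {
    double[] user1 = {5.0, 3.0}; // preferences for items 101 and 102
    double[] user5 = {4.0, 3.0};
    System.out.println(distance(user1, user5));   // 1.0
    System.out.println(similarity(user1, user5)); // 0.5
  }
}
```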
Pearson Correlation
Similarity between two data objects:
[Plot: user 5’s preferences against user 1’s, points labelled 101 to 104, both axes from 0 to 5]
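The Pearson correlation between two users can be sketched in plain Java over the items both have rated. The class below is a hypothetical illustration, not Mahout's implementation; the preference values are taken from the sample data for items 101, 102 and 103.

```java
// Illustrative sketch: Pearson correlation over the items two users have both rated.
class PearsonSketch {

  static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i] / n;
      meanY += y[i] / n;
    }
    double covariance = 0.0;
    double varianceX = 0.0;
    double varianceY = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX; // deviation from user x's mean preference
      double dy = y[i] - meanY; // deviation from user y's mean preference
      covariance += dx * dy;
      varianceX += dx * dx;
      varianceY += dy * dy;
    }
    return covariance / Math.sqrt(varianceX * varianceY);
  }

  public static void main(String[] args) {
    double[] user1 = {5.0, 3.0, 2.5}; // items 101, 102, 103
    double[] user5 = {4.0, 3.0, 2.0};
    double[] user2 = {2.0, 2.5, 5.0};
    System.out.println(pearson(user1, user5)); // about 0.94, similar tastes
    System.out.println(pearson(user1, user2)); // about -0.76, opposing tastes
  }
}
```

The signs match the earlier observation that users 1 and 5 have similar tastes while users 1 and 2 run counter to each other.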
Thank you!
Questions? Jean-Pierre König, [email protected]
Literature & Links
• References: the content of this presentation is based on:
  • Chapters 1, 2 and 4 of: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
  • Chapter “Discussion of Similarity Metrics” of: Shanley, Philip. Data Mining Portfolio.
• Links: http://bitly.com/bundles/jpkoenig/1