Recommender Engines Seminar Paper

RWTH Aachen University, University of Bonn, Fraunhofer FIT
E-Commerce Seminar WT 08/09

Thomas Hess (289222)

February 1, 2009

Abstract

Recommender engines are used by more and more e-commerce businesses to help consumers find products they are interested in. This paper describes what recommender engines are and what role they play in e-commerce. Recommender engines use various techniques that draw on different knowledge sources to make recommendations. The paper explains these techniques and their strengths and weaknesses. Some of the common issues that recommender systems face are discussed and possible solutions are presented. Finally, examples of recommender engines in e-commerce are described, showing which techniques they use and how the e-businesses utilize recommendations on their websites.

Contents

1 Introduction

2 Recommender Techniques
  2.1 Non-Personalized Recommendation
  2.2 Demographic Recommendation
  2.3 Content-Based Recommendation
  2.4 Collaborative Filtering
    2.4.1 User-Based Approach
    2.4.2 Item-Based Approach
    2.4.3 Model-Based Approach
  2.5 Hybrid Approaches

3 Issues And Solutions
  3.1 Data Collection
  3.2 Cold Start
  3.3 Stability vs. Plasticity
  3.4 Sparsity
  3.5 Performance & Scalability
  3.6 User Input Consistency
  3.7 Privacy

4 Recommender Engine Examples
  4.1 ChoiceStream
  4.2 Amazon.com
  4.3 Digg

5 Conclusion

List of Figures

2.1 Knowledge Sources of Recommender Engines
2.2 Non-Personalized Recommendation
2.3 Demographic Recommendation
2.4 Content-Based Recommendation
2.5 User-Based Collaborative Filtering
2.6 User-Based Collaborative Filtering Example
2.7 Item-Based Collaborative Filtering
2.8 Item-Based Collaborative Filtering Example
2.9 Model-Based Collaborative Filtering

4.1 ChoiceStream Recommender Engine
4.2 Amazon – Item With Recommendations
4.3 Amazon – Shopping Cart With Recommendations
4.4 Amazon – Your Recommendations
4.5 Amazon – Recommendation Details
4.6 Amazon – Your Purchases
4.7 Digg – Story
4.8 Digg – Topic Settings
4.9 Digg – Homepage
4.10 Digg – Recommendations
4.11 Digg – Correlated User

1 Introduction

Recommender engines are personalized information agents that attempt to predict which items out of a large pool a user may be interested in. These items can be of any type, like movies, music, books, websites, or news articles. The user’s interest in an item is expressed through the rating the user gives the item. A recommendation system has to predict the ratings for items that the user has not yet seen. With these estimated ratings the system can recommend the items that have the highest estimated rating.

Recommender engines have become an integral part of many e-commerce businesses [1, 2]. They are a serious business tool that is used by an ever-increasing number of online stores. Recommender systems are a unique feature of e-commerce, as websites are able to track everything their customers do, in contrast to real stores. The knowledge learned from the customers’ behaviour is the basis for the recommendations. Because online businesses have no real space constraint, they can offer much larger stocks, providing their customers with more choices. These large stocks become impossible for customers to search through, so e-commerce stores must provide personalized versions with reduced choices to the individual users. One way to achieve this is the use of recommender engines.

For e-commerce vendors, recommender engines provide multiple benefits. Good recommender systems present customers with products they are interested in but did not plan to buy, leading them to purchase more items [2, 3, 4]. These unplanned purchases are not yet happening as often in online stores as in traditional stores [2]. Recommender engines can help to gain consumers’ loyalty, which is an essential business strategy in e-commerce as the competitor is always just “one click away” [4]. Because recommender systems make it easier and faster to find new items, customers come back more often [2]. The more a user uses a website and purchases items, the more the recommender engine learns about the user and the better the recommendations get. This helps to build a “value-added relationship” between the website and the user [4]. Recommender systems are also a way to promote older or low-demand items, such as niche products [2].

2 Recommender Techniques

The techniques used by recommender engines can be classified based on the information sources they use [5, 2]. The available sources are the user features (demographics, e.g. age, gender, profession, income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered through questionnaires, explicit ratings, transaction data). See figure 2.1.

2.1 Non-Personalized Recommendation

Non-personalized recommendations are identical for each user. The recommendations are either manually selected (e.g. editor choices) or based on the popularity of items (e.g. average ratings, sales data). See figure 2.2.

Figure 2.1: Knowledge Sources of Recommender Engines (From [5])

Figure 2.2: Non-Personalized Recommendation (From [5])

Because non-personalized recommendations are easy to compute, they are popular among e-commerce businesses. They are also an option for websites that offer no personalization.
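A popularity-based recommendation of this kind can be sketched in a few lines. The example below ranks items by their average rating; the function name, the sample data, and the `min_ratings` threshold are illustrative assumptions and do not come from the cited sources.

```python
from collections import defaultdict


def top_items_by_average_rating(ratings, k=3, min_ratings=2):
    """Rank items by mean rating; `ratings` is a list of (user, item, rating)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    # Assumed safeguard: ignore items with too few ratings to be meaningful.
    averages = {
        item: totals[item] / counts[item]
        for item in totals
        if counts[item] >= min_ratings
    }
    # Highest average first; ties are broken alphabetically for determinism.
    return sorted(averages, key=lambda item: (-averages[item], item))[:k]


ratings = [
    ("ann", "book_a", 5), ("bob", "book_a", 4),
    ("ann", "book_b", 2), ("bob", "book_b", 3),
    ("cat", "book_c", 5),
]
print(top_items_by_average_rating(ratings))  # -> ['book_a', 'book_b']
```

Every user receives the same list, which is why no user profile is needed.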

2.2 Demographic Recommendation

Demographic recommendation methods use only the information about the users. The users are categorized based on the attributes of their demographic profiles in order to find users with similar features. The engine then recommends items that are preferred by these similar users. See figure 2.3.

Advantages

• Because user-item ratings are not used, new users can get recommendations before they have rated any item.

• Knowledge about the items and their features is not needed, therefore the technique is domain-independent.

Figure 2.3: Demographic Recommendation (From [5])

Figure 2.4: Content-Based Recommendation (From [5])

Problems

• Gathering the required demographic data leads to privacy issues, see 3.7.

• Demographic classification is too crude for highly personalized recommendations [5, 3]. The generalisations created from the classification are often false, especially when it comes to cultural items like books, music, or movies [6, 3].

• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).

• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.3 Content-Based Recommendation

Content-based recommendation methods use the information about item features and the ratings a user has given to items. The technique combines these ratings into a profile of the user’s interests based on the features of the rated items. The engine then can find items with the preferred features and recommend the items with the highest similarity to the ones preferred in the past. See figure 2.4. The recommendations of a content-based system are based on individual information and ignore contributions from other users.

The profiles of the users’ interests are often represented as vectors of weights on item features. But if automatic learning methods, like a rule induction algorithm, are used to generate them, they can also be rule-based [7].
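The vector representation can be illustrated with a minimal sketch. It assumes that each item is described by a dictionary of feature weights, that a user profile is the rating-weighted sum of the feature vectors of rated items, and that cosine similarity is used for matching; all names and data are hypothetical, and real systems may weight and normalize differently.

```python
import math


def build_profile(rated_items, item_features):
    """Sum each rated item's feature vector, weighted by the user's rating."""
    profile = {}
    for item, rating in rated_items.items():
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0.0) + rating * weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(rated_items, item_features, k=2):
    """Return the k unrated items most similar to the user's profile."""
    profile = build_profile(rated_items, item_features)
    candidates = [i for i in item_features if i not in rated_items]
    return sorted(
        candidates, key=lambda i: (-cosine(profile, item_features[i]), i)
    )[:k]


item_features = {
    "alien": {"scifi": 1.0, "horror": 1.0},
    "blade_runner": {"scifi": 1.0, "noir": 1.0},
    "notting_hill": {"romance": 1.0, "comedy": 1.0},
    "solaris": {"scifi": 1.0},
}
rated = {"alien": 5, "notting_hill": 1}
print(recommend(rated, item_features))  # -> ['solaris', 'blade_runner']
```

Only the target user's own ratings enter the computation, which matches the observation that content-based systems ignore contributions from other users.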

Content-based recommendation works well if the items can be properly represented as a set of features. The quality of the recommendations depends directly on the quality of the available descriptive data. In order to have a sufficient set of features, the item descriptions must either be in a form from which features can be extracted automatically with information retrieval techniques (e.g. text), or the features must be assigned manually, which takes a lot of resources [8]. Besides objective categorizations, systems can also use (user-generated) tags associated with items that provide a subjective view.

Problems

• Content analysis is necessary to determine the item features.

• The technique depends not only on the quality of the item metadata but also on the homogeneity of the stock, so items can be categorized.

• The quality of items cannot be evaluated. The similarity computation is limited to the item features [5].

• The technique suffers from the cold start problem for new users, see 3.2.

• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.4 Collaborative Filtering

Collaborative filtering techniques use user behaviour in the form of user-item ratings as their information source. The concept is to find correlations between users or between items. Collaborative filtering is widely implemented and the most mature recommendation technique. Three main approaches of collaborative filtering can be distinguished: user-based, item-based, and model-based approaches.

Advantages

• As with demographic recommendation, no knowledge about the item features is needed. Collaborative filtering works completely independently of machine-readable item representations. It is therefore domain-independent.

• The quality (not just the relevancy) of items can be evaluated, as it is also expressed through user-item ratings [5].

• Collaborative filtering techniques are able to make recommendations “outside the box” because they look outside the preferences of the individual user [1].

Figure 2.5: User-Based Collaborative Filtering (From [5])

Problems

• The quality of the recommendations depends on the size of the historical rating data set.

• The technique suffers from the cold start problem for new users and new items, see 3.2.

• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).

• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.4.1 User-Based Approach

The user-based approach is based on the assumption that users who rated the same items similarly probably have the same taste. It makes user-to-user correlations by using the rating profiles of different users to find highly correlated users. These users form like-minded neighbourhoods based on their shared item preferences. The engine then can recommend the items preferred by the other users in the neighbourhood. See figure 2.5.

Figure 2.6 shows an example of user-based collaborative recommendation.

But if there are few overlapping ratings across users in the data set, the user-based approach runs into the sparsity problem, see 3.4.

User-based collaborative filtering does not scale well for many users and items, because the analysis and comparison processes become more complex, see 3.5.
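The user-based approach can be sketched as follows. The sketch assumes the Pearson correlation over co-rated items as the similarity measure and a similarity-weighted average as the prediction rule; the text does not prescribe either choice, and the function names and data are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = ratings_a.keys() & ratings_b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend_user_based(target, all_ratings, neighbours=2, k=2):
    """Recommend unseen items preferred by the most correlated users."""
    sims = {
        user: pearson(all_ratings[target], ratings)
        for user, ratings in all_ratings.items()
        if user != target
    }
    # Neighbourhood: the most similar users with a positive correlation.
    ranked = sorted(sims, key=lambda u: (-sims[u], u))
    hood = [u for u in ranked if sims[u] > 0][:neighbours]
    scores, weights = {}, {}
    for user in hood:
        for item, rating in all_ratings[user].items():
            if item in all_ratings[target]:
                continue  # only recommend items the target has not rated
            scores[item] = scores.get(item, 0.0) + sims[user] * rating
            weights[item] = weights.get(item, 0.0) + sims[user]
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda i: (-predicted[i], i))[:k]


all_ratings = {
    "ann": {"a": 5, "b": 1, "c": 4},
    "bob": {"a": 5, "b": 2, "c": 4, "d": 5},
    "cat": {"a": 1, "b": 5, "c": 2, "e": 5},
}
print(recommend_user_based("ann", all_ratings))  # -> ['d']
```

In this data set only "bob" correlates positively with "ann", so only his unseen item is recommended. Because every similarity is computed at request time against all other users, the sketch also shows why the approach becomes expensive as the number of users grows.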

2.4.2 Item-Based Approach

The item-based approach focuses on items, assuming that items rated similarly are probably similar. It compares items based on the shared appreciation of users, in order to create neighbourhoods of similar items. The engine then recommends the neighbouring items of the user’s known preferred ones. See figure 2.7.

Figure 2.6: User-Based Collaborative Filtering Example (From [5])

Figure 2.8 shows an example of item-based collaborative recommendation.

Item-based collaborative filtering is more scalable than the user-based approach, as the correlations are drawn among a limited number of products, instead of a potentially very large number of users. Items are also easy to categorize, while users’ activities must be examined and analyzed. See 3.5.

Also, because the number of items is usually smaller than the number of users, the item-based approach has a reduced sparsity problem (see 3.4) in comparison to the user-based approach.
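A minimal sketch of the item-based approach is given below. It assumes cosine similarity between item rating vectors (one entry per user) and scores each candidate by its rating-weighted similarity to the items the user has already rated; the names and data are hypothetical, and other similarity measures are equally possible.

```python
import math


def item_vectors(all_ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    vectors = {}
    for user, ratings in all_ratings.items():
        for item, rating in ratings.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_item_based(target, all_ratings, k=2):
    """Recommend items that neighbour the items the target user rated."""
    vectors = item_vectors(all_ratings)
    rated = all_ratings[target]
    scores = {}
    for candidate in vectors:
        if candidate in rated:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * rating
            for item, rating in rated.items()
        )
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


all_ratings = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 5, "c": 1},
    "cat": {"a": 1, "c": 5, "d": 4},
    "dan": {"c": 5, "d": 5},
    "eve": {"a": 5},
}
print(recommend_item_based("eve", all_ratings))  # -> ['b', 'c']
```

Item "b" is recommended first because the users who liked "a" also rated "b" highly. The item-to-item similarities do not depend on the target user, so they could be computed ahead of time.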

Figure 2.7: Item-Based Collaborative Filtering (From [5])

Figure 2.8: Item-Based Collaborative Filtering Example (From [5])

2.4.3 Model-Based Approach

For huge data sets, the quadratic complexity of computations on the user-item rating matrix becomes very high [7]. But in real applications predictions must be made quickly. Model-based approaches address this problem by deriving a model for prediction from historical user-item rating data, in order to make the online prediction process faster. To build the model, learning techniques like Bayesian networks, neural networks, or latent semantic indexing are used. For an accurate model a large amount of data must be available. The engine then makes the online recommendations by using the model. See figure 2.9.

As the model is built in advance of the online recommendation processes, this approach has a higher performance than the memory-based approaches and avoids the scalability problem, see 3.5. Depending on the learning techniques used to create the model, this approach can lead to a higher recommendation accuracy and a reduced sparsity problem [5].

The major drawback of the model-based approach is that the recommendation results do not adapt automatically to data changes. Instead, the model must be rebuilt to reflect updated data.

Figure 2.9: Model-Based Collaborative Filtering (From [5])

2.5 Hybrid Approaches

Hybrid approaches combine collaborative and demographic or content-based methods in order to overcome their drawbacks. Collaborative filtering systems often result in better predictive performance but have problems when limited user-item ratings are available [7]. Demographic and content-based recommendation systems work without rating data and therefore can compensate for the cold start problem [1].

There are various methods to combine recommender techniques in a hybrid system [1, 9]:

Weighted Hybridization The scores of the different recommendation components are combined numerically. Each component of the hybrid system scores a given item and the scores are combined using a linear formula.

Switching Hybridization The system chooses among recommendation components based on the situation and applies the selected one. Some reliable criterion must be available on which to base the switching decision.

Mixed Hybridization Recommendations from different recommenders are presented side-by-side in a combined list. The results of the recommender systems are not combined.

Feature Combination Features derived from different knowledge sources are combined together and then injected into a single recommendation algorithm.

Feature Augmentation One recommendation technique is used to compute a feature or set of features, which is then part of the input to the next technique.

Cascaded Hybridization Recommenders are given strict priority, with the lower priority ones breaking ties in the scoring of the higher ones.

Meta-Level Hybridization One recommendation technique is applied to produce a model, which is then used as the input for another technique.
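As an illustration of the first method in the list, weighted hybridization can be sketched as a linear combination of component scores. The component names, the weights, and the assumption that all components score on a common scale from 0 to 1 are hypothetical.

```python
def weighted_hybrid(component_scores, weights):
    """Combine per-component item scores with a linear formula.

    component_scores: component name -> {item: score}
    weights:          component name -> weight of that component
    """
    combined = {}
    for name, scores in component_scores.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    # Best combined score first; ties are broken alphabetically.
    return sorted(combined, key=lambda item: (-combined[item], item))


component_scores = {
    "collaborative": {"x": 0.9, "y": 0.2, "z": 0.5},
    "content": {"x": 0.1, "y": 0.8, "z": 0.6},
}
weights = {"collaborative": 0.7, "content": 0.3}
print(weighted_hybrid(component_scores, weights))  # -> ['x', 'z', 'y']
```

With these weights the collaborative component dominates; shifting weight towards the content-based component would move "y" up the list, which is one way a system could react when few ratings are available.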

3 Issues And Solutions

3.1 Data Collection

The data used by recommender engines can be categorized into explicit and implicit data [2].

Explicit data is all data that users themselves feed into the system, such as demographic data, information about their preferences (e.g. collected through questionnaires), search terms, and explicit ratings and reviews of items (wisdom of the crowds). The collection of explicit data must not be intrusive or time-consuming. The way the explicit data is collected can affect the quality and amount of data the users will provide [10].

Recommendation systems should not rely completely on explicit data. Websites are able to track their users’ activities in order to acquire implicit data. The most important implicit data source in e-commerce is the transaction data including the purchase information. Other sources are web usage patterns like click sequences or reading times, or search engine referrers. Implicit data needs to be analyzed first before it can be used to describe user features or user-item ratings.

3.2 Cold Start

The cold start problem occurs when too little rating data is available in the initial state. The recommender system then lacks data to produce appropriate recommendations. A distinction is made between the new user and new item problem.

New User Problem When recommendations follow from user-to-user correlations based on the accumulation of ratings, a user with few ratings is difficult to categorize.

New Item Problem An item with few ratings cannot easily be recommended. This problem occurs particularly in domains with many new items (e.g. news articles). As the problem also occurs for long tail items, it is also called the “long tail problem” [10].

A solution to the cold start problem is the combination of the collaborative technique with demographic (for the new user problem) or content-based (for the new item problem) techniques in a hybrid recommender engine, see 2.5. That way the cold start problem gets compensated by techniques that don’t rely on user-item ratings.

Other solutions to reduce the cold start problem are the use of default ratings (e.g. from the average rating of all users) [6, 10] or the use of active learning techniques in model-based recommendation techniques [5].

3.3 Stability vs. Plasticity

The converse of the cold start problem is the stability vs. plasticity problem. When users have rated a lot of items, their preferences in the established user profiles are difficult to change [1, 9]. But because in reality taste evolves, this becomes a problem.

The solution for this is to gradually discount older ratings so that they have less influence. But by doing so, engines risk losing information about long-term interests [1, 9].
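One possible way to discount older ratings is an exponential decay with a fixed half-life. This is a sketch under that assumption; the cited sources do not prescribe a particular decay function, and the half-life of 180 days is an arbitrary illustrative value.

```python
def decayed_weight(age_days, half_life_days=180.0):
    """Weight of a rating that is `age_days` old; halves every half-life."""
    return 0.5 ** (age_days / half_life_days)


def decayed_average(ratings, half_life_days=180.0):
    """Age-weighted mean of (rating, age_days) pairs."""
    weights = [decayed_weight(age, half_life_days) for _rating, age in ratings]
    weighted_sum = sum(w * rating for w, (rating, _age) in zip(weights, ratings))
    return weighted_sum / sum(weights)


# A recent 5-star rating and a 1-star rating from 180 days ago.
print(round(decayed_average([(5, 0), (1, 180)]), 2))  # -> 3.67
```

An unweighted average of the two ratings would be 3.0; the decay pulls the estimate towards the recent rating. A shorter half-life makes the profile more plastic but also discards long-term interests faster, which is the trade-off described above.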

Related to this problem is that users may use a website with different intentions. For example, one day a customer buys books for himself, but the next day he is looking for a present for someone else.

3.4 Sparsity

In most use cases for recommender systems, due to the catalog sizes of e-business vendors, the number of ratings already obtained is usually very small compared to the number of ratings that need to be predicted. But collaborative filtering techniques depend on an overlap in ratings across users and have difficulties when the space of ratings is sparse (few users have rated the same items). Sparsity in the user-item rating matrix degrades the quality of the recommendations.

To reduce the sparsity, the rating data needs to be adjusted by either adding additional ratings or reducing the dimensionality of the matrix. Ratings can be augmented by inserting simulated values on behalf of the users. These can be ratings derived from other (implicit) data sources, like item views or clicks, or default values [6].

The dimensionality of the rating matrix can be reduced by techniques such as singular value decomposition [1]. Singular value decomposition is a well-known method for matrix factorization that provides the best lower rank approximations of the original matrix. Dimensionality reduction techniques are often used in model-based collaborative filtering approaches [1].
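The statement about the best lower rank approximation can be written out as follows. The notation is introduced here for illustration and is not taken from the cited sources: R is the M×N user-item rating matrix with missing entries already filled (for example with default values), σ_1 ≥ σ_2 ≥ … are its singular values with left and right singular vectors u_i and v_i, and k is the chosen reduced rank.

```latex
R = U \Sigma V^{\top}, \qquad
R_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad
\lVert R - R_k \rVert_F = \min_{\operatorname{rank}(B) \le k} \lVert R - B \rVert_F
```

The truncated matrix R_k keeps only the k strongest latent dimensions, and its entry in row u and column i can serve as the predicted rating of user u for item i.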

3.5 Performance & Scalability

Performance and scalability are important issues for recommender systems as e-commerce websites must be able to determine recommendations in real-time and often deal with huge data sets of millions of customers and items. The big growth rates of e-businesses are making the sets even larger in the user dimension [6].

Performance is determined by the computational complexity of a recommendation technique. Techniques that calculate correlation coefficients for M users over N items have a complexity of O(M×N) in the worst case. Due to the common sparsity of the user-item rating matrix the performance tends to be closer to O(M + N) [11]. However, for large data sets this still leads to performance and scaling issues.

Techniques that can perform the most expensive calculations offline scale better than techniques where everything must be calculated online, in real time [11]. Demographic and content-based recommendation as well as item- and model-based collaborative filtering can utilize offline computation. But user-based collaborative filtering can do little or no offline computing, which makes it impractical for large data sets [11].

In addition to performing calculations offline, all methods that help reduce the size of the data set improve performance and scalability of a recommendation technique [6]. For example, users with very few ratings or very popular or unpopular items could be discarded [11]. But these methods also reduce the recommendation quality.

3.6 User Input Consistency

Recommender techniques that work with user-to-user correlations, such as demographic recommendation or collaborative filtering, depend on high correlation coefficients between the users in a data set.

Users can be split into three classes based on their correlation coefficients with other users [6]. The majority of users fall into the class of “white sheep”, which have a high rating correlation with many other users. Engines can easily find recommendations for these users. The opposite type are the “black sheep”. For them there are only few or no correlating users. This makes it very difficult to find recommendations for them. But when the number of overall users in a data set increases, the chance to find similar users increases as well.

The bigger problem is the “gray sheep” problem. These users have different opinions or an unusual taste, which results in low correlation coefficients with many users. They fall on a border between user cliques. Recommendations for them are very difficult to find and they also cause odd recommendations for their correlated users.

3.7 Privacy

Privacy is an important issue in recommender systems. In order to provide personalized recommendations, recommender systems must know something about the users. In fact, the more the systems know, the more accurate the recommendations can get. Users are reasonably concerned about what information is collected, how it is used, and if it is stored.

These privacy concerns affect both the collection of explicit and implicit data. Regarding explicit data, users are reluctant to disclose information about themselves and their interests [2, 4]. If questionnaires get too personal, users may provide false information in order to protect their privacy [4]. Recommender engines should be able to deal with privacy-concerned users and not solely rely on explicit data or recommender techniques that do, like demographic recommendation.

Regarding implicit data that gets acquired by tracking users’ behaviour, there are concerns that personal taste or private actions get revealed through the recommendations [5]. Users fear that extensive consumer profiles get created.

To confront these concerns, e-commerce businesses must provide privacy protection mechanisms [5] and make transparent which data gets acquired and analyzed. Usage and storage restrictions must be assured through privacy policies [4].

4 Recommender Engine Examples

Recommender engines are developed and run by independent technology vendors and by e-commerce businesses themselves.

The business model of recommendation technology vendors is either to offer the recommender engine as a hosted service or to license their engines to e-commerce businesses. Examples for technology vendors are: ChoiceStream¹, Baynote², ExpertMaker³, Loomia⁴, Criteo⁵, SourceLight⁶, and Collarity⁷.

Especially bigger e-commerce businesses develop their own recommender solutions because they have unique requirements, want unique features, or deal with items that third-party products are not suited for. Examples are: Amazon.com⁸, Netflix⁹, Digg¹⁰, The Internet Movie Database (IMDb)¹¹, Pandora¹², and Last.fm¹³.

In the following the techniques and usages of the recommender engines of ChoiceStream, Ama-zon.com, and Digg are described in detail.

¹ http://www.choicestream.com
² http://www.baynote.com
³ http://www.expertmaker.com
⁴ http://www.loomia.com
⁵ http://www.criteo.com
⁶ http://www.sourcelight.com
⁷ http://www.collarity.com
⁸ http://www.amazon.com
⁹ http://www.netflix.com
¹⁰ http://digg.com
¹¹ http://www.imdb.com
¹² http://www.pandora.com
¹³ http://www.last.fm

4.1 ChoiceStream

ChoiceStream is a personalisation company that offers its recommendation technology “RealRelevance Recommendations” as a fully hosted service for e-commerce vendors.

Because the different recommendation techniques all have their drawbacks and are not suited for all fields of application, ChoiceStream is using a hybrid system based on a variety of techniques that are chosen and combined depending on the concrete recommendation use case at hand [10]. The use cases that ChoiceStream distinguishes are listed in table 4.1.

The recommendation techniques used by the ChoiceStream recommender engine are [10]:

Collaborative Filtering Both user-based and item-based collaborative filtering are used.

Collaborative Filtering Using Multiple Correlation Tables Use of multiple correlation tables (e.g. item views or clicks in addition to transactions) to overcome the cold start problem (see 3.2).

Cohort Analysis Creation of groups of similar users, called cohorts, in order to make better recommendations for users with sparse rating data.

Use Case                     Definition
Rich Profile User            Users for whom you have a lot of data (e.g. more than 5 transactions).
Sparse Profile User          Users for whom you have little data (e.g. fewer than 1 to 4 transactions).
Anonymous / New User         Users for whom you have no data.
Popular Content              Items in your catalog that you can determine are “most popular”. Typically these will be few in number, but very high volume.
Mainstream Content           Items for which you have recorded patterns of behavior (e.g. more than 20 transactions per the items).
New Content                  Items for which there are no past transactions.
Long Tail Content            Items in a catalog which are less well known, but still profitable, and for which there are few past transactions.
Business Goal Optimization   The requirement to maximize a metric other than the number of transactions, such as revenue, margin, or order size.

Table 4.1: ChoiceStream – Common Use Cases Requiring Different Algorithms (From [10])

Selective Filtering By selective filtering the most popular items are taken out of the recommendations, so they don’t dominate and customers can find less popular items.

Attribute Correlations Item attributes are used to make content-based recommendations to overcome the cold start problems of collaborative filtering.

Default Recommendations Default recommendations are the fallback function if all other techniques fail to determine recommendations.

Business Goal Optimization With a multi-term scoring function the recommendation algorithm can be adjusted to, for example, preferably recommend higher-priced items in order to increase revenue.

Figure 4.1 shows what techniques are used for which use cases by the ChoiceStream recommender engine.

Figure 4.1: ChoiceStream Recommender Engine (From [10])

4.2 Amazon.com

Amazon.com, founded in 1994, is the largest online retailer worldwide and one of the best-known examples of e-commerce businesses utilizing a recommender engine. Amazon uses its recommendation engine extensively to personalize its website.

Amazon’s recommender engine is based on item-based collaborative filtering [5, 6, 11]. It looks for items correlating to the ones purchased and rated and combines the highly correlated items into a recommendation list [11].

The recommendation engine consists of an online and an offline component. The offline component creates an item-to-item matrix with all similar items. The online component can then look up recommendations in the matrix when they are needed [11]. To build the item-to-item matrix, a similarity function is used that determines the correlation coefficient between item pairs that customers tend to purchase together. This expensive calculation is done offline [11, 6]. The online component then only has to look up similar items to the ones a user already has purchased or rated. This is a very easy and fast operation that can be done online in real-time. Its complexity only depends on the number of items a customer is associated with [11].
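The split into an offline and an online component can be sketched as follows. This is not Amazon's actual implementation: the similarity function (cosine similarity over binary purchase vectors), the function names, and the sample data are assumptions chosen only to illustrate the division of work.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_similar_items_table(purchases):
    """Offline step: purchases maps customer -> set of purchased items."""
    buyers = defaultdict(int)    # item -> number of customers who bought it
    together = defaultdict(int)  # (item_a, item_b) -> co-purchase count
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1
    table = defaultdict(dict)
    for (a, b), count in together.items():
        # Cosine similarity between the binary purchase vectors of a and b.
        sim = count / math.sqrt(buyers[a] * buyers[b])
        table[a][b] = sim
        table[b][a] = sim
    return dict(table)


def recommend_online(customer_items, table, k=2):
    """Online step: only look up neighbours of the customer's own items."""
    scores = defaultdict(float)
    for item in customer_items:
        for other, sim in table.get(item, {}).items():
            if other not in customer_items:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


purchases = {
    "c1": {"camera", "sd_card", "tripod"},
    "c2": {"camera", "sd_card"},
    "c3": {"camera", "bag"},
    "c4": {"novel"},
}
table = build_similar_items_table(purchases)
print(recommend_online({"camera"}, table, k=1))  # -> ['sd_card']
```

The loop in `recommend_online` runs over the customer's own items and their precomputed neighbours only, so its cost does not grow with the total number of customers or catalog items.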

By performing the most expensive calculations offline, Amazon’s recommendation system can deal with the huge data set of approximately 50 million customers per month (only from the U.S.) and several million catalog items. The online component scales independently of the catalog size and the number of customers [11]. Another benefit of the created similar-items table is that the algorithm produces higher quality recommendations for users with little user-item rating data than traditional collaborative filtering [11].

Customers Who Bought On the information page for every item, Amazon shows the “Customers Who Bought” feature that recommends items frequently purchased by customers who purchased the selected item, see Figure 4.2.

As figure 4.3 shows, the feature is also used on the shopping cart page. This works as the equivalent to the impulse items in a supermarket checkout line [11], but here the impulse items are personalized for each customer.

Figure 4.2: Amazon – Item With Recommendations

Figure 4.3: Amazon – Shopping Cart With Recommendations

Your Recommendations On the page “Your Recommendations” all recommendations are listedwith the ones derived from recent purchases in front, see Figure 4.4. They can be filtered by productline and subject area. Users can mark the recommended items as already owned or as not interestingas well as rate them in order to provide the recommender engine with further rating data to influencewhat gets recommended. It is also shown why an item is recommended, that is which purchased itemis correlated to the recommended item.

Additionally, the user can view a detail page for every recommendation that lists all correlations to purchased or otherwise rated items, see Figure 4.5.

Amazon encourages users to refine their user-item rating data by giving them the option to rate purchased items on a 5-point scale. On a page that lists all previous purchases, the items can be rated and also excluded from the recommendation calculation, see Figure 4.6.

Figure 4.4: Amazon – Your Recommendations. (1) Recommended items can be marked as owned or not interested in and be rated. (2) It is shown why items are recommended.

Figure 4.5: Amazon – Recommendation Details

Figure 4.6: Amazon – Your Purchases. (1) Items can be rated. (2) Items can be excluded from the recommendation engine.

4.3 Digg

Digg is a social news site, launched in 2004, where users can submit links to websites. Users can rate these links, called stories, by “digging” or “burying” them. Stories can also be favorited, shared, and commented on, see Figure 4.7. The stories are categorized into various topics. Users can configure which topics they are interested in and will then only see stories in these categories throughout the website, see Figure 4.8.

On the Digg homepage the most popular stories are shown, see Figure 4.9. Popularity is measured by the number of recent “diggs”. The homepage thereby uses non-personalized recommendation.
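
Such a popularity ranking can be sketched in a few lines. The source does not state how “recent” is defined, so the 24-hour window, the function name, and the input format below are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_popular(diggs, now, window=timedelta(hours=24), top_n=10):
    """Rank stories by the number of diggs received within `window`.

    diggs is an iterable of (story_id, timestamp) pairs.
    The 24-hour default is an assumption, not a documented Digg setting.
    """
    cutoff = now - window
    counts = Counter(story for story, when in diggs if when >= cutoff)
    return [story for story, _ in counts.most_common(top_n)]
```

Every visitor sees the same ranking, which is what makes this recommendation non-personalized.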

For registered users, Digg provides personalized recommendations through its own recommendation engine, which is based on user-based collaborative filtering. The engine relies solely on the user-item ratings expressed through the “digg” function. It works without knowledge of the content of the stories [12].

The recommendation engine uses the user’s history of “dugg” stories in the last thirty days to make recommendations [13]. This short time span is appropriate for fast-moving internet news, avoids the stability vs. plasticity problem, and helps to keep the size of the ratings matrix within limits.

Every time a user “diggs” a story, the engine associates the user with all other users who have also “dugg” the story. From these associations the recommender system calculates a correlation coefficient between the users. The coefficient is based on the number of “dugg” stories in common in relation to the total number of stories “dugg” by each of the associated users [13]. The coefficient has a value between zero and one: it is zero if the two users have never “dugg” the same story, and one if the users share all their “dugg” stories. The coefficient calculation automatically accounts for the overall level of user activity. If a user “diggs” a lot of stories, the number of common “dugg” stories must be high to get a high correlation coefficient. If a user “diggs” rarely, a small amount of agreement can suffice.
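
One coefficient with exactly these properties is the Jaccard index, the number of shared stories divided by the number of distinct stories “dugg” by either user. The sketch below uses it as an assumption; the exact formula Digg uses is not given here, and the function name is hypothetical.

```python
def digg_correlation(dugg_a, dugg_b):
    """Jaccard index of two users' sets of dugg story ids (assumed formula).

    Returns 0.0 when the users share no story and 1.0 when the sets are equal.
    """
    union = dugg_a | dugg_b
    if not union:
        return 0.0  # neither user has dugg anything yet
    return len(dugg_a & dugg_b) / len(union)
```

For example, a user with 100 “dugg” stories who shares 5 of them with a user who has “dugg” only those 5 gets a coefficient of 5/100 = 0.05, whereas two users with 5 stories each who share 4 of them reach 4/6, or about 0.67.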

The users highly correlated to a user are called “Diggers Like You”. The engine recommends the upcoming stories that have been “dugg” by these users, minus the stories the user has already “dugg” or buried. Stories are upcoming if they are newly submitted and have not made it to the homepage yet. The “Diggers Like You” therefore work as a filter for all the upcoming stories. On average, more than 17,000 submissions per day are boiled down to about 300 recommendations [12].
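
The filtering step can be sketched as follows. The data layout, the function name, the Jaccard-style coefficient, and the threshold of 0.2 for “highly correlated” are all assumptions for illustration and are not taken from Digg's documentation.

```python
def recommend_upcoming(user, dugg, buried, upcoming, min_correlation=0.2):
    """Return upcoming stories dugg by users who are correlated with `user`.

    dugg and buried map a user id to a set of story ids;
    upcoming is the set of story ids not yet promoted to the homepage.
    min_correlation is a hypothetical cut-off for "Diggers Like You".
    """
    mine = dugg.get(user, set())
    seen = mine | buried.get(user, set())
    recommended = set()
    for other, theirs in dugg.items():
        if other == user:
            continue
        union = mine | theirs
        # Assumed coefficient: shared stories over all stories of both users.
        correlation = len(mine & theirs) / len(union) if union else 0.0
        if correlation >= min_correlation:
            recommended |= theirs & upcoming
    # Drop everything the user has already dugg or buried.
    return recommended - seen
```

With `dugg = {"u": {1, 2, 3}, "v": {1, 2, 10, 11}, "w": {20}}`, `buried = {"u": {11}}`, and `upcoming = {10, 11, 20}`, only user `v` is sufficiently correlated with `u`, and story 11 is dropped because `u` buried it, so the result is `{10}`.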

Figure 4.7: Digg – Story. (1) Users can “Digg” Stories. (2) Users can Share and Favorite Stories. (3) Recommendations by the Recommender Engine.

Figure 4.8: Digg – Topic Settings

Figure 4.9: Digg – Homepage. (1) Non-Personalized Recommendations. (2) Personalized Recommendations from the Recommender Engine.

A user’s recommended upcoming stories are displayed on the recommendations page, see Figure 4.10. The right pane of the page lists the most highly correlated users with their compatibility percentage, which represents the correlation coefficient. This allows the user to explore the correlated users. For every recommended story, the correlated users who have “dugg” this story are also shown, including their compatibility percentage. Clicking on the compatibility percentage of a correlated user opens a page that displays the correlation to this user in detail, see Figure 4.11. It lists which stories both users have “dugg” and which stories are currently recommended through this correlation. Users can also remove the correlation to this user from their recommendation calculation.

The recommender engine works in real time without prediction models or batch processing. In order to achieve this for more than 2 million users, Digg uses its own graph database [12].

As a social platform, Digg enables users to create social networks by designating other users as friends. Users can explore the stories their friends found interesting, which makes Digg also a social recommendation engine.

Figure 4.10: Digg – Recommendations. (1) Recommendations by the Recommender Engine. (2) Correlated User with Compatibility Percentage. (3) Highly Correlated Users with Compatibility Percentage.

Figure 4.11: Digg – Correlated User. (1) Remove User from the Recommender Engine. (2) Shared “Dugg” Stories.

5 Conclusion

Recommender systems are a powerful technology for personalization. Used in the right way, they can benefit both consumers and businesses. Consumers profit by finding new interesting products and businesses can increase their sales.

As e-commerce continues to grow, recommender engines are challenged to deal with ever greater amounts of data. Systems must therefore be developed further to meet this challenge in terms of recommendation accuracy, scalability, and performance.

Item-based collaborative filtering proves to be the best recommendation technique in terms of recommendation quality, scalability, performance, and learning capability [7]. Combined in a hybrid system with content-based techniques to overcome the cold start problem, it represents the state of the art of the recommender systems in use today.

There are many fields of application for recommender engines, and many have their own requirements that are fulfilled by different techniques. Which recommendation technique works best therefore always depends on the concrete use case.

Bibliography

[1] Burke, R. (2002): Hybrid Recommender Systems: Survey and Experiments. In: User Modeling and User-Adapted Interaction, Volume 12, Issue 4 (November 2002), Kluwer Academic Publishers, pp. 331–370

[2] Leavitt, N. (2006): Recommendation Technology: Will It Boost E-Commerce? In: Computer Journal, Volume 39, Issue 5 (May 2006), IEEE Computer Society Press, pp. 13–16

[3] Thompson, C. (2008): If You Liked This, You’re Sure to Love That. In: The New York Times Magazine (November 21, 2008), http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html

[4] Schafer, J. B. et al. (2001): E-Commerce Recommendation Applications. In: Data Mining and Knowledge Discovery, Volume 5, Issue 1-2 (January–April 2001), pp. 115–153

[5] Kim, J. (2006): What is a recommender system? In: Proceedings of Recommenders06.com (2006), pp. 1–21

[6] McCrae, J. et al. (2004): Collaborative Filtering. http://www.imperialviolet.org/suprema.pdf

[7] Candillier, L. et al. (2009): State-of-the-Art Recommender Systems. In: Collaborative and Social Information Retrieval and Access (2009), Idea Group Inc, pp. 1–22

[8] Adomavicius, G.; Tuzhilin, A. (2004): Recommendation Technologies: Survey of Current Methods and Possible Extensions. Working paper, Stern School of Business, New York University

[9] Burke, R. (2007): Hybrid Web Recommender Systems. In: Lecture Notes in Computer Science (2007), Springer Berlin/Heidelberg, pp. 377–408

[10] ChoiceStream, Inc.: Personalization Technology Brief. http://www.choicestream.com/resources/

[11] Linden, G. et al. (2003): Amazon.com Recommendations: Item-to-Item Collaborative Filtering. In: IEEE Internet Computing, Volume 7, Issue 1 (January/February 2003), pp. 76–80

[12] Rose, K. (2008): Recommendation Engine Announcement. http://blog.digg.com/?p=127

[13] Kast, A. (2008): Digg Recommendation Engine White Paper. http://digg.com/whitepapers/recommendationengine
