Daniel Kershaw (@danjamker)
Building Recommenders
20th September 2017
| 1
Mendeley
• Reference Manager
• Social Network
• Publication Catalogue
| 2
Science Direct
• Scientific publication database
• Used by the majority of university and research institutions
• Contains 12 million articles of content from 3,500 academic journals and 34,000 e-books
| 3
Why Recommendations
Pull
• Allow users to discover more content
• Make it easier to navigate the catalogue
Push
• Highlight new content to users
• Bring users back to the service
| 5
The five core components
Data Collection
Recommender Model
Recommendation Post Processing
Online Modules
User Interface
| 6
Outline
Developed Algorithms – keeping it simple
Practical Considerations – don’t look stupid
Implementation – how to scale a system
Evaluation – what is good enough
Evolution – what’s changed over time
Future Direction – the future’s bright, the future’s deep
| 7
Developed Algorithms
| 8
Available Data
Implicit
User libraries (Mendeley)
User article interactions (Science Direct)
Content
Abstracts
Titles
References
| 9
Potential Methods
Content Based
• Similarity between what users have read
• Similarity in references
Collaborative
• Matrix Factorization
• kNN
• LDA
| 10
User-item interaction matrix
User-based CF (kNN)
https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/
| 11
Similarity between the query user and other readers
User-based CF (kNN)
| 12
Similarity between all users
User-based CF (kNN)
| 13
Generating recommendations for user
User-based CF (kNN)
| 14
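The user-based CF steps on the preceding slides (build the user-item interaction matrix, compute similarities between the query user and all other users, keep the k nearest neighbours, score unseen items) can be sketched roughly as below. The function name, the binary interaction matrix, and the choice of cosine similarity are illustrative assumptions, not the production implementation:

```python
import numpy as np

def user_knn_scores(interactions, query_user, k=3):
    """User-based CF over a binary user-item interaction matrix.

    interactions: (n_users, n_items) 0/1 array.
    Returns item scores for query_user: a similarity-weighted sum of
    the k nearest neighbours' interactions, with items the user has
    already interacted with masked out.
    """
    q = interactions[query_user]
    # Cosine similarity between the query user and every user.
    norms = np.linalg.norm(interactions, axis=1) + 1e-12
    sims = interactions @ q / (norms * (np.linalg.norm(q) + 1e-12))
    sims[query_user] = -np.inf            # exclude the user themself
    neighbours = np.argsort(sims)[-k:]    # k most similar users
    scores = sims[neighbours] @ interactions[neighbours]
    scores[q > 0] = -np.inf               # don't recommend items already read
    return scores
```

With a toy 4-user, 4-item matrix this recommends the item most of the query user's neighbours have read but the user has not.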
Why not Matrix Factorization
• Ability to scale
• Matrix incredibly sparse
| 15
Practical Considerations
| 16
Explore/Exploit (Dithering)
Recommendations generated in batch
Users want an interactive experience
Slight shuffles give the impression of freshness
Allows exploration of the full list when only a proportion is shown
score_dithered = log(rank) + N(0, log ε)
where ε = Δrank / rank and typically ε ∈ [1.5, 2]
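A minimal sketch of the dithering formula above: add Gaussian noise to the log of each item's rank and re-sort. Treating log ε as the variance of the noise (so its standard deviation is sqrt(log ε)) is an assumption; the function name is illustrative:

```python
import math
import random

def dither(ranked_items, epsilon=1.9, seed=None):
    """Reorder a ranked list by adding Gaussian noise to log-rank.

    ranked_items: list ordered best-first (rank 1 = first element).
    epsilon: dithering strength; the slide suggests values in [1.5, 2].
    Assumes log(epsilon) is the noise variance, so sd = sqrt(log(epsilon)).
    """
    rng = random.Random(seed)
    sd = math.sqrt(math.log(epsilon))
    noisy = [(math.log(rank) + rng.gauss(0, sd), item)
             for rank, item in enumerate(ranked_items, start=1)]
    return [item for _, item in sorted(noisy)]
```

With epsilon = 1 the noise vanishes and the original order is preserved; larger values shuffle top items further down more often.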
| 17
Impression Discounting
• Experience deteriorates if exposed to the same information
• Push recommendations seen before down the list
[Plot: recommendation rank vs. number of impressions]
| 18
Impression Discounting
• Experience deteriorates if exposed to the same information
• Push recommendations seen before down the list
score_new = score_original × (w1 · g(impCount) + w2 · g(lastSeen))
See Lee, P. et al. (2014)
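The discounting formula above can be sketched as follows. The exponential forms chosen for the two g functions and the decay constants are hypothetical placeholders; Lee et al. (2014) fit several functional forms to real data:

```python
import math

def g_imp(n, gamma=0.3):
    """Hypothetical discount term: more impressions, stronger discount."""
    return math.exp(-gamma * n)

def g_seen(hours, gamma=0.05):
    """Hypothetical discount term: the penalty fades as time since the
    last impression grows, so old impressions stop suppressing an item."""
    return 1.0 - math.exp(-gamma * hours)

def discounted_score(score, imp_count, hours_since_seen, w1=0.5, w2=0.5):
    # score_new = score_original * (w1*g(impCount) + w2*g(lastSeen))
    return score * (w1 * g_imp(imp_count) + w2 * g_seen(hours_since_seen))
```

An item seen three times an hour ago scores well below an unseen item, pushing repeat recommendations down the list as the slide describes.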
| 19
Business Logic (Pre- and Post-Filtering)
• Don't show items users already have (bought, added, consumed)
• Don't feed the recommender positive feedback generated by the recommender itself
• Don't recommend out-of-stock items
• A bad recommendation has a cost, which can be greater than receiving no recommendation at all
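The post-filtering rules above amount to a simple pass over the ranked list; a minimal sketch, with illustrative names (the pre-filtering rule, excluding recommender-driven clicks from training data, would live in the data pipeline instead):

```python
def apply_business_rules(recs, owned, in_stock=None):
    """Filter a ranked list of item ids by the business rules:
    drop items the user already has, and, when a stock list is
    supplied, drop anything out of stock. Order is preserved."""
    return [item for item in recs
            if item not in owned
            and (in_stock is None or item in in_stock)]
```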
| 20
Implementation
| 21
Systems Architecture
[Diagram: Offline (AWS): Logs feed Candidate Selection via Content-Based and Item2Item CF models. Online: Impression Discounting and Dithering are applied before recommendations are served through the API to the Front End.]
| 22
The unbundled mess
| 23
Logging
Used for both debugging and feeding information back to the recommender
System
• Which run generated the recommendation
• What was served to the user
• How was the score modified
• What was removed from the recommendations
User (Feedback loop)
• What was displayed
• What was clicked
• When were they served
• Where were the recommendations displayed
| 24
Evolution
| 25
Mendeley – Desktop Application
• User to Item CF
• Impression Discounting
| 26
Mendeley – Online
• Implicit – serves recommendations based on user libraries (most personalized)
• Recent Activity – based on recent additions to a user's library
• Research Interests – based on user-generated tags
• Discipline – based on their self-identified discipline (least personalized)
See Hristakeva, M. et al. (2017)
| 27
Mendeley – Online
• Remove carousels
• Focus on implicit recommendations
• Fall back to content-based solution
| 28
Mendeley – Email
• Recommendations based on the complete library of the user
• Don't send the same recommendations twice
| 29
Science Direct – Email
• Item to Item
• Take user reading history
• Get recommendations for each item
• Interleave recommendations
• Don't send the same recommendations twice
| 30
Science Direct – Article Page
• Item to Item
• Dither recommendations every 30 minutes
| 31
Evaluation
| 32
Off-line Methodology
[Diagram: user interactions ordered by time; earlier interactions train the model, later interactions are split into a query set and held-out ground truth used for testing]
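The time-based split described above can be sketched in a few lines; the event format and function name are illustrative assumptions:

```python
from collections import defaultdict

def time_split(events, cutoff):
    """Split (user, item, timestamp) events at a time cutoff.

    Interactions before the cutoff form the training set; later ones
    become per-user held-out ground truth, against which the model's
    recommendations can be evaluated.
    """
    train, truth = [], defaultdict(set)
    for user, item, ts in events:
        if ts < cutoff:
            train.append((user, item, ts))
        else:
            truth[user].add(item)
    return train, dict(truth)
```

Splitting by time rather than at random avoids training on interactions that happened after the ones being predicted.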
| 33
Off-line evaluation - Mendeley
From Hristakeva, M. et al. (2017)
| 34
Science Direct – Item-to-item
| 35
Static Recommendations for quick learnings
• Infrastructure takes a long time to build
• Need feedback from users to learn
1. Generate recommendations off-line
2. Send to users via email (A/A)
3. Modify method based on feedback
4. Send a second set to users split into A/B buckets
[Flow: Email to users → Modify Recommender → Email to users]
| 36
Future Direction
| 37
Learning to Rank (LtR)
Currently only using implicit feedback
No content used
Use CF as candidate selection
Re-rank results based on a learnt model optimised for CTR
Use item and user features
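The candidate-selection-then-re-rank idea above can be sketched as a pointwise linear re-ranker. The candidate format, the linear model, and the weight vector (standing in for a model learnt offline by optimising CTR) are all illustrative assumptions, not the proposed production system:

```python
import numpy as np

def rerank(candidates, weights):
    """Re-rank CF candidates with a linear model over the CF score
    plus item/user features.

    candidates: (item_id, cf_score, feature_vector) triples produced
    by the CF candidate-selection stage.
    weights: a vector learnt offline (hypothetically by optimising CTR);
    its first component weights the CF score, the rest the features.
    """
    def model_score(cand):
        _, cf_score, features = cand
        return float(weights @ np.concatenate(([cf_score], features)))
    return sorted(candidates, key=model_score, reverse=True)
```

A candidate with a weak CF score but strong features can overtake a CF favourite once the learnt weights are applied.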
| 38
Deep Learning
Use to learn more complex features
Use as features in LtR
Build on the existing framework developed
Use pre-trained models before developing own
| 39
Conclusion (Take Homes)
• Log EVERYTHING
• Start Simple
• Iterate quickly
• Get recommendations out quickly to learn
• Don’t look stupid
• CTR ≇ Off-line Evaluation
| 40
www.elsevier.com/rd-solutions
Thank you,
A book chapter is being written based on the content of this presentation
| 41
References
Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building recommender systems for scholarly information. the 1st Workshop (pp. 25–32). New York, New York, USA: ACM. http://doi.org/10.1145/3057148.3057152
Rossetti, M., Stella, F., & Zanker, M. (2016). Contrasting Offline and Online Results when Evaluating Recommendation Algorithms (pp. 31–34). Presented at the Proceedings of the 10th ACM Conference on Recommender Systems, New York, NY, USA: ACM. http://doi.org/10.1145/2959100.2959176
Lee, P., Lakshmanan, L. V. S., Tiwari, M., & Shah, S. (2014). Modeling impression discounting in large-scale recommender systems (pp. 1837–1846). Presented at the Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, New York, USA: ACM Press. http://doi.org/10.1145/2623330.2623356
Koren, Y. (2010). Collaborative filtering with temporal dynamics. Communications of the ACM, 53(4), 89–97. http://doi.org/10.1145/1721654.1721677