Activity Ranking in LinkedIn Feed
  Authors: Deepak Agarwal, Bee-Chung Chen, Liang Zhang, et al.
  Presented by: Ranjith Kumar Bodla
Outline
  LinkedIn Feed
  Straw man approaches
  Activities on LinkedIn Feed
  Freshness
  System Architecture
  Model and Features
  Desktop Bucket Test Results
LinkedIn Feed
  Professional network
  Heterogeneous updates
    More than 40 types: shared articles, job changes, connection updates, etc.
  Challenges
    Large scale (313M+ members)
    Relevance and personalization
    Freshness, diversity, user fatigue
  How do we rank activities?
Straw man approaches
  Reverse chronological ranking (recency)
    Fresh but not relevant
  Ranking by social popularity
    Likes are a useful signal
    CTR is not monotonically related to likes
    Not all activities have likes
Activities on LinkedIn Feed
  Taxonomy
    Each activity is represented as a triple (actor type, verb type, object type)
    Connection: (member, connect, member)
    Opinion: (member, like, article)
  What happens if we simply rank by CTR?
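The triple representation above could be modeled as a small immutable record. This is a minimal sketch, not LinkedIn's actual schema: the class name `ActivityType`, the field names, and the `clicks_by_type` dictionary are assumptions made for illustration.

```python
from typing import NamedTuple


class ActivityType(NamedTuple):
    """Hypothetical record for the (actor type, verb type, object type) triple."""
    actor_type: str
    verb_type: str
    object_type: str


# The two examples given on the slide.
CONNECTION = ActivityType("member", "connect", "member")
OPINION = ActivityType("member", "like", "article")

# Triples are hashable, so they can key per-type statistics such as click counts.
clicks_by_type = {CONNECTION: 0, OPINION: 0}
```

Keying statistics by the triple keeps each activity type separate, which is what a per-type CTR comparison would need.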
Activities on LinkedIn Feed (continued)
  Connection activities
    Symmetric connection (member connection)
    Asymmetric connection (following)
  Informational activities (messages, articles, pictures)
  Profile activities (profile pictures, job position, contact info)
  Opinion activities (like and comment)
  Site-specific activities
    Unique to LinkedIn: job anniversaries, endorsements of users’ skills, and job recommendations
Taxonomy of Activities on LinkedIn Feed
Freshness
  Two aspects of freshness:
    First, the age of an activity since its creation.
    Second, the number of times a user has seen a particular item in the past.
  These differ significantly across users. For example, a heavy user might see an activity several times within the first hour of its lifetime.
Connection Relationship
  Because LinkedIn members are connected via a professional network, we can study how the connection relationship between a viewer and an actor affects whether the viewer clicks on an activity of the actor.
  Demographic similarities, based on age, gender, education, etc.
  Experience similarities, based on common companies, job positions, etc.
  Skill similarities, based on common and similar skills.
  Geo similarities, based on the locations of the viewer and the actor at different resolutions (e.g., city, state, and country).
  Social network similarities, based on the connections of the viewer and the actor.
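Two of the similarity families above can be sketched as simple features. The slides do not say how the similarities are computed, so the choices here are assumptions: Jaccard overlap for common skills, and one binary match per location resolution for geo similarity. The profile values are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets in [0, 1]; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def geo_similarity(viewer: dict, actor: dict) -> dict:
    """One binary match feature per location resolution."""
    return {
        level: float(viewer.get(level) is not None
                     and viewer.get(level) == actor.get(level))
        for level in ("city", "state", "country")
    }


# Made-up profiles, for illustration only.
viewer = {"skills": {"python", "ml", "sql"},
          "city": "Dallas", "state": "TX", "country": "US"}
actor = {"skills": {"ml", "sql", "java", "spark"},
         "city": "Austin", "state": "TX", "country": "US"}

features = {"skill_jaccard": jaccard(viewer["skills"], actor["skills"])}
features.update(geo_similarity(viewer, actor))
```

Here the two members share 2 of 5 distinct skills and match at the state and country level but not the city level.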
Relevance via CTR prediction
  Training data collection
    Requires randomization
  Personalization features
    E.g., viewer-type affinity, viewer-actor affinity
  Large-scale logistic regression via the Alternating Direction Method of Multipliers (ADMM)
    Scalable, distributed algorithm
  Offline evaluation
    Unbiased estimation via replay
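The replay idea can be illustrated for the simplified case of a single slot served uniformly at random: keep only the logged events where the policy under evaluation would have shown the same item, and average their clicks. This is a sketch under that assumption, not the evaluation pipeline from the talk. The function name `replay_ctr`, the event layout, and the synthetic log with its CTR values are all hypothetical.

```python
import random


def replay_ctr(logged_events, policy):
    """Estimate a policy's CTR from events served uniformly at random.

    Each event is (context, candidates, shown_item, clicked). Only events
    where the policy would have shown the logged item are counted.
    """
    matches = 0
    clicks = 0
    for context, candidates, shown_item, clicked in logged_events:
        if policy(context, candidates) == shown_item:
            matches += 1
            clicks += clicked
    return clicks / matches if matches else float("nan")


# Synthetic log: two items with made-up true CTRs, served uniformly at random.
rng = random.Random(0)
true_ctr = {"article": 0.30, "job_change": 0.10}
log = []
for _ in range(20000):
    shown = rng.choice(list(true_ctr))
    clicked = int(rng.random() < true_ctr[shown])
    log.append((None, list(true_ctr), shown, clicked))


def always_article(context, candidates):
    """Toy policy that always shows the article."""
    return "article"


estimate = replay_ctr(log, always_article)
```

Because the logged item was chosen at random, the matching events form an unbiased sample of what the policy would have served, which is why training and evaluation data require randomization.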
System Architecture
Model and Features
  A score is required to rank the activities in the candidate pool.
  To predict the CTR of each activity, we learn a logistic regression model.
  Let y denote the click on the activity described by feature vector x; the corresponding Bernoulli random variable Y is then modeled in the standard logistic form
    $P(Y = 1 \mid x) = \frac{1}{1 + \exp(-\theta^\top x)}$
  where θ is the parameter vector we want to learn.
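Scoring and ranking a candidate pool with a logistic model can be sketched as follows, using the standard logistic form P(click | x) = 1 / (1 + exp(-θ·x)). The weights, feature layout, and activity ids are made up for illustration and are not taken from the talk.

```python
import math


def predict_ctr(theta, x):
    """Logistic model: P(click | x) = 1 / (1 + exp(-theta . x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates, theta):
    """Sort (activity_id, feature_vector) pairs by predicted CTR, highest first."""
    return sorted(candidates, key=lambda c: predict_ctr(theta, c[1]), reverse=True)


# Made-up weights and features: [bias, viewer-actor affinity, viewer-type affinity].
theta = [-2.0, 1.5, 0.8]
pool = [
    ("a1", [1.0, 0.1, 0.2]),
    ("a2", [1.0, 0.9, 0.5]),
    ("a3", [1.0, 0.4, 0.9]),
]
ranked_ids = [activity_id for activity_id, _ in rank(pool, theta)]
```

With these toy numbers the activity with the highest viewer-actor affinity, `a2`, is ranked first.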
Large Scale Logistic Regression Via ADMM
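For reference, the textbook consensus form of ADMM for L2-regularized logistic regression is given below; the system presented in the talk may differ in detail. The notation is assumed rather than taken from the slides: the data is split into K partitions, ℓ_k is the negative log-likelihood on partition k, θ_k is the local parameter vector, z is the consensus vector, u_k is the scaled dual variable, ρ > 0 is the ADMM penalty, and the regularizer is (λ/2)‖z‖².

```latex
\begin{aligned}
\theta_k^{t+1} &= \arg\min_{\theta}\; \Big( \ell_k(\theta)
    + \tfrac{\rho}{2}\,\lVert \theta - z^{t} + u_k^{t} \rVert_2^2 \Big),
    \qquad k = 1, \dots, K \\
z^{t+1} &= \frac{\rho}{\lambda + K\rho} \sum_{k=1}^{K}
    \big( \theta_k^{t+1} + u_k^{t} \big) \\
u_k^{t+1} &= u_k^{t} + \theta_k^{t+1} - z^{t+1}
\end{aligned}
```

The first update runs independently on each partition, which is what makes the method distributed; only the second update needs to combine results across partitions.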
Reranking
  Reranking typically requires greedily picking the next activity based on the current set of items.
  It is also important to keep the feed fresh: we apply an additional exponential decay to the score based on the age of the activity.
  Repeated impressions of the same activity are also discouraged by an additional decay factor that takes into account the activity's past impression count.
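The two discounts described above can be sketched as multiplicative factors on the predicted CTR. The slides specify exponential decay on age but not the form of the impression discount, so the geometric discount here is an assumption, as are the function name `reranked_score` and the default half-life and discount values.

```python
import math


def reranked_score(ctr, age_hours, past_impressions,
                   age_half_life_hours=24.0, impression_discount=0.8):
    """Discount a predicted CTR by activity age and by repeated impressions.

    The age decay is exponential with the given half-life. The impression
    decay is geometric in the number of past impressions (an assumed form).
    """
    age_decay = math.exp(-math.log(2.0) * age_hours / age_half_life_hours)
    impression_decay = impression_discount ** past_impressions
    return ctr * age_decay * impression_decay


fresh_unseen = reranked_score(0.2, age_hours=0.0, past_impressions=0)
day_old = reranked_score(0.2, age_hours=24.0, past_impressions=0)
seen_twice = reranked_score(0.2, age_hours=0.0, past_impressions=2)
```

With these defaults, a day-old activity keeps half of its score, and each past impression removes a further 20%.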
Desktop Bucket Test Results
  Effect of freshness
  Impression discounting
  Model personalization
  Combining all the features
Conclusion
  Good ranking models are found by continuously testing different models with different features.
  Models learned from data may not have all the characteristics we desire.
  Different platforms (such as mobile vs. desktop) have different characteristics.
  Personalizing models to users’ tastes and behavior often gives large improvements in performance.
Questions?