Machine Learning for Search @ LinkedIn
Viet Ha-Thuc, Search Quality - LinkedIn
• 200+ countries and territories
• 2+ new members per second
● Dual Roles of Search
○ Enable talent to discover opportunities
○ Help companies search for the right talent
FLAGSHIP SEARCH
RECRUITER SEARCH
SALES NAVIGATOR
Unique Nature of LinkedIn Search
▪Heterogeneous sources
– People, jobs, companies, slideshares, members’ posts, groups
▪Scale
▪Deep Personalization
▪Support many use-cases
– Hiring, connecting, job seeking, research, sales, etc.
Overview
[Figure: search pipeline overview. Labels: Query; Spell Correction; Query Tagging (Name, Title, Skill); Federated Search; verticals People, Jobs, Companies; Blending]
Agenda
▪Introduction
▪Vertical Ranking
– People Search by Skills [BigData’15, SIGIR’16]
– Job Search [KDD’16]
▪Federation [CIKM’15]
▪Lessons
Introduction
▪Skills
– 40K+ standardized skills
– Members get endorsed on skills
– Represent professional expertise
▪Unique challenges to LinkedIn expertise search
– Scale: 400M members x 40K standardized skills
– Sparsity of skills in profiles
– Personalization
…
Reputation
Information a decision maker uses to make a judgment on an entity with a record (*)
(*) “Building web reputation systems”, Glass and Farmer, 2010
Skill Reputation Scores [BigData’15]
▪Decision Maker: searcher
▪Record: Professional career
▪Skill reputation: member expertise on a skill
▪Judgment: Hire?
Estimating Skill Reputation
▪Inputs: endorsements, profile, browsemap
▪A supervised learning algorithm estimates P(expert | member, skill)
▪The resulting members x skills matrix is sparse (? = unknown)

            Skill 1   Skill 2   Skill 3
Member 1    ?         .85       .45
Member 2    ?         ?         .35
Member 3    ?         .42       ?
Member 4    ?         ?         .05

Estimating Skill Reputation: Matrix Factorization
▪Each row of the member matrix is a representation of a member in latent space

Member 1    0.5   1
Member 2    0.7   0
Member 3    0     0.6
Member 4    0.1   0

▪Each column of the skill matrix represents a skill in latent space

            Skill 1   Skill 2   Skill 3
Factor 1    0.2       0.3       0.5
Factor 2    0.5       0.7       0.2

▪Multiplying the two matrices fills in the unknown cells in the original matrix

            Skill 1   Skill 2   Skill 3
Member 1    .6        .85       .45
Member 2    .14       .21       .35
Member 3    .3        .42       .12
Member 4    .02       .03       .05
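The fill-in step above can be checked numerically. The sketch below only multiplies the member and skill factor matrices shown on the slide and confirms that the product agrees with the known cells; it does not show how the factors are learned, and the variable names are illustrative rather than taken from the talk.

```python
import numpy as np

# Sparse reputation matrix from the slide (members x skills); NaN marks unknown cells.
observed = np.array([
    [np.nan, 0.85, 0.45],
    [np.nan, np.nan, 0.35],
    [np.nan, 0.42, np.nan],
    [np.nan, np.nan, 0.05],
])

# Latent factors as shown on the slide (variable names are illustrative).
member_factors = np.array([[0.5, 1.0], [0.7, 0.0], [0.0, 0.6], [0.1, 0.0]])  # one row per member
skill_factors = np.array([[0.2, 0.3, 0.5], [0.5, 0.7, 0.2]])                # one column per skill

# The dense product has a value for every cell, including the unknown ones.
filled = member_factors @ skill_factors

# The product must agree with every cell that was already known.
known = ~np.isnan(observed)
assert np.allclose(filled[known], observed[known])
print(np.round(filled, 2))
```

In practice the factors would be fitted so that the product matches the known cells as closely as possible; here they are simply copied from the slide.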
Features
▪Reputation feature
▪Social Connection
▪Homophily
– Geo
– Industry
▪Textual Features
Learning to Rank
▪Listwise
– Considers relevance relative to each query
– Allows optimizing the quality metric directly
▪Objective function
– Normalized Discounted Cumulative Gain (NDCG@K)
– Graded relevance labels
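As a reminder of what the objective measures, here is a minimal sketch of NDCG@K over graded labels. It assumes the common exponential gain (2^label - 1) with a log2 position discount; the slides do not state which gain and discount were used, and the function names are illustrative.

```python
import math

def dcg_at_k(labels, k):
    """DCG over the top-k graded labels, given in ranked order (assumed 2^rel - 1 gain)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

# Graded labels of one result list, in the order the ranker returned them.
ranked_labels = [1, 3, 0, 1]
print(ndcg_at_k(ranked_labels, k=3))
```

A ranking that places the highest-graded results first scores 1.0; any other ordering scores lower, which is what lets a listwise learner optimize the metric per query.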
Labeling Strategy
▪Logs + Top-K randomization
– Uncertain (removed)
– Bad: label = 0
– Good (click): label = 1
– Perfect (InMail): label = 3
Experiments
▪Query Tagging
▪Target Segment: skill and no-name
▪Baseline
– No skill reputation feature
– Hand-tuned

            CTR@10   # Messages per Search
Flagship    +11%     +20%
Premium     +18%     +37%
Challenges of Job Search
▪“Hidden” structures
▪Query only represents a small fraction of information need
– “San Francisco”, “software engineer”, “java”
▪Job attractiveness varies on many aspects
– “Hot” titles: “data scientist”
– Top companies: Google, Facebook, etc.
– Trending skills: machine learning, big data, etc.
– Location
Entity-Aware Matching
Expertise Homophily
▪“Classic” homophily in social networks
– People tend to interact with similar ones
▪Expertise homophily in job search
– Searcher tends to apply for jobs with similar expertise
– Apply rate of job results with overlapping skills is 2x higher
▪Expertise: skill reputation scores
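One way to turn expertise homophily into a ranking feature is to score how well a job's skills are covered by the searcher's skill reputation scores. The sketch below is an assumption made for illustration, not the formula from the talk: it averages the searcher's reputation scores over the job's skills, and the function and skill names are hypothetical.

```python
def expertise_homophily(searcher_skill_scores, job_skills):
    """Average of the searcher's skill reputation scores over the job's skills.

    Skills the searcher has no score for count as 0. This formula is an
    illustrative assumption, not the one used in production.
    """
    if not job_skills:
        return 0.0
    total = sum(searcher_skill_scores.get(skill, 0.0) for skill in job_skills)
    return total / len(job_skills)

# Hypothetical searcher with two scored skills, and a job requiring two skills.
searcher = {"java": 0.85, "machine learning": 0.45}
print(expertise_homophily(searcher, {"java", "hadoop"}))
```

A job whose skills overlap with the searcher's strongest skills gets a higher value, which matches the observation that such results are applied to more often.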
Entity-faceted CTRs
▪Job attractiveness
– Historical CTRs for individual jobs
– Challenge: job lifetime is short -> unreliable estimation
▪Entity-faceted historical CTRs
– CTRs of jobs with standardized title “data scientist”
– CTRs of jobs from company IBM
– CTRs of jobs requiring trending skill: machine learning, big data, etc.
▪Advantages
– Alleviate data sparseness by grouping jobs by facets
– Resolve cold start problem
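The grouping idea can be sketched in a few lines: pool impressions and clicks over all jobs that share a facet value, so that a new job inherits a CTR estimate from its title or company. The log records, field names, and the company "Acme" below are made up for illustration, and no smoothing is applied.

```python
from collections import defaultdict

def faceted_ctrs(job_logs, facet):
    """Pool impressions and clicks over all jobs sharing the same facet value."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for job in job_logs:
        impressions[job[facet]] += job["impressions"]
        clicks[job[facet]] += job["clicks"]
    return {value: clicks[value] / impressions[value]
            for value in impressions if impressions[value] > 0}

# Made-up per-job log records (hypothetical field names).
job_logs = [
    {"title": "data scientist", "company": "IBM", "impressions": 1000, "clicks": 120},
    {"title": "data scientist", "company": "Acme", "impressions": 500, "clicks": 30},
    {"title": "accountant", "company": "IBM", "impressions": 400, "clicks": 8},
]

title_ctr = faceted_ctrs(job_logs, "title")
company_ctr = faceted_ctrs(job_logs, "company")

# A newly posted "data scientist" job has no history of its own,
# but it can still use the pooled CTR of its title facet.
print(title_ctr["data scientist"])
```

Because each facet value aggregates many jobs, the estimate is based on far more impressions than any single short-lived job would collect.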
Experiment Results
▪Baseline
– All of the existing features except entity-aware ones
– Machine learned
– Optimized for the same objective function

              CTR      Apply Rate
Improvement   +11.3%   +5.3%
Personalized Blending
▪Why do we need this?
– Not to overwhelm the user with too much information
– Make results personally relevant
Blending Flow
Learning Model
▪Training data: click logs
▪Features
– Relevance scores from base rankers
– Searcher intent
– Query intent
– Prior scores
Calibrate Scores across Verticals
▪Relevance scores from vertical rankers are incomparable
▪Construct composite features

\[
f_1 =
\begin{cases}
\text{People relevance score of searcher}, & \text{if result is People} \\
0, & \text{otherwise}
\end{cases}
\]
Searcher Intent
▪Searcher’s job seeking intent if result is job vertical cluster
▪Searcher’s job seeking intent if result is individual job
▪Searcher’s recruiting intent if result is people vertical cluster
▪Searcher’s recruiting intent if result is individual people
...
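The composite features above all follow the same pattern: a signal is kept only when the result comes from the matching vertical and is set to zero otherwise, so the blending model can learn a separate weight per vertical. Here is a minimal sketch of that pattern; the function name, feature names, and the two result types are assumptions, and the distinction between vertical clusters and individual results is left out for brevity.

```python
def composite_features(result_type, relevance_score, job_seeking_intent, recruiting_intent):
    """Gate each signal on the result's vertical; non-matching slots are 0.

    Feature names are hypothetical and only two result types are modeled.
    """
    is_people = result_type == "people"
    is_job = result_type == "job"
    return {
        "people_relevance": relevance_score if is_people else 0.0,
        "job_relevance": relevance_score if is_job else 0.0,
        "job_seeking_intent_x_job": job_seeking_intent if is_job else 0.0,
        "recruiting_intent_x_people": recruiting_intent if is_people else 0.0,
    }

# A people result shown to a searcher with strong recruiting intent.
features = composite_features("people", relevance_score=0.8,
                              job_seeking_intent=0.1, recruiting_intent=0.9)
print(features)
```

Since a people score and a job score never share a feature slot, the model does not need the raw scores from different vertical rankers to be on the same scale.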
Take-Aways
▪Text match is still important but not enough
▪Advanced features based on semi-structured data
– People search: skill reputation scores
– Job search: expertise homophily
▪Personalized Learning-to-Rank is crucial
Email: [email protected]
References
▪“Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪“Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪“Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪“How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016