Machine Learning for Search at LinkedIn


Viet Ha-Thuc, Search Quality - LinkedIn


• 200+ countries and territories

• 2+ new members per second


● Dual Roles of Search
  ○ Enable talent to discover opportunities
  ○ Help companies search for the right talent

FLAGSHIP SEARCH

RECRUITER SEARCH

SALES NAVIGATOR

Unique Nature of LinkedIn Search

▪Heterogeneous sources
  – People, jobs, companies, slideshares, members’ posts, groups

▪Scale

▪Deep Personalization

▪Support many use-cases
  – Hiring, connecting, job seeking, research, sales, etc.

Overview

[Figure: search pipeline. Labels: Query; Spell Correction; Query Tagging; Name, Title, Skill; Federated Search; People, Companies, Jobs; Federated Search Blending]


Agenda

▪Introduction

▪Vertical Ranking
  – People Search by Skills [BigData’15, SIGIR’16]
  – Job Search [KDD’16]

▪Federation [CIKM’15]

▪Lessons

Introduction

▪Skills
  – 40K+ standardized skills
  – Members get endorsed on skills
  – Represent professional expertise

Introduction

▪Unique challenges to LinkedIn expertise search

– Scale: 400M members x 40K standardized skills

– Sparsity of skills in profiles

– Personalization


…

Reputation

Information a decision maker uses to make a judgment on an entity with a record (*)

(*) “Building web reputation systems”, Glass and Farmer, 2010

Skill Reputation Scores [BigData’15]


▪Decision Maker: searcher

▪Record: Professional career

▪Skill reputation: member expertise on a skill

▪Judgment: Hire?

Estimating Skill Reputation

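Input signals: endorsements, profile, browsemap

A supervised learning algorithm estimates P(expert | member, skill) for some member-skill pairs (? = unknown):

            Skill 1   Skill 2   Skill 3
Member 1    ?         .85       .45
Member 2    ?         ?         .35
Member 3    ?         .42       ?
Member 4    ?         ?         .05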


Matrix Factorization

The sparse member × skill matrix is factorized into two low-rank matrices.

Member factors (each row is a representation of a member in latent space):

            Latent 1   Latent 2
Member 1    0.5        1
Member 2    0.7        0
Member 3    0          0.6
Member 4    0.1        0

Skill factors (each column represents a skill in latent space):

            Skill 1   Skill 2   Skill 3
Latent 1    0.2       0.3       0.5
Latent 2    0.5       0.7       0.2


Fill in unknown cells in the original matrix

Multiplying the member factors by the skill factors gives a score for every member-skill pair:

            Skill 1   Skill 2   Skill 3
Member 1    .6        .85       .45
Member 2    .14       .21       .35
Member 3    .3        .42       .12
Member 4    .02       .03       .05
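The completion step can be checked by hand with the toy numbers on the slides. The sketch below only multiplies the two factor matrices; how the factors are learned is not shown in the deck, and the names `member_factors`, `skill_factors`, and `complete_matrix` are hypothetical.

```python
# Toy factor values copied from the slides.
# member_factors: 4 members x 2 latent dimensions
# skill_factors:  2 latent dimensions x 3 skills
member_factors = [
    [0.5, 1.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.1, 0.0],
]
skill_factors = [
    [0.2, 0.3, 0.5],
    [0.5, 0.7, 0.2],
]


def complete_matrix(members, skills):
    """Return the plain matrix product members x skills as nested lists."""
    n_latent = len(skills)
    n_skills = len(skills[0])
    return [
        [sum(row[k] * skills[k][j] for k in range(n_latent)) for j in range(n_skills)]
        for row in members
    ]


completed = complete_matrix(member_factors, skill_factors)
for row in completed:
    # Two decimals are enough to compare against the slide's values.
    print("  ".join(f"{score:.2f}" for score in row))
```

Up to rounding, the printed rows match the completed matrix on the slide, and every known cell of the original sparse matrix (.85, .45, .35, .42, .05) is reproduced.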

Features

▪Reputation feature

▪Social Connection

▪Homophily
  – Geo
  – Industry

▪Textual Features

Learning to Rank

▪Listwise
  – Relevance of a result is considered relative to the other results for the same query
  – Allows optimizing the quality metric directly

▪Objective function
  – Normalized Discounted Cumulative Gain (NDCG@K)
  – Graded relevance labels
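For reference, NDCG@K over graded labels can be computed as in the sketch below. The slides do not say which gain function is used; the exponential gain `2^label - 1` with a log2 position discount is a common formulation and is an assumption here, as are the function names.

```python
import math


def dcg_at_k(labels, k):
    """DCG@K with gain (2^label - 1) and a log2 position discount.

    `labels` are graded relevance labels in ranked order (best rank first).
    """
    return sum(
        (2 ** label - 1) / math.log2(rank + 2)
        for rank, label in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG@K divided by the DCG@K of the ideal (label-sorted) ordering."""
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant results: define NDCG as 0
    return dcg_at_k(labels, k) / ideal_dcg


# A ranking that puts the best result last scores lower than the ideal one.
print(ndcg_at_k([3, 1, 0], k=3))
print(ndcg_at_k([0, 1, 3], k=3))
```

Because the metric depends on the whole ordering for a query rather than on individual results, it pairs naturally with a listwise objective.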


Labeling Strategy

▪Logs + Top-K randomization

▪Labels derived from searcher actions
  – InMail: Perfect, label = 3
  – Click: Good, label = 1
  – Bad: label = 0
  – Uncertain (removed)
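A minimal sketch of how such labels could be derived from one logged result page follows. The label values come from the slide; the rule separating "bad" from "uncertain" is an assumption made for illustration (results skipped above the lowest interacted position are bad, results below it are uncertain), and `label_results` and the record fields are hypothetical.

```python
def label_results(results):
    """Turn one logged result page into (result_id, label) training pairs.

    `results` is a list of dicts in ranked order, each with an 'id' and an
    'action' that is 'inmail', 'click', or None.
    """
    # Index of the lowest-ranked result the searcher interacted with.
    last_interaction = max(
        (i for i, r in enumerate(results) if r["action"] is not None),
        default=-1,
    )
    labeled = []
    for i, r in enumerate(results):
        if r["action"] == "inmail":
            labeled.append((r["id"], 3))  # Perfect
        elif r["action"] == "click":
            labeled.append((r["id"], 1))  # Good
        elif i < last_interaction:
            labeled.append((r["id"], 0))  # Bad: assumed seen and skipped
        # Otherwise uncertain: the result is dropped from the training data.
    return labeled


page = [
    {"id": "a", "action": None},
    {"id": "b", "action": "click"},
    {"id": "c", "action": "inmail"},
    {"id": "d", "action": None},
]
print(label_results(page))
```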

Experiments

             CTR@10    # Messages per Search
Flagship     +11%      +20%
Premium      +18%      +37%

▪Query Tagging

▪Target Segment: skill and no-name

▪Baseline
  – No skill reputation feature
  – Hand-tuned


Challenges of Job Search

▪“Hidden” structures

▪Query only represents a small fraction of information need
  – “San Francisco”, “software engineer”, “java”

▪Job attractiveness varies on many aspects
  – “Hot” titles: “data scientist”
  – Top companies: Google, Facebook, etc.
  – Trending skills: machine learning, big data, etc.
  – Location

Entity-Aware Matching


Expertise Homophily

▪“Classic” homophily in social networks
  – People tend to interact with similar people

▪Expertise homophily in job search
  – Searchers tend to apply for jobs that require similar expertise
  – Apply rate of job results with overlapping skills is 2x higher

▪Expertise: skill reputation scores
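One way to turn expertise homophily into a ranking feature is to compare the searcher's skill reputation scores with the skills attached to a job. The sketch below uses cosine similarity over sparse skill vectors; the choice of cosine similarity, the function name, and the example scores are assumptions, not details given in the slides.

```python
import math


def expertise_similarity(searcher_skills, job_skills):
    """Cosine similarity between two sparse {skill: score} dictionaries."""
    shared = set(searcher_skills) & set(job_skills)
    dot = sum(searcher_skills[s] * job_skills[s] for s in shared)
    norm_searcher = math.sqrt(sum(v * v for v in searcher_skills.values()))
    norm_job = math.sqrt(sum(v * v for v in job_skills.values()))
    if norm_searcher == 0 or norm_job == 0:
        return 0.0  # one side has no skills: no evidence of overlap
    return dot / (norm_searcher * norm_job)


# Hypothetical skill reputation scores for a searcher and weights for two jobs.
searcher = {"java": 0.9, "machine learning": 0.4}
backend_job = {"java": 1.0, "sql": 1.0}
design_job = {"illustration": 1.0}

print(expertise_similarity(searcher, backend_job))
print(expertise_similarity(searcher, design_job))
```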


Entity-faceted CTRs

▪Job attractiveness
  – Historical CTRs for individual jobs
  – Challenge: job lifetime is short -> unreliable estimation

▪Entity-faceted historical CTRs
  – CTRs of jobs with standardized title “data scientist”
  – CTRs of jobs from company IBM
  – CTRs of jobs requiring trending skills: machine learning, big data, etc.

▪Advantages
  – Alleviate data sparseness by grouping jobs by facets
  – Resolve cold start problem
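The grouping idea can be sketched as follows: impressions are aggregated by a facet value (title, company, or skill) instead of by job ID, so a job with no history of its own still gets an estimate. The log format, field names, and `faceted_ctrs` are hypothetical; a production system would likely also smooth the estimates.

```python
from collections import defaultdict


def faceted_ctrs(impressions, facet):
    """Compute historical CTR per value of `facet` (e.g. 'title' or 'company').

    `impressions` is an iterable of dicts holding the facet fields of the job
    that was shown plus a boolean 'clicked'.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        key = imp[facet]
        shown[key] += 1
        if imp["clicked"]:
            clicked[key] += 1
    return {key: clicked[key] / shown[key] for key in shown}


log = [
    {"title": "data scientist", "company": "IBM", "clicked": True},
    {"title": "data scientist", "company": "Acme", "clicked": True},
    {"title": "data scientist", "company": "IBM", "clicked": False},
    {"title": "accountant", "company": "IBM", "clicked": False},
]
title_ctr = faceted_ctrs(log, "title")
company_ctr = faceted_ctrs(log, "company")

# A brand-new "data scientist" job at IBM has no clicks of its own, but it
# can still be scored with the CTRs of its title and company facets.
print(title_ctr["data scientist"], company_ctr["IBM"])
```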


Experiment Results

▪Baseline
  – All of the existing features except entity-aware ones
  – Machine learned
  – Optimized for the same objective function

               CTR       Apply Rate
Improvement    +11.3%    +5.3%


Personalized Blending

▪ Why do we need this?
  – Not to overwhelm the user with too much information
  – Make results personally relevant

Blending Flow

Learning Model

▪ Training data: click logs

▪ Features
  – Relevance scores from base rankers
  – Searcher intent
  – Query intent
  – Prior scores

Calibrate Scores across Verticals

▪ Relevance scores from vertical rankers are incomparable

▪ Construct composite features

$$
f_1 = \begin{cases} \text{People relevance score of searcher}, & \text{if result is People} \\ 0, & \text{otherwise} \end{cases}
$$

Searcher Intent

▪ Searcher’s job seeking intent if result is job vertical cluster

▪ Searcher’s job seeking intent if result is individual job

▪ Searcher’s recruiting intent if result is people vertical cluster

▪ Searcher’s recruiting intent if result is individual people

...
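The composite features above share one pattern: a signal is passed through only when the result is of a matching type and is zero otherwise, so the blending model can learn a separate weight per vertical. A minimal sketch, assuming four result types and hypothetical feature names (the `job_relevance` feature is an illustrative analogue of the people feature, not taken from the slides):

```python
RESULT_TYPES = ("people", "people_cluster", "job", "job_cluster")


def composite_features(result_type, relevance_score, job_seeking_intent, recruiting_intent):
    """Build type-gated features for one candidate result in the blended list."""
    if result_type not in RESULT_TYPES:
        raise ValueError(f"unknown result type: {result_type}")

    def gate(value, wanted_type):
        # Keep the value only for the matching result type, else 0.
        return value if result_type == wanted_type else 0.0

    return {
        # Vertical relevance scores are not comparable, so each gets its own slot.
        "people_relevance": gate(relevance_score, "people"),
        "job_relevance": gate(relevance_score, "job"),  # assumed analogue
        # Searcher intent crossed with the result type.
        "job_intent_x_job_cluster": gate(job_seeking_intent, "job_cluster"),
        "job_intent_x_job": gate(job_seeking_intent, "job"),
        "recruiting_intent_x_people_cluster": gate(recruiting_intent, "people_cluster"),
        "recruiting_intent_x_people": gate(recruiting_intent, "people"),
    }


features = composite_features("people", 0.8, job_seeking_intent=0.1, recruiting_intent=0.9)
print(features)
```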

Take-Aways

▪Text match is still important but not enough

▪Advanced features based on semi-structured data
  – People search: skill reputation scores
  – Job search: expertise homophily

▪Personalized Learning-to-Rank is crucial



References

▪“Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015

▪“Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015

▪“Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016

▪“How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
