Jobmash Job searching without the pain Marianne Hoogeveen

Marianne hoogeveen demo1

Apr 16, 2017



Transcript
Page 1: Marianne hoogeveen demo1

Jobmash

Job searching without the pain

Marianne Hoogeveen

Page 2: Marianne hoogeveen demo1

Searching for data science jobs is tiring and depressing

‘Data Scientist’ ranges from Excel pusher to software engineer

MANY jobs, but how to find the right kind?

What IS the right kind (for me)?

Page 3: Marianne hoogeveen demo1

Want “more like this” feature

MORE LIKE THIS, PLEASE!

Page 4: Marianne hoogeveen demo1

Data: 11,000 job postings from Indeed

Specific data challenges: Buzzwords (“world-class”, “driven”, “exciting”, “mission”, “opportunity”)

Difficult to validate: what is ground truth?

Look for “Data Scientist”, “Data Engineer”, “Data Analyst” job descriptions

US-wide

Page 5: Marianne hoogeveen demo1

Compute text similarity

Remove low-information words (stop words, dates, locations, numbers, common verbs, …)

Count occurrences of words (and bigrams), down-weighting those that are common across the corpus of all documents (TF-IDF)

Compare the resulting TF-IDF vectors using cosine similarity
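
The three steps on this slide can be sketched in plain Python. This is a minimal illustration, not the code behind Jobmash: the stop-word list, the smoothed IDF formula and the sample postings are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not give the real one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "we", "is", "are"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens only, drop stop words, append bigrams."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant): terms found in every document
        # receive the lowest weight.
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                        for t, c in counts.items()})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Made-up postings for illustration only.
docs = [
    "Build machine learning models in Python",
    "Design Hadoop data pipelines and build platform systems",
    "Train machine learning models and statistical analysis in Python",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])
sim_02 = cosine_similarity(vecs[0], vecs[2])
print(sim_01, sim_02)  # the third posting is closer to the first than the second is
```

Sparse dictionaries keep the sketch short; with 11,000 postings a sparse-matrix library would likely be the practical choice.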

Page 6: Marianne hoogeveen demo1

What are the job titles of top-100 most similar job postings?

[Figure: one chart per search term (Data Scientist, Data Engineer, Data Analyst) showing the mix of job titles among the results, labelled DS, DE, DA and other]

Using cosine similarity on TF-IDF vectors, after removing buzzwords; find 100 most similar and compare job titles
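
Finding the most similar postings and tallying their job titles might look like the following sketch. It assumes the similarity of every posting to the query has already been computed; the function name `title_breakdown`, the scores and the labels are made up for illustration.

```python
from collections import Counter

def title_breakdown(similarities, titles, k=100):
    """Count job titles among the k postings with the highest similarity."""
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)
    return Counter(titles[i] for i in ranked[:k])

# Made-up scores and title labels for illustration only.
similarities = [0.91, 0.15, 0.78, 0.40, 0.66]
titles = ["DS", "DA", "DS", "DE", "other"]
breakdown = title_breakdown(similarities, titles, k=3)
print(breakdown)
```

With the real data, `k=100` would give the counts behind each chart on this slide.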

Page 7: Marianne hoogeveen demo1

Which job titles can we predict from job description?

[Figure: two ROC plots of true positive rate against false positive rate, both axes from 0.0 to 1.0. Panels: “Removing buzzwords” and “Keeping buzzwords” (“world-class”, “exciting”, “mission”, “opportunity”)]
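
The slides do not say which classifier produced these plots. As a hedged illustration of how a curve of true positive rate against false positive rate is built from any classifier's scores, the sketch below sweeps a threshold over made-up scores; `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    Assumes both classes are present and scores are distinct (ties are not merged).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one example at a time, highest score first.
    for label, _ in sorted(zip(labels, scores), key=lambda p: p[1], reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up example: 1 = posting has the target job title, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```

An area close to 1 means the job title is well predicted from the description; 0.5 corresponds to random guessing.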

Page 8: Marianne hoogeveen demo1

Let’s have a look!

JOBMASH APP

Page 9: Marianne hoogeveen demo1

Why is this useful?

Fewer irrelevant results

Don’t miss similar jobs that don’t have the right job title

Page 10: Marianne hoogeveen demo1

About me

PhD Theoretical Physics, King’s College London

Data Science Internship at Cytora Ltd:

Recognising street addresses in newspaper articles

Page 11: Marianne hoogeveen demo1

Extra slides

Page 12: Marianne hoogeveen demo1

Validation

200 random docs

Compare with same docs cut in half

Compute pairwise cosine similarities

[Figure: pairwise cosine similarities on a scale from 0 to 1. Axes: original documents sorted by ID; half documents sorted by original’s ID]
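
A rough sketch of this check follows: each document should be most similar to its own half. To stay short it uses raw word counts rather than TF-IDF weights and keeps the first half of each document; both choices, the helper names and the three sample documents are assumptions, since the slides do not give these details.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Raw word counts (a simplification of the TF-IDF weighting)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(u, v):
    dot = sum(c * v[t] for t, c in u.items())  # a Counter returns 0 for missing terms
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def first_half(text):
    """Keep the first half of the words (which half to keep is an assumption)."""
    words = text.split()
    return " ".join(words[: len(words) // 2])

def half_document_matrix(docs):
    """matrix[i][j] compares original document i with half document j."""
    originals = [bag_of_words(d) for d in docs]
    halves = [bag_of_words(first_half(d)) for d in docs]
    return [[cosine_similarity(o, h) for h in halves] for o in originals]

# Made-up documents; the slide uses 200 randomly chosen postings.
docs = [
    "analyse sales reports in excel and sas for business insight",
    "build hadoop platform systems and design big data pipelines",
    "research statistical risk models for financial clients",
]
matrix = half_document_matrix(docs)
# For every original, is its own half the best match?
matches = [max(range(len(row)), key=row.__getitem__) == i
           for i, row in enumerate(matrix)]
print(matches)
```

If the similarity measure behaves sensibly, the largest values sit on the diagonal of the matrix.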

Page 13: Marianne hoogeveen demo1

Topic modelling

Salient terms: Client, Machine, Model, System, Status, Financial, Research, Risk, Analysis, Statistical, Engineer, Employment, Support, Big

Page 14: Marianne hoogeveen demo1

LDA topics

e.g. “analysis”, “report”, “statistical”, “excel”, “sas”, “insight”

e.g. “big”, “technology”, “system”, “engineer”, “design”, “build”, “hadoop”, “platform”

Page 15: Marianne hoogeveen demo1

Other LDA topics

e.g. “status”, “disability”, “gender”

e.g. “benefit”, “dental”, “pay”, “medical”, “401k”