Determining a digital profile from public social media information.

Post on 29-Nov-2014

Category:

Social Media

DESCRIPTION

Presentation on master's thesis "Determining a digital profile from public social media information."

Transcript

DETERMINING A DIGITAL PROFILE FROM PUBLIC SOCIAL MEDIA INFORMATION

Department of Informatics, School of Informatics and Engineering

2013/14

KAROLINA STAMBLEWSKA

B00075232

Outline

Motivation
Review of existing tools
Data harvesting
Data harvesting with Selenium 2.0
Resolving e-mail address
Demo
Results

Motivation

Q2 2014 surveys conducted by Ipsos MRBI with 1000 respondents aged 15

Motivation

The CPL “Employment Market Monitor” report indicates that 60% of those surveyed use social media sources for pre-screening or for background digital footprinting of job applicants prior to employment (CPL, Q3, 2013)

Options    Regularly  Sometimes  Never  Do not approve of this
Google     13%        26%        53%    8%
LinkedIn   30%        41%        24%    5%
Facebook   9%         22%        59%    10%
Google+    3%         7%         81%    9%
Other      10%        4%         77%    9%

Motivation

Global-Lingo.com survey of the UK jobseeker market, Q1 2014

63% of jobseekers admitted to lying on their CV!

Categories reported for jobseekers: lying or exaggerating; false claim to speak another language; inflating IT skills

Motivation – “resume-less”

“Having an ability to showcase and validate a candidate’s work through a social graph (Twitter, About.me, Facebook, Slideshare, Google+, forums, etc.), search engine footprint (special URL references to projects, linkbacks, publications, etc.), network connections is much more powerful than just 1 – 2 pages and 3 prepped references. The prospective employer now has an ability to fully evaluate a candidate and understand if they are a fit or not based on actual work, not just 2 pages of crafty wording.”

#socialCV Mr. Vala Afshar (Chief Customer Officer @Enterasys)

Motivation – GitHub “Forget LinkedIn: Companies turn to GitHub to find tech talent” – CNET.COM

In the red-hot market for skilled software engineers, companies looking to make great hires are discovering that relying on traditional services that showcase candidates' work histories -- but not their actual work -- is a great way to miss out on the best available talent.

GitHub, a place where hiring managers and recruiters alike are increasingly turning to find not just the potential employees who look best on paper, but the ones that actively (and publicly) demonstrate their capabilities.

Review of existing tools

Data harvesting

Website Application Programming Interface (also called Web API) – provides the client with an interface for querying the website provider's database via HTTP request messages. As a result, the client receives the data as XML or JSON.

Web scraping – a software-based technique that transforms unstructured data on the web (typically HTML) into structured data that can be stored and analysed
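The difference between the two harvesting routes can be illustrated with a minimal sketch. It assumes a hypothetical JSON payload and a hypothetical HTML fragment standing in for real HTTP responses; the names `API_RESPONSE`, `HTML_PAGE`, `parse_api`, `FriendLinkParser` and `scrape_html` are illustrative only and do not come from the thesis prototype.

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads standing in for real HTTP responses.
API_RESPONSE = (
    '{"users": [{"name": "Alice", "profile": "/u/alice"},'
    ' {"name": "Bob", "profile": "/u/bob"}]}'
)
HTML_PAGE = """
<ul id="friends">
  <li><a class="friend" href="/u/alice">Alice</a></li>
  <li><a class="friend" href="/u/bob">Bob</a></li>
</ul>
"""


def parse_api(body):
    """Web API route: the response is already structured, so decoding is enough."""
    return [(user["name"], user["profile"]) for user in json.loads(body)["users"]]


class FriendLinkParser(HTMLParser):
    """Web scraping route: pull (name, href) pairs out of <a class="friend"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None  # href of the friend link currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "friend":
            self._href = attrs.get("href")

    def handle_data(self, data):
        # Only text that sits inside an open friend link becomes a record.
        if self._href is not None and data.strip():
            self.records.append((data.strip(), self._href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def scrape_html(page):
    """Turn unstructured HTML into the same structured records as the API route."""
    parser = FriendLinkParser()
    parser.feed(page)
    return parser.records
```

Both functions return the same list of (name, profile) pairs, but the scraping route depends on the page's markup and has to be adjusted whenever that markup changes.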

Web harvesting - caveats

Caveats

Web 2.0 sites are heavily driven by AJAX and populate HTML dynamically, depending on the user's preferences and various conditions

Basic Python libraries do not capture all of the source code, as objects may be hidden or event-driven

Usually secured with SSL/TLS

Solution

Selenium 2.0 provides native WebDriver implementations that imitate the functionality of Android, Firefox, Google Chrome, Internet Explorer, Safari, Opera, and even HtmlUnit and the JavaScript-based PhantomJS

well suited to dynamically populated elements

allows selecting elements via various HTML attributes, from tag name and id to XPath and even CSS selectors
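Waiting for dynamically populated elements and then selecting them by locator could look roughly like the sketch below. It is not the prototype's code: `wait_for_texts` and `demo` are hypothetical names, the hand-rolled polling loop stands in for Selenium's own `WebDriverWait` so that the helper has no hard dependency, and the locator strings are assumed to match the values of Selenium's `By.ID`, `By.TAG_NAME`, `By.XPATH` and `By.CSS_SELECTOR` constants.

```python
import time

# Locator strategy strings, matching the values of selenium's By constants.
BY_ID, BY_TAG, BY_XPATH, BY_CSS = "id", "tag name", "xpath", "css selector"


def wait_for_texts(driver, by, value, timeout=10.0, poll=0.5):
    """Poll until elements matching the locator appear, then return their text.

    Returns an empty list if nothing matches before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(by, value)
        if elements:
            return [element.text for element in elements]
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)


def demo(url, css):
    """Hypothetical usage with a real browser (needs selenium and Firefox)."""
    from selenium import webdriver  # imported lazily; only needed for the demo

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return wait_for_texts(driver, BY_CSS, css)
    finally:
        driver.quit()
```

Because the helper only relies on `find_elements` and the `text` attribute, the same function works with any of the browser drivers listed above.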

Web harvesting – Facebook Friends List example

Resolving e-mail address

   Network         Method       Extra details
1  Facebook        Direct
2  Twitter         Gmail
3  SlideShare.net  Direct       Advanced search, by user
4  Academia.edu    Gmail
5  GitHub          Semi-direct  Specific query over the local-part of the e-mail address
6  LinkedIn        Gmail        Caveats (listed below)

LinkedIn caveats:
1) the e-mail address is not resolved until the user sends an invitation to the e-mail address owner
2) previous search queries are cached and suggested in the next query round
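The semi-direct method queries on the local-part of the address, that is, the text before the "@". A minimal sketch of deriving such search terms follows; `split_email` and `candidate_queries` are hypothetical helpers, and the tokenising rules (dropping a "+tag" suffix, splitting on dots, underscores and hyphens) are assumptions rather than the prototype's actual logic.

```python
import re


def split_email(address):
    """Split an address at the last '@' into (local_part, domain)."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return local, domain


def candidate_queries(address):
    """Derive search terms from the local-part (assumed rules, for illustration)."""
    local, _domain = split_email(address)
    base = local.split("+", 1)[0]  # drop an optional '+tag' suffix
    tokens = [token for token in re.split(r"[._-]+", base) if token]
    queries = [base]
    if len(tokens) > 1:
        # A name-like variant, e.g. 'jane.doe' also yields 'jane doe'.
        queries.append(" ".join(tokens))
    return queries
```

For example, `candidate_queries("jane.doe+work@example.com")` yields the terms `jane.doe` and `jane doe`, which could then be submitted to a network's user search.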

Resolving e-mail address – Facebook

Search Engine

Find/Invite Friends

Resolving e-mail address – Academia.edu, Twitter

Resolving e-mail address – GitHub, SlideShare

Resolving e-mail address – LinkedIn

SlideShare

Demo

Results

Overview of the performance test of three open source search engines vs. the implemented prototype (ScrapYA).

[Chart. Platforms: gravatar, twitter, facebook, github, stumbleupon, vimeo, YouTube, picasa, pinterest, klout, foursquare, amazon, ebay, aol livestream, soundCloud, instagram, g+home, slideshare, about.me, linkedin, academia.edu. Vertical axis: 0 to 9. Series: people smart, pipl, spokeo, scrapYA.]

Results

Refined results of the search test, restricted to the social media platforms implemented by the prototype (ScrapYA).

[Chart. Platforms: twitter, facebook, github, slideshare, academia.edu. Vertical axis: 0% to 100%. Series: scrapYA, spokeo, pipl, people smart.]
