Spatio-temporal linkage of real and virtual identity

Muhammad Adnan (and Paul Longley), University College London

Nov 02, 2014


This presentation outlines initial work on linking identities in the real and virtual worlds.
Transcript
Page 1: Spatio-temporal linkage of real and virtual identity

Spatio-temporal linkage of real and virtual identity

Muhammad Adnan (and Paul Longley), University College London

Page 2

Geodemographics

• “Analysis of people by where they live [places]” (Sleight, 1993:3)

• Social similarity, not locational proximity

[Diagram: Person, Home Address, Area]
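The point that geodemographics groups areas by social similarity rather than locational proximity can be illustrated by assigning each area to the nearest cluster centroid in a space of socio-economic variables, which is the assignment step of k-means-style clustering. This is a minimal sketch: the variables, the area profiles and the centroid labels (`retired_owners`, `young_renters`) are all hypothetical.

```python
import math
from typing import Dict, Tuple

Profile = Tuple[float, ...]

# Hypothetical standardised area profiles:
# (share aged 65+, share renting, share with a degree)
areas: Dict[str, Profile] = {
    "area_A": (0.30, 0.10, 0.15),
    "area_B": (0.05, 0.60, 0.55),
    "area_C": (0.28, 0.12, 0.18),
}

# Hypothetical cluster centroids in the same variable space
centroids: Dict[str, Profile] = {
    "retired_owners": (0.30, 0.10, 0.15),
    "young_renters": (0.05, 0.65, 0.50),
}

def nearest_centroid(profile: Profile, centroids: Dict[str, Profile]) -> str:
    """Label of the centroid with the smallest Euclidean distance to `profile`."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))

def classify(areas: Dict[str, Profile], centroids: Dict[str, Profile]) -> Dict[str, str]:
    """Assign each area by socio-economic similarity alone; location is never used,
    so two distant areas with similar profiles receive the same label."""
    return {name: nearest_centroid(p, centroids) for name, p in areas.items()}

print(classify(areas, centroids))
```

Here `area_A` and `area_C` share a label because their profiles are alike, regardless of where they are on the map.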

Page 3
Page 4

Identity of individuals in the real world

• Name (Forename & Surname)

• Surnames have geographic concentrations

• Prospects for linkage with socio-economic data

• For example, analysing the socio-economic circumstances of different ethnic groups

Page 5

An example – gbnames.publicprofiler.org

[Maps: Longley, Cheshire]

Page 6

An example – Output Area Classification

[Maps: Kingston upon Hull, Hereford]

Page 7

A socio-economic and ethnic classification

Page 8

A socio-economic and ethnic classification

Page 9
Page 10

Wu

Page 11

Source: Cheshire and Longley (2011)

Page 12

Courtesy: James Cheshire

Page 13

Wordle.net

Page 14

The European scale

• 16 countries

• 400 million people

• 5.95 million unique surnames

Courtesy: James Cheshire

Page 15

Onomap classification

Forename-Surname clustering (based on Hanks and Tucker, 2000)

[Diagram linking Surnames (UK Electoral Roll) and Forenames; example names: Pablo Mateos; Garcia, Pérez, Sánchez, Rodríguez, ...; Juan, Rosa, Marta, ...]

– Several iterations until the self-contained cluster is exhausted
– Cluster assigned a cultural, ethnic & linguistic Onomap type
– Probability of ethnicity assigned to each name

Mateos et al (2007) CASA Working Paper 116
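The iterative forename-surname clustering above can be pictured as a breadth-first expansion over a bipartite co-occurrence graph: from a seed surname, add the forenames it occurs with, then the surnames those forenames occur with, and repeat until an iteration adds nothing new. This is a simplified illustration, not the Onomap implementation; the toy name pairs, the `min_count` threshold and the `cluster_share` proxy for "probability of ethnicity" are assumptions.

```python
from collections import Counter, deque
from typing import Iterable, List, Set, Tuple

def expand_cluster(pairs: Iterable[Tuple[str, str]], seed_surname: str,
                   min_count: int = 1) -> Tuple[Set[str], Set[str]]:
    """Grow a cluster from one surname by alternating forenames and surnames.

    A name joins when it co-occurs with a cluster member at least `min_count`
    times (a hypothetical stand-in for Onomap's own inclusion rules).
    The loop ends when no new names join, i.e. the cluster is exhausted.
    """
    counts = Counter(pairs)                     # (forename, surname) -> records
    forenames: Set[str] = set()
    surnames: Set[str] = {seed_surname}
    queue = deque([("surname", seed_surname)])
    while queue:
        kind, name = queue.popleft()
        for (fore, sur), n in counts.items():
            if n < min_count:
                continue
            if kind == "surname" and sur == name and fore not in forenames:
                forenames.add(fore)
                queue.append(("forename", fore))
            elif kind == "forename" and fore == name and sur not in surnames:
                surnames.add(sur)
                queue.append(("surname", sur))
    return forenames, surnames

def cluster_share(pairs: List[Tuple[str, str]], surname: str,
                  cluster_forenames: Set[str]) -> float:
    """Hypothetical proxy: share of a surname's records whose forename is in the cluster."""
    found = [f for f, s in pairs if s == surname]
    if not found:
        return 0.0
    return sum(f in cluster_forenames for f in found) / len(found)

# Toy records, invented for illustration
pairs = [("Pablo", "Mateos"), ("Pablo", "Garcia"), ("Juan", "Garcia"),
         ("Juan", "Pérez"), ("Rosa", "Pérez"), ("John", "Smith"), ("Mary", "Smith")]
fore, sur = expand_cluster(pairs, "Mateos")
print(sorted(fore), sorted(sur))
```

A single widely shared forename can bridge two otherwise separate clusters, which is why some frequency threshold (here `min_count`) matters in practice.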

Page 16

WorldNames CEL clusters

Source: Mateos et al (2011)

Page 17
Page 18
Page 19

Uncertainty and virtual identity

• Identity is increasingly shaped by online activities
– => value may be leveraged from the fusion of physical and virtual data sources

• Data fusion and generalisation to relate physical and virtual properties

• Use of residence alongside activity patterns and social network information

Page 20

Most of us have virtual identities

• Email address; social media accounts

• People use different procedures and providers to establish virtual identities

• Harvesting these data has interesting potential applications
– Cyber crime
– Cyber geodemographics (Facebook has already started this)

Page 21

Most of us have virtual identities

• Facebook data mining engine
– Analyses the words you use and tailors advertisements accordingly

Page 22

Starting Point

http://worldnames.publicprofiler.org

• Worldnames holds data for approximately 1 billion people across 28 countries of the world

• Approximately 1.6 million unique users have visited the website since 2008

Page 23

Starting Point

http://worldnames.publicprofiler.org

• Worldnames has been archiving ‘Surname search’, ‘Email Address’, ‘Gender’, and ‘IP Address’ for searches over the past 6 months
– c. 175,000 records: email validation
– 150,000 usable ‘IP Address’ entries
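The slides do not say how the archived email addresses were validated. A minimal syntactic check, assuming a deliberately loose pattern rather than full RFC 5322 parsing, could also split each address into the local part and domain used in the later analyses; `parse_email` and `EMAIL_RE` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed loose rule: exactly one '@', no whitespace, and a dot in the domain.
# Illustrative only; this is not full RFC 5322 validation.
EMAIL_RE = re.compile(r"^([^@\s]+)@([^@\s]+\.[^@\s]+)$")

def parse_email(address: str) -> Optional[Tuple[str, str]]:
    """Return (local_part, lower-cased domain) for a plausible address, else None."""
    match = EMAIL_RE.match(address.strip())
    if match is None:
        return None
    local, domain = match.groups()
    return local, domain.lower()

for raw in ["J.Smith@Example.COM", "no-at-sign.example.com", "a b@example.com"]:
    print(raw, "->", parse_email(raw))
```

Domains are lower-cased so that counts by domain are not split by capitalisation; the local part is left untouched.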

Page 24

IP Address to Latitude/Longitude conversion

http://quova.com

An API that converts IP addresses to their corresponding latitude/longitude values
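IP geolocation services generally work by looking an address up in a table of address ranges with known locations. The slides do not document Quova's API, so this sketch does not call it; it only shows the range-lookup idea against a small hypothetical table (the ranges are reserved documentation blocks and the coordinates are invented).

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

def _ip(text: str) -> int:
    """IPv4 address as an integer, so ranges can be compared and sorted."""
    return int(ipaddress.ip_address(text))

# Hypothetical table: (first address, last address, latitude, longitude)
RANGES: List[Tuple[int, int, float, float]] = sorted([
    (_ip("192.0.2.0"), _ip("192.0.2.255"), 51.5, -0.1),
    (_ip("198.51.100.0"), _ip("198.51.100.255"), 40.7, -74.0),
])
STARTS = [r[0] for r in RANGES]

def locate(ip: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the range containing `ip`, or None if no range covers it."""
    value = _ip(ip)
    i = bisect.bisect_right(STARTS, value) - 1   # last range starting at or before value
    if i >= 0 and value <= RANGES[i][1]:
        _, _, lat, lon = RANGES[i]
        return lat, lon
    return None

print(locate("192.0.2.17"))
```

Binary search over sorted, non-overlapping ranges keeps each lookup logarithmic in table size; accuracy depends entirely on how well the table maps ranges to places, which is the geocoding problem noted in the conclusions.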

Page 25

IP Address to Latitude/Longitude conversion

http://quova.com

A search for an IP address at UCL (128.40.214.196)

Page 26

Top Countries

The website was searched from 155 countries over the past 6 months.

[Bar chart: searches by country, top 20]

UNITED STATES 76708
UNITED KINGDOM 21892
CANADA 8154
GERMANY 7158
ITALY 4058
AUSTRALIA 2978
BRAZIL 2440
FRANCE 2028
ARGENTINA 1958
SPAIN 1830
NEW ZEALAND 1236
NETHERLANDS 1074
GREECE 1040
SWITZERLAND 992
BELGIUM 940
POLAND 880
AUSTRIA 874
MEXICO 834
IRELAND 710
SWEDEN 630
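A ranking like this is a frequency count over the geocoded search records. A minimal sketch, assuming each record already carries a country (the `country` and `surname` fields and the toy records are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def top_counts(records: Iterable[Dict[str, str]], field: str, n: int) -> List[Tuple[str, int]]:
    """Count records by `field`, skipping empty or missing values; return the n most frequent."""
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common(n)

searches = [
    {"country": "UNITED STATES", "surname": "SMITH"},
    {"country": "UNITED KINGDOM", "surname": "JONES"},
    {"country": "UNITED STATES", "surname": "SMITH"},
    {"country": "", "surname": "WU"},   # IP address could not be geocoded
]
print(top_counts(searches, "country", 2))
```

The same function ranks surnames (`field="surname"`) or email domains, so the popular-surname and popular-domain charts follow the same pattern.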

Page 27

UK and Ireland

Page 28

Europe

Page 29

North America

Page 30

South America

Page 31

India, China, Japan, Singapore

Page 32

Popular Surname Searches

[Bar chart: number of searches per surname, top 20]

SMITH 708
JONES 306
JOHNSON 258
ANDERSON 224
WILLIAMS 222
MILLER 218
MARTIN 202
WILSON 194
BROWN 194
MOORE 188
THOMAS 178
TAYLOR 170
CLARK 164
LEE 160
ROBERTS 156
DAVIS 152
CAMPBELL 144
LEWIS 138
HARRIS 138
MITCHELL 136

Page 33

Popular Email Domains

[Bar chart: top 20 email domains]

GMAIL.COM 31842
HOTMAIL.COM 22098
YAHOO.COM 15542
AOL.COM 5550
COMCAST.NET 2696
HOTMAIL.CO.UK 1948
MSN.COM 1624
WEB.DE 1522
YAHOO.CO.UK 1290
GMX.DE 1260
SBCGLOBAL.NET 1246
BTINTERNET.COM 860
HOTMAIL.IT 844
VERIZON.NET 798
GOOGLEMAIL.COM 742
LIVE.COM 742
COX.NET 708
ATT.NET 632
MAILINATOR.COM 616
LIBERO.IT 616

Page 34

Popular Email Domains by Surnames

Smith (English): GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, MAILINATOR.COM
Jones (Welsh): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, COMCAST.NET, GOOGLEMAIL.COM
Johnson (English): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, MSN.COM, VERIZON.NET
Perez (Spanish): GMAIL.COM, HOTMAIL.COM, YAHOO.ES, CHARTER.NET, GRANDECOM.NET
Gupta (Indian): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, GOOGLAMAIL.COM, INDIATIMES.COM
Meyer (German): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, AOL.COM, GMX.DE

Page 35

Popular Email Domains by Country

UK: GMAIL.COM, HOTMAIL.COM, HOTMAIL.CO.UK, YAHOO.CO.UK, YAHOO.COM
USA: GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, COMCAST.NET
France: HOTMAIL.FR, GMAIL.COM, HOTMAIL.COM, YAHOO.FR, LAPOSTE.NET
Germany: WEB.DE, GMX.DE, T-ONLINE.DE, YAHOO.DE, GMAIL.COM
Brazil: HOTMAIL.COM, GMAIL.COM, YAHOO.COM.BR, IG.COM.BR, BOL.COM.BR
Japan: YAHOO.COM, YAHOO.CO.JP, GMAIL.COM, HOTMAIL.COM, MSN.COM
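Rankings by surname or by country are the same frequency count split by group. A minimal sketch, assuming records have been reduced to hypothetical (group, domain) pairs, where the group is a surname or a geocoded country:

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

def top_domains_by_group(rows: Iterable[Tuple[str, str]], k: int) -> Dict[str, List[str]]:
    """rows are (group, email_domain) pairs, e.g. ("Germany", "web.de").
    Returns the k most frequent (lower-cased) domains for each group."""
    per_group: DefaultDict[str, Counter] = defaultdict(Counter)
    for group, domain in rows:
        per_group[group][domain.lower()] += 1
    return {g: [d for d, _ in c.most_common(k)] for g, c in per_group.items()}

# Toy rows, invented for illustration
rows = [("Germany", "web.de"), ("Germany", "gmx.de"), ("Germany", "web.de"),
        ("France", "hotmail.fr"), ("France", "GMAIL.COM")]
print(top_domains_by_group(rows, 2))
# {'Germany': ['web.de', 'gmx.de'], 'France': ['hotmail.fr', 'gmail.com']}
```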

Page 36

Top GoogleMail.com users

Top Surnames: BINDER, WATKINS, WHITE, WOODS, ROBINSON, SLEEMAN, BENNETT, RITCHIE, SHARP, ROLLINGS

Page 37

GoogleMail.com users

• Surname ‘Binder’

[Maps: Germany, Switzerland]

Page 38

GoogleMail.com users

• Surname ‘Binder’

[Maps: Germany, Switzerland]

Page 39

GoogleMail.com users

• Surname ‘Blackbourn’

[Map: New Zealand]

Page 40

Who uses their surname as part of their email address?

• Approximately 40% of users have their surname as part of their email address
– [email protected] (Surname: Harper)
– [email protected] (Surname: Kempe)

• Top Countries

[Bar chart, axis 0 to 50: SOUTH AFRICA, SLOVENIA, UNITED KINGDOM, IRELAND, INDIA, MALAYSIA, PORTUGAL, GERMANY, COSTA RICA, AUSTRIA, LUXEMBOURG, BELGIUM, CANADA, NEW ZEALAND, AUSTRALIA, CHINA, TURKEY, CROATIA, SWITZERLAND, UNITED STATES]
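The slides do not state how a surname was judged to be part of an email address. One simple rule, assumed here, is a case-insensitive substring test on the part before '@', after folding accents and dropping digits and punctuation; the function names and sample addresses are hypothetical.

```python
import re
import unicodedata
from typing import Iterable, Tuple

def _letters(text: str) -> str:
    """Lower-case ASCII letters only: accents folded (é -> e), digits and punctuation dropped."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", folded.lower())

def surname_in_local_part(email: str, surname: str) -> bool:
    """True if the surname's letters appear within the local part (left of '@')."""
    local = email.split("@", 1)[0]
    target = _letters(surname)
    return bool(target) and target in _letters(local)

def share_with_surname(rows: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (email, surname) rows whose email contains the surname."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(surname_in_local_part(e, s) for e, s in rows) / len(rows)

print(surname_in_local_part("jose.perez@example.com", "Pérez"))
```

A plain substring rule overstates matches for short surnames (for example "Lee" inside "sleeman"), so a stricter token-based rule may be preferable on real data.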

Page 41

Who uses long email addresses?

• Grand mean email length of 8 characters
– Number of characters on the left side of ‘@’
– United Kingdom, USA, Canada, and other European countries

• People from South American countries and India have long email addresses (average length: 13 characters)

• South Indians have longer email addresses than North Indians

BRAZIL [email protected] (14 characters)
CHILE [email protected] (25 characters)
URUGUAY [email protected] (17 characters)
INDIA [email protected] (18 characters)
ARGENTINA [email protected] (13 characters)
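The lengths above count characters to the left of '@'. A minimal sketch of the per-country mean, assuming hypothetical (country, email) pairs as input; strings without an '@' are skipped rather than counted.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, Iterable, List, Tuple

def mean_local_length(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """rows are (country, email) pairs; returns the mean number of characters
    to the left of '@' for each country."""
    lengths: DefaultDict[str, List[int]] = defaultdict(list)
    for country, email in rows:
        local, sep, _ = email.partition("@")
        if sep:                          # skip strings with no '@'
            lengths[country].append(len(local))
    return {country: mean(values) for country, values in lengths.items()}

# Toy rows, invented for illustration
rows = [("UK", "jsmith@example.com"), ("UK", "amy.lee@example.com"),
        ("BRAZIL", "joao.silva.santos@example.com.br"), ("UK", "broken-address")]
print(mean_local_length(rows))
```

A grand mean over all users would pool every length into one list instead of grouping by country.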

Page 45

Conclusion and future work

• Some interesting patterns emerge from the study of email addresses
– Some problems remain (accuracy of geocoding techniques)

• Prospect of linking data coded to unit postcode level
– Cluster analysis and data mining techniques

• Future work may involve data mining of Facebook and Twitter data
– Issues of generalisation

• Visualisation of the data

Page 46

Any Questions ?

Thanks for Listening

Page 47

A research agenda

1. Acquire relevant real and virtual data sources and devise a DBMS
2. Devise a GB-wide classification of NICT usage at neighbourhood scale
3. Devise a GB-wide classification of social network traffic
4. Develop an enhanced worldnames site to harvest real and virtual user data
5. Undertake text analysis of worldnames user data and use it to link classifications (2) and (3)
6. Devise, implement and analyse a social networking application and cybergeodemographic classification