WHAT’S IN A NAME?
- IDENTITY & CULTURE
- GENDER
NamSor Applied Onomastics
2014-06-26
Where are the women?
Paris DataGeek gender gap: Male 86%, Female 14%
Where are the women? Example: the movies
Mining 5M names to assess GENDER*
IMDB file: the cinematographers list

Name origin   Male   Female   Unknown
France        82%    16%      2%
Tunisia       77%    16%      8%
Morocco       80%    15%      5%
Algeria       86%    11%      3%
Ireland       89%    10%      1%

*Using NamSor GendRE API v0.0.13
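
A breakdown like the one above can be tallied once each name carries a name origin and a gender label. The sketch below is only an illustration: the function name `gender_shares`, the label strings and the toy records are assumptions, not the IMDB data or the GendRE API output.

```python
from collections import Counter, defaultdict

def gender_shares(records):
    """Return {origin: {label: rounded percent}} from (origin, label) pairs."""
    counts = defaultdict(Counter)
    for origin, label in records:
        counts[origin][label] += 1
    shares = {}
    for origin, tally in counts.items():
        total = sum(tally.values())
        shares[origin] = {
            label: round(100 * tally[label] / total)
            for label in ("male", "female", "unknown")
        }
    return shares

# Toy records, invented for illustration only.
sample = (
    [("France", "male")] * 4
    + [("France", "female")]
    + [("Ireland", "male")] * 9
    + [("Ireland", "female")]
)
shares = gender_shares(sample)
for origin, row in shares.items():
    print(origin, row)
```

Because each share is rounded separately, a row can add up to 99% or 101%.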
What’s in a name? What’s a name?
Elena Rossini
@_Elena (Twitter)
Elian Carsenat
@ElianCarsenat (Twitter)
tioulpanov (Skype)
NamSor.com
+ Social Network (LinkedIn, Twitter, FB …) : more names
Onomastics = the science of proper names
NamSor sociolinguistic algorithm
FN LN
Mette Andersen
Lene Andersson
Eva Arndt-Riise
Heidi Astrup
Mie Augustesen
Margot Bærentzen
Louise Bager Nørgaard
Marie Bagger Rasmussen
Yutta Barding
Ulla Barding-Poulsen
FN LN
Xian Dongmei
Zheng Dongmei
Jin Dongxiang
Xu Dongxiang
Li Dongxiao
Qin Dongya
Li Dongying
Han Duan
Li Duihong
Jiang Fan
Training set: Athletes
Step 1 – Learn stereotypes
Data set: Actors
birgitta agerberth
birgitte l. eriksen
bitao gong
bitten thorengaard
biwang jiang
Step 2 – Classify
bitao gong
biwang jiang

birgitta agerberth
birgitte l. eriksen
bitten thorengaard
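
The two steps can be illustrated with a toy classifier. This is not the NamSor algorithm, which the slides do not describe; it is a swapped-in naive Bayes model over character trigrams with additive (Lidstone) smoothing. The labels `list_1` and `list_2` simply name the two athlete lists above, and `trigrams`, `learn`, `classify` and the `alpha` value are all assumptions made for the sketch.

```python
import math
from collections import Counter, defaultdict

def trigrams(name):
    """Character trigrams of a lowercased name, padded to mark its edges."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def learn(training):
    """Step 1: count trigrams per class from (full_name, label) pairs."""
    model = defaultdict(Counter)
    for name, label in training:
        model[label].update(trigrams(name))
    return model

def classify(model, name, alpha=0.1):
    """Step 2: return the label with the highest smoothed log-likelihood."""
    vocab = len({g for counts in model.values() for g in counts})
    best_label, best_score = None, -math.inf
    for label, counts in model.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[g] + alpha) / (total + alpha * vocab))
            for g in trigrams(name)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Training set: the two athlete lists from the slide.
list_1 = ["Mette Andersen", "Lene Andersson", "Eva Arndt-Riise",
          "Heidi Astrup", "Mie Augustesen", "Margot Bærentzen",
          "Louise Bager Nørgaard", "Marie Bagger Rasmussen",
          "Yutta Barding", "Ulla Barding-Poulsen"]
list_2 = ["Xian Dongmei", "Zheng Dongmei", "Jin Dongxiang", "Xu Dongxiang",
          "Li Dongxiao", "Qin Dongya", "Li Dongying", "Han Duan",
          "Li Duihong", "Jiang Fan"]
training = [(n, "list_1") for n in list_1] + [(n, "list_2") for n in list_2]
model = learn(training)

# Data set: actor names from the slide.
for actor in ["birgitta agerberth", "bitao gong"]:
    print(actor, "->", classify(model, actor))
```

With only ten names per class the model is fragile; a real system would need far more training data and an explicit "unknown" outcome.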
Decrypting IDENTITY
Source: Commonwealth WWI Casualties
Mining 3M Geo-Tweets to map FLOWS
Source           Target   Type       Id    Onoma           Weight
United Kingdom   France   Directed   16    Great Britain   37
Spain            France   Directed   55    Spain           14
United States    France   Directed   75    Great Britain   12
Turkey           France   Directed   79    Turkey          11
Brazil           France   Directed   87    Portugal        10
United Kingdom   France   Directed   112   Ireland         9
Italy            France   Directed   152   Italy           7
Switzerland      France   Directed   226   France          5
Belgium          France   Directed   247   France          5
United Kingdom   France   Directed   258   France          5
Mexico           France   Directed   287   Spain           4
Ireland          France   Directed   317   Great Britain   4
United Kingdom   France   Directed   333   Italy           4
United States    France   Directed   375   France          4
Source: Twitter
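
The slides do not say how the edge table was produced. One plausible reading, stated here as an assumption, is that each row counts observations sharing the same source country, target country and onomastic class. Under that assumption, a weighted directed edge list could be built as follows; `build_edges` and the toy observations are hypothetical.

```python
from collections import Counter

def build_edges(observations):
    """Aggregate (source, target, onoma) triples into weighted directed edges."""
    weights = Counter(observations)
    rows = [
        {"Source": s, "Target": t, "Type": "Directed", "Onoma": o, "Weight": w}
        for (s, t, o), w in weights.items()
    ]
    # Heaviest flows first; ties broken by source, then onomastic class.
    rows.sort(key=lambda r: (-r["Weight"], r["Source"], r["Onoma"]))
    return rows

# Toy observations, invented for illustration only.
toy = (
    [("United Kingdom", "France", "Great Britain")] * 3
    + [("Spain", "France", "Spain")] * 2
    + [("United Kingdom", "France", "Ireland")]
)
edges = build_edges(toy)
for row in edges:
    print(row)
```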
Isn’t predicting gender SIMPLE?
Can you tell: Andrea/Rossini vs. Andrea/Parker
O./Sokolova
Kjell/Bergqvist
声涛/周
נתניהו/בנימין
المرعبي/معين
Our target, globally for all countries/lang./cultures:
99% precision, 99% recall for both Male & Female
We’re getting there, combining classic baby name statistics with our unique algorithm
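
To make the 99% precision and 99% recall target concrete: for a given class, precision is the share of names predicted as that class that truly belong to it, and recall is the share of names truly in that class that were predicted as such. The sketch below checks both measures for both classes; the function names, label strings and toy data are assumptions.

```python
def precision_recall(truth, predicted, label):
    """Precision and recall for one class, from parallel lists of labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def meets_target(truth, predicted, target=0.99):
    """True only if both classes reach the target on both measures."""
    return all(
        score >= target
        for label in ("male", "female")
        for score in precision_recall(truth, predicted, label)
    )

# Toy labels, invented for illustration only.
truth = ["female", "female", "male", "male", "male"]
predicted = ["female", "male", "male", "male", "female"]
print(precision_recall(truth, predicted, "female"))
print(meets_target(truth, predicted))
```

Requiring the target for both classes matters on skewed data: a classifier that labels everyone male scores high male recall while missing every woman.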
100% of objectives reached for 10 countries
75% of objectives reached for 28 countries
Currently, each version brings a ~30% improvement!
Want to play?
Android Gadget RapidMiner Extension
Improve your targeting and increase your open and click rates by greeting recipients with "Hello Sir" or "Hello Madam", without mistakes, in your email campaigns
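
Greeting "without mistakes" suggests using a gendered salutation only when the prediction is confident. The sketch below assumes some classifier returns a label and a confidence between 0 and 1. That shape, the `salutation` function, the 0.9 threshold and the neutral fallback are all hypothetical and do not describe the GendRE API response.

```python
def salutation(gender, confidence, threshold=0.9):
    """Pick a greeting, falling back to a neutral one when unsure."""
    if confidence >= threshold:
        if gender == "male":
            return "Hello Sir"
        if gender == "female":
            return "Hello Madam"
    # Low confidence or an unknown label: avoid guessing.
    return "Hello"

print(salutation("male", 0.97))
print(salutation("female", 0.55))
print(salutation("unknown", 0.99))
```

Raising the threshold trades coverage for fewer wrong greetings.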
Conclusions
We recognize names in any language, any place, any database; we can classify and we can sort
An onomastic class is not a ‘hard fact’ like a place of birth or a nationality, but it is accurate and fine-grained
Our sociolinguistic approach surpasses the traditional geo-demographics or ‘dictionary’ approach used in the US/UK
Our unique capability to decrypt identity and gender in high-growth and emerging countries (Russia, Africa, India, Indonesia…) can be put to work in a wide range of applications
Elian Carsenat
http://fdimagnet.com/
http://namsor.com/
July 2013, Embassy of Lithuania in Paris
+33 6 52 77 99 07
Twitter @NamsSor_com
APPENDICES
NamSor sorts names: functions, use cases
1. Name Linguistic Classification
2. Name Transliteration & Matching
3. Named Entity Extraction, Parsing
Multilingual Text Mining
Control Watch Lists
Social Networks Analytics
Geo demographics
Migration Studies
Gender Studies