Fairness-aware AI: A data science perspective. Indrė Žliobaitė, Dept. of Computer Science, University of Helsinki. October 15, 2018
Page 1: Fairness-aware AI A data science perspective · 2018-10-22 · fairness-aware ML AirBnB case: digital discrimination 2014 Repeated coverage in Guardian 2015 FATML is sold out 2017

Fairness-aware AI: A data science perspective

Indrė Žliobaitė, Dept. of Computer Science, University of Helsinki

October 15, 2018

Page 2:

Image source: https://analyticsindiamag.com/turing-test-key-contribution-field-artificial-intelligence/

Machine intelligence?

Strong AI – machine consciousness and mind

Page 3:

Weak AI – focused on one narrow task

no genuine intelligence, no self-awareness, no life

Image source: http://www.trustedreviews.com/news/apple-s-next-gen-siri-app-could-blow-alexa-and-co-away-2943759
Image source: https://www.translinguoglobal.com/10-reasons-why-google-translate-is-not-better-than-learning-language/

Page 4:

AI in everyday life: data driven decision support

Image source: https://www.thebalance.com/how-credit-scores-work-315541
Image source: http://tcgstudy.com/ranking_to_universities.html
Image source: https://www.pinterest.com/pin/443041682071895474/
Image source: https://www.insperity.com/blog/people-analytics-step-step-guide-using-data-make-hiring-decisions/

Banking · Hiring · University admittance · Sports analytics

Page 5:

Learning coarse representations

Source: https://www.quora.com/What-is-deep-learning-Why-is-this-a-growing-trend-in-machine-learning-Why-not-use-SVMs

We don't want to know how decisions are made in a doctor's head. We trust the judgement. How about AI?

Page 6:

May 11, 2016

Page 7:

How models are built

Learn model → Use model

[Diagram: historical data → learning algorithm → model; new data → model → prediction]

Image source: Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Data science: a model is a summary of data

Page 8:

Rule-based models: decision tree

Image source: Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Page 9:
Page 10:

If there is discrimination in the data on which a model is trained, then, unless instructed otherwise, a learned model will carry the discrimination forward.

[Diagram: historical data → learning algorithm → model; new data → model → prediction]

Image source: Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Model is a summary of data

Page 11:

If there is discrimination in the data on which a model is trained, then, unless instructed otherwise, a learned model will carry the discrimination forward.

[Diagram: historical data → learning algorithm → model; new data → model → prediction]

Image source: Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Model is a summary of data

Image: https://commons.wikimedia.org/wiki/File:Rosetta_Stone_BW.jpeg

Page 12:

Fairness-aware AI

Page 13:

Brief history of fairness-aware ML

2008 First paper on discrimination-aware data mining by Pedreschi et al.
2009 First paper on algorithmic prevention by Kamiran and Calders
2010 First Bayesian solutions by Calders and Verwer
2011 First PhD thesis: F. Kamiran
2011 Conditional discrimination paper by Žliobaitė et al.
2012 First dedicated workshop at IEEE ICDM
2013 Second PhD thesis: S. Hajian
2013 Edited book
2014 Special issue in AI and Law
2014 AirBnB case: digital discrimination
2014 Obama report I
2014/5 FATML is established
2015 Repeated coverage in the Guardian
2015 Obama report II
2015 Google hires Moritz Hardt
2015 First causality perspective paper by Bonchi et al.
2015 First dedicated course, Aalto and UH
2016 O'Neil's book
2016 Obama report III
2016 Many research papers are coming out, conferences start having dedicated tracks
2016 NGOs are being established
2017 FATML is sold out
2017–2018 A new paper or a few every week, many on optimization criteria / measurement
2017–2018 Major attention from society, media, funding agencies

Page 14:

● Discrimination – inferior treatment based on ascribed group rather than individual merits

● Discriminate = distinguish (lat.)

● Machine learning

– uses proxies to distinguish an individual from the mean

– without judging what is morally right or wrong

– can enforce constraints defined by legislation and/or social norms

Machine learning and discrimination

(external)

Page 15:

There are no “right” or “wrong” variables

Morality is a social convention?

Immanuel Kant introduced the categorical imperative: "Act only according to that maxim whereby you can, at the same time, will that it should become a universal law"

Source: Wikimedia Commons, Unidentified painter 18th-century portrait painting of men

Page 16:
Page 17:

Why unbiased computational processes can lead to discriminative decision procedures

● data is incorrect

– due to biased decisions in the past
– population is changing over time

● data is incomplete (omitted variable bias)

● sampling procedure skews the data

Calders and Žliobaitė 2013, book chapter

Page 18:

Desired outcomes?

Page 19:

No direct discrimination – “Twin test”

– p(+|X,s1) ≠ p(+|X,s2)

Image source: https://www.reference.com/science/can-boy-girl-identical-twins-f20c37b6ec0408c1
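The twin test lends itself to a direct computational check. Below is a minimal sketch (the data and the `biased_predict`/`fair_predict` rules are invented for illustration): flip each individual's protected attribute and count how often the decision changes.

```python
import numpy as np

def twin_test(predict, X, s):
    """Fraction of individuals whose decision changes when the protected
    attribute s is flipped. Any nonzero value signals direct
    discrimination, i.e. p(+|X,s1) != p(+|X,s2)."""
    return float(np.mean(predict(X, s) != predict(X, 1 - s)))

def biased_predict(X, s):
    # uses s directly: group s=1 needs a much higher score to be accepted
    return ((X[:, 0] - 2 * s) > 0.5).astype(int)

def fair_predict(X, s):
    # ignores s entirely, so every twin pair gets the same decision
    return (X[:, 0] > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
s = rng.integers(0, 2, size=1000)

print(twin_test(biased_predict, X, s))  # > 0: some twins are treated differently
print(twin_test(fair_predict, X, s))    # 0.0: twin test passed
```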

Page 20:

Source: https://www.astridbaumgardner.com/blog-and-resources/blog/ysm-mock-auditions/

“Blind” auditioning

Page 21:

No indirect discrimination

● Challenges:

– what is the “right” outcome (qualitatively)?

– what level of positive decisions are we aiming at (quantitatively)?

– p(+|X,s1) = p(+|X,s2) AND

– p(+|X) = “right” level

Page 22:
Page 23:

Source: "Home Owners' Loan Corporation Philadelphia redlining map". Licensed under Public Domain via Wikipedia

Redlining

Page 24:

● Discrimination – inferior treatment based on ascribed group rather than individual merits

Machine learning and discrimination

Predictive models y → f(X)
[Diagram: polarized y, features X, protected characteristic s]

Source: https://www.theverge.com/2018/8/16/17693866/ford-self-driving-car-safety-report-dot

Page 25:

● Discrimination – inferior treatment based on ascribed group rather than individual merits

Machine learning and discrimination

Predictive models y → f(X)
[Diagram: polarized y, features X, protected characteristic s; one link marked “?”]

Source: https://ilmatieteenlaitos.fi/

Page 26:

● Discrimination – inferior treatment based on ascribed group rather than individual merits

Machine learning and discrimination

Predictive models y → f(X)
[Diagram: polarized y, features X, protected characteristic s; links marked “?” “?”]

Source: https://www.theverge.com/2018/8/16/17693866/ford-self-driving-car-safety-report-dot
Source: https://ilmatieteenlaitos.fi/

Page 27:

● Removing protected characteristic

– instead of y → f(X,s), learn y → f(X)

Page 28:

http://www.stevehackman.net/wp-content/uploads/2013/02/Sneetches.jpg

Sneetches – Dr. Seuss, 1961

"...until neither the Plain nor the Star-Bellies knew whether this one was that one... or that one was this one... or which one was what one... or what one was who."

Page 29:

● Removing the protected characteristic does not solve the problem if s is correlated with X

– desired: y → f(X)

[Diagram: s (“ethnicity”) → X (“test score”) → y (“accept/reject?”)]

Page 30:

● Removing the protected characteristic does not solve the problem if s is correlated with X

– desired: y → f(X)

[Diagrams: three causal structures over s, X, y – labelled “No problem”, “Problem!”, “No problem”]

Page 31:

● Removing the protected characteristic does not solve the problem if s is correlated with X

– desired: y → f(X)

– what happens: y → f(X,s*), s* → f(X)

[Diagrams: three causal structures over s, X, y – labelled “No problem”, “Problem!”, “No problem”]
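This redlining effect can be reproduced in a few lines. The sketch below uses invented numbers: a "zip code" feature that matches the protected attribute 90% of the time, and historical decisions biased against group s=1. A model trained only on the zip code, never seeing s, still reproduces almost the entire acceptance-rate gap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
s = rng.integers(0, 2, n)                           # protected group membership
zip_code = np.where(rng.random(n) < 0.9, s, 1 - s)  # proxy: matches s 90% of the time
# biased historical decisions: group s=1 accepted far less often
y = (rng.random(n) < np.where(s == 1, 0.2, 0.6)).astype(int)

# "blind" model: majority historical decision per zip code, s is never used
model = {z: int(y[zip_code == z].mean() >= 0.5) for z in (0, 1)}
pred = np.array([model[z] for z in zip_code])

# the acceptance-rate gap survives even though s was removed
print(pred[s == 0].mean(), pred[s == 1].mean())
```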

Page 32:

Naive baseline

Page 33:

Naive baseline: removing sensitive variable

● Suppose salary is decided (in decision maker's head) as

Page 34:

Naive baseline: removing sensitive variable

● Suppose salary is decided (in decision maker's head) as

● Data scientist assumes

Page 35:

Naive baseline: removing sensitive variable

● Suppose salary is decided (in decision maker's head) as

● Data scientist assumes

● Observes data

Page 36:

Naive baseline: removing sensitive variable

● Suppose salary is decided (in decision maker's head) as

● Data scientist assumes

● Observes data

● Learned model

Page 37:

Removing sensitive variable makes it worse

● Suppose salary is decided (in decision maker's head) as

● Data scientist assumes

● Observes data

● Learned model: lower base salary, higher reward for education

the model punishes ethnic minorities

Page 38:

How large is the bias?

● Suppose salary is decided (in the decision maker's head) as: Salary = 1000 + 100 × Education + α × Ethnicity, with α = −500

● Learned model: Salary = (1000 − 398) + (100 + 28) × Education

● Punishment on base salary: γ = α × mean(Ethnicity) − β × mean(Education) = (−500) × 0.5 − 28 × 5.3 ≈ −398

● Award for education inflated by: β = α × Cov(Education, Ethnicity) / Var(Education) = (−500) × (−0.72) / 12.9 ≈ 28

Žliobaitė and Custers 2016
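The omitted-variable algebra above can be checked numerically. The sketch below generates synthetic data under the same kind of decision rule (the concrete coefficients and distributions are invented, so the resulting numbers differ from the slide's 398 and 28), fits a regression with and without the sensitive variable, and verifies that the education coefficient shifts by exactly α × Cov(Education, Ethnicity) / Var(Education).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ethnicity = rng.integers(0, 2, n).astype(float)
# historical inequality: education is negatively correlated with ethnicity
education = rng.normal(6.0, 3.5, n) - 1.5 * ethnicity
# the true decision rule in the decision maker's head
salary = 1000 + 100 * education - 500 * ethnicity + rng.normal(0, 10, n)

def ols(X, y):
    # ordinary least squares with an intercept column
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

full = ols(np.column_stack([education, ethnicity]), salary)  # ~ [1000, 100, -500]
naive = ols(education.reshape(-1, 1), salary)                # sensitive variable dropped

alpha = full[2]                                              # coefficient on ethnicity
shift = alpha * np.cov(education, ethnicity)[0, 1] / education.var()
print(naive[1], full[1] + shift)  # the naive slope matches the predicted biased slope
```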

Page 39:

A simple solution (special case)

● Learn a model on the full dataset

● Remove the sensitive component of the model

Žliobaitė and Custers 2016
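A sketch of this special case for linear regression (data-generating numbers are invented): fit the model with the sensitive variable included, then simply drop its term before using the model. The education coefficient stays at its unbiased value, whereas naively training without the sensitive variable inflates it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
eth = rng.integers(0, 2, n).astype(float)
edu = rng.normal(6.0, 3.5, n) - 1.5 * eth        # correlated with eth
salary = 1000 + 100 * edu - 500 * eth + rng.normal(0, 10, n)

def ols(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# step 1: learn a model on the full dataset, sensitive variable included
b0, b_edu, b_eth = ols(np.column_stack([edu, eth]), salary)
# step 2: remove the sensitive component of the model before deployment
predict = lambda e: b0 + b_edu * e               # the b_eth * eth term is dropped

naive = ols(edu.reshape(-1, 1), salary)          # naive baseline for comparison
print(b_edu, naive[1])  # ~100 (unbiased reward for education) vs an inflated slope
```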

Page 40:

Measuring discrimination

Page 41:

Measuring

p(+|M) = 50%, p(+|F) = 25%

Page 42:

Measuring

p(+|M) = 50%, p(+|F) = 25%

p(+|M) = 38%, p(+|F) = 38%

Page 43:

Measuring

Reverse discrimination

Page 44:

Measuring

p(+|1..3, M) = n.a., p(+|1..3, F) = 0%

p(+|4..6, M) = 0%, p(+|4..6, F) = 0%

p(+|7..9, M) = 67%, p(+|7..9, F) = 67%

p(+|10..12, M) = 100%, p(+|10..12, F) = n.a.

Page 45:

Measuring

● “Twin test” / counterfactual fairness

– p(+|X,M) = p(+|X,F)

● No “Redlining”

– p(+|X,M) = p(+|X,F) = p(+|X) = “right”

Page 46:

Kamiran and Žliobaitė 2013

Which is the “right” level?

Like all? Like male? Redistribution of the same total resources?

[Chart: wage levels 5.8 $/h, 11.4 $/h, 13.8 $/h]

Page 47:

Indirect discrimination

Page 48:

Kamiran and Žliobaitė 2013

Is there discrimination?

[Chart: acceptance rates 25%, 45%, 35%, 15%]

Page 49:

Is there discrimination?

No discrimination

Discrimination is present

How to correct? What should be the acceptance rate?

Kamiran et al 2013, KAIS

Page 50:

Is there discrimination?

No discrimination

Discrimination is present

Kamiran et al 2013, KAIS

[Both panels: 2000 applicants, 30% acceptance rate, 600 accepted]

Page 51:

Adversary action

[Figure: acceptance rates 5% and 5% vs 55% and 55%, marked with “+” decisions]

Page 52:

Redlining / indirect discrimination

No discrimination

Is there discrimination, or not?

[Figure: 2000 applicants, 30% acceptance rate, 600 accepted]

Page 53:

Redlining / indirect discrimination

No discrimination

Is there discrimination, or not?

[Figure: 2000 applicants, 30% acceptance rate, 600 accepted; subgroup counts 50 / 550 and 400 / 200]

Page 54:

Kamiran et al 2013

Page 55:

See literature for more on measuring

● Starting points:

– Kamiran, F., Žliobaitė, I. and Calders, T. (2013). Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowledge and Information Systems 35(3), p. 613-644.

– Žliobaitė, I. (2017). Measuring discrimination in algorithmic decision making. Data Mining and Knowledge Discovery 31(4), 1060-1089.

● We can measure on model outputs the same way we measure on data
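As a minimal illustration of the last point, one common measure, the acceptance-rate difference, written as a function that applies identically to historical labels and to model predictions (the toy arrays are invented):

```python
import numpy as np

def demographic_parity_gap(decisions, s):
    """Acceptance-rate difference p(+|s=0) - p(+|s=1); `decisions` can be
    historical labels or model outputs -- the measurement is the same."""
    decisions, s = np.asarray(decisions), np.asarray(s)
    return decisions[s == 0].mean() - decisions[s == 1].mean()

y_hist = np.array([1, 1, 0, 1, 0, 0, 0, 0])   # toy historical decisions
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_hist, s))      # 0.75
```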

Page 56:

Algorithmic solutions

Page 57:

Algorithmic solutions

● Preprocessing

– Modify historical data (inputs, protected or outcomes)

– Resample input data

● Post-processing

– Modify models

– Modify predictions

● Optimization with non-discrimination constraints

Page 58:

Modify input data

● Modify y - massaging

Kamiran and Calders 2009
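A sketch of massaging in the spirit of Kamiran and Calders (details simplified; the ranking score would normally come from a classifier trained on the data, here it is an invented array): promote the highest-scored rejected members of the deprived group and demote the lowest-scored accepted members of the favoured group until acceptance rates match.

```python
import numpy as np

def massage(y, s, score):
    """Flip labels until acceptance rates match: promote the deprived
    group's highest-scored rejects, demote the favoured group's
    lowest-scored accepts (one pair per step)."""
    y = y.copy()
    while y[s == 1].mean() < y[s == 0].mean():
        up = np.where((s == 1) & (y == 0))[0]    # deprived rejects
        dn = np.where((s == 0) & (y == 1))[0]    # favoured accepts
        if len(up) == 0 or len(dn) == 0:
            break
        y[up[np.argmax(score[up])]] = 1          # most deserving reject -> +
        y[dn[np.argmin(score[dn])]] = 0          # least deserving accept -> -
    return y

y = np.array([1, 1, 1, 0, 1, 0, 0, 0])           # 75% vs 25% acceptance
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # 0 = favoured, 1 = deprived
score = np.array([9., 8., 7., 6., 5., 4., 3., 2.])
y_fair = massage(y, s, score)
print(y_fair[s == 0].mean(), y_fair[s == 1].mean())  # 0.5 0.5
```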

Page 59:

Modify input data

● Modify X – massaging

– Any attributes in X that could be used to predict s are changed such that a fairness constraint is satisfied

– approach is similar to sanitizing datasets for privacy preservation

Feldman et al 2014

Page 60:

Resample

● Preferential sampling

Kamiran and Calders 2010

Page 61:

Post-processing

● Modifying the resulting model – relabel tree leaves to remove the most discrimination with the least damage to the accuracy

Kamiran et al. 2010

Page 62:

Optimization with constraints

● Decision tree

– regular tree induction: splits are decided on IGC, the entropy-based information gain with respect to the class label, computed on the data subsets due to the split

– discrimination-aware tree induction: splits are decided on IGC − IGS, where IGS is the information gain with respect to the protected characteristic

Kamiran et al. 2010
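The IGC − IGS criterion can be written down directly. A minimal sketch (binary splits, entropy-based gain; the variable names are mine): a split that separates the classes well while revealing nothing about the protected attribute scores high, whereas a split that mainly separates the protected groups is penalised.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label vector
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(values, split_mask):
    # information gain of a binary split with respect to `values`
    n = len(values)
    left, right = values[split_mask], values[~split_mask]
    child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(values) - child

def discrimination_aware_gain(y, s, split_mask):
    # reward informativeness about the class, penalise informativeness
    # about the protected characteristic: IGC - IGS
    return info_gain(y, split_mask) - info_gain(s, split_mask)

y = np.array([0, 1, 0, 1])                 # class labels
s = np.array([0, 0, 1, 1])                 # protected characteristic
print(discrimination_aware_gain(y, s, np.array([False, True, False, True])))  # 1.0
print(discrimination_aware_gain(y, s, np.array([True, True, False, False])))  # -1.0
```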

Page 63:

Prevention solutions

● Preprocessing

– Modify input data X, s or y

– Resample input data

● Post-processing

– Modify models

– Modify outputs

● Learning with constraints

From the legal perspective ??

– Decision manipulation – very bad

– Data manipulation – quite bad

– Learning with constraints - ok

● Protected characteristic should not be used in decision making

Page 64:

Baselines

Page 65:

Problem

● Baseline accuracy and baseline discrimination vary with the overall acceptance rate

● Classifiers with different overall acceptance rates are not comparable

[Confusion matrices (true vs prediction): Acc = 50% and Acc = 90%]
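The first point is easy to demonstrate with a constant classifier (synthetic labels and invented base rates): accepting everyone already scores an accuracy equal to the base rate, so raw accuracies at different acceptance rates measure different things.

```python
import numpy as np

rng = np.random.default_rng(4)
for base_rate in (0.5, 0.9):
    y = (rng.random(100_000) < base_rate).astype(int)   # synthetic ground truth
    constant = np.ones_like(y)                          # accept everyone, learns nothing
    acc = (constant == y).mean()
    print(base_rate, acc)   # accuracy tracks the base rate, not skill
```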

Page 66:

Experiment

Adult dataset from UCI

Page 67:

Discrimination-accuracy tradeoffs

Decreasing acceptance rates may show lower nominal discrimination!

Žliobaitė 2015

Page 68:

Preferential treatment is a ranking problem

● No discrimination baseline: random order

● Maximum discrimination: all members of the favored community go before all the members of the protected community

Žliobaitė 2015, discrimination-accuracy trade-offs

Page 69:

Normalized measures

● We propose to normalize discrimination by dmax

– d = D / dmax, where 1 = maximum, 0 = no discrimination, < 0 = reverse discrimination

● We recommend normalizing accuracy – Cohen's kappa

– κ = (Acc − RAcc) / (1 − RAcc), where 1 = maximum, 0 = like random, < 0 = very bad

[Chart: dmax as a function of the acceptance rate p(+) and the proportion of natives p(native)]
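A sketch of the normalized measures. The formula used for dmax, the largest acceptance-rate gap achievable at a given overall acceptance rate and protected-group proportion, is my reconstruction and should be checked against Žliobaitė 2015; the toy arrays are invented.

```python
import numpy as np

def max_discrimination(p_pos, p_prot):
    # largest possible acceptance-rate gap given the overall acceptance
    # rate p(+) and the protected-group proportion (assumed derivation)
    return min(p_pos / (1 - p_prot), (1 - p_pos) / p_prot)

def normalized_discrimination(pred, s):
    D = pred[s == 0].mean() - pred[s == 1].mean()      # raw gap
    return D / max_discrimination(pred.mean(), s.mean())

def kappa(acc, random_acc):
    # Cohen's kappa: 1 = max, 0 = like random, < 0 = very bad
    return (acc - random_acc) / (1 - random_acc)

pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])   # 50% vs 25% acceptance
s    = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(normalized_discrimination(pred, s))    # 0.25 / 0.75 = 1/3
print(kappa(0.9, 0.9))                       # 0.0: no better than the base rate
```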

Page 70:

Experiment cont.

Page 71:

Tradeoffs

Page 72:

Kamiran et al 2010

Baselines and tradeoffs

Census income dataset

Page 73:

Testing on discriminatory data?

[Chart: discrimination (0–100%) vs accuracy (0–100%); marked: data, predictive models, random/constant]

Page 74:

Testing on discriminatory data?

[Chart: discrimination (0–100%) vs accuracy (0–100%); marked: data, predictive models, random/constant]

Shouldn't removing discrimination improve accuracy?

Page 75:

Testing on discriminatory data?

Shouldn't removing discrimination improve accuracy?

Page 76:

Algorithm auditing?

Page 77:

Deon – Data Science Ethics Checklist

● Use with caution!

● http://deon.drivendata.org/#a-data-collection

Page 78:
Page 79:
Page 80:
Page 81:

Algorithm auditing?

● Assessing a particular model

– in the context of application on a particular dataset/population

– with respect to a particular set of sensitive variables and

– a particular legislation

● Assessing the modeling process

– assessing the procedure of how models are made

– Authorities? Methodology?

– checklist everything and there will be no information left to learn upon

Page 82:

There are no “right” or “wrong” variables

Source: Wikimedia Commons, Unidentified painter 18th-century portrait painting of men

Image source: https://analyticsindiamag.com/turing-test-key-contribution-field-artificial-intelligence/

AI is not there yet to replace human morality.
Fairness-awareness in AI needs to be explicit.