Transcript
Page 1:

Yanai Elazar and Yoav Goldberg

Bar-Ilan University / NLP Group

November 2, 2018

Adversarial Removal of Demographic Attributes from Text Data

Page 2:

Motivation

Text is used for predictions

Page 3:

Motivation

• For example, consider a text classification setup where we predict:
  • Hiring decisions
  • Mortgage approvals
  • Loan rates

This applicant would easily get any NLP job

Page 4:

Motivation

The common implementation:

[Diagram: Input CV → ML Model → Hire / Don't Hire]

Page 5:

Motivation

The common implementation:

[Diagram: Input CV → (Encode) → Representation → (Predict) → Hire / Don't Hire]

Page 6:

Motivation

• But then we see this

Page 7:

Motivation

• When deciding whether to recruit an applicant based on their writing/CV

• We would like attributes such as the author's:
  • Gender
  • Race
  • Age

• not to be part of the decision
• In some places, this is even illegal

Page 8:

Motivation

• We seek to build models which are:
  • Predictive for some main task (e.g. hiring decisions)
  • Agnostic to irrelevant attributes (e.g. race, gender, …)

Page 9:

Text classification - Example

We do not have access to data from sensitive tasks like résumé screening, so we will focus on other, less sensitive tasks.

Page 10:

Text classification - Example

Let's predict... EMOJIS

We use DeepMoji, a model for predicting emojis from tweets.

Page 11:

Text classification - Example

Let's predict... EMOJIS: DeepMoji (Felbo et al., 2017)

Page 12:

Text classification - Example

Let's predict... EMOJIS: DeepMoji (Felbo et al., 2017)

● DeepMoji is a strong and expressive model

● It also creates powerful representations

[Diagram: Encode → Predict]

Page 13:

Text classification - Example

Let's predict... EMOJIS: DeepMoji (Felbo et al., 2017)

● DeepMoji is a strong and expressive model

● It also creates powerful representations

● It achieved several SOTA results on text classification

[Diagram: Encode → Predict]

Page 14:

Text classification - Example

Let's predict... EMOJIS

Does this representation also contain information on sensitive attributes?

[Diagram: Encode → Predict, probed for Race, Gender, Age]

Page 15:

Setup

[Diagram: x = “I love messing with yo mind” → Embeddings → DeepMoji Encoder → Representation h(x) → Classifier → Task (Emojis)]

We use the representations that predict emojis (DeepMoji; Felbo et al., 2017)

Page 16:

Setup

[Diagram: Representation h(x) → Classifier → Task (Emojis); h(x) → Classifier → Demographics (Gender), a.k.a. the Attacker]

We use the representations that predict emojis, and use them to predict demographics.

We define: leakage = the score above a random guess that an “Attacker” achieves
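To make this concrete, a minimal sketch of the leakage computation with a simple logistic-regression attacker (the probe choice and all names are illustrative assumptions, not the authors' implementation):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def leakage(train_reprs, train_attrs, dev_reprs, dev_attrs, chance=0.5):
    """Leakage = an Attacker's score above a random guess.

    The attacker here is a logistic-regression probe trained on frozen
    representations h(x); on a balanced binary dataset, chance = 0.5.
    """
    attacker = LogisticRegression(max_iter=1000)
    attacker.fit(train_reprs, train_attrs)   # h(x) -> protected attribute
    acc = accuracy_score(dev_attrs, attacker.predict(dev_reprs))
    return acc - chance                      # 0.0 means no leakage

# e.g. a return value of 0.18 means the attacker recovers the attribute
# at 68% accuracy on a balanced dev set.
```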

Page 17:

Text Leakage – Case Study

● We use the DeepMoji encoder to encode tweets from 3 datasets, all binary and balanced

[Diagram: three balanced binary datasets, each labeled 0/1]

● Each dataset is tied to a different demographic label

● We then train Attackers to predict these attributes

[Diagram: h(x) → Attacker → Demographics (e.g. Gender)]

Page 18:

Text Leakage – Case Study

Big surprise? Not really. This is the core idea in transfer learning. We’ve seen its benefits in pretrained embeddings, language models, etc.

The dev-set scores are well above chance level.

[Chart: DeepMoji attacker accuracy per attribute vs. the random-guess baseline]

Page 19:

Text Leakage – Case Study

• Why do we get this major “help” in predicting attributes other than those we trained on?

• One option is correlation between attributes in the data

Fair enough. Let’s control for it

Page 20:

Controlled Setup

Page 21:

New setup

• We focus on emoji-based sentiment prediction

• With Race (Blodgett et al., 2016), Gender, and Age (both Rangel et al., 2016) as protected attributes

• We use Twitter data

Page 22:

New setup

[Diagram: a balanced dataset. Task (Sentiment): 50% positive / 50% negative; Demographics: 50% male / 50% female]

Page 23:

Balanced Training

[Diagram: x = “I love messing with yo mind” → Embeddings → Encoder → Representation → Classifier → Main Task (sentiment)]

Training our own encoder on the balanced datasets

Page 24:

Balanced Training

[Diagram: x → Embeddings → Encoder → Representation h(x), all frozen; a trainable Attacker att(h(x)) predicts the Protected Attribute (gender)]

And using the Attacker to check for leakage
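A minimal PyTorch sketch of this probing setup, assuming a generic trained `encoder`; sizes and module names are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class Attacker(nn.Module):
    """An MLP probe from the representation h(x) to the protected attribute."""
    def __init__(self, repr_dim=300, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(repr_dim, 300), nn.Tanh(),
                                 nn.Linear(300, n_classes))

    def forward(self, h):
        return self.mlp(h)

def train_attacker(encoder, attacker, loader, epochs=5):
    encoder.eval()                        # freeze the encoder
    for p in encoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(attacker.parameters())   # only the probe trains
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, protected in loader:
            with torch.no_grad():
                h = encoder(x)            # h(x); no gradient reaches the encoder
            loss = loss_fn(attacker(h), protected)
            opt.zero_grad(); loss.backward(); opt.step()
```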

Page 25:

Balanced Training - Leakage

We wanted to see something like this:

[Chart: attacker accuracy at the random-guess line]

But instead...

Page 26:

Balanced Training - Leakage

The Attacker manages to extract a substantial amount of sensitive information

Even in a balanced setup, leakage exists

[Chart: attacker accuracy well above the random-guess line]

Page 27:

Our objective

• Create a representation which:
  • Is predictive of the main task (e.g. sentiment)

Page 28:

Our objective

• Create a representation which:
  • Is predictive of the main task (e.g. sentiment), and
  • Is not predictive of the protected attribute (e.g. gender, race)

Page 29:

Our objective

• Interesting technical problem – How do we unlearn something? Can we unlearn something at all?

Page 30:

Actively Reducing Leakage

Page 31:

Adversarial Setup

• First introduced by Goodfellow et al., 2014

• A very active line of research

• We will go through the details

Page 32:

Adversarial Setup

• The motivation came from “Generative Models”

• We would like to automatically create images

• From… random input?

Page 33:

Adversarial Setup

• 2 components:

• Generator

• Discriminator

Page 34:

Adversarial Setup

A good Discriminator (real data gets a high score, meaning it’s real)

Page 35:

Adversarial Setup

A good Generator (fake data gets a high score, i.e., it maximizes the probability D assigns to it being real)

Page 36:

Adversarial Setup

● 2 competing objectives
● We don’t know how to solve this

Page 37:

Adversarial Setup

Goodfellow et al.’s solution: alternate training between the Generator and the Discriminator

Page 38:

Adversarial Setup

• The adversarial setup was invented to create an “output”

• in which real can’t (or can hardly) be separated from fake

• What if we want to create an intermediate representation?

Page 39:

Adversarial Setup

• The adversarial setup was invented to create an “output”

• in which real can’t (or can hardly) be separated from fake

• What if we want to create an intermediate representation…

• which is indistinguishable with respect to some feature or attribute?

Page 40:

Adversarial Setup

• Ganin and Lempitsky, 2015

• Application: Domain Adaptation

• New trick for adversarial training: the Gradient Reversal Layer (GRL); a sketch follows below
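A minimal PyTorch sketch of a GRL (identity in the forward pass, sign-flipped and scaled gradient in the backward pass); this is a common implementation pattern, not code from the paper:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Forward: identity. Backward: multiply the gradient by -lam."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # None: no gradient for lam

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: adv_logits = adversary(grad_reverse(h, lam=1.0))
# The adversary's own parameters still minimize their loss, but the
# encoder upstream of the GRL receives the reversed gradient.
```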

Page 41:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: x = “I love messing with yo mind” → Embeddings → Encoder → Representation h(x) → Classifier 1 (Main Task), f(h(x)) → Predict Sentiment]

Page 42:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: as above, plus Classifier 2 - Adv (Protected Attribute), adv(h(x)) → Predict Race, which tries to interfere]

Page 43:

Adversarial Setup (Ganin and Lempitsky, 2015)

3 different sub-objectives:

1. Classifier 1 (Main Task) should classify well: f(h(x))

2. The adversary should succeed: Classifier 2 - Adv (Protected Attribute), adv(h(x))

3. The encoder should make the adversary fail: −adv(h(x))
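One common way to write the resulting saddle-point objective, with notation assumed here (following Ganin and Lempitsky's formulation rather than copied from the slides):

```latex
% h: encoder, f: main classifier, adv: adversary,
% y: main-task label, z: protected attribute, lambda: trade-off weight.
\min_{h,\,f}\;\max_{\mathrm{adv}}\;
  \sum_{(x,\,y,\,z)}
    \mathcal{L}\bigl(f(h(x)),\,y\bigr)
    \;-\;\lambda\,\mathcal{L}\bigl(\mathrm{adv}(h(x)),\,z\bigr)
```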

Page 44:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: the three sub-objectives again; blue components update their parameters, white components don’t]

Page 45:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: the three sub-objectives with the update colors; for the third, the encoder is trained with the gradient ∇(−adv(h(x)))]

Page 46:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: as before, with the identity ∇(−adv(h(x))) = −∇(adv(h(x))): the encoder’s update is just the adversary’s gradient, reversed]

Page 47:

Adversarial Setup (Ganin and Lempitsky, 2015)

[Diagram: x → Embeddings → Encoder → Representation h(x) → Classifier 1 (Main Task), f(h(x)); h(x) → gradient reversal layer → Classifier 2 - Adv (Protected Attribute), adv(h(x))]

The gradient reversal layer removes stuff from the representation
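Putting the pieces together, a sketch of one training step that realizes all three sub-objectives with the `grad_reverse` helper from the GRL sketch above; module and variable names are assumptions:

```python
import torch.nn as nn

def train_step(encoder, main_clf, adversary, opt, batch, lam=1.0):
    x, y_task, z_protected = batch
    loss_fn = nn.CrossEntropyLoss()

    h = encoder(x)                               # shared representation h(x)
    main_loss = loss_fn(main_clf(h), y_task)     # 1. classify well
    adv_loss = loss_fn(adversary(grad_reverse(h, lam)), z_protected)
    # Through the GRL, one backward pass trains the adversary to succeed (2)
    # while pushing the encoder to make it fail (3).
    loss = main_loss + adv_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return main_loss.item(), adv_loss.item()
```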

Page 48:

Adversarial Setup (Ganin and Lempitsky, 2015)

• In their paper, the representation after the adversarial training seems invariant to the domain

[Figure: representations before and after adversarial training]

Page 49:

Does it work?

“I love mom’s cooking”

Successfully predicting sentiment

Page 50:

Does it work?

“I love mom’s cooking”

Successfully removed demographics?

Page 51:

Does it work?

During adversarial training, the demographic information seems to be gone (close to chance)

IS THAT SO?

Page 52:

Does it work? Not so quickly...

When training the Attacker, we can still recover a considerable amount of information

Page 53:

Does it work? Not so quickly...

Consistent across tasks and protected attributes

[Chart: attacker accuracy vs. the random-guess baseline]

Page 54:

Does it work? More or less

Well, the adversarial method does help. But not enough

[Chart: attacker accuracy vs. the random-guess baseline]

Page 55:

While effective during training, at test time adversarial training does not remove all the protected information

Page 56:

Stronger, Better, Bigger???

Can we make stronger adversaries?

Page 57:

Stronger, Better, Bigger???

[Diagram: the baseline architecture (x → Embeddings → Encoder → Representation h(x) → Classifier 1 (Main Task), f(h(x)); gradient reversal layer → Classifier 2 - Adv, adv(h(x))), now with a larger adversary: More Parameters!]

Page 58:

Stronger, Better, Bigger???

[Diagram: the baseline architecture, now with the reversed gradients scaled up: Bigger Weight!]

Scale the reversed gradients
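For instance, Ganin and Lempitsky anneal the reversal weight λ from 0 toward 1 over training; a sketch, where the extra `scale` knob for making the reversed gradients “bigger” is an assumption:

```python
import math

def grl_lambda(progress, gamma=10.0, scale=1.0):
    """Annealed GRL weight: 2 / (1 + exp(-gamma * p)) - 1, for p in [0, 1].

    `scale` > 1 makes the reversed gradient larger overall (assumed knob).
    """
    return scale * (2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0)

# adv_logits = adversary(grad_reverse(h, lam=grl_lambda(step / total_steps)))
```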

Page 59:

Stronger, Better, Bigger???

[Diagram: the baseline architecture, now with a second adversary, Classifier 3 - Adv (Protected Attribute), each adversary behind its own gradient reversal layer: More Adversaries!]
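A sketch of the corresponding loss with several independently initialized adversaries, each behind its own GRL (names are assumptions):

```python
def multi_adversary_loss(h, z_protected, adversaries, loss_fn, lam=1.0):
    """Sum the losses of several adversaries attacking the same h(x)."""
    return sum(loss_fn(adv(grad_reverse(h, lam)), z_protected)
               for adv in adversaries)
```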

Page 60:

Stronger, Better, Bigger???

Page 61:

Stronger, Better, Bigger???

Better, but still not perfect

Page 62:

Error Analysis

Page 63:

Wait. I remember this thing called Overfitting

● We still have a problem
  ○ During training it seems that the information was removed
  ○ But the Attacker tells us another story

● Everything we reported was on the dev-set
● Is it possible that we just overfitted on the training-set?

Page 64:

Wait. I remember this thing called Overfitting

● “Adversary overfitting”:
  ○ Memorizing the training data
  ○ By removing all its sensitive information
  ○ While leaking at test time

Page 65:

Wait. I remember this thing called Overfitting

We trained on 90% of the “overfitted” training set, and tested on the remaining 10%

[Diagram: Training Set split into 90% (new Train) and 10% (new Dev)]

It is more than that
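A sketch of this control, reusing the hypothetical `leakage` helper from earlier; the representation and attribute arrays are assumptions:

```python
from sklearn.model_selection import train_test_split

# Re-split the *training* representations 90/10 and train a fresh attacker,
# to test whether the remaining leakage is mere memorization.
tr_h, dev_h, tr_z, dev_z = train_test_split(
    train_reprs, train_attrs, test_size=0.10, random_state=0)
print(leakage(tr_h, tr_z, dev_h, dev_z))
# If the adversary had only "memorized away" the training examples, a fresh
# attacker should be near chance here; the slides report it is not.
```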

Page 66:

Persistent Examples

• What are the hard cases which slip past the adversary?

• We trained the adversarial model 10 times (with random seeds)

• Then, we trained the Attacker on each model

• We collected all examples that were consistently labeled correctly (a sketch of this procedure follows)
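A sketch of this procedure; `run_experiment` is a hypothetical helper that adversarially trains a model with the given seed, trains an Attacker on it, and returns the ids of dev examples the Attacker labeled correctly:

```python
from collections import Counter

correct_counts = Counter()
for seed in range(10):
    correct_counts.update(run_experiment(seed))   # ids labeled correctly

# Persistent examples: labeled correctly under every one of the 10 seeds.
persistent = [ex_id for ex_id, n in correct_counts.items() if n == 10]
```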

Page 67:

Persistent Examples

AAE (“non-Hispanic blacks”)     SAE (“non-Hispanic whites”)
Enoy yall day                   I want to be tan again
_ Naw im cool                   Why is it so hot in the house?!
My Brew Eatting                 I want to move to california
My momma Bestfrand died         I wish I was still in Spain
Tonoght was cool                Ahhhh so much homework.

More about the leakage origin can be found in the paper

Page 68:

Few words about fairness

• Throughout this work, we aimed at achieving zero leakage, or in other words: fairness by blindness

• There are many other definitions of “fairness” (>20), with 3 popular ones (stated below):

  • Demographic parity

  • Equality of Odds

  • Equality of Opportunity

In the paper, we prove that in our setup (balanced data) these definitions are identical
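For reference, the three definitions in their standard form from the fairness literature (ŷ is the prediction, y the gold label, z the protected attribute); the formulations below are standard statements, not copied from the slides:

```latex
% Demographic parity: the prediction is independent of z.
P(\hat{y} = 1 \mid z = 0) = P(\hat{y} = 1 \mid z = 1)

% Equality of Odds: independence of z, conditioned on the gold label.
P(\hat{y} = 1 \mid y, z = 0) = P(\hat{y} = 1 \mid y, z = 1), \quad y \in \{0, 1\}

% Equality of Opportunity: the same, required only for y = 1.
P(\hat{y} = 1 \mid y = 1, z = 0) = P(\hat{y} = 1 \mid y = 1, z = 1)
```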

Page 69:

Summary

● When training a text encoder for some task

  ○ Encoded vectors are still useful for predicting various things (“transfer learning”)

  ○ Including things we did not want to encode (“leakage”)

● It is hard to completely prevent such leakage

○ Do not blindly trust adversarial training

○ Recheck your model using an “Attacker”

Thank you
