Influence in Classification via Cooperative Game Theory Amit Datta, Anupam Datta, Ariel D. Procaccia and Yair Zick (to appear in IJCAI’15)

Dec 19, 2015

Page 1

Influence in Classification via Cooperative Game Theory

Amit Datta, Anupam Datta, Ariel D. Procaccia and Yair Zick (to appear in IJCAI’15)

Page 2

Big Data Analysis and Transparency

•Big data is big business.
•It is “good”: able to identify trends, produce accurate results, impartial (algorithms are not inherently discriminatory).
•It is not transparent!
•As a user (or even as a data scientist!) it is hard to tell what factors determine classification outcomes.

Page 3

Motivation

•We are given a classified dataset (e.g. flagged clients in a bank).
•The classifier is unknown.
•What is the importance of a given feature to the classification outcome?

(F,25-35,English,PA)

(M,18-25,English,CO)

(F,35-55,Spanish,NY)

(M,25-35,English,PA)

(F,18-25,Spanish,PA)

(M,18-25,Spanish,PA)

(M,25-35,Spanish,PA)

(F,35-55,Spanish,PA)

(M,18-25,Spanish,PA)

Page 4

Methodology

•Feature selection: learn a classifier, see what features add the most information.
▫Are we choosing the right classifier to learn? Can be very complex.
▫Some classifiers have no intuitive notion of feature importance (e.g. decision trees).
▫Requires a lot of knowledge about the dataset (what happens when features are removed).

Page 5

Methodology

•Our approach: we assign a value φ_i to every feature i.
•Corresponds to power indices in cooperative games.
•Empirical influence (dataset based).
•Can be justified axiomatically.
•Verified empirically.
•Relates to notions of cause, responsibility and blame.

Page 6

Notation

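•A set of features N = {1, …, n}.
•For each i ∈ N, A_i is the set of possible states.
•A = A_1 × … × A_n is the set of all possible profiles.
•v : A → {0, 1} labels the data.
•Dataset: a set B together with the labels v, where B ⊆ A (we don’t see all profiles).
•Can also be written as (W, L), where W and L (the profiles labeled 1 and 0) are disjoint.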
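The notation above can be sketched in Python as follows. This is only an illustration: profiles are stored as tuples, the label values are hypothetical (the slides do not say which profiles were flagged), and the helper name `split_by_label` is invented here.

```python
from typing import Dict, Set, Tuple

Profile = Tuple[str, ...]

# Hypothetical labels (1 = flagged); only the profiles come from the slides.
dataset: Dict[Profile, int] = {
    ("F", "25-35", "English", "PA"): 1,
    ("M", "18-25", "English", "CO"): 0,
    ("F", "35-55", "Spanish", "NY"): 1,
    ("M", "25-35", "English", "PA"): 0,
}


def split_by_label(data: Dict[Profile, int]) -> Tuple[Set[Profile], Set[Profile]]:
    """Return (W, L): the profiles labeled 1 and the profiles labeled 0."""
    W = {a for a, label in data.items() if label == 1}
    L = {a for a, label in data.items() if label == 0}
    return W, L


W, L = split_by_label(dataset)
B = W | L  # the observed profiles, a subset of all possible profiles
```

Because every profile carries exactly one label, W and L are disjoint by construction.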

Page 7

Notation

•An influence measure: a function φ that, given a dataset B, outputs a value φ_i(B) for every feature i.
“How important is gender for this classification?”

Page 8

Ideas from Game Theory

•If |A_i| = 2 for all i ∈ N, we have a classic cooperative game (features are players).

•Our work can be thought of as an extension of the Banzhaf index to games where players have more than one state (e.g. OCF games).

•In particular, when applied to cooperative games, our value is exactly the Banzhaf index.

Page 9

Causality

•Formal notions of causality: given the value of a profile, is feature i a cause of it? How responsible is i? Is i to blame for it?

•Our work can be thought of as an (initial) application of these notions to the machine-learning setting (via a cooperative game methodology).

Page 10

Axiomatic Approach

A feature i is a dummy if v(a) = v(a_-i, b) for all a ∈ A and all b ∈ A_i.
Dummy property: φ_i = 0 whenever i is a dummy.

A measure is state symmetric if relabeling of states does not change its value.
A measure is feature symmetric if relabeling of features does not change their value.
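As a rough illustration of the dummy notion, the sketch below checks whether changing one feature ever changes the label. It assumes the check is restricted to the profiles actually observed in a dataset (the slide quantifies over all profiles); the data and the function name `is_dummy` are hypothetical.

```python
def is_dummy(i, data):
    """True if no two observed profiles that differ only in feature i have different labels."""
    for a, label in data.items():
        for b, other in data.items():
            same_elsewhere = a[:i] + a[i + 1:] == b[:i] + b[i + 1:]
            if same_elsewhere and label != other:
                return False
    return True


# Hypothetical (gender, language) profiles whose label depends on language only.
data = {
    ("F", "English"): 0,
    ("M", "English"): 0,
    ("F", "Spanish"): 1,
    ("M", "Spanish"): 1,
}

gender_is_dummy = is_dummy(0, data)
language_is_dummy = is_dummy(1, data)
```

Under the dummy property, any measure satisfying the axiom would have to assign gender a value of zero on this toy dataset.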

Page 11

Axiomatic Approach

A measure is additive if φ_i(v_1 + v_2) = φ_i(v_1) + φ_i(v_2).

Theorem: if φ satisfies the dummy, symmetry and additivity axioms, it assigns a value of 0 to all features.

Bad news… Standard notions will not immediately work.

Page 12

Axiomatic Approach

A measure satisfies the disjoint union property if

φ_i(W ∪ W′, L) = φ_i(W, L) + φ_i(W′, L)

for W, W′ disjoint.

[Figure: the sets L, W and W′, annotated with φ_i(W, L), φ_i(W′, L) and φ_i(W, L) + φ_i(W′, L)]
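The disjoint union property can be checked numerically on a toy example. The sketch below uses one candidate measure, a count of oppositely labeled profile pairs that differ only in feature i; the function names and the data are hypothetical, chosen only for illustration.

```python
def differ_only_in(i, a, b):
    """True when profiles a and b disagree on feature i and agree everywhere else."""
    return a[i] != b[i] and a[:i] + a[i + 1:] == b[:i] + b[i + 1:]


def pair_count(i, W, L):
    """Count ordered pairs of oppositely labeled profiles that differ only in feature i."""
    # Each unordered (w, l) pair is counted twice, once from each side.
    return 2 * sum(differ_only_in(i, w, l) for w in W for l in L)


# Hypothetical toy data over two binary features.
W = {(0, 1)}
W_prime = {(1, 1)}
L = {(0, 0), (1, 0)}

union_values = [pair_count(i, W | W_prime, L) for i in range(2)]
summed_values = [pair_count(i, W, L) + pair_count(i, W_prime, L) for i in range(2)]
```

Here `union_values` and `summed_values` agree because every counted pair involves exactly one profile from either W or W′, never both.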

Page 13

Axiomatic Approach

Theorem: if φ satisfies the dummy, symmetry and disjoint union axioms, then φ = C·χ; here:

χ_i(B) = Σ_{a ∈ B} Σ_{b ∈ A_i : (a_-i, b) ∈ B} |v(a_-i, b) − v(a)|

and C is a constant independent of i (but may depend on n).
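A direct, unoptimized way to evaluate such a count on a small dataset is sketched below. It assumes the measure sums, over every observed profile, the label changes obtained by switching feature i to another observed state; the profiles, labels and the name `chi` are illustrative assumptions.

```python
def chi(i, data):
    """Sum of |label(b) - label(a)| over ordered pairs of observed profiles
    (a, b) that agree on every feature except possibly feature i."""
    total = 0
    for a, label in data.items():
        for b, other in data.items():
            if a[:i] + a[i + 1:] == b[:i] + b[i + 1:]:
                total += abs(other - label)
    return total


# Hypothetical (gender, age, language) profiles with made-up labels.
data = {
    ("F", "18-24", "English"): 0,
    ("M", "18-24", "English"): 0,
    ("F", "18-24", "Spanish"): 1,
    ("M", "18-24", "Spanish"): 1,
    ("F", "35-44", "Spanish"): 1,
    ("M", "35-44", "English"): 1,
}

influence = [chi(i, data) for i in range(3)]  # gender, age, language
```

The double loop is quadratic in the number of profiles; grouping profiles by their values on the other features would be faster, but the plain version keeps the correspondence with the sum easy to see.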

Page 14

Axiomatic Approach

•χ_i measures the number of times that a change in the state of feature i causes a change in value.

•Coincides with the Banzhaf value.
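The coincidence with the Banzhaf value can be illustrated on a small two-state game. The sketch below assumes a hypothetical weighted voting game (weights 3, 2, 1 and quota 4, not taken from the slides) and compares the count of label changes with the raw Banzhaf swing count; all function names are invented for this example.

```python
from itertools import product

weights, quota = (3, 2, 1), 4  # hypothetical weighted voting game
n = len(weights)


def v(a):
    """1 if the coalition encoded by the 0/1 vector a meets the quota, else 0."""
    return int(sum(w * x for w, x in zip(weights, a)) >= quota)


def change_count(i):
    """Number of profiles in {0,1}^n whose label changes when feature i is flipped."""
    total = 0
    for a in product((0, 1), repeat=n):
        flipped = a[:i] + (1 - a[i],) + a[i + 1:]
        total += abs(v(flipped) - v(a))
    return total


def swing_count(i):
    """Number of coalitions without player i that player i turns from losing to winning."""
    count = 0
    for a in product((0, 1), repeat=n):
        if a[i] == 0:
            joined = a[:i] + (1,) + a[i + 1:]
            count += v(joined) - v(a)  # v is monotone, so this is 0 or 1
    return count


changes = [change_count(i) for i in range(n)]
swings = [swing_count(i) for i in range(n)]
banzhaf = [s / 2 ** (n - 1) for s in swings]
```

Each swing is seen twice by `change_count` (once from each endpoint), so the two quantities differ only by a constant factor, which the constant in the characterization can absorb.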

Page 15

Relation to Linear Classifiers

Theorem: suppose that v is a linear classifier, defined by a weight vector w and a threshold q. Then χ_i ≥ χ_j if and only if w_i ≥ w_j.

High weight translates to high influence!
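A small numerical check of this relationship is sketched below. The setup is an assumption made for illustration only: three features sharing the state set {0, 1, 2}, positive weights (3, 2, 1), threshold 6, and a dataset containing every possible profile.

```python
from itertools import product

states = (0, 1, 2)                  # assumed common state set for all features
weights, threshold = (3, 2, 1), 6   # hypothetical linear classifier


def v(a):
    """Label 1 when the weighted sum of the profile reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, a)) >= threshold)


def chi(i):
    """Label changes caused by switching feature i, summed over all profiles."""
    total = 0
    for a in product(states, repeat=len(weights)):
        for b in states:
            changed = a[:i] + (b,) + a[i + 1:]
            total += abs(v(changed) - v(a))
    return total


influence = [chi(i) for i in range(len(weights))]
```

On this toy classifier the influence values come out in the same order as the weights, which matches the slide's summary.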

Page 16

Extensions

State Influence: how influential is being 25-35, vs. how important is age.

Weighted Influence: each vector has a weight.

Generalized distance measure: replacing the absolute difference in labels with a pseudo-distance.

Can be axiomatically justified.
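One possible reading of the first two extensions is sketched below; the slides do not give the formulas, so both definitions are assumptions. Weighted influence multiplies each profile's contribution by its weight, and state influence restricts the count to profiles that currently hold a given state. Data, weights and function names are hypothetical.

```python
def same_elsewhere(i, a, b):
    """True when profiles a and b agree on every feature other than i."""
    return a[:i] + a[i + 1:] == b[:i] + b[i + 1:]


def weighted_influence(i, data, weight):
    """Label changes from switching feature i, scaled by each profile's weight."""
    total = 0
    for a, label in data.items():
        flips = sum(abs(other - label)
                    for b, other in data.items() if same_elsewhere(i, a, b))
        total += weight[a] * flips
    return total


def state_influence(i, state, data):
    """Label changes from switching feature i away from one particular state."""
    return sum(
        abs(other - label)
        for a, label in data.items() if a[i] == state
        for b, other in data.items() if same_elsewhere(i, a, b)
    )


# Hypothetical (age, language) profiles, labels and weights.
data = {
    ("18-24", "English"): 0,
    ("25-35", "English"): 1,
    ("35-44", "English"): 0,
    ("25-35", "Spanish"): 1,
}
weight = {
    ("18-24", "English"): 3,
    ("25-35", "English"): 1,
    ("35-44", "English"): 2,
    ("25-35", "Spanish"): 5,
}

age_weighted = weighted_influence(0, data, weight)
per_state = {s: state_influence(0, s, data) for s in ("18-24", "25-35", "35-44")}
```

In this toy data being 25-35 accounts for half of the total age influence, which is the kind of distinction the state-level view is meant to expose.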

Page 17

Implementation

•To test our measure’s behavior, we measure influence on a generated dataset.

•We employ the AdFisher framework [Datta et al. 2014] to create fake Google user profiles and observe the ads presented to them.

Page 18

Implementation

•1200 simulated users, with different settings of:
▫Gender: male or female
▫Age: 18-24, 35-44, 55-64
▫Language: {English, Spanish}
•Go to bbc.com/news, collect the ads displayed.
•We then compare the different demographics in terms of ads displayed.
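The space of profile settings described above can be enumerated as follows. This is only a sketch: the even split of users across settings is an assumption (the slides give the total of 1200 but not the allocation), and it does not use the AdFisher API.

```python
from itertools import product

genders = ("male", "female")
ages = ("18-24", "35-44", "55-64")
languages = ("English", "Spanish")

# Every combination of gender, age bracket and language.
settings = list(product(genders, ages, languages))

total_users = 1200
# Assumption: users are split evenly across settings; the slides do not say so.
users_per_setting = total_users // len(settings)
```

Each setting then plays the role of one profile, with one 0/1 label per ad (shown or not shown).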

Page 19

Top Ads for Age

Title / Ad Description | Influence
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.07
Jim Rickards Project 2015 / Economist, Jim Rickards explains the coming economic crash. | 0.0663
“My Insomnia Trick” / Naturally Fall Asleep Fast, Stay Asleep All Night – Wake Up Refreshed | 0.0661
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0611
Sciatica Exercises? / Stop: What You MUST know Before attempting to Treat your Sciatica: | 0.0606

Statistic | Value
Mean | 0.0318
Median | 0.031
StdDev | 0.0144

Page 20

Top Ads for Gender

Title / Ad Description | Influence
Jim Rickards Project 2015 / Economist, Jim Rickards explains the coming economic crash. | 0.07
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.0583
Tech Gadgets / Daily Deals on Modern Gadgets. Exclusive Pricing - Up To 70% Off. | 0.0564
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0561
Elabore su Presupuesto / Nuestros Consejeros Certificados Están listos para ayudarlo | 0.0534

Statistic | Value
Mean | 0.0324
Median | 0.0299
StdDev | 0.0161

Page 21

Top Ads for Language

Title / Ad Description | Influence
Elabore su Presupuesto / Nuestros Consejeros Certificados Están listos para ayudarlo | 0.1667
The Greatest Penny Stocks / Get free daily penny stock alerts. Join now. New pick out soon. | 0.0755
Business Leads CRM / Business Lead Manager, Dialer, CRM. 400% Boost in Conversion Rates. | 0.0683
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0644
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.06

Statistic | Value
Mean | 0.033
Median | 0.0291
StdDev | 0.024

Page 22

Findings

•Overall influence of specific features over ads is somewhat limited (except for language).

•Ads seem to be targeted at specific subsets (e.g. young men and elderly women).

•Further (more refined) measurements on a larger dataset are needed.

Page 23

Future Work

•Beyond single state changes (what is the minimal number of changes to others’ states that we need in order to effect a change in value?); necessary if we want to use our measure in datasets where we cannot control the features.

•What happens when there are priors on data?

•White box vs. Black box analysis.

Thank you! Questions?