
Www.ihsn.org Geoffrey Greenwell, IHSN/PARIS21 IASSIST Conference Tampere, Finland, May 2009 Development of Microdata Anonymization Tools by the Olivier.

Dec 22, 2015

Page 1

www.ihsn.org

About IHSN

• International Household Survey Network
• A network of international agencies
• Based in Paris at the OECD at PARIS21
• A coordinating mechanism to:
– Improve quality and use of household survey data in developing countries
– Harmonize international recommendations for survey design, data analysis, etc.
– Produce and disseminate international good practices

Page 2

Accelerated Data Program

• Implementing the IHSN tools in the countries
• Technical and financial support to establish national data archives (in > 50 countries)
• Many datasets documented (DDI)
• Improved access to data by researchers, but not yet satisfactory. We can measure demand through the NADA
• The need to anonymize data remains the most frequently expressed concern and obstacle to data access
• The ADP has provided some guidance, but there is a lack of simple and intuitive tools and guidelines available to ADP countries

Page 3

ADP/IHSN in the world

[Map; legend: ADP country, Expected ADP in 2009, By partners]

Page 4

Setting up Catalogs

Page 5

Focus Nigeria

Effects of data availability on MDG 7: halving the population without sustainable access to safe drinking water.

Providing robust estimates to inform policy makers and sector monitoring.

Water and Sanitation Sector. Workshop with WHO/UNICEF

Page 6

Effects of Data Availability

• Nigeria and the MDG: Rural access to improved water source

Page 7

Resistance in the countries

• Nigeria Statistics Law: the Statistical Act of 2007 obliges microdata release after due anonymization. The legal framework exists.
• Willing institution (the NBS in Nigeria)
• However, current anonymization strategies are limited to removal of direct identifiers
• Other countries are unable to articulate a proper policy for dissemination and tend to use confidentiality as a barrier to mask political resistance or inertia
• IHSN anonymization tools will be a way to deal with both real ethical concerns and political resistance

Page 8

Better use of survey data

• Lots of survey data remain under-exploited because they are not accessible to researchers/users
• Obstacles:
– Technical
– Psychological
– Financial: support by many sponsors
– Legal
– Ethical
– Political
… ? …

IHSN data documentation and cataloguing tools and guidelines

Page 9

Anonymize: Process

• Direct identifiers, which are variables such as names, addresses, or identity card numbers. They permit direct identification of a respondent but are not needed for statistical or research purposes, and should thus be removed from the published dataset.

• Indirect identifiers, which are characteristics that may be shared by several respondents, and whose combination could lead to the re-identification of one of them. For example, the combination of variables such as district of residence, age, sex, and profession would be identifying if only one individual of that particular sex, age and profession lived in that particular district. Such variables are needed for statistical purposes, and should thus not be removed from the published data files.
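The indirect-identifier example can be made concrete with a small sketch. It counts how many records share each combination of indirect identifiers and flags the records that are unique on that combination. The variable names and the toy records are illustrative assumptions, not data or code from the IHSN tools.

```python
from collections import Counter

def key_frequencies(records, quasi_identifiers):
    """For each record, count how many records share its combination
    of indirect identifiers (its "key")."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]

def sample_uniques(records, quasi_identifiers):
    """Return the records whose key occurs exactly once in the file."""
    freqs = key_frequencies(records, quasi_identifiers)
    return [r for r, f in zip(records, freqs) if f == 1]

# Hypothetical toy file; the column names are assumptions for illustration.
records = [
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 61, "sex": "M", "profession": "judge"},
    {"district": "B", "age": 34, "sex": "F", "profession": "teacher"},
]
QI = ["district", "age", "sex", "profession"]

freqs = key_frequencies(records, QI)
uniques = sample_uniques(records, QI)
```

A record that is unique in the sample is not necessarily unique in the population, so a count like this is only a first screening step before a proper risk measure is applied.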

Page 10

Defining the problem

Once all direct identifiers have been removed, a disclosure problem can remain: the indirect identifiers still have to be dealt with.

The IHSN anonymization tools will approach these problems by building on a great deal of technical work undertaken by experts in the field.

The IHSN hosted an expert meeting in October 2008 to present its tools and acknowledges the work done by:

• University of Manchester
• ISTAT (Italian Statistics)
• Cornell University
• ICPSR

Page 11

Developing SDC tools

• Building on existing work
• Not an integrated software package
• A collection of specialized tools for:
– Measuring the risk
– Reducing the risk
– Assessing the information loss
• 12 plug-ins developed in C++ that interface with SPSS, Stata or a direct server (Windows/Linux)
• Need to be thoroughly tested

Page 12

12 Plug-ins

Risk Measures & Intruder Scenarios (What does the intruder know?)

1. The μ-argus risk for weighted sample
2. Re-identification rate to individual risk threshold
3. Individual risk to household risk
4. L-diversity for unweighted data
5. SUDA2: DIS-sample data

Risk Reduction (What does the intruder want?)

6. Kanon: Micro-aggregation
7. Local recoding
8. Fixed length micro aggregation
9. Noise Addition
10. Pram: Post Randomization
11. Rank Swapping
12. Sampling
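As an illustration of the step from individual risk to household risk (plug-in 3), the sketch below uses one common definition: the household risk is the probability that at least one member is re-identified, treating the members' risks as independent. This is an assumption about the formula, not a description of what the IHSN plug-in computes, and the household ids and risk values are made up.

```python
import math
from collections import defaultdict

def household_risk(individual_risks):
    """Probability that at least one member is re-identified,
    assuming the individual risks are independent."""
    return 1.0 - math.prod(1.0 - r for r in individual_risks)

def household_risks(rows):
    """rows: iterable of (household_id, individual_risk) pairs.
    Returns a dict mapping each household id to its household risk."""
    by_household = defaultdict(list)
    for hh_id, risk in rows:
        by_household[hh_id].append(risk)
    return {hh: household_risk(rs) for hh, rs in by_household.items()}

# Hypothetical individual risks for two households.
rows = [("h1", 0.1), ("h1", 0.2), ("h2", 0.05)]
risks = household_risks(rows)
# Household h1: 1 - (0.9 * 0.8) = 0.28; household h2: 0.05
```

Every member of a household would then typically be assigned the household value, since re-identifying one member tends to reveal the others.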

Page 13

Measuring Disclosure Risk

• Individual risk methodology
• Poisson model
• Individual
• Hierarchical
• K-anonymity
• l-diversity
• t-completeness
• SUDA
• Record linkage
• Distance-based
• Probabilistic
• Others

Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
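Of the measures listed, l-diversity is easy to sketch. The version below is the simplest ("distinct") variant: within each group of records sharing the same indirect identifiers, count the distinct values of a sensitive variable, and report the smallest count over all groups. The field names and records are hypothetical, and this is not the IHSN plug-in's implementation.

```python
from collections import defaultdict

def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group
    of records sharing the same quasi-identifier combination."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return min(len(values) for values in groups.values())

# Hypothetical toy file with one sensitive variable.
records = [
    {"district": "A", "sex": "F", "illness": "flu"},
    {"district": "A", "sex": "F", "illness": "malaria"},
    {"district": "B", "sex": "M", "illness": "flu"},
    {"district": "B", "sex": "M", "illness": "flu"},
]

l_value = distinct_l_diversity(records, ["district", "sex"], "illness")
# Group (B, M) has two records but a single illness value, so l_value is 1.
```

The second group shows why k-anonymity alone can be insufficient: both records are indistinguishable on the indirect identifiers, yet the sensitive value is disclosed for anyone known to be in that group.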

Page 14

Reducing disclosure risk

Masking data

• Non perturbative
– Sampling
– Global recoding
– Top/bottom coding
– Local suppression

• Perturbative
– Noise addition (Uncorrelated, Correlated, Non-linear)
– Multiplicative noise
– Micro-aggregation (Fixed/variable group, Uni-/Multivariate)
– Data swapping
– Rank swapping
– Rounding
– Resampling
– PRAM
– MASCC
– Local recoding

Synthetic data file

Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
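To show what one of the perturbative methods does in practice, here is a minimal sketch of univariate micro-aggregation with a fixed group size: values are sorted, cut into consecutive groups of k records, and each value is replaced by its group mean. Merging a short remainder into the last group is a design choice made for this sketch, and the function name and data are hypothetical; the IHSN plug-ins may work differently.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of (at least) k
    neighbours in sorted order. A remainder smaller than k is merged
    into the last group so that no group has fewer than k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # not enough records left for another full group
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result

# Hypothetical income-like values, one clear outlier.
incomes = [10, 1, 3, 12, 2, 11, 50]
masked = microaggregate(incomes, k=3)
# Groups in sorted order: (1, 2, 3) -> 2.0 and (10, 11, 12, 50) -> 20.75
```

The total (and therefore the overall mean) is preserved, while the outlier 50 no longer stands out; the variance, however, is reduced, which is part of the information loss to be assessed afterwards.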

Page 15

Measuring Information Loss

Categorical data
• Direct comparison
• Comparison of contingency tables
• Entropy-based measures

Continuous data
• Mean square error
• Mean absolute error
• Mean variation

Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
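A few of these measures can be sketched for a single variable by comparing original and masked values record by record. The formulas follow common textbook definitions (mean variation taken here as the average relative absolute change, which requires non-zero original values); the function names and numbers are illustrative assumptions rather than the output of the IHSN tools.

```python
def mean_square_error(original, masked):
    """Average squared difference between original and masked values."""
    return sum((x - y) ** 2 for x, y in zip(original, masked)) / len(original)

def mean_absolute_error(original, masked):
    """Average absolute difference between original and masked values."""
    return sum(abs(x - y) for x, y in zip(original, masked)) / len(original)

def mean_variation(original, masked):
    """Average absolute change relative to the original value
    (assumes no original value is zero)."""
    return sum(abs(x - y) / abs(x) for x, y in zip(original, masked)) / len(original)

def share_changed(original, masked):
    """Direct comparison for a categorical variable:
    fraction of records whose category was altered."""
    return sum(a != b for a, b in zip(original, masked)) / len(original)

# Hypothetical continuous variable before and after masking.
x_orig = [10.0, 20.0, 40.0]
x_mask = [12.0, 20.0, 36.0]
mse = mean_square_error(x_orig, x_mask)    # (4 + 0 + 16) / 3
mae = mean_absolute_error(x_orig, x_mask)  # (2 + 0 + 4) / 3
mv = mean_variation(x_orig, x_mask)        # (0.2 + 0 + 0.1) / 3

# Hypothetical categorical variable before and after masking.
changed = share_changed(["urban", "rural", "urban"], ["urban", "urban", "urban"])
```

Measures like these are usually read alongside the risk measures: a masking method is only useful if it lowers risk without pushing the loss beyond what analysts can tolerate.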

Page 16

Developing SDC tools: Proposal

• In Stata (SPSS, SAS) using C++ plugins
– Stata version 9 or later
– Log file for easy replication of procedure
– Informative output
• Or command-line (plugins with “data server”)
• Why Stata (SPSS/SAS)?
– Because most countries use/know these packages
– Can use all tabulation and analysis functions

Page 17

Beta Interface

Page 18

Target use

• Large, imperfect datasets in under-resourced countries
• For use by official data producers in developing countries (IHSN objective)
• Relevant for other users as well
• Free to all; public source code

Page 19

Work Program for 2009

• Testing, “calibrating” and documenting
– Cornell + IHSN + selected countries
• Development/implementation of training and TA program
– Detailed documentation and guidelines
– Reference manual and training materials
• Possibly launched before end of the year (IHSN website)
• Participation of others welcome

Page 20

• Adding to the Tools to facilitate data access in developing countries:
– Tools
    • Metadata Editor
    • CDROM/HTML developer
    • Web Based National Data Archives
    • Question Bank
– Guidelines
    • Data Dissemination
    • Documentation Guide
    • Survey Quality Assessment Framework

Page 21

Thank you.