The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago,

Jan 18, 2018

Page 1: The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago,

The NORC Data Enclave for Sensitive Microdata

Timothy M. Mulcahy
Senior Research Scientist, NORC/University of Chicago
[email protected]

Page 2

Overview
• Enclave Mission
• Data Protection
• Metadata Documentation
• Portfolio Approach
• Focus on Research Collaboration/Developing Metrics
• Current Status
• Summary

Page 3

Enclave Mission
• To Promote access to sensitive NIST micro data
  – Serves mandate of TIP to “accelerate the development of high-risk, transformative research targeted to address key societal challenges.”
  – NIST has a unique source of innovation data which researchers can use to study:
    • Entrepreneurship & innovation
    • Early stage technology development
    • Commercialisation of high-risk R&D
• To Protect Confidentiality
  – Technical
  – Legal
  – Organizational
  – Statistical
• To Archive, Index and Curate ATP Micro-data

Page 4

What’s in it for NIST?
• Researcher access to database to examine entrepreneurship and firm behavior
• Development of research community, including graduate students (and possibly undergraduates)
• High quality research => more insights into value added of ATP/TIP program
  a) High quality analysis leverages federal investment
  b) Metadata documentation improves scientific quality

Page 5

Ideal System
• Secure
• Flexible
• Low Cost
• Meet Replication standard
  – The only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated
  – Replication datasets include all information necessary to replicate empirical results
  – Metadata crucial to meet the standard
    • Composed of documentation and structured metadata
    • Undocumented data are useless
• Create foundation for metadata documentation and extend data lifecycle
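
The distinction between documentation and structured metadata can be illustrated with a minimal Python sketch. The record layout below (`name`, `label`, `dtype`, `universe`, `notes`) and the example variables are hypothetical and are not taken from the enclave's actual metadata schema; the point is only that structured records make it possible to flag undocumented variables automatically.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class VariableMetadata:
    """Hypothetical structured metadata record for one dataset variable."""
    name: str
    label: str                 # human-readable description of the variable
    dtype: str                 # storage type, e.g. "string" or "float"
    universe: str = ""         # which units the variable applies to
    notes: list = field(default_factory=list)

    def is_documented(self) -> bool:
        # A variable counts as documented here only if it has a non-empty label.
        return bool(self.label.strip())


def undocumented(variables):
    """Return the names of variables that lack a label."""
    return [v.name for v in variables if not v.is_documented()]


# Illustrative variables only; not real ATP variable names.
variables = [
    VariableMetadata("firm_id", "Pseudonymous firm identifier", "string"),
    VariableMetadata("rd_spend", "", "float"),
]

print(undocumented(variables))
print(json.dumps(asdict(variables[0]), indent=2))
```

A check like `undocumented` could run whenever a researcher deposits derived variables, so that gaps in documentation are caught before they enter the archive.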

Page 6

Metadata & Survey Cycle
• Data collection is not a static process: it’s a lifecycle
• It evolves dynamically over time and involves many players
• It extends to aggregate data to reach decision makers
• Metadata are crucial to capture knowledge

*Exhibit Courtesy of Chuck Humphrey

Page 7

NORC Data Enclave: Mechanics
1. Data Protection
  a) Already collect data for multiple statistical agencies (BLS, Federal Reserve (IRS data), EIA, NSF/SRS etc.) => safeguards in place
  b) NIST approved IT security plan
2. Provision of access: a portfolio approach
  a) Statistical protection (Statistical)
  b) Researcher training (Educational)
  c) Dissemination to researcher community (Operational)
  d) Agency-specific data protection requirements (Legal)

Page 8

Statistical, Technical, Legal & Operational Controls

[Figure labels: Utility; Confidentiality]

Page 9

Data Protection

Page 10

Data Protection
The Data Enclave is fully compliant with DOC IT Security Program Policy, Section 6.5.2, the Federal Information Security Management Act, provisions of mandatory Federal Information Processing Standards (FIPS) and all other applicable NIST Data IT system and physical security requirements.

Page 11

IT Security
• Encrypted connection with the data enclave using virtual private network (VPN) technology. VPN technology enables the data enclave to prevent an outsider from reading the data transmitted between the researcher’s computer and NORC’s network.
• Users access the data enclave from specific, pre-defined IP addresses.
• Citrix’s Web-based technology.
  – All applications and data run on the server at the data enclave.
  – Data enclave can prevent the user from transferring any data from data enclave to a local computer.
  – Data files cannot be downloaded from the remote server to the user’s local PC.
  – User cannot use the “cut and paste” feature in Windows to move data from the Citrix session.
  – User is prevented from printing the data on a local computer.
• Audit logs and audit trails
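
Two of the controls listed above, restricting access to pre-defined IP addresses and keeping an audit trail, can be sketched in a few lines of Python. This is an illustration only and not NORC's implementation: the allowlist entries are documentation-only address ranges, and the names `ALLOWED_NETWORKS`, `is_allowed`, and `check_and_log` are assumptions introduced for the example.

```python
import ipaddress
from datetime import datetime, timezone

# Hypothetical allowlist. These are reserved documentation ranges (RFC 5737),
# not real researcher addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single pre-defined address
    ipaddress.ip_network("198.51.100.0/28"),  # a small pre-defined range
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip parses and falls inside the allowlist."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is rejected, never trusted
    return any(addr in net for net in ALLOWED_NETWORKS)


def check_and_log(client_ip: str, audit_log: list) -> bool:
    """Check the address and append the decision to an in-memory audit trail."""
    allowed = is_allowed(client_ip)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "client_ip": client_ip,
        "allowed": allowed,
    })
    return allowed


audit_log = []
for ip in ["192.0.2.10", "198.51.100.7", "203.0.113.5", "not-an-ip"]:
    print(ip, check_and_log(ip, audit_log))
```

Denied attempts are logged as well as permitted ones, since an audit trail is mainly useful for reviewing access that should not have happened.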

Page 12

Provision of Access

Page 13

Provision of Research Access

[Table: Menu Options for Agency X (and Study Y)]
Columns: Sample Modalities | Legal Options (1,2,3,4) | Statistical (1,2,3,4,5) | Operational (1,2,3,4,5) | Educational (1,2,3,4)
Rows: Remote Access; Onsite Access; Licensing (different levels of anonymization)
Cell values as extracted, in extraction order: 2413; 1,42,312; None13,53 with customization; 252None

Page 14

Provision of Research Access
Two Approaches:
• Remote access
  – External researchers access data via an encrypted connection with the data enclave using VPN
  – RSA Smart Card
  – Restrict user access from specific, pre-defined IP addresses
  – Citrix technology to access applications, configured so no downloads, cut and paste or print possible
• Onsite access
  – Secure room at NORC site (Bethesda, MD & Chicago, IL)
  – Secure machines
  – Video camera
  – Audit logs and trails
  – Workspaces

Page 15

Legal and Statistical Protections
• Legal
  – Access Agreement signed by the institution and the individual researcher
  – Approved institutions
  – Access limited to data requested and authorized
• Statistical
  – Remove obvious identifiers and replace with unique identifiers
  – Statistical techniques chosen by agency (recognising data quality issues)
Note: Both are at discretion of agency and can go above and beyond the minimum level of protection
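
The first statistical step above, removing obvious identifiers and replacing them with unique identifiers, can be sketched as follows. This is a minimal illustration under stated assumptions, not the agency's chosen technique: the field names (`firm_name`, `contact`, `pseudo_id`), the sample records, and the `pseudonymize` helper are all hypothetical. Removing direct identifiers does not by itself prevent re-identification, which is why the slide pairs it with further statistical techniques.

```python
import secrets


def pseudonymize(records, id_field, drop_fields=()):
    """Replace id_field with a random pseudo_id and drop other direct identifiers.

    Returns the cleaned records plus the crosswalk (original -> pseudo_id),
    which would need to be stored separately under tighter access controls.
    """
    crosswalk = {}
    used = set()
    cleaned = []
    for rec in records:
        original = rec[id_field]
        if original not in crosswalk:
            token = secrets.token_hex(8)
            while token in used:  # guard against the (unlikely) collision
                token = secrets.token_hex(8)
            used.add(token)
            crosswalk[original] = token
        # Copy every field except the identifier and the fields to drop.
        out = {k: v for k, v in rec.items()
               if k != id_field and k not in drop_fields}
        out["pseudo_id"] = crosswalk[original]
        cleaned.append(out)
    return cleaned, crosswalk


# Made-up records for illustration only.
records = [
    {"firm_name": "Acme Labs", "contact": "a@example.com", "award_year": 2004},
    {"firm_name": "Beta Devices", "contact": "b@example.com", "award_year": 2005},
    {"firm_name": "Acme Labs", "contact": "a@example.com", "award_year": 2006},
]

cleaned, crosswalk = pseudonymize(records, "firm_name", drop_fields=("contact",))
for row in cleaned:
    print(row)
```

Because the same firm always maps to the same `pseudo_id`, researchers can still link a firm's records across years without seeing its name.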

Page 16

Researcher Training
• Subjects
  – Basic confidentiality
  – Agency specific (joint with agency)
  – Dataset specific (joint with agency)
• Locations
  – Onsite
  – Web-based
  – Researcher locations (AAEA, JSM, AOM, ASA, ASSA, NBER summer institute)
Note: The training is designed to go above and beyond current practice in terms of both frequency and coverage

Page 17

Data Enclave Training Agenda
NORC – University of Chicago
4350 East West Highway, Suite 800
Bethesda, MD 20814

Day 1
8:30-9:00 Welcome (NASS/ERS/NORC)
9:00-10:30 Data enclave navigation (NORC)
10:30-10:45 Break
10:45-12:15 Metadata documentation (NORC)
12:15-1:15 Lunch
1:15-2:45 Confidentiality and data disclosure (NORC)
2:45-3:00 Break
3:00-4:00 ARMS survey overview (ERS) – ERS Staff
4:00-4:10 Confidentiality agreement signing

Page 18

Data Enclave Training Agenda
NORC – University of Chicago
4350 East West Highway, Suite 800
Bethesda, MD 20814

Day 2
8:30-9:00 Data files and documentation (Data Producer)
9:00-10:00 Sampling and weights (Data Producer)
10:00-10:15 Break
10:15-11:15 Item quality control and treatments for non-response (Data Producer)
11:15-12:15 Statistical testing (Data Producer)
12:15-12:30 Closing and adjournment

Page 19

Researcher Responsibilities
• Serve Agency Mission
• Metadata documentation
  – Code
  – Information about variables
• Post research output
• Cite sources
• Evaluation and feedback

Page 20

Developing a Virtual Collaboratory
• Value Added
  – Serve Agency Mission
  – Metadata documentation
    • Code
    • Information about variables
• Policy Relevance
  – Research output
    • Cite sources
    • Evaluation and feedback

Page 21

Logging On
The browser downloads the .ica file and launches the Citrix Client

Page 22

ENCLAVE LEVEL PORTAL

Page 23

ENCLAVE LEVEL PORTAL

[Screenshot labels: SITE MENU; CONTENT DISPLAY AREA]

Page 24
Page 25

ENCLAVE LEVEL FEATURES
• Informs users about enclave updates, events, publications, new features, etc.
• Guidelines and technical assistance for new users
• Calendar of events such as conferences, data release, trainings, etc.
• Background information on the data enclave

Page 26

ENCLAVE LEVEL FEATURES
• Overview and catalog of surveys available in the enclave
• General information on clients or survey series

Page 27

ENCLAVE LEVEL FEATURES
• Access to enclave documentation and public survey documents (reports, questionnaires, no data!). This information consists of files organized in folders and can also be searched by category.
• A wiki-based knowledge area maintained by the enclave managers. Provides FAQ, technical info, tips & tricks, etc.
• Issue tracking system for users to request technical assistance from the enclave staff or report issues with the survey data.
• Collaborative features reserved for data enclave managers (not visible to regular users)

Page 28
Page 29

GROUP LEVEL FEATURES

Page 30

Summary
• Goal: To promote access to sensitive ATP micro data while protecting confidentiality
• Benefits:
  – Secure, low-cost approach to leveraging ATP’s investment in data collection
  – Archiving, Indexing, and Curation of ATP Micro-data
  – Applicable and Customizable to agency needs and requirements

Page 31

Next Steps
• Developing metrics
  – Number of interactions
  – Additions to the wiki, code, combined variables, macros
  – Research output (how to quantify)
• Developing incentives
  – Establish leaders
  – External communications
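
One way the first two metrics might be computed is by tallying a log of collaboration events. The sketch below is purely illustrative: the event kinds (`wiki_edit`, `code_upload`, `macro_upload`), the user names, and the `summarize` helper are assumptions, and the slides leave open how research output should be quantified.

```python
from collections import Counter


def summarize(events):
    """Tally (user, kind) events into simple collaboration metrics."""
    by_kind = Counter(kind for _, kind in events)
    contributors = {user for user, _ in events}
    return {
        "total_interactions": len(events),
        "by_kind": dict(by_kind),
        "distinct_contributors": len(contributors),
    }


# Hypothetical event log: (user, kind of contribution).
events = [
    ("researcher_a", "wiki_edit"),
    ("researcher_b", "code_upload"),
    ("researcher_a", "macro_upload"),
    ("researcher_c", "wiki_edit"),
]

print(summarize(events))
```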

Page 32

Contact Information
• Timothy M. Mulcahy
• [email protected]
• Website
  – http://dataenclave.norc.org