The NORC Data Enclave for Sensitive Microdata
Timothy M. Mulcahy, Senior Research Scientist, NORC/University of Chicago

Post on 18-Jan-2018



The NORC Data Enclave for Sensitive Microdata
Timothy M. Mulcahy
Senior Research Scientist, NORC/University of Chicago
mulcahy-tim@norc.uchicago.edu

Overview
• Enclave Mission
• Data Protection
• Metadata Documentation
• Portfolio Approach
• Focus on Research Collaboration/Developing Metrics
• Current Status
• Summary

Enclave Mission
• To promote access to sensitive NIST micro data
  – Serves mandate of TIP to “accelerate the development of high-risk, transformative research targeted to address key societal challenges.”
  – NIST has a unique source of innovation data which researchers can use to study:
    • Entrepreneurship & innovation
    • Early stage technology development
    • Commercialisation of high-risk R&D
• To protect confidentiality
  – Technical
  – Legal
  – Organizational
  – Statistical
• To archive, index and curate ATP micro-data

What’s in it for NIST?
• Researcher access to database to examine entrepreneurship and firm behavior
• Development of research community, including graduate students (and possibly undergraduates)
• High quality research => more insights into value added of ATP/TIP program
  a) High quality analysis leverages federal investment
  b) Metadata documentation improves scientific quality

Ideal System
• Secure
• Flexible
• Low Cost
• Meet Replication standard
  – The only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated
  – A replication dataset includes all information necessary to replicate empirical results
  – Metadata are crucial to meet the standard
    • Composed of documentation and structured metadata
    • Undocumented data are useless
• Create foundation for metadata documentation and extend data lifecycle
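
One way to picture "structured metadata" and the point that undocumented data are useless is a small codebook check. The sketch below is only an illustration: the `VariableMetadata` fields and the `undocumented` helper are hypothetical names, not part of the enclave's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class VariableMetadata:
    """One codebook entry (hypothetical structure, for illustration only)."""
    name: str                  # column name in the dataset
    label: str                 # human-readable description
    units: str = ""            # e.g. "USD", "employees"
    source_question: str = ""  # survey item that produced the variable


def undocumented(columns, codebook):
    """Return dataset columns with no codebook entry or an empty label."""
    documented = {entry.name for entry in codebook if entry.label.strip()}
    return [column for column in columns if column not in documented]
```

A replication package could run such a check before release so that every variable a researcher sees has at least a label attached.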

Metadata & Survey Cycle
• Data collection is not a static process – it’s a lifecycle
• It evolves dynamically over time and involves many players
• It extends to aggregate data to reach decision makers
• Metadata are crucial to capture knowledge
*Exhibit courtesy of Chuck Humphrey

NORC Data Enclave: Mechanics
1. Data Protection
  a) Already collect data for multiple statistical agencies (BLS, Federal Reserve (IRS data), EIA, NSF/SRS etc.) => safeguards in place
  b) NIST approved IT security plan
2. Provision of access – a portfolio approach
  a) Statistical protection (Statistical)
  b) Researcher training (Educational)
  c) Dissemination to researcher community (Operational)
  d) Agency-specific data protection requirements (Legal)

[Figure: Statistical, Technical, Legal & Operational Controls; labels: Utility, Confidentiality]

Data Protection
The Data Enclave is fully compliant with DOC IT Security Program Policy, Section 6.5.2, the Federal Information Security Management Act, provisions of mandatory Federal Information Processing Standards (FIPS) and all other applicable NIST Data IT system and physical security requirements.

IT Security
• Encrypted connection with the data enclave using virtual private network (VPN) technology. VPN technology enables the data enclave to prevent an outsider from reading the data transmitted between the researcher’s computer and NORC’s network.
• Users access the data enclave from specific, pre-defined IP addresses.
• Citrix’s Web-based technology.
  – All applications and data run on the server at the data enclave.
  – Data enclave can prevent the user from transferring any data from data enclave to a local computer.
  – Data files cannot be downloaded from the remote server to the user’s local PC.
  – User cannot use the “cut and paste” feature in Windows to move data from the Citrix session.
  – User is prevented from printing the data on a local computer.
• Audit logs and audit trails
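
The restriction to specific, pre-defined IP addresses can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not the enclave's actual configuration: the `ALLOWED_NETWORKS` list and the `is_allowed` function are hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation addresses,
# not real enclave or researcher addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single researcher workstation
    ipaddress.ip_network("198.51.100.0/28"),  # a small university subnet (.0 to .15)
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any pre-defined network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed input is rejected rather than raising.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

In practice a check of this kind would normally be enforced at the VPN gateway or firewall rather than in application code; the sketch only shows the logic.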

Provision of Access
[Table: Menu Options for Agency X (and Study Y). Sample Modalities: Remote Access; Onsite Access; Licensing (different levels of anonymization). Option columns: Legal Options (1,2,3,4); Statistical (1,2,3,4,5); Operational (1,2,3,4,5); Educational (1,2,3,4). Cell entries as extracted: "2413"; "1,42,312"; "None13,53 with customization"; "252None".]

Provision of Research Access
Two Approaches:
Remote access
  – External researchers access data via an encrypted connection with the data enclave using VPN
  – RSA Smart Card
  – Restrict user access to specific, pre-defined IP addresses
  – Citrix technology to access applications – configured so no downloads, cut and paste or print possible
Onsite access
  – Secure room at NORC site (Bethesda, MD & Chicago, IL)
  – Secure machines
  – Video camera
  – Audit logs and trails
  – Workspaces

Legal and Statistical Protections
Legal
  – Access Agreement signed by institution and individual researcher
  – Approved institutions
  – Access limited to data requested and authorized
Statistical
  – Remove obvious identifiers and replace with unique identifiers
  – Statistical techniques chosen by agency (recognising data quality issues)
Note: Both are at discretion of agency and can go above and beyond the minimum level of protection
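
The statistical step "remove obvious identifiers and replace with unique identifiers" might look roughly like the sketch below. It is an assumption-laden illustration: the `pseudonymize` function, the field names `firm_name` and `contact_email`, and the `F-` study-ID prefix are all hypothetical, and the actual techniques are chosen by each agency.

```python
import secrets


def pseudonymize(records, id_field="firm_name", drop_fields=("contact_email",)):
    """Replace id_field with a random study ID and drop other direct identifiers.

    Returns (clean_records, crosswalk). The crosswalk maps each original
    identifier to its study ID and would need to be stored separately,
    under the data owner's control.
    """
    crosswalk = {}
    used_ids = set()
    clean_records = []
    for record in records:
        original = record[id_field]
        if original not in crosswalk:
            new_id = "F-" + secrets.token_hex(4)
            while new_id in used_ids:  # regenerate on the rare collision
                new_id = "F-" + secrets.token_hex(4)
            used_ids.add(new_id)
            crosswalk[original] = new_id
        # Keep every field except the direct identifiers.
        clean = {key: value for key, value in record.items()
                 if key != id_field and key not in drop_fields}
        clean["study_id"] = crosswalk[original]
        clean_records.append(clean)
    return clean_records, crosswalk
```

Removing direct identifiers alone does not rule out re-identification from combinations of the remaining variables, which is why the slide pairs it with further agency-chosen statistical techniques.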

Researcher Training
Subjects
  – Basic confidentiality
  – Agency specific (joint with agency)
  – Dataset specific (joint with agency)
Locations
  – Onsite
  – Web-based
  – Researcher locations (AAEA, JSM, AOM, ASA, ASSA, NBER summer institute)
Note: The training is designed to go above and beyond current practice in terms of both frequency and coverage

Data Enclave Training Agenda
NORC – University of Chicago
4350 East West Highway, Suite 800, Bethesda, MD 20814

Day 1
8:30-9:00    Welcome (NASS/ERS/NORC)
9:00-10:30   Data enclave navigation (NORC)
10:30-10:45  Break
10:45-12:15  Metadata documentation (NORC)
12:15-1:15   Lunch
1:15-2:45    Confidentiality and data disclosure (NORC)
2:45-3:00    Break
3:00-4:00    ARMS survey overview (ERS) – ERS Staff
4:00-4:10    Confidentiality agreement signing

Data Enclave Training Agenda

Day 2
8:30-9:00    Data files and documentation (Data Producer)
9:00-10:00   Sampling and weights (Data Producer)
10:00-10:15  Break
10:15-11:15  Item quality control and treatments for non-response (Data Producer)
11:15-12:15  Statistical testing (Data Producer)
12:15-12:30  Closing and adjournment

Researcher Responsibilities
• Serve Agency Mission
• Metadata documentation
  – Code
  – Information about variables
• Post research output
• Cite sources
• Evaluation and feedback

Developing a Virtual Collaboratory
• Value Added
  – Serve Agency Mission
  – Metadata documentation
    • Code
    • Information about variables
• Policy Relevance
  – Research output
    • Cite sources
    • Evaluation and feedback

Logging On
The browser downloads the .ica file and launches the Citrix Client

ENCLAVE LEVEL PORTAL
[Screenshot labels: Site menu; Content display area]

ENCLAVE LEVEL FEATURES
• Informs users about enclave updates, events, publications, new features, etc.
• Guidelines and technical assistance for new users
• Calendar of events such as conferences, data releases, trainings, ...
• Background information on the data enclave

ENCLAVE LEVEL FEATURES

• Overview and catalog of surveys available in the enclave
• General information on clients or survey series

ENCLAVE LEVEL FEATURES

• Access to enclave documentation and public survey documents (reports, questionnaires, no data!). This information consists of files organized in folders and can also be searched by category.
• A wiki-based knowledge area maintained by the enclave managers. Provides FAQ, technical info, tips & tricks, ...
• Issue tracking system for users to request technical assistance from the enclave staff or report issues with the survey data.
• Collaborative features reserved for data enclave managers (not visible to regular users)

GROUP LEVEL FEATURES

Summary
• Goal: To promote access to sensitive ATP micro data while protecting confidentiality
• Benefits:
  – Secure, low-cost approach to leveraging ATP’s investment in data collection
  – Archiving, indexing, and curation of ATP micro-data
  – Applicable and customizable to agency needs and requirements

Next Steps
• Developing metrics
  – Number of interactions
  – Additions to the wiki, code, combined variables, macros
  – Research output (how to quantify)
• Developing incentives
  – Establish leaders
  – External communications
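
Counts such as the number of interactions and additions to the wiki or code could be tallied from an activity log. The sketch below assumes a hypothetical log of `(researcher, kind)` pairs; the deck does not describe how the enclave actually records these events, and `tally_contributions` is an illustrative name.

```python
from collections import Counter


def tally_contributions(events):
    """Count contributions by kind and by researcher.

    events is an iterable of (researcher, kind) pairs, for example
    ("r1", "wiki") or ("r2", "code"). This log format is an assumption.
    """
    events = list(events)  # allow generators to be traversed twice
    by_kind = Counter(kind for _, kind in events)
    by_researcher = Counter(researcher for researcher, _ in events)
    return by_kind, by_researcher
```

Simple tallies like these cover the countable metrics; how to quantify research output remains the open question the slide raises.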

Contact Information
• Timothy M. Mulcahy
• Mulcahy-Tim@norc.uchicago.edu
• Website
  – http://dataenclave.norc.org
