Page 1
SHARING SENSITIVE DATA WITH CONFIDENCE: THE DATATAGS SYSTEM
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University
Michael Bar-Sinai, PhD candidate in Computer Science at the Ben-Gurion University of the Negev, Israel; Fellow at the Institute for Quantitative Social Science at Harvard University
Latanya Sweeney, Professor of Government and Technology in Residence; Director of Data Privacy Lab, Harvard University
Page 6
Data sharing: good for you and good for the world
Researchers: Get credit for their data
Publishers and Journals: Verify published work
Federal funding agencies: Make public assets accessible
Science: Validate, reuse and extend previous work
Page 7
dataverse.org
Open-source software developed at Harvard's IQSS since 2006
Used to share, publish, cite and archive research data
Installed in 12 sites worldwide
Serving 100s of universities and organizations
Page 9
Harvard Dataverse: dataverse.harvard.edu
Started as a community repository for Social Science
Now open to all research fields and all researchers
More than 1300 dataverses
More than 59,000 datasets
More than 1,500,000 downloads
Page 10
Data Repositories vs Repository Software
Page 14
But, existing community repositories do not support sensitive data
Page 15
“User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”
Page 16
“Submitter represents and warrants that the Content does not contain any information (i) which identifies, or which can be used in conjunction with other publicly available information to personally identify, any individual;”
Page 17
“If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. It is our assumption that you have received any necessary informed consent authorizations that your organizations require prior to submitting your sequences.”
GenBank
Page 18
HOW CAN WE MAXIMIZE SHARING SENSITIVE DATA WHILE BEING MINDFUL OF PRIVACY?
Page 19
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The Datatags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
Page 21
A datatag is a set of security features and access requirements for file handling
A datatags repository is one that stores and shares data files in accordance with standardized and ordered levels of security and access requirements.
Page 31
A DataTags Repository must satisfy the following conditions:
1. Supports more than one datatag
2. Each file in the repository must have one and only one datatag
   a. additional requirements cannot weaken the file security
   b. and cannot require the same or more security than a more restrictive datatag
3. A recipient of a file from the repository must:
   a. satisfy the file's access requirements,
   b. produce sufficient credentials as requested,
   c. and agree to any terms of use required to acquire the file.
4. Provides technological guarantees for requirements 1, 2 and 3.
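The conditions above can be sketched as a small in-memory model. This is only an illustration, not the Dataverse implementation: the `Repository` and `Recipient` classes, the single `clearance` field (standing in for conditions 3a and 3b together), and the sample file name are all assumptions made for the example. Condition 4, the technological guarantee, is outside what a sketch like this can show.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tag(IntEnum):
    """Ordered datatags; a larger value means a more restrictive tag."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass
class Recipient:
    # Hypothetical simplification: the strictest tag whose access
    # requirements and credentials this recipient can satisfy (3a, 3b).
    clearance: Tag
    # Terms of use the recipient has agreed to (3c).
    accepted_terms: frozenset = frozenset()


class Repository:
    def __init__(self, supported):
        # Condition 1: the repository supports more than one datatag.
        if len(set(supported)) < 2:
            raise ValueError("a datatags repository needs at least two tags")
        self.supported = frozenset(supported)
        self._files = {}  # file name -> (tag, terms of use)

    def deposit(self, name, tag, terms=()):
        if tag not in self.supported:
            raise ValueError(f"unsupported tag: {tag.name}")
        # Condition 2: one dictionary entry per file means exactly one tag.
        self._files[name] = (tag, frozenset(terms))

    def may_retrieve(self, name, recipient):
        # Condition 3: clearance covers the tag and all terms are accepted.
        tag, terms = self._files[name]
        return recipient.clearance >= tag and terms <= recipient.accepted_terms


repo = Repository({Tag.BLUE, Tag.RED})
repo.deposit("survey.csv", Tag.RED, terms={"signed DUA"})
alice = Recipient(Tag.RED, frozenset({"signed DUA"}))
bob = Recipient(Tag.GREEN)
print(repo.may_retrieve("survey.csv", alice))  # True
print(repo.may_retrieve("survey.csv", bob))    # False
```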
Page 32
Datatags Levels

| Tag Type | Description | Security Features | Access Requirements |
|---|---|---|---|
| Blue | Public | Clear storage, Clear transmission | Open |
| Green | Controlled public | Clear storage, Clear transmission | Email, OAuth verified registration |
| Yellow | Accountable | Clear storage, Encrypted transmit | Password, Registered, Approval, Click DUA |
| Orange | More accountable | Encrypted storage, Encrypted transmit | Password, Registered, Approval, Signed DUA |
| Red | Fully accountable | Encrypted storage, Encrypted transmit | Two-factor authentication, Approval, Signed DUA |
| Crimson | Maximally restricted | MultiEncrypt store, Encrypted transmit | Two-factor authentication, Approval, Signed DUA |
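As a rough illustration, the six levels can be written down as ordered data so that software can look up the handling rules for a tag. The values are transcribed from the table on this slide; the `Handling` record, the `LEVELS` mapping and the two helper functions are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tag(IntEnum):
    """Datatags ordered from least to most restrictive."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass(frozen=True)
class Handling:
    description: str
    storage: str        # "clear", "encrypted" or "multi-encrypt"
    transmission: str   # "clear" or "encrypted"
    access: tuple       # access requirements, as listed on the slide


LEVELS = {
    Tag.BLUE: Handling("Public", "clear", "clear", ("Open",)),
    Tag.GREEN: Handling("Controlled public", "clear", "clear",
                        ("Email, OAuth verified registration",)),
    Tag.YELLOW: Handling("Accountable", "clear", "encrypted",
                         ("Password", "Registered", "Approval", "Click DUA")),
    Tag.ORANGE: Handling("More accountable", "encrypted", "encrypted",
                         ("Password", "Registered", "Approval", "Signed DUA")),
    Tag.RED: Handling("Fully accountable", "encrypted", "encrypted",
                      ("Two-factor authentication", "Approval", "Signed DUA")),
    Tag.CRIMSON: Handling("Maximally restricted", "multi-encrypt", "encrypted",
                          ("Two-factor authentication", "Approval", "Signed DUA")),
}


def tags_with_storage(storage):
    """Return the tags whose storage requirement equals `storage`."""
    return [tag for tag in Tag if LEVELS[tag].storage == storage]


def strictest(tags):
    """Return the most restrictive tag in a collection."""
    return max(tags)


print([t.name for t in tags_with_storage("clear")])  # ['BLUE', 'GREEN', 'YELLOW']
print(strictest([Tag.GREEN, Tag.ORANGE, Tag.BLUE]).name)  # ORANGE
```

Using an ordered enumeration keeps the "more restrictive than" comparison a plain integer comparison.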
Page 33
DATATAGS WITH HARVARD DATAVERSE
Page 34
DataTags vs Harvard Security Levels
Level 1: No sensitive data; open data
Level 1: De-identified data
Level 2: Confidential information by University standards; no material harm
Level 3: Confidential information that could cause material harm (non-level 4 FERPA)
Level 4: High risk confidential information (SSN)
Level 5: Information that would cause severe harm
Page 35
Dataverses, Datasets, Data Files and DataTags
A Datatag is assigned to each Data File (not to the Dataset)
Page 42
DataTags Workflow with Dataverse
Data File Ingestion
Sensitive Dataset
Direct Access
Privacy Preserving Access
Authorized Signed DUA
Automatic Interview
Review Board Approval
http://datatags.org
http://privacytools.seas.harvard.edu
Page 43
A Curator Model for Privacy-Preserving Analysis
Acknowledgement: Honaker, J. and Nissim, K., Data Privacy Tools Project
Differentially Private statistics (summaries, causal inference, regression, interactive queries)
Page 44
Credentials and Retrieval in Dataverse
Data File not restricted
Guestbook – Email to access
Data File restricted; Dataverse/InCommon account; Request access; Click DUA
Data File restricted; Dataverse/InCommon account; Request access; Sign DUA
Data File restricted; InCommon account; Request access; Two-Factor authentication; Sign DUA
Page 45
OTHER TYPES OF DATATAGS REPOSITORIES
Page 46
Betty: Sole Researcher
• Received consent from participants
• Repository for sharing highly sensitive data (not necessarily Harvard Dataverse)
Page 47
Betty: Global Research Repository
(Same use case as Dataverse)
Ingestion and Decision-making Knowledge: IRB determination or an interview system.
Codification and Infrastructure: Blue, Green, Yellow, Orange, Red, Crimson.
Credentials and Retrieval: Different files may additionally require specific terms of use based on legal or regulatory requirements or adopted best practices.
Page 48
Adam: Large Medical Research Group
• Repository for sharing local data
• Repository for published data
• Repository for sharing with collaborators
Page 50
Diane: Multinational Corporation
• Cloud contains data from all over the world, collected under a variety of terms, subject to different laws
• Repository that enforces requirements on employee access
Page 52
Charles: Institutional Review Board
• Document committee decisions
• Recommend handling based on prior decisions
Page 56
How technology impacts humans.
DATA
Direct Deposit
Direct Tagging
Page 57
Khanna A. Facebook's Privacy Incident Response: a study of geolocation sharing on Facebook Messenger. Technology Science. 2015081101. August 11, 2015. http://techscience.org/a/2015081101
techscience.org
Page 59
Published 2015-09-29
Sweeney L, Yoo J. De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data. Technology Science. 2015092901. September 29, 2015. http://techscience.org/a/2015092901
techscience.org
Page 61
A Datatagging tool needs:
• Formal description of a Datatag
  – Capture the data handling policy of the tag
  – Capture the “stricter-than” ordering
• Interview creation tool
  – Support user-friendly interviews
  – Decide on the datatag based on the answers only
Page 62
Formal Description of a Datatag
• Model data handling policies as a set of orthogonal aspects
  – Storage encryption, access requirements…
• Describe implementation options for each aspect; order implementations from lenient to strict
  – Clear < Encrypted < MultiEncrypt
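One way to read this formal description: a policy is a tuple with one value per aspect, each aspect has its own lenient-to-strict order, and one policy is stricter than another when it is at least as strict in every aspect. The sketch below assumes just two aspects, storage and transmission. The names `Policy`, `at_least_as_strict` and `combine` are hypothetical and are not taken from the Tags tool itself.

```python
from enum import IntEnum
from typing import NamedTuple


class Storage(IntEnum):
    """Storage options ordered from lenient to strict."""
    CLEAR = 0
    ENCRYPTED = 1
    MULTI_ENCRYPT = 2


class Transit(IntEnum):
    """Transmission options ordered from lenient to strict."""
    CLEAR = 0
    ENCRYPTED = 1


class Policy(NamedTuple):
    """A point in the policy space: one value per orthogonal aspect."""
    storage: Storage
    transit: Transit


def at_least_as_strict(a: Policy, b: Policy) -> bool:
    """True when `a` is at least as strict as `b` in every aspect."""
    return all(x >= y for x, y in zip(a, b))


def combine(a: Policy, b: Policy) -> Policy:
    """Most lenient policy that is at least as strict as both inputs."""
    return Policy(*(max(x, y) for x, y in zip(a, b)))


p = Policy(Storage.ENCRYPTED, Transit.CLEAR)
q = Policy(Storage.CLEAR, Transit.ENCRYPTED)

# Neither policy dominates the other, so the ordering is only partial.
print(at_least_as_strict(p, q), at_least_as_strict(q, p))  # False False
print(combine(p, q) == Policy(Storage.ENCRYPTED, Transit.ENCRYPTED))  # True
```

Because the aspects are independent, combining two policies is just an aspect-by-aspect maximum.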
Page 63
Data Handling Policy Space
Page 65
Tags: Tags Space file (.ts)
• Describe a tag space
• Convenience features: hierarchy, “slots” of different types, top-down design support, comments…
Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
Page 66
Comprehension Aid: Visualization
Page 69
Finding the Right Tag – Decision Graph
• Directed, Acyclic Graph
• Node Types:
  – Ask
  – Set
  – Convenience: Call, End, Reject, Todo
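A minimal sketch of how such a graph could drive an interview, assuming only the Ask, Set, End and Reject node types (Call and Todo are left out). The class names, the rule that a Set node can only make an aspect stricter, and the two sample questions are assumptions made for this example, not the syntax or semantics of the actual decision graph language.

```python
from dataclasses import dataclass


@dataclass
class AskNode:
    question: str
    answers: dict  # answer text -> next node


@dataclass
class SetNode:
    aspect: str
    value: int     # larger means stricter (0 = clear, 1 = encrypted)
    then: object   # next node


@dataclass
class EndNode:
    pass


@dataclass
class RejectNode:
    reason: str


def run_interview(node, answers):
    """Walk the graph, consuming one answer per Ask node; return the tag."""
    tag = {}
    remaining = iter(answers)
    while True:
        if isinstance(node, AskNode):
            node = node.answers[next(remaining)]
        elif isinstance(node, SetNode):
            # Assumption in this sketch: a Set can only strengthen an aspect.
            tag[node.aspect] = max(tag.get(node.aspect, 0), node.value)
            node = node.then
        elif isinstance(node, EndNode):
            return tag
        elif isinstance(node, RejectNode):
            raise ValueError(node.reason)


# Hypothetical two-question interview.
end = EndNode()
root = AskNode("Does the data contain personal information?", {
    "no": SetNode("storage", 0, end),
    "yes": AskNode("Did participants consent to sharing?", {
        "yes": SetNode("storage", 1, end),
        "no": RejectNode("data cannot be shared without consent"),
    }),
})

print(run_interview(root, ["no"]))          # {'storage': 0}
print(run_interview(root, ["yes", "yes"]))  # {'storage': 1}
```

The resulting tag depends only on the answers given, which matches the requirement that the tool decide on the datatag based on the answers only.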
Page 70
Finding the Right Tag – Decision Graph
Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
Page 71
Interview Visualization
Interview credit: The Data Privacy Lab @ Harvard (Latanya Sweeney, Sean Hooley), Berkman Ctr. for Internet and Society (Alexandra Wood, David O'Brien, clinical students), IQSS (Mercè Crosas, Michael Bar-Sinai). Part of the Privacy Tools for Sharing Research Data project
Page 78
Interview on the Web
Interview available at datatags.org
Page 81
Decision Graph Points
• Familiar “interview with a specialist” metaphor
• Implicitly describe logic inference
Page 85
Decision Graph Points
• Analysis: Detection of independent parts
• Queries, such as “what series of answers will create a datatag that allows clear storage?”
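A query like the one quoted above can be answered by enumerating every path through the graph, which terminates because the graph is acyclic. The sketch below uses a deliberately simple tuple encoding of nodes and a made-up three-outcome graph. The encoding, the function names and the questions are all assumptions for illustration and do not reflect how the Tags tool represents graphs.

```python
# Node encoding used only in this sketch:
#   ("ask", question, {answer: next_node})
#   ("set", aspect, value, next_node)
#   ("end",)
#   ("reject", reason)

def answer_paths(node, answers=(), assignments=()):
    """Yield (answers, tag) for every path that reaches an End node."""
    kind = node[0]
    if kind == "ask":
        for answer, next_node in node[2].items():
            yield from answer_paths(next_node, answers + (answer,), assignments)
    elif kind == "set":
        yield from answer_paths(node[3], answers,
                                assignments + ((node[1], node[2]),))
    elif kind == "end":
        yield answers, dict(assignments)
    # A "reject" node produces no tag, so it yields nothing.


def answers_leading_to(root, aspect, value):
    """All answer sequences whose resulting tag has aspect == value."""
    return [a for a, tag in answer_paths(root) if tag.get(aspect) == value]


END = ("end",)
ROOT = ("ask", "Does the data contain personal information?", {
    "no": ("set", "storage", "clear", END),
    "yes": ("ask", "Is it medical data?", {
        "no": ("set", "storage", "clear",
               ("set", "transit", "encrypted", END)),
        "yes": ("set", "storage", "encrypted", END),
    }),
})

print(answers_leading_to(ROOT, "storage", "clear"))
# [('no',), ('yes', 'no')]
```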
Page 86
Decision Graph Points
• Optimizations
Example created by Eyal Ben-Simon, BGU
Page 89
State of the Tags Tool
• Open-source project at GitHub
• Language getting more tools and features
  – Project with BGU students
• Language Tools in progress
  – Inspectors, Visualizers, CLI development tool
• Tutorials and reference: datatagginglibrary.readthedocs.org
• Collaboration via, e.g. GitHub
Page 101
Future of the Tags Tool
• Update web interview application
  – Include upload and inspection features
• On-line collaboration environment
  – A-la Google docs?
Page 102
THANKS
Mercè Crosas, Michael Bar-Sinai, Latanya Sweeney