QUAKE: Quadruple Key and Encryption. Craig A. Mason, Shihfen Tu, Quansheng Song, University of Maine. Centers for Disease Control and Prevention Third Annual National Early Hearing Detection and Intervention Conference, Washington, DC, February, 2004.
1
QUAKE: Quadruple Key and Encryption
Craig A. Mason, Shihfen Tu, Quansheng Song
University of Maine
Centers for Disease Control and Prevention Third Annual National Early Hearing Detection and Intervention Conference,
Washington, DC, February, 2004.
2
Background
University of Maine research team involved in research in informatics and developmental epidemiology
Deterministic Linkage
A series of common identifying fields is selected across two databases
Records are matched across databases based on these fields
Two records must have identical values across all of these fields in order to be linked
  “John”, “Bartholomew”, “Szapoznick”
  “Jon”, “Bartholomew”, “Szapoznick”
  These two records would not be linked: the first names differ
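A minimal sketch of deterministic linkage on the example above. The field names, the second record in each file, and the function name are hypothetical and only illustrate the all-fields-must-agree rule.

```python
# Fields that must agree exactly (hypothetical names chosen for illustration).
LINK_FIELDS = ("first", "middle", "last")

def deterministic_link(left, right, fields=LINK_FIELDS):
    """Return (left_index, right_index) pairs that agree on every field."""
    index = {}
    for j, record in enumerate(right):
        index.setdefault(tuple(record[f] for f in fields), []).append(j)
    pairs = []
    for i, record in enumerate(left):
        for j in index.get(tuple(record[f] for f in fields), []):
            pairs.append((i, j))
    return pairs

# Illustrative records; "Maria Elena Smith" is made up for this sketch.
screening = [
    {"first": "John", "middle": "Bartholomew", "last": "Szapoznick"},
    {"first": "Maria", "middle": "Elena", "last": "Smith"},
]
school = [
    {"first": "Jon", "middle": "Bartholomew", "last": "Szapoznick"},
    {"first": "Maria", "middle": "Elena", "last": "Smith"},
]

links = deterministic_link(screening, school)
print(links)  # "John" vs "Jon" fails the exact-match rule, so only one pair links
```

A single misspelled field is enough to lose a true match, which is the weakness that probabilistic linkage tries to address.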
6
Probabilistic Linkage
Two records do not have to match across all fields in order to be linked
For a possible pairing, a value is calculated that reflects the likelihood that the two records are (or are not) the same person
  Based upon the frequencies of values and the quality of the data
7
Factors Influencing Probabilistic Linkage
Reliability of data fields
  Greater reliability results in increased odds of a correct match
  If a field is pure noise, correct matches will be random
Frequency of field values
  The more common the value in a field, the greater the odds that the records will be erroneously matched
  E.g., a match based on the name Szapocznik is more likely to reflect a correct match than is a match on the name Smith
Number of matches
  The greater the number of individuals in one database that also appear in the other database, the greater the probability of linkage across databases
  If two databases have no individuals in common, the probability of a linkage across the databases must be zero
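The first two factors can be illustrated with log-likelihood-ratio match weights. This is a sketch in the style of the Fellegi-Sunter model, which is an assumption here: the slides do not name a weighting scheme. `m` is the assumed probability that a field agrees for a true match (reliability), `u` is the assumed probability that it agrees by chance for a non-match (roughly the frequency of the value), and all numeric values are made up for illustration.

```python
import math

def agreement_weight(m, u):
    """Weight added when a field agrees: log2 of P(agree | match) / P(agree | non-match)."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight added (usually negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))

def match_score(comparisons):
    """Sum field weights; comparisons is a list of (agrees, m, u) tuples."""
    return sum(
        agreement_weight(m, u) if agrees else disagreement_weight(m, u)
        for agrees, m, u in comparisons
    )

# Assumed values: a reliable surname field, one rare value and one common value.
rare_weight = agreement_weight(0.95, 0.0001)   # e.g. a surname like Szapocznik
common_weight = agreement_weight(0.95, 0.01)   # e.g. a surname like Smith

# A pure-noise field agrees equally often for matches and non-matches.
noise_weight = agreement_weight(0.5, 0.5)

print(rare_weight > common_weight)  # agreement on a rare value carries more evidence
print(noise_weight)                 # a noise field contributes nothing
```

The third factor, the number of true matches between the databases, would enter as a prior on any pair being a match; it is left out of this sketch.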
8
Statisticians Anonymous
“I’m David, and I’m a bean-counter”
9
Encryption
Ecretsay odecay
  Information is coded so that true values are not obvious
Ancient field
Modern era focus on electronic transmission of sensitive data
  Notice the little yellow padlock in the bottom corner of your browser when shopping on eBay?
10
Encryption Techniques
Asymmetric or public key
  Different key for encryption and decryption
  Encryption key is public
  Decryption key is private
  Decryption key cannot feasibly be derived from encryption key
Provides security of data transmission
  Anyone can use the public key to code a message
  Only I can decrypt it
Typically based on product of large primes
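As an illustration of a public-key scheme built on a product of primes, here is a toy RSA sketch. The tiny primes and the exponent are textbook-style values chosen for readability, not anything from the slides, and a real system would use a vetted library with far larger primes and padding.

```python
# Toy RSA with tiny primes (illustrative only; insecure at this size).
p, q = 61, 53
n = p * q                   # public modulus
phi = (p - 1) * (q - 1)     # computable only by someone who knows p and q
e = 17                      # public exponent, coprime to phi
d = pow(e, -1, phi)         # private exponent: modular inverse (Python 3.8+)

def encrypt(message: int) -> int:
    """Anyone can do this with the public pair (n, e)."""
    return pow(message, e, n)

def decrypt(cipher: int) -> int:
    """Only the holder of d can undo the encryption."""
    return pow(cipher, d, n)

message = 65
cipher = encrypt(message)
print(decrypt(cipher) == message)
```

The private exponent `d` depends on `phi`, and `phi` depends on the factors of `n`, which is why the difficulty of factoring protects the private key.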
11
Challenge of Factorization
Factors hard to find
But once you know one, the other is easy to find
Public Key:
114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,147,599,290,026,879,543,541
Private Key Based on Factors:
3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,638,493,387,843,990,820,577
and
32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,992,942,539,798,288,533
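The asymmetry described on this slide can be sketched with a small number. The semiprime below is a made-up example, and trial division stands in for "searching for a factor"; real factoring algorithms are far more sophisticated but still impractical at the key sizes shown above.

```python
def smallest_factor(n: int) -> int:
    """Find the smallest prime factor by trial division (slow for large n)."""
    if n % 2 == 0:
        return 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2
    return n  # n itself is prime

n = 10403                  # illustrative semiprime: 101 * 103
p = smallest_factor(n)     # the hard part: a search
q = n // p                 # the easy part: one division once p is known
print(p, q)
```

The search cost grows quickly with the size of `n`, while the final division stays a single operation.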
12
Encryption Techniques
Symmetric key
  Same key for encryption and decryption
  Key is not made public
  Secret key - One Key to Rule Them All
More secure than asymmetric key
  Nothing suggesting a possible key is published
  Asymmetric key must be 6 to 30 times longer than symmetric key for equivalent security
Useful if you know in advance exactly who will want to encrypt a message to you
13
Encryption Techniques
Security often described in terms of bits
  128 bit encryption indicates 2^128 possible keys
  340,282,366,920,938,463,463,374,607,431,768,211,456 (about 3.4 × 10^38)
  A lot of possibilities…
Widespread use of 1024 and 2048 bit encryption on the horizon
128 bit symmetric = 2304 bit asymmetric (Cryptography, p.166)
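The size of a 128-bit key space can be checked directly, since Python integers have arbitrary precision. The guessing rate used below is a hypothetical figure chosen only to give a sense of scale.

```python
# Number of distinct 128-bit keys.
keys_128 = 2 ** 128
print(f"{keys_128:,}")      # 340,282,366,920,938,463,463,374,607,431,768,211,456
print(len(str(keys_128)))   # 39 digits

def years_to_search(bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key at a given (assumed) guessing rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365.25 * 24 * 3600)

# Hypothetical attacker testing one trillion keys per second.
print(f"{years_to_search(128, 1e12):.2e} years")
```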
14
A Dirty Little Secret..
These big numbers hide the fact that the security is only as good as the algorithm
  Think reliability of DNA testing
Plaintext attack (and its variations)
  If the only unique name in the data set is Szapocznik
  And the only unique variation in the encrypted data set is “X*GFfF825d=”…..
  The key can be resolved
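A sketch of the frequency-matching idea behind this attack, assuming identifiers are encrypted 1:1 (the same name always yields the same ciphertext). Apart from the one token quoted on the slide, the ciphertext strings and the other surnames are invented for illustration.

```python
from collections import Counter

def unique_values(values):
    """Return the values that occur exactly once."""
    counts = Counter(values)
    return [value for value, count in counts.items() if count == 1]

# What the attacker knows about the population (illustrative).
plain_surnames = ["Smith", "Smith", "Jones", "Jones", "Szapocznik"]

# What the attacker sees in the encrypted file (row order carries no information).
cipher_surnames = ["q7#Lm", "X*GFfF825d=", "Zk2!p", "q7#Lm", "Zk2!p"]

known_pair = None
plain_unique = unique_values(plain_surnames)
cipher_unique = unique_values(cipher_surnames)
if len(plain_unique) == 1 and len(cipher_unique) == 1:
    # One singleton on each side: they must correspond to each other.
    known_pair = (plain_unique[0], cipher_unique[0])

print(known_pair)
```

A known plaintext and ciphertext pair is the starting point for working out the key, regardless of how many bits the key has.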
15
A Dirty Little Secret..
Even without the key, you can determine my grade
  Some computational or physical wall is needed between decrypted and encrypted data
[Table: SCREENING DATA (Last Name, First Name) alongside SCHOOL DATA (Last Name, First Name, Grade)]
Identifiers are encrypted into one of multiple values
  Lack of uniqueness increases challenge of decryption
[Figure: identifier “Craig”, Encryption Key 93812….2431, encrypted values H3~f9(-dor9Dj1D[d dfR1”d/Gor]
18
That’s nice, but how can this help with data linkage?
All right. But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh water system, and public health… What have the Romans ever done for us?
--- Reg, spokesman for the People’s Front of Judea
Monty Python, Life of Brian
(and Martin White, UC Berkeley)
19
The Politics of Linkage
Two data systems contain information on same individuals
Would like to link data for public health research
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
20
The Politics of Linkage
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
I may not want schools to know about health services I have received
21
The Politics of Linkage
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
What solution may allow data to be linked, yet prevent sources from seeing each other’s identifying data?
22
Quake
QUAdruple Key and Encryption
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
23
Quake
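Requires algorithms to be reversible
  You can “undo” a process to come back to original value
$3 \times 5 = 15;\quad 15 \div 5 = 3$
$3 + 5 = 8;\quad 8 - 5 = 3$
[A further slide example, extracted as “22;22” and “22”, is not recoverable]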
24
Quake
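Requires algorithms to be commutative
  You get the same answer even if you do the problem backwards
$\begin{pmatrix}1&2\\3&4\end{pmatrix}\begin{pmatrix}5&6\\7&8\end{pmatrix}=\begin{pmatrix}19&22\\43&50\end{pmatrix};\quad \begin{pmatrix}5&6\\7&8\end{pmatrix}\begin{pmatrix}1&2\\3&4\end{pmatrix}=\begin{pmatrix}23&34\\31&46\end{pmatrix}$ (not commutative)
$15 = 3 \times 5;\quad 15 = 5 \times 3$
$8 = 3 + 5;\quad 8 = 5 + 3$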
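One family of ciphers that is both reversible and commutative is modular exponentiation with a shared prime modulus. The sketch below uses it purely as an illustration: the slides do not say which algorithm QUAKE uses, and the modulus, key seeds, and helper names are assumptions.

```python
from math import gcd

# Assumption: a public prime modulus shared by all parties.
# 2**127 - 1 is a Mersenne prime; a real system would use a larger, vetted group.
P = 2**127 - 1

def make_key(seed: int) -> int:
    """Return an odd exponent coprime to P - 1, so that it has an inverse."""
    key = seed | 1
    while gcd(key, P - 1) != 1:
        key += 2
    return key

def encrypt(value: int, key: int) -> int:
    return pow(value, key, P)

def decrypt(value: int, key: int) -> int:
    inverse = pow(key, -1, P - 1)   # modular inverse (Python 3.8+)
    return pow(value, inverse, P)

def name_to_int(name: str) -> int:
    number = int.from_bytes(name.encode("utf-8"), "big")
    if not 0 < number < P:
        raise ValueError("name too long for this toy modulus")
    return number

def int_to_name(number: int) -> str:
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("utf-8")

key_a = make_key(123456789)   # hypothetical key for party A
key_b = make_key(987654321)   # hypothetical key for party B
x = name_to_int("Craig")

# Reversible: decrypting with the same key returns the original value.
reversible = int_to_name(decrypt(encrypt(x, key_a), key_a)) == "Craig"

# Commutative: the order in which the two keys are applied does not matter.
commutative = encrypt(encrypt(x, key_a), key_b) == encrypt(encrypt(x, key_b), key_a)

print(reversible, commutative)
```

Commutativity holds here because raising to the power `a` and then `b` equals raising to the power `a * b`, in either order.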
25
Quake
Key: 052385043…9471 Key: 757260024…2512
Each provider selects their own unique encryption key that is used to encrypt identifiers prior to linkage
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
26
Quake
Keys: 850258434…3435, 052385043…9471, 420504763….8372, 757260024…2512
Community members representing individuals in each dataset also select their own unique encryption keys
Service Data: Craig A. Mason…. School Data: Craig A. Mason….
Bring both encrypted files together on independent, non-networked machine
Each of the four parties enters their own key
Respective files internally decrypted and linked
New, de-identified linked file containing fields of interest created
Record of identifiers and keys electronically or physically erased
  DoD 5220.22-M protocol
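The four-key step can be sketched as follows. The cipher is a toy XOR keystream built from SHA-256, chosen only to keep the example self-contained; it is not secure, and the slides do not specify the actual algorithm. The key strings, field values, and function names are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable pseudo-random byte stream from a key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so this both encrypts and decrypts."""
    return bytes(d ^ s for d, s in zip(data, keystream(key, len(data))))

def double_encrypt(name: str, provider_key: bytes, community_key: bytes) -> bytes:
    return xor_with_key(xor_with_key(name.encode("utf-8"), provider_key), community_key)

def double_decrypt(token: bytes, provider_key: bytes, community_key: bytes) -> str:
    return xor_with_key(xor_with_key(token, community_key), provider_key).decode("utf-8")

# Four hypothetical keys: one provider and one community representative per file.
service_keys = (b"service-provider-key", b"service-community-key")
school_keys = (b"school-provider-key", b"school-community-key")

service_file = [(double_encrypt("Craig A. Mason", *service_keys), "screened")]
school_file = [(double_encrypt("Craig A. Mason", *school_keys), "A")]

def link_on_isolated_machine(service_file, service_keys, school_file, school_keys):
    """Decrypt both files internally, link on name, return only fields of interest."""
    services = {double_decrypt(token, *service_keys): value for token, value in service_file}
    linked = []
    for token, grade in school_file:
        name = double_decrypt(token, *school_keys)
        if name in services:
            linked.append((services[name], grade))
    return linked  # decrypted names exist only inside this function

linked = link_on_isolated_machine(service_file, service_keys, school_file, school_keys)
print(linked)
```

Neither file can be decrypted unless both its provider and its community representative supply their keys, and the returned file carries no identifiers.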
38
Linking Encrypted Files
Benefits
  Flexible linkage strategies (partial names, etc.)
  Easiest to perform
  Once completed no identifiers to enable plaintext attack
Issues
  Process of encryption/decryption can be computationally demanding
  Potential record of encrypted data and all keys
    Can be destroyed, but time consuming
39
Variation of Quake
Key: 052385043…9471 Key: 757260024…2512
Service Data: Craig A. Mason School Data: Craig A. Mason
Each provider selects own unique encryption key used to encrypt identifiers prior to linkage
40
Variation
Key: 052385043…9471 Key: 757260024…2512
Service Data: *Bj&!33t…. School Data: yy#K66….
Identifiers in their file encrypted with a 1:1 symmetric key
41
Variation
Key: 052385043…9471 Key: 757260024…2512
Service Data: *Bj&!33t…. School Data: yy#K66….
Parties then switch encrypted files
If identifying fields in both files are all equal..
  May be prone to variations of a plaintext attack
  Inclusion of additional records whose identifiers contain random noise can nearly eliminate this risk
42
Variation
Key: 052385043…9471 Key: 757260024…2512
Service Data: Jf*72Coo…. School Data: Jf*72Coo….
Each party then applies their own key to the other party’s already-encrypted file
  Identifiers in each file will have the same value
  Can not determine key used by other source
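The encrypt, swap, and re-encrypt steps can be sketched with a commutative cipher. Modular exponentiation is used here as an assumed stand-in, since the slides do not name the algorithm; the modulus, key seeds, hashing step, and all records other than "Craig A. Mason" are hypothetical.

```python
import hashlib
from math import gcd

P = 2**127 - 1   # assumed public prime modulus (a Mersenne prime)

def make_key(seed: int) -> int:
    """Return an odd exponent coprime to P - 1, which makes the mapping 1:1."""
    key = seed | 1
    while gcd(key, P - 1) != 1:
        key += 2
    return key

def to_int(identifier: str) -> int:
    """Map an identifier to a number in 1..P-1 (hashing is an assumption of this sketch)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def apply_key(value: int, key: int) -> int:
    return pow(value, key, P)

service_key = make_key(123456789)   # known only to the service provider
school_key = make_key(987654321)    # known only to the school

service = {"Craig A. Mason": "hearing screening", "Pat Q. Doe": "audiology follow-up"}
school = {"Craig A. Mason": "A", "Lee R. Roe": "B"}

# Step 1: each party encrypts the identifiers in its own file.
service_once = {apply_key(to_int(n), service_key): v for n, v in service.items()}
school_once = {apply_key(to_int(n), school_key): v for n, v in school.items()}

# Step 2: files are switched and each party applies its key to the other's file.
service_twice = {apply_key(t, school_key): v for t, v in service_once.items()}
school_twice = {apply_key(t, service_key): v for t, v in school_once.items()}

# Step 3: a third party links on the doubly encrypted tokens and drops them.
linked = [(service_twice[t], school_twice[t]) for t in service_twice if t in school_twice]
print(linked)
```

Because the cipher is commutative, the same person ends up with the same token in both files even though no party ever held both keys.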
43
Variation
Key: 052385043…9471 Key: 757260024…2512
Service Data: Jf*72Coo…. School Data: Jf*72Coo….
If files brought together by one of the parties
  They may be able to conduct a plaintext attack
  May then be able to determine key used by other party
Both files linked by trusted third party
44
Variation
Key: 052385043…9471 Key: 757260024…2512
Service Data: Jf*72Coo…. School Data: Jf*72Coo….
Again, may bring in community representatives
Linked Data: Jf*72Coo, Services, Grades
Final Linked Data: Services, Grades
45
Variation
Link based upon the encrypted identifier fields
  No need to decrypt files when linking
  Apply deterministic and probabilistic algorithms to encrypted data
  No machine ever sees all keys
Final file contains no identifiers and only a limited number of fields of interest
46
Variation of Quake
Issues
  Requires 1:1 encryption algorithm
    Can be addressed, but adds level of complexity
  Can not examine partial strings
    Specific partial strings can be generated prior to encryption
    Month of birth, day of birth
    First letter of first name
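Generating partial strings before encryption might look like the sketch below. It uses HMAC-SHA256 from the Python standard library as a keyed 1:1 transformation, which is an assumption made for brevity (a single shared key rather than the two-party scheme); the field names, dates, and key are hypothetical.

```python
import hashlib
import hmac

def token(field: str, value: str, key: bytes) -> str:
    """Keyed 1:1 token; the field name is mixed in so equal values in different fields differ."""
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def encrypt_record(first_name: str, birth_date: str, key: bytes) -> dict:
    """Derive partial fields first, then tokenize each one. birth_date is 'YYYY-MM-DD'."""
    _, month, day = birth_date.split("-")
    return {
        "first_name": token("first_name", first_name.lower(), key),
        "first_initial": token("first_initial", first_name[0].lower(), key),
        "birth_month": token("birth_month", month, key),
        "birth_day": token("birth_day", day, key),
    }

key = b"hypothetical-linkage-key"
a = encrypt_record("John", "1999-03-14", key)
b = encrypt_record("Jon", "1999-03-27", key)

# Full first names no longer match once encrypted, but the pre-derived parts still can.
print(a["first_name"] == b["first_name"])
print(a["first_initial"] == b["first_initial"])
print(a["birth_month"] == b["birth_month"])
```

Each derived field is encrypted separately, so partial agreement can still feed deterministic or probabilistic rules without decrypting anything.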
47
Advanced Linkage Protocols for Addressing Confidentiality Concerns
Encrypted Linkage Protocols
  Unique encryption keys administered by each database administrator and community liaisons
  No one at any time sees the other person’s identifiers
  Person conducting the linkage never sees any identifiers
  Resulting linked set includes no decrypted identifiers
  Resulting file can not be decoded, expanded, or relinked without agreement and cooperation of all parties
  The community participates in the process
Technology that creates confidentiality concerns may provide means for reducing those concerns