On Compression of Data Encrypted with Block Ciphers

Demijan Klinc*, Carmit Hazay†, Ashish Jagmohan**, Hugo Krawczyk**, Tal Rabin**

* Georgia Institute of Technology  ** IBM T.J. Watson Research Labs  † Weizmann Institute and IDC
Traditional Model

Transmitting redundant data over an insecure and bandwidth-constrained channel
• Traditionally, data is first compressed and then encrypted

[Diagram: source emits X → compress → C(X) → encrypt with key k → Ek(C(X)); compress and encrypt together form the encoder]
Traditional Model

What if the encryptor and the compressor are two entities with different goals?
• E.g., a storage provider wants to compress data to minimize storage space but does not have access to the key

Can we reverse the order of these steps?
Compression and Encryption in Reverse Order

[Diagram: source emits X → encrypt with key k → Ek(X) → compress → C(Ek(X)); the compressor does not know k!]

Can we encrypt first and only then compress, without knowing the key?
Compression and Encryption in Reverse Order

For a fixed key, an encryption scheme is a bijection, so entropy is preserved
• It follows that it is theoretically possible to compress the source to the same level as before encryption

In practice, encrypted data appears random
• Conventional compression techniques do not yield desirable results
Compression and Encryption in Reverse Order

Fully homomorphic encryption shows that one can compress optimally without decrypting
• Homomorphically evaluate the compression algorithm on the encrypted plaintext

Fully homomorphic encryption supports addition and multiplication:
E(m1), E(m2) → E(m1+m2)
E(m1), E(m2) → E(m1∙m2)
Stated differently: C, E(m) → E(C(m))
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Private-Key Encryption

Triple of algorithms: (Gen, Enc, Dec)
• Same key for encryption and decryption

Security – CPA security (informally):
• It should be infeasible to distinguish an encryption of m from an encryption of m′
Private-Key Encryption

Two categories:
• Stream ciphers
  Plaintext is encrypted one symbol at a time, typically by summing it with a key stream (XOR for binary alphabets), e.g., the one-time pad
• Block ciphers
  Encryption is accomplished by means of nonlinear mappings on input blocks of fixed length, e.g., AES, DES
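The stream-cipher case above (one-time pad) is a one-line XOR; a minimal sketch in Python, where the message and key length are illustrative:

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Stream-cipher style encryption: XOR each plaintext byte with the key."""
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))   # fresh uniformly random key
ct = otp_encrypt(msg, key)
assert otp_encrypt(ct, key) == msg    # decryption is the same XOR
```

Because XOR with an unknown uniform key preserves the plaintext's entropy, this is exactly the setting the Slepian-Wolf approach later exploits.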
Binary Symmetric Channel

Communication model where each sent bit is flipped with probability p:
Pr(Y=0 | X=0) = 1−p    Pr(Y=1 | X=0) = p
Pr(Y=0 | X=1) = p      Pr(Y=1 | X=1) = 1−p

Entropy: H(p) = −(p log p + (1−p) log(1−p))

[Diagram: X → BSC(p) → Y]
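The binary entropy function H(p) above is easy to evaluate; a small sketch:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -(p log2 p + (1-p) log2 (1-p)), in bits per channel use."""
    if p == 0.0 or p == 1.0:
        return 0.0  # a deterministic channel carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

assert abs(binary_entropy(0.5) - 1.0) < 1e-12   # maximal at p = 1/2
```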
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Source Coding with Side Information

[Diagram: source emits X → compress → C(X) → decompress → X; side information Y available only at the decompressor]

X, Y: random variables over a finite alphabet with a joint probability distribution PXY
Goal: losslessly compress X with Y known only to the decoder
Source Coding with Side Information

For sufficiently large block length, this can be done at rates arbitrarily close to H(X|Y) [SlepianWolf73]
• Non-constructive theorem
• Practical coding schemes use constructions based on good linear error-correcting codes, e.g., LDPC codes [RichardsonUrbanke08]
Linear Error-Correcting Codes

Error-correcting codes:
• Communication is over a noisy channel
• Add redundancy to the source to correct errors

A linear code of length m and dimension r is an r-dimensional linear subspace of the vector space (F2)^m
• Encoding: using a generator matrix
• Decoding: using a parity-check matrix
Linear Error-Correcting Codes

Minimum distance:
• The weight of the lowest-weight nonzero codeword

To correct i errors, the minimum distance should be at least 2i+1
Linear Error-Correcting Codes

Cosets:
Suppose that C is an [m, r] linear code over F2 and that a is any vector in (F2)^m
• Then the set a+C = {a+x | x∈C} is called a coset of C
• Every vector of (F2)^m is in some coset of C
• Every coset contains exactly 2^r vectors
• Two cosets are either disjoint or equal
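The coset facts above can be checked exhaustively for a toy code; here C is the [3,1] repetition code, an illustrative choice:

```python
from itertools import product

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

C = {(0, 0, 0), (1, 1, 1)}                 # [3,1] repetition code, r = 1
space = list(product([0, 1], repeat=3))    # all of (F2)^3

# a + C = {a + x : x in C}; collect the distinct cosets
cosets = {frozenset(xor(a, x) for x in C) for a in space}

assert len(cosets) == 2 ** (3 - 1)             # 2^(m-r) cosets
assert all(len(s) == 2 ** 1 for s in cosets)   # each contains 2^r vectors
assert set().union(*cosets) == set(space)      # they partition the space
```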
Source Coding with Side Information

Example: assume Y is known to the encoder and decoder, and Ham(X,Y) ≤ 1

[Diagram: source emits X → compress → C(X) → decompress → X; side information Y]
Source Coding with Side Information

Let X = 010; then Y ∈ {010, 011, 000, 110}
Goal: encode X⊕Y using fewer than 3 bits
How? Let e = X⊕Y; then e ∈ {000, 001, 010, 100}; the encoder sends the index of the coset in which e occurs
Source Coding with Side Information

Let C = {000, 111} be a linear code with distance 3 that can fix one error
The space is partitioned into 4 cosets:
• Coset 1 = {000, 111}
• Coset 2 = {001, 110}
• Coset 3 = {010, 101}
• Coset 4 = {100, 011}

Recall: e ∈ {000, 001, 010, 100}
Each coset index requires 2 bits
Decoding: output Y⊕e′, where e′ is the coset leader (000, 001, 010, or 100)
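The toy scheme above, where the encoder knows Y, can be written out directly; the coset table and leaders are the ones on the slide:

```python
COSETS = [{(0,0,0), (1,1,1)}, {(0,0,1), (1,1,0)},
          {(0,1,0), (1,0,1)}, {(1,0,0), (0,1,1)}]
LEADERS = [(0,0,0), (0,0,1), (0,1,0), (1,0,0)]   # lowest-weight member of each coset

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def encode(x, y):
    """Encoder knows both X and Y: send the 2-bit coset index of e = X xor Y."""
    e = xor(x, y)
    return next(i for i, s in enumerate(COSETS) if e in s)

def decode(idx, y):
    """Decoder outputs Y xor e', where e' is the leader of the indicated coset."""
    return xor(y, LEADERS[idx])

X = (0, 1, 0)
for Y in [(0,1,0), (0,1,1), (0,0,0), (1,1,0)]:   # all Y with Ham(X,Y) <= 1
    assert decode(encode(X, Y), Y) == X
```

Three bits of X are conveyed with a 2-bit coset index, exactly as claimed.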
Source Coding with Side Information

[Diagram: source emits X → compress → C(X) → decompress → X; side information Y available only at the decompressor]

Without Y the encoder cannot compute e!
• e = X⊕Y
Source Coding with Side Information

Still possible:
• Encode the coset in which X occurs
• Coset 1 = {000, 111}
• Coset 2 = {001, 110}
• Coset 3 = {010, 101}
• Coset 4 = {100, 011}

Each coset index requires 2 bits
Decoding: output the vector in the indicated coset whose Hamming distance to Y is smallest
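When Y is available only at the decoder, the encoder sends the coset of X itself and the decoder picks the coset member closest to Y; a sketch using the same toy code:

```python
COSETS = [{(0,0,0), (1,1,1)}, {(0,0,1), (1,1,0)},
          {(0,1,0), (1,0,1)}, {(1,0,0), (0,1,1)}]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def encode(x):
    """Send only the 2-bit index of the coset containing X; Y is not needed."""
    return next(i for i, s in enumerate(COSETS) if x in s)

def decode(idx, y):
    """Output the vector in the indicated coset closest to Y in Hamming distance."""
    return min(COSETS[idx], key=lambda v: hamming(v, y))

X = (0, 1, 0)
for Y in [(0,1,0), (0,1,1), (0,0,0), (1,1,0)]:   # Ham(X,Y) <= 1
    assert decode(encode(X), Y) == X
```

Since the code has distance 3, the two members of any coset are at distance 3 from each other, so whenever Ham(X,Y) ≤ 1 only X itself can be the closest member.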
Slepian-Wolf codes over finite block lengths have nonzero error, which implies that the decoder will sometimes fail
Source Coding with Side Information

In practice:
1. Fix p and determine the compression rate of a Slepian-Wolf code that satisfies the target error
2. Pick a Slepian-Wolf code and determine the maximum p for which the target error is satisfied

Need to know the source statistics!
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Compressing Stream Ciphers

This problem can be formulated as a Slepian-Wolf coding problem [JohnsonWagnerRamchandran04]

[Diagram: source emits X → encrypt with key k → Ek(X) → compress → C(Ek(X))]

The ciphertext is cast as the source
The shared key k is cast as the decoder-only side information
Compressing Stream Ciphers

• Compression is achievable due to the correlation between the key K and the ciphertext C = X⊕K
• The joint distribution of the source and side information can be determined from the statistics of the source

[Diagram: source emits X → encrypt with key k → Ek(X) → compress → C(Ek(X))]
Compressing Stream Ciphers

[Diagram: C(Ek(X)) → joint decryption and decompression (the decoder), with key k → X]

The decoder knows k and the source statistics
Compression rate H(Ek(X)|K) = H(X⊕K|K) = H(X) is asymptotically achievable
Efficiency

Encoding: finding the coset of Ek(X) can be done by multiplying Ek(X) with the parity-check matrix
• I.e., Ek(X)∙H^T is the syndrome of Ek(X)

Decoding: exhaustive search through the coset of Ek(X)
• Improved using LDPC codes, where decoding is polynomial in the block length
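Syndrome computation is a single matrix-vector product over GF(2); here with the parity-check matrix of the [3,1] repetition code, again an illustrative choice:

```python
# Parity-check matrix of the [3,1] repetition code {000, 111}
H = [(1, 1, 0),
     (0, 1, 1)]

def syndrome(x, H):
    """x * H^T over GF(2): one inner product mod 2 per parity-check row."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

# Codewords have syndrome 0; vectors in the same coset share a syndrome
assert syndrome((0, 0, 0), H) == (0, 0)
assert syndrome((1, 1, 1), H) == (0, 0)
assert syndrome((0, 0, 1), H) == syndrome((1, 1, 0), H)
assert syndrome((0, 1, 0), H) != syndrome((1, 0, 0), H)
```

The syndrome is precisely the coset index sent by the compressor, which is why encoding needs no knowledge of the key.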
Security

Compression that operates on top of a one-time pad does not compromise the security of the encryption scheme
• The compressor does not know K
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Compressing Block Ciphers

Widely used in practice
The correlation between the key and the ciphertext is more complex
• The previous approach is not directly applicable

Can data encrypted with block ciphers be compressed without access to the key?
Electronic Code Book (ECB) Mode

The simplest mode of operation, where each block is encrypted separately
Compression in this mode is theoretically possible; is it also practical?

[Diagram: X1, X2, …, Xn each fed to the block cipher with key k, producing Ek(X1), Ek(X2), …, Ek(Xn)]

The compression schemes that we present rely on the specifics of chaining operations
Cipher Block Chaining (CBC) Mode

[Diagram: X1 is XORed with the IV and encrypted to Ek(X1); each subsequent Xi+1 is XORed with Ek(Xi) before encryption, producing Ek(X2), …, Ek(Xn)]

The correlation between Ek(Xi) and Xi+1 is easier to characterize and can be exploited for compression
Compressing Block Ciphers

[Diagram: compressor input IV, Ek(X1) … Ek(Xn); output C(IV), C(Ek(X1)) … C(Ek(Xn−1)), Ek(Xn)]

The last block is left uncompressed, while the IV is compressed
Recall that the input to block i+1 is Ek(Xi)⊕Xi+1
Ek(Xi) is cast as the source and Xi+1 is cast as the side information
Decoding

[Diagram: starting from the uncompressed Ek(Xn), decrypt with K to obtain Xn; use Xn as side information in the Slepian-Wolf decoder to recover Ek(Xn−1) from C(Ek(Xn−1)); decrypt to obtain Xn−1; continue backwards through the chain]
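The chaining identity the decoder relies on can be checked with a toy cipher; the key-seeded permutation below is only a stand-in for a real block cipher, and the 8-bit block size and sample values are illustrative:

```python
import random

def make_cipher(key):
    """Toy stand-in for a block cipher: a key-seeded permutation of 8-bit blocks."""
    rng = random.Random(key)
    perm = list(range(256))
    rng.shuffle(perm)
    inv = [0] * 256
    for i, v in enumerate(perm):
        inv[v] = i
    return (lambda x: perm[x]), (lambda y: inv[y])

def cbc_encrypt(blocks, enc, iv):
    out, prev = [], iv
    for x in blocks:
        prev = enc(x ^ prev)      # chaining: XOR with the previous ciphertext block
        out.append(prev)
    return out

enc, dec = make_cipher(key=42)
X = [3, 14, 15, 92]
C = cbc_encrypt(X, enc, iv=7)

# The correlation the decoder exploits: dec(C[i+1]) = C[i] xor X[i+1],
# so each recovered plaintext block reveals the previous ciphertext block.
for i in range(len(X) - 1):
    assert dec(C[i + 1]) == C[i] ^ X[i + 1]
```

This is why decoding runs backwards: knowing the last ciphertext block and the key yields side information for the block before it, and so on down the chain.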
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Compression Factor

Let (Cm,R, Dm,R) denote an order-m Slepian-Wolf code with compression rate R
• Compressor Cm,R: {0,1}^m → {0,1}^mR
• Decompressor Dm,R: {0,1}^mR × {0,1}^m → {0,1}^m

Compression factor: (n+1)∙m / (n∙m∙R + m) ≈ 1/R
Compression Results

Irregular LDPC codes were used in our performance evaluation

Table: Attainable compression rates for m = 128 bits

Source Entropy | Compression Rate | Target Error | p
0.1739         | 0.50             | 10^-3        | 0.026
0.1301         | 0.50             | 10^-4        | 0.018
0.3584         | 0.75             | 10^-3        | 0.068
0.3032         | 0.75             | 10^-4        | 0.054
Compression Results

Irregular LDPC codes were used in our performance evaluation

Table: Attainable compression rates for m = 1024 bits

Source Entropy | Compression Rate | Target Error | p
0.3195         | 0.50             | 10^-3        | 0.058
0.2778         | 0.50             | 10^-4        | 0.048
0.5710         | 0.75             | 10^-3        | 0.134
0.5464         | 0.75             | 10^-4        | 0.126
Outline
• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result
Recall – ECB Mode

[Diagram: m1, m2, …, mn each fed to the block cipher with key K, producing Ek(m1), Ek(m2), …, Ek(mn)]
Notable Observations

Exhaustive strategies are infeasible in most cases
• Except for very low-entropy plaintext distributions or compression ratios
• By truncating the ciphertext

For example, consider a plaintext distribution consisting of 1,000 uniformly distributed 128-bit values
• One can compress the output of a 128-bit block cipher by truncating the 128-bit ciphertext to 40 bits
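The truncation strategy above can be sketched for a small known plaintext set; `toy_enc` (a SHA-256-based stand-in), the key, and the candidate set are all illustrative assumptions, not the paper's construction:

```python
import hashlib

def toy_enc(key: bytes, block: bytes) -> bytes:
    """Stand-in for a 128-bit block cipher (a real scheme would use, e.g., AES)."""
    return hashlib.sha256(key + block).digest()[:16]

KNOWN = [i.to_bytes(16, "big") for i in range(1000)]  # 1,000 possible plaintexts
T = 5                                                  # keep 40 bits = 5 bytes

def compress(ciphertext: bytes) -> bytes:
    """The compressor needs no key: it simply truncates the ciphertext."""
    return ciphertext[:T]

def decompress(tag: bytes, key: bytes) -> bytes:
    """The key holder re-encrypts every candidate and matches the truncation."""
    for x in KNOWN:
        if toy_enc(key, x)[:T] == tag:
            return x

key = b"k" * 16
x = KNOWN[123]
assert decompress(compress(toy_enc(key, x)), key) == x
```

Recovery succeeds with overwhelming probability because 1,000 candidates rarely collide in a 2^40 space, but the exhaustive re-encryption at the decoder is exactly what makes this strategy infeasible for realistic plaintext distributions.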
Can we construct a better strategy?
Impossibility Result

There does not exist a generic compression scheme (C,D) for block ciphers, unless (C,D) is
• Either exhaustive, or
• Computationally infeasible

There does not exist an efficient (C,D) for ECB mode!
The Public-Key Setting

Hybrid encryption
• Use a public-key scheme to encrypt a symmetric key, then encrypt the data with that key

El Gamal encryption
• A similar technique works when using XOR
Concluding Remarks

Data encrypted with block ciphers is practically compressible when chaining modes are employed
Notable compression factors were demonstrated with binary memoryless sources
Short block sizes limit the performance, but that could change in the future
Generic compression is impossible
Future Work

An interesting question is whether compression is possible without any prior knowledge of the data
• Can compression be achieved using algorithms that do not rely on the source statistics, i.e., universal algorithms?

The error:
• Can we consider a less limited setting where the error is not independent?
Thank You!