
Modern Trends in Steganography and Steganalysis

Jessica Fridrich

State University of New York

Department of Electrical and Computer Engineering

Fundamental questions

• How to hide information securely?
• What does it mean to embed securely?
• How much can I embed undetectably?
• How should I embed?
• What are the most relevant open problems?
• What are the main achievements?

[Chart: number of IEEE publications per year on steganography and steganalysis, broken down into journal and conference papers.]

Number of software stego applications

Data courtesy of Chet Hosmer, Wetstone Tech

Data hiding applications by media

Number of software applications that can hide data in electronic media as of June 2008. Data courtesy of Neil Johnson

Fundamental questions

• How to hide information securely?
• What does it mean to embed securely?
• How much can I embed undetectably?
• How should I embed?
• What are the most relevant open problems?
• What are the main achievements?

Answers to these questions depend on whom you ask!

Everything depends on the source

Artificial sources (iid sequences, Markov chains)
• Steganographic capacity known (positive rate)
• Capacity-approaching embedding using QIM
• Optimal detectors are LRTs

Empirical sources (digital media)
• Model unknown
• Capacity unknown
• Best embedding algorithms unknown
• Optimal detectors unknown

(artificial sources are the abstraction; empirical sources are the reality)

Steganographic channel

Cover source: random variable x on X
Stego source: random variable y on X
Message source: random variable m on M
Key source: random variable k on K
Channel: error-free/noisy (passive/active Warden)

[Diagram: the cover x with distribution px and the stego object y with distribution py are inspected by the Warden.]

Measure of security: the Kullback–Leibler divergence DKL(px ‖ py) (Cachin 1998)
Perfect security: DKL(px ‖ py) = 0
ε-security: DKL(px ‖ py) ≤ ε

• The KL divergence is the error exponent of the optimal Neyman–Pearson detector: the probability of missed detection at a fixed PFA decays as exp(−n DKL), where n is the number of pixels
• Other Ali–Silvey distances could be used

Note that x can be
- pixels (DCT coefficients)
- groups of pixels
- features = low-dimensional image representations
- entire images
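As a toy numerical illustration of the security measure, the sketch below computes DKL between two hypothetical cover and stego histograms (all numbers are invented for illustration):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats between two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical cover/stego histograms over four pixel-value bins.
p_cover = [0.40, 0.30, 0.20, 0.10]
p_stego = [0.38, 0.31, 0.21, 0.10]

dkl = kl_divergence(p_cover, p_stego)
# Perfect security would give exactly 0; a small positive value means weak detectability.
print(round(dkl, 6))
```

In practice the histograms would be replaced by distributions of features extracted from many images, but the principle is the same.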

Steganographic security

• Given full knowledge of the cover source px , perfectly secure stegosystems exist (Anderson & Petitcolas, 1997)

• Can be realized using the principle of cover synthesis by utilizing a perfect compressor of the cover source

• The sender communicates on average H(x) bits per object sent, the entropy of the cover source

Message source → encrypted message m (a stream of random bits) → source decompressor D → stego object y = D(m)

The stego object y is synthesized directly; there is no cover x on the sender’s input.

Perfect security

• First practical realization for JPEG images, Model-based steganography (Sallee, IWDW 2003)

Steganographic capacity is the maximum rate over all perfectly secure stegosystems

• Measured in bits per cover element (rate)

• Property of the cover+key+message sources and the channel, not of a specific embedding scheme!

• It is positive – the size of secure payload increases linearly with the cover size.

Steganographic capacity

The most general result (Moulin & Wang, 2004, 2008), for an active warden and a distortion-limited embedder:
Distortion per cover element: Em,k,x[d(x,y)] ≤ Ds
Channel distortion (y → y′): Ex[d(y,y′)] ≤ Dw

Steganographic constraint: px = py
Sender and Warden know the channel: px, pm, pk, py′|y
The capacity C(Ds, Dw) can be derived when the size of covers n → ∞. Explicit formulas are available for some simple cover sources:
- Bernoulli sequence with d(.,.) = Hamming distance (Moulin & Wang, 2004)
- Gaussian source with d(.,.) = squared distance (Gonzales & Comesaña, 2007)

Steganographic capacity

• For empirical sources, px is unknown. Alice cannot preserve it, and the Warden cannot build an LRT.
• If Alice preserves a simplified model, the Warden will work outside of the model and detect embedding.
• History teaches us that no matter how hard Alice tries, the Warden will be smarter and will find a representation of the covers where detection is possible (DKL > 0).
• Alice knows that her scheme is not secure. How big a payload (the secure payload) can Alice send with a fixed* risk?
*Fixed risk = fixed KL divergence.

Capacity for empirical sources?

Square root law of imperfect steganography (Ker 2007 for iid sources; Filler 2009 for Markov sources)

• The secure payload is a property of the embedding scheme! This is why we do not use the term “capacity”

• Secure payload grows only with the square root of the cover size

• Model mismatch does not merely decrease the communication rate: it makes the rate drop to zero! (already suspected by Anderson & Petitcolas 1997). This is fundamentally very different from other situations in communication

• Experimentally verified on images for a wide spectrum of steganographic methods

The square root law of secure payload

1. The cover source is a first-order Markov chain with transition probability matrix A = (aij), aij = Pr(xn+1 = j | xn = i)

- simplest tractable model that captures dependencies among pixels

2. Embedding is a probabilistic mapping applied to each pixel independently: Pr(y = k | x = l) = bkl

- captures the essence of most practical stegosystems

3. The steganographic system is not perfect (DKL > 0)
- we already know that perfectly secure stegosystems have a positive rate (the secure payload increases linearly); that result holds only for artificial covers

The square root law assumptions

[Diagram: the cover sequence i, j, k, l is a Markov chain (MC) with transition probabilities aij, ajk, akl; embedding changes each element independently with probabilities bii′, bjj′, bkk′, bll′, so the stego sequence i′, j′, k′, l′ is a hidden Markov chain (HMC).]

As n → ∞:

Conservative payload: M/√n → 0. The stegosystem is asymptotically perfectly secure (DKL → 0).

Unsafe payload: M/√n → ∞. An arbitrarily accurate detector can be built.

Borderline payload: M/√n → const. The stegosystem is ε-secure (DKL ≤ ε).

n … cover size (e.g., number of pixels); M … payload size (message length in bits)

The square root law theorem

n … number of cover elements
px … cover distribution
py … stego distribution with a fraction β of changed pixels
I(0) … Fisher information w.r.t. β at β = 0, I(0) = Ex[(∂ log py(β)/∂β)²]

DKL(px ‖ py) = ½ nβ² I(0) + O(β³) ≤ const.

Fixed risk ⇒ nβ² = const. ⇒ payload M ∝ nβ ∝ √n

Where does the SRL come from?
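The scaling argument can be checked numerically. This sketch assumes a hypothetical Fisher information I(0) and risk budget, and shows that holding DKL ≈ ½ nβ² I(0) fixed forces the payload M = nβ to grow only as √n (the root rate stays constant):

```python
import math

I0 = 2.0      # hypothetical Fisher information I(0) at beta = 0
risk = 0.01   # fixed KL-divergence budget

for n in (10_000, 100_000, 1_000_000):
    # DKL ~ 0.5 * n * beta**2 * I0 = risk pins down the change rate beta:
    beta = math.sqrt(2 * risk / (n * I0))
    M = n * beta                                       # payload ~ n * beta
    print(n, round(M, 1), round(M / math.sqrt(n), 3))  # root rate M/sqrt(n) is constant
```

A hundred-times bigger cover thus carries only a ten-times bigger secure payload under this model.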

Experimental verification

[Plot: secure payload M (bits) as a function of the cover size n]

• A ten-times bigger image can hold only about a three-times bigger payload at the same level of statistical detectability
- Alice and Bob, be conservative with your payload!

• It is easier to detect the same relative payload in larger images!
- be careful when comparing steganalysis results across different databases
- need to fix M/√n – the root rate

• Use the root rate for comparing/benchmarking/designing stegosystems

What does this mean for practitioners?

Options available:

• Force embedding to resist a given attack - necessary but not sufficient, usually rather weak

• Adopt a model (or its sampled version) and preserve it - unless the model is sufficiently complex, usually easy to detect

• Mimic natural processing / noise - very hard to do right (Franz, 2008)

• Minimize embedding distortion
- the best steganographic algorithms for digital images known today work precisely in this manner
- no need to estimate high-dimensional distributions
- can work with very high-dimensional models
- well developed; an entire framework is in place

So, how shall we embed in practice?

Historical development

d(.,.) = number of embedding changes
- matrix embedding (Crandall 1998, Westfeld 2001, …)

d(.,.) = L1 or L2 distortion
- MMEx (Kim & Duric & Richards 2006), Sachnev & Kim 2009, perturbed quantization (2004), …

general d(.,.)
- syndrome-trellis codes (Filler 2009–11, Pevný HUGO 2010)

Strategy
a) fix the embedding operation
b) define the embedding distortion function d(x,y)
c) embed by minimizing d(x,y) for a given payload and cover x
d) search for the d(.,.) that minimizes detectability in practice

Steganography minimizing distortion

• During embedding, cover x is slightly modified to stego object y that carries the desired payload

• For each x, the sender specifies a set Y of plausible stego images y ∈ Y

• LSB embedding: y ∈ {x, LSBflip(x)}
• ±k embedding: y ∈ {x, x+1, x−1, …, x+k, x−k}
• When x is not to be modified, y ∈ {x}

• Distortion measure d(x,y) is an arbitrary bounded function of x and y

• The embedding algorithm selects each y ∈ Y with probability π(y)

Expected payload: H(π)

Expected distortion: Eπ[d] = Σy π(y) d(x,y)

Formalization

Payload-limited sender

Minimize the expected distortion Eπ[d] = Σy π(y) d(x,y)
subject to the payload constraint H(π) ≥ M

Distortion-limited sender

Maximize the expected payload H(π)
subject to the distortion bound Eπ[d] = Σy π(y) d(x,y) ≤ D

The optimization is over all distributions π over Y.

It is π and Y that encapsulate the entire embedding algorithm.

They also depend on the cover x!

Searching for the best π

The optimal distribution (algorithm) for both senders is the Gibbs distribution

πλ(y) = (1/Z(λ)) exp(−λ d(x,y))

λ > 0 is a scalar parameter determined from the appropriate constraint (the “inverse temperature”)

Z(λ) = Σy exp(−λ d(x,y)) is the normalization factor (partition function)

We can import many useful algorithms from statistical physics into steganography

Gibbs distribution
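A minimal sketch of the payload-limited sender on a toy candidate set Y: compute the Gibbs distribution for a given λ, then find by bisection the λ that meets a payload constraint H(π) = M. The costs are hypothetical, chosen only to make the example concrete.

```python
import math

def gibbs(costs, lam):
    """pi_lambda(y) proportional to exp(-lambda * d(x, y)) over a small set Y."""
    w = [math.exp(-lam * d) for d in costs]
    Z = sum(w)                     # partition function Z(lambda)
    return [wi / Z for wi in w]

def entropy_bits(pi):
    return -sum(p * math.log2(p) for p in pi if p > 0)

def solve_lambda(costs, payload_bits, lo=1e-6, hi=50.0):
    """Bisection on lambda: H(pi_lambda) decreases as lambda grows."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if entropy_bits(gibbs(costs, mid)) > payload_bits:
            lo = mid               # entropy too high -> raise lambda
        else:
            hi = mid
    return (lo + hi) / 2

costs = [0.0, 1.0, 1.0, 2.0]       # hypothetical d(x, y) for four candidates in Y
lam = solve_lambda(costs, payload_bits=1.5)
pi = gibbs(costs, lam)
print(round(entropy_bits(pi), 3))  # payload constraint H(pi) = 1.5 bits is met
```

Real schemes never enumerate Y explicitly; this brute-force version only illustrates the distribution being targeted.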

• By sampling from πλ(y), one can simulate the impact of theoretically optimal embedding!

• Gibbs sampler, Markov chain Monte Carlo methods • Detectability can be tested before implementing the scheme

• Near-optimal practical embedding schemes can be constructed using syndrome-trellis codes
- when d(.,.) is a sum of locally-supported potential functions
- includes the most common case when d is additive (more on the next slide)

• We can compute the rate–distortion bound: the relationship between the expected distortion E[d] and the payload H(π)
- the bound tells us how close to optimum a given implementation is
- thermodynamic integration + Gibbs sampler

Separation principle

d(x,y) = Σk ρk(x, yk)

• ρk(.,.) are arbitrary bounded functions
• Embedding changes do not interact
• Sampling is easy, as each pixel can be changed independently of the other pixels (no need for MCMC samplers)
• Embedding with minimal d is source coding with a fidelity constraint (Shannon 1954)
• Near-optimal quantizers exist based on the Viterbi algorithm: syndrome-trellis codes (Filler 2009)

Examples
• ρk(x, yk) = 1 iff xk ≠ yk: d is the number of embedding changes
• ρk(x, yk) ∈ {1, ∞}: wet paper codes
• ρk(x, yk) = ρk for xk ≠ yk: changes have different weights
• yk ∈ M, |M| = m: m-ary embedding

Additive distortion
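For an additive distortion with binary changes, the Gibbs distribution factorizes into independent per-pixel flip probabilities, so the payload and the expected distortion are simple sums. A sketch with hypothetical per-pixel costs:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def flip_probs(rho, lam):
    """For additive d(x, y) = sum_k rho_k * [x_k != y_k], the optimal pi factorizes:
    pixel k is flipped independently with probability exp(-lam*rho_k)/(1+exp(-lam*rho_k))."""
    return [math.exp(-lam * r) / (1 + math.exp(-lam * r)) for r in rho]

rho = [1.0, 1.0, 4.0, 10.0]   # hypothetical per-pixel costs (large = risky to change)
lam = 1.0
p = flip_probs(rho, lam)

payload = sum(h2(pk) for pk in p)                    # H(pi) in bits
distortion = sum(pk * rk for pk, rk in zip(p, rho))  # E[d]
print(round(payload, 3), round(distortion, 3))
```

Note how the pixel with cost 10 is almost never changed: this is exactly the content-adaptive behavior the distortion function is designed to induce.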

• Represent the image using a feature vector f ∈ Rk
• f is an image model
• f can be high-dimensional, e.g., 10^6 or higher
• Define the distortion

d(x,y) = ‖f(x) − f(y)‖

• Minimizing d(.,.) ⇒ model preservation

Embedding while preserving model

HUGO (Highly Undetectable steGO) by Pevný, Filler, Bas (IH 2010, Calgary)
- uses a model of dimensionality 10^7
- steganalyzers of 2010 were unable to detect payloads ≤ 0.4 bpp

• Message hidden while minimizing an additive distortion function designed to correlate with statistical detectability

• Incorporates syndrome-trellis codes for near-optimal content-adaptive embedding

[Images: a cover image and the locations of HUGO’s actual embedding changes]

HUGO

• Cast as a supervised pattern-recognition problem (Avcibas, Memon 2001; Farid 2002)

• Represent images using features (dimensionality reduction)
- features sensitive to embedding but not to content
- often computed from the noise component
- sampled joint or conditional distributions (Shi & Chen & Chen 2006–2008)
- designed to capture dependencies among neighboring pixels

• Train a classifier on examples of cover/stego images - support vector machines, neural networks, ensemble classifiers - binary, one-class, or multi-class - regressors can estimate change rate

Modern steganalysis

• Rich cover model - union of many smaller diverse submodels - all dimensions well populated - diversity more important than dimensionality

• Scalable machine learning - ensemble classifier - fast w.r.t. dimensionality as well as training set size - allows using rich cover models

Steganalysis using rich models

• Define pixel predictors Pred(k)(x), k = 1, …, p, where Pred(k)(xij) is the value of xij predicted from a local neighborhood not containing xij

• Compute the kth noise residual: R(k) = Pred(k)(x) − x
• Quantize: R(k) ← round(R(k)/q), q = quantization step
• Truncate: if |R(k)| > T, set R(k) = T·sign(R(k))
• Form co-occurrence matrices of neighboring residual samples

Example horizontal predictors:
- horizontally constant model: Pred(xij) = xi,j+1
- horizontally linear model: Pred(xij) = (xi,j−1 + xi,j+1)/2
- horizontally quadratic model: Pred(xij) = (xi,j−1 + 3xi,j+1 − xi,j+2)/3

Other options include vertical models, predictors utilizing local 3×3 or 5×5 neighborhoods or their parts, non-linear operations min and max, lookup tables of conditional probabilities learned from cover-source images, etc.

Assembling the rich model (spatial domain)
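The residual pipeline can be sketched on a toy image: predict each pixel from its right neighbor (the horizontally constant model), quantize and truncate the residual, then collect a co-occurrence histogram of horizontally adjacent residuals. All pixel values here are illustrative.

```python
from collections import Counter

T, q = 2, 1.0   # truncation threshold and quantization step

x = [
    [12, 13, 13, 15],
    [12, 12, 14, 19],
    [11, 12, 13, 14],
]

def residual_row(row):
    # horizontally constant predictor: Pred(x_ij) = x_{i,j+1},
    # so the residual is R = Pred(x) - x = x_{i,j+1} - x_{i,j}
    r = [row[j + 1] - row[j] for j in range(len(row) - 1)]
    # quantize, then truncate into [-T, T]
    return [max(-T, min(T, round(v / q))) for v in r]

R = [residual_row(row) for row in x]

# second-order horizontal co-occurrence of neighboring residual samples
cooc = Counter((r[j], r[j + 1]) for r in R for j in range(len(r) - 1))
print(sorted(cooc.items()))
```

A real submodel would use higher-order co-occurrences over millions of pixels; truncation keeps the histogram support small so all bins stay well populated.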

Image with many edges

Edge close up

Adaptive schemes are likely to embed here even though the content is modelable in the vertical direction. With a simple predictor, the pixel differences will mostly fall into the marginal (truncation) bins; linear or quadratic models bring the residual back inside the co-occurrence matrix.

Complex models better adapt to content

[Diagram: a feature space of dimension N is split into random subspaces of dimension k << N; each subspace feeds one of the base learners B1, …, BL, and their decisions are combined by classifier fusion.]

Ensemble classifiers

• Ensemble classifier built as a random forest
• Decision obtained by fusing the decisions of L simple, diverse, and unstable base learners B1, …, BL (majority voting)
• Base learners = Fisher linear discriminants (analytic formula)
• FLDs trained on bootstrap samples of the training set (bagging) to increase diversity
• The parameters L (number of base learners) and k can be determined by a search using out-of-bag (OOB) error estimates
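An illustrative sketch of this construction on synthetic data (dimensions far smaller than real rich models, and all numbers hypothetical): Fisher linear discriminants trained on random subspaces and bootstrap samples, fused by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, k = 40, 11, 8          # feature dim, number of base learners, subspace dim

# Synthetic "cover" (label 0) and "stego" (label 1) feature vectors.
X0 = rng.normal(0.0, 1.0, size=(200, N))
X1 = rng.normal(0.4, 1.0, size=(200, N))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(200), np.ones(200)]

learners = []
for _ in range(L):
    idx = rng.choice(N, size=k, replace=False)            # random subspace
    boot = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample (bagging)
    Xb, yb = X[boot][:, idx], y[boot]
    m0, m1 = Xb[yb == 0].mean(0), Xb[yb == 1].mean(0)
    Sw = np.cov(Xb[yb == 0].T) + np.cov(Xb[yb == 1].T)    # within-class scatter
    w = np.linalg.solve(Sw + 1e-6 * np.eye(k), m1 - m0)   # FLD direction (analytic)
    b = w @ (m0 + m1) / 2                                 # threshold at class midpoint
    learners.append((idx, w, b))

def predict(Xq):
    votes = np.array([(Xq[:, idx] @ w > b).astype(int) for idx, w, b in learners])
    return (votes.mean(0) > 0.5).astype(int)              # majority vote

acc = (predict(X) == y).mean()
print(acc > 0.5)
```

In the real ensemble, L and k would be tuned with out-of-bag error estimates rather than fixed by hand, and accuracy would be measured on held-out images rather than the training set.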

Performance
- the ensemble outperforms linear SVMs
- comparable to Gaussian SVMs

Complexity as a function of the training set size Ntrn
Algorithm: nsF5; feature set: 548-dimensional CC-PEV

Complexity comparison with SVMs

Example I: Detecting HUGO

[Plot: detection performance vs. the number of pixel predictors. Feature sets compared (dimensionality): CDF set (1234), rich model with 2 predictors (330), 6 predictors (985), 20 predictors (3286), 78 predictors (12753).]

Image source: 9074 BOSSbase images, grayscale 512×512

Example II: Detecting ±1 embedding

[Plot: detection performance vs. the number of pixel predictors. Feature sets compared (dimensionality): CDF set (1234), rich model with 2 predictors (330), 6 predictors (985), 20 predictors (3286), 78 predictors (12753).]

Image source: 9074 BOSSbase images, grayscale 512×512

Steganography in empirical covers
• Reduced to finding a good distortion function
• For example, specify the cost of changes
• Available to you: near-optimal codes, the rate–distortion bound, simulators of optimal embedding

Steganalysis design
• Reduced to finding a sufficiently rich model
• Should be diverse
• High dimensionality is no longer a problem (ensemble classifiers)

Closing remarks

http://dde.binghamton.edu/download/

It is all about the model …

• Can we build a “complete” model of digital images?
- is it possible at all? (empirical sources are incognizable)
- if not, how far are we?

• Who is winning: Alice & Bob or the Warden?
- can we establish fundamental bounds using game theory?
- robust classification is needed for practice

• Applications beyond steganography
- forensics
- source coding

What is next?

“Do not skate to where the puck is. Skate to where the puck is gonna be.” – Wayne Gretzky
