Information and interactive computation
Mark Braverman, Computer Science, Princeton University
January 16, 2012
Transcript
Page 1:

Information and interactive computation

January 16, 2012

Mark Braverman
Computer Science, Princeton University

Page 2:

Prelude: one-way communication

• Basic goal: send a message from Alice to Bob over a channel.

(Diagram: Alice → communication channel → Bob)

Page 3:

One-way communication

1) Encode;
2) Send;
3) Decode.

(Diagram: Alice → communication channel → Bob)

Page 4:

Coding for one-way communication

• There are two main problems a good encoding needs to address:
  – Efficiency: use the least amount of the channel/storage necessary.
  – Error correction: recover from (reasonable) errors.

Page 5:

Interactive computation

Today’s theme: extending information and coding theory to interactive computation.

I will talk about interactive information theory, and Anup Rao will talk about interactive error correction.

Page 6:

Efficient encoding

• Can measure the cost of storing a random variable X very precisely.
• Entropy: H(X) = ∑_x Pr[X=x]·log(1/Pr[X=x]).
• H(X) measures the average amount of information a sample from X reveals.
• A uniformly random string of 1,000 bits has 1,000 bits of entropy.
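A minimal Python sketch of the entropy formula (the distributions below are made-up examples, not from the talk):

```python
import math

def entropy(dist):
    """Shannon entropy H(X) = sum_x Pr[X=x] * log2(1/Pr[X=x]), in bits."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# A fair coin carries 1 bit; a biased coin carries less.
print(entropy({"heads": 0.5, "tails": 0.5}))   # 1.0
print(entropy({"heads": 0.9, "tails": 0.1}))   # ~0.469

# 1,000 independent uniform bits: entropies add up to 1,000 bits.
print(1000 * entropy({0: 0.5, 1: 0.5}))        # 1000.0
```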

Page 7:

Efficient encoding

• H(X) = ∑_x Pr[X=x]·log(1/Pr[X=x]).
• The ZIP algorithm works because H(X = typical 1MB file) < 8 Mbits.
• Pr[“Hello, my name is Bob”] >> Pr[“h)2cjCv9]dsnC1=Ns{da3”].
• For one-way encoding, Shannon’s source coding theorem states that

  Communication ≈ Information.
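An illustration of the same point (not from the slides): zlib compresses predictable, low-entropy text far below its raw length, while near-maximum-entropy random bytes barely compress at all.

```python
import os
import zlib

text = b"Hello, my name is Bob. " * 1000    # highly predictable, low entropy
rand = os.urandom(len(text))                # close to maximum entropy

print(len(text), len(zlib.compress(text)))  # raw size vs. a small fraction of it
print(len(rand), len(zlib.compress(rand)))  # raw size vs. roughly the same (or slightly more)
```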

Page 8:

Efficient encoding

• The problem of sending many samples of X can be implemented in H(X) communication per sample on average.
• The problem of sending a single sample of X can be implemented in < H(X)+1 communication in expectation.
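Not from the slides, but one standard way to meet the single-sample bound is Huffman coding, whose expected codeword length L always satisfies H(X) ≤ L < H(X)+1. A minimal sketch with a made-up distribution:

```python
import heapq
import math

def huffman_code(dist):
    """Build a Huffman code for {symbol: probability}; expected length < H(X) + 1."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    code = {sym: "" for sym in dist}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)   # two least likely groups
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1:
            code[s] = "0" + code[s]          # extend their codewords by one bit
        for s in syms2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return code

dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(dist)
H = sum(p * math.log2(1 / p) for p in dist.values())
L = sum(p * len(code[s]) for s, p in dist.items())
print(code, H, L)   # for this dyadic distribution H = L = 1.75 bits
```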

Page 9:

Communication complexity [Yao]

• Focus on the two-party setting.
• Alice (A) holds X, Bob (B) holds Y; A & B implement a functionality F(X,Y), e.g. F(X,Y) = “X=Y?”.

(Diagram: Alice with input X and Bob with input Y jointly compute F(X,Y).)

Page 10:

Communication complexity

Goal: implement a functionality F(X,Y).

A protocol π(X,Y) computing F(X,Y) exchanges messages m1(X), m2(Y,m1), m3(X,m1,m2), …

Communication cost = # of bits exchanged.

(Diagram: Alice with input X and Bob with input Y exchange messages and output F(X,Y).)
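A toy sketch (my own framing, not the talk's notation) of a protocol as alternating message functions m1(X), m2(Y,m1), …, with the communication cost counted in bits:

```python
def run_protocol(x, y, alice_rounds, bob_rounds):
    """Alternate messages m1(X), m2(Y, m1), m3(X, m1, m2), ... and count bits.
    Each round function returns a bit string ('0'/'1' characters)."""
    transcript, bits = [], 0
    for a_round, b_round in zip(alice_rounds, bob_rounds):
        m = a_round(x, transcript)       # Alice speaks: depends on X and past messages
        transcript.append(m); bits += len(m)
        m = b_round(y, transcript)       # Bob speaks: depends on Y and past messages
        transcript.append(m); bits += len(m)
    return transcript, bits

# Toy functionality: Alice sends her single input bit, Bob replies with the XOR.
alice = [lambda x, t: str(x)]
bob   = [lambda y, t: str(int(t[-1]) ^ y)]
print(run_protocol(1, 0, alice, bob))    # (['1', '1'], 2): two bits exchanged
```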

Page 11:

Distributional communication complexity

• The input pair (X,Y) is drawn according to some distribution μ.
• Goal: make a mistake on at most an ε fraction of inputs.
• The communication cost C(F,μ,ε):

  C(F,μ,ε) := min_{π computes F with error ≤ ε} C(π, μ).

Page 12:

Example

• μ is a distribution on pairs of files; F is “X=Y?”.
• Protocol: Alice sends MD5(X) (128 bits); Bob replies with “X=Y?” (1 bit).
• Communication cost = 129 bits. ε ≈ 2^−128.

(Diagram: Alice with file X, Bob with file Y.)
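A hedged sketch of this hash-and-compare protocol (hashlib.md5 is the standard-library call; the file contents are made up for the example):

```python
import hashlib

def alice_message(x: bytes) -> bytes:
    return hashlib.md5(x).digest()                     # Alice's 128-bit message

def bob_answer(y: bytes, alice_hash: bytes) -> bool:
    return hashlib.md5(y).digest() == alice_hash       # Bob's 1-bit reply: "X = Y?"

x = b"the quick brown fox"
y = b"the quick brown fox"
msg = alice_message(x)
print(len(msg) * 8, bob_answer(y, msg))                # 128 bits sent, answer True
# Total cost: 128 + 1 = 129 bits; errors occur only on MD5 collisions.
```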

Page 13:

Randomized communication complexity

• Goal: make a mistake of at most ε on every input.
• The communication cost: R(F,ε).
• Clearly: C(F,μ,ε) ≤ R(F,ε) for all μ.
• What about the converse?
• A minimax(!) argument [Yao]:

  R(F,ε) = max_μ C(F,μ,ε).

Page 14:

A note about the model

• We assume a shared public source of randomness.

(Diagram: Alice with input X and Bob with input Y share public randomness R.)

Page 15:

The communication complexity of EQ(X,Y)

• The communication complexity of equality: R(EQ,ε) ≈ log 1/ε.
• Send log 1/ε random hash functions applied to the inputs. Accept if all of them agree.
• What if ε=0? R(EQ,0) ≈ n, where X,Y ∈ {0,1}^n.
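A minimal sketch of the hashing idea (my own toy implementation, using shared random inner products over GF(2) as the hash functions; any pairwise-independent family would do):

```python
import math
import random

def eq_protocol(x_bits, y_bits, eps, rng):
    """Equality test with error ~eps: about log2(1/eps) shared random
    inner-product hashes, each costing one bit from each party."""
    n = len(x_bits)
    num_hashes = max(1, math.ceil(math.log2(1 / eps)))
    for _ in range(num_hashes):
        r = [rng.randint(0, 1) for _ in range(n)]              # public randomness
        ax = sum(ri & xi for ri, xi in zip(r, x_bits)) % 2     # Alice's hash bit
        ay = sum(ri & yi for ri, yi in zip(r, y_bits)) % 2     # Bob's hash bit
        if ax != ay:
            return False         # certainly X != Y
    return True                  # X = Y, or an unlucky collision (prob <= eps if X != Y)

rng = random.Random(0)
x = [1, 0, 1, 1]
print(eq_protocol(x, x, 0.01, rng))             # True
print(eq_protocol(x, [1, 0, 1, 0], 0.01, rng))  # False, with probability >= 1 - eps
```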

Page 16:

Information in a two-way channel

• H(X) is the “inherent information cost” of sending a message distributed according to X over the channel.
• What is the two-way analogue of H(X)?

(Diagram: Alice sends X to Bob over a communication channel.)

Page 17:

Entropy of interactive computation

• “Inherent information cost” of interactive two-party tasks.

(Diagram: Alice with input X and Bob with input Y share public randomness R.)

Page 18:

One more definition: Mutual Information

• The mutual information of two random variables is the amount of information knowing one reveals about the other:

  I(A;B) = H(A) + H(B) − H(AB).

• If A,B are independent, I(A;B)=0.
• I(A;A)=H(A).

(Diagram: Venn diagram of H(A) and H(B); I(A;B) is the overlap.)
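A small Python sketch computing I(A;B) = H(A)+H(B)−H(AB) from a joint distribution (the joint tables are made-up examples):

```python
import math
from collections import defaultdict

def H(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """joint: {(a, b): probability}.  Returns I(A;B) = H(A) + H(B) - H(AB) in bits."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return H(pa) + H(pb) - H(joint)

# A and B are equal fair bits: knowing one reveals the other, so I(A;B) = H(A) = 1.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))                        # 1.0
# Independent fair bits: I(A;B) = 0.
print(mutual_information({(a, b): 0.25 for a in (0, 1) for b in (0, 1)}))    # 0.0
```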

Page 19:

Information cost of a protocol

• [Chakrabarti-Shi-Wirth-Yao ’01, Bar-Yossef-Jayram-Kumar-Sivakumar ’04, Barak-B-Chen-Rao ’10].
• Caution: different papers use “information cost” to denote different things!
• Today, we have a better understanding of the relationship between those different things.

Page 20:

Information cost of a protocol

• Prior distribution: (X,Y) ~ μ.
• Alice and Bob run protocol π; let π also denote the protocol transcript.

  I(π, μ) = I(π;Y|X) + I(π;X|Y)
          = what Alice learns about Y + what Bob learns about X.

(Diagram: Alice with input X and Bob with input Y produce the transcript π.)
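A self-contained sketch (my own toy representation, not the talk's) computing the internal information cost I(π;Y|X) + I(π;X|Y) directly from the joint distribution of (X, Y, transcript), using I(A;B|C) = H(A,C)+H(B,C)−H(A,B,C)−H(C):

```python
import math
from collections import defaultdict

def H(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    m = defaultdict(float)
    for key, p in joint.items():
        m[tuple(key[i] for i in idx)] += p
    return m

def cond_mi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c index the joint's keys."""
    return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
            - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))

# Keys are (x, y, transcript).  Toy protocol: Alice simply announces x, so transcript = x.
joint = {(x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}
internal = cond_mi(joint, [2], [1], [0]) + cond_mi(joint, [2], [0], [1])
print(internal)   # I(pi;Y|X) + I(pi;X|Y) = 0 + 1 = 1 bit
```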

Page 21:

External information cost

• (X,Y) ~ μ.
• An external observer Charlie sees the transcript π of the protocol.

  Iext(π, μ) = I(π;XY)
             = what Charlie learns about (X,Y).

(Diagram: Alice with input X, Bob with input Y, and an observer Charlie watching the transcript π.)

Page 22:

Another view on I and Iext

• It is always the case that C(π, μ) ≥ Iext(π, μ) ≥ I(π, μ).

• Iext measures the ability of Alice and Bob to compute F(X,Y) in an information theoretically secure way if they are afraid of an eavesdropper.

• I measures the ability of the parties to compute F(X,Y) if they are afraid of each other.

Page 23:

Example

• F is “X=Y?”.
• μ is a distribution where w.p. ½ X=Y and w.p. ½ (X,Y) are random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies with “X=Y?” [1 bit].

  Iext(π, μ) = I(π;XY) = 129 bits — what Charlie learns about (X,Y).

Page 24:

Example

• F is “X=Y?”.
• μ is a distribution where w.p. ½ X=Y and w.p. ½ (X,Y) are random.
• Same protocol: Alice sends MD5(X) [128 bits]; Bob replies with “X=Y?” [1 bit].

  I(π, μ) = I(π;Y|X) + I(π;X|Y)
          = what Alice learns about Y + what Bob learns about X
          ≈ 1 + 64.5 = 65.5 bits.
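Roughly where these numbers come from (my own back-of-the-envelope reading of the slide): Alice learns only the one-bit answer “X=Y?”, about 1 bit under this prior. Bob already knows X when X=Y (probability ½), so in that case the 128-bit hash teaches him essentially nothing beyond the equality event; when (X,Y) are independent (probability ½), the hash reveals about 128 bits about X. Averaging gives roughly ½·128, plus a little for the equality event itself, ≈ 64.5 bits for Bob, hence ≈ 65.5 bits in total — far below the 129-bit external cost.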

Page 25:

The (distributional) information cost of a problem F

• Recall: C(F,μ,ε) := min_{π computes F with error ≤ ε} C(π, μ).
• By analogy:

  I(F,μ,ε) := inf_{π computes F with error ≤ ε} I(π, μ).
  Iext(F,μ,ε) := inf_{π computes F with error ≤ ε} Iext(π, μ).

Page 26:

I(F,μ,ε) vs. C(F,μ,ε): compressing interactive computation

• Source Coding Theorem: the problem of sending a sample of X can be implemented in expected cost < H(X)+1 communication – the information content of X.
• Is the same compression true for interactive protocols?
• Can F be solved in I(F,μ,ε) communication? Or in Iext(F,μ,ε) communication?

Page 27:

The big question

• Can interactive communication be compressed?
• Can π be simulated by π’ such that C(π’, μ) ≈ I(π, μ)?
• Does I(F,μ,ε) ≈ C(F,μ,ε)?

Page 28:

Compression results we know

• Let ε, ρ be constants; let π be a protocol that computes F with error ε.
• π’s costs: C, Iext, I.
• Then π can be simulated using:
  – (I·C)^½·polylog(C) communication; [Barak-B-Chen-Rao ’10]
  – Iext·polylog(C) communication; [Barak-B-Chen-Rao ’10]
  – 2^O(I) communication; [B ’11]
  while introducing an extra error of ρ.

Page 29:

The amortized cost of interactive computation

• Source Coding Theorem: the amortized cost of sending many independent samples of X is H(X) per sample.
• What is the amortized cost of computing many independent copies of F(X,Y)?

Page 30:

Information = amortized communication

• Theorem [B-Rao ’11]: for ε>0,

  I(F,μ,ε) = lim_{n→∞} C(F^n, μ^n, ε)/n.

• I(F,μ,ε) is the interactive analogue of H(X).

Page 31:

Information = amortized communication

• Theorem [B-Rao ’11]: for ε>0,

  I(F,μ,ε) = lim_{n→∞} C(F^n, μ^n, ε)/n.

• I(F,μ,ε) is the interactive analogue of H(X).
• Can we get rid of μ? I.e. make I(F,ε) a property of the task F alone?

(Table: distributional measures C(F,μ,ε) and I(F,μ,ε); prior-free counterparts R(F,ε) and “?”.)

Page 32:

Prior-free information cost

• Define:

  I(F,ε) := inf_{π computes F with error ≤ ε} max_μ I(π, μ).

• Want a protocol that reveals little information against all priors μ!
• Definitions are cheap!
• What is the connection between the “syntactic” I(F,ε) and the “meaningful” I(F,μ,ε)?
• I(F,μ,ε) ≤ I(F,ε)…

Page 33:

Prior-free information cost

• I(F,ε) := inf_{π computes F with error ≤ ε} max_μ I(π, μ).
• I(F,μ,ε) ≤ I(F,ε) for all μ.
• Recall: R(F,ε) = max_μ C(F,μ,ε).
• Theorem [B ’11]:

  I(F,ε) ≤ 2·max_μ I(F,μ,ε/2).
  I(F,0) = max_μ I(F,μ,0).

Page 34:

Prior-free information cost

• Recall: I(F,μ,ε) = lim_{n→∞} C(F^n, μ^n, ε)/n.
• Theorem: for ε>0,

  I(F,ε) = lim_{n→∞} R(F^n, ε)/n.

Page 35:

Example

• R(EQ,0) ≈ n.
• What is I(EQ,0)?

Page 36:

The information cost of Equality

• What is I(EQ,0)? Consider the following protocol.
• Alice holds X ∈ {0,1}^n, Bob holds Y ∈ {0,1}^n; let A be a non-singular matrix in Z_2^{n×n}, with rows A1, A2, ….
• In round i, Alice sends Ai·X and Bob sends Ai·Y.
• Continue for n steps, or until a disagreement is discovered.
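A hedged simulation of this protocol (my own code; I take A to be a shared random invertible 0/1 matrix over Z_2, which is one natural reading of the slide, and reveal its rows one round at a time):

```python
import random

def random_invertible_gf2(n, rng):
    """Sample a random n x n invertible 0/1 matrix over GF(2) by rejection."""
    while True:
        A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        M = [row[:] for row in A]              # Gaussian elimination to test invertibility
        ok = True
        for col in range(n):
            pivot = next((r for r in range(col, n) if M[r][col]), None)
            if pivot is None:
                ok = False
                break
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(n):
                if r != col and M[r][col]:
                    M[r] = [a ^ b for a, b in zip(M[r], M[col])]
        if ok:
            return A

def equality_protocol(x, y, rng):
    """Exchange Ai·X and Ai·Y round by round; stop at the first disagreement."""
    A = random_invertible_gf2(len(x), rng)
    for i, row in enumerate(A, start=1):
        ax = sum(r & b for r, b in zip(row, x)) % 2
        ay = sum(r & b for r, b in zip(row, y)) % 2
        if ax != ay:
            return False, i      # X != Y, caught after i rounds (O(1) in expectation)
    return True, len(x)          # all n rounds agree; since A is invertible, X = Y

rng = random.Random(1)
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(equality_protocol(x, list(x), rng))                     # (True, 8)
print(equality_protocol(x, [1, 0, 1, 1, 0, 0, 1, 1], rng))    # (False, i) for a small i
```

When X ≠ Y, each fresh row catches the difference with probability about ½, which is what keeps the expected number of rounds, and hence the information revealed, O(1).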

Page 37:

Analysis (sketch)

• If X≠Y, the protocol will terminate in O(1) rounds on average, and thus reveal O(1) information.

• If X=Y… the players only learn the fact that X=Y (≤1 bit of information).

• Thus the protocol has O(1) information complexity.


Page 38:

Direct sum theorems

• I(F,ε) = lim_{n→∞} R(F^n, ε)/n.
• Questions:
  – Does R(F^n, ε) = Ω(n·R(F,ε))?
  – Does R(F^n, ε) = ω(R(F,ε))?

Page 39:

Direct sum strategy

• The strategy for proving direct sum results.
• Take a protocol for F^n that costs C_n = R(F^n, ε), and make a protocol for F that costs ≈ C_n/n.
• This would mean that C < C_n/n, i.e. C_n > n·C (up to lower-order factors), where C = R(F,ε).

(Diagram: a protocol for n copies of F, of cost C_n, is turned into a protocol for 1 copy of F of cost ≈ C_n/n.)

Page 40:

Direct sum strategy

• If life were so simple…

(Diagram: if the protocol for n copies decomposed into independent pieces for Copy 1, Copy 2, …, Copy n, restricting it to a single copy would immediately give a protocol for 1 copy of F of cost C_n/n. Easy!)

Page 41:

Direct sum strategy

• Theorem: I(F,ε) = I(F^n,ε)/n ≤ C_n/n = R(F^n,ε)/n.
• Compression → direct sum!

Page 42:

The information cost angle

• There is a protocol for one copy of F with communication cost C_n, but information cost ≤ C_n/n.

(Diagram: restricting the protocol for Copy 1, Copy 2, …, Copy n to a single copy yields a protocol with communication cost C_n but only ≈ C_n/n bits of information; what remains is compression.)

Page 43:

Direct sum theorems

• Best known general simulation [BBCR ’10]: a protocol with C communication and I information cost can be simulated using (I·C)^½·polylog(C) communication.
• Implies: R(F^n, ε) = Ω̃(n^½·R(F,ε)).

Page 44:

Compression vs. direct sum

• We saw that compression → direct sum.
• A form of the converse is also true.
• Recall: I(F,ε) = lim_{n→∞} R(F^n, ε)/n.
• If there is a problem such that I(F,ε) = o(R(F,ε)), then R(F^n, ε) = o(n·R(F,ε)).

Page 45:

A complete problem

• Can define a problem called Correlated Pointer Jumping – CPJ(C,I).
• The problem has communication cost C and information cost I.
• CPJ(C,I) is the “least compressible problem”.
• If R(CPJ(C,I),1/3) = O(I), then R(F,1/3) = O(I(F,1/3)) for all F.

Page 46:

The big picture

(Diagram: a square with R(F,ε) and R(F^n,ε)/n in the top row and I(F,ε) and I(F^n,ε)/n in the bottom row; the edges are labeled “direct sum for communication?” (top), “direct sum for information” (bottom), “information = amortized communication” (right), and “interactive compression?” (left).)

Page 47:

Partial progress

• Can compress bounded-round interactive protocols.
• The main primitive is a one-shot version of the Slepian-Wolf theorem.
• Alice gets a distribution P_X.
• Bob gets a prior distribution P_Y.
• Goal: both must sample from P_X.

Page 48:

Correlated sampling

• Alice gets P_X and Bob gets P_Y; at the end both should hold the same sample M ~ P_X.
• The best we can hope for is D(P_X||P_Y):

  D(P_X||P_Y) = ∑_{u∈U} P_X(u)·log(P_X(u)/P_Y(u)).

(Diagram: Alice with P_X and Bob with P_Y both output M ~ P_X.)

Page 49:

Proof idea

• Sample using D(P_X||P_Y) + O(log 1/ε + D(P_X||P_Y)^½) communication, with statistical error ε.

(Diagram: public randomness provides ~|U| shared samples (u1,q1), (u2,q2), (u3,q3), …; Alice keeps the points falling under her distribution P_X, Bob those falling under P_Y.)

Page 50:

Proof idea

• Sample using D(P_X||P_Y) + O(log 1/ε + D(P_X||P_Y)^½) communication, with statistical error ε.

(Diagram: Alice’s first accepted point is u4; she sends hashes h1(u4), h2(u4) so that Bob can pick it out from his own accepted points, such as u2.)

Page 51: 1 Information and interactive computation January 16, 2012 Mark Braverman Computer Science, Princeton University.

5151

Proof Idea• Sample using D(PX||PY)+O(log 1/ε+D(PX||PY)½)

communication with statistical error ε.

u4u2

h4(u4)… hlog 1/ ε(u4)

u4

h3(u4)

PX 2PY

PXPYu4

u4

h1(u4), h2(u4)

1 1

0 0

Page 52:

Analysis

• If P_X(u4) ≈ 2^k·P_Y(u4), then the protocol will reach round k of doubling.
• There will be ≈ 2^k candidates.
• About k + log 1/ε hashes.
• The contribution of u4 to the cost:
  – P_X(u4)·(log(P_X(u4)/P_Y(u4)) + log 1/ε).
• Summing over u gives D(P_X||P_Y) = ∑_{u∈U} P_X(u)·log(P_X(u)/P_Y(u)) plus lower-order terms. Done!
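A hedged, much-simplified simulation of this correlated sampling primitive (my own code: shared (u, q) pairs play the role of the public samples, random hash bits stand in for h1, h2, …, and Bob's acceptance region doubles each round; it illustrates the mechanics only, not the exact communication bound):

```python
import math
import random

def correlated_sample(p_x, p_y, eps, seed):
    """Alice knows p_x, Bob knows p_y, both see the same shared randomness.
    Returns (bob_sample, alice_sample, hash_bits_sent); the samples agree w.h.p."""
    rng = random.Random(seed)                             # shared public randomness
    support = sorted(p_x)
    trials = [(rng.choice(support), rng.random()) for _ in range(20000)]
    hashes = [{u: rng.randint(0, 1) for u in support} for _ in range(64)]

    # Alice: the first shared point falling under her curve is distributed ~ p_x.
    star = next(u for u, q in trials if q < p_x[u])

    sent = math.ceil(math.log2(1 / eps))                  # initial batch of hash bits
    for k in range(40):
        # Bob accepts points under 2^k * p_y, then keeps those matching Alice's hashes.
        cand = {u for u, q in trials if q < min(1.0, (2 ** k) * p_y[u])}
        cand = {u for u in cand if all(hashes[j][u] == hashes[j][star] for j in range(sent))}
        if len(cand) == 1:
            return cand.pop(), star, sent
        sent += 1                                         # one more hash per doubling round
    return star, star, sent

p_x = {"a": 0.7, "b": 0.2, "c": 0.1}
p_y = {"a": 0.1, "b": 0.2, "c": 0.7}
print(correlated_sample(p_x, p_y, eps=0.01, seed=3))      # samples agree w.p. >= 1 - eps
```

The number of hash bits sent tracks the "about k + log 1/ε hashes" count in the analysis above: the more P_X(u) exceeds P_Y(u) at the sampled point, the more doubling rounds, and hence hash bits, are needed.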

Page 53:

Directions

• Can interactive communication be fully compressed? R(F,ε) = I(F,ε)?
• What is the relationship between I(F,ε), Iext(F,ε) and R(F,ε)?
• Many other questions in interactive coding theory!

Page 54:

Thank You