A Property Testing Double-Feature of Short Talks Oded Goldreich Weizmann Institute of Science Talk at Technion, June 2013.



On the communication complexity methodology for proving lower bounds on the query complexity of property testing

Before Blais, Brody, and Matulef (2011)

(In order to derive a lower bound on testing the property Π, reduce a two-party communication problem Ψ to Π.)

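[Figure: communication complexity (parties A and B holding x and y, respectively) side by side with property testing (a tester T with oracle access to z).]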

The models seem incompatible: (1) no natural partition in PT, (2) no distance in CC.

The Methodology of Blais, Brody, and Matulef

In order to derive a lower bound on testing the property Π, reduce a two-party communication problem Ψ to Π. That is, present a mapping F of pairs of inputs (x,y)∈{0,1}^{n+n} for the CC-problem to l(n)-bit long inputs for testing Π such that (x,y)∈Ψ implies F(x,y)∈Π and (x,y)∉Ψ implies that F(x,y) is far from Π.

Let f_i(x,y) be the i-th bit of F(x,y), and suppose that B is an upper bound on the (deterministic) communication complexity of each f_i and that C is a lower bound on the randomized communication complexity of Ψ. Then, testing Π requires at least C/B queries.

In [BBM], l(n)=n and each f_i is a function of x_i and y_i only. This restriction complicates the use of the methodology.

[Figure: F maps pairs (x,y) with Ψ(x,y)=1 to inputs in Π, and pairs with Ψ(x,y)=0 to inputs far from Π.]

Soundness of the Methodology

THM: Let F:{0,1}^{n+n}→{0,1}^{l(n)} be such that (x,y)∈Ψ implies F(x,y)∈Π and (x,y)∉Ψ implies that F(x,y) is far from Π. Let f_i(x,y) be the i-th bit of F(x,y). Then, RCC(Ψ) ≤ max_i{DCC(f_i)} ∙ PT(Π). (Extends to CC promise problems.)

RCC = randomized CC (with error, say, 1/3), using shared randomness.
DCC = deterministic CC (or randomized with error 1/6n).
PT = query complexity of testing (w.r.t. distance as in “far”).

Proof: Each of the two parties invokes a local copy of the tester using the shared randomness. Each query (i.e., i) made by the tester is answered by invoking the corresponding CC protocol (for f_i). Note that the two local executions are kept identical. The error probability of this protocol equals that of the tester. ■
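The accounting behind the bound RCC ≤ max_i{DCC(f_i)} ∙ PT can be pictured with the following minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the talk: the toy tester, the functions `bit_protocol`, `toy_tester` and `simulate_tester`, and the choice f_i(x,y) = x_i XOR y_i. The toy choice only illustrates the mechanics (each query costs at most B bits of communication); it is not a meaningful reduction, since it carries no farness guarantee.

```python
import random


def bit_protocol(i, x, y):
    """Hypothetical protocol for f_i(x, y) = x[i] XOR y[i].

    Alice sends x[i] and Bob sends y[i], so both learn f_i(x, y).
    Returns the bit and the number of bits communicated (B = 2).
    """
    return x[i] ^ y[i], 2


def toy_tester(oracle, length, num_queries, rng):
    """Toy tester (an assumption): accept iff every sampled position is 0."""
    for _ in range(num_queries):
        i = rng.randrange(length)
        if oracle(i) != 0:
            return False
    return True


def simulate_tester(x, y, num_queries, seed):
    """Run the tester on F(x, y) without ever writing F(x, y) down.

    Both parties would run this same code with the same seed (the shared
    randomness), so their local executions stay identical; a single
    execution stands in for both here.
    """
    rng = random.Random(seed)
    cost = 0

    def oracle(i):
        # Each tester query is answered by running the protocol for f_i.
        nonlocal cost
        bit, bits_sent = bit_protocol(i, x, y)
        cost += bits_sent
        return bit

    verdict = toy_tester(oracle, len(x), num_queries, rng)
    return verdict, cost


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    verdict, cost = simulate_tester(x, x, num_queries=5, seed=0)
    print(verdict, cost)  # communication is at most 2 bits per query
```

In this sketch the total communication never exceeds 2 times the number of queries, which is the B ∙ PT term of the theorem.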

Applying the Methodology

THM: Let F:{0,1}^{n+n}→{0,1}^{l(n)} be such that (x,y)∈Ψ implies F(x,y)∈Π and (x,y)∉Ψ implies that F(x,y) is far from Π. Let f_i(x,y) be the i-th bit of F(x,y). Then, RCC(Ψ) ≤ max_i{DCC(f_i)} ∙ PT(Π); i.e., PT(Π) ≥ RCC(Ψ)/max_i{DCC(f_i)}.

THM: Let C:{0,1}^n→{0,1}^{l(n)} be a linear code of constant relative distance, and k:N→N. Then, the query complexity of testing the set {C(x): x∈{0,1}^n & wt(x)=k} is Ω(k).

PF: Reduce from k-DISJ_n (disjointness for k/2-subsets), using F(x,y)=C(x+y)=C(x)+C(y). If the two subsets are disjoint then wt(x+y)=k; otherwise wt(x+y)<k, so F(x,y) is a codeword outside the set and hence far from it. Note that each bit in F(x,y) has DCC=2 (by exchanging the corresponding bits of C(x) and C(y)).

COR: Testing k-linearity has query complexity Ω(k). [C = Hadamard]

Note: Typically, the i-th bit of F(x,y) depends on a linear number of bits in x and in y. An alternative proof that uses the original BBM formulation needs to maneuver around this difficulty.
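The reduction F(x,y)=C(x+y)=C(x)+C(y) can be checked on a small example with the Hadamard code. This is only a sketch: the parameters n=6 and k=4, the particular subsets, and the helper names `hadamard`, `xor` and `indicator` are assumptions chosen for illustration.

```python
from itertools import product


def hadamard(x):
    """Hadamard codeword of x: the inner products <x, a> mod 2 over all a."""
    n = len(x)
    return [sum(xi & ai for xi, ai in zip(x, a)) % 2
            for a in product((0, 1), repeat=n)]


def xor(u, v):
    """Bitwise sum over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]


def indicator(subset, n):
    """Characteristic vector of a subset of range(n)."""
    return [1 if i in subset else 0 for i in range(n)]


n, k = 6, 4
x = indicator({0, 1}, n)           # Alice's k/2-subset
y_disjoint = indicator({2, 3}, n)  # Bob's subset, disjoint from Alice's
y_meeting = indicator({1, 2}, n)   # Bob's subset, intersecting Alice's

# Linearity: bit i of F(x, y) is bit i of C(x) plus bit i of C(y),
# so the parties can compute it by exchanging those two bits.
linear = hadamard(xor(x, y_disjoint)) == xor(hadamard(x), hadamard(y_disjoint))

# Disjoint subsets give weight k; intersecting subsets give weight below k.
weight_disjoint = sum(xor(x, y_disjoint))
weight_meeting = sum(xor(x, y_meeting))

# Distinct Hadamard codewords differ in exactly half of the 2^n positions.
distance = sum(a != b for a, b in zip(hadamard(xor(x, y_disjoint)),
                                      hadamard(xor(x, y_meeting))))
```

Here the intersecting pair yields a codeword of a weight-2 message, which lies at relative distance 1/2 from every codeword of a weight-4 message.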

Applying the Restricted Methodology

THM: Let F:{0,1}^{n+n}→{0,1}^{l(n)} be such that (x,y)∈Ψ implies F(x,y)∈Π and (x,y)∉Ψ implies that F(x,y) is far from Π. Let f_i(x,y) be the i-th bit of F(x,y). Then, PT(Π) ≥ RCC(Ψ)/max_i{DCC(f_i)}.
Restriction: f_i(x,y)=fnc(i,x_i,y_i).

THM: Let C:{0,1}^n→{0,1}^{l(n)} be a linear code of constant relative distance, and k:N→N. Then, the query complexity of testing the set {C(x): x∈{0,1}^n & wt(x)=k} is Ω(k).

An alternative proof via the restricted methodology introduces an auxiliary CC problem (“C-encoded k-DISJ”) Ψ’ that consists of pairs (C(x),C(y)) s.t. (x,y)∈k-DISJ_n, reduces (in the CC world) k-DISJ to Ψ’, and then applies the restricted method to Ψ’.

The general methodology frees the prover/user from this type of acrobatics.

Interestingly, this is only a matter of convenience; that is, it does not add power (i.e., “anything provable via general is essentially provable by restricted”).

Emulating the Restricted Methodology

THM: Let F:{0,1}^{n+n}→{0,1}^{l(n)} be such that (x,y)∈Ψ implies F(x,y)∈Π and (x,y)∉Ψ implies that F(x,y) is far from Π. Let f_i(x,y) be the i-th bit of F(x,y). Then, PT(Π) ≥ RCC(Ψ)/max_i{DCC(f_i)}.
Restriction: f_i(x,y)=fnc(i,x_i,y_i).

THM (imprecise sketch): Suppose that Ψ, Π and F satisfy the conditions of the general methodology with B=max_i{DCC(f_i)}. Then, there exist Ψ’, Π’ and F’ that satisfy the conditions of the restricted methodology while RCC(Ψ’) ≥ RCC(Ψ) and PT(Π) = Ω(PT(Π’)/B).

Still, the general methodology frees the prover/user from this type of acrobatics.


On Multiple Input Problems in Property Testing

Three types of multiple input problems

For any fixed property Π and proximity parameter ε.

Direct m-Sum Problem: Given a sequence of m inputs, output a sequence of m outputs that each satisfy the testing requirements; that is, for every i, if the ith input is in Π then the ith output is 1 w.p. ≥ 2/3, whereas if the ith input is ε-far from Π then the ith output is 0 w.p. ≥ 2/3.

Direct m-Product Problem: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if some input is ε-far from Π.

m-Concatenation Problem: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if the average distance of the inputs from Π is at least ε.

The results at a glance: For DS and DP the query complexity is m times the query complexity of Π; for CP it is about the same as for Π.

The main results

m-DS: Given a sequence of m inputs, output a sequence of m outputs such that, for every i, if the ith input is in Π then the ith output is 1 w.p. ≥ 2/3, whereas if the ith input is ε-far from Π then the ith output is 0 w.p. ≥ 2/3.

m-DP: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if some input is ε-far from Π.

m-CP: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if the average distance of the inputs from Π is at least ε.

For any Π and ε, w.r.t. error probability at most 1/3.

THM 1: m-DS(Π) = Θ(m∙PT(Π)).

THM 2: m-DP(Π) = Θ(m∙PT(Π)).

THM 3: Typically(*), m-CP(Π) = Õ(PT(Π)).

*) “Typically” = if PT(Π) increases at least linearly with 1/ε

Comments re the proof of THM1

THM 1: m-DS(Π) = Θ(m∙PT(Π)).
(m-DS = given a sequence of m inputs, output a sequence of m outputs such that, for every i, if the ith input is in Π then the ith output is 1 w.p. ≥ 2/3, whereas if the ith input is ε-far from Π then the ith output is 0 w.p. ≥ 2/3.)

Re the lower bound: In the model of query complexity, it is easy to decouple the execution of the multiple-instance procedure into a sequence of single-instance executions, and the only issue at hand is the possibly uneven and adaptive allocation of resources among the executions.

We need to consider the allocation of resources w.r.t some distribution on instances; which one? The one provided by the MiniMax Principle!

The real content of the MMP is not that the worst-case performance of each randomized algorithm is bounded by the average-case performance (of all deterministic algorithms) w.r.t. some fixed input distribution, but rather that this bound is tight!


THM 2: m-DP(Π) = Θ(m∙PT(Π)).
(m-DP = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if some input is ε-far from Π.)

Re the upper bound: A straightforward reduction of DP to DS will require error reduction (and so we would lose a log m factor).

LEM: m-DP can be reduced to O(j) instances of (2^{-(j-1)}∙m)-DS, for j=1,…,log m.

Idea: Proceed in iterations, initializing I (the set of “far” suspects) to [m]. In iteration j, run DS on the instances with index in I, with error parameter exp(-j), and reset I to be the set of indices with output 0. If |I| > m/2^j, then halt with output 0. If I is empty, halt with output 1.

Re the lower bound: Via an adaptation of the proof of THM1.
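The iterative procedure behind the lemma can be sketched in Python as follows. This is a sketch under stated assumptions, not the procedure's exact form: the single-instance `tester` callback, the majority-vote amplification in `amplified`, and its repetition constant 18 are hypothetical stand-ins for "run DS with error parameter exp(-j)".

```python
import random


def amplified(tester, instance, j, rng):
    """Majority vote over O(j) runs of the tester.

    The constant 18 is an illustrative choice, not taken from the talk.
    """
    runs = 18 * j + 1
    accepts = sum(1 for _ in range(runs) if tester(instance, rng))
    return 1 if 2 * accepts > runs else 0


def direct_product(instances, tester, rng):
    """Output 1 if all instances look fine, 0 if some instance looks far."""
    m = len(instances)
    suspects = list(range(m))  # I, the set of "far" suspects, starts as [m]
    j = 0
    while suspects:
        j += 1
        # Iteration j: rerun on the current suspects with smaller error,
        # and keep only the indices whose output is 0.
        suspects = [i for i in suspects
                    if amplified(tester, instances[i], j, rng) == 0]
        if len(suspects) > m / 2 ** j:
            return 0  # too many suspects survive: some input is far
    return 1  # no suspects left


if __name__ == "__main__":
    rng = random.Random(0)

    def exact_tester(instance, rng):
        # Error-free stand-in: an instance is encoded as 1 (in) or 0 (far).
        return instance == 1

    print(direct_product([1] * 8, exact_tester, rng))
    print(direct_product([1, 1, 0, 1, 1, 1, 1, 1], exact_tester, rng))
```

Since iteration j handles at most m/2^{j-1} instances with O(j) repetitions each, the total work in this sketch is proportional to m times the sum of j/2^{j-1}, which is O(m) tester invocations.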

Illustration for the proof of LEM

LEM: m-DP can be reduced to O(j) instances of (2^{-(j-1)}∙m)-DS, for j=1,…,log m.

Idea: Proceed in iterations, initializing I (the set of “far” suspects) to [m]. In iteration j, run DS on the instances with index in I, with error parameter exp(-j), and reset I to be the set of indices with output 0. If |I| > m/2^j, then halt with output 0. If I is empty, halt with output 1.

[Figure: two sample runs of the procedure, one for the case that all inputs are in Π and one for the case that some input is far from Π.]


THM 3: Typically(*), m-CP(Π) = Õ(PT(Π)).
(m-CP = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if the average distance of the inputs from Π is at least ε.)
*) “Typically” = if PT(Π) increases at least linearly with 1/ε

Re the upper bound: A straightforward algorithm would sample O(1/ε) instances and run the ε-tester for Π on each of them. Complexity O(PT/ε).

One can do better using Levin’s economical work investment strategy. Let l = log(2/ε). For j=1,…,l, take a sample of O(l/(2^j∙ε)) instances and invoke a 2^{-j}-tester on each.

Suppose E_s[q(s)] > ε, for q:[N]→[0,1]. (Invested work is proportional to 1/q(s), unknown a priori.) Then, there exists j∈[l] such that Prob_s[q(s) > 2^{-j}] > 2^j∙ε/(4l).
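Levin's work investment strategy, as used here, can be sketched in Python. The sketch rests on several assumptions that are not in the talk: the `tester(instance, delta, rng)` callback has one-sided error (with two-sided error each call would need amplification), the sampling constant `c=8` is arbitrary, and the helper `witness_scale` only checks the counting claim numerically on a finite list of values.

```python
import math
import random


def num_scales(eps):
    """l = log(2/eps), rounded up to an integer."""
    return max(1, math.ceil(math.log2(2 / eps)))


def levin_concatenation_test(instances, tester, eps, rng, c=8):
    """Return 0 if some sampled instance is rejected at some scale, else 1.

    Assumes tester(instance, delta, rng) never rejects an instance in the
    property (one-sided error); the constant c is an illustrative choice.
    """
    l = num_scales(eps)
    for j in range(1, l + 1):
        # Few samples at coarse scales would be wasteful, so the sample
        # size shrinks as the proximity parameter 2^-j gets cheaper to test.
        sample_size = math.ceil(c * l / (2 ** j * eps))
        for _ in range(sample_size):
            instance = rng.choice(instances)
            if not tester(instance, 2.0 ** -j, rng):
                return 0
    return 1


def witness_scale(q_values, eps):
    """Find j in [l] with Prob[q > 2^-j] > 2^j * eps / (4l), if any."""
    l = num_scales(eps)
    for j in range(1, l + 1):
        fraction = sum(q > 2.0 ** -j for q in q_values) / len(q_values)
        if fraction > 2 ** j * eps / (4 * l):
            return j
    return None


if __name__ == "__main__":
    rng = random.Random(0)

    def ideal_tester(distance, delta, rng):
        # Stand-in: an instance is represented by its distance from the
        # property, and is accepted iff that distance is below delta.
        return distance < delta

    print(levin_concatenation_test([0.0] * 10, ideal_tester, 0.25, rng))
    print(levin_concatenation_test([0.5] * 10, ideal_tester, 0.25, rng))
    print(witness_scale([0.5] * 10, 0.25))
```

The claim holds because the values q(s) ≤ 2^{-l} ≤ ε/2 contribute at most ε/2 to the expectation, so the l remaining scales must together account for more than ε/2, and one of them accounts for more than ε/(2l).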

Additional results and comments

Non-adaptive and/or one-sided error testers

The only deviation from the general case is for the one-sided error version of DP: Its complexity is Θ(m∙PT(Π)+PT_ose(Π)). (OSE is the adaptive version.)
(m-DP = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in Π, and 0 w.p. ≥ 2/3 if some input is ε-far from Π.)

Re the upper bound: We adapt the procedure presented in the proof of the efficient reduction of DP to DS (cf. Lemma for THM2). Recall that this procedure proceeds in iterations, halting with output 1 if I (the set of “far” suspects) becomes empty and outputting 0 if I is ever too big. We modify the procedure such that in the latter case it selects a random i in I, invokes the one-sided error tester on the ith instance, and decides accordingly. In contrast, in the invocations of the reduction procedure, we use the two-sided error tester.

End

The slides of this talk are available at http://www.wisdom.weizmann.ac.il/~oded/T/2pt13.ppt

The “CC Methodology” paper is available at http://www.wisdom.weizmann.ac.il/~oded/p_ccpt.html

The “Multiple Input” paper is available at http://www.wisdom.weizmann.ac.il/~oded/p_mi-pt.html

Gothic cathedral?

Property Testing: an illustration

Property Testing: informal definition

A relaxation of a decision problem: For a fixed property P and any object O, determine whether O has property P or is far from having property P (i.e., O is far from any other object having P).

Focus: sub-linear time algorithms – performing the task by inspecting the object at few locations.


Objects viewed as functions.

Inspecting = querying the function/oracle.

Property Testing: the standard (one-sided error) def’n

A property P = ∪_n P_n, where P_n is a set of functions with domain D_n.

The tester gets explicit input n and ε, and oracle access to a function f with domain D_n.

• If f ∈ P_n then Prob[T^f(n,ε) accepts] = 1. (or > 2/3)

• If f is ε-far from P_n then Prob[T^f(n,ε) rejects] > 2/3.

(Distance is defined as fraction of disagreements.)
Focus: query complexity, q(n,ε) « |D_n|.
Special focus: q(n,ε)=q(ε), independent of n.
Terminology: ε is called the proximity parameter.
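As a concrete instance of the definition, here is a minimal Python sketch of a one-sided error tester for a toy property that is not discussed in the talk: "f is identically 0" on the domain range(n). The function name `all_zero_tester` and the query budget of ⌈2/ε⌉ are assumptions chosen for illustration.

```python
import math
import random


def all_zero_tester(f, n, eps, rng):
    """Accept iff f is 0 at every sampled point of the domain range(n).

    One-sided error: the all-zero function is accepted with probability 1.
    If f is eps-far from all-zero, at least an eps fraction of its values
    are nonzero, so all queries miss them with probability at most
    (1 - eps)^(2/eps) <= e^-2 < 1/3.
    """
    num_queries = math.ceil(2 / eps)  # depends on eps only, not on n
    for _ in range(num_queries):
        if f(rng.randrange(n)) != 0:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    print(all_zero_tester(lambda i: 0, 10**6, 0.1, rng))  # always accepted
    print(all_zero_tester(lambda i: 1, 10**6, 0.1, rng))  # always rejected
```

The query complexity here is q(ε)=⌈2/ε⌉, far below the domain size 10^6 and independent of n, matching the "special focus" case above.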

