On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem
Michael T. Goodrich, Dept. of Computer Science
joint with Wenliang Du, David Eppstein, and George Lueker

Jan 05, 2016



Motivation

• Privacy is a concern with respect to information in relational databases:
– rows are associated with people
– columns are attributes

• K-anonymity:
– no query should be able to isolate a group of fewer than K individuals

image source: http://neodv8.blogspot.com/2007/09/neutral-mask-masterclass.html


Generalization

• Replace specific attributes with more general ones, so no category has fewer than K members.

source: ℓ-Diversity: Privacy Beyond k-Anonymity Ashwin Machanavajjhala Johannes Gehrke Daniel Kifer Muthuramakrishnan Venkitasubramaniam Department of Computer Science, Cornell University
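To make "no category has fewer than K members" concrete, here is a minimal Python sketch for a single numeric attribute (age). The fixed-width age ranges, the sample ages, and the helper names `generalize_age` and `is_k_anonymous` are illustrative assumptions, not taken from the slides or the cited paper.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age by the fixed-width range containing it, e.g. 28 -> "20-29"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(column, k):
    """True if every value occurring in the column occurs at least k times."""
    return all(count >= k for count in Counter(column).values())


ages = [28, 29, 21, 23, 50, 55]
generalized = [generalize_age(a) for a in ages]
print(generalized)                     # ['20-29', '20-29', '20-29', '20-29', '50-59', '50-59']
print(is_k_anonymous(ages, 2))         # False: every exact age is unique
print(is_k_anonymous(generalized, 2))  # True: the two categories have 4 and 2 members
```

Coarser ranges make larger categories but lose more information, which is the trade-off the generalization problems below try to optimize.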


Data Types

• Linear: Easy greedy algorithm is optimal

• Unordered: arbitrary groupings possible

• GPS coordinates: group using rectangles

• Zip codes: should use proximity, not text

image source: http://eagereyes.org/Applications/ZIPScribbleMap.html


Previous Work

• [Samarati, Sweeney, 98] introduce the concept of k-anonymization and generalization as a way to achieve it.

• [Meyerson, Williams, 04] show that optimal generalization for unordered data is NP-hard, but their proof requires as many attributes as people. Similar proofs are due to [Aggarwal et al., 05] and [Byun et al., 07].

• [Khanna, Muthukrishnan, Paterson, 98] study a rectangle tiling problem similar to GPS coordinate generalization, showing that 5/4-approximations are not possible unless P=NP.

• Lots more work on k-anonymization and its variants…


Our Results

• Zip codes: has a 4-approximation, but no 4/3-approximation unless P=NP

• GPS coordinates: has a 5-approximation, but no 4/3-approximation unless P=NP

• Unordered: is NP-hard but has a PTAS. Also, this version of the problem gives rise to a new type of bin-packing problem.


Min-Max Bin Covering

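[Figure: items packed into bins; every bin must be filled to a level of at least k (min), and the goal is to minimize the maximum bin level (max).]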

image source: http://www.developerfusion.com/article/5540/bin-packing/4/
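As a reference point for the problem definition, the following Python sketch solves Min-Max Bin Covering exactly by trying every partition of the items. It is exponential (Bell-number many partitions) and only meant for tiny inputs or for checking heuristics; the function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to one of the existing blocks
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def min_max_bin_cover(items, k):
    """Exact Min-Max Bin Covering by exhaustive search (tiny inputs only).

    Returns (cost, bins) where every bin has level >= k and the maximum
    bin level is as small as possible, or None if no covering exists.
    """
    if not items:
        return None
    best = None
    for bins in set_partitions(list(items)):
        if all(sum(b) >= k for b in bins):      # covering constraint
            cost = max(sum(b) for b in bins)    # objective: largest bin level
            if best is None or cost < best[0]:
                best = (cost, bins)
    return best


print(min_max_bin_cover([3, 2, 2, 1], 3)[0])  # 4, e.g. bins {3, 1} and {2, 2}
```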


Min-Max Bin Cover is NP-hard

• Reduction from:

• Reduction method:


A Next-Fit Method: “Fold”

• Theorem: There is a linear-time algorithm, A, guaranteeing:

• Proof idea: Put items of size at least k into their own bins, and use Next Fit for the remaining items.
– All but the last bin have level at most 2k − 2, as each has level at most k − 1 before its last item is added, and that item has size at most k − 1.
– There may be one leftover bin with level less than k, which must be merged with some other bin.
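The proof idea translates almost directly into code. Below is a minimal Python sketch, assuming integer item sizes (as with the census name frequencies used later) and assuming the leftover bin is merged into the currently lightest bin; the slide only says "some other bin", and the name `fold` is ours.

```python
def fold(items, k):
    """Next-Fit heuristic ("Fold") for Min-Max Bin Covering, as a sketch.

    Items of size >= k get bins of their own; the rest are packed with Next Fit,
    closing a bin as soon as its level reaches k. A final bin with level < k is
    merged into the currently lightest bin (an assumed choice).
    """
    if sum(items) < k:
        raise ValueError("total size is below k: no feasible covering")
    bins = []                  # each entry is [level, list_of_items]
    current, level = [], 0
    for x in items:
        if x >= k:
            bins.append([x, [x]])                 # large item: its own bin
            continue
        current.append(x)
        level += x
        if level >= k:                            # bin is covered: close it
            bins.append([level, current])
            current, level = [], 0
    if current:                                   # leftover bin with level < k
        lightest = min(bins, key=lambda b: b[0])  # exists, since sum(items) >= k
        lightest[0] += level
        lightest[1].extend(current)
    return [bin_items for _, bin_items in bins]


print(fold([1, 1, 1, 1, 1], 2))  # [[1, 1, 1], [1, 1]]
```

Each item is handled once and the lightest bin is found in a single pass, so the sketch runs in linear time. Following the proof idea, with integer sizes every bin it produces has level at most max(3k − 3, k − 1 + the largest item).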


Our PTAS: “Spread”

• Theorem: For each fixed ϵ > 0, there is a polynomial time algorithm Aϵ that, given some instance X of Min-Max Bin Covering, finds a solution satisfying Aϵ(X) ≤ (1 + ϵ)(Opt(X) + 1).

• Note: Normalize so that k = 1. If some item has size > 3, then the Next-Fit theorem already gives an optimal solution.

• We can therefore assume, w.l.o.g., that the optimal solution has cost at most 3.


The Spread Algorithm Warm-up

• Call items < ϵ “small” and all others “large”.
– Note that any solution will have at most 3n bins.

• For any packing P, let the type of P be the packing obtained by throwing out all small items and rounding each large item down to the largest value of the form ϵ(1+ϵ)^i that does not exceed it, such as ϵ(1+ϵ)^5.
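The rounding step can be sketched in a few lines of Python. This is an illustration under our own assumptions (floating-point ϵ, hypothetical names `round_down` and `rounded_values`), not the authors' implementation; it also shows that only a constant number of rounded values, depending on ϵ alone, lie at or below 3.

```python
import math


def round_down(x, eps):
    """Largest value of the form eps * (1 + eps)**i (i >= 0) that is <= x.

    Assumes x >= eps, i.e. x is a "large" item.
    """
    if x < eps:
        raise ValueError("small items (< eps) are discarded, not rounded")
    i = math.floor(math.log(x / eps) / math.log(1 + eps))
    # guard against floating-point error in the logarithms
    while eps * (1 + eps) ** (i + 1) <= x:
        i += 1
    while i > 0 and eps * (1 + eps) ** i > x:
        i -= 1
    return eps * (1 + eps) ** i


def rounded_values(eps, cap=3.0):
    """All rounded values eps * (1 + eps)**i that are at most cap."""
    values, i = [], 0
    while eps * (1 + eps) ** i <= cap:
        values.append(eps * (1 + eps) ** i)
        i += 1
    return values


print(round_down(1.0, 0.5))  # 0.75
print(rounded_values(0.5))   # [0.5, 0.75, 1.125, 1.6875, 2.53125]
```

For ϵ = 0.5 there are only five possible rounded values up to 3, regardless of how many items the instance has.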


More Warm-up

• There is a constant number of rounded values for fixed ϵ; hence, there is a constant number of configurations, i.e., ways of filling a bin to at most 3 with rounded values.

• Represent a type by the number of bins having each configuration, so that there is a polynomial number of types (with at most 3n bins).

Example type (the number of configurations is a constant):
configuration:  1  2  3  4  5
bin count:      4  0  6  8  1
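The bookkeeping behind configurations and types can be sketched in Python as follows. The rounded sizes used in the example (1.0 and 2.0) are hypothetical stand-ins chosen so the arithmetic is exact, and the helper names are ours. With C configurations and at most B bins, a type is a count vector with entries between 0 and B, so there are at most (B + 1)^C types, which is polynomial for fixed ϵ.

```python
def configurations(values, cap=3.0):
    """Enumerate configurations: count vectors c with sum(c[i] * values[i]) <= cap.

    values: the distinct positive rounded sizes. The all-zero vector
    (a bin holding no large items) is included.
    """
    result = []

    def extend(i, prefix, used):
        if i == len(values):
            result.append(tuple(prefix))
            return
        c = 0
        while used + c * values[i] <= cap + 1e-9:   # small tolerance for float sums
            extend(i + 1, prefix + [c], used + c * values[i])
            c += 1

    extend(0, [], 0.0)
    return result


def type_of(bin_configs, configs):
    """Summarize a packing by how many of its bins use each configuration."""
    index = {cfg: j for j, cfg in enumerate(configs)}
    counts = [0] * len(configs)
    for cfg in bin_configs:
        counts[index[cfg]] += 1
    return tuple(counts)


configs = configurations([1.0, 2.0])
print(configs)                                     # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (3, 0)]
print(type_of([(1, 1), (1, 1), (3, 0)], configs))  # (0, 0, 0, 2, 0, 1)
```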


The Spread Algorithm

• For each type T:
1. Let T’ be the packing with the rounded values replaced by the corresponding original (large) values.
2. Pack the small values into T’ greedily, always choosing the bin with the lowest level.
3. Merge pairs of smallest bins until every bin has a level of at least 1.

• Over all types, pick the resulting packing that minimizes the level of the largest bin.
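Steps 2 and 3 for a single candidate T’ can be sketched with a priority queue keyed on bin level. This Python sketch covers only that completion phase; the enumeration of types and the final selection are omitted, bins are plain lists of sizes, k is normalized to 1, and the name `greedy_complete` is hypothetical.

```python
import heapq
from itertools import count


def greedy_complete(bins, small_items, k=1.0):
    """Steps 2-3 of Spread for one candidate packing T' (a sketch).

    bins: non-empty list of bins (lists of large item sizes) forming T'.
    small_items: sizes < eps, placed one at a time into the lowest bin.
    """
    if not bins:
        raise ValueError("need at least one bin")
    tie = count()  # tie-breaker so heapq never has to compare lists
    heap = [(sum(b), next(tie), list(b)) for b in bins]
    heapq.heapify(heap)
    # Step 2: each small item goes into the bin with the lowest level.
    for x in small_items:
        level, _, items = heapq.heappop(heap)
        items.append(x)
        heapq.heappush(heap, (level + x, next(tie), items))
    # Step 3: merge the two lowest bins until every bin has level >= k.
    while heap[0][0] < k:
        if len(heap) == 1:
            raise ValueError("total size is below k: no feasible covering")
        l1, _, a = heapq.heappop(heap)
        l2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (l1 + l2, next(tie), a + b))
    return [items for _, _, items in heap]


print(greedy_complete([[1.0], [0.75]], [0.25]))  # two bins, each of level 1.0
```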


Why it Works

• The type for the optimal solution is considered by the Spread algorithm.

• The T’ in this instance has cost at most (1+ϵ) times the optimal cost.

• During the greedy completion, the maximum bin level stays at most (1+ϵ)Opt + ϵ; otherwise every bin would have level above (1+ϵ)Opt, and together the bins would hold more than the total size of the original items.

• When we merge bins, we may merge a bin of level less than 1 with one of level at most (1+ϵ)Opt + ϵ, giving a maximum level of less than (1+ϵ)Opt + 1 + ϵ.
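The bound from the last bullet is exactly the guarantee stated in the theorem. With the normalization k = 1, every bin has level at least 1, so Opt(X) ≥ 1, which also gives a purely multiplicative form:

```latex
\max\text{ level} \;<\; (1+\epsilon)\,\mathrm{Opt}(X) + \epsilon + 1
\;=\; (1+\epsilon)\bigl(\mathrm{Opt}(X) + 1\bigr)
\;\le\; 2(1+\epsilon)\,\mathrm{Opt}(X).
```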


Experimental Results

• Apply to names in the U.S. Census data:
– FEMALE-1990: Female first names and their frequencies, for names with frequency at least 0.001%.
– MALE-1990: Male first names and their frequencies, for names with frequency at least 0.001%.
– LAST-1990: Surnames and their frequencies, for surnames with frequency at least 0.001%.


Fold versus Spread

• Apply both algorithms to the items in random order and in sorted order, since both algorithms consider items according to their input order.

• Test each algorithm for increasing k.

• At certain threshold values of k, the number of bins drops, which causes some “jaggedness” in the results.


Female-1990


Male-1990


Last-1990


Zip Code Generalization is NP-Hard

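• Formally, we show that the following problem is NP-hard. 3-Regular Planar Partition into Paths of Length 2 (3PPPL2): Given a 3-regular planar graph G, can the vertices of G be partitioned into paths of length 2 (i.e., paths with three vertices each)?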
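To make the decision problem concrete, here is a Python sketch that checks a proposed partition and, for tiny graphs only, searches for one exhaustively. It says nothing about the reduction and runs in exponential time; the function names and the example graph (the triangular prism, which is 3-regular and planar with 6 vertices) are our own choices. Vertices are assumed to be comparable labels such as integers.

```python
def is_p3_partition(adj, paths):
    """Check that each triple (a, b, c) is a path a-b-c in the graph and that
    the paths together cover every vertex exactly once."""
    covered = [v for path in paths for v in path]
    if sorted(covered) != sorted(adj):
        return False
    return all(b in adj[a] and c in adj[b] for a, b, c in paths)


def find_p3_partition(adj, remaining=None):
    """Exhaustive search for a partition into paths of length 2 (tiny graphs only)."""
    if remaining is None:
        remaining = frozenset(adj)
        if len(remaining) % 3:
            return None
    if not remaining:
        return []
    v = min(remaining)                        # v must lie on some path
    nbrs = adj[v] & remaining
    candidates = []
    for a in nbrs:
        for b in nbrs:
            if a < b:
                candidates.append((a, v, b))  # v is the middle vertex
        for b in (adj[a] & remaining) - {v}:
            candidates.append((v, a, b))      # v is an endpoint
    for path in candidates:
        rest = find_p3_partition(adj, remaining - set(path))
        if rest is not None:
            return [path] + rest
    return None


# triangular prism: triangles 0-1-2 and 3-4-5 joined by edges 0-3, 1-4, 2-5
prism = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5},
         3: {0, 4, 5}, 4: {1, 3, 5}, 5: {2, 3, 4}}
print(is_p3_partition(prism, [(0, 3, 4), (1, 2, 5)]))  # True
```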


Proof Sketch

• Reduction from 3-Dimensional Matching:
– Given triples (x,y,z) from sets X, Y, Z, find a set of triples such that each member of X, Y, and Z belongs to exactly one chosen triple.
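For reference, the source problem of the reduction can be stated as a brute-force Python search. This is only a sketch of the problem definition, exponential in the number of triples; it assumes |X| = |Y| = |Z| with distinct elements, and the function name is hypothetical.

```python
from itertools import combinations


def three_dimensional_matching(X, Y, Z, triples):
    """Return triples covering every element of X, Y and Z exactly once, or None.

    Brute force over all subsets of size |X|; assumes |X| = |Y| = |Z|.
    """
    q = len(X)
    if not (len(Y) == q == len(Z)):
        return None
    for chosen in combinations(triples, q):
        xs = {x for x, _, _ in chosen}
        ys = {y for _, y, _ in chosen}
        zs = {z for _, _, z in chosen}
        # q triples hitting all q elements of each set => each is hit exactly once
        if xs == set(X) and ys == set(Y) and zs == set(Z):
            return list(chosen)
    return None


X, Y, Z = ["a", "b"], [1, 2], ["p", "q"]
triples = [("a", 1, "p"), ("a", 2, "q"), ("b", 2, "q")]
print(three_dimensional_matching(X, Y, Z, triples))  # [('a', 1, 'p'), ('b', 2, 'q')]
```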


Proof Sketch

• Crossover gadget:


Proof Sketch

• Crossover gadget:


Additional Results

• A 4/3-approximation algorithm for planar graphs

• NP-hardness and 4/3-approximation algorithm for two-dimensional points.


Conclusion

• We have shown that generalization is NP-hard and, in some cases, cannot be approximated arbitrarily well unless P=NP.

• We have given approximation algorithms for the versions we study:
– unordered data
– planar graphs (generalized into connected components)
– two-dimensional points (generalized with rectangles)