Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks. Bin Tang, Computer Science Department, Stony Brook University


Dec 21, 2015

Page 1: Ph.D. Thesis Proposal

Data Caching in Ad Hoc and Sensor Networks

Bin Tang

Computer Science Department, Stony Brook University

Page 2: Summary of My Work

Data caching
  Update cost constraint
    Optimal algorithm for trees; approximation algorithm for general graphs
  Memory constraint with multiple data items
    Approximation algorithm for general graphs
  Number constraint with read/write/storage cost
    Optimal algorithm for trees

Localized distributed implementations
Comparison with existing work

Page 3: Motivation

Ad hoc and sensor networks are resource constrained
  Limited bandwidth, battery energy, and memory

Caching can save access (communication) cost, and thus bandwidth and energy
  Subject to update cost, memory, or number constraints

Page 4: Rooted in…

Facility location problem: set up facilities in a network to minimize the total access cost plus the total setup cost

K-median problem: set up k facilities to minimize the total access cost

Page 5: 1. Cache Placement in Sensor Networks Under Update Cost Constraint

Page 6: Problem Statement

Sensor network model
  A data item is stored at a server node and updated at a certain frequency
  Other nodes access the data item at a certain frequency

Problem statement: select nodes to cache the data item
  Goal: minimize the “total access cost”
  Constraint: total update cost

Page 7: Why update cost constraint?

Nodes close to the server bear most of the update cost.

Page 8: Problem Formulation

Given:
  Network graph G(V, E)
  A data item stored at a server node
  Update frequency
  Access frequency for each other node
  Update cost constraint Δ

Goal: select cache nodes to minimize the “total access cost”
  Constraint: total update cost is less than Δ

Page 9: Total Access/Update Cost

Total access cost = ∑_{i ∈ V} (hop length between i and its nearest cache) × (access frequency of i)

Total update cost = cost of the optimal Steiner tree over the server and all caches
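
As a rough illustration of the access-cost formula above, the following Python sketch computes hop distances with a multi-source breadth-first search. The function names (`hop_distances`, `total_access_cost`), the adjacency-dictionary representation, and the convention that the server itself counts as a cache are assumptions made for this example; they do not come from the slides.

```python
from collections import deque

def hop_distances(adj, sources):
    """Multi-source BFS: hop count from every node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def total_access_cost(adj, caches, access_freq):
    """Sum over nodes of (hops to nearest cache) x (access frequency)."""
    dist = hop_distances(adj, caches)
    return sum(dist[i] * f for i, f in access_freq.items())

# Path network 0 - 1 - 2 - 3; node 0 is the server (assumed to act as a cache).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
freq = {0: 0, 1: 1, 2: 2, 3: 3}
cost_server_only = total_access_cost(adj, [0], freq)   # 14
cost_with_cache = total_access_cost(adj, [0, 2], freq)  # 4
```

In this toy network, caching at node 2 reduces the total access cost from 14 to 4; the update cost of that placement is handled separately by the constraint.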

Page 10: Algorithm Design Outline

Tree networks
  Optimal dynamic programming algorithm

General networks
  Multiple-unicast update model: approximation algorithm
  Steiner-tree update model: heuristic and distributed algorithms

Page 11: Tree Networks

Page 12: Subtree notation

Server: “r”

Consider a subtree Tv.

Let all nodes on path(v, x), on the leftmost branch of Tv, be caches.

Let C_v be the optimal access cost in Tv using additional update cost δ.

Next: recursive equation for C_v

[Figure: tree Tr rooted at server r, with subtree Tv rooted at v and node x on its leftmost branch]

Page 13: Dynamic Programming Algorithm for Tv under update cost constraint δ

Let u = leftmost deepest node in the optimal set of caches in Tv.

All nodes on path(v, u) can be caches (the update cost does not increase).

For a fixed u, C_v = constant + optimal access cost in Rv,u under constraint (δ – δ_u).

Here, δ_u is the cost to update u (using path(v, x)).

Tv = Lv,u + Tu + Rv,u

Page 14: DP recursive equation for Tv

C_v = min over u ∈ Tv of [ (access cost in Lv,u using path(v, x) or path(v, u))
        + (access cost in Tu using u)
        + (optimal cost in Rv,u with constraint δ – δ_u) ]

Here, δ_u is the cost of updating u (using path(v, x)). Note that Rv,u has a path (v, parent(u)) of caches on its leftmost branch.

Page 15: Time complexity

Time complexity: O(n^4 + n^3 Δ)

Analysis
  Precomputation takes O(n^4)
    Lv,u with cache path (v, x): O(n^4), for all v, u, x
    Tu: O(n^2), for all u
  Recursive equation takes O(n^3 Δ)
    n^2 Δ entries: one for each pair (v, x) and each value of the constraint up to Δ
    Each entry takes O(n): n possible choices of u

Page 16: General Graph Network

Two update cost models
  Multiple-unicast
  Optimal Steiner tree

Page 17: Multiple-Unicast Update Model

Update cost: sum of the shortest path lengths from the server to each cache node

Benefit of node A: decrease in total access cost due to the selection of A as a cache

Benefit per unit update cost: the benefit of A divided by the update cost of A

Page 18: Greedy Algorithm

Iteratively select the node with the highest benefit per unit update cost, until the update cost budget is exhausted.

Theorem: the greedy solution’s benefit is at least 63% of the optimal benefit.
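
The greedy rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the graph is connected and unweighted, the update cost of a cache is its hop distance from the server (the multiple-unicast model), the server counts as a cache, and a candidate is skipped when its update cost exceeds the remaining budget. The names `greedy_caches`, `access_cost`, and `bfs` are hypothetical.

```python
from collections import deque

def bfs(adj, sources):
    """Hop distance from every node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def access_cost(adj, caches, freq):
    dist = bfs(adj, caches)
    return sum(dist[i] * f for i, f in freq.items())

def greedy_caches(adj, server, freq, budget):
    """Repeatedly add the node with the highest benefit per unit update cost."""
    update = bfs(adj, [server])          # unicast update cost = hops from server
    caches = [server]
    remaining = budget
    while True:
        base = access_cost(adj, caches, freq)
        best, best_ratio = None, 0.0
        for v in adj:
            if v in caches or update[v] > remaining:
                continue                  # already a cache, or not affordable
            benefit = base - access_cost(adj, caches + [v], freq)
            ratio = benefit / update[v]   # update[v] >= 1 for non-server nodes
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:                  # nothing affordable improves the cost
            return caches
        caches.append(best)
        remaining -= update[best]

# Path network 0 - 1 - 2 - 3 - 4 with the server at node 0 and budget 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
freq = {0: 0, 1: 1, 2: 1, 3: 1, 4: 4}
chosen = greedy_caches(adj, 0, freq, 4)   # [0, 1, 3]
```

On this example the first pick is node 1 (benefit 7 for update cost 1) and the second is node 3 (benefit 10 for update cost 3), which uses up the budget.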

Page 19: Steiner-Tree Update Cost Model

Steiner-tree update cost: cost of the 2-approximation Steiner tree over the cache nodes

Incremental Steiner update cost of node A: increase in the Steiner-tree update cost due to A becoming a cache

Greedy-Steiner algorithm: iteratively select the node with the highest benefit per unit of incremental Steiner update cost.
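
One way to estimate the incremental Steiner update cost is sketched below. It assumes the standard construction in which a minimum spanning tree over the metric closure of the terminals gives a 2-approximation of the Steiner tree cost; the slides do not say which 2-approximation is used, so this choice, the all-pairs distance table `dist`, and the function names are assumptions for illustration only.

```python
def mst_cost(terminals, dist):
    """Prim's algorithm on the metric closure of `terminals`.

    dist[u][w] is the shortest-path length between u and w. The resulting
    cost is at most twice the optimal Steiner tree cost over the terminals.
    """
    terminals = list(terminals)
    in_tree = {terminals[0]}
    cost = 0
    while len(in_tree) < len(terminals):
        d, v = min((dist[u][w], w)
                   for u in in_tree
                   for w in terminals if w not in in_tree)
        cost += d
        in_tree.add(v)
    return cost

def incremental_cost(server, caches, candidate, dist):
    """Increase in the approximate Steiner update cost if `candidate` caches."""
    before = mst_cost([server] + caches, dist)
    after = mst_cost([server] + caches + [candidate], dist)
    return after - before

# Path network 0 - 1 - 2 - 3: shortest-path length is |u - w|.
dist = {u: {w: abs(u - w) for w in range(4)} for u in range(4)}
on_path = incremental_cost(0, [3], 2, dist)   # 0: node 2 already lies on the update tree
first = incremental_cost(0, [], 2, dist)      # 2
```

A candidate that already lies on the update tree has zero incremental cost, so a full implementation would need a rule for such nodes before dividing benefit by cost; that rule is not specified in the slides.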

Page 20: Distributed Greedy-Steiner Algorithm

Each non-cache node estimates its benefit per unit update cost.

If its estimate is the maximum among all its non-cache neighbors, then it decides to cache.

Algorithm:
  In each round, each node decides whether to cache based on the rule above.
  The server gathers the new cache node information and computes the total update cost.
  The remaining update cost is broadcast to the network, and a new round begins.
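
The per-round local decision can be sketched as below. How each node computes its estimate is not given in the slides, so the sketch takes the estimates as input; the strict comparison (tied neighbors both wait) and the name `select_round` are assumptions.

```python
def select_round(adj, caches, estimate):
    """Return the nodes that decide to cache in one round.

    A non-cache node caches when its estimated benefit per unit update cost
    is positive and strictly larger than that of every non-cache neighbor.
    """
    chosen = []
    for v in adj:
        if v in caches or estimate[v] <= 0:
            continue
        rivals = [estimate[w] for w in adj[v] if w not in caches]
        if all(estimate[v] > e for e in rivals):
            chosen.append(v)
    return chosen

# Path network 0 - 1 - 2 - 3 - 4; node 0 is the server and already a cache.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
estimate = {0: 0.0, 1: 2.0, 2: 5.0, 3: 1.0, 4: 3.0}
new_caches = select_round(adj, {0}, estimate)   # [2, 4]
```

Several nodes can be selected in the same round (here nodes 2 and 4), which is why the server recomputes the total update cost and broadcasts the remaining budget before the next round.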

Page 21: Performance Evaluation

Parameters varied:
  (i) network-related: number of nodes and transmission radius
  (ii) application-related: number of clients

Random network of 2,000 to 5,000 nodes in a 30 x 30 region.

Page 22: Compared Caching Schemes

Centralized Greedy
Centralized Greedy-Steiner
Distributed Greedy-Steiner
Dynamic Programming on Shortest Path Tree of Clients
Dynamic Programming on Steiner Tree over Clients and Server

Page 23: Varying Network Size

[Figure] Transmission radius = 2, percentage of clients = 50%, update cost = 25% of the Steiner tree cost

Page 24: Varying Transmission Radius

[Figure] Network size = 4000, percentage of clients = 50%, update cost = 25% of the Steiner tree cost

Page 25: Varying Number of Clients

[Figure] Transmission radius = 2, update cost = 50% of the Steiner tree cost, network size = 3000

Page 26: To Recap

Data caching problem under update cost constraint.

Optimal algorithm for trees; an approximation algorithm for general graphs.

Efficient distributed implementations.

More general cache placement problem: (a) under memory constraint; (b) with multiple data items.

Page 27: 2. Data Caching under Memory Constraint

Page 28: Problem Addressed

In a general ad hoc network with limited memory at each node, where should data items be cached so that the total access (communication) cost is minimized?

Page 29: Problem Formulation

Given:
  Network graph G(V, E)
  Multiple data items
  Access frequencies (for each node and data item)
  Memory constraint at each node

Select data items to cache at each node under the memory constraint

Minimize total access cost = ∑_{nodes} ∑_{data items} [(distance from the node to the nearest cache for that data item) × (access frequency)]

Page 30: Related Work

Related to the facility-location problem and the K-median problem
  No memory constraint

Baev and Rajaraman
  20.5-approximation algorithm for uniform-size data items
  For non-uniform sizes, no polynomial-time approximation unless P = NP

We circumvent the intractability by approximating “benefit” instead of access cost

Page 31: Related Work - continued

Two major empirical works on distributed caching
  Hara [INFOCOM ’99]
  Yin and Cao [INFOCOM ’04] (we compare our work with theirs)

Our work is the first to present a distributed caching scheme based on an approximation algorithm

Page 32: Algorithms

Centralized Greedy Algorithm (CGA)
  Delivers a solution whose “benefit” is at least 1/2 of the optimal benefit

Distributed Greedy Algorithm (DGA)
  Purely localized

Page 33: Centralized Greedy Algorithm (CGA)

Benefit of caching a data item at a node = the reduction in total access cost,

i.e., (total access cost before caching) – (total access cost after caching)

Page 34: Centralized Greedy Algorithm (CGA)

CGA iteratively selects the most beneficial (data item, node to cache at) pair.

That is, at each stage we pick the pair that has the maximum benefit.

Theorem: CGA is (1/2)-approximate for uniform-size data items and (1/4)-approximate for non-uniform-size data items.
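
A minimal sketch of the CGA selection loop for the uniform-size case is given below. It assumes that every data item occupies one memory unit, that each item has a server node holding it permanently without using cache memory, and that the network is connected and unweighted. The function names and data layout (`servers`, `capacity`, `freq` keyed by `(node, item)`) are hypothetical and chosen for this example.

```python
from collections import deque

def bfs(adj, sources):
    """Hop distance from every node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def total_cost(adj, placement, freq):
    """placement[item] lists the nodes holding item; freq[(node, item)] is the access frequency."""
    cost = 0
    for item, holders in placement.items():
        dist = bfs(adj, holders)
        cost += sum(dist[n] * f for (n, it), f in freq.items() if it == item)
    return cost

def cga(adj, servers, freq, capacity):
    """Greedily add the (item, node) pair with the largest benefit until none helps."""
    placement = {item: [s] for item, s in servers.items()}
    free = dict(capacity)                 # free memory units per node
    while True:
        base = total_cost(adj, placement, freq)
        best, best_benefit = None, 0
        for item in servers:
            for node in adj:
                if free[node] == 0 or node in placement[item]:
                    continue
                trial = dict(placement)
                trial[item] = placement[item] + [node]
                benefit = base - total_cost(adj, trial, freq)
                if benefit > best_benefit:
                    best, best_benefit = (item, node), benefit
        if best is None:
            return placement
        item, node = best
        placement[item].append(node)
        free[node] -= 1

# Path network 0 - 1 - 2; item "a" lives at node 0, item "b" at node 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
servers = {"a": 0, "b": 2}
capacity = {0: 0, 1: 1, 2: 0}             # only node 1 has one free slot
freq = {(1, "a"): 3, (1, "b"): 1, (2, "a"): 1, (0, "b"): 1}
result = cga(adj, servers, freq, capacity)
```

In this example node 1 can hold one item: caching "a" there has benefit 4 and caching "b" has benefit 2, so the sketch places "a" at node 1.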

Page 35: CGA Approximation Proof Sketch

G’: modified G, where each node has twice the memory it has in G and caches the data items selected by both CGA and the optimal solution

B(Optimal in G)
  ≤ B(Greedy + Optimal in G’)
  = B(Greedy) + B(Optimal) w.r.t. Greedy
  ≤ B(Greedy) + B(Greedy)   [due to greedy choice]
  = 2 x B(Greedy)

Page 36: Distributed Greedy Algorithm (DGA)

Each node caches the most beneficial data items, where the benefit is based on “local traffic” only.

“Local traffic” includes:
  Its own data requests
  Data requests to its data items
  Data requests it forwards for others

Page 37: DGA: Nearest Cache Table

Why do we need it?
  To forward requests to the nearest cache
  For local benefit calculation

What is it?
  Each node keeps the ID of the nearest cache for each data item
  Entries of the form: (data item, the nearest cache)
  The table is maintained on top of the routing table

Maintenance – next slide

Page 38: Maintenance of Nearest-cache Table

When node i caches data item Dj:
  Broadcast (i, Dj) to neighbors
  Notify the server, which keeps a list of caches

On receiving (i, Dj):
  If i is nearer than the current nearest cache of Dj, update and forward

Page 39: Maintenance of Nearest-cache Table - II

When node i deletes Dj:
  Get the list of caches Cj from the server of Dj
  Broadcast (i, Dj, Cj) to neighbors

On receiving (i, Dj, Cj):
  If i is the current nearest cache for Dj, update using Cj and forward
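
The two message handlers described on the last two slides can be sketched as a small Python class. The class name `Node`, the method names, and the `hops` table (hop distances assumed to be available from the routing table) are hypothetical; the sketch also assumes the list Cj is non-empty, for example because it includes the server.

```python
class Node:
    """Holds one node's nearest-cache table for the DGA maintenance protocol."""

    def __init__(self, node_id, hops):
        self.id = node_id
        self.hops = hops      # hops[other_id] -> hop distance (assumed from routing table)
        self.nearest = {}     # data item -> ID of the nearest known cache

    def on_cache_announce(self, cache_id, item):
        """Handle (i, Dj). Returns True if the message should be forwarded."""
        current = self.nearest.get(item)
        if current is None or self.hops[cache_id] < self.hops[current]:
            self.nearest[item] = cache_id
            return True
        return False

    def on_cache_delete(self, cache_id, item, remaining):
        """Handle (i, Dj, Cj). Returns True if the message should be forwarded."""
        if self.nearest.get(item) != cache_id:
            return False      # the deleted cache was not our nearest: nothing to do
        # Pick the closest of the remaining caches Cj.
        self.nearest[item] = min(remaining, key=lambda c: self.hops[c])
        return True

# Node 5 is 4 hops from node 1, 2 hops from node 2, and 3 hops from node 3.
n = Node(5, {1: 4, 2: 2, 3: 3})
n.on_cache_announce(1, "D")          # True: first known cache for D
n.on_cache_announce(3, "D")          # True: node 3 is nearer than node 1
n.on_cache_delete(3, "D", [1, 2])    # True: nearest cache becomes node 2
```

Returning a forward/no-forward flag keeps the sketch independent of any particular network layer; messages stop propagating once they no longer change any table.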

Page 40: Maintenance of Nearest-cache Table - III

More details pertaining to:
  Mobility
  Second-nearest cache entries (needed for benefit calculation for cache deletions)
  Benefit thresholds

Page 41: Performance Evaluation

CGA vs. DGA Comparison

DGA vs. HybridCache Comparison

Page 42: CGA vs. DGA

Summary of simulation results: DGA performs quite close to CGA for a wide range of parameter values

Page 43: Varying Number of Data Items and Memory Capacity

[Figure] Transmission radius = 5, number of nodes = 500

Page 44: DGA vs. Yin and Cao’s work

Yin and Cao [INFOCOM ’04]:
  CacheData – caches passing-by data items
  CachePath – caches the path to the nearest cache
  HybridCache – caches the data if its size is small enough, otherwise caches the path to the data
  The only existing work on a purely distributed cache placement algorithm with memory constraint

Page 45: DGA vs. HybridCache

Simulation setup:
  ns2; routing protocol is DSDV
  Random waypoint model: 100 nodes move at a speed within (0, 20 m/s) in a 2000 m x 500 m area
  Tr = 250 m, bandwidth = 2 Mbps

Performance metrics:
  Average query delay
  Query success ratio
  Total number of messages

Page 46:

Server model: 1000 data items, divided between two servers
  Data item size: [100, 1500] bytes

Data access models
  Random: each node accesses 200 data items chosen randomly from the 1000 data items
  Spatial: (details skipped)

Naïve caching algorithm: caches any passing-by data and uses LRU for cache replacement

Page 47: Varying query generation time on random access pattern

Page 48: Summary of Simulation Results

Both HybridCache and DGA outperform the Naïve approach

DGA outperforms HybridCache in all metrics
  Especially for frequent queries and small cache sizes
  For high mobility, DGA has a slightly worse average delay, but a much better query success ratio

Page 49: To Recap

Data caching problem for multiple items under memory constraint
  Centralized approximation algorithm
  Localized distributed implementation
  No update or storage costs are considered (otherwise, there is no performance guarantee)

Can we consider and minimize the total cost of read/write/storage?

Page 50: 3. Data Caching Under Number Constraint

Page 51: Problem Formulation

Given:
  Network graph G(V, E)
  A data item to be stored in the network
  Access (read) frequency for each node
  Write frequency for each node
  Caching (storage) cost for each node
  Number of allowable caching nodes: P

Goal: select cache nodes to minimize the “total cost” under the number constraint

Page 52: Total Cost

Total cost = total read cost + total write cost + total storage cost

= ∑_{i ∈ V} (hop length between i and its nearest cache) × (access frequency of i)

+ ∑_{i ∈ V} (cost of the optimal Steiner tree over i and all caches) × (write frequency of i)

+ ∑_{i ∈ cache nodes} (storage cost at i)
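
On a tree, the optimal Steiner tree over a writer and the caches is simply the minimal subtree connecting them, so the three-term cost can be evaluated directly. The Python sketch below assumes unit edge costs and a rooted tree given as a `parent` map (the root maps to `None`); the function names and input layout are hypothetical, chosen only to illustrate the formula.

```python
from collections import deque

def steiner_edges(parent, terminals):
    """Number of edges in the minimal subtree of a rooted tree connecting `terminals`."""
    terminals = set(terminals)
    below = {v: 0 for v in parent}        # terminals in the subtree rooted at v
    for t in terminals:
        v = t
        while v is not None:
            below[v] += 1
            v = parent[v]
    # Edge (v, parent[v]) is needed iff it separates some terminals from the others.
    return sum(1 for v in parent
               if parent[v] is not None and 0 < below[v] < len(terminals))

def total_cost(parent, caches, read_freq, write_freq, storage):
    """Read cost + write cost + storage cost for a given cache set on a tree."""
    adj = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            adj[v].append(p)
            adj[p].append(v)
    dist = {c: 0 for c in caches}         # multi-source BFS from all caches
    queue = deque(caches)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    read = sum(dist[i] * read_freq[i] for i in parent)
    write = sum(steiner_edges(parent, set(caches) | {i}) * write_freq[i]
                for i in parent)
    store = sum(storage[c] for c in caches)
    return read + write + store

# Path tree 0 - 1 - 2 rooted at 0; only node 0 writes.
parent = {0: None, 1: 0, 2: 1}
reads = {0: 1, 1: 1, 2: 1}
writes = {0: 1, 1: 0, 2: 0}
storage = {0: 1, 1: 1, 2: 1}
one_cache = total_cost(parent, [1], reads, writes, storage)      # 2 + 1 + 1 = 4
two_caches = total_cost(parent, [0, 2], reads, writes, storage)  # 1 + 2 + 2 = 5
```

The example shows the trade-off the formulation captures: a second cache lowers the read cost but raises both the write and the storage cost.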

Page 53: Related Work

K-median problem (access and storage cost)
  Tamir attains the best time complexity on trees

We generalize it with write cost, for both trees (O(n^2 P^3)) and general graphs
  Kalpakis et al. solve the same problem, with time complexity O(n^6 P^3)

Page 54: Tree Topology

Page 55: Tamir’s DP Algorithm on tree Tr

Transform the arbitrary tree into a full binary tree
  Each non-leaf node v has two children: v1, v2

For each v in the binary tree, compute and sort the distances from v to all nodes

“Leaves to root” dynamic programming algorithm

Page 56: Our DP Algorithm

Idea: for each node v in Tr, the cost of subtree Tv =
  access cost of the nodes in Tv
  + storage cost of the caching nodes in Tv
  + write cost of all the writer nodes in Tr due to edges in Tv

Page 57: DP Algorithm - Definitions

G(v, q, r): optimal cost for subtree Tv, with exactly q caches in Tv, the closest of which to v is at most r hops away

F(v, q, r): optimal cost for Tv, with exactly q caches in Tv and some cache nodes outside of Tv, the closest of which to v is r hops away

F’(v, r): optimal cost for Tv, with no cache in Tv and some cache nodes outside of Tv, the closest of which to v is r hops away

Page 58: Recursive DP Equations (p cache nodes allowed)

1. G(v, q, 0), where v is a cache node
   = storage cost at v + the cost of Tv1 and Tv2 + the write cost on edges (v, v1) and (v, v2)

2. G(v, q < p, r > 0), where there is some cache node outside of Tv
   = min { G(v, q, r-1),   // there is a cache in Tv at most r-1 hops from v
           the cost when “the closest cache to v is r hops away” }

Page 59: Recursive DP Equations - continued

3. G(v, q = P, r > 0), where there is no cache node outside of Tv
   = min { G(v, q, r-1),
           the cost when “the closest cache is r hops away” }

4. F(v, q, r), where there is a cache node outside of Tv
   = min { G(v, q, r-1),
           the cost when “the closest cache to v is r hops away” }

Page 60: Minimum total cost of original tree Tr

Minimum total cost = min_{1 ≤ p ≤ P} G(r, p, L), where L is the number of hops from r to the farthest node in Tr

Time complexity: O(n^2 P^3)
  For each p, vary q from 1 to p
  For each (v, q), vary the closest cache node to v (n possibilities) and split q between Tv1 and Tv2 (q such possibilities)

Page 61: Conclusion

We design optimal algorithms, near-optimal algorithms, and heuristics for data caching under different constraints in ad hoc and sensor networks

We show that our algorithms can be implemented in a distributed way

Page 62: Questions?