Wing Ho Yuen and Henning Schulzrinne Department of Computer Science Columbia University

Improving Search Efficiency Using Bloom Filters in Partially Connected Ad Hoc Networks: A Node-centric Analysis


www.cs.columbia.edu/IRT/projects

7DS Application

• Motivation: Internet access is not ubiquitous
  – More Wi-Fi hotspots?
  – Ad hoc network to extend coverage of hotspots?
• Node density insufficient to sustain connected networks
  – Instead of dropping packets, should store and forward in node and AP encounters
• Goal
  – To emulate Internet services, such as email delivery and web access, when users are disconnected

3 Categories of 7DS Application

• Upload App: email delivery
• P2P App: music exchanges
• Download App: subway map download

Data Retrieval Problem

• Data holder (DH) acts as server
• Data querier (DQ) acts as client
• Push-based
  – Small query overhead
  – Summary overhead
• Pull-based (Query-based)
  – Large query overhead

[Figure: message exchanges between DH and DQ. Push-based: summary, query, data. Pull-based: query, data.]

Bloom Filter

• Used in supporting membership queries
  – DH transmits a Bloom filter
  – DQ queries an object only if a match occurs
• Data Structure
  – Bloom filter consists of a binary m-bit vector
  – DH has n data objects x1,…,xn
  – Each object hashed by k independent hash functions h1,…,hk with range {0,1,2,…,m-1}
  – If hi(x1)=a, set BF[a]=1
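The data structure above can be sketched in a few lines of Python. This is only an illustration, not the 7DS implementation: the class name `BloomFilter`, the method names, and the choice of salted SHA-256 to stand in for the k independent hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m              # length of the bit vector
        self.k = k              # number of hash functions
        self.bits = [0] * m     # BF[0..m-1], initially all zero

    def _positions(self, item: str):
        # Assumption: emulate k independent hash functions by salting
        # SHA-256 with the function index i, then reducing modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # DH side: set BF[a] = 1 for every hash position a of the object.
        for a in self._positions(item):
            self.bits[a] = 1

    def might_contain(self, item: str) -> bool:
        # DQ side: a match requires all k positions to be set.
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[a] == 1 for a in self._positions(item))


# DH builds the filter over its objects and would transmit bf.bits to the DQ.
bf = BloomFilter(m=12, k=3)
for obj in ["x1", "x2"]:
    bf.add(obj)
```

The DQ would then call `bf.might_contain(...)` for each wanted object and send a query only on a match.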

Bloom Filter Example

• Bloom filter length (m=12), # hash functions (k=3)
• Data Holder has {x1,x2}
• Data Querier wants y1,x2,y2
• Bloom filter after inserting x1,x2:

0 1 0 0 1 0 1 0 1 0 1 0

Testing with Bloom filter

• y1 not available
• x2 is available (false negative not possible)
• y2 is reported as available (false positive possible)
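To get a feel for how likely a false positive such as y2 is, the standard estimate for a Bloom filter with m bits, k hash functions and n inserted objects can be evaluated directly. The sketch below assumes ideal, independent and uniform hash functions; the function names are hypothetical and the formula is the textbook estimate, not a result taken from these slides.

```python
import math


def false_positive_probability(m: int, k: int, n: int) -> float:
    """Standard estimate of the false positive probability.

    Assumes k independent, uniform hash functions over m bits
    after n objects have been inserted.
    """
    # Probability that a given bit is still 0 after k*n bit-setting operations.
    p_bit_zero = (1.0 - 1.0 / m) ** (k * n)
    # A non-member matches only if all k of its positions are set.
    return (1.0 - p_bit_zero) ** k


def approx_false_positive_probability(m: int, k: int, n: int) -> float:
    """Common exponential approximation, (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Parameters of the example: m=12 bits, k=3 hash functions, n=2 objects.
fp_example = false_positive_probability(m=12, k=3, n=2)
fp_approx = approx_false_positive_probability(m=12, k=3, n=2)
```

For a filter this small the two expressions differ noticeably; the exponential form is mainly useful when m is large.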

Single Neighbor Scenario

• One DH; one DQ
• Utilization
  – Fraction of time used for data transmission
  – Measures inefficiency due to query overhead and Bloom filter overhead
  – Assuming query success probability is constant
• Result
  – E[Z]: mean data Tx time
  – E[Y]: mean query Tx time
  – E[T]: expected connection time

• Small # hash functions, small complexity
• High utilization
• Small Bloom filter overhead

[Figure: utilization vs. BF length and # hash functions, with curves labeled by E[T]]

Multiple Neighbors Scenario

• Querying is more effective
  – More DHs to answer a query
• Multiple Bloom filter transmissions in a single busy period of an observer node
• Node-centric model

[Figure: DH and DQ nodes exchanging Bloom filters (BF)]

Search Efficiency

• Timeline

[Figure: timeline of 1 cycle (B, I), showing BF transmissions for K=0, K=1, K=2]

• Search Efficiency
  – Fraction of effective busy time: fraction of green colored blocks over 1 cycle
  – Utilization gives the fraction of data transmission time in the green colored blocks

Queueing Formulation

• Observer node is a server, providing service to every node in range
• Arrival occurs when a node enters observer range
• N(t) neighbors receive service at time t
• Modeled by M/M/∞ queue
  – With n neighbors, departure rate is nμ
  – N(t) is the system state
  – ρ=λ/μ is the average number of nodes seen by observer

[Figure: state transition diagram of the M/M/∞ queue, states 0 to 6 shown]
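The queueing model can be explored numerically with well-known M/M/∞ results. The sketch below writes λ (`lam`) for the rate at which nodes enter the observer's range and μ (`mu`) for the per-neighbor departure rate, which is standard notation assumed here. It uses the facts that the stationary number of neighbors is Poisson with mean λ/μ and that the mean busy period is (e^(λ/μ) − 1)/λ. The function names and the numeric rates are hypothetical.

```python
import math


def state_probability(n: int, lam: float, mu: float) -> float:
    """Stationary probability that the observer has exactly n neighbors.

    For an M/M/inf queue this is Poisson with mean rho = lam / mu.
    """
    rho = lam / mu
    return math.exp(-rho) * rho ** n / math.factorial(n)


def idle_probability(lam: float, mu: float) -> float:
    """Long-run fraction of time the observer has no neighbor in range."""
    return math.exp(-lam / mu)


def mean_busy_period(lam: float, mu: float) -> float:
    """Mean length of a period with at least one neighbor in range."""
    return (math.exp(lam / mu) - 1.0) / lam


# Hypothetical rates: 2 arrivals per unit time, mean residence time 1.
lam, mu = 2.0, 1.0
probs = [state_probability(n, lam, mu) for n in range(50)]
mean_neighbors = sum(n * p for n, p in enumerate(probs))

# One cycle = one busy period followed by one idle period of mean 1/lam.
mean_cycle = mean_busy_period(lam, mu) + 1.0 / lam
idle_fraction = (1.0 / lam) / mean_cycle
```

Here `mean_neighbors` recovers λ/μ, and `idle_fraction` agrees with `idle_probability(lam, mu)`, which is a quick consistency check on the busy period expression.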

Bloom Filter Based Scheme

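[Figure: busy period divided into Binit, Bsub,1, Bsub,2, …, Bsub,K, with boundaries t0, t1, t2, t3, …, tK, tK+1; each sub-busy period consists of TBF followed by Data+Query]

• Sub-busy period begins at random N(tk), e.g. N(t1)=3
• Busy period ends at tK+1 when N(tK+1)=0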

Efficiency vs.

High bandwidth scenario

Low bandwidth scenario

Conclusion

• Push is better than pull
  – Suitable for web access where query success probability is small
• Node-centric model is more versatile than location-centric model
  – Realistic mobility model
    • Both node encounters and residence time can be measured online
  – Realistic interference model
    • Poisson field of interferers rather than collocated nodes
