Improving Search Efficiency Using Bloom Filters in Partially Connected Ad Hoc Networks: A Node-centric Analysis
Wing Ho Yuen and Henning Schulzrinne
Department of Computer Science, Columbia University
www.cs.columbia.edu/IRT/projects
Jan 25, 2016
7DS Application
• Motivation: Internet access is not ubiquitous
– More Wi-Fi hotspots?
– Ad hoc network to extend coverage of hotspots?
• Node density insufficient to sustain connected networks
– Instead of dropping packets, should store and forward in node and AP encounters
• Goal
– To emulate Internet services, such as email delivery and web access, when users are disconnected
3 Categories of 7DS Application
• Upload app: email delivery
• P2P app: music exchanges
• Download app: subway map download
Data Retrieval Problem
• Data holder (DH) acts as server
• Data querier (DQ) acts as client
• Push-based
– Small query overhead
– Summary overhead
• Pull-based (query-based)
– Large query overhead
[Figure: message exchange between DH and DQ. Push-based: summary, query, data. Pull-based: query, data.]
Bloom Filter
• Used in supporting membership queries
– DH transmits a Bloom filter
– DQ queries an object only if a match occurs
• Data structure
– Bloom filter consists of a binary m-bit vector
– DH has n data objects x1,…,xn
– Each object is hashed by k independent hash functions h1,…,hk with range {0,1,2,…,m-1}
– If h(x1)=a, set BF[a]=1
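The data structure above can be sketched in a few lines of Python. This is a minimal illustration, not the 7DS implementation: the class name `BloomFilter`, the method names, the choice of salted SHA-256 to stand in for the k independent hash functions, and the filter length m=64 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal m-bit Bloom filter with k hash functions (illustrative only)."""

    def __init__(self, m, k):
        self.m = m              # length of the bit vector
        self.k = k              # number of hash functions
        self.bits = [0] * m     # binary m-bit vector, initially all zero

    def _positions(self, item):
        # Assumption: emulate k independent hash functions by salting
        # SHA-256 with the hash index i and reducing modulo m.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        # If h_i(item) = a, set BF[a] = 1.
        for a in self._positions(item):
            self.bits[a] = 1

    def might_contain(self, item):
        # True if every hashed position is set. False positives are
        # possible; false negatives are not.
        return all(self.bits[a] == 1 for a in self._positions(item))


# DH side: summarize the held objects in a Bloom filter.
bf = BloomFilter(m=64, k=3)
for obj in ["x1", "x2"]:
    bf.add(obj)

# DQ side: query an object only if the received filter reports a match.
wanted = ["y1", "x2", "y2"]
to_query = [obj for obj in wanted if bf.might_contain(obj)]
```

Here `to_query` always contains `x2`; whether `y1` or `y2` also appears depends on hash collisions, which is the false-positive case.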
Bloom Filter Example
• Bloom filter length (m=12), # hash functions (k=3)
• Data Holder has {x1, x2}
• Data Querier wants y1, x2, y2
• Bit vector after inserting x1 and x2: 0 1 0 0 1 0 1 0 1 0 1 0
Testing with Bloom filter
• y1 is not available
• x2 is available (false negative not possible)
• y2 appears available (false positive possible)
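To put a number on the false positive in this example, the sketch below evaluates the standard textbook estimate of the false-positive probability for m=12, k=3, n=2, together with the value implied by the 5 set bits in the example vector. The formulas assume ideal, uniformly distributed hash functions; they are not taken from the slides, and the function name is made up for illustration.

```python
import math


def false_positive_probability(m, k, n):
    """Return (standard estimate, exponential approximation).

    Assumes k ideal hash functions that map uniformly onto m bits,
    with n objects already inserted.
    """
    standard = (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k
    approximation = (1.0 - math.exp(-k * n / m)) ** k
    return standard, approximation


# Parameters of the example: m=12 bits, k=3 hash functions, n=2 objects.
standard, approximation = false_positive_probability(m=12, k=3, n=2)

# For the concrete bit vector of the example, an absent object is reported
# as available when all k hashed positions land on set bits.
bits = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
observed = (sum(bits) / len(bits)) ** 3

print(f"standard estimate: {standard:.4f}")
print(f"approximation:     {approximation:.4f}")
print(f"this bit vector:   {observed:.4f}")
```

All three values come out at a few percent, so with such a short filter a query for an absent object such as y2 is occasionally wasted.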
Single Neighbor Scenario
• One DH; one DQ
• Utilization
– Fraction of time used for data transmission
– Measures inefficiency due to query overhead and Bloom filter overhead
– Assuming query success probability is constant
• Result
– E[Z]: mean data Tx time
– E[Y]: mean query Tx time
[Figure: one DH and one DQ]
• E[T]: expected connection time
• Small # hash functions, small complexity
• High utilization
• Small Bloom filter overhead
[Figure: utilization plotted against BF length and # hash functions, with E[T] labels]
Multiple Neighbors Scenario
• Querying is more effective
– More DHs to answer a query
• Multiple Bloom filter transmissions in a single busy period of an observer node
• Node-centric model
[Figure: DH and DQ nodes exchanging Bloom filters (BF)]
Search Efficiency
• Timeline
• Search efficiency
– Fraction of effective busy time: fraction of green colored blocks over 1 cycle
– Utilization gives the fraction of data transmission time in the green colored blocks
[Figure: timeline of 1 cycle (B followed by I), with BF transmissions marking K=0, K=1, K=2]
Queueing Formulation
• Observer node is a server, providing service to every node in range
• Arrival occurs when a node enters observer range
• N(t) neighbors receive service at time t
• Modeled by M/M/∞ queue
– With n neighbors, departure rate is nμ
– N(t) is the system state
– ρ = λ/μ is the average number of nodes seen by observer
[Figure: state-transition diagram of the M/M/∞ queue, states 0 to 6]
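The queueing model can be explored numerically with standard M/M/∞ results. The sketch below writes λ for the rate at which nodes enter the observer's range and μ for the per-neighbor departure rate; these symbol names, the function names, and the numeric rates are assumptions for illustration. The closed forms are textbook properties of the M/M/∞ queue, not formulas taken from the slides.

```python
import math


def stationary_pmf(lam, mu, n):
    """P(N = n) in steady state: Poisson with mean rho = lam / mu."""
    rho = lam / mu
    return math.exp(-rho) * rho ** n / math.factorial(n)


def mean_idle_period(lam):
    """Mean time with no neighbor in range (wait for the next arrival)."""
    return 1.0 / lam


def mean_busy_period(lam, mu):
    """Mean time from the first arrival until no neighbor is in range."""
    rho = lam / mu
    return (math.exp(rho) - 1.0) / lam


# Hypothetical rates: 0.5 arrivals per minute, mean residence time 4 minutes.
lam, mu = 0.5, 0.25
rho = lam / mu                                  # average number of neighbors
p_empty = stationary_pmf(lam, mu, 0)            # fraction of time with no neighbor
mean_cycle = mean_busy_period(lam, mu) + mean_idle_period(lam)
busy_fraction = mean_busy_period(lam, mu) / mean_cycle

print(f"rho = {rho:.2f}, P(N=0) = {p_empty:.4f}, busy fraction = {busy_fraction:.4f}")
```

The busy fraction of a cycle equals 1 - P(N=0), which gives an upper bound on how much of a cycle can be spent on Bloom filter, query, and data transmissions.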
Bloom Filter Based Scheme
• Sub-busy period begins at random N(tk), e.g. N(t1)=3
• Busy period ends at tK+1 when N(tK+1)=0
[Figure: busy period divided into Binit, Bsub,1, Bsub,2, …, Bsub,K at times t0, t1, t2, t3, …, tK, tK+1, with segments labeled TBF and Data+Query]
Efficiency vs.
High bandwidth scenario
Low bandwidth scenario
Conclusion
• Push is better than pull
– Suitable for web access where query success probability is small
• Node-centric model is more versatile than location-centric model
– Realistic mobility model
  • Both node encounters and residence time can be measured online
– Realistic interference model
  • Poisson field of interferers rather than collocated nodes