Page 1
© 2012 UZH, CSG@IFI
Cooperative Database Caching within Cloud Environments
Andrei Vancea1, Guilherme Sperb Machado1, Laurent d’Orazio2, Burkhard Stiller1
1 Department of Informatics IFI, Communication Systems Group CSG, University of Zürich UZH, Switzerland
2 Blaise Pascal University - LIMOS, France
vancea,[email protected], [email protected]
AIMS, Luxembourg, Luxembourg, June 6, 2012
Page 2
Background
Databases
– Client: asks a query (SQL)
– Server: returns the result (tuples)
Client-side caching
– Page Caching, Tuple Caching
– Semantic Caching
• Clients store the results of old queries
• Old results used for answering new queries
Page 3
Background - Semantic Caching
Semantic Regions
– Query description
– Result set
Query rewriting
– Probe
– Remainder
[Figure: query rewriting splits a Query into a Probe, answered from the semantic cache using the stored query descriptions, and a Remainder, sent to the server]
Page 4
Database Caching & Cloud Computing
Most cloud providers charge for data transfer between the cloud environment and the “outside world” in a pay-as-you-go manner
Database caching within cloud environment
– Improves performance
– Economic benefits
• Amount of data transferred decreases
• Payments for data transferred reduced
Page 5
Approach
Page 6
Cooperative Semantic Caching
Share local semantic caches between clients
Use cache entries of other clients
Performance improvements
[Figure: three semantic caches, one per client, shared between the clients]
Page 7
Cooperative Semantic Caching
Q1: select * from persons where age > 10
– Result cached as region R1: age > 10
Q3: select * from persons where age > 7
– Probe (answered from R1): select * from R1
– Remainder (sent to the server): select * from persons where age > 7 and age <= 10
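The Q1/Q3 example can be sketched in a few lines of Python. This is only an illustration for a single numeric attribute, not the CoopSC implementation: the `Interval` class and the `rewrite` function are hypothetical names, and ranges are modelled as half-open intervals `lo < age <= hi`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open range lo < age <= hi on one attribute."""
    lo: float
    hi: float

    def is_empty(self) -> bool:
        return self.lo >= self.hi


def rewrite(query: Interval, region: Interval):
    """Split `query` into a probe (covered by `region`) and remainders."""
    probe = Interval(max(query.lo, region.lo), min(query.hi, region.hi))
    if probe.is_empty():
        return None, [query]          # nothing cached: whole query is remainder
    remainders = [
        Interval(query.lo, probe.lo),  # part below the cached region
        Interval(probe.hi, query.hi),  # part above the cached region
    ]
    return probe, [r for r in remainders if not r.is_empty()]


r1 = Interval(10, math.inf)   # region R1: age > 10 (result of Q1)
q3 = Interval(7, math.inf)    # query Q3: age > 7
probe, remainders = rewrite(q3, r1)
print(probe)        # the part of Q3 answered from R1
print(remainders)   # the part of Q3 that still goes to the server
```

The single remainder interval corresponds to the predicate `age > 7 and age <= 10` from the example.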
Page 8
Potential Use Cases
GIS (Geographic Information System) storage
– Large amount of data (e.g. seismic events)
– Processing done on client side
– Two-dimensional range selections (area)
NetFlow-based architectures
– Routers collect flow records and store them in databases
– Analyzers (intrusion detection, accounting, …) access them
– Range selections (Start Time, IP)
Page 9
Query Rewriting
Query rewriting
– Probe
– Remote probes
– Remainder
[Figure: query rewriting, using all query descriptions, splits a Query into a Probe (local semantic cache), Remote probes (remote semantic caches) and a Remainder (server)]
Page 10
System Design
Page 11
CoopSC
CoopSC: Cooperative Semantic Caching
Query types
– Selection (n-Dimensional range predicates)
– select id, name, age from persons where 20 < age and age < 30
Cache organization
– Semantic regions
– Distributed Index – built on top of a P2P overlay
Page 12
CoopSC - Query Rewriting
Local Rewriting
– Probe
– Local Remainder
• Portion of the query which is not available in the local cache
Distributed Rewriting
– Remote Probes
– Remainder
[Figure: Local Rewriting splits the Query into a Probe (local cache) and a Local Remainder; Distributed Rewriting uses the Distributed Index to split the Local Remainder into Remote Probes and a Remainder]
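The two rewriting stages can be illustrated with a small Python sketch over one-dimensional ranges. It is a simplification under stated assumptions: ranges are half-open tuples `(lo, hi]`, the `remote_regions` dictionary stands in for a lookup in the distributed index, and the names `split` and `rewrite` are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # half-open (lo, hi]: lo < x <= hi


def split(missing: List[Range], region: Range) -> Tuple[List[Range], List[Range]]:
    """Return (parts of `missing` covered by `region`, parts still missing)."""
    covered, rest = [], []
    for lo, hi in missing:
        c_lo, c_hi = max(lo, region[0]), min(hi, region[1])
        if c_lo >= c_hi:             # no overlap with this region
            rest.append((lo, hi))
            continue
        covered.append((c_lo, c_hi))
        if lo < c_lo:                # uncovered part below the region
            rest.append((lo, c_lo))
        if c_hi < hi:                # uncovered part above the region
            rest.append((c_hi, hi))
    return covered, rest


def rewrite(query: Range, local_regions: List[Range],
            remote_regions: Dict[str, List[Range]]):
    # Local rewriting: probe + local remainder
    probe, missing = [], [query]
    for region in local_regions:
        covered, missing = split(missing, region)
        probe += covered
    # Distributed rewriting: remote probes + remainder
    remote_probes: Dict[str, List[Range]] = {}
    for peer, regions in remote_regions.items():
        for region in regions:
            covered, missing = split(missing, region)
            if covered:
                remote_probes.setdefault(peer, []).extend(covered)
    return probe, remote_probes, missing


probe, remote_probes, remainder = rewrite(
    (20, 30),
    local_regions=[(18, 23)],
    remote_regions={"peer_a": [(25, 28)], "peer_b": [(27, 40)]},
)
print(probe, remote_probes, remainder)
```

Only the final `remainder` would be sent to the database server; everything else is answered from the local cache or from other clients.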
Page 13
Distributed Index
Built on top of P2P overlay
Regions and queries represented as rectangular shapes
MX-CIF Quad Tree
– Efficiently find intersection between rectangular shapes
Each region is indexed in the smallest quad which totally contains it
Easy to adapt to n-Dimensional regions/queries
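The rule "each region is indexed in the smallest quad which totally contains it" can be sketched as a descent from the root quad. The code below is a two-dimensional illustration, not the CoopSC index itself: the function names, the `max_depth` limit and the child-numbering `path` string are assumptions made for the example.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def children(quad: Rect) -> List[Rect]:
    """The four equal sub-quads of `quad`."""
    x0, y0, x1, y1 = quad
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, xm, ym), (xm, y0, x1, ym),
            (x0, ym, xm, y1), (xm, ym, x1, y1)]


def smallest_enclosing_quad(space: Rect, region: Rect,
                            max_depth: int = 16) -> Tuple[Rect, str]:
    """Descend while one child quad still totally contains `region`."""
    quad, path = space, ""
    for _ in range(max_depth):
        for i, child in enumerate(children(quad)):
            if contains(child, region):
                quad, path = child, path + str(i)
                break
        else:
            # No child contains the region: it is indexed at this quad.
            return quad, path
    return quad, path


space = (0, 0, 100, 100)
small = smallest_enclosing_quad(space, (10, 10, 20, 20))
straddling = smallest_enclosing_quad(space, (40, 40, 60, 60))
print(small)
print(straddling)
```

A region that straddles the centre of the space stays at the root quad, while a small region away from the split lines is pushed down several levels.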
Page 14
Update Handling
Issues
– Invalidation of old entries
– Combining different snapshots can generate inconsistencies
Quad space division (specified update level)
Virtual timestamps stored in database
Each modification increments the virtual timestamp of corresponding quad
Regions store virtual timestamps of quads that they intersect
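As a rough illustration of the virtual timestamp idea, the sketch below keeps one counter per quad and lets each cached region remember the counters it saw when it was created. The class and method names are hypothetical, and quads are identified by plain strings for brevity.

```python
from typing import Dict, Iterable

QuadId = str


class TimestampTable:
    """Virtual timestamps per quad, as the database might keep them."""

    def __init__(self) -> None:
        self._ts: Dict[QuadId, int] = {}

    def get(self, quads: Iterable[QuadId]) -> Dict[QuadId, int]:
        # Quads that were never modified have timestamp 0.
        return {q: self._ts.get(q, 0) for q in quads}

    def record_modification(self, quad: QuadId) -> None:
        # Each modification increments the timestamp of its quad.
        self._ts[quad] = self._ts.get(quad, 0) + 1


class Region:
    """Cached region that stores the timestamps of the quads it intersects."""

    def __init__(self, quads: Iterable[QuadId], table: TimestampTable) -> None:
        self.snapshot = table.get(quads)

    def is_valid(self, table: TimestampTable) -> bool:
        # Valid only if none of the intersected quads changed since caching.
        return table.get(self.snapshot) == self.snapshot


table = TimestampTable()
region_a = Region(["00", "01"], table)
region_b = Region(["10"], table)
table.record_modification("01")
print(region_a.is_valid(table), region_b.is_valid(table))
```

Only the region that intersects the modified quad becomes invalid; a coarser update level would invalidate more regions per update but need fewer timestamps.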
Page 15
Cloud Computing Scenarios
Page 16
Cloud Scenario A
Database server running outside the cloud
Clients located inside the cloud
Non-operational use cases
– Example: cloud environment used for running scientific experiments
Page 17
Cloud Scenario B
Database server running inside the cloud
Clients located inside the cloud
Operational use cases
– Example: corporation using cloud environment as an alternative to building a datacenter
Page 18
Evaluation
Page 19
Experiment Design
Measurements
– Response time
– Amount of data transferred
– Payments for data transfer
Experiments
– Cache size
– Update level
Testing sessions
– 5 select testing sessions (50 queries each)
– Update sessions interleaved
Page 20
Evaluation
Wisconsin benchmark dataset (10,000,000 tuples)
Scenario A
– Database server: Zurich testbed
– 5 Clients: Rackspace
Scenario B
– Database server: Amazon EC2
– 5 Clients: EmanicsLab
Queries
– About 10,000 tuples
– Semantic locality
Page 21
Scenario A
Page 22
Data transferred/Payments
CoopSC significantly reduces the number of tuples sent by the database server
Amount of money also reduced
Page 23
Response Time
Rackspace behavior is unstable
No performance improvements noticed
Page 24
Scenario B
Page 25
Data transferred/Payments
CoopSC significantly reduces the number of tuples sent by the database server
Bandwidth payments also reduced
Page 26
Response Time
CoopSC improves response time
Page 27
Data transferred/Payments (Updates)
Good behavior for low update rates
Economic and performance benefits
Page 28
Response Times (Updates)
Response time increases as the update rate grows
Page 29
Summary & Conclusion
Summary
– Cooperative caching approach used for reducing the load of the database server
– Update statements supported
– CoopSC applied in the context of cloud environments
CoopSC reduces the amount of data transferred between cloud and outside world, which has economic benefits
Performance benefits as long as cloud providers are stable
Page 30
Questions?
Page 31
Update Handling - Algorithm
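procedure Execute(query)
    // Quads at the update level that the query intersects
    quads = query.getIntersectedQuads(updateLevel);
    before = database.getTimestamps(quads);
    plan = rewrite(query, before);
    result = plan.execute();
    after = database.getTimestamps(quads);
    // Unchanged timestamps: no update interfered, the result is consistent
    if (before == after) return result;
    // Otherwise fall back to executing the whole query on the database
    else return database.execute(query);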
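A runnable Python approximation of the Execute procedure is sketched below. `FakeDatabase` is a hypothetical in-memory stand-in for the database server, and the rewrite and plan execution steps are folded into a caller-supplied `run_plan` function, so this only illustrates the timestamp comparison and the fallback.

```python
from typing import Callable, Dict, Iterable, List


class FakeDatabase:
    """Hypothetical in-memory stand-in for the database server."""

    def __init__(self, rows: Iterable[int]) -> None:
        self.rows = list(rows)
        self.timestamps: Dict[str, int] = {}
        self.full_queries = 0

    def get_timestamps(self, quads: Iterable[str]) -> Dict[str, int]:
        return {q: self.timestamps.get(q, 0) for q in quads}

    def update(self, quad: str, row: int) -> None:
        # A modification increments the virtual timestamp of its quad.
        self.rows.append(row)
        self.timestamps[quad] = self.timestamps.get(quad, 0) + 1

    def execute(self, predicate: Callable[[int], bool]) -> List[int]:
        self.full_queries += 1
        return sorted(r for r in self.rows if predicate(r))


def execute(db: FakeDatabase, predicate: Callable[[int], bool],
            quads: List[str], run_plan: Callable[[Dict[str, int]], List[int]]):
    before = db.get_timestamps(quads)
    result = run_plan(before)          # rewrite + plan execution, simplified
    after = db.get_timestamps(quads)
    if before == after:
        return result                  # no interfering update
    return db.execute(predicate)       # fall back to the full query


db = FakeDatabase([5, 12, 20])
older_than_10 = lambda age: age > 10

# Case 1: no update happens while the plan runs.
quiet = execute(db, older_than_10, ["q0"], lambda before: [12, 20])
queries_after_quiet = db.full_queries


# Case 2: an update lands in an intersected quad while the plan runs.
def racing_plan(before: Dict[str, int]) -> List[int]:
    db.update("q0", 15)
    return [12, 20]                    # stale cached answer


raced = execute(db, older_than_10, ["q0"], racing_plan)
print(quiet, raced, db.full_queries)
```

In the second case the stale cached answer is discarded and the query is answered by the database alone, which trades extra data transfer for consistency.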