Cooperative Database Caching within Cloud Environments

© 2012 UZH, CSG@IFI


Andrei Vancea (1), Guilherme Sperb Machado (1), Laurent d’Orazio (2), Burkhard Stiller (1)

(1) Department of Informatics IFI, Communication Systems Group CSG, University of Zürich UZH, Switzerland

(2) Blaise Pascal University, LIMOS, France

AIMS, Luxembourg, Luxembourg, June 6, 2012


Background

Databases
– Client: submits a query (SQL)
– Server: returns the result (tuples)

Client-side caching
– Page caching, tuple caching
– Semantic caching

• Clients store the results of previous queries

• Previous results are used to answer new queries


Background - Semantic Caching

[Figure: query rewriting. The query is split into a probe, answered by the semantic cache (which holds query descriptions), and a remainder, sent to the server.]

Semantic regions
– Query description
– Result set

Query rewriting
– Probe: the part of the query answered from the cache
– Remainder: the part of the query sent to the server
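The probe/remainder split can be illustrated with a minimal sketch. It assumes a single numeric attribute, query descriptions modeled as half-open intervals (lo, hi], and one semantic region; the names `SemanticRegion`, `rewrite`, and `answer_probe` are hypothetical and not taken from CoopSC.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = float("inf")
Interval = Tuple[float, float]  # half-open (lo, hi]: lo < value <= hi


@dataclass
class SemanticRegion:
    description: Interval                                  # query description
    result_set: List[dict] = field(default_factory=list)   # cached tuples


def rewrite(query: Interval, region: SemanticRegion) -> Tuple[Optional[Interval], List[Interval]]:
    """Split `query` into a probe (answered by `region`) and remainder intervals (sent to the server)."""
    q_lo, q_hi = query
    r_lo, r_hi = region.description
    lo, hi = max(q_lo, r_lo), min(q_hi, r_hi)
    if lo >= hi:  # no overlap: the whole query is remainder
        return None, [query]
    remainder = []
    if q_lo < lo:
        remainder.append((q_lo, lo))
    if hi < q_hi:
        remainder.append((hi, q_hi))
    return (lo, hi), remainder


def answer_probe(region: SemanticRegion, probe: Interval, attr: str) -> List[dict]:
    """Evaluate the probe locally by filtering the cached result set."""
    lo, hi = probe
    return [t for t in region.result_set if lo < t[attr] <= hi]


region = SemanticRegion((25, 40), [{"age": 26}, {"age": 35}])
probe, remainder = rewrite((20, 30), region)  # probe (25, 30), remainder [(20, 25)]
local_tuples = answer_probe(region, probe, "age")  # [{"age": 26}]
```

Only the remainder intervals would be translated back into SQL and shipped to the server; the final answer is the union of the local and server tuples.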


Database Caching & Cloud Computing

Most cloud providers charge for data transferred between the cloud environment and the “outside world” on a pay-as-you-go basis

Database caching within the cloud environment
– Improves performance
– Economic benefits

• Less data is transferred

• Payments for data transfer are reduced
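Under pay-as-you-go pricing, the payment grows linearly with the bytes crossing the cloud boundary, so every tuple served by a cache inside the cloud is a tuple that is not paid for. The sketch below shows this arithmetic; the price per GB, the tuple size, and the cache hit ratio are hypothetical assumptions, not values from the evaluation.

```python
# All figures below are hypothetical assumptions for illustration; they are
# not prices or measurements from the CoopSC evaluation.
PRICE_PER_GB = 0.12   # assumed pay-as-you-go transfer price (USD per GB)
TUPLE_BYTES = 200     # assumed average size of one result tuple in bytes


def transfer_cost(tuples_sent: int) -> float:
    """Payment for shipping `tuples_sent` tuples across the cloud boundary."""
    gigabytes = tuples_sent * TUPLE_BYTES / 1e9
    return gigabytes * PRICE_PER_GB


def savings(total_tuples: int, cache_hit_ratio: float) -> float:
    """Money saved when a fraction of the tuples is served by caches inside the cloud."""
    served_by_server = round(total_tuples * (1 - cache_hit_ratio))
    return transfer_cost(total_tuples) - transfer_cost(served_by_server)
```

With these assumed numbers, 10,000,000 tuples cost about 0.24 USD without caching, and an 80% cache hit ratio saves about 0.19 USD of that.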


Approach


Cooperative Semantic Caching

Share local semantic caches between clients

Use cache entries of other clients

Performance improvements

[Figure: three clients, each with its own semantic cache, sharing cache entries]


Cooperative Semantic Caching

Q1: select * from persons where age > 10

R1: age > 10 (cached region holding the result of Q1)

Q3: select * from persons where age > 7
– Probe on R1: select * from R1
– Remainder: select * from persons where age > 7 and age <= 10
– Result: union of the probe and remainder answers
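The Q1/Q3 rewriting can be reproduced with a small helper. This is a minimal sketch that only handles one-sided predicates of the form `attr > bound`; the function `rewrite_lower_bound` and its parameters are hypothetical, and the generated SQL mirrors the strings on the slide.

```python
from typing import Optional, Tuple


def rewrite_lower_bound(table: str, attr: str, query_lo: float,
                        region: str, region_lo: float) -> Tuple[str, Optional[str]]:
    """Rewrite `select * from table where attr > query_lo` against a cached
    region `attr > region_lo`. Returns (probe, remainder); remainder is None
    when the region covers the whole query."""
    if region_lo <= query_lo:
        # The region contains every tuple the query needs: filter it locally.
        return f"select * from {region} where {attr} > {query_lo}", None
    # The region covers only attr > region_lo; the gap goes to the server.
    probe = f"select * from {region}"
    remainder = (f"select * from {table} "
                 f"where {attr} > {query_lo} and {attr} <= {region_lo}")
    return probe, remainder


probe, remainder = rewrite_lower_bound("persons", "age", 7, "R1", 10)
# probe     -> "select * from R1"
# remainder -> "select * from persons where age > 7 and age <= 10"
```

In the cooperative setting, R1 may live in another client's cache, in which case the probe becomes a remote probe.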


Potential Use Cases

GIS (Geographic Information System) storage
– Large amounts of data (e.g., seismic events)
– Processing done on the client side
– Two-dimensional range selections (area)

NetFlow-based architectures
– Routers collect flow records and store them in databases
– Analyzers (intrusion detection, accounting,… ) access them
– Range selections (start time, IP address)
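NetFlow selections on start time and IP address become numeric range predicates once an IP prefix is mapped to an integer interval, which makes them fit the range-selection model. The sketch below assumes closed intervals and IPv4 prefixes; `flow_query_box` and `contains` are hypothetical helpers, while `ipaddress.ip_network` and `datetime.timestamp` are standard-library calls.

```python
import ipaddress
from datetime import datetime, timezone
from typing import Tuple

Box = Tuple[Tuple[float, float], Tuple[int, int]]  # (time range, IP range), closed


def flow_query_box(start: datetime, end: datetime, network: str) -> Box:
    """Map a selection on start time and an IP prefix to a 2-D numeric box."""
    net = ipaddress.ip_network(network)
    return ((start.timestamp(), end.timestamp()),
            (int(net.network_address), int(net.broadcast_address)))


def contains(outer: Box, inner: Box) -> bool:
    """True if every dimension of `inner` lies inside `outer`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def utc(hour: int, minute: int = 0) -> datetime:
    return datetime(2012, 6, 6, hour, minute, tzinfo=timezone.utc)


cached = flow_query_box(utc(10), utc(11), "10.0.0.0/8")       # cached region
query = flow_query_box(utc(10, 15), utc(10, 25), "10.1.0.0/16")
```

Here `contains(cached, query)` is true, so a semantic cache holding the wider region could answer the narrower analyzer query without contacting the server.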


Query Rewriting

Query rewriting
– Probe
– Remote probes
– Remainder

[Figure: query rewriting in the cooperative setting. Using the descriptions of all cached queries, the query is split into a probe (local semantic cache), remote probes (remote semantic caches), and a remainder (server).]


System Design


CoopSC

CoopSC: Cooperative Semantic Caching

Query types
– Selection (n-dimensional range predicates)
– Example: select id, name, age from persons where 20 < age and age < 30

Cache organization
– Semantic regions
– Distributed index, built on top of a P2P overlay


CoopSC - Query Rewriting

Local rewriting
– Probe
– Local remainder
  • Portion of the query that is not available in the local cache

Distributed rewriting
– Remote probes
– Remainder

[Figure: local rewriting splits the query into a probe (local cache) and a local remainder; distributed rewriting then uses the distributed index to split the local remainder into remote probes and a final remainder.]
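The two rewriting stages can be sketched as repeated splitting of the still-unanswered pieces. This minimal sketch assumes one-dimensional half-open intervals (lo, hi] and receives the remote regions as a plain list of (owner, interval) pairs; in CoopSC these would come from the distributed index. `split` and `coop_rewrite` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open (lo, hi]


def split(pieces: List[Interval], region: Interval) -> Tuple[List[Interval], List[Interval]]:
    """Split each piece into the part covered by `region` (probes) and the rest."""
    probes, rest = [], []
    r_lo, r_hi = region
    for lo, hi in pieces:
        a, b = max(lo, r_lo), min(hi, r_hi)
        if a >= b:            # no overlap with this region
            rest.append((lo, hi))
            continue
        probes.append((a, b))
        if lo < a:
            rest.append((lo, a))
        if b < hi:
            rest.append((b, hi))
    return probes, rest


def coop_rewrite(query: Interval, local_regions: List[Interval],
                 remote_regions: List[Tuple[str, Interval]]):
    # Local rewriting: probes on the local cache, the rest is the local remainder.
    pieces = [query]
    local_probes: List[Interval] = []
    for region in local_regions:
        p, pieces = split(pieces, region)
        local_probes += p
    # Distributed rewriting: remote probes on other clients' regions.
    remote_probes: List[Tuple[str, Interval]] = []
    for owner, region in remote_regions:
        p, pieces = split(pieces, region)
        remote_probes += [(owner, x) for x in p]
    # Whatever is still uncovered is the remainder sent to the server.
    return local_probes, remote_probes, pieces
```

For example, `coop_rewrite((0, 100), [(40, 60)], [("c2", (0, 30)), ("c3", (55, 80))])` yields the local probe (40, 60), remote probes on c2 and c3, and the remainder [(30, 40), (80, 100)].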


Distributed Index

Built on top of a P2P overlay

Regions and queries represented as rectangular shapes

MX-CIF quad tree
– Efficiently finds intersections between rectangular shapes

Each region is indexed in the smallest quad that totally contains it

Easy to adapt to n-dimensional regions/queries
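The MX-CIF rule "index each region in the smallest quad that totally contains it" can be sketched as a descent from the root that stops as soon as the rectangle straddles a split line. The sketch assumes a normalized 2-D space [0, 1]^2 and encodes quads as hypothetical quad keys (one digit per level: bit 0 = east half, bit 1 = north half); how keys are mapped onto P2P peers is not shown and would be an additional design choice.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def smallest_enclosing_quad(rect: Rect, space: Rect = (0.0, 0.0, 1.0, 1.0),
                            max_depth: int = 16) -> str:
    """Return the quad key of the smallest quad that totally contains `rect`.
    The empty key denotes the root quad (the whole space)."""
    x0, y0, x1, y1 = space
    key = ""
    for _ in range(max_depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        if rect[2] <= mx:
            east = False
        elif rect[0] >= mx:
            east = True
        else:
            break  # rect straddles the vertical split: stop here
        if rect[3] <= my:
            north = False
        elif rect[1] >= my:
            north = True
        else:
            break  # rect straddles the horizontal split: stop here
        key += str(int(east) + 2 * int(north))
        x0, x1 = (mx, x1) if east else (x0, mx)
        y0, y1 = (my, y1) if north else (y0, my)
    return key
```

A query can then locate candidate regions by visiting the quads on its own key path and the quads below it, which is what makes intersection lookups efficient.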


Update Handling

Issues
– Invalidation of old entries
– Combining different snapshots can generate inconsistencies

Space divided into quads at a specified update level

Virtual timestamps stored in the database

Each modification increments the virtual timestamp of the corresponding quad

Regions store the virtual timestamps of the quads they intersect
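A minimal sketch of the virtual-timestamp bookkeeping, assuming a 2-D space normalized to [0, 1]^2 so that the quads at update level L form a 2^L × 2^L grid. `cells_at_level` and `VirtualTimestamps` are hypothetical names; a region counts as touching a cell even when it only meets the cell border, which errs on the side of invalidating too much rather than too little.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in [0, 1]^2
Cell = Tuple[int, int]


def cells_at_level(rect: Rect, level: int) -> List[Cell]:
    """Quads of the 2**level x 2**level grid that `rect` intersects (conservative)."""
    n = 2 ** level
    i_lo, i_hi = min(n - 1, int(rect[0] * n)), min(n - 1, int(rect[2] * n))
    j_lo, j_hi = min(n - 1, int(rect[1] * n)), min(n - 1, int(rect[3] * n))
    return [(i, j) for i in range(i_lo, i_hi + 1) for j in range(j_lo, j_hi + 1)]


class VirtualTimestamps:
    """Per-quad modification counters, as kept by the database."""

    def __init__(self, level: int):
        self.level = level
        self.ts: Dict[Cell, int] = {}

    def modify(self, x: float, y: float) -> None:
        """An update at point (x, y) increments the timestamp of its quad."""
        n = 2 ** self.level
        cell = (min(n - 1, int(x * n)), min(n - 1, int(y * n)))
        self.ts[cell] = self.ts.get(cell, 0) + 1

    def snapshot(self, rect: Rect) -> Dict[Cell, int]:
        """Timestamps a region stores for the quads it intersects."""
        return {c: self.ts.get(c, 0) for c in cells_at_level(rect, self.level)}

    def is_valid(self, stored: Dict[Cell, int]) -> bool:
        """A cached region is valid if none of its quads changed since caching."""
        return all(self.ts.get(c, 0) == t for c, t in stored.items())
```

For instance, with update level 2 a region (0.1, 0.1, 0.3, 0.2) stores timestamps for quads (0, 0) and (1, 0); an update at (0.9, 0.9) leaves it valid, while an update at (0.2, 0.1) invalidates it.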


Cloud Computing Scenarios


Cloud Scenario A

Database server running outside the cloud

Clients located inside the cloud

Non-operational use cases
– Example: a cloud environment used for running scientific experiments


Cloud Scenario B

Database server running inside the cloud

Clients located outside the cloud

Operational use cases
– Example: a corporation using a cloud environment as an alternative to building a data center


Evaluation


Experiment Design

Measurements
– Response time
– Amount of data transferred
– Payments for data transfer

Experiments
– Cache size
– Update level

Testing sessions
– 5 select testing sessions (50 queries each)
– Interleaved update sessions


Evaluation

Wisconsin benchmark dataset (10,000,000 tuples)

Scenario A
– Database server: Zurich testbed
– 5 clients: Rackspace

Scenario B
– Database server: Amazon EC2
– 5 clients: EmanicsLab

Queries
– About 10,000 tuples
– Semantic locality


Scenario A


Data transferred/Payments

CoopSC significantly reduces the number of tuples sent by the database server

Payments for data transfer also reduced


Response Time

Rackspace performance is unstable

No response-time improvement observed


Scenario B


Data transferred/Payments

CoopSC significantly reduces the number of tuples sent by the database server

Bandwidth payments also reduced


Response Time

CoopSC improves response time


Data transferred/Payments (Updates)

Good behavior for low update rates

Economic and performance benefits


Response Times (Updates)

Response time increases as the update rate grows


Summary & Conclusion

Summary
– Cooperative caching approach used to reduce the load on the database server
– Update statements supported
– CoopSC applied in the context of cloud environments

CoopSC reduces the amount of data transferred between the cloud and the outside world, which yields economic benefits

Performance benefits as long as cloud providers are stable


Questions?


Update Handling - Algorithm

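procedure Execute(query)
    // quads at the update level that the query intersects
    quads = query.getIntersectedQuads(updateLevel);
    before = database.getTimestamps(quads);
    plan = rewrite(query, before);
    result = plan.execute();
    after = database.getTimestamps(quads);
    // unchanged timestamps: no update touched these quads meanwhile
    if (before == after) return result;
    // otherwise fall back to executing the query on the database server
    else return database.execute(query);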
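The Execute procedure is an optimistic check: read the quad timestamps, run the cache-based plan, read them again, and fall back to the server if anything changed. Below is a runnable sketch under simplifying assumptions: queries are one-dimensional ranges (lo, hi], the intersected quads are passed in rather than computed from the update level, and `Database` is a hypothetical in-memory stand-in for the server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Quad = Tuple[int, int]
Plan = Callable[[], List[int]]


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the database server."""
    rows: List[int]
    timestamps: Dict[Quad, int] = field(default_factory=dict)

    def get_timestamps(self, quads: List[Quad]) -> Dict[Quad, int]:
        return {q: self.timestamps.get(q, 0) for q in quads}

    def execute(self, lo: int, hi: int) -> List[int]:
        return [r for r in self.rows if lo < r <= hi]


def execute(lo: int, hi: int, quads: List[Quad], database: Database,
            rewrite: Callable[[int, int, Dict[Quad, int]], Plan]) -> List[int]:
    before = database.get_timestamps(quads)
    plan = rewrite(lo, hi, before)   # probes + remote probes + remainder
    result = plan()
    after = database.get_timestamps(quads)
    if before == after:              # no concurrent update: cached result is consistent
        return result
    return database.execute(lo, hi)  # fall back to the server
```

If a concurrent update bumps a timestamp while the plan runs, the possibly inconsistent cached answer is discarded and the server's answer is returned instead.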
