HOLISTIC OPTIMIZATION BY PREFETCHING QUERY RESULTS Karthik Ramachandra & S. Sudarshan Indian Institute of Technology Bombay SUPPORTED BY MSR India PhD fellowship Yahoo! Key Scientific Challenges Award 2011
Feb 23, 2016
HOLISTIC OPTIMIZATION BY PREFETCHING QUERY RESULTS
Karthik Ramachandra & S. SudarshanIndian Institute of Technology Bombay
SUPPORTED BY MSR India PhD fellowship Yahoo! Key Scientific Challenges Award 2011
2
THE LATENCY PROBLEM Applications that interact with databases/web
services experience lot of latency due to Network round trips to the data source Disk IO and processing at the data source
ApplicationDatabase
Disk IO and query executionNetwork time
Query
Result
3
MOTIVATION
Multiple queries could be issued concurrently Allows the database to share work across multiple queries
Application performs other processing while query executes Significantly reduces the impact of network round-trip and
server/disk IO latency
Performance of applications can be significantly improved by prefetching query results.
Manually inserting prefetch instructions is hard. Need to identify earliest and safe points in the code to perform
prefetching For queries within nested procedures prefetching has to be done
in the calling procedure to get benefits Hard to manually maintain as code changes occurOur Goal: Automate the insertion of prefetches
4
EXAMPLE OF PREFETCHING
executeQuery () – normal execute query submit() – non-blocking call that initiates query and returns
immediately; once the results arrive, they are stored in a cache executeQuery() – checks the cache and blocks if results are
not yet available
for (…) {
… genReport(custId, city);}void genReport(int custId, String city) { city = … while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
for (…) {
… genReport(custId, city);}void genReport(int custId, String city) { submit(q1, custId); city = … submit(q2, city); while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
5
EXAMPLE OF PREFETCHINGfor (…) {
… genReport(custId, city);}void genReport(int custId, String city) { city = … while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
for (…) {
… genReport(custId, city);}void genReport(int custId, String city) { submit(q1, custId); city = … submit(q2, city); while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
What is the earliest point when we can prefetch? Will prefetch potentially get wasted?
6
EXAMPLE OF PREFETCHINGfor (…) {
… genReport(custId, city);}void genReport(int custId, String city) { city = … while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
for (…) { submit(q1, custId); … genReport(custId, city);}void genReport(int custId, String city) { city = … submit(q2, city); while (…){
… } rs1 = executeQuery(q1, custId); rs2 = executeQuery(q2, city); …}
What is the earliest point when we can prefetch? Will prefetch potentially get wasted? Intra- vs. Inter- procedural prefetching
7
RELATED WORK Software prefetching extensively used in
compilers, databases and other areas of computer science
Predicting future accesses is based on Spatial and temporal locality Request patterns and statistical methods Static analysis
Query result prefetching based on request patterns
Fido (Palmer et.al 1991), AutoFetch (Ibrahim et.al ECOOP 2006), Scalpel (Bowman et.al. ICDE 2007), etc.
Predict future queries using traces, traversal profiling, logs
Missed opportunities due to limited applicability Potential for wasted prefetches
8
STATIC ANALYSIS BASED APPROACHES
Manjhi et. al. 2009 – insert prefetches based on static analysis No details of how to automate Only consider straight line
intraprocedural code Prefetches may go waste
Our earlier work Guravannavar et. al. VLDB 08 –
given query in a loop, rewrite loop to create batched query Chavan et. al. ICDE 11 – as above but using asynchronous
query submission Consider: Loop calls procedure, which executes query
Common in many database applications Our earlier work is applicable, but requires very intrusive
rewriting of procedures
getAllReports() { for (custId in …) { … genReport(custId); }}void genReport(int cId) { … r = executeQuery(q, cId); …}
9
OUR CONTRIBUTIONS IN THIS PAPER
Prefetching algorithm purely based on static analysis Inserts prefetches at earliest possible point in the
program Uses notion of query anticipability; no wasted prefetches*
Works in the presence of loops and interprocedural code Enhancements that optimize prefetches
Code motion, chaining and rewriting prefetch requests Increasing applicability
Integrating with loop fission Applicability for Hibernate, Web Services
Experimental study on real world applications
* except due to exceptions
11
INTRAPROCEDURAL PREFETCHING
void report(int cId,String city){ city = … while (…){ … } c = executeQuery(q1, cId); d = executeQuery(q2, city); …}
Approach: Identify valid points of prefetch
insertion within a procedure Place prefetch request submit(q, p)
at the earliest point
Valid points of insertion of prefetch All the parameters of the query
should be available, with no intervening assignments No intervening updates to the database Should be guaranteed that the query will be
executed subsequently Systematically found using Query Anticipability analysis
extension of a dataflow analysis technique called anticipable expressions analysis
13
QUERY ANTICIPABILITY ANALYSIS
void report(int cId,String city){ city = … while (…){ … } rs1 = executeQuery(q1, cId); rs2 = executeQuery(q2, city); …}
Bit vector = (q1,q2) = anticipable (valid) = not anticipable (invalid) Backward data flow in the
control flow graph
n1n2n3
n4n5
( , )
( , )
( , )( , )
( , )
( , )( , )
( , )
( , )
( , )( , )
start
n1
n2
n4
n5
n3
end
( , )
14
QUERY ANTICIPABILITY ANALYSISDefinition 3.1. A query execution statement q is anticipable at a program point u if every path in the CFG from u to End contains an execution of q which is not preceded by any statement that modifies the parameters of q or affects the results of q.
Data flow information Stored as bit vectors (1 bit per query) Propagated against the direction of control flow
(Backward Dataflow Analysis) Captured by a system of data flow equations Solve equations iteratively till a fixpoint is
reached Details in the paper
u
End
q
15
Data dependence barriers Due to assignment to
query parameters or UPDATEs
Append prefetch to the barrier statement
Control dependence barriers Due to conditional
branching(if-else or loops)
Prepend prefetch to the barrier statement
INTRAPROCEDURAL PREFETCH INSERTION Analysis identifies all points in the program where q
is anticipable; we are interested in earliest points
n1: x =…
n2
nq: executeQuery(q,x)
n2
nq: executeQuery(q,x)n3
n1: if(…)
submit(q,x)submit(q,x)
16
INTRAPROCEDURAL PREFETCH INSERTION
q2 only achieves overlap with the loop q1 can be prefetched at the beginning of the
method
void report(int cId,String city){
city = …
while (…){ … } rs1 = executeQuery(q1, cId); rs2 = executeQuery(q2, city); …}
void report(int cId,String city){ submit(q1, cId); city = … submit(q2, city); while (…){ … } rs1 = executeQuery(q1, cId); rs2 = executeQuery(q2, city); …}
17
INTRAPROCEDURAL PREFETCH INSERTION
q2 only achieves overlap with the loop q1 can be prefetched at the beginning of the
method Can be moved to the method that invokes report()
void report(int cId,String city){
city = …
while (…){ … } rs1 = executeQuery(q1, cId); rs2 = executeQuery(q2, city); …}
void report(int cId,String city){ submit(q1, cId); city = … submit(q2, city); while (…){ … } rs1 = executeQuery(q1, cId); rs2 = executeQuery(q2, city); …}
18
INTERPROCEDURAL PREFETCHING
Benefits of prefetching can be greatly improved by moving prefetches across method invocations
Intuition: if a prefetch can be submitted at the beginning of a procedure, it can instead be moved to all its call sites
Use call graph of the program, and CFGs of all procedures
Assumption: Call graph is a DAG (we currently do not handle recursive calls)
19
INTERPROCEDURAL PREFETCHING ALGORITHM (INTUITION)
Iterate through the vertices of the call graph in reverse topological order
Perform intraprocedural prefetching at each method M If first statement is a submit(), then move it to all
callers of M at the point of invocation Replace formal parameters with actual arguments
void generateAllReports() { … genReport(custId, city);}void genReport(int cId, String city) { submit(q2, cId); … rs1 = executeQuery(q2, cId); …}
void generateAllReports() { … submit(q2, custId); genReport(custId, city);}void genReport(int cId, String city) { … rs1 = executeQuery(q2, cId); …}
20
PREFETCHING ALGORITHM: SUMMARY Our algorithms ensure that:
The resulting program preserves equivalence with the original program.
All existing statements of the program remain unchanged.
No prefetch request is wasted. At times, prefetches may lead to no benefits
Enhancements to getbeneficial prefetcheseven in presence ofbarriers
Equivalence preserving program and query transformations
void proc(int cId){ int x = …; while (…){ … } if (x > 10) c = executeQuery(q1, cId); …}
21
ENHANCEMENTS
INCREASING APPLICABILITY
SYSTEM DESIGN AND EXPERIMENTAL EVALUATION
PREFETCH INSERTION ALGORITHM
22
1. TRANSITIVE CODE MOTION (STRONG ANTICIPABILITY)
General Algorithm: Control dependence barrier:
Transform it into a data dependence barrier by rewriting it as a guarded statement
Data dependence barrier: Apply anticipability analysis on the barrier statements Move the barrier to its earliest point followed by the prefetch
void genReport(int cId){ int x = …;
while (…){ … } if (x > 10) rs1 = executeQuery(q1, cId); …}
void genReport(int cId){ int x = …; boolean b = (x > 10); if (b) submit(q1, cId); while (…){ … } if (b) rs1 = executeQuery(q1, cId); …}
23
2. CHAINING PREFETCH REQUESTS Output of a query forms a parameter to another –
commonly encountered Prefetch of query 2 can be issued immediately
after results of query 1 are available. submitChain similar to submit ; details in paper
void report(int cId,String city){ … c = executeQuery(q1, cId); while (c.next()){ accId = c.getString(“accId”); d = executeQuery(q2, accId); }}
void report(int cId,String city){ submitChain({q1, q2’}, {{cId}, {}}); … c = executeQuery(q1, cId); while (c.next()){ accId = c.getString(“accId”); d = executeQuery(q2, accId); }}
q2’ is q2 with its ? replaced by q1.accId
q2 cannot be beneficially prefetchedas it depends on accId which comesfrom q1
Typo in paper: In Section 5.2 on chaining and in Figure 10: replace all occurrences of q2 by q4
24
3. REWRITING CHAINED PREFETCH REQUESTS
Chained SQL queries have correlating parameters between them (q1.accId)
Can be used to rewrite them into one query using known techniques such as OUTER APPLY or LEFT OUTER LATERAL operators
Results are split into individual result sets in cache Reduces network round trips, aids in selection of set
oriented query plans
submitChain({“SELECT * FROM accounts WHERE custid=?”, “SELECT * FROM transactions WHERE
accId=:q1.accId”}, {{cId}, {}});
SELECT ∗FROM (SELECT ∗ FROM accounts WHERE custId = ?)
OUTER APPLY(SELECT ∗ FROM transactions WHERE transactions.accId =
account.accId)
25
INCREASING APPLICABILITY
PREFETCH INSERTION ALGORITHM
ENHANCEMENTS
SYSTEM DESIGN AND EXPERIMENTAL EVALUATION
26
INTEGRATION WITH LOOP FISSION
for (…) { … genReport(custId);}
void genReport(int cId) { … r=executeQuery(q, cId); …}
for (…) { … submit(q,cId); genReport(custId);}
void genReport(int cId) { … r=executeQuery(q, cId); …}
for (…) { … addBatch(q, cId);}submitBatch(q);for (…) { genReport(custId);}
void genReport(int cId) { … r=executeQuery(q, cId); …}
Original program Loop Fission
Interprocedural Prefetching enables our earlier work (VLDB08 and ICDE11) on loop fission for Batching/Asynchronous submission
27
HIBERNATE AND WEB SERVICES Lot of enterprise and web applications
Are backed by O/R mappers like Hibernate They use the Hibernate API which internally generate
SQL Well known performance problems when accessing data
in a loop Are built on Web Services
Typically accessed using APIs that wrap HTTP requests and responses
To apply our techniques here, Transformation algorithm has to be aware of the
underlying data access API Runtime support to issue asynchronous prefetches
28
SYSTEM DESIGN AND EXPERIMENTAL EVALUATION
PREFETCH INSERTION ALGORITHM
ENHANCEMENTS
INCREASING APPLICABILITY
29
SYSTEM DESIGN: DBRIDGE Our techniques have been incorporated into the
DBridge holistic optimization tool Two components:
Java source-to-source program Transformer Uses SOOT framework for static analysis and
transformation (http://www.sable.mcgill.ca/soot/) Preserves readability
Prefetch API (Runtime library) For issuing prefetch requests Thread and cache management Can be used with manual writing/rewriting or automatic
rewriting by DBridge transformer Currently works for JDBC API; being extended
for Hibernate and Web services
30
EXPERIMENTS Conducted on 4 applications
Two public benchmark applications (Java/JDBC) A real world commercial ERP application(Java/JDBC) Twitter Dashboard application (Java/Web Service)
Environments A widely used commercial database system – SYS1 PostgreSQL
Both running on a 64 bit dual-core machine with 4 GB of RAM
31
AUCTION APPLICATION (JAVA/JDBC): INTRAPROCEDURAL PREFETCHING
Single procedure with nested loop Overlap of loop achieved; varying iterations of outer loop Consistent 50% improvement
for(…) { for(…) { … } exec(q);}
for(…) { submit(q); for(…) { … } exec(q);}
32
WEB SERVICE (HTTP/JSON): INTERPROCEDURAL PREFETCHING
Twitter dashboard: monitors 4 keywords for new tweets (uses Twitter4j library)
Interprocedural prefetching; no rewrite possible 75% improvement at 4 threads Server time constant; network overlap leads to significant gain
Note: Our system does not automatically rewrite web service programs, this example was manually rewritten using our algorithms
33
ERP APPLICATION: IMPACT OF OUR TECHNIQUES
Intraprocedural: moderate gains Interprocedural: substantial gains (25-30%) Enhanced (with rewrite): significant gain(50% over Inter) Shows how these techniques work together
34
FUTURE WORK Which calls to prefetch? And where to place
them? Cost-based speculative prefetching Implementation
Cross-thread transaction support in runtime library Completely support Hibernate, web services
CONCLUSION Automatically prefetching query results using
static analysis is widely applicable in many real
applications can lead to significant gains.
35
MORE QUESTIONS?
Today, 15:00 – 16:30PODS/SIGMOD Research Plenary Poster SessionLocation: Vaquero Ballroom B–C
PROJECT WEBSITE: http://www.cse.iitb.ac.in/infolab/dbridge
QUESTIONS?