24/06/2004 Programming Language Design and Implementation
Optimizations in XSLT
http://www-sato.cc.u-tokyo.ac.jp/schuko/XSLT-opt.ppt
24/June/04
Evaluation of Stylesheets
• 1. Find the template which matches ‘/’
  2. Evaluate the template
     Evaluate each child in order
     if (child is LiteralResultElement) {
       Make the same node, and evaluate children as its children
       (depends on the recursive structure of trees)
     } else
Need for Optimization
• Example: Matrix Multiplication (Itanium2 1.5GHz, 6MB 3rd-level cache; Intel Fortran Ver. 8.0)
  Option   -O0    -O1    -O2     -O3
  MFLOPS   8.53   94.8   140.1   3762.2
List of Optimizations
• -O2
  – Architecture Specific Optimizations such as global code scheduling, software pipelining, predication, speculation.
  – Inlining of intrinsics.
  – Architecture Independent Optimizations such as
List of Optimizations (2)
• Higher Level Optimization
  – Constant propagation, copy propagation, dead-code elimination, global register allocation, global instruction scheduling, control speculation, loop unrolling, code selection, partial redundancy elimination, strength reduction, induction variable simplification, variable renaming, exception optimization, tail recursion elimination, peephole optimization, structure assignment lowering, dead store elimination
List of Optimizations (3)
• -O3
  – Prefetching, scalar replacement
  – Loop transformations
  Source of Performance Gain in most Technical Computing.
Points of Optimizations
• They are NEVER magic or ad-hoc technologies.
  – Program Analysis
  – Dataflow Equation based Global Analysis
  – Symbolic Evaluation/Partial Evaluation
  Semantics Based Optimizations.
Points of Optimizations(2)
• Architecture Specific Optimizations
  – New Features of Architectures
    • SuperScalar/VLIW
    • Vector Processing
    • Speculation
    • Prefetching
    • …
Points of Optimizations(3)
• Source-to-source conversion
• Accelerator + meta instruction
• Algorithm transformation
Source-to-Source Conversion
Before:
do i
  do j
    do k
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do
After (loops interchanged):
do j
  do k
    do i
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do
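The two loop nests compute the same product; only the order in which the elements are visited changes. A minimal Python sketch, assuming small square integer matrices stored as nested lists (the function names matmul_ijk and matmul_jki are illustrative and do not come from the slides):

```python
def matmul_ijk(a, b, n):
    """Original loop order: i outermost, k innermost."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_jki(a, b, n):
    """Interchanged loop order: j outermost, i innermost."""
    c = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for i in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# Both orders accumulate the same terms into each c[i][j].
print(matmul_ijk(a, b, 2) == matmul_jki(a, b, 2))
```

In Fortran, which stores arrays column by column, the interchanged order lets the innermost loop walk c(i,j) and a(i,k) with unit stride; the Python version only illustrates that the result is unchanged.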
Accelerator + meta instruction
Before:
do i
  do j
    do k
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do
After (OpenMP directive added):
!$omp parallel do
do i
  do j
    do k
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do
Algorithm Transformation
Call Bubblesort(a) → Call quicksort(a)
The transformation must be shown to preserve program semantics.
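One cheap way to gain confidence that such a replacement preserves behavior is to compare both algorithms on many inputs. The sketch below assumes lists of integers; bubble_sort and quick_sort are illustrative implementations, not code from the slides.

```python
import random


def bubble_sort(xs):
    """O(n^2) sort: repeatedly swap adjacent out-of-order elements."""
    xs = list(xs)
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs


def quick_sort(xs):
    """Simple out-of-place quicksort using the first element as pivot."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    return (quick_sort([x for x in rest if x < pivot])
            + [pivot]
            + quick_sort([x for x in rest if x >= pivot]))


# Compare the two algorithms on random inputs with a fixed seed.
rng = random.Random(0)
for _ in range(100):
    data = [rng.randint(0, 50) for _ in range(rng.randint(0, 20))]
    assert bubble_sort(data) == quick_sort(data)
print("same results on all sampled inputs")
```

Random testing gives evidence only, not the guarantee that an optimizer needs; details such as stability for records with equal keys would also have to be checked.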
Optimization and Tuning
• Tune – Adjust an engine to run smoothly
  – Performance tuning
  – Human side Job
• Optimize – make the most effective use of
  – Performance or other metrics
  – Complicated in general
  – Computer side Job
  – A Kind of (automatic) Program transformation
• They are Very Similar.
• Profiling is Critical for Tuning
Tuning
• Find the HOT SPOT
  – Most resource-consuming part
• Profiling Tools: profiling by sampling
  – cc -p, gcc -pg
  – prof, gprof
Tuning(2)
• Profiling with hardware support
  – Most modern processors
    • instruction counts
    • CPU time
    • Cache miss rate/cache hit rate
    • Hardware utilization (vector unit etc.)
HOT SPOTS in XSLT Evaluation
• Tuning of XSLT Engine + Stylesheet Optimization → Performance Improvement
• HOT SPOTS in XSLT Evaluation =
  – Evaluation of XPATH Expression
  – Template Instantiation
  (IN GENERAL, WHERE LOOP EXISTS)
Template Instantiation
• Evaluation of <xsl:apply-templates/>
  For each (target node)
    1. select matching template (query required)
    2. make frame
    3. call template
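The three steps can be sketched as a loop over the target nodes. This is a simplified illustration, assuming templates that match on element name only and skipping the built-in template rules of real XSLT; TEMPLATES, select_template, and apply_templates are hypothetical names.

```python
import xml.etree.ElementTree as ET


def render_title(node, frame):
    return "<h1>" + (node.text or "") + "</h1>"


def render_para(node, frame):
    return "<p>" + (node.text or "") + "</p>"


# Hypothetical template table: (element name to match, template body).
TEMPLATES = [("title", render_title), ("para", render_para)]


def select_template(node):
    """Step 1: query the table for the first template matching the node."""
    for name, body in TEMPLATES:
        if node.tag == name:
            return body
    return None


def apply_templates(current, depth=0):
    out = []
    for node in current:                      # for each target node
        body = select_template(node)          # 1. select matching template
        if body is None:                      # simplification: skip unmatched nodes
            continue
        frame = {"current": node, "depth": depth + 1}   # 2. make frame
        out.append(body(node, frame))         # 3. call template
    return "".join(out)


doc = ET.fromstring("<doc><title>T</title><para>p1</para><note>n</note></doc>")
print(apply_templates(doc))
```

Every iteration pays for a lookup, a frame, and a call, which is why this loop is a hot spot.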
Procedure Call Optimization
• Interprocedural Optimization
  – Dataflow across Calls
• Inlining (Inline Expansion, Procedure Integration)
  – Save Call Overhead
• Tail Call Elimination
  – Save Frame Overhead
Inlining
Before inlining:
void a()
{
    b(2);
}
void b(int x)
{
    printf("%d\n", x+1);
}
After inlining:
void a()
{
    printf("%d\n", 2+1);
}
void b(int x)
{
    printf("%d\n", x+1);
}
Effect of Inlining
• Call Overhead Reduction
  – Execution of Call:
    • Arguments → Stack
    • Return address → Stack
    • Address of subroutine → Program Counter
    • Make frame
    • Save registers
    • Execute
    • Destroy frame (stack unwind)
    • Return address → Program Counter
    • Destroy arguments
  Heavy
Effect of Inlining(2)
• Very Effective for small functions
  – Methods
  – inline keyword in C++
• Similar to macros
Effect of Inlining(3)
• Further Optimization across Calls
  – Most optimizations are done within a procedure.
• Code Size Increase (Drawback)
• Harder Program Analysis (Drawback)
Effect of Inlining(4)
• Alias problem in Fortran
  – Fortran does not assume aliases among arguments (they exist in reality, though)
  – If inlined, the compiler must check that there is no alias among arguments (this check often fails)
  – Then, poor code may be generated.
Note on Inlining
• Note that you must not do inlining manually. Be sure to write a program that does the inlining.
Inlining in XSLT
Before inlining:
…
<xsl:call-template name="a">
  <xsl:with-param name="prefix" select=".."/>
</xsl:call-template>
…
<xsl:template name="a">
  <xsl:param name="prefix"/>
  <path>
    <xsl:value-of select="$prefix"/> / <xsl:value-of select="."/>
  </path>
</xsl:template>
After inlining:
…
<path>
  <xsl:value-of select=".."/>
  /
  <xsl:value-of select="."/>
</path>
…
Tail Call Elimination
• Observation:
<xsl:template name="x">
  <xsl:param name="n"/>
  <xsl:choose>
    <xsl:when test="…">
      <xsl:call-template name="a">
        <xsl:with-param name="n" select="$n"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:call-template name="x">
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
Tail Call Elimination(2)
• Return from template a → immediate return
  – Tail Call
• Return from template x → immediate return
  – Tail Recursion
  – Used as LOOP.
• In this case, the caller’s frame can be destroyed at the calls of a or x.
• However, the call is still made, and a new frame is allocated for a and x.
Tail Call Elimination(3)
• Ordinary vs. with tail call elimination:
  Ordinary                 With tail call elimination
  …                        …
                           destroy frame of x
  call of a                jump to a
  make frame of a          make frame of a
  execute                  execute
  destroy frame of a       destroy frame of a
  return to x              return from a = return from x.
  destroy frame of x
  return from x.
Tail Call Elimination(4)
[Figure: stack of return addresses and frames, Before Elimination (call) and After Elimination (jump)]
Tail Call Elimination(5)
• This Optimization Requires a Jump to a procedure
  – Low Level Stack Manipulation such as
    • Frame Destruction
    • Return Address Identification
  – Are Required.
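In a language without direct access to the stack, the jump can be imitated with a driver loop (a trampoline). The sketch below is an illustration only: TailCall and run are hypothetical names, and the condition n == 0 stands in for the test that the slides leave elided.

```python
class TailCall:
    """A request to jump to `func` instead of calling it."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def run(func, *args):
    """Driver loop: each iteration discards the old frame before the next one starts."""
    result = func(*args)
    while isinstance(result, TailCall):
        result = result.func(*result.args)
    return result


def a(n):
    return "done at " + str(n)


def x(n):
    if n == 0:                      # assumed condition, elided in the slides
        return TailCall(a, n)       # tail call to another procedure
    return TailCall(x, n - 1)       # tail call to self, used as a loop


print(run(x, 100000))
```

Because x returns before the next procedure starts, at most one frame of x or a is live at any time, however many iterations the loop makes.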
Tail Recursion Elimination
• Tail Recursion ⊆ Tail Call
  – Tail Call to Self.
• Most Programming Languages have a goto construct
  – Can do Source-to-source conversion
• Frame creation/destruction = rewrite local variables
Tail Recursion Elimination(2)
• Observation:
Before:
int f(int n)
{
    if (n==0) return 0;
    else return f(n-1);
}
After:
int ff(int n)
{
top:
    if (n==0) return 0;
    else {
        n = n-1;     /* frame destruction/creation */
        goto top;    /* jump */
    }
}
Tail Recursion Elimination(3)
[Figure: stack Before Optimization (each call pushes a return address and n=…) and After Optimization (one return address, with n=n-1 updated in place)]
Effect of Tail Call Elimination
• Save Frame Creation/Destruction Cost
• Save Space for Frame Creation
  – Significant when LOOP is implemented as Tail Recursion (XSLT, most Functional Languages).
Tail Call Elimination in XSLT
• No GOTO Construct
  – × Source-to-source conversion (not possible)
• Optimization of XSLT Engine
  – Recognition of Tail Call/Recursion.
  – Frame Adjustment, and Jump.
Tail Recursion Elimination in General
int fact(int n)
{
if (n == 0) return 1;
else return n * fact(n-1);
}
Not Tail Recursion
Tail Recursion Elimination in General(2)
int fact2(int n, int res)
{
if (n==0) return res;
else return fact2(n-1, n * res);
}
Tail Recursion
Tail Recursion Elimination in General(3)
int fact2(int n, int res)
{
if (n==0) return res;
else return fact2(n-1, n*res);
}
int fact(int n)
{ return fact2(n, 1);}
After tail recursion elimination:
int fact2(int n, int res)
{
top:
    if (n==0) return res;
    else {
        res = n * res;   /* update res before n is decremented */
        n = n-1;
        goto top;
    }
}
int fact(int n)
{ return fact2(n,1);}
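The three stages of the factorial example can be checked against each other. A minimal Python sketch, with a while loop standing in for the goto (fact2_loop is an illustrative name):

```python
def fact(n):
    """Not tail recursive: the multiplication happens after the call returns."""
    return 1 if n == 0 else n * fact(n - 1)


def fact2(n, res=1):
    """Tail recursive: the recursive call is the last action."""
    return res if n == 0 else fact2(n - 1, n * res)


def fact2_loop(n, res=1):
    """fact2 with the tail call replaced by a jump back to the top."""
    while True:
        if n == 0:
            return res
        res = n * res   # must use the old n, so it comes before the decrement
        n = n - 1


print(fact(10), fact2(10), fact2_loop(10))
```

The order of the two assignments matters: the recursive call passes n * res computed with the old value of n, so the loop has to update res first.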
Tail Recursion Elimination in General(4)
• How to rewrite non-tail recursion into tail recursion.
  – Commutative, associative operations
  – Some Linearity in Call
  – Introduction of Intermediate Variables.
Accumulator type
Before:
fun f(n)
  …
  call f(n-1);
  some-instructions(n);
  return;
After (r is the accumulator):
fun f(n)
top:
  …
  r *= some-instructions(n);
  n ← n-1;
  goto top;
Accumulator type(2)
• If the call graph is of the type f → f → f → f → … (linear), then we can use r as an Accumulator:
  r ← insn(n);
  r ← r*insn(n-1);
  r ← r*insn(n-2);
  …
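The accumulator visits the values in the opposite order from the recursive version, which is why the rewriting conditions ask for commutative, associative operations. A small sketch under the assumption that the recursion combines insn(n) with the result of f(n-1) through a binary operation op with identity unit (f_rec, f_acc, op, and unit are illustrative names):

```python
def f_rec(n, insn, op, unit):
    """Recursive form: combine after the recursive call returns."""
    if n == 0:
        return unit
    return op(f_rec(n - 1, insn, op, unit), insn(n))


def f_acc(n, insn, op, unit):
    """Accumulator form: r <- r op insn(n); n <- n-1; goto top."""
    r = unit
    while n != 0:
        r = op(r, insn(n))
        n = n - 1
    return r


def mul(x, y):
    return x * y          # commutative and associative


def cat(x, y):
    return x + y          # on strings: associative but not commutative


print(f_rec(5, lambda n: n, mul, 1), f_acc(5, lambda n: n, mul, 1))
print(f_rec(3, str, cat, ""), f_acc(3, str, cat, ""))
```

With multiplication both forms agree; with string concatenation the accumulator produces the pieces in reverse order, so the rewrite would change the program's meaning.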
XPATH Expression Optimization
• Loop is a major source for improving performance.
• In XSLT, we have loops in recursion and in XPath expressions.
XML Tree Structure
• Not the same as Unix file system.
[Figure: XML tree with node labels x, b, a, a, b, b]
Simple Evaluator
• Evaluate(current, a/b/c):
    S ← φ;
    for-each x ∈ (child-node of current) {     (Loop)
      if (name(x) == a) {
        S ← S ∪ Evaluate(x, b/c);
      }
    }
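The evaluator can be written out as a short recursive function. This is a sketch for child-axis steps only, using Python's xml.etree.ElementTree; the base case (an empty path returns the current node) is an assumption, since the slide shows only the recursive step.

```python
import xml.etree.ElementTree as ET


def evaluate(current, steps):
    """Return the nodes reached from `current` by following the child steps."""
    if not steps:                              # assumed base case: path exhausted
        return [current]
    first, rest = steps[0], steps[1:]
    result = []                                # S <- empty
    for x in current:                          # for-each child x of current (the loop)
        if x.tag == first:                     # name(x) == a
            result.extend(evaluate(x, rest))   # S <- S U Evaluate(x, b/c)
    return result


doc = ET.fromstring(
    "<r><a><b><c>1</c><c>2</c></b></a><a><b><d/><c>3</c></b></a></r>")
print([n.text for n in evaluate(doc, ["a", "b", "c"])])
```

A list is enough here because child-only paths from a single start node never reach the same node twice; a general evaluator would need a real set union.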
Menu of Optimizations
• Partial Evaluation/Symbolic Evaluation
  – Statically Obtain Result before Evaluation.
• Dataflow Equation Based Optimization
  – Solve Equation for Optimality in Dataflow
• Redundancy Elimination
Menu of Optimizations(2)
• Loop Optimization
• Memory Hierarchy Optimization
• Hardware Resource Utilization
• Semantics Based Optimization
Partial Evaluation/Symbolic Evaluation
• Definition: Specialize Code by Replacing a Part of Code by Statically Evaluated Code.
• Static Evaluation and Specialization are Essential.
Example of Specialization
Before:
f(n, t)
{
    if (t == 0)
        return g(n);
    else
        return h(n);
}
p(n)
{
    return f(n, 0);
}
After specialization of f for t == 0:
p(n)
{
    return f0(n);
}
f0(n)
{
    return g(n);
}
Partial Evaluation in General
• Strictly, Partial Evaluation is a Specialization.
• However, together with Symbolic Evaluation, constant propagation and constant folding are also classified as Partial Evaluation.
Constant Propagation/Folding
Original:
a = 1;
if (a+1 == 1)
    return 1;
else
    return 2;
After constant propagation:
a = 1;
if (1+1 == 1)
    return 1;
else
    return 2;
After constant folding:
return 2;
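Both steps are mechanical tree rewrites. The sketch below assumes a toy expression format (nested tuples of the form (operator, left, right), strings for variables, integers for constants) and supports only + and ==; propagate and fold are illustrative names.

```python
def propagate(expr, env):
    """Constant propagation: replace variables whose value is known."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, propagate(left, env), propagate(right, env))
    return expr


def fold(expr):
    """Constant folding: evaluate operators whose operands are constants."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "==":
            return int(left == right)
    return (op, left, right)


cond = ("==", ("+", "a", 1), 1)        # a + 1 == 1
cond = propagate(cond, {"a": 1})       # 1 + 1 == 1
print(cond)
cond = fold(cond)                      # 0, i.e. false
print("return 1" if cond else "return 2")
```

Once the condition folds to a constant, the branch that can never run is dead code, which leaves only return 2.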
PE/SE in XSLT
• Evaluation of Variables, or Predicates
  – a/b[position() >= 0] (the predicate is always true, since position() starts at 1)
  – <xsl:variable name="a" select="1"/>
    … $a … → "1"
Redundancy Elimination
• Eliminate Redundant Computation in Dataflow.
A=x+3+y;
B=x+3+y;
Redundancy Elimination(2)
• Redundancy:
  – Compute the Same Expression
  – Compute the Known Value
• Dataflow Analysis Required.
Redundancy Elimination in XPATH
• Common SubExpression Elimination
  – A/B|A/C → A/(B|C)
  – Interpreted in operational semantics:
    for-each x ∈ (node current)
      if (name(x) == A) {
        for-each y ∈ (node child of x) {
          if (name(y) == B or C) {
            Sol = Sol ∪ {y}
          }
        }
      }
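The saving can be made visible by counting how many nodes the outer loop touches. A sketch using xml.etree.ElementTree, assuming child-axis steps only; the function names and the visit counter are illustrative.

```python
import xml.etree.ElementTree as ET


def eval_union_of_paths(current, a, b, c):
    """A/B | A/C: the children of `current` are scanned once per path."""
    visited, found = 0, []
    for name in (b, c):
        for x in current:
            visited += 1
            if x.tag == a:
                for y in x:
                    if y.tag == name:
                        found.append(y)
    return found, visited


def eval_factored(current, a, b, c):
    """A/(B|C): the common step A is evaluated only once."""
    visited, found = 0, []
    for x in current:
        visited += 1
        if x.tag == a:
            for y in x:
                if y.tag == b or y.tag == c:
                    found.append(y)
    return found, visited


doc = ET.fromstring("<r><A><B/><C/><D/></A><X/><A><C/></A></r>")
nodes1, visits1 = eval_union_of_paths(doc, "A", "B", "C")
nodes2, visits2 = eval_factored(doc, "A", "B", "C")
# Same node set (compared as sets, since the visiting order differs).
print(set(nodes1) == set(nodes2), visits1, visits2)
```

Both versions select the same nodes, but the factored form walks the children of the current node once instead of twice.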
Redundancy Elimination in XPATH(2)
• Loop Invariant Hoisting
  – A/B[../@category = ‘fiction’] → A[@category = ‘fiction’]/B
  – Interpreted in operational semantics:
    for-each x (node=current)
      if (name(x) = A) {
        for-each y ∈ (node child of x) {
          if (name(y) = B) {
            if (parent(y).@category = ‘fiction’) {
              Sol = Sol ∪ {y}
            }
          }
        }
      }
Redundancy Elimination in XPATH(3)
for-each x (node=current)
  if (name(x) = A) {
    if (x.@category = ‘fiction’) {
      for-each y ∈ (node child of x) {
        if (name(y) = B) {
          Sol = Sol ∪ {y}
        }
      }
    }
  }
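The effect of moving the attribute test out of the inner loop can be measured by counting how often the test runs. A sketch with xml.etree.ElementTree; because ElementTree keeps no parent pointers, the enclosing loop variable x plays the role of parent(y). The function names and the counter are illustrative.

```python
import xml.etree.ElementTree as ET


def eval_unhoisted(current):
    """A/B[../@category = 'fiction']: the parent test runs once per B child."""
    tests, sol = 0, []
    for x in current:
        if x.tag == "A":
            for y in x:
                if y.tag == "B":
                    tests += 1
                    if x.get("category") == "fiction":   # parent(y) is x
                        sol.append(y)
    return sol, tests


def eval_hoisted(current):
    """A[@category = 'fiction']/B: the test runs once per A node."""
    tests, sol = 0, []
    for x in current:
        if x.tag == "A":
            tests += 1
            if x.get("category") == "fiction":
                for y in x:
                    if y.tag == "B":
                        sol.append(y)
    return sol, tests


doc = ET.fromstring(
    "<r><A category='fiction'><B>1</B><B>2</B><B>3</B></A>"
    "<A category='science'><B>4</B><B>5</B></A></r>")
sol1, tests1 = eval_unhoisted(doc)
sol2, tests2 = eval_hoisted(doc)
print([b.text for b in sol1], tests1)
print([b.text for b in sol2], tests2)
```

The hoisted form also skips the inner loop entirely for A nodes that fail the test.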
Redundancy Elimination in XPATH(4)
• Value Number
  – DAG representation of expressions:
    • a + a * (b - c) + (b - c) * a * c
[Figure: DAG of the expression; the nodes a, c, and (b - c) are shared]
DAG Instructions
 1 mknode(id, a)
 2 mknode(id, a) = 1
 3 mknode(id, b)
 4 mknode(id, c)
 5 mknode(-, 3, 4)
 6 mknode(*, 2=1, 5)
 7 mknode(+, 1, 6)
 8 mknode(id, b) = 3
 9 mknode(id, c) = 4
10 mknode(-, 8=3, 9=4) = 5
11 mknode(id, a) = 1
12 mknode(*, 10=5, 11=1)
13 mknode(id, c) = 4
14 mknode(*, 12, 13=4)
15 mknode(+, 7, 14)
[Figure: DAG of a + a * (b - c) + (b - c) * a * c]
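The reuse of earlier nodes can be implemented with a lookup table keyed by the operator and its operands. A minimal sketch, assuming that only new nodes receive a number (so the numbers run from 1 to 9 rather than 1 to 15 as in the listing, which also counts the calls that reuse a node); DagBuilder and leaf are illustrative names.

```python
class DagBuilder:
    def __init__(self):
        self.table = {}      # (op, operands...) -> value number
        self.nodes = []      # nodes in creation order

    def mknode(self, op, *args):
        key = (op,) + args
        if key not in self.table:           # first occurrence: allocate a node
            self.nodes.append(key)
            self.table[key] = len(self.nodes)
        return self.table[key]              # otherwise reuse the old number


dag = DagBuilder()


def leaf(name):
    return dag.mknode("id", name)


# a + a * (b - c) + (b - c) * a * c
left = dag.mknode("+", leaf("a"),
                  dag.mknode("*", leaf("a"),
                             dag.mknode("-", leaf("b"), leaf("c"))))
right = dag.mknode("*",
                   dag.mknode("*", dag.mknode("-", leaf("b"), leaf("c")), leaf("a")),
                   leaf("c"))
root = dag.mknode("+", left, right)
print(len(dag.nodes), root)
```

The sketch does not exploit commutativity, so a * (b - c) and (b - c) * a stay separate nodes, as they do in the listing.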
Instructions → Register Transfer
Load a, r1
Load b, r3
Load c, r4
iSub r3, r4, r5
Imul r1, r5, r6
Iadd r1, r6, r7
Imul r5, r1, r12
Imul r12,r4, r14
Iadd r7, r14, r15
 1 mknode(id, a)
 2 mknode(id, a) = 1
 3 mknode(id, b)
 4 mknode(id, c)
 5 mknode(-, 3, 4)
 6 mknode(*, 2=1, 5)
 7 mknode(+, 1, 6)
 8 mknode(id, b) = 3
 9 mknode(id, c) = 4
10 mknode(-, 8=3, 9=4) = 5
11 mknode(id, a) = 1
12 mknode(*, 10=5, 11=1)
13 mknode(id, c) = 4
14 mknode(*, 12, 13=4)
15 mknode(+, 7, 14)
Redundancy Elimination in XPATH
• Dead Code Elimination
Redundancy Elimination in XPATH(4)
• There can be many other optimizations of the evaluation of XPath expressions.
• Node-set calculation includes loops, which are a major source of performance improvement.
Other Optimizations
• Dataflow Equations
• Type Checking
• Semantics Based Optimizations