Data Flow Analysis for Software Prefetching Linked Data Structures in Java
Brendon Cahoon, Dept. of Computer Science, University of Massachusetts, Amherst, MA
Kathryn S. McKinley, Dept. of Computer Sciences, University of Texas at Austin, Austin, TX
Jan 14, 2016
Motivation
• Object-oriented languages are mainstream
• Key performance issues
– Same old: processor-memory gap, parallelism
Combination of modern processors and languages results in poor memory performance
RSIM Performance: Compiled Java (Vortex), no GC
[Figure: bar chart of % execution time (0 to 100), split into Busy and Data Memory Stalls, for Health, Mst, Perimeter, Treeadd, BH, Bisort, Tsp, Voronoi, Em3d, and Power]
Prefetching Arrays vs. Objects
• Most prior work concentrates on arrays
– Compilers directly prefetch any element
– Loop transformations enable effective scheduling
– Successful results using both hardware and software
• Cannot use same techniques on linked data structures
– Objects are small and disjoint
– Access patterns are less regular and predictable
– Only know the address of directly connected objects
Software Data Prefetching for Java
Hide memory latency from linked structure traversals
• Introduced by Luk and Mowry for C programs:
– We add data flow and interprocedural analysis
– Identify pointer structures from declaration
– Find pointer chasing in loops and self recursive calls
• Challenges introduced by Java
– Dynamically allocated objects make analysis difficult
– Small methods obscure context
Outline
• Data flow analysis for identifying linked structures
– New intra- and interprocedural analysis
• Greedy prefetching
• Jump-pointer prefetching
• Experimental results
Identifying Linked Structure Traversals
We define a data flow solution:
– Intraprocedural for loops
– Interprocedural for recursion
Benefits:
– Independent of program representation
– Many compilers use data flow frameworks
– May be composed with other analyses
Loop
while (o != null) {
    t = o;
    …
    o = t.next;
}
Recursion
void visit() {
    …
    if (this.next != null)
        this.next.visit();
}
Data Flow Analysis
• Data flow information
– Sets of tuples: <variable, field name, statement, status>
• Status values: not recurrent, possibly, recurrent
– Not recurrent: initial value
– Possibly: first use of a field reference
– Recurrent: an object accessed in linked structure traversal
• Intraprocedural: forward, flow-sensitive, may analysis
• Interprocedural: bidirectional, context-sensitive
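The tuple and the three status values can be sketched in Java as below. This is a minimal illustration, not the authors' implementation: the names `StatusLatticeSketch`, `Fact`, `join`, and `merge` are hypothetical, and the ordering not recurrent < possibly < recurrent, with the higher status kept where control flow paths meet, is an assumption about what "may analysis" implies here.

```java
import java.util.HashMap;
import java.util.Map;

class StatusLatticeSketch {
    // Assumed ordering, lowest to highest: NOT_RECURRENT < POSSIBLY < RECURRENT.
    enum Status { NOT_RECURRENT, POSSIBLY, RECURRENT }

    // One data flow fact: <variable, field name, statement, status>.
    static final class Fact {
        final String variable, field, statement;
        final Status status;

        Fact(String variable, String field, String statement, Status status) {
            this.variable = variable;
            this.field = field;
            this.statement = statement;
            this.status = status;
        }
    }

    // Assumption: a may analysis keeps the higher status when two paths meet.
    static Status join(Status a, Status b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Merge the per-variable statuses arriving from two predecessor paths.
    static Map<String, Status> merge(Map<String, Status> left, Map<String, Status> right) {
        Map<String, Status> out = new HashMap<>(left);
        for (Map.Entry<String, Status> e : right.entrySet()) {
            out.merge(e.getKey(), e.getValue(), StatusLatticeSketch::join);
        }
        return out;
    }

    public static void main(String[] args) {
        Fact f = new Fact("t", "next", "s1", Status.POSSIBLY);
        Map<String, Status> thenPath = new HashMap<>();
        thenPath.put("o", Status.RECURRENT);
        Map<String, Status> elsePath = new HashMap<>();
        elsePath.put("o", Status.POSSIBLY);
        elsePath.put(f.variable, f.status);
        System.out.println(merge(thenPath, elsePath));
    }
}
```

Under this assumed join, a variable that is recurrent on either incoming path stays recurrent after the merge.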
Analysis Examples
Example 1
while (o != null) {
s1: t = o.next;
s2: o = t;
}
1st Iteration
s1: o is not recurrent, set t to possibly
s2: t is possibly, set o to possibly
2nd Iteration
s1: o is possibly, set t to recurrent
s2: t is recurrent, set o to recurrent

Example 2
while (o != null) {
s1: o = o.next;
s2: o = bar();
}
1st Iteration
s1: set o to possibly
s2: set o to not recurrent

Example 3
s1: o = o.next;
s2: o = o.next;
s1: set o to possibly
s2: set o to possibly
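The three examples can be replayed with a small simulator. This is a simplified sketch based on one reading of the slides, not the analysis itself: a field load yields "possibly" unless its source already carries a fact produced by the same field at the same statement, in which case it yields "recurrent"; a copy propagates the fact; any other assignment resets it. All class and method names are hypothetical, the loop is modelled by repeating the body until the state stops changing (no join at the loop header), and the code has only been checked against the three slide examples.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class RecurrenceSketch {
    enum Status { NOT_RECURRENT, POSSIBLY, RECURRENT }
    enum Kind { LOAD, COPY, OTHER } // dst = src.field; dst = src; dst = anything else

    // Status of a variable plus the field load (field, statement) that produced it.
    static final class Fact {
        final String field, stmt;
        final Status status;
        Fact(String field, String stmt, Status status) {
            this.field = field; this.stmt = stmt; this.status = status;
        }
        @Override public boolean equals(Object other) {
            if (!(other instanceof Fact)) return false;
            Fact f = (Fact) other;
            return status == f.status && Objects.equals(field, f.field)
                    && Objects.equals(stmt, f.stmt);
        }
        @Override public int hashCode() { return Objects.hash(field, stmt, status); }
    }

    static final Fact INITIAL = new Fact(null, null, Status.NOT_RECURRENT);

    static final class Stmt {
        final String label, dst, src, field;
        final Kind kind;
        Stmt(String label, Kind kind, String dst, String src, String field) {
            this.label = label; this.kind = kind;
            this.dst = dst; this.src = src; this.field = field;
        }
    }

    // Assumed transfer function for one statement.
    static void transfer(Stmt s, Map<String, Fact> state) {
        if (s.kind == Kind.LOAD) {
            Fact in = state.getOrDefault(s.src, INITIAL);
            // Recurrent only if this same load already produced the source's fact.
            boolean sameLoadSeen = in.status != Status.NOT_RECURRENT
                    && s.field.equals(in.field) && s.label.equals(in.stmt);
            state.put(s.dst, new Fact(s.field, s.label,
                    sameLoadSeen ? Status.RECURRENT : Status.POSSIBLY));
        } else if (s.kind == Kind.COPY) {
            state.put(s.dst, state.getOrDefault(s.src, INITIAL));
        } else {
            state.put(s.dst, INITIAL);
        }
    }

    // Run the body once, or repeat it until the end-of-body state is stable.
    static Map<String, Fact> analyze(List<Stmt> body, boolean isLoop) {
        Map<String, Fact> state = new HashMap<>();
        Map<String, Fact> previous;
        do {
            previous = new HashMap<>(state);
            for (Stmt s : body) transfer(s, state);
        } while (isLoop && !state.equals(previous));
        return state;
    }

    static Status example1() { // loop: t = o.next; o = t;
        return analyze(Arrays.asList(
                new Stmt("s1", Kind.LOAD, "t", "o", "next"),
                new Stmt("s2", Kind.COPY, "o", "t", null)), true).get("o").status;
    }

    static Status example2() { // loop: o = o.next; o = bar();
        return analyze(Arrays.asList(
                new Stmt("s1", Kind.LOAD, "o", "o", "next"),
                new Stmt("s2", Kind.OTHER, "o", null, null)), true).get("o").status;
    }

    static Status example3() { // straight line: o = o.next; o = o.next;
        return analyze(Arrays.asList(
                new Stmt("s1", Kind.LOAD, "o", "o", "next"),
                new Stmt("s2", Kind.LOAD, "o", "o", "next")), false).get("o").status;
    }

    public static void main(String[] args) {
        System.out.println(example1() + " " + example2() + " " + example3());
    }
}
```

Under these assumptions the final status of `o` is recurrent in Example 1, not recurrent in Example 2, and possibly in Example 3, matching the slide annotations.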
Analysis Extensions for Common Idioms
• Track objects in fields or arrays
– Class based field assignments
– Arrays are monolithic
• Indirect recurrent objects
– Unique objects referenced by linked structures

while (e.f != null) {
    o = e.f;
    e.f = o.next;
    o.compute();
}

while (e.hasMoreElements()) {
    o = (ObjType)e.nextElement();
    o.compute();
}
Greedy Prefetching
• Prefetch directly connected objects
• Algorithm consists of two steps:
– Detect accesses to linked structures
– Schedule prefetches
  • When object is not null
• Completely hiding latency is difficult
Greedy Prefetching Example
Doubly linked list
int sum(Dlist l) {
    int s = 0;
    while (l != null) {
        s += l.data;
        l = l.next;
    }
    return s;
}
Greedy Prefetching Example
Doubly linked list
Greedy prefetching
int sum(Dlist l) {
    int s = 0;
    while (l != null) {
        prefetch(l.next);
        s += l.data;
        l = l.next;
    }
    return s;
}
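The same greedy idea applies to structures with more than one link per object. The sketch below prefetches both children of a binary tree node once the node is known to be non-null. Standard Java has no prefetch call, so `prefetch` here is a hypothetical no-op stand-in for the instruction a compiler would emit; it only counts calls so the placement can be observed. The class and field names are illustrative.

```java
class GreedyTreeSketch {
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Counts calls so the example can show where prefetches were issued.
    static int prefetchCalls = 0;

    // Hypothetical stand-in: a compiler would emit a non-binding prefetch
    // instruction here. In this sketch it has no effect on memory.
    static void prefetch(Object target) {
        prefetchCalls++;
    }

    // Sums a tree, greedily prefetching every directly connected object.
    static int treeAdd(Node t) {
        if (t == null) {
            return 0;
        }
        // t is non-null here, so reading t.left and t.right is safe.
        prefetch(t.left);
        prefetch(t.right);
        return t.value + treeAdd(t.left) + treeAdd(t.right);
    }

    public static void main(String[] args) {
        Node root = new Node(1, new Node(2, null, null), new Node(3, null, null));
        System.out.println(treeAdd(root) + " " + prefetchCalls);
    }
}
```

Each visited node issues one prefetch per link field, which is why greedy prefetching only covers as much latency as the work done on a single node.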
Jump-Pointer Prefetching
• Prefetch indirectly connected objects
– Tolerates more latency than greedy prefetching
• Algorithm contains three steps:
– Find linked data structure traversal and creation sites
– Create jump-pointers
  • When creating or traversing the linked structure
– Schedule prefetches
  • Prefetch special jump-pointer field
Inserting Jump-Pointers at Creation Time
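void add(ObjType o) {
    ListNode n = new ListNode(o);
    // jumpQueue[i] holds the node added `size` insertions ago, if any
    ListNode jumpObj = jumpQueue[i];
    if (jumpObj != null) {
        jumpObj.jmp = n;
    }
    jumpQueue[i] = n;
    i = (i + 1) % size;  // keep the index inside the queue
    if (head == null) {
        head = n;
    } else {
        tail.next = n;
    }
    tail = n;
}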
[Figure: linked list of nodes 1 2 3 4 5, with labels jumpObj and n]
Jump-Pointer Prefetching Example
int sum(Dlist l) {
    int s = 0;
    while (l != null) {
        prefetch(l.jmp);
        s += l.data;
        l = l.next;
    }
    return s;
}
Doubly linked list
Jump-pointer prefetching
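Jump-pointers can also be created while traversing an existing structure rather than when building it. The sketch below shows one way that might look: a first traversal installs the `jmp` fields using a small circular queue, and later traversals prefetch through them. The class name, the `distance` parameter, the `fromValues` helper, and the no-op `prefetch` stub are all assumptions made for illustration, since standard Java exposes no prefetch call.

```java
class JumpPointerTraversalSketch {
    static final class Node {
        final int data;
        Node next;
        Node jmp; // jump-pointer: refers to the node `distance` links ahead
        Node(int data) { this.data = data; }
    }

    // Hypothetical stand-in for a compiler-inserted prefetch instruction.
    static void prefetch(Object target) {
        // intentionally empty
    }

    // Builds a singly linked list from the given values (helper for the example).
    static Node fromValues(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // First traversal: sum the list and install jump-pointers on the way.
    static int sumAndInstall(Node l, int distance) {
        Node[] queue = new Node[distance]; // the last `distance` nodes visited
        int i = 0;
        int s = 0;
        while (l != null) {
            Node behind = queue[i];        // node visited `distance` steps ago
            if (behind != null) {
                behind.jmp = l;
            }
            queue[i] = l;
            i = (i + 1) % distance;
            s += l.data;
            l = l.next;
        }
        return s;
    }

    // Later traversals: prefetch through the jump-pointer field.
    static int sum(Node l) {
        int s = 0;
        while (l != null) {
            prefetch(l.jmp);
            s += l.data;
            l = l.next;
        }
        return s;
    }

    public static void main(String[] args) {
        Node head = fromValues(1, 2, 3, 4, 5);
        System.out.println(sumAndInstall(head, 2)); // installs jmp fields
        System.out.println(head.jmp.data);          // node 1 jumps to node 3
        System.out.println(sum(head));
    }
}
```

The queue length sets the prefetch distance: a larger value gives each prefetch more time to complete, at the cost of leaving the last `distance` nodes without a jump-pointer.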
Experimental Results
• Object-oriented Olden benchmarks in Java
• Simulation using RSIM
– Out-of-order, superscalar processor
• Compile programs using Vortex
– Translate Java programs to SPARC assembly
– Contains object-oriented, traditional optimizations
– Linked structure analysis, greedy and jump-pointer prefetching
Prefetching Performance
[Figure: bar chart of normalized execution time (%), 0 to 100, split into Busy and Data Memory Stalls, with bars labeled N, G, and J for each of health, mst, perimtr, treeadd, bh, bisort, tsp, voronoi, em3d, and power]
Prefetch Effectiveness
[Figure: bar chart of percentage of prefetches, 0 to 100, split into Useful, Late, Early, and Unnec., with bars labeled G and J for each of health, mst, perimtr, treeadd, bh, bisort, tsp, voronoi, em3d, and power]
Static Prefetch Statistics
Program  Interprocedural  Intraprocedural  Fields
Mono Poly
Health 8 1 5
Mst 3
Perimeter 9 8
Treeadd 2
BH 16 8 10
Bisort 4 4
Tsp 6 14
Voronoi 14 1
Em3d 20
Power 4
Contributions and Future Work
• New interprocedural data flow analysis for Java
• Evaluation of prefetching on Java programs
Prefetching hides latency, but there is room for improvement
Other uses for analysis (work in progress)
– Garbage collection: prefetching, object traversal
– Prefetching arrays of objects