
Kristof Beyls, Erik D’Hollander, Frederik Vandeputte ICCS 2005 – May 23 RDVIS: A Tool That Visualizes the Causes of Low Locality and Hints Program Optimizations.

Dec 14, 2015

Page 1

RDVIS: A Tool That Visualizes the Causes of Low Locality and Hints Program Optimizations

Kristof Beyls, Erik D’Hollander, Frederik Vandeputte

ICCS 2005 – May 23

Page 2

Overview

1. Motivation: cache bottleneck

2. Some theoretical background: reuses

3. View 1: cache-missing reuses

4. View 2: reuse pair clusters corresponding to program optimizations

5. Experimental results

6. Implementation details

7. Conclusion

Page 3

1. Motivation

• Many programs incur large cache bottlenecks.

• Mainly caused by poor locality (temporal or spatial)

• Temporal locality is hard to optimize automatically in a compiler

• Therefore: the programmer needs help pinpointing the sources of low temporal locality.

Page 4

2. Theoretical background

• Stream of memory accesses:
  accesses:    a   b   c   a   a   b
  references:  r1  r1  r2  r1  r1  r1
  basic block: bb1 bb1 bb2 bb1 bb1 bb1

• Reuses / Reuse Distance
• Reference pair / Reference pair histogram
• Basic Block Vector of intermediately executed code.

[Figure: the three reuses in the stream, annotated with reuse distances 2, 0, 2.]

Cache miss: reuse distance ≥ cache size
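The reuse distances of the toy stream can be checked by hand or with a few lines of code. The following is a minimal C sketch, assuming the usual definition (the number of distinct addresses accessed between a use and its reuse) and a fully associative LRU cache whose size is counted in addresses. The names `reuse_distance` and `is_cache_miss` are illustrative and not taken from RDVIS.

```c
#include <assert.h>
#include <stddef.h>

/* Reuse distance of access i in `stream` (one char per address): the number
 * of DISTINCT addresses touched strictly between access i and the previous
 * access to the same address. Returns -1 if the address was never accessed
 * before (a first use, so there is no reuse). */
long reuse_distance(const char *stream, size_t i)
{
    size_t prev = i;
    /* Scan backwards for the previous use of the same address. */
    while (prev > 0 && stream[prev - 1] != stream[i])
        prev--;
    if (prev == 0)
        return -1;
    prev--;                          /* index of the previous use */

    long distinct = 0;
    for (size_t j = prev + 1; j < i; j++) {
        int seen = 0;
        for (size_t k = prev + 1; k < j; k++)
            if (stream[k] == stream[j]) { seen = 1; break; }
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Assumption: fully associative LRU cache holding `cache_size` addresses.
 * First uses (distance -1) are counted as misses as well. */
int is_cache_miss(long distance, long cache_size)
{
    assert(cache_size > 0);
    return distance < 0 || distance >= cache_size;
}
```

For the stream a b c a a b, the fourth, fifth and sixth accesses get reuse distances 2, 0 and 2. With a cache of size 2, the two reuses at distance 2 are misses and the reuse at distance 0 is a hit.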

Page 5

2. Theoretical background

• Stream of memory accesses:
  accesses:    a   b   c   a   a   b
  references:  r1  r1  r2  r1  r1  r1
  basic block: bb1 bb1 bb2 bb1 bb1 bb1

• Reuses / Reuse Distance
• Reference pair / Reference pair histogram
• Basic Block Vector of intermediately executed code.

[Figure: reference pair histogram for reference pair r1-r1, plotted over reuse distance; the reuses in the stream are annotated with reuse distances 2, 0, 2.]
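A reference pair histogram counts, for one (use reference, reuse reference) pair, how many reuses occur at each reuse distance. The sketch below builds it for the toy stream. It is an illustration only: the `struct access` layout, the `MAX_DIST` bound and the function name are assumptions, not RDVIS internals.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DIST = 8 };   /* hypothetical bound, large enough for the toy stream */

struct access {
    char addr;           /* accessed address (a, b, c, ...) */
    int  ref;            /* reference (instruction) that issued it: 1 = r1, 2 = r2 */
};

/* Fill hist[d] with the number of reuses at distance d whose use was issued
 * by `use_ref` and whose reuse was issued by `reuse_ref`. */
void pair_histogram(const struct access *s, size_t n,
                    int use_ref, int reuse_ref, unsigned hist[MAX_DIST])
{
    for (size_t d = 0; d < MAX_DIST; d++)
        hist[d] = 0;

    for (size_t i = 1; i < n; i++) {
        /* Find the previous access to the same address, if any. */
        size_t p = i;
        while (p > 0 && s[p - 1].addr != s[i].addr)
            p--;
        if (p == 0)
            continue;                /* first use: no reuse */
        p--;
        if (s[p].ref != use_ref || s[i].ref != reuse_ref)
            continue;                /* belongs to another reference pair */

        /* Count distinct addresses strictly between use and reuse. */
        size_t distinct = 0;
        for (size_t j = p + 1; j < i; j++) {
            int seen = 0;
            for (size_t k = p + 1; k < j; k++)
                if (s[k].addr == s[j].addr) { seen = 1; break; }
            if (!seen)
                distinct++;
        }
        assert(distinct < MAX_DIST);
        hist[distinct]++;
    }
}
```

In the toy stream every reuse has r1 as both use and reuse reference, so the histogram of r1-r1 holds one reuse at distance 0 and two reuses at distance 2.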

Page 6

3. RDVIS by example: matrix multiplication

• Reuses between a[i*N+k] at distance 2^9
• Reuses between b[k*N+j] at distance 2^17

How to bring reuses of b[k*N+j] closer together?

What separates reuses?

What code is executed between reuses?

Page 7

3. RDVIS by example: matrix multiplication

Reuses occur between iterations of i-loop

Solution: bring iterations of i-loop inwards
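The transcript does not show the transformed code, so the following is only one possible reading of "bring iterations of i-loop inwards". It assumes the original loop nest is ordered i, j, k and uses tiling over j and k so that the i-loop runs inside the tile loops. The function names and the `tile` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed original: i outermost. Each b[k*n+j] is touched once per
 * i-iteration, so its reuses are separated by a sweep over all of b. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            c[i*n + j] = 0.0;
            for (size_t k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
        }
}

/* Sketch of a tiled variant: the i-loop moves inside the jj/kk tile loops,
 * so consecutive i-iterations revisit the same tile of b. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;

    for (size_t jj = 0; jj < n; jj += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t i = 0; i < n; i++)
                for (size_t j = jj; j < jj + tile && j < n; j++)
                    for (size_t k = kk; k < kk + tile && k < n; k++)
                        c[i*n + j] += a[i*n + k] * b[k*n + j];
}
```

With a tile small enough for a tile-by-tile block of b to fit in cache, each b[k*N+j] is reused in the next i-iteration after roughly one tile of b has been touched, instead of all of b.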

Page 8

3. RDVIS by example: matrix multiplication

Next to optimize: reuses of a[i*N+k]

Page 9

3. Matrix multiplication: final result

[Figure legend: L1 cache, L2 cache, Main memory]

Exec. time on P4:
Orig.: 0.740 s
Opt.: 0.223 s
Speedup: 3.3

Page 10

4. Cluster Analysis

• In more complex programs, there can be many arrows.

• Many arrows can often be optimized by the same program transformation.

• Key idea: “When the same code is executed between use and reuse, probably the same program transformation is needed.”
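The key idea can be sketched as grouping reference pairs whose basic block vectors (the per-basic-block fractions of code executed between use and reuse) are similar. The slides do not say which clustering algorithm RDVIS uses, so the code below is a deliberately simple stand-in: a Manhattan distance and a greedy threshold clustering. `NUM_BB`, `MAX_PAIRS`, the threshold and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BB = 2, MAX_PAIRS = 64 };   /* hypothetical sizes for the sketch */

/* Manhattan distance between two basic block vectors of length NUM_BB. */
static double bbv_distance(const double *a, const double *b)
{
    double sum = 0.0;
    for (size_t i = 0; i < NUM_BB; i++) {
        double d = a[i] - b[i];
        sum += d < 0 ? -d : d;
    }
    return sum;
}

/* Greedy clustering: `bbv` holds n vectors back to back (row-major). Each
 * vector joins the first cluster whose first member lies within `threshold`;
 * otherwise it starts a new cluster. Writes the cluster index of vector i
 * to cluster_of[i] and returns the number of clusters. */
size_t cluster_bbvs(const double *bbv, size_t n, double threshold,
                    size_t cluster_of[])
{
    size_t leader[MAX_PAIRS];
    size_t num_clusters = 0;

    assert(n <= MAX_PAIRS);
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        while (c < num_clusters &&
               bbv_distance(bbv + i * NUM_BB,
                            bbv + leader[c] * NUM_BB) > threshold)
            c++;
        if (c == num_clusters)
            leader[num_clusters++] = i;   /* vector i founds a new cluster */
        cluster_of[i] = c;
    }
    return num_clusters;
}
```

Reference pairs that end up in the same cluster have nearly the same code between use and reuse, which is the situation in which the slide expects one program transformation to help all of them.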

Page 11

4. Cluster Analysis by example: equake

Many different arrows contribute to long-distance reuse

Page 12

2(bis). Theoretical background

• Stream of memory accesses:
  accesses:    a   b   c   a   a   b
  references:  r1  r1  r2  r1  r1  r1
  basic block: bb1 bb1 bb2 bb1 bb1 bb1

• Reuses / Reuse Distance
• Reference pair / Reference pair histogram
• Basic Block Vector of intermediately executed code of a reference pair.

[Figure: bar chart BBV(Reference pair r1-r1); x-axis: basic block (bb1, bb2); y-axis: % exec. betw. reuses; extracted value label: .66]
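To make the basic block vector concrete, the sketch below computes it for a single reuse in the toy stream, using the basic block that issued each intermediate access as a proxy for the executed code. This is an assumption made for illustration: how RDVIS weighs basic blocks, and how it combines the vectors of several reuses into the vector of a reference pair, is not stated on the slide. `NUM_BB` and `bbv_of_reuse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BB = 2 };   /* toy program: bb1 is index 0, bb2 is index 1 */

/* bb[t] is the index of the basic block that issued access t. Fills v[b]
 * with the fraction of accesses strictly between positions `use` and
 * `reuse` that were issued from block b. If nothing lies in between,
 * v is all zeros. */
void bbv_of_reuse(const int *bb, size_t use, size_t reuse, double v[NUM_BB])
{
    assert(use < reuse);
    size_t total = reuse - use - 1;

    for (size_t b = 0; b < NUM_BB; b++)
        v[b] = 0.0;
    for (size_t t = use + 1; t < reuse; t++) {
        assert(bb[t] >= 0 && bb[t] < NUM_BB);
        v[bb[t]] += 1.0;
    }
    if (total > 0)
        for (size_t b = 0; b < NUM_BB; b++)
            v[b] /= (double)total;
}
```

For the reuse of b (second and sixth access), the intermediate accesses c, a, a come from bb2, bb1, bb1, which gives two thirds for bb1 and one third for bb2 under this definition.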

Page 13

4. Cluster Analysis by example: equake

LOOP FUSION!
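The equake source is not part of this transcript, so the example below uses a made-up pair of loops to show what loop fusion does to reuse distance. In the separate version every v[i] is reused only after the first loop has swept the whole array; in the fused version it is reused immediately. The function names and the computation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Two separate loops: v[i] is written in the first loop and read again in
 * the second, about n distinct elements later. */
void scale_then_sum(size_t n, double *v, double factor, double *sum)
{
    assert(v != NULL && sum != NULL);
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;

    *sum = 0.0;
    for (size_t i = 0; i < n; i++)
        *sum += v[i];
}

/* Fused loop: the use and the reuse of v[i] sit in the same iteration, so
 * the reuse distance no longer grows with n. */
void scale_and_sum_fused(size_t n, double *v, double factor, double *sum)
{
    assert(v != NULL && sum != NULL);
    *sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        v[i] *= factor;
        *sum += v[i];
    }
}
```

Both versions compute the same result; fusion is legal here because iteration i of the second loop depends only on iteration i of the first.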

Page 14

5. Experimental Results

[Figure: bar chart of speedup (y-axis, 0 to 12) per platform (x-axis: athlonXP, Alpha, Itanium, average) for the benchmarks mcf, art and equake.]

Page 15

6. Some Implementation Details

• Instrumentation added to GCC 4:
  – Exact source location info added to all abstract syntax tree nodes.
  – Source location info is added in the language-specific front-end (currently only C; Fortran is being added).
  – Instrumentation occurs in the language-independent middle-end.
    • Inserts a function call for each memory reference
    • Inserts a function call at the beginning of each basic block
    • Writes out source location info for memory references and basic blocks
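The effect of this instrumentation can be pictured by writing the inserted calls by hand. The sketch below is not the code generated by the modified GCC: the hook names `on_basic_block` and `on_mem_ref`, the numeric ids and the in-memory trace are all hypothetical, chosen only to show one call per basic block entry and one call per memory reference.

```c
#include <assert.h>
#include <stddef.h>

enum { TRACE_MAX = 64 };   /* hypothetical trace capacity */

struct event {
    int is_block;          /* 1: basic block entry, 0: memory reference */
    int id;                /* basic block id or reference id */
    const void *addr;      /* accessed address (NULL for block entries) */
};

static struct event trace[TRACE_MAX];
static size_t trace_len;

/* Hypothetical hook called at the beginning of each basic block. */
static void on_basic_block(int bb_id)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = (struct event){1, bb_id, NULL};
}

/* Hypothetical hook called for each memory reference. */
static void on_mem_ref(int ref_id, const void *addr)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = (struct event){0, ref_id, addr};
}

/* Hand-instrumented version of: for (i = 0; i < n; i++) s += v[i]; */
double sum_instrumented(const double *v, size_t n)
{
    double s = 0.0;
    on_basic_block(0);             /* entry block */
    for (size_t i = 0; i < n; i++) {
        on_basic_block(1);         /* loop body block */
        on_mem_ref(0, &v[i]);      /* the read of v[i] */
        s += v[i];
    }
    return s;
}
```

A trace of this kind carries what the earlier slides need: the address stream for reuse distances, the reference ids for reference pairs, and the basic block entries for the code executed between use and reuse.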

Page 16

7. Conclusion

• Visualization indicates reuses at a long distance, and the code that is executed between those reuses.

• Clustering reference pairs by their intermediately executed code groups pairs that are optimizable with the same program transformation.

• Give RDVIS a try: http://www.elis.UGent.be/~kbeyls/rdvis

Page 17

QUESTIONS?

Page 18

MCF

Page 19

AMMP

Page 20

AMMP