Dec 26, 2019

Page 1: Compiler-assisted Performance Analysis - LLVM · Compiler-assisted Performance Analysis Adam Nemet Apple anemet@apple.com

Compiler-assisted Performance Analysis

Adam Nemet Apple

anemet@apple.com

[Diagram, built up over several slides. User side: Hotspot, Bottleneck. Compiler side: Optimization X, Y; Hotspot; Legality; Cost Model. Question posed: Some optimizations? Options shown in turn: Disassemble; -debug-only; Optimization Diagnostics.]

Optimization Diagnostics in LLVM

• Supported in LLVM

• Only a small number of passes emit them

• -Rpass options to enable them in the compiler output

foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
    accumulate(arr[i], sum);
    ^

• For large programs, the output of -Rpass is noisy and unstructured

Remarks for hot and cold code are intermixed

Messages appear in no particular order

Messages from successful and failed optimizations are dumped together

How can we make this information accessible and actionable?

Wish List

• All in one place: Optimizations Dashboard

• At a glance: See high-level interaction between optimizations for targeted low-level debugging

• Filtering: Noise-level should be minimized by focusing on the hot code

• Integration: Display hot code and the optimizations side-by-side

opt-viewer

Approach

• Extend existing optimization remark infrastructure

• Add the new optimizations

• Add ability to output remarks to a data file

• Visualize data in HTML

• Targeting compiler developers initially

Example

Work Flow

$ clang -O3 -fsave-optimization-record -c foo.c

$ utils/opt-viewer/opt-viewer.py foo.opt.yaml html

$ open html/foo.c.html

Successful Optimizations

Remarks appear inline under the referenced line

Name of the pass

Green for successful optimization

Further details about the optimization

Column aligned with the expression

HTML link to facilitate further analysis

Remarks in white are Analysis remarks

Optimizations can expose interesting analyses

Missed Optimizations

Red means failed optimization

LLVM Changes

[Diagram, built up over several slides. Labels: Pass pipeline (IR, Inliner, LoopVectorizer, IR); OptimizationRemarkEmitter; -Rpass-analysis=inline; YAML; -fsave-optimization-record, which enables source line debug info (-gline-tables-only). Legend: old, new.]

ORE.emit(OptimizationRemarkAnalysis("inline", "CanBeInlined", Call)
         << NV("Callee", Callee) << " can be inlined into "
         << NV("Caller", Caller) << " with cost=" << NV("Cost", IC.getCost())
         << " (threshold=" << NV("Threshold", Threshold) << ")");

foo.c:8:5: remark: accumulate can be inlined into compute_sum with cost=-5 (threshold=487) [-Rpass-analysis=inline]
    accumulate(arr[i], sum);
    ^

--- !Analysis
Pass: inline
Name: CanBeInlined
DebugLoc: { File: s.cc, Line: 8, Column: 5 }
Function: compute_sum
Args:
  - Callee: accumulate
    DebugLoc: { File: s.cc, Line: 1, Column: 0 }
  - String: ' can be inlined into '
  - Caller: compute_sum
    DebugLoc: { File: s.cc, Line: 5, Column: 0 }
  - String: ' with cost='
  - Cost: '-5'
  - String: ' (threshold='
  - Threshold: '487'
  - String: ')'
...
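As a rough illustration of how a consumer of this record might rebuild the one-line remark, the sketch below joins the Args values in order. The record is hand-transcribed into a Python dict (a real tool would load the YAML file with a YAML parser), and `render_remark` is a hypothetical helper name, not a function taken from opt-viewer.

```python
# Hand-transcribed copy of the record shown on the slide. The per-argument
# DebugLoc entries are kept but not used when building the message.
record = {
    "Pass": "inline",
    "Name": "CanBeInlined",
    "DebugLoc": {"File": "s.cc", "Line": 8, "Column": 5},
    "Function": "compute_sum",
    "Args": [
        {"Callee": "accumulate",
         "DebugLoc": {"File": "s.cc", "Line": 1, "Column": 0}},
        {"String": " can be inlined into "},
        {"Caller": "compute_sum",
         "DebugLoc": {"File": "s.cc", "Line": 5, "Column": 0}},
        {"String": " with cost="},
        {"Cost": "-5"},
        {"String": " (threshold="},
        {"Threshold": "487"},
        {"String": ")"},
    ],
}


def render_remark(rec):
    """Concatenate argument values in order, skipping per-argument DebugLoc."""
    parts = []
    for arg in rec["Args"]:
        for key, value in arg.items():
            if key != "DebugLoc":
                parts.append(str(value))
    loc = rec["DebugLoc"]
    message = "".join(parts)
    return f'{loc["File"]}:{loc["Line"]}:{loc["Column"]}: remark: {message}'


print(render_remark(record))
```

Because the fixed strings and the named values are stored as separate arguments, a tool can either print the full sentence, as here, or pick out individual fields such as Cost and Threshold.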

opt-viewer

[Diagram. Labels: YAML; utils/opt-viewer/opt-viewer.py; index.html; foo.o.html. Legend: old, new.]

Index

Noisy: most of this code is not hot

Sort by hotness

Use PGO for Hotness

[Diagram. Labels: Pass pipeline (IR, Inliner, LoopVectorizer, IR); OptimizationRemarkEmitter; BlockFrequencyInfo; LazyBlockFrequencyInfo; YAML. Legend: old, new.]

--- !Analysis
Pass: inline
Name: CanBeInlined
DebugLoc: { File: s.cc, Line: 8, Column: 5 }
Function: compute_sum
Hotness: 3
Args:
  - Callee: accumulate
    DebugLoc: { File: s.cc, Line: 1, Column: 0 }
  - String: ' can be inlined into '
  - Caller: compute_sum
    DebugLoc: { File: s.cc, Line: 5, Column: 0 }
  - String: ' with cost='
  - Cost: '-5'
  - String: ' (threshold='
  - Threshold: '487'
  - String: ')'
...

Hotness

Relative to maximum hotness, NOT total time %
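One way to read this: each remark's hotness is shown as a percentage of the largest hotness value seen, so the hottest remark reads 100% no matter how much of the total it accounts for. A minimal sketch with made-up counts follows. The numbers and the `relative_hotness` name are illustrative assumptions, not taken from opt-viewer.

```python
def relative_hotness(counts):
    """Map each hotness count to a percentage of the maximum count."""
    max_count = max(counts.values())
    if max_count == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * c / max_count for name, c in counts.items()}


# Hypothetical hotness values for three remarks.
counts = {"loop_a": 200, "loop_b": 50, "call_c": 3}
rel = relative_hotness(counts)

# Share of the summed counts, computed only for contrast.
total = sum(counts.values())
share = {name: 100.0 * c / total for name, c in counts.items()}

for name in counts:
    print(f"{name}: {rel[name]:.1f}% of max, {share[name]:.1f}% of total")
```

The two columns differ for every entry, which is the distinction the slide draws: a value relative to the maximum is a ranking aid, not a time profile.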

Optimizations Recorded

• Function Inliner

• Loop Vectorizer

• Loop Unroller

• LoopDataPrefetch

• LICM

• GVN

• Loop Idiom

• Loop Deletion

• SLP Vectorizer

• … more to follow

Test Drive on LLVM test suite

Improve & Evaluate

1. Does the information presented in this high-level view contain sufficient detail to reconstruct what happened?

2. Can we discover the interactions between optimizations?

3. With the improved visibility, can we quickly find real performance opportunities?

DhryStone (SingleSource/Benchmarks)

Interaction of Optimizations

Inlining Context

DhryStone: Summary

• Without low-level debugging, quickly reconstructed what happened

• Even though it involved interaction between multiple optimizations

• Inlining and Alias Analysis/GVN

• Missed optimizations: Extra analysis to manage false positives

1. Filter trivially false positives

2. Expose enough information for quick detection by user

Freebench/distray (MultiSource/Benchmarks)

Finding Performance Opportunity

Not modified via LinP, maybe writes through other pointers

Not modified via LinD, maybe writes through other pointers

Reads and writes don’t alias

Loop versioning with array overlap checks?

LICM-based LoopVersioning (-enable-loop-versioning-licm)

Performance opportunity if we can improve this pass

Approximate the opportunity by manually modifying the source

Dynamic Instruction Count Reduced by 11%

Performance headroom 11%

Freebench/distray: Summary

• Found optimization opportunity while staying in the high-level view

• Reconstructed the reason for missed optimization

• High-level view exposed that the gain may be substantial

• Got immediate feedback of the desired effect on the prototype

• Identified the pass for low-level debugging

Check Out More Examples

http://lab.llvm.org:8080/artifacts/opt-view_test-suite

Development Timeline

[Timeline diagram. Labels: Initial version on LLVM trunk; Now; Compiler Developer Tool; Code Author Tool; New tools using Optimization Records.]

Compiler Developer Tool: Status

• Written in Python

• Hook up new passes

• Improve diagnostics quality for existing passes

• Perform extra analysis for insightful messages

• Improve UI

Request for Help

Code Author Tool: Wishlist

• Suggest specific actions

• E.g. for the LICM case: if the two pointers can never point to the same object consider using ‘restrict’

• Add new “recommendation” analysis passes to detect opportunity and suggest:

• Source annotation to enable off-by-default passes (aggressive loop transformations, non-temporal stores)

• Refactoring: data transformations

Request for Help

Optimization Records: New Tools

• llvm-opt-report

• Performance regression analysis

• Optimization statistics with the ability to zoom into the particular optimization

• Bottom-up search for performance opportunities

• See all the LICM opportunities like in Freebench/distray

SELECT benchmark, hotspot, hotness
FROM optimizations
WHERE pass = 'licm' AND type = 'missed' AND name = 'LoadWithLoopInvariantAddressInvalidated'
ORDER BY hotness


• Allows finding opportunities that occur with high frequency but not in the hottest code
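The query on the slide presumes that optimization records have been loaded into a table. As a sketch of that idea, the snippet below builds an in-memory SQLite table with the columns the query names and runs the same query. The table layout, the sample rows, and the benchmark and hotspot values in them are invented for illustration; the slides do not describe an actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the slide's query; the schema itself is an assumption.
conn.execute(
    "CREATE TABLE optimizations ("
    ' benchmark TEXT, hotspot TEXT, hotness INTEGER,'
    ' "pass" TEXT, "type" TEXT, "name" TEXT)'
)

# Placeholder rows, not real measurements.
rows = [
    ("bench_a", "a.c:10", 90, "licm", "missed",
     "LoadWithLoopInvariantAddressInvalidated"),
    ("bench_b", "b.c:42", 15, "licm", "missed",
     "LoadWithLoopInvariantAddressInvalidated"),
    ("bench_a", "a.c:77", 60, "inline", "missed", "PlaceholderRemark"),
    ("bench_c", "c.c:5", 40, "licm", "passed", "PlaceholderRemark"),
]
conn.executemany("INSERT INTO optimizations VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT benchmark, hotspot, hotness
FROM optimizations
WHERE "pass" = 'licm' AND "type" = 'missed'
  AND "name" = 'LoadWithLoopInvariantAddressInvalidated'
ORDER BY hotness
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Only the two missed LICM rows match, and they come back in ascending hotness order because the query gives no sort direction. Dropping the hotness filter entirely and counting rows per remark name would be one way to surface opportunities that are frequent but not hot.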

Acknowledgement

• Tyler Nowicki

• John McCall

• Hal Finkel

Q&A

SIBsim4 (MultiSource/Applications)

Finding Performance Opportunity

Look at the loads

Look at the stores

Can ‘m’ and ‘n’ really alias?

Probably not!

exon_p_t m = mCol->e.exon[i];

We need to use ‘restrict’ or hoist manually