Porting LLVM to a Next Generation DSP

Presented by: L. Taylor Simpson

LLVM Developers’ Meeting: 11/18/2011

llvm.org/devmtg/2011-11/Simpson_PortingLLVMToADSP.pdf

Open Source Open Possibilities

Agenda

Hexagon DSP

Initial porting

Performance improvement

Future plans

Hexagon DSP

Hexagon – Typical DSP Features

Wide computation engine

8-MAC design, dual 64-bit loads or stores

Performance meets or exceeds highest-performance industry DSPs

Native numerical support

Fractionals, complex

Saturation, scaling, rounding

Exploits parallelism at 3 levels

Unique multi-threaded architecture

VLIW (up to 4 instructions in parallel)

SIMD

Hexagon – Typical CPU features

Not your grandfather’s DSP!

Capable of supporting RTOS or high-level OS

Can run all of SPEC on target

Supports C/C++ modern programming environment

High-quality compilers and tools

Reduces development cost of extensive assembly programming

Cache-based, hardware-managed memory

Simplifies programming model and reduces power

Advanced system architecture

Precise exceptions

MMU with address translation and protection

HW support for virtual machines

Excellent control code performance

Can offload work from main CPU

Hexagon Instruction Example

Single packet from inner loop of FFT

Performs 29 “RISC ops” in 1 cycle

All threads can be doing this (or something else) in parallel

    { R17:16 = MEMD(R0++M1)                // 64-bit load with post-update addressing
      MEMD(R6++M1) = R25:24                // 64-bit store with post-update addressing
      R20 = CMPY(R20, R8):<<1:rnd:sat      // Complex multiply
      R11:10 = VADDH(R11:10, R13:12)       // Vector 4x16-bit add
    }:endloop0                             // HW-loop end: dec count, compare, jump top

[Figure: datapath of the complex multiply. Labels: inputs Rs and Rt, each with I and R halves; four multipliers producing 32-bit products, each with a <<0-1 shift; two adders; 0x8000 rounding constant; Sat_32; high 16 bits; output Rd with I and R halves.]
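To make the numeric features in this packet concrete, here is a minimal C sketch of what a fractional complex multiply with the `:<<1:rnd:sat` modifiers computes. It assumes Q15 inputs (16-bit real and imaginary halves), a 0x8000 rounding constant, 32-bit saturation, and a result taken from the high 16 bits, as the datapath labels suggest. The type `cq15_t` and the function names are hypothetical, and this is not a bit-exact model of the Hexagon CMPY instruction.

```c
#include <stdint.h>

/* Hypothetical Q15 complex value: 16-bit real and imaginary parts. */
typedef struct {
    int16_t re;
    int16_t im;
} cq15_t;

/* Clamp a 64-bit intermediate to the signed 32-bit range (the "sat" step). */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Scale by 2 (the "<<1"), add 0x8000 (the "rnd"), saturate, keep the high
   16 bits. Assumes >> on a negative int32_t is an arithmetic shift, as it
   is on common compilers. */
static int16_t scale_round_sat(int64_t acc) {
    int32_t s = sat32(acc * 2 + 0x8000);
    return (int16_t)(s >> 16);
}

/* Complex multiply: (a.re + j*a.im) * (b.re + j*b.im) in Q15. */
cq15_t cmpy_q15(cq15_t a, cq15_t b) {
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    cq15_t d = { scale_round_sat(re), scale_round_sat(im) };
    return d;
}
```

In this sketch, 0.5 times 0.5 (0x4000 times 0x4000) yields 0.25 (0x2000), and -1 times -1, which is not representable in Q15, saturates to 0x7FFF instead of wrapping.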

Initial Porting

LLVM for Hexagon – Initial Porting Effort

It took 2 engineers 23 days to get the Hexagon back end working

Passing DSP benchmark suite

It took 107 calendar days to get to 87% performance of GCC

Leveraged existing assembler, linker, test suite

Points of efficacy for LLVM

Robust and easy to port

Very well designed and documented

Carefully engineered for compiler construction

Excellent infrastructure for writing mid-level compiler optimizations

Timeline: LLVM-Hexagon Improvements

[Figure: line chart. X-axis: Days Since Project. Y-axis: QMark Score, 0 to 90, normalized so that gcc at -O3 = 100.00; higher numbers indicate better performance. Annotated milestones (in extracted order, not necessarily chronological): CFG optimizations; Add addasl; Improve predicate spills; Scheduler improvements; LLVM Project Starts; First port to Hexagon complete; Dependence pruning; Base+offset, super-regs improvements; Min-Max recognition; Sign-extension optimizations; Hexagon front-end; Align returns; Packetization; Dot-new jumps; LTO; Enable and Tune Post-increment; Improve Jump Scheduling; Eliminate sign-extensions; LTO on libraries; .new transfers; Remat. zero extends; Packetizer lookahead.]

Transition Time

While the LLVM work was under way, GCC moved forward

New version of GCC for Hexagon released

Version 4 of Hexagon core released with significant support in GCC

LLVM only 72% performance of GCC

Quickly improved pass rate to 98%

Leverage existing compiler test suite

Initial pass rate for -O0: 49%

Initial pass rate for -O3: 63%

Most of the remaining issues are corner cases in C++ front end

Current status

LLVM achieves 89% performance of GCC for Hexagon

Performance Improvement

Performance Improvement – Instruction Scheduling

Optimal performance for VLIW requires precise scheduling

Hexagon packetizer

Originally a post-pass to form packets from scheduled code

Alias information in scheduler

Use machine resource constraints during scheduling
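As a rough illustration of a post-pass that forms packets from already scheduled code, the following C sketch packs instructions greedily in order, starting a new packet when the current one is full or when an instruction depends on a register written earlier in the same packet. Everything here is an assumption for illustration: the `insn_t` encoding (one bit per register), the function name `packetize`, and the dependence rule. It ignores machine resource constraints and `.new` forwarding, and it is not the algorithm used by the Hexagon packetizer.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKET 4 /* from the slides: up to 4 instructions in parallel */

/* Hypothetical instruction summary: bit i set means register i is
   written (defs) or read (uses). */
typedef struct {
    uint32_t defs;
    uint32_t uses;
} insn_t;

/* Greedy in-order packetization. Writes the packet index of each
   instruction to packet_of[] and returns the number of packets. */
size_t packetize(const insn_t *insns, size_t n, size_t *packet_of) {
    size_t packets = 0;    /* packets opened so far */
    size_t in_packet = 0;  /* instructions in the current packet */
    uint32_t pkt_defs = 0; /* registers written by the current packet */

    for (size_t i = 0; i < n; i++) {
        /* Read-after-write or write-after-write inside one packet is not
           allowed in this model. Write-after-read is fine because all
           reads in a packet see the values from before the packet. */
        int dep = (insns[i].uses & pkt_defs) != 0 ||
                  (insns[i].defs & pkt_defs) != 0;
        if (i == 0 || in_packet == MAX_PACKET || dep) {
            packets++;
            in_packet = 0;
            pkt_defs = 0;
        }
        packet_of[i] = packets - 1;
        pkt_defs |= insns[i].defs;
        in_packet++;
    }
    return packets;
}
```

Because the sketch never reorders instructions, the quality of its packets depends entirely on the schedule it is given, which is one reason to feed alias information and resource constraints into the scheduler itself.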

Performance Improvement – Loop Unroller

Enable loops with runtime trip counts

We have seen both large improvements and losses

We will likely need some target-specific information

Patch currently under review
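For readers unfamiliar with unrolling when the trip count is only known at run time, here is a source-level C sketch of the general shape of the transformation: a short remainder loop handles the leftover iterations, then the main loop runs an unrolled body. The unroll factor of 4 and the function name are assumptions chosen for illustration; the compiler performs this on its intermediate representation, and the sketch does not reproduce the patch mentioned above.

```c
#include <stdint.h>

/* Sum of b[0..n-1], written the way a loop unrolled by 4 with a runtime
   trip count might look after the transformation. */
int32_t sum_unrolled4(const int32_t *b, uint32_t n) {
    int32_t a = 0;
    uint32_t i = 0;
    uint32_t rem = n % 4; /* iterations that do not fill an unrolled body */

    /* Remainder loop: runs 0 to 3 times. */
    for (; i < rem; i++)
        a += b[i];

    /* Main loop: the remaining count is a multiple of 4. */
    for (; i < n; i += 4) {
        a += b[i];
        a += b[i + 1];
        a += b[i + 2];
        a += b[i + 3];
    }
    return a;
}
```

The extra remainder loop and trip-count arithmetic cost code size and setup time, which is consistent with seeing both improvements and losses depending on the loop.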

Performance Improvement – Miscellaneous

Hardware loop support

Post-increment

Loop strength reduction

Addressing modes: base+offset, post-increment, base+index

New version of core released

Numerous new instruction combinations

More relaxed packet forming rules

Enhanced predication support
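The link between loop strength reduction and post-increment addressing can be shown at the source level. In the sketch below, the first function indexes the arrays, so each access conceptually computes base plus scaled index; the second walks pointers, which maps naturally onto a load with post-increment. The function names are hypothetical, and which form a compiler actually emits depends on the target and optimization settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: each access uses base + i * sizeof(int16_t). */
int32_t dot_indexed(const int16_t *x, const int16_t *y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Strength-reduced form: the pointers advance as part of each load,
   matching a post-increment addressing mode. */
int32_t dot_postinc(const int16_t *x, const int16_t *y, size_t n) {
    int32_t acc = 0;
    const int16_t *end = x + n;
    while (x != end)
        acc += (int32_t)(*x++) * (*y++);
    return acc;
}
```

Both functions return the same result; the difference is only in how the addresses are formed inside the loop.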

What is a hardware loop?

Execute loops with zero overhead

Hexagon has two special instructions

Hexagon sets up two registers

Loop start address, SA0/SA1

Loop count, LC0/LC1

Here’s a loop:

    for (i = 0; i < n; i++) { a += b[i]; }

The generated code:

    .L1:
    { r3 = memw(r1++#4)
      r0 = add(r0, #-1) }
    { p0 = cmp.eq(r0, #0)
      r2 = add(r3, r2)
      if (!p0.new) jump:t .L1 }

With hardware loop:

    loop0(.L1, r0)
    .L1:
    { r3 = memw(r1++#4) }
    { r2 = add(r3, r2) }:endloop0
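The behavior of the loop start address and loop count registers can be modeled in a few lines of C. This is a sketch under assumptions: the variable `lc0` stands in for LC0, the top of the `for (;;)` body stands in for the address held in SA0, and the rule "decrement and jump back while the count is greater than 1" is a simplified reading of `endloop0`. The function name is hypothetical, and the guard for a zero trip count is added by this sketch.

```c
#include <stdint.h>

/* Software model of the hardware-loop version of: a += b[i] for i in [0, n). */
int32_t sum_hwloop_model(const int32_t *b, uint32_t n) {
    int32_t a = 0;
    if (n == 0)
        return a; /* assumed guard: the loop body is skipped for n == 0 */

    uint32_t lc0 = n; /* loop0(.L1, r0): loop count register */
    for (;;) {        /* top of this body plays the role of SA0 (.L1) */
        a += *b++;    /* r3 = memw(r1++#4); r2 = add(r3, r2) */

        /* :endloop0, done by hardware with no extra instructions */
        if (lc0 > 1) {
            lc0--;
            continue; /* jump to loop start */
        }
        break;        /* count exhausted: fall through */
    }
    return a;
}
```

The decrement, compare, and jump that occupy instruction slots in the generated code without a hardware loop appear here only in the `endloop0` step, which is why the hardware-loop packets contain just the load and the add.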

Next Steps

Upstreaming our changes

Code size reduction

Represent VLIW packets in back end

Multi-basic-block scheduling

Enable loop unrolling for loops with multiple exits

Improve alias analysis

Very important for VLIW scheduling

Have seen issues with type-based disambiguation

Expose machine-dependent information to optimizer

Which addressing modes are supported?

Which loop unrolling factor is best for target?

Software pipelining
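A small C example may help show why alias information matters so much for VLIW scheduling. In the first function the compiler must assume that a store to `out[i]` could change a later `in[i + 1]`, which limits how loads and stores from different iterations can be placed in the same packet. In the second, the C99 `restrict` qualifier is the caller's promise that the two arrays do not overlap. The function names are hypothetical, and how much a given compiler exploits the qualifier is not something this sketch can promise.

```c
#include <stddef.h>
#include <stdint.h>

/* out and in may overlap, so each store must stay ordered with respect
   to the loads that follow it. */
void scale_may_alias(int32_t *out, const int32_t *in, size_t n, int32_t k) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* restrict: the caller guarantees out and in do not overlap, so loads
   from later iterations may be moved ahead of earlier stores. */
void scale_no_alias(int32_t *restrict out, const int32_t *restrict in,
                    size_t n, int32_t k) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

When the arrays really are distinct, both functions compute the same values; only the freedom given to the scheduler differs.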

Questions?