New Challenges in Microarchitecture and Compiler Design
Fred Pollack, Intel Fellow
Director of Microprocessor Research Labs, Intel Corporation
[email protected]
Contributors: Jesse Fang, Tin-Fook Ngai

piak/teaching/ads/ads... · 2002-01-29

May 30, 2018


Moore’s Law

[Figure: transistors per die (log scale, 10^0 to 10^8) versus year, ’70 to 2000. Memory series: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Microprocessor series: 4004, 8080, 8086, 80286, i386™, i486™, Pentium®, Pentium® Pro, Pentium® II, Pentium® III. Source: Intel]

Moore’s Law

• The number of transistors on a chip will double every generation (18-24 months)
– 2,300 transistors on the 4004 in 1971
– About 120 million transistors on the Pentium® III Xeon processor in 2000
– An increase of 50,000x in 29 years
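As a rough check of the figures on this slide, the growth factor and the doubling period it implies can be worked out directly. This is a back-of-the-envelope calculation using only the two transistor counts quoted above, not a number taken from the talk:

```latex
\begin{align*}
\text{growth} &= \frac{1.2 \times 10^{8}}{2300} \approx 5.2 \times 10^{4} \\
\text{doublings} &= \log_2\!\left(5.2 \times 10^{4}\right) \approx 15.7 \\
\text{doubling period} &\approx \frac{29 \times 12\ \text{months}}{15.7} \approx 22\ \text{months}
\end{align*}
```

The result falls inside the 18-24 month range stated in the first bullet.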

Performance Doubles Every 18 Months

100X increase in the last 10 years
– From the Intel 486 processor at 33 MHz in 1990
– To the Pentium® 4 processor at 1.5 GHz

Sources of performance (approximate)
– 20X from process and circuit technology
– 4X from architecture
– 1.4X from compiler technology
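The two claims on this slide can be cross-checked with simple arithmetic. The calculation below assumes the three sources multiply, which the slide implies but does not state:

```latex
\begin{align*}
20 \times 4 \times 1.4 &= 112 \approx 100\text{X} \\
2^{\,120\ \text{months} / 18\ \text{months}} &= 2^{6.67} \approx 102 \approx 100\text{X}
\end{align*}
```

Both routes land near the quoted 100X, so the per-source factors are consistent with a doubling every 18 months over 10 years.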

Architecture Performance

Microarchitecture
– Deeper pipelining to increase frequency
– Execution of more instructions in parallel
– On-die caches

System Architecture
– Buses, e.g. 33 MHz 486 bus to 400 MHz Pentium® 4 bus
• 132 Mbytes/sec vs. 3.2 Gbytes/sec
• On-die caches grew from 4 Kbytes to as much as 2 Mbytes on the Pentium III Xeon™ processor
– Memory Bandwidth
• 66 Mbytes/sec in a 486 system
• 3.2 Gbytes/sec in a Pentium 4 system
– IO Bandwidth
• 3 Mbytes/sec on an ISA bus
• 1 Gbyte/sec on AGP4X
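The bus bandwidth figures follow from transfer rate times bus width. The widths used below (4 bytes for the 486 bus, 8 bytes for the Pentium 4 bus) are assumptions, since the slide gives only the clock rates and the resulting bandwidths:

```latex
\begin{align*}
\text{486 bus:} \quad & 33\ \text{MHz} \times 4\ \text{bytes} = 132\ \text{Mbytes/sec} \\
\text{Pentium 4 bus:} \quad & 400\ \text{MHz} \times 8\ \text{bytes} = 3.2\ \text{Gbytes/sec}
\end{align*}
```

Under these assumptions the peak bus bandwidth grew by roughly 24x.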

In the Last 25 Years Life was Easy

Doubling of transistor density every 30 months
Increasing die sizes, allowed by
– Increasing wafer size
– Process technology moving from “black art” to “manufacturing science”

⇒ Doubling of transistors every 18 months

And, only constrained by cost & mfg limits

But how efficiently did we use the transistors?

Performance Efficiency of µarchitectures

Tech    Old µArch          mm (linear)   New µArch          mm (linear)   Area Ratio
1.0µ    i386C              6.5           i486               11.5          3.1
0.7µ    i486C              9.5           Pentium® proc      17            3.2
0.5µ    Pentium® proc      12.2          Pentium Pro proc   17.3          2.1
0.18µ   Pentium III proc   10.3          Pentium 4 proc     ?             2+

Implications (in the same technology):
1. New µArch ~ 2-3X die area of the last µArch
2. Provides 1.4-1.7X integer performance of the last µArch

We are on the Wrong Side of a Square Law
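One way to read the "square law" is that performance grows only with the square root of die area. That reading is an interpretation of the slide, not a formula it states, but it reproduces the two implications above:

```latex
\[
\text{performance} \propto \sqrt{\text{area}}
\quad\Longrightarrow\quad
\sqrt{2} \approx 1.41, \qquad \sqrt{3} \approx 1.73
\]
```

So 2-3X the area yields roughly 1.4-1.7X the performance. The area ratios in the table are the squares of the linear ratios, for example (11.5 / 6.5)^2 ≈ 3.1 for the i386C to i486 step.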

Power density continues to get worse

[Figure: power density in Watts/cm² (log scale, 1 to 1000) versus process technology (1.5µ, 1µ, 0.7µ, 0.5µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ, 0.1µ, 0.07µ). Data points: i386, i486, Pentium® processor, Pentium® Pro processor, Pentium® II processor, Pentium® III processor. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun’s surface.]

Surpassed hot-plate power density in 0.5µ

Not too long to reach nuclear reactor

If we continue on the current trend, which we can’t

Implications

• We can’t build microprocessors with ever increasing die sizes
• The constraint is power – not manufacturability
• Must use transistors efficiently and target for valued performance

Microarchitecture Directions

Logic transistor growth constrained by power – not mfg
– At constant power, 50% per process generation vs. over 200% in past

Current directions in microarchitecture that help
– SIMD ISA extensions
– On-die L2 caches
– Multiple CPU cores on die
– Multithreaded CPU

Key challenges for future microarchitectures
– Special purpose performance
– Increase execution efficiency: improved prediction and confidence
– Break the data-flow barrier, but in a power efficient manner

Changing Landscape of Computing

Server applications
– Higher throughput demands
– Multithread apps on multiprocessor

Internet applications
– Independent user tasks/threads
– Across internet on different platforms
– Security

Peer-to-peer applications
Pervasive Computing
Move from machine-based to human-based interfaces

New Computing Environment

New languages & language support
– Java, C#, XML
– Runtime environment: Virtual Machine
– Portability for data and code

New challenges in the environment
– Driven by run-time program/data
– Multithreaded program execution

SW Application Trends

Beyond performance
– Maintainability, Reliability, Availability, Scalability
– Ease of Use

Shorter time-to-market
New software development techniques
– Object oriented
– Software reuse (component based)

Performance without excessive tuning/profiling


Opportunity

Break the dataflow barrier by increasing the cooperation between the compiler and microarchitecture in the execution of the new computing models

New Challenges

Beyond ILP: thread level parallelism
– Thread level parallelism
– Speculative multithreaded microarchitecture

Dynamic compilation and optimization
– Dynamic compilation of ILs for new languages


Thread Level Parallelism

• High throughput for multithreaded applications
– Not only for servers but also clients

• How to boost single thread application performance on multithreaded microarchitecture
– Speculation: from instruction level to block level
– Prediction: from branch (control flow) to value (data flow)
– Locality: from instruction/data to computation
– New optimization techniques to best exploit the above

Static ILP is hitting its limit

[Figure: Benchmark GCC, issue width (1 to 12) versus IPC (0.0 to 5.0), for an in-order scheduling microarchitecture with perfect memory. From Intel Microprocessor Research Labs]

Speculative Multithreading

An application program is decomposed into multiple threads
• Call and after-call threads
• Loop iteration threads
• Main and run-ahead threads

[Figure labels: after call, after loop, loop body, sequential/run-ahead]

Speculative Multithreading

• With microarchitectural support, the threads are speculatively executed in parallel
– If correct, increased parallelism
– Otherwise (hopefully only occasionally), squash the speculative threads and re-execute
• Re-execution is faster due to data prefetching and early branch resolution by the speculative execution

Thread-Level Parallelism with Value Prediction

• Some real data dependences between threads can be removed by correctly predicting the data value
– Loop induction variables
– Procedure return values
– Variables with almost constant runtime values

• Common value predictors
– Last value predictors
– Stride predictors
– Finite context methods (FCM)
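The first two predictor types listed above can be sketched in a few lines of C. This is a minimal software model for illustration only: the `value_predictor` struct and the function names are hypothetical, a hardware predictor would index a table by instruction address, and finite context methods are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-variable predictor state; names are illustrative only. */
typedef struct {
    int64_t last;    /* most recent observed value */
    int64_t stride;  /* difference between the two most recent values */
    bool    seen;    /* false until the first value is observed */
} value_predictor;

/* Last-value prediction: assume the value does not change. */
int64_t predict_last(const value_predictor *p) {
    return p->last;
}

/* Stride prediction: assume the value keeps changing by the same amount. */
int64_t predict_stride(const value_predictor *p) {
    return p->last + p->stride;
}

/* Train the predictor state with the value that was actually produced. */
void predictor_update(value_predictor *p, int64_t actual) {
    p->stride = p->seen ? actual - p->last : 0;
    p->last = actual;
    p->seen = true;
}
```

A variable with an almost constant runtime value is well served by `predict_last`, while a loop induction variable that advances by a fixed step is predicted exactly by `predict_stride` once two values have been observed.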

Value Prediction Accuracy

[Figure: bars for SPECint, SPECfp and SPEC95 showing the percentage (0 to 90) of values that are correct, incorrect and not predicted, using a Stride+FCM predictor. From [Rychlik/Faistl/Krug/Shen98]]

An Example

A difficult-to-parallelize loop in decompress() from compress:

    while (1) {
        code = getcode();                   // get next code
        if ( code == -1 ) goto exit;        // exit if no more code
        if (…) {                            // if special marker, reset and get next code
            free_ent = … ;
            code = getcode();
            if ( code == -1 ) goto exit;    // exit if no more code
        }
        incode = code;
        if ( code >= free_ent ) {           // for a special case, process the last code
            *stackp++ = … ;
            code = oldcode;
        }
        while ( code >= 256 ) {             // lookup code sequence and push onto stack
            *stackp++ = … ;
            …;
        }
        do ... while ( stackp > base );     // pop and output code sequence from stack
        if (…) free_ent++ ;                 // if a new code, generate a new table entry
        oldcode = incode;                   // update the last code
    }

Are these real dependences?

[Figure: the decompress() loop body shown twice, side by side, labelled "Current iteration" and "Speculative next iteration"]

No real output dependences

[Figure: same side-by-side view of the current iteration and the speculative next iteration]
Memory output dependences removed by local speculative stores

Effective value prediction

[Figure: same side-by-side view, with annotations "Often true" and "Often false"]
Each new iteration begins with
• free_ent increment by one (mostly)
• the same stackp value (always)

False dependences during runtime

[Figure: same side-by-side view, with two annotations "Often false"]

Runtime parallel MT execution

[Figure: same side-by-side view of the current iteration and the speculative next iteration]


Multithreaded Microarchitectures

• Dedicated local context per running thread
• Efficient resource sharing
– Time sharing
– Space sharing
• Fast thread synchronization/communication
– Explicit instructions
– Implicit via shared registers/cache/buffer

Speculation Support

• Checkpointing
• Runtime dependence checking
– Register data dependence
– Memory store-load dependence
• Recovery if misspeculated
– Squash speculative threads and re-execute
• Committing speculative results
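The four mechanisms above can be illustrated with a small software model of memory speculation. This is a sketch under stated assumptions, not the hardware design from the talk: the `spec_thread` struct, the function names and the fixed log size are all hypothetical, only memory store-load dependences are tracked (no register dependences), and any master store to a previously loaded address is conservatively treated as a violation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOG 16  /* toy capacity of the read set and the store buffer */

typedef struct {
    size_t read_addr[MAX_LOG];   /* addresses the speculative thread loaded from memory */
    size_t n_reads;
    size_t store_addr[MAX_LOG];  /* buffered speculative stores, not yet visible */
    int    store_val[MAX_LOG];
    size_t n_stores;
    bool   violated;             /* set when a store-load dependence is detected */
} spec_thread;

/* Checkpoint: start a speculative thread with empty logs. */
void spec_begin(spec_thread *t) {
    t->n_reads = 0;
    t->n_stores = 0;
    t->violated = false;
}

/* Speculative load: newest buffered store wins, else read memory and log the address. */
int spec_load(spec_thread *t, const int *mem, size_t addr) {
    for (size_t i = t->n_stores; i > 0; i--)
        if (t->store_addr[i - 1] == addr)
            return t->store_val[i - 1];
    if (t->n_reads < MAX_LOG)
        t->read_addr[t->n_reads++] = addr;
    else
        t->violated = true;      /* cannot track the load, so give up on this thread */
    return mem[addr];
}

/* Speculative store: kept in the local buffer until commit. */
void spec_store(spec_thread *t, size_t addr, int val) {
    if (t->n_stores < MAX_LOG) {
        t->store_addr[t->n_stores] = addr;
        t->store_val[t->n_stores] = val;
        t->n_stores++;
    } else {
        t->violated = true;      /* buffer full, so give up on this thread */
    }
}

/* Master (non-speculative) store: write memory, then check the read set. */
void master_store(spec_thread *t, int *mem, size_t addr, int val) {
    mem[addr] = val;
    for (size_t i = 0; i < t->n_reads; i++)
        if (t->read_addr[i] == addr)
            t->violated = true;  /* the speculative thread read a stale value */
}

/* Commit buffered stores if no violation was seen, otherwise squash.
   Returns true on commit; the caller re-executes the thread on false. */
bool spec_finish(spec_thread *t, int *mem) {
    bool ok = !t->violated;
    if (ok)
        for (size_t i = 0; i < t->n_stores; i++)
            mem[t->store_addr[i]] = t->store_val[i];
    spec_begin(t);               /* discard the logs either way */
    return ok;
}
```

Keeping speculative stores in a private buffer is what makes squashing cheap in this model: memory is never modified until `spec_finish` decides to commit, so recovery only needs to discard the logs.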

Speculative MT microarchitecture

[Figure: block diagram. The master thread and the speculative thread each have a PC, a register file, a dependence checker and result buffers. Labels between them: Speculative, Forward, Commit. Below both: cache / main memory with memory dependency checking.]

Procedure Speculation

[Figure: the code "r = foo( ); a = …; …" is split at the call. The call thread runs "foo: … return", while the code after the call ("a = … …") runs as the after-call thread. Labels: fork, speculative execution, commit speculative result.]

Loop Speculation

[Figure: the loop "for (i = 0; i < n; i++) { … }" is split into iteration i, iteration i+1 and iteration i+2. Labels: fork, speculative execution, commit, commit.]

Performance Potential in SpecMT

[Figure: bar chart of speedup (0 to 6) on SPEC95int, with perfect memory and optimal synchronization. Labels: Single Level Loops, Procedures, Multiple Level Loops, Loops and Procedures; Value Prediction: Last Value, Stride Value. From [Oplinger/Heine/Lam99]]

Challenges in Speculative MT

• Speculative thread computation model
• Speculative MT microarchitecture support
– Extremely low cost thread communication/synchronization
– Fast and effective checkpointing
– Fast recovery and commit: to squash or to selectively reexecute
– Speculative thread scheduling and throttling
– Load balancing
– Cache/memory subsystem support
• Speculative MT compilation
– Dependence analysis for speculative threads
– Identification of the most opportunistic threads
– Code optimization to minimize misspeculations


Dynamic Compilation

• Dynamic compilation of ILs for new languages such as bytecode for Java and MSIL for C#
• Runtime platform includes
– Runtime environment: Virtual Machine
– Dynamic memory management: Garbage collection
– Dynamic loading and unloading
– Dynamic optimization: Just-In-Time compiler
– Security: Runtime security check

Dynamic Compilation Environment

[Figure: block diagram. Components in order of appearance: source code in new languages; static compiler; IL files (like Java bytecode class file); the IL files move locally or through internet; dynamic loader (IL file verification); libraries; interpreter or fast JIT; optimal just-in-time compiler; run-time system; virtual machine; Linux, Win/NT, Unix; hardware.]

Dynamic Compilation Strategies

• Lightweight optimization
– Fast compilation time
– Reasonably good performance
• Heavyweight optimization
– Slow compilation time
– Better performance
• 90-10 rule
– 90% methods: lightweight
– 10% methods: heavyweight
• Tradeoff: Compilation time vs code execution time
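The tradeoff in the last bullet can be written as a break-even condition. The symbols below are introduced here for illustration and do not appear in the talk: c_L and c_H are the lightweight and heavyweight compilation times of a method, e_L and e_H are the execution times per invocation of the resulting code (with e_L > e_H), and n is the number of invocations.

```latex
\[
c_H + n\, e_H < c_L + n\, e_L
\quad\Longleftrightarrow\quad
n > \frac{c_H - c_L}{e_L - e_H}
\]
```

Heavyweight optimization pays off only for methods invoked often enough to cross this threshold, which is the reasoning behind compiling most methods with the lightweight path and reserving the heavyweight path for the hot minority.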

JIT Overhead and Performance

[Figure: bar chart of execution time in seconds (0 to 80) on a PII/350MHz for the SpecJVM98 benchmarks 201_compress, 202_jess, 209_db, 213_javac, 222_mpegaudio, 207_mtrt and 228_jack, each shown as /fast and /opt. Legend: Opt JIT, VM, App by opt JIT, App by fast JIT, Fast JIT. Source: Intel/MRL JIT 1999]

JIT Compiler Infrastructure

[Figure: block diagram with the components Bytecode, Fast Code Generator, Unoptimized Native, Counters, Profiling Data Representation, Optimizing Compiler and Optimized Native]

Input IL file

Call fast JIT first

Gather profiling info while executing fast JITed code

Call optimizing JIT

Execute optimizing JITed code
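The sequence on this slide amounts to a counter-driven, two-tier policy, which can be sketched as follows. This is a minimal model under stated assumptions rather than the Intel/MRL implementation: the `jit_method` record, the function name and the `HOT_THRESHOLD` value are hypothetical, and the actual compilers are represented only by a change of tier.

```c
#define HOT_THRESHOLD 1000u  /* hypothetical invocation count that triggers recompilation */

typedef enum { TIER_NONE, TIER_FAST, TIER_OPT } jit_tier;

/* Hypothetical per-method record; a real VM would also hold code pointers. */
typedef struct {
    jit_tier tier;         /* which compiled version is currently installed */
    unsigned invocations;  /* profiling counter bumped by fast-JITed code */
} jit_method;

/* Called on every invocation; returns the tier whose code runs this call. */
jit_tier method_invoke(jit_method *m) {
    if (m->tier == TIER_NONE)
        m->tier = TIER_FAST;              /* first call: compile with the fast JIT */
    jit_tier used = m->tier;
    if (used == TIER_FAST && ++m->invocations >= HOT_THRESHOLD)
        m->tier = TIER_OPT;               /* hot method: later calls run optimized code */
    return used;
}
```

With this policy a method that is called only a few times never pays the cost of the optimizing compiler, while a hot method switches to optimized code after `HOT_THRESHOLD` invocations.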

Challenges of JIT Compiler

• Efficient exception handling
• Bounds checking elimination
• Efficient synchronization
• Efficient support for garbage collection
• Effective use of profiling information
– Path profiling
– Reduce profiling overhead

Summary

• Moore’s Law is still valid for the next 10+ years
• But significant challenges
– Power
– Increasing performance efficiently
• Opportunity: Increase cooperation between compiler and microarchitecture
• Efficient JIT and Runtime for intermediate languages
• Move from instruction-level parallelism to thread-level parallelism


Microprocessor 2010

• At Least 100x Performance of the Pentium® 4 Processor

• 20 GHz

• Multiple Processors on a Die

• Multiple Threads per Processor

• Specialized Processors to Accelerate Human-Interface and Communications

Cooperation between Industry & Academia

• Open source is a good way to coordinate research activities between industry and academia
– Intel MRL open source Computer Vision Lib
– Intel MRL open source Open Runtime Platform
– Several other Intel open source utilities
  http://developer.intel.com/software/opensource
• Check out MRL web pages and Intel tools and compilers
  http://intel.com/research/mrl
  http://developer.intel.com/vtune