
Jan 17, 2016


NC STATE UNIVERSITY

Center for Embedded Systems Research (CESR)

Department of Electrical & Computer Engineering

North Carolina State University

Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg

Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing

CASES 2005

El-Haj-Mahmoud © 2005

Embedded Processor Trends

Inheriting desktop high-performance features

Examples

• ARM11: 8-stage pipeline, caches, dynamic branch prediction

• Ubicom IP3023: 8 hardware threads

• PowerPC 750: 2-way superscalar, out-of-order (OOO) execution

Real-Time Systems and Analyzability

Schedulability of task-set determined a priori

• Requires worst-case execution times (WCET) → statically analyzable microarchitecture

Dynamic microarchitecture features complicate real-time design

A trade-off between performance and analyzability
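As one illustration of an a priori test, consider the classic utilization bound for EDF. For independent periodic tasks with deadlines equal to periods, preemptively scheduled on one processor, the task-set is schedulable when the sum of WCET/period is at most 1. The sketch below assumes that task model. The task parameters are made up, and the slides do not state which test the authors apply.

```python
from fractions import Fraction

def edf_schedulable(tasks):
    """Utilization test for EDF on a single processor.

    tasks: list of (wcet, period) integer pairs in the same time unit.
    Assumes independent periodic tasks with deadline == period.
    Returns (schedulable, total_utilization).
    """
    utilization = sum(Fraction(wcet, period) for wcet, period in tasks)
    return utilization <= 1, utilization

# Hypothetical task-set: the WCETs must be known before run time.
ok, u = edf_schedulable([(2, 10), (3, 15), (5, 20)])
print(ok, u)
```

The test is only as trustworthy as the WCET inputs, which is why the microarchitecture has to be statically analyzable.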

Multiple Simple Processors

+ Analyzable

+ Natural fit with real-time systems

− Rigid resource partitioning

− Higher cost/performance metric

[Figure: tasks A, B, and C mapped onto two simple processors, proc 1 and proc 2]

Simultaneous Multithreading (SMT)

+ Flexible resource sharing

+ Better cost/performance metric

− Unanalyzable

[Figure: tasks A, B, and C sharing a single SMT processor]

Unanalyzability of SMT

− Violates single-task WCET assumption (tasks analyzed separately)

− Arbitrary periods → arbitrary overlap of tasks

− Dynamic interference

Cannot derive WCETs

Cannot perform schedulability analysis

Real-Time Virtual Multiprocessor

Combine MP analyzability and SMT flexibility

Key idea: Interference-free multithreading

• SMT performance

• WCET of each task independent of task-set

RVMP substrate: two parts

• Highly reconfigurable multithreaded superscalar

− Space: multiple arbitrary interference-free partitions

− Time: rapidly reconfigure partitions

• Static schedule orchestrates partitioning
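The space/time idea can be sketched as a small data structure: a static schedule is a list of intervals, and each interval maps tasks to disjoint sets of superscalar ways. This is only an illustration under assumed names. The `schedule` layout, the 60- and 40-cycle interval lengths, and the `interference_free` helper are hypothetical and are not the RVMP hardware format.

```python
def interference_free(schedule, total_ways=4):
    """Check that no way is given to two tasks in the same interval.

    schedule: list of (cycles, {task_name: set_of_way_indices}).
    """
    all_ways = set(range(total_ways))
    for cycles, partition in schedule:
        if cycles <= 0:
            return False
        claimed = set()
        for ways in partition.values():
            # Each task needs at least one way, all within the machine.
            if not ways or not ways <= all_ways:
                return False
            # Overlap would mean two tasks share a way: interference.
            if claimed & ways:
                return False
            claimed |= ways
    return True

# Hypothetical schedule: two intervals with different partitions.
schedule = [
    (60, {"A": {0, 1}, "B": {2, 3}}),
    (40, {"C": {0}, "D": {1, 2, 3}}),
]
print(interference_free(schedule))
```

Because each task's ways are private for the whole interval, its timing within that interval does not depend on which other tasks are present.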

Big Picture

Co-design processor and real-time scheduling for analyzable high performance

RVMP Architecture

Superscalar “ways” are natural partitioning granularity

Different-sized virtual processors carved out of single superscalar

Processor Architecture

Starting point

• Alpha 21164: 4-way in-order superscalar

• Ubicom IP3023: 8 hardware threads (4 in RVMP → 4 VPs)

Simplifications for analyzability

• In-order issue within VPs

• Software-managed scratchpads

• Static branch prediction

Not limitations of RVMP!

Processor Architecture

[Figure: processor block diagram. Labeled components: Fetch Unit, PC, Interleaved Instruction Scratchpad, Fetch Buffer, Decode, Slotter and Scoreboard (Issue Logic), Int RF, FP RF, Shadow Buffers, Data Scratchpad, HRT, and five functional units: FU0: INT, FU1: INT/MUL/DIV, FU2: INT/AGEN, FU3: INT/AGEN, FU4: FPU]

Instruction Fetch

[Figure: instruction fetch path. Labeled components: Instruction Scratchpad, Fetch Unit, Fetch Buffer, Decode, Issue, Backend, HRT, and Shadow Buffers with FV and PV fields]

Real-Time Scheduling

Too complicated

• Must schedule entire hyper-period

• Overwhelming # of possible space/time schedules

• High dedicated-storage cost for schedule

[Figure: space/time schedule of tasks A, B, C, and D]
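To see why scheduling the entire hyper-period gets expensive, recall that the hyper-period is the least common multiple of the task periods. The sketch below uses made-up periods and counts how many job instances a full static schedule would have to place. The function names are illustrative.

```python
from functools import reduce
from math import gcd

def hyper_period(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def jobs_in_hyper_period(periods):
    """Number of job instances of each task within one hyper-period."""
    h = hyper_period(periods)
    return [h // p for p in periods]

# Hypothetical periods for four tasks A, B, C, D (same time unit).
periods = [10, 15, 20, 35]
print(hyper_period(periods), jobs_in_hyper_period(periods))
```

Even four tasks with modest periods yield over a hundred jobs to place, and each job additionally has a choice of partition size.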

[Figure sequence: Task A, Task B, Task C, and Task D, one slide each. Each plot shows # of ways (1 to 4) against time within the task's period. The Task A plot is labeled WCET1, WCET2, WCET3, and WCET4 alongside the "1 way" to "4 ways" rows, and marks a "Round".]

Cyclic schedule

[Figure: cyclic schedule construction, with labels FAILED and SUCCEEDED]

Interaction: Scheduling and Architecture

[Figure: how the schedule drives the hardware. Labels: HRT, LTC; two maps over FU0 to FU4; intervals of 60 and 40 cycles within 100 cycles; fields FV, PV, CVs; entries 0, 1, and EOT; INVALID]

Experiments

Tasks from C-lab and MiBench benchmarks

100 task-sets

• 4 tasks per task-set (also 8 tasks in paper)

• Grouped according to scalar utilization (U_scalar)

Two experiments

• Worst-case schedulability analysis

• Run-time experiments
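The grouping by U_scalar can be sketched as follows. The sketch assumes U_scalar is the sum of each task's scalar (1-way) WCET divided by its period, which is the usual definition of utilization but is not spelled out on the slide. The bin edges follow the task-set bins shown in the result charts. The task values are made up.

```python
import math
from fractions import Fraction

def scalar_utilization(tasks):
    """tasks: list of (scalar_wcet, period) integer pairs.

    Assumed definition: U_scalar = sum(scalar_wcet / period).
    """
    return sum(Fraction(wcet, period) for wcet, period in tasks)

def utilization_bin(u, max_u=4):
    """Return the (low, high] bin containing u, e.g. (1, 2) for 1 < u <= 2."""
    if not 0 < u <= max_u:
        raise ValueError("utilization outside the binned range")
    high = math.ceil(u)
    return (high - 1, high)

# Hypothetical 4-task set: U_scalar = 0.3 + 0.5 + 0.4 + 0.6 = 1.8
tasks = [(30, 100), (50, 100), (40, 100), (60, 100)]
u = scalar_utilization(tasks)
print(u, utilization_bin(u))
```

Any task-set with U_scalar above 1 cannot be scheduled on a scalar processor, so the higher bins are where added ways have to pay off.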

Worst-Case Schedulability Tests

[Figure: stacked bar chart. Y-axis: success rate (%), 0% to 100%, split into Success and Failure. X-axis: task-set bins 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4. Bars per bin: Scalar, RVMP, 4x1, 2x2, 1x4. Per-bar data labels are not recoverable from the transcript.]

RVMP Configurations

[Figure: bar chart. Y-axis: # of task-sets, 0 to 7. X-axis: task-set bins 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4. Bars per bin, one per RVMP configuration: 1-1-1-1, 2-2, 1-3/2-2, 1-3/2-1-1, 4/1-1-2, 4/4/2-2, 4/4/1-3, 4/4/4/4. Bar heights are not recoverable from the transcript.]
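The configuration labels on this chart can be read mechanically. The sketch below assumes that "-" separates the widths of partitions that coexist in space and "/" separates successive phases in time. That reading is an interpretation of the labels, not something the slide states. The helper names are illustrative.

```python
def parse_config(label):
    """Split a label such as "1-3/2-1-1" into phases of partition widths."""
    return [[int(width) for width in phase.split("-")]
            for phase in label.split("/")]

def fits_machine(label, total_ways=4):
    """True if every phase uses at most total_ways ways in total."""
    return all(
        all(width >= 1 for width in phase) and sum(phase) <= total_ways
        for phase in parse_config(label)
    )

# Configuration labels as they appear on the chart's x-axis.
labels = ["1-1-1-1", "2-2", "1-3/2-2", "1-3/2-1-1",
          "4/1-1-2", "4/4/2-2", "4/4/1-3", "4/4/4/4"]
for label in labels:
    print(label, parse_config(label), fits_machine(label))
```

Under this reading, every phase of every listed configuration uses exactly the four ways of the underlying superscalar.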

Run-Time Experiments

[Figure: stacked bar chart. Y-axis: success rate (%), up to 100%, split into Success and Failure. X-axis: task-set bins 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4. Bars per bin: RVMP, 4x1, 2x2, 1x4, SMT-EDF, SMT-ICNT. Per-bar data labels are not recoverable from the transcript.]

Unsafe Behavior of SMT

[Figure: pie charts for the bin 1 < U_scalar <= 2. Legend: Only SMT-EDF success, Only SMT-ICNT success, Both success, Both failure. Data labels: 16, 12, 6; Failure 40%, Success 60%.]

Summary

Novel contributions

• Virtualize a single processor

− Space: variable-size interference-free partitions

− Time: rapid reconfiguration

• Simple real-time scheduling approach

Analyzability of MP with flexibility of SMT

Co-design processor and real-time scheduling for analyzable high performance