Page 1: Defense

NCKU, Low power, high performance VLSI design lab

Design Automation Tool from Behavior Level to Transaction Level for Virtual Bus-Based Platforms

Advisor: Lih-Yih Chiou Student: Hi-Ho Chen

23 June 2008

Page 2: Defense

Outline

• Motivation and Contributions
• Previous Works
• Proposed Design Automation Tool from Behavior Level to Transaction Level for Virtual Bus-Based Platforms
  • Representation
  • Design Flow Overview
  • Block Level
    • Methodology
    • Translation
  • Platform Level
    • Develop Library for CoWare
    • System Control Generator
• Experiments
  • Scalar 176*144
  • DWT 44*36
• Conclusions and Future Works
• References

Page 3: Defense

Introduction

• As we enter the SoC era, more and more IPs are integrated onto a single chip.
• ESL (Electronic System Level) design lets designers quickly simulate the functional behavior of a system at a higher level of abstraction before hardware implementation.
• Communication design has become an important criterion in SoC design.

Page 4: Defense

Top-down Design Flow

[Figure: top-down design flow in five numbered stages, with the refinement steps between consecutive stages]

• Product requirements from customer
  • Algorithm selection, optimization
• Specification Model
  • Allocation, behavior partitioning, scheduling
• Architecture Model
  • Protocol selection, channel partitioning, arbitration
• Communication Model
  • Cycle scheduling, protocol scheduling
• Implementation Model

[1] S. S. Pasricha, N. Dutt, and M. Ben-Romdhane, "Using TLM for exploring bus-based SoC communication architectures," 16th IEEE International Conference on Application-Specific Systems, Architecture Processors, 2005, pp. 79-85

Page 5: Defense


Arbitration Level vs. Simulation Speed

[2]C. Lennard and D. Mista, "Taking Design to the System Level," 2006 [Online]. Available:(http://www.arm.com/pdfs/ARM_ESL_20_3_JC.pdf)

Page 6: Defense

High Level Synthesis

Behavior Synthesis

Separate the control and the data path from the behavior description

• Control: if-then-else, switch-case
• Data path: data flow

    x = a + b;
    c = a < b;
    if (c) {
        d = e - f;
    } else {
        g = h + i;
    }
    j = d * g;
    l = e + x;

[Figure: CDFG of the code above (x=a+b; c=a<b; a branch on c selecting d = e-f or g = h+i; then j = d*g; l = e+x), and the resulting hardware split into a control unit and a data path built from memory, MUXes, a multiplier and an ALU]

[3] SPARK Methodology, http://mesl.ucsd.edu/spark/methodology.shtml

Page 7: Defense

Contributions

• Rapid system exploration: fast exploration of multiple micro-architecture alternatives
• Shorter verification/simulation cycle: speed-up by moving from behavior level to transaction level
• Quick access to power and performance information: earlier estimation of design specifications
• Higher performance: reduced communication and computation


Page 9: Defense

Previous Works - SPARK (1)

• Input: C
• Output: C, VHDL
• Advantages: defines a new synthesis tool for parallel designs
• Disadvantages: no platform architecture; communication issues are not considered

[Figure: SPARK flow, Phase 1 to Phase 3]

[4] "SPARK: A High-Level Synthesis Framework for Applying Parallelizing Compiler Transformations," Proceedings of the 16th International Conference on VLSI Design, 4-8 Jan. 2003, pp. 461-466

Page 10: Defense

Previous Works - xPilot (2)

• Input: C/SystemC
• Output: Verilog/SystemC
• Method
  • Phase 1: SSDM
  • Phase 2: Synthesis
• Advantages: direct mapping to FPGA; quick verification
• Disadvantages: communication issues are not considered

[Figure: xPilot flow, Phase 1 and Phase 2]

[5] "Platform-Based Behavior-Level and System-Level Synthesis," IEEE International SOC Conference, Sept. 2006, pp. 199-202

Page 11: Defense

Previous Works - MFASE (3)

MFASE: Multiple Functions SoCs Analysis Environment

• Design flow: HW/SW partitioning, architecture mapping, communication analysis, ...
• Advantage: HW/SW co-design
• Limitation: IP database

[6] "MFASE: Multiple Functions SoCs Analysis Environment," VLSI Design/CAD Symposium, Taiwan, August 2007

Page 12: Defense

Summary

Previous works

• Synthesis tools: SPARK and xPilot synthesize hardware C code into RTL code (VHDL or Verilog)
• SPARK and xPilot do not consider communication issues
• MFASE does not describe how the design is generated automatically

Thesis

• Build an automation tool from functional level to transaction level for virtual bus-based platforms
• Address both computation and communication issues
• Automate the step from behavior level to transaction level


Page 14: Defense

Representation

Example: C to CDFG

Example for "if-then-else"

    if (a == 0) { b = c + d; } else { b = c - d; }

[Figure: CDFG with a Condition node (a==0); the True edge leads to the If Body (b = c + d), the False edge leads to the Else Body (b = c - d); both bodies join at END]

Example for "for loop"

    for (i = 0; i < 5; i++) { c = a + b; }

[Figure: CDFG with the nodes Initial (i=0), Condition (i<5), Body (c = a + b) and Update (i=i+1); the True edge of the condition enters the body, the False edge leads to END]


Page 16: Defense

Design Flow Overview 1/2

[Figure: overall design flow in three parts]

• Block Level: profiling and analysis; each Spec C file is translated to a CDFG and then to SystemC (SystemC.cpp, SystemC.h); this repeats until all Spec C files have been translated
• Platform Level using simple bus: link port setting, connections, approximate-time simulation
• Platform Level using CoWare: tcl translation, wrapper library, CoWare library, PMU generator, CTL generator, link port, connections, simulation on CoWare

Page 17: Defense

Design Flow Overview 2/2

• Block Level
  • Methodology
    • Parallel
    • Cascade (multi-cycle)
  • Translation
    • State and edge reduction
    • STG-to-SystemC generator
• Platform Level using Simple Bus
  • Approximate-time simulation
• Platform Level using CoWare
  • *.tcl generator
  • Peripheral generator


Page 19: Defense

Block Level

Input

• Functional-level CDFG
• Block inside configuration
  • Max parallel depth
  • Buffer size
  • Boundary case
• Block-to-bus configuration
  • Max burst size
  • Initial address
  • Address offset

Output

• TLM SystemC

[Figure: block-level flow. The CDFG enters CDFG analysis (parallel analysis, boundary analysis, irregularity analysis, parallel and internal communication analysis), supported by the Power Lib and the Wrapper Lib. A "for loop condition" decision (Yes/No) leads to block synthesis and interface synthesis under the synthesis configuration, followed by state and edge reduction, performance estimation (approximate time or cycle time), implementation selection and state reduction.]


Page 21: Defense

Block Level - Methodology 1/10

Computation Reduction

Parallel analysis

• Step 1: Convert the C code to CDFG format
• Step 2: Unroll the "for loop" to obtain the cycle count
• Step 3: Find a solution that fits the "for loop" condition under the hardware constraints (GCD methodology)
• Step 4: Find the closest solution based on the hardware condition
• Step 5: Update the CDFG

    for (j = 0; j < 2; j++) {
        for (i = 3; i < 7; i++) {
            b[j][i] = (a[j][i] + a[j][i+1]) >> 1;
        }
    }

[Figure: two schedules of the unrolled loop. Sequential: ForBegin, ForBegin, Body(0,3), Body(0,4), Body(0,5), Body(0,6), Body(1,3), Body(1,4), Body(1,5), Body(1,6), ForEnd, ForEnd over cycles 1 to 12. Parallel depth 2: the bodies run in pairs, Body(0,3)/Body(0,4), Body(0,5)/Body(0,6), Body(1,3)/Body(1,4), Body(1,5)/Body(1,6), so the whole loop takes cycles 1 to 8.]
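The cycle counts in the two schedules (12 cycles sequentially, 8 cycles with two bodies per cycle) can be reproduced with a small calculation. The sketch below is only an illustration, not the tool's own analysis: the function name `loop_cycles` and the idea of passing the ForBegin/ForEnd states as a fixed `overhead` are assumptions made here.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): number of cycles needed to
 * run `iterations` unrolled loop bodies when `parallel_depth` bodies are
 * issued per cycle, plus `overhead` cycles for the ForBegin/ForEnd states. */
unsigned loop_cycles(unsigned iterations, unsigned parallel_depth,
                     unsigned overhead)
{
    assert(parallel_depth > 0);
    /* Ceiling division: a partially filled last cycle still costs a cycle. */
    unsigned body_cycles = (iterations + parallel_depth - 1) / parallel_depth;
    return body_cycles + overhead;
}
```

For the example loop there are 2 * 4 = 8 bodies and 4 overhead states (two ForBegin, two ForEnd), so `loop_cycles(8, 1, 4)` and `loop_cycles(8, 2, 4)` correspond to the sequential and the parallel-depth-2 schedules.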

Page 22: Defense

Block Level - Methodology 2/10

Communication factors

• We assume the array is located in external memory
• How can we get data from external memory? Bus transfer: single or burst
• Buffer size requirement
• The parallelism and the size of each data transfer influence performance and power

Assume a[6][8]

Address

• a[0][3] = address 12, a[0][4] = address 16, a[0][5] = address 20
• a[1][3] = address 44, a[1][4] = address 48, a[1][5] = address 52

[Figure: memory map showing that row 0 starts being used at Addr 12 and row 1 at Addr 44, so moving to the next row requires a new transfer. Block diagram: Mem 1 is read over the bus (burst) into IBuff, A[j][i] is processed, and the result is written from OBuff over the bus to Mem 2.]
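The addresses listed for a[6][8] are consistent with a row-major layout, 4-byte elements and a base address of 0. Those three properties are inferred from the numbers on the slide rather than stated there, and the function name `element_address` is hypothetical. A minimal sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Byte address of a[j][i] for an array with `cols` columns.
 * Assumptions (inferred from the slide's numbers, not stated there):
 * row-major storage, `elem_size` bytes per element, array starts at `base`. */
size_t element_address(size_t base, size_t cols, size_t elem_size,
                       size_t j, size_t i)
{
    assert(i < cols);
    return base + (j * cols + i) * elem_size;
}
```

With `cols = 8` and `elem_size = 4`, consecutive elements of one row are 4 bytes apart, while the same column in the next row is 32 bytes away, which is why a burst cannot simply continue across a row boundary.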

Page 23: Defense

Block Level - Methodology 3/10

Communication Reduction

[Figure: Cases 1 to 4. Each panel shows the same parallel-depth-2 schedule (ForBegin, ForBegin, the body pairs Body(0,3)/Body(0,4), Body(0,5)/Body(0,6), Body(1,3)/Body(1,4), Body(1,5)/Body(1,6), ForEnd, ForEnd over cycles 1 to 8) with the bodies grouped differently into bus transactions.]

Page 24: Defense

Block Level - Methodology 4/10

Case 1

• Parallel depth: 2
• Operator: 1 cycle
• Irregularity: 1
• Bus access times: Read 4, Write 4
• Max buffer size usage: 3

Legend: B(): burst size; T(): transaction number; R: read from bus; W: write to bus

Transactions: T(1), T(2), T(3), T(4), each with a read of B(3) and a write of B(2)

Unrolled statements: b[j][i] = (a[j][i] + a[j][i+1]) >> 1 for j = 0..1 and i = 3..6, that is b[0][3], b[0][4], b[0][5], b[0][6], b[1][3], b[1][4], b[1][5], b[1][6]

[Figure: Case 1 grouping on the parallel-depth-2 schedule, cycles 1 to 8]

Page 25: Defense

Block Level - Methodology 5/10

Case 2

• Parallel depth: 2
• Operator: 2 cycles
• Irregularity: 1
• Bus access times: Read 2, Write 2
• Max buffer size usage: 5

Legend: B(): burst size; T(): transaction number; R: read from bus; W: write to bus

Transactions: T(1) read B(5), T(2) write B(4), T(3) read B(5), T(4) write B(4)

Unrolled statements: b[j][i] = (a[j][i] + a[j][i+1]) >> 1 for j = 0..1 and i = 3..6, that is b[0][3], b[0][4], b[0][5], b[0][6], b[1][3], b[1][4], b[1][5], b[1][6]

[Figure: Case 2 grouping on the parallel-depth-2 schedule, cycles 1 to 8]
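The burst sizes and access counts of Cases 1 and 2 follow a simple pattern: producing k outputs b[j][i..i+k-1] needs the k+1 inputs a[j][i..i+k], because each output also reads a[j][i+1]. The sketch below captures only that pattern. The type `burst_plan_t`, the function `plan_bursts` and the `overlap` parameter are names introduced here for illustration; buffer usage and irregularity are not modeled.

```c
#include <assert.h>

/* Hypothetical summary of a bus schedule (illustration only). */
typedef struct {
    unsigned transactions; /* number of read bursts (= number of write bursts) */
    unsigned read_burst;   /* elements per read burst  */
    unsigned write_burst;  /* elements per write burst */
} burst_plan_t;

/* Plan the bursts for `total_outputs` results when each transaction
 * produces `outputs_per_burst` of them. `overlap` is the number of extra
 * input elements each group needs (1 for the a[j][i] + a[j][i+1] stencil).
 * Assumes a group never crosses a row boundary. */
burst_plan_t plan_bursts(unsigned total_outputs, unsigned outputs_per_burst,
                         unsigned overlap)
{
    assert(outputs_per_burst > 0);
    assert(total_outputs % outputs_per_burst == 0);
    burst_plan_t plan;
    plan.transactions = total_outputs / outputs_per_burst;
    plan.read_burst = outputs_per_burst + overlap;
    plan.write_burst = outputs_per_burst;
    return plan;
}
```

With 8 outputs, `plan_bursts(8, 2, 1)` matches Case 1 (4 reads of 3, 4 writes of 2) and `plan_bursts(8, 4, 1)` matches Case 2 (2 reads of 5, 2 writes of 4). Larger groups mean fewer bus accesses at the cost of a larger buffer.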

Page 26: Defense

Block Level - Methodology 6/10

Case 3

• Parallel depth: 2
• Operator: 3 cycles
• Irregularity: 2
• Bus access times: Read 3, Write 3
• Max buffer size usage: 8

Legend: B(): burst size; T(): transaction number; R: read from bus; W: write to bus

Transactions (labeled T(1) to T(3) in the figure): reads of B(3), B(3) and B(5); writes of B(2), B(2) and B(4)

Unrolled statements: b[j][i] = (a[j][i] + a[j][i+1]) >> 1 for j = 0..1 and i = 3..6, that is b[0][3], b[0][4], b[0][5], b[0][6], b[1][3], b[1][4], b[1][5], b[1][6]

[Figure: Case 3 grouping on the parallel-depth-2 schedule, cycles 1 to 8]

Page 27: Defense

Block Level - Methodology 7/10

Boundary case

• Limitation: high address relation
• Relation with the memory location

    for (j = 0; j < 2; j++) {
        for (i = 3; i < 7; i++) {
            b[j][i] = (a[j][i] + a[j][i+1]) >> 1;
        }
    }

[Figure: memory layout of the array along the i and j axes, marking the boundary between rows, the address relation inside a row, and the point where a new transfer must start]

Page 28: Defense

Block Level - Methodology 8/10

Case 4

• Parallel depth: 2
• Operator: 4 cycles
• Irregularity: 1
• Bus access times: Read 2, Write 2
• Max buffer size usage: 10

Legend: B(): burst size; T(): transaction number; R: read from bus; W: write to bus

Transactions as labeled in the figure: T(1) read B(5), T(2) write B(4), T(4) read B(5), T(3) write B(4)

Unrolled statements: b[j][i] = (a[j][i] + a[j][i+1]) >> 1 for j = 0..1 and i = 3..6, that is b[0][3], b[0][4], b[0][5], b[0][6], b[1][3], b[1][4], b[1][5], b[1][6]

[Figure: Case 4 grouping on the parallel-depth-2 schedule, cycles 1 to 8]

Page 29: Defense

Block Level - Methodology 9/10

Which case is better to implement?

Problems

• Case 1: single operator cycle; high number of bus accesses
• Case 3: the control is too complex; the boundary case is not considered
• Case 4: buffer size

We choose Case 2 to implement: it handles the boundary case, meets the buffer size constraint, needs few bus accesses, and is regular.

                          Case 1   Case 2   Case 3   Case 4
Irregularity              1        1        2        1
Boundary case             O        O        X        O
Max buffer size           3        5        8        10
Read bus access times     4        2        3        2
Write bus access times    4        2        3        2
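The choice of Case 2 can be read as a filter followed by a ranking. The sketch below is one possible reading, not the tool's actual selection rule: it discards cases that fail the boundary case or exceed a buffer limit, then prefers fewer bus accesses, then lower irregularity, then a smaller buffer. The struct `case_t`, the function `select_case`, the ordering of the criteria and the `buffer_limit` parameter are all assumptions; only the table values come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* One column of the comparison table. */
typedef struct {
    int irregularity;
    int boundary_ok;   /* 1 for "O", 0 for "X" */
    int max_buffer;
    int reads;
    int writes;
} case_t;

/* Values copied from the slide's table (Case 1 .. Case 4). */
static const case_t slide_cases[4] = {
    {1, 1, 3, 4, 4},
    {1, 1, 5, 2, 2},
    {2, 0, 8, 3, 3},
    {1, 1, 10, 2, 2},
};

/* Returns the index of the preferred case, or -1 if none is feasible.
 * Ranking (an assumption): bus accesses, then irregularity, then buffer. */
int select_case(const case_t *c, size_t n, int buffer_limit)
{
    assert(c != NULL || n == 0);
    int best = -1;
    for (size_t k = 0; k < n; k++) {
        if (!c[k].boundary_ok || c[k].max_buffer > buffer_limit)
            continue; /* infeasible under the constraints */
        if (best < 0) {
            best = (int)k;
            continue;
        }
        int acc_k = c[k].reads + c[k].writes;
        int acc_b = c[best].reads + c[best].writes;
        if (acc_k != acc_b) {
            if (acc_k < acc_b) best = (int)k;
        } else if (c[k].irregularity != c[best].irregularity) {
            if (c[k].irregularity < c[best].irregularity) best = (int)k;
        } else if (c[k].max_buffer < c[best].max_buffer) {
            best = (int)k;
        }
    }
    return best;
}
```

Under this reading, `select_case(slide_cases, 4, 8)` returns index 1, which is Case 2. A very small buffer limit would leave only Case 1 feasible.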

Page 30: Defense

Block Level - Methodology 10/10

Under the conditions: parallel depth, boundary case

Analysis

• Step 1: Trace the states by operator cycles
• Step 2: Separate the read and write parts and find the period
• Step 3: Estimate the cycles and the hardware cost
• Step 4: Find the best solution

Legend: O(): operator cycles; B(): buffer size; R(): read counts; W(): write counts; S(): state size; Ir(): irregularity

[Figure: search tree for the boundary condition with cycle count 8 and parallel depth 2. The branches O(1), O(2), O(3), O(4) correspond to case1, case2, case3, case4 and carry the irregularities Ir(1), Ir(1), Ir(2), Ir(1). Node annotations include B(5), S(4), W(2), R(2) and B(8), B(5), S(4), W(4), R(4).]


Page 32: Defense

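Translation 1/3

Example of CDFG to state transition graph (STG)

• Fits the operations to time steps
• Easy to pass to the FSM generator

Example for "if-then-else"

[Figure: the CDFG (condition a==0, True branch b=c+d, False branch b=c-d, END) is mapped to an STG with the states If-Then-Else Begin, If Body, Else Body and If-Then-Else END. The transition a==0 enters the If Body (b=c+d) and a!=0 enters the Else Body (b=c-d). The states occupy cycles 1 to 3.]

Example for "for loop"

[Figure: the CDFG (i=0; condition i<5; c=a+b; i=i+1; END) is mapped to an STG with the states for Begin (i=0), for Body (c=a+b, i=i+1) and for End. The body repeats while i<5 and leaves to for End when i>=5. The states occupy cycles 1 to 3.]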
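To make the "for loop" STG concrete, the sketch below executes it as a three-state machine for the loop for(i=0;i<5;i++){c=a+b;}, spending one cycle per state visit. It is a hand-written illustration under that one-cycle-per-state assumption, not output of the STG-to-SystemC generator; the names `stg_state_t`, `stg_result_t` and `run_for_stg` are made up here.

```c
#include <assert.h>

typedef enum { FOR_BEGIN, FOR_BODY, FOR_END } stg_state_t;

typedef struct {
    unsigned cycles; /* number of state visits, one per cycle */
    int c;           /* value computed by the loop body       */
} stg_result_t;

/* Executes the STG of: for (i = 0; i < 5; i++) { c = a + b; } */
stg_result_t run_for_stg(int a, int b)
{
    stg_state_t state = FOR_BEGIN;
    stg_result_t r = {0, 0};
    int i = 0;
    for (;;) {
        r.cycles++;
        switch (state) {
        case FOR_BEGIN:            /* "for Begin": initialise the counter */
            i = 0;
            state = (i < 5) ? FOR_BODY : FOR_END;
            break;
        case FOR_BODY:             /* "for Body": body and update together */
            r.c = a + b;
            i = i + 1;
            state = (i < 5) ? FOR_BODY : FOR_END;
            break;
        case FOR_END:              /* "for End": leave the loop */
            return r;
        }
    }
}
```

The body state is visited five times, so the machine takes one cycle for for Begin, five for for Body and one for for End.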

Page 33: Defense

Translation 2/3

• Step 1: CDFG to STG
  • Unroll the "for loop" condition
• Step 2: Methodology
  • Reduce computation: parallel
  • Reduce communication: cascade
  • Architecture definition
• Step 3: Translate to TLM SystemC
  • Header
  • Function

[Figure: the STG "for begin, for begin, Body, for end, for end" is scheduled as in Case 2 (parallel depth 2, cycles 1 to 8), and the for Body is expanded into a bus-access state machine with the states RD_REQ, READ, EXE, Write and WT_REQ. Transitions: RD_REQ waits on !Grant and advances on Grant; READ waits on !Read done and advances on Read done; WT_REQ waits on !Grant and advances on Grant; Write waits on !Write done and advances on Write done.]
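The bus-access state machine of the for Body can be sketched as a next-state function. The state order RD_REQ, READ, EXE, WT_REQ, WRITE is an assumption about how the figure is wired, as are the single-cycle EXE state and the final `ST_DONE` state; the `ST_` prefix and the function name `bus_next` are introduced here only for the example.

```c
#include <assert.h>

typedef enum {
    ST_RD_REQ,  /* request the bus for a read   */
    ST_READ,    /* burst read into input buffer */
    ST_EXE,     /* compute on buffered data     */
    ST_WT_REQ,  /* request the bus for a write  */
    ST_WRITE,   /* burst write from output buffer */
    ST_DONE     /* hypothetical terminal state  */
} bus_state_t;

/* Next state given the handshake inputs (non-zero means asserted). */
bus_state_t bus_next(bus_state_t s, int grant, int read_done, int write_done)
{
    switch (s) {
    case ST_RD_REQ: return grant ? ST_READ : ST_RD_REQ;
    case ST_READ:   return read_done ? ST_EXE : ST_READ;
    case ST_EXE:    return ST_WT_REQ;  /* assumed to take one cycle */
    case ST_WT_REQ: return grant ? ST_WRITE : ST_WT_REQ;
    case ST_WRITE:  return write_done ? ST_DONE : ST_WRITE;
    default:        return ST_DONE;
    }
}
```

Each waiting state loops on the negated condition (!Grant, !Read done, !Write done), which is the self-loop pattern shown in the figure.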

Page 34: Defense

Translation 3/3

Block Level

• Interface
  • Block to wrapper
  • Block to block
• Control
  • FSM
• Data path
  • Operator assignment
• Control signals
  • Block to wrapper
  • Block to data path
  • Block to buffer

[Figure: generated block containing CTL, DataPath(1), DataPath(2), an input buffer and an output buffer. The block interface connects the block to the wrapper, and the wrapper interface connects the wrapper to the bus.]


Page 36: Defense

Platform Level

Input

• Port mapping
• Library location
• CoWare setting

Output

• *.tcl for the CoWare-based platform

Communication generator

• System control
• Wrapper
• Mux
• PMU
• Interrupt

[Figure: the platform generator consists of the communication generator (communication and wire configuration) and the peripheral generator, which contains the system control generator, wrapper generator, PMU generator, Mux generator and interrupt generator]


Page 38: Defense

Develop Library for CoWare 1/3

Master Wrapper Generator

• Based on the CoWare API for AMBA AHB
• Advantages
  • Supports any burst type
  • Burst lock
• Limitation
  • Buffer size

[Figure: master wrapper placed between the generated blocks and the AHB bus. Signals from the IP: IP_Dout, IP_OutValid, IP_OuMemType, IP_OuMemReq, IP_OuMemRNW, IP_OuMemCounts, IP_OuMemAccess, IP_OuMemAddr, finish3. Signals from the wrapper: WR_InValid, WR_Din, WR_RelReq, Data_Out_Done. Other inputs: Start, Clk, Rst. Inside the wrapper: two synchronizers, InBuffer, OutBuffer, input handshake, output handshake and an FSM. Bus side: AHBInitiator_inoutmaster_port connected to the AHB bus.]

Page 39: Defense

Develop Library for CoWare 2/3

PMU Generator

Input: configuration

• Block Num: default 3
• Idle cycle: default 1000
• Wake-up cycle: default 1000
• Policy: fixed time-out policy

Output: SystemC

[Figure: PMU containing an FSM, a register, signal detection and a LUT. Signals: Start, Clk, Rst, PMU_Out_Ready_0, PMU_Out_Idle_0, PMU_In_PB_CL_0, PMU_In_CL_0, PMU_out_CLkg_0.]
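A fixed time-out policy puts a block to sleep once it has been idle for a fixed number of cycles. The sketch below counts how many cycles of an activity trace a block would spend asleep under such a policy. It is a simplified illustration: the function name `pmu_sleep_cycles` and the trace representation are assumptions, and the wake-up latency (default 1000 cycles in the configuration) is deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the cycles spent asleep under a fixed time-out policy.
 * busy[k] is non-zero when the block is active in cycle k.
 * A block is asleep in a cycle if it has already been idle for at least
 * `idle_timeout` consecutive cycles before that cycle.
 * Simplification (assumption): waking up is instantaneous. */
unsigned pmu_sleep_cycles(const int *busy, size_t n, unsigned idle_timeout)
{
    assert(busy != NULL || n == 0);
    unsigned idle_run = 0; /* consecutive idle cycles seen so far */
    unsigned asleep = 0;
    for (size_t k = 0; k < n; k++) {
        if (busy[k]) {
            idle_run = 0;  /* activity resets the time-out counter */
        } else {
            if (idle_run >= idle_timeout)
                asleep++;
            idle_run++;
        }
    }
    return asleep;
}
```

With the default idle setting of 1000 cycles, only idle gaps longer than 1000 cycles contribute sleep cycles; shorter gaps are spent waiting for the time-out.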

Page 40: Defense

Develop Library for CoWare 3/3

Power Calculation

• Known parameters
  • Total simulation time
  • Operation frequency
  • Active duration
  • Total active number

$$\text{ACT Energy} = N_{Active} \times P_{Active} \times \frac{1}{Freq}$$

$$\text{Idle Energy} = N_{Idle} \times P_{Idle} \times \frac{1}{Freq}$$

$$\text{Total energy} = \text{ACT Energy} + \text{Idle Energy}$$

$$\text{Power} = \frac{\text{ACT Energy} + \text{Idle Energy}}{\text{total time}}$$

where

• $N_{Active}$: number of active counts
• $P_{Active}$: active power per unit time
• $N_{Idle}$: number of idle counts
• $P_{Idle}$: idle power per unit time

[Figure: timeline example with an active interval from 900 ns to 950 ns at Freq = 70 MHz]
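The power calculation can be written as a short function. This is a sketch under the reading that each active or idle count lasts one clock period (1/Freq) and that powers are given in watts; the function name `average_power` and its parameter names are hypothetical.

```c
#include <assert.h>

/* Average power over a simulation run.
 * n_active, n_idle : number of active / idle counts (assumed: clock cycles)
 * p_active, p_idle : power while active / idle, in watts
 * freq_hz          : operating frequency in Hz
 * total_time_s     : total simulation time in seconds */
double average_power(double n_active, double p_active,
                     double n_idle, double p_idle,
                     double freq_hz, double total_time_s)
{
    assert(freq_hz > 0.0 && total_time_s > 0.0);
    double period = 1.0 / freq_hz;                      /* seconds per count */
    double act_energy = n_active * p_active * period;   /* joules */
    double idle_energy = n_idle * p_idle * period;      /* joules */
    return (act_energy + idle_energy) / total_time_s;   /* watts  */
}
```

As a sanity check, a block that is active at 2 W for half of the run and idle at 0 W for the other half should average close to 1 W, independent of the frequency.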


Page 42: Defense

System Control Generator

TOP Control Generator

Input

• Block scheduling
• Block numbers
• Type setting
  • Parallel
  • Pipeline
  • Single (default)

Output

• SystemC

[Figure: CTL module containing an FSM and two synchronizers. Signals: Start, Clk, Rst, CtlDone1, CtlEnable1, ToFinish. The CTL drives Enable Block 1, Enable Block 2 and Enable Block 3 to Block 1, Block 2 and Block 3.]


Page 44: Defense

CoWare - Scalar

Sequence: Foreman, Football (30 frames)

Page 45: Defense

Simple Bus Environment - Scalar

[Figure: simple bus platform built on the SystemC 2.1 simple bus, with the Y, Cb and Cr blocks, Mem 1, Mem 2, an arbiter and a CPU attached to the bus; the read transfer and write transfer paths are marked]

Page 46: Defense

CoWare Environment - Scalar

Top platform for the scalar application

[Figure: CoWare platform with CTL, the Y, Cb and Cr blocks, PMU, Mux, Interrupt and Wrapper, annotated with Step 1 to Step 5]

Page 47: Defense

Experiments - Scalar

Performance with approximate time and cycle time

Scalar (cycles)     Y part    Cb part   Cr part
Approximate time    239761    91775     91775
Cycle time          325296    100638    100638

Scalar performance and state size on a cycle-time basis

Scalar Y part, parallel constraint 4

                   Max cascade   ST size   Bus access   Computation cycles   Communication cycles   Code lines
Original C code    0             0         0            126720               0                      23
Case 2             4             78        9916         31680                115118                 1403
Case 3             11            81        1724         31680                33388                  1668

Page 48: Defense

Experiments - Power Monitor

Power Library

Method: search the look-up table

• Block -> Module
  • FSM switch
  • InBuffer
  • OutBuffer
  • Register
• Block -> Data path
  • Operator

Data path

Operator   Size   Active power   Idle power
ADD        8      1.0444 mW      23.4124 nW
SUB        8      808.2718 uW    21.3216 nW
DIV        8      4.0100 mW      67.5333 nW
SHR        8      425.8246 uW    9.9244 nW

Module

Module     Width   Active power   Idle power
FSM        6       0.418 mW       12 nW
Buffer     32      1.7346 mW      66 nW
Register   32      1.7346 mW      66 nW

Page 49: Defense

Experiments - Scalar

Scalar 176*144 power saving

Block       Case       Active cycles   Wake-up cycles   Sleep cycles   Power (mW)
Scalar Y    No PMU     526572          X                X              22584673.08
Scalar Y    With PMU   326296          1000             199276         14038522.54
Scalar Cb   No PMU     526572          X                X              11000089.08
Scalar Cb   With PMU   101638          1000             423934         2124065.68
Scalar Cr   No PMU     526572          X                X              11000089.08
Scalar Cr   With PMU   101638          1000             423934         2124065.68

          No PMU           With PMU        Power saving rate
Scalar    44584851.24 mW   18286653.9 mW   58.98%
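The power saving rate in the summary row is the relative reduction of the total power when the PMU is enabled. A minimal sketch of that arithmetic follows; the function name `power_saving_percent` is hypothetical.

```c
#include <assert.h>

/* Relative power reduction, in percent, from enabling the PMU. */
double power_saving_percent(double no_pmu, double with_pmu)
{
    assert(no_pmu > 0.0);
    return (no_pmu - with_pmu) / no_pmu * 100.0;
}
```

Calling `power_saving_percent(44584851.24, 18286653.9)` with the Scalar totals should reproduce the reported 58.98% to within rounding. The totals themselves are the sums of the Y, Cb and Cr rows of the table.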

Page 50: Defense

Experiments - DWT

DWT and IDWT

[Figure: DWT and IDWT]

Page 51: Defense

Experiments - DWT

Top platform for the DWT application

[Figure: CoWare platform for DWT, annotated with Step 1 to Step 4]

Page 52: Defense

Experiments - DWT

Performance with approximate time and cycle time

DWT                 Cycles
Approximate time    11088
Cycle time          76262

DWT performance and state size on a cycle-time basis

DWT, parallel constraint 1

                   Max cascade   ST size   Bus access   Computation cycles   Communication cycles   Code lines
Original C code    0             0         0            1584                 0                      46
Case 1             1             42        9504         1584                 74678                  8630

Page 53: Defense

Experiments - DWT

DWT 44*36 power saving

Block   Case       Active cycles   Interrupt cycles   Sleep cycles   Power (mW)
DWT     No PMU     145962          X                  X              4066501.32
DWT     With PMU   76362           1000               68600          2155442.52
IDWT    No PMU     145962          X                  X              4415350.53
IDWT    With PMU   69600           1000               75362          2105550.72

        No PMU          With PMU        Power saving rate
DWT     8481851.85 mW   4260993.24 mW   49.765%


Page 55: Defense

Conclusions

• We developed an automation tool that translates a behavior-level CDFG into TLM-level SystemC for virtual bus-based platform design.
• We incorporated methods that reduce the number of bus accesses during architecture-level profiling of the system design.
• We developed a library for virtual bus-based platforms.
• The tool allows fast architecture exploration, which reduces verification time.

Page 56: Defense

Future Works

• Model each module's power with equations so that more accurate power management can be carried out.
• Add a test platform to the tool so that the corresponding test circuitry can be generated automatically.
• Include more hardware architectures in the hardware library so that designers have more design options to choose from.

Page 57: Defense

References

[1] S. S. Pasricha, N. Dutt, and M. Ben-Romdhane, "Using TLM for exploring bus-based SoC communication architectures," 16th IEEE International Conference on Application-Specific Systems, Architecture Processors, 2005, pp. 79-85.

[2] C. Lennard and D. Mista, "Taking Design to the System Level," 2006 [Online]. Available: http://www.arm.com/pdfs/ARM_ESL_20_3_JC.pdf

[3] SPARK Methodology, http://mesl.ucsd.edu/spark/methodology.shtml

[4] S. Gupta, S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, "SPARK: a high-level synthesis framework for applying parallelizing compiler transformations," Proceedings of the 16th International Conference on VLSI Design, 2003, pp. 461-466.

[5] J. Cong, F. Yiping, H. Guoling, J. Wei, and Z. Zhiru, "Platform-Based Behavior-Level and System-Level Synthesis," IEEE International SOC Conference, 2006, pp. 199-202.

[6] Ya-Shu Chen, Shih-Chun Chou, Chi-Sheng Shih, and Tei-Wei Kuo, "MFASE: Multiple Functions SoCs Analysis Environment," VLSI Design/CAD Symposium, Taiwan, August 2007.

Page 58: Defense


Thank you