NCKU, Low power, high performance VLSI design lab
Design Automation Tool from Behavior Level to Transaction Level for Virtual Bus-Based Platforms
Advisor: Lih-Yih Chiou Student: Hi-Ho Chen
23 June 2008
Outline
• Motivation and Contributions
• Previous Works
• Proposed Design Automation Tool from Behavior Level to Transaction Level for Virtual Bus-Based Platforms
  • Representation
  • Design Flow Overview
  • Block Level: Methodology, Translation
  • Platform Level: Develop Library for CoWare, System Control Generator
• Experiments: Scalar 176×144, DWT 44×36
• Conclusions and Future Works
• References
Introduction
In the SoC era, more and more IPs are integrated onto a single chip.
ESL (Electronic System Level) design lets designers rapidly simulate the functional behavior of a system at a higher level of abstraction before hardware implementation.
Communication design has become one of the most important criteria in SoC design.
Top-down Design Flow [1]
1. Product requirements from customer
   (algorithm selection, optimization)
2. Specification Model
   (allocation, behavior partitioning, scheduling)
3. Architecture Model
   (protocol selection, channel partitioning, arbitration)
4. Communication Model
   (cycle scheduling, protocol scheduling)
5. Implementation Model
Abstraction Level vs. Simulation Speed [2]
(figure from [2])
High Level Synthesis
Behavior synthesis separates the control and the data path from the behavior description.
Control: if-then-else, switch-case
Data path: data flow
Example [3]:
  x = a + b;
  c = a < b;
  if (c) {
      d = e - f;
  } else {
      g = h + i;
  }
  j = d * g;
  l = e + x;
In the control/data-flow view, x = a + b and c = a < b are computed first, the control signal c selects d = e - f or g = h + i, and j = d * g and l = e + x follow.
Figure: the synthesized design, in which the control unit drives memory, multiplexers (MUX) and an ALU in the data path [3].
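As a rough illustration of this control/data-path split, the C sketch below hand-partitions the example: each data-path function performs one group of operations on a register set, and a small FSM plays the role of the controller by sequencing them and resolving the if/else through the condition register c. The names (Regs, run_controller, the S_* states) are hypothetical; this is not SPARK's output, only a sketch assuming one controller state per group of operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register set for the example:
   x = a + b; c = a < b; if (c) d = e - f; else g = h + i; j = d * g; l = e + x; */
typedef struct {
    int a, b, c, d, e, f, g, h, i, j, l, x;
} Regs;

/* Controller states: one state per group of data-flow operations. */
typedef enum { S_CMP, S_THEN, S_ELSE, S_JOIN, S_DONE } State;

/* Data path: each function is one group of operator-level actions. */
static void dp_add_compare(Regs *r) { r->x = r->a + r->b; r->c = r->a < r->b; }
static void dp_then(Regs *r)        { r->d = r->e - r->f; }
static void dp_else(Regs *r)        { r->g = r->h + r->i; }
static void dp_join(Regs *r)        { r->j = r->d * r->g; r->l = r->e + r->x; }

/* Control: an FSM that sequences the data-path actions and uses the
   condition register c to choose the if or the else branch.
   Returns the number of states executed before S_DONE. */
int run_controller(Regs *r) {
    assert(r != NULL);
    State s = S_CMP;
    int steps = 0;
    while (s != S_DONE) {
        switch (s) {
        case S_CMP:  dp_add_compare(r); s = r->c ? S_THEN : S_ELSE; break;
        case S_THEN: dp_then(r);        s = S_JOIN; break;
        case S_ELSE: dp_else(r);        s = S_JOIN; break;
        case S_JOIN: dp_join(r);        s = S_DONE; break;
        case S_DONE: break;
        }
        steps++;
    }
    return steps;
}
```

For a = 1, b = 2 the condition c is true, so the controller visits S_CMP, S_THEN and S_JOIN (three states) and the else path is never executed.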
Contributions
Rapid system exploration: fast exploration of multiple micro-architecture alternatives
Shorter verification/simulation cycle: speed-up by moving from behavior level to transaction level
Quick power and performance information: earlier estimation of design specifications
Higher performance: reduced communication and computation
Previous Works - SPARK [4]
Input: C
Output: C, VHDL
Design flow: three phases (figure)
Advantage: defines a new high-level synthesis framework for parallel design
Disadvantages: no platform architecture; communication issues are not considered
Previous Works - xPilot [5]
Input: C/SystemC
Output: Verilog/SystemC
Method: Phase 1, SSDM; Phase 2, synthesis
Advantages: direct mapping to FPGA; quick verification
Disadvantage: communication issues are not considered
Previous Works - MFASE [6]
MFASE (Multiple Functions SoCs Analysis Environment)
Design flow: HW/SW partition, architecture mapping, communication analysis, …
Advantage: HW/SW co-design
Limitation: IP database
Summary
Previous works (synthesis tools):
SPARK and xPilot synthesize hardware C code into RTL code.
SPARK and xPilot do not consider communication issues.
MFASE does not describe how the design is generated automatically.
Thesis: building an automation tool from functional level to transaction level for virtual bus-based platforms
Addresses both computation and communication issues
Automation from behavior level to transaction level
Representation
Example: C to CDFG
Example for "if then else": if(a==0){b=c+d;}else{b=c-d;}
  Condition (a==0): True -> If Body (b=c+d), False -> Else Body (b=c-d); both paths join at END. In the data-flow part, c and d feed the + operator, whose result is b.
Example for "for loop": for(i=0;i<5;i++){c=a+b;}
  Initial (i=0) -> Condition (i<5): True -> Body (c=a+b) -> Update (i=i+1) -> back to Condition; False -> END.
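To make the CDFG representation concrete, here is a minimal C sketch built on a node layout of our own (the Node, Env, walk and run_for_loop names are hypothetical, not the tool's data structures): operation nodes have one successor, condition nodes carry True/False edges, and walking the graph from Initial reproduces the "for loop" example.

```c
#include <assert.h>
#include <stddef.h>

/* Variables touched by the example loop (hypothetical environment). */
typedef struct { int a, b, c, i; } Env;

typedef enum { OP, COND, END } Kind;

/* A CDFG node: OP nodes run 'op' and follow 'next'; COND nodes evaluate
   'pred' and follow 'next' (True edge) or 'next_false' (False edge). */
typedef struct Node {
    Kind kind;
    void (*op)(Env *);
    int (*pred)(const Env *);
    const struct Node *next;
    const struct Node *next_false;
} Node;

static void op_init(Env *e)         { e->i = 0; }           /* Initial:   i = 0     */
static void op_body(Env *e)         { e->c = e->a + e->b; } /* Body:      c = a + b */
static void op_update(Env *e)       { e->i = e->i + 1; }    /* Update:    i = i + 1 */
static int  pred_cond(const Env *e) { return e->i < 5; }    /* Condition: i < 5     */

/* Walks the graph until END; returns how many conditions were evaluated. */
int walk(const Node *start, Env *e) {
    int conds = 0;
    assert(start != NULL && e != NULL);
    for (const Node *n = start; n != NULL && n->kind != END; ) {
        if (n->kind == OP) {
            n->op(e);
            n = n->next;
        } else {
            conds++;
            n = n->pred(e) ? n->next : n->next_false;
        }
    }
    return conds;
}

/* CDFG of for(i=0;i<5;i++){ c=a+b; }: Initial -> Condition -True-> Body
   -> Update -> Condition ..., and Condition -False-> END. */
int run_for_loop(Env *e) {
    Node end_n    = { END,  NULL,      NULL,      NULL,      NULL   };
    Node update_n = { OP,   op_update, NULL,      NULL,      NULL   };
    Node body_n   = { OP,   op_body,   NULL,      &update_n, NULL   };
    Node cond_n   = { COND, NULL,      pred_cond, &body_n,   &end_n };
    Node init_n   = { OP,   op_init,   NULL,      &cond_n,   NULL   };
    update_n.next = &cond_n;  /* loop back edge */
    return walk(&init_n, e);
}
```

The condition node is evaluated six times (five True edges, one False edge), which is the information the later loop-unrolling step relies on.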
Design Flow Overview 1/2
Block Level: each Spec C module goes through Spec C to CDFG translation, profiling & analysis, and translation into SystemC (SystemC.cpp, SystemC.h); this repeats until all Spec C modules have been translated.
Platform Level using simple bus: link ports, then approximate-time simulation.
Platform Level using CoWare: link port setting and tcl translation, using the wrapper library, the CoWare library, connections, the PMU generator and the CTL generator; then simulation on CoWare.
Design Flow Overview 2/2
Block Level
  Methodology: parallel; cascade (multi-cycle)
  Translation: state & edge reduction; STG to SystemC generator
Platform Level using Simple Bus: approximate-time simulation
Platform Level using CoWare: *.tcl generator; peripheral generator
Block Level
Input:
  Functional-level CDFG
  Block inside configuration: max parallel depth, buffer size, boundary case
  Block-to-bus configuration: max burst size, initial address, address offset
Output: TLM SystemC
Flow (figure): CDFG analysis (parallel analysis, boundary analysis, irregularity analysis, parallel & internal communication analysis) with a "for loop condition" check (yes/no); synthesis configuration; block synthesis && interface synthesis using the power library and the wrapper library; implementation selection and state && edge reduction; performance estimation in approximate time or cycle time.
Block Level - Methodology 1/10
Computation Reduction
Parallel analysis
Step 1: convert C to CDFG format
Step 2: unroll the "for loop" to obtain the cycle counts
Step 3: find a solution that fits the "for loop" condition under the hardware constraint (GCD methodology)
Step 4: find the closest solution based on the hardware condition
Step 5: update the CDFG
Figure: schedules of the eight loop bodies Body(j,i), j = 0..1, i = 3..6, between ForBegin and ForEnd; executing two bodies in parallel finishes in 8 cycles, versus 12 cycles when they run one at a time.
Example: for(j=0;j<2;j++){ for(i=3;i<7;i++){ b[j][i] = (a[j][i]+a[j][i+1])>>1; }}
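Steps 3 and 4 can be read as choosing how many loop bodies run in parallel. The C sketch below is one hypothetical interpretation, not the tool's exact rule: gcd(trip count, hardware limit) is taken as the first solution because it always divides the trip count, and the search then moves to the closest larger divisor that still fits the limit. The function name parallel_degree is ours.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static int gcd(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of loop bodies to execute in parallel.
   Step 3 (assumed): gcd(trip_count, max_parallel) divides the trip count,
   so every parallel group is full and no boundary iteration is left over.
   Step 4 (assumed): move to the largest divisor of the trip count that
   still respects the hardware limit max_parallel. */
int parallel_degree(int trip_count, int max_parallel) {
    assert(trip_count > 0 && max_parallel > 0);
    int best = gcd(trip_count, max_parallel);
    for (int d = best + 1; d <= max_parallel && d <= trip_count; d++)
        if (trip_count % d == 0)
            best = d;
    return best;
}
```

For the inner loop of the example (trip count 4, i = 3..6) and a limit of 2, the result is 2, i.e. two bodies per step as in the parallel schedule. With a limit of 3 the gcd is 1, and Step 4 raises it to 2, the closest divisor of 4 that fits.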
Block Level - Methodology 2/10
Communication factors
We assume the arrays are located in external memory. How do we get the data from external memory? Through bus transfers, either single or burst.
Buffer size requirement: the degree of parallelism and the size of each data transfer influence performance and power.
Example: assume a[6][8]; then a[0][3] = address 12, a[0][4] = address 16, a[0][5] = address 20, a[1][3] = address 44, a[1][4] = address 48, a[1][5] = address 52.
Figure: Mem 1 -> bus -> input buffer (IBuff) -> A[j][i] -> output buffer (OBuff) -> bus -> Mem 2, with burst reads and writes; each row starts a new transfer (Addr 12, Addr 44).
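The addresses in the example follow from a row-major layout with 4-byte elements and base address 0 (both are assumptions, chosen because they reproduce the listed values). The hypothetical helpers below compute an element address and check whether two accesses are adjacent, which is the condition for them to share one burst.

```c
#include <stddef.h>
#include <stdint.h>

/* Row length of the example array a[6][8]; the row count (6) does not
   affect the address of an element. */
enum { COLS = 8 };

/* Row-major address of a[j][i] with 4-byte (int32_t) elements. */
size_t element_address(size_t base, size_t j, size_t i) {
    return base + (j * COLS + i) * sizeof(int32_t);
}

/* Two words can be fetched in the same burst only if they are adjacent. */
int contiguous(size_t first, size_t second) {
    return second == first + sizeof(int32_t);
}
```

The adjacency check shows why a row is a natural transfer unit in this example: a[0][7] sits at address 28 and a[1][3] at address 44, so no single burst covers both.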
Block Level - Methodology 3/10
Communication Reduction
Figure: Cases 1 to 4 schedule the eight loop bodies Body(j,i) (j = 0..1, i = 3..6) between ForBegin and ForEnd over cycles 1 to 8, all with parallel depth 2 but with operators taking 1, 2, 3 and 4 cycles respectively; each case is detailed on the following slides.
Block Level - Methodology 4/10
Case 1: parallel depth 2, operator 1 cycle
Irregularity: 1
Bus access times: Read 4, Write 4
Max buffer size usage: 3
Transactions T(1) to T(4): each reads a burst B(3) and writes a burst B(2)
Unrolled loop bodies:
b[0][3] = (a[0][3]+a[0][4])>>1;
b[0][4] = (a[0][4]+a[0][5])>>1;
b[0][5] = (a[0][5]+a[0][6])>>1;
b[0][6] = (a[0][6]+a[0][7])>>1;
b[1][3] = (a[1][3]+a[1][4])>>1;
b[1][4] = (a[1][4]+a[1][5])>>1;
b[1][5] = (a[1][5]+a[1][6])>>1;
b[1][6] = (a[1][6]+a[1][7])>>1;
B(): burst size; T(): transaction number; R: read from bus; W: write to bus
Figure: Case 1 schedule over cycles 1 to 8.
Block Level - Methodology 5/10
Case 2: parallel depth 2, operator 2 cycles
Irregularity: 1
Bus access times: Read 2, Write 2
Max buffer size usage: 5
Transactions: T(1) R B(5), T(2) W B(4), T(3) R B(5), T(4) W B(4)
Unrolled loop bodies: the same eight statements b[j][i] = (a[j][i]+a[j][i+1])>>1 as in Case 1.
B(): burst size; T(): transaction number; R: read from bus; W: write to bus
Figure: Case 2 schedule over cycles 1 to 8.
Block Level - Methodology 6/10
Case 3: parallel depth 2, operator 3 cycles
Irregularity: 2
Bus access times: Read 3, Write 3
Max buffer size usage: 8
Transactions: reads with bursts B(3), B(3) and B(5); writes with bursts B(2), B(2) and B(4) (labelled T(1) to T(3) in the figure)
Unrolled loop bodies: the same eight statements b[j][i] = (a[j][i]+a[j][i+1])>>1 as in Case 1.
B(): burst size; T(): transaction number; R: read from bus; W: write to bus
Figure: Case 3 schedule over cycles 1 to 8.
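Block Level - Methodology 7/10
Boundary case
for(j=0; j<2; j++){ for(i=3; i<7; i++){ b[j][i] = (a[j][i]+a[j][i+1])>>1; }}
Limitation: high address relation. What a burst can cover depends on the memory location: within one row (index i) the addresses are contiguous, but crossing the boundary to the next j jumps from a[0][7] (address 28) to a[1][3] (address 44), so a new transfer must be started.
Figure: memory layout of a[j][i] showing the boundary and the new transfer.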
Block Level - Methodology 8/10
Case 4: parallel depth 2, operator 4 cycles
Irregularity: 1
Bus access times: Read 2, Write 2
Max buffer size usage: 10
Transactions: T(1) R B(5), T(2) W B(4), T(4) R B(5), T(3) W B(4)
Unrolled loop bodies: the same eight statements b[j][i] = (a[j][i]+a[j][i+1])>>1 as in Case 1.
B(): burst size; T(): transaction number; R: read from bus; W: write to bus
Figure: Case 4 schedule over cycles 1 to 8.
Block Level - Methodology 9/10
Which case is better to implement? Problems:
Case 1: single operator cycle, so the bus access times are the highest
Case 3: the control is too complex, and the boundary case is not respected
Case 4: buffer size
We choose Case 2 to implement: it satisfies the boundary case condition and the buffer size constraint, reduces bus accesses, and is regular.

                          Case 1  Case 2  Case 3  Case 4
Irregularity                   1       1       2       1
Boundary case                  O       O       X       O
Max buffer size                3       5       8      10
Read bus access times          4       2       3       2
Write bus access times         4       2       3       2
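The choice can be written as a filter-then-rank rule over the table. The C sketch below is our hypothetical formalization, not the tool's code: cases that break the boundary condition or exceed a buffer budget are rejected, and the rest are ranked by total bus accesses, then by irregularity, then by buffer size. The buffer budget is an assumed parameter; the slide does not give a value.

```c
#include <assert.h>
#include <stddef.h>

/* One column of the comparison table. */
typedef struct {
    int id;
    int irregularity;
    int boundary_ok;    /* 1 for "O", 0 for "X" */
    int max_buffer;
    int reads, writes;  /* bus access times */
} CaseInfo;

/* Returns the id of the preferred case, or -1 if none is feasible. */
int choose_case(const CaseInfo *c, size_t n, int buffer_budget) {
    const CaseInfo *best = NULL;
    assert(c != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        const CaseInfo *x = &c[k];
        if (!x->boundary_ok || x->max_buffer > buffer_budget)
            continue;                           /* violates a hard constraint */
        if (best == NULL) {
            best = x;
            continue;
        }
        int ax = x->reads + x->writes;
        int ab = best->reads + best->writes;
        if (ax < ab ||
            (ax == ab && x->irregularity < best->irregularity) ||
            (ax == ab && x->irregularity == best->irregularity &&
             x->max_buffer < best->max_buffer))
            best = x;
    }
    return best != NULL ? best->id : -1;
}
```

With a budget of 16 words the rule picks Case 2, since Case 4 has the same bus access count but a larger buffer; with a budget of 4 only Case 1 remains feasible.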
Block Level - Methodology 10/10
Analysis under the parallel depth and boundary case conditions (example: cycle 8, parallel depth 2)
Step 1: trace the states by operator cycles
Step 2: separate the read and write parts and find the period
Step 3: estimate the cycles and hardware cost
Step 4: find the best solution
Figure: per-case annotations for Cases 1 to 4 using O(): operator cycles, B(): buffer size, R(): read counts, W(): write counts, S(): state sizes, Ir(): irregularity (Ir(1), Ir(1), Ir(2), Ir(1) for Cases 1 to 4).
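The four columns of the comparison table can be reproduced with a simple model of Steps 1 and 2. In this hypothetical C sketch (our own formulation, not the tool's code), one period produces depth × operator-cycles outputs in row-major order; a period is cut wherever it crosses a row of a[j][i], because rows are not contiguous in memory, and each row segment of m outputs costs one read burst of m + 1 words and one write burst of m words. Irregularity is taken to be the number of distinct burst patterns among the periods, and the boundary check requires the period to divide a row or a whole number of rows.

```c
#include <assert.h>
#include <string.h>

#define MAX_PERIODS 64
#define MAX_SEGS 8

typedef struct {
    int reads, writes, max_buffer, irregularity, boundary_ok;
} Metrics;

/* rows: number of j iterations; n: outputs per row (i = 3..6 gives 4);
   depth: parallel depth; op_cycles: operator cycles per period. */
Metrics analyze(int rows, int n, int depth, int op_cycles) {
    int k = depth * op_cycles;           /* outputs produced per period */
    int total = rows * n;
    int pat[MAX_PERIODS][MAX_SEGS];      /* segment lengths of each period */
    int nseg[MAX_PERIODS];
    int periods = 0;

    assert(rows > 0 && n > 0 && k > 0);
    Metrics m = {0, 0, 0, 0, (k % n == 0) || (n % k == 0)};

    for (int start = 0; start < total; start += k, periods++) {
        int end = start + k < total ? start + k : total;
        int buffer = 0;                  /* input words held in this period */
        assert(periods < MAX_PERIODS);
        nseg[periods] = 0;
        for (int o = start; o < end; ) {
            int row_end = (o / n + 1) * n;          /* first output of the next row */
            int seg_end = end < row_end ? end : row_end;
            int len = seg_end - o;
            assert(nseg[periods] < MAX_SEGS);
            pat[periods][nseg[periods]++] = len;
            m.reads++;                   /* read burst of len + 1 words */
            m.writes++;                  /* write burst of len words */
            buffer += len + 1;
            o = seg_end;
        }
        if (buffer > m.max_buffer)
            m.max_buffer = buffer;
    }

    /* Irregularity: number of distinct burst patterns among the periods. */
    for (int p = 0; p < periods; p++) {
        int seen = 0;
        for (int q = 0; q < p && !seen; q++)
            seen = nseg[q] == nseg[p] &&
                   memcmp(pat[q], pat[p], sizeof(int) * (size_t)nseg[p]) == 0;
        if (!seen)
            m.irregularity++;
    }
    return m;
}
```

With rows = 2 and n = 4, operator cycles 1, 2, 3 and 4 give (reads, max buffer, irregularity) of (4, 3, 1), (2, 5, 1), (3, 8, 2) and (2, 10, 1), matching Cases 1 to 4, and only Case 3 fails the boundary check. Case 3's bursts come out as 5 + 3 words in the first period and 3 words in the second, the same B(5), B(3), B(3) reads listed on its slide.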
Translation 1/3
Example of CDFG to state transition graph (STG): operations are fitted to time steps, which makes FSM generation straightforward.
Example for "if then else": the CDFG (Begin, condition a==0, If Body b=c+d, Else Body b=c-d, END) is mapped onto an STG over Cycles 1 to 3 with transitions labelled a==0 and a!=0.
Example for "for loop": the CDFG (for Begin i=0, condition i<5, for Body c=a+b, update i=i+1, for End) is mapped onto an STG over Cycles 1 to 3 with transitions labelled i<5 and i>=5.
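One way to read the "for loop" STG as code is a clocked FSM in which the body and the update share one time step. The C sketch below assumes that reading; the state names and the one-state-per-cycle timing are our assumptions, not the tool's generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_INIT, ST_BODY, ST_END } StgState;

typedef struct { int a, b, c, i; StgState state; } LoopFsm;

/* One clock cycle of the assumed STG for for(i=0;i<5;i++){ c=a+b; }. */
void fsm_step(LoopFsm *m) {
    switch (m->state) {
    case ST_INIT:                        /* time step 1: i = 0 */
        m->i = 0;
        m->state = (m->i < 5) ? ST_BODY : ST_END;
        break;
    case ST_BODY:                        /* body and update in one time step */
        m->c = m->a + m->b;
        m->i = m->i + 1;
        m->state = (m->i < 5) ? ST_BODY : ST_END;   /* edges i<5 / i>=5 */
        break;
    case ST_END:
        break;
    }
}

/* Clocks the FSM from ST_INIT until ST_END; returns the number of cycles. */
int fsm_run(LoopFsm *m) {
    int cycles = 0;
    assert(m != NULL);
    m->state = ST_INIT;
    while (m->state != ST_END) {
        fsm_step(m);
        cycles++;
    }
    return cycles;
}
```

Starting from a = 2, b = 3, this FSM needs 6 cycles (one for i=0 and five for the body) and ends with c = 5 and i = 5.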
Translation 2/3
Step 1: CDFG to STG; unroll the "for loop" condition
Step 2: Methodology: reduce computation (parallel); reduce communication (cascade); architecture definition
Step 3: translate to TLM SystemC (header and function)
Figure: the Case 2 loop (for begin, for body, for end over cycles 1 to 8) and the block's bus-access FSM with states RD_REQ, READ, EXE, Write and WT_REQ, whose transitions are guarded by Grant / !Grant, Read done / !Read done and Write done / !Write done.
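The bus-access FSM can be sketched as a next-state function. This C version follows the state order listed on the slide (RD_REQ, READ, EXE, Write, WT_REQ) and assumes that a granted WT_REQ returns to RD_REQ for the next transaction; both that return edge and the BusInputs structure are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RD_REQ, READ, EXE, WRITE, WT_REQ } BusState;

/* Per-cycle inputs seen by the block (hypothetical names). */
typedef struct { int grant, read_done, write_done; } BusInputs;

/* Next-state function: each state waits on its own guard. */
BusState bus_next(BusState s, BusInputs in) {
    switch (s) {
    case RD_REQ: return in.grant ? READ : RD_REQ;        /* Grant / !Grant */
    case READ:   return in.read_done ? EXE : READ;       /* Read done / !Read done */
    case EXE:    return WRITE;                           /* unconditional */
    case WRITE:  return in.write_done ? WT_REQ : WRITE;  /* Write done / !Write done */
    case WT_REQ: return in.grant ? RD_REQ : WT_REQ;      /* assumed return edge */
    }
    return s;
}

/* Applies a trace of per-cycle inputs starting from RD_REQ. */
BusState bus_run(const BusInputs *trace, size_t cycles) {
    assert(trace != NULL || cycles == 0);
    BusState s = RD_REQ;
    for (size_t t = 0; t < cycles; t++)
        s = bus_next(s, trace[t]);
    return s;
}
```

Feeding the trace grant, idle, read done, idle, write done, grant walks the FSM once around the loop and back to RD_REQ; the idle cycles show the wait states.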
Translation 3/3
Block Level
Interface: block to wrapper; block to block
Control: FSM
Data path: operator assignment
Control signals: block to wrapper; block to data path; block to buffer
Figure: the block (CTL, DataPath(1), DataPath(2), input buffer, output buffer) connects through the block interface to the wrapper, and the wrapper connects through the wrapper interface to the bus.
Platform Level
Input: port mapping; library location; CoWare setting
Output: *.tcl for CoWare-based platforms
Platform Generator (figure):
  Communication generator: communication && wire configuration
  System control generator
  Wrapper generator
  Peripheral generator: PMU generator, mux generator, interrupt generator
Develop Library for CoWare 1/3
Master Wrapper Generator
Based on the CoWare API for AMBA AHB
Advantages: supports any burst type; burst lock
Limitation: buffer size
Figure: the wrapper sits between the generated blocks and the AHB bus (AHBInitiator_inoutmaster_port). It contains input and output synchronizers, InBuffer, OutBuffer, input and output handshakes and an FSM, driven by Clk, Rst and StartMaster.
IP-side signals: IP_Dout, IP_OutValid, IP_OuMemType, IP_OuMemReq, IP_OuMemRNW, IP_OuMemCounts, IP_OuMemAccess, IP_OuMemAddr, finish3
Wrapper-side signals: WR_InValid, WR_Din, WR_RelReq, Data_Out_Done
Develop Library for CoWare 2/3
PMU Generator
Input: configuration
  Block number: default 3
  Idle cycle: default 1000
  Wake-up cycle: default 1000
  Policy: fixed time-out policy
Output: SystemC
Figure: the PMU (FSM, register, signal detect, LUT), driven by Clk, Rst and Start, with per-block signals PMU_Out_Ready_0, PMU_Out_Idle_0, PMU_In_PB_CL_0, PMU_In_CL_0 and PMU_out_CLkg_0.
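A fixed time-out policy can be sketched in a few lines of C. In this hypothetical model (PmuConfig, PmuStats and pmu_simulate are our names), a block that stays idle for idle_timeout consecutive cycles is put to sleep, and the next busy cycle counts one wake-up of wakeup_cycles. The wake-up latency is only accounted for, not inserted into the trace.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long idle_timeout;   /* "Idle cycle" setting (default 1000) */
    long wakeup_cycles;  /* "Wake Up cycle" setting (default 1000) */
} PmuConfig;

typedef struct {
    long active, idle, sleep, wakeups, wakeup_penalty;
} PmuStats;

/* busy[t] != 0 means the block is active in cycle t. */
PmuStats pmu_simulate(const unsigned char *busy, size_t n, PmuConfig cfg) {
    PmuStats s = {0, 0, 0, 0, 0};
    long idle_run = 0;
    int asleep = 0;

    assert(busy != NULL || n == 0);
    assert(cfg.idle_timeout > 0);
    for (size_t t = 0; t < n; t++) {
        if (busy[t]) {
            if (asleep) {            /* request arrives while sleeping */
                s.wakeups++;
                asleep = 0;
            }
            idle_run = 0;
            s.active++;
        } else if (asleep) {
            s.sleep++;               /* clock gated */
        } else {
            s.idle++;
            if (++idle_run >= cfg.idle_timeout)
                asleep = 1;          /* time-out expired: go to sleep */
        }
    }
    s.wakeup_penalty = s.wakeups * cfg.wakeup_cycles;
    return s;
}
```

A fixed time-out is the simplest policy to generate in hardware: it needs one counter per block and no prediction of future activity, at the price of burning idle power for idle_timeout cycles before every sleep.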
Develop Library for CoWare 3/3
Power Calculation
Known parameters: total simulation time; operation frequency (Freq = 70 MHz in the example waveform); active duration; total active number
ActiveN: number of active counts; ActiveP: active power per unit time
IdleN: number of idle counts; IdleP: idle power per unit time
$$E_{ACT} = \mathrm{ActiveN} \times \mathrm{ActiveP} \times \frac{1}{\mathrm{Freq}}$$
$$E_{Idle} = \mathrm{IdleN} \times \mathrm{IdleP} \times \frac{1}{\mathrm{Freq}}$$
$$E_{total} = E_{ACT} + E_{Idle}, \qquad P = \frac{E_{ACT} + E_{Idle}}{T_{total}}$$
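The formulas translate directly into code. This C sketch assumes power values in watts and frequency in hertz (the library tables mix mW, µW and nW, so values must be converted first); block_energy and average_power are hypothetical names.

```c
#include <assert.h>

/* Active and idle power of one library entry, in watts. */
typedef struct { double active_power, idle_power; } PowerEntry;

/* E_ACT + E_Idle = (ActiveN * ActiveP + IdleN * IdleP) / Freq, in joules. */
double block_energy(PowerEntry p, long active_n, long idle_n, double freq_hz) {
    assert(freq_hz > 0.0);
    double cycle_time = 1.0 / freq_hz;
    return (double)active_n * p.active_power * cycle_time
         + (double)idle_n * p.idle_power * cycle_time;
}

/* Average power = total energy / total simulation time. */
double average_power(double total_energy_j, double total_time_s) {
    assert(total_time_s > 0.0);
    return total_energy_j / total_time_s;
}
```

For example, an 8-bit ADD (1.0444 mW active) that is active for 70 million cycles at 70 MHz, i.e. one second, consumes about 1.0444 mJ.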
System Control Generator
TOP Control Generator
Input: block scheduling; block numbers; type setting: parallel, pipeline, single (default)
Output: SystemC
Figure: the CTL (an FSM with synchronizers, driven by Clk, Rst and Start) enables Blocks 1 to 3 (Enable Block 1 to 3, e.g. CtlEnable1), collects their done signals (e.g. CtlDone1), and then signals finish.
CoWare - Scalar
Sequences: Foreman, Football (30 frames)
Simple Bus Environment - Scalar
Figure: the CPU, an arbiter, Mem 1, Mem 2 and the Y, Cb and Cr blocks on the SystemC 2.1 simple bus, with read and write transfers.
CoWare Environment - Scalar
Top platform for the scalar application (figure): CTL; Y, Cb and Cr blocks; PMU; Mux; Interrupt; Wrapper; Steps 1 to 5.
Experiments - Scalar
Performance with approximate time and cycle time (cycles):
                    Y part   Cb part   Cr part
Approximate time    239761     91775     91775
Cycle time          325296    100638    100638

Scalar Y part: performance && state size in cycle-time base (parallel constraint 4)
                 Max cascade  ST size  Bus access  Computation cycle  Communication cycle  Code lines
Original C code            0        0           0             126720                    0          23
Case 2                     4       78        9916              31680               115118        1403
Case 3                    11       81        1724              31680                33388        1668
Experiments - Power Monitor
Power library
Method: search the look-up table
Block -> module: FSM switch, InBuffer, OutBuffer, register
Block -> data path: operators

Data path:
Operator  Size  Active power   Idle power
ADD       8     1.0444 mW      23.4124 nW
SUB       8     808.2718 µW    21.3216 nW
DIV       8     4.0100 mW      67.5333 nW
SHR       8     425.8246 µW    9.9244 nW

Module:
Module    Width  Active power  Idle power
FSM       6      0.418 mW      12 nW
Buffer    32     1.7346 mW     66 nW
Register  32     1.7346 mW     66 nW
Experiments - Scalar
Scalar 176×144 power saving
                      Active cycle  Wake-up cycle  Sleep cycle  Power (mW)
Scalar Y   No PMU           526572              X            X  22584673.08
           With PMU         326296           1000       199276  14038522.54
Scalar Cb  No PMU           526572              X            X  11000089.08
           With PMU         101638           1000       423934   2124065.68
Scalar Cr  No PMU           526572              X            X  11000089.08
           With PMU         101638           1000       423934   2124065.68

           No PMU            With PMU         Power saving rate
Scalar     44584851.24 mW    18286653.9 mW    58.98%
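The totals and the saving rate in the last row follow from the per-component rows: each power column is summed over Y, Cb and Cr, and the rate is one minus the ratio of the two totals. A small C check (saving_rate_percent is a hypothetical helper) reproduces the 58.98% figure.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of one column of the table. */
static double column_sum(const double *v, size_t n) {
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += v[k];
    return s;
}

/* Saving rate in percent: (1 - sum(with PMU) / sum(no PMU)) * 100. */
double saving_rate_percent(const double *no_pmu, const double *with_pmu, size_t n) {
    assert(n > 0);
    double base = column_sum(no_pmu, n);
    assert(base > 0.0);
    return (1.0 - column_sum(with_pmu, n) / base) * 100.0;
}
```

The cycle columns are consistent in the same way: in every With PMU row, active + wake-up + sleep cycles add up to 526572, the length of the No PMU run.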
Experiments - DWT
DWT && IDWT (figure: DWT and IDWT)
Experiments - DWT
Top platform for the DWT application (figure, Steps 1 to 4)
Experiments - DWT
Performance with approximate time and cycle time (cycles):
                    DWT
Approximate time    11088
Cycle time          76262

DWT: performance && state size in cycle-time base (parallel constraint 1)
                 Max cascade  ST size  Bus access  Computation cycle  Communication cycle  Code lines
Original C code            0        0           0               1584                    0          46
Case 1                     1       42        9504               1584                74678        8630
Experiments - DWT
DWT 44×36 power saving
                 Active cycle  Interrupt cycle  Sleep cycle  Power (mW)
DWT   No PMU           145962                X            X  4066501.32
      With PMU          76362             1000        68600  2155442.52
IDWT  No PMU           145962                X            X  4415350.53
      With PMU          69600             1000        75362  2105550.72

      No PMU           With PMU         Power saving rate
DWT   8481851.85 mW    4260993.24 mW    49.765%
Conclusions
We developed an automation tool that translates behavior-level CDFGs into TLM-level SystemC for virtual bus-based platform design.
We incorporated methods that reduce bus access times during architecture-level profiling of the system design.
We developed libraries for virtual bus-based platforms.
The tool enables fast architecture exploration and reduces verification time.
Future Works
Model each module's power with equations so that more accurate power management can be carried out
Add a test platform to the tool so that the corresponding test circuitry can be generated automatically
Include more hardware architectures in the hardware library so that designers have more design options to choose from
References
[1] S. Pasricha, N. Dutt, and M. Ben-Romdhane, "Using TLM for exploring bus-based SoC communication architectures," in Proc. 16th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2005, pp. 79-85.
[2] C. Lennard and D. Mista, "Taking Design to the System Level," 2006. [Online]. Available: http://www.arm.com/pdfs/ARM_ESL_20_3_JC.pdf
[3] SPARK Methodology. [Online]. Available: http://mesl.ucsd.edu/spark/methodology.shtml
[4] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, "SPARK: a high-level synthesis framework for applying parallelizing compiler transformations," in Proc. 16th International Conference on VLSI Design, 2003, pp. 461-466.
[5] J. Cong, Y. Fan, G. Han, W. Jiang, and Z. Zhang, "Platform-Based Behavior-Level and System-Level Synthesis," in Proc. IEEE International SOC Conference, 2006, pp. 199-202.
[6] Y.-S. Chen, S.-C. Chou, C.-S. Shih, and T.-W. Kuo, "MFASE: Multiple Functions SoCs Analysis Environment," in Proc. VLSI Design/CAD Symposium, Taiwan, Aug. 2007.
Thank you