Design Flows and Tools Peter A. Beerel University of Southern California USC Asynchronous CAD/VLSI Group (async.usc.edu)

Design Flows and Tools

Feb 24, 2016

Transcript
Page 1: Design Flows and Tools

Design Flows and Tools

Peter A. Beerel, University of Southern California

USC Asynchronous CAD/VLSI Group (async.usc.edu)

Page 2: Design Flows and Tools

Part II - Agenda

Design Flows

• Design via decomposition

• Modeling design using System Verilog

Design Automation – The Proteus-A flow

• Legacy RTL

• Added System Verilog CSP front-end

• Asynchronous optimizations

Final Flow Considerations

• Analog Verification

• Design for Test and Debug

Page 3: Design Flows and Tools

Design via Process Decomposition

• Collection of Processes linked by Channels

• Channels pass messages with guaranteed delivery

• Processes synchronize

• Processes can be decomposed into smaller processes
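The process-and-channel model above can be sketched in plain Python. This is only an illustration, not SVC itself: the `Channel` class, `run_pipeline`, and the three-process pipeline are hypothetical names made up for this sketch. `send()` blocks until a matching `receive()` has taken the value, which is one way to model guaranteed delivery and synchronization.

```python
import threading


class Channel:
    """Rendezvous channel (illustrative): send() blocks until a receive() took the value."""

    def __init__(self):
        self._senders = threading.Lock()      # only one sender may offer a value at a time
        self._ready = threading.Semaphore(0)  # signalled when a value is waiting
        self._taken = threading.Semaphore(0)  # signalled when the value was consumed
        self._value = None

    def send(self, value):
        with self._senders:
            self._value = value
            self._ready.release()   # wake a pending receiver
            self._taken.acquire()   # block until the receiver has the value

    def receive(self):
        self._ready.acquire()       # block until a sender offers a value
        value = self._value
        self._taken.release()       # let the sender continue
        return value


def run_pipeline(values):
    """Three processes linked by two channels: producer -> doubler -> consumer."""
    a, b = Channel(), Channel()
    results = []

    def producer():
        for v in values:
            a.send(v)

    def doubler():
        # A larger process could be decomposed into several stages like this one.
        for _ in values:
            b.send(2 * a.receive())

    def consumer():
        for _ in values:
            results.append(b.receive())

    threads = [threading.Thread(target=f) for f in (producer, doubler, consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` pushes three tokens through the pipeline. No token is lost or duplicated, because every send waits for its receive.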

Page 4: Design Flows and Tools

Modeling Asynchronous Design via SystemVerilogCSP (SVC)

• SystemVerilog interface abstracts channel wires as well as communication protocol

• Send/Receive

• Blocking tasks (Flow control)

module Sender (interface R);
  parameter WIDTH = 8;
  logic [WIDTH-1:0] data;
  always begin
    // produce data
    R.Send(data);
  end
endmodule

module Receiver (interface L);
  parameter WIDTH = 8;
  logic [WIDTH-1:0] data;
  always begin
    L.Receive(data);
    // consume data
  end
endmodule

[Figure: Sender and Receiver connected by an SVC Interface (abstract communication)]

Page 5: Design Flows and Tools

SVC - Waveform view

[Waveform annotations]

• Receiver pending on Receive; Sender performs Send, communication happens

• No one is Sending or Receiving

• Sender pending on Send; Receiver performs Receive, communication happens

// Sender (DataGen)
always begin
  #Delay;
  R.Send(data);
end

// Receiver
always begin
  L.Receive(data);
  #FL;
  R.Send(data);
  #BL;
end
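The timing implied by the two `always` blocks above can be worked out with a small Python sketch. It assumes the Receiver's own `R.Send` never blocks (its downstream neighbour is always ready) and that time starts at 0. The function name `communication_times` is hypothetical; `delay`, `fl`, and `bl` stand for `#Delay`, `#FL`, and `#BL`.

```python
def communication_times(delay, fl, bl, count):
    """Times at which the Sender-to-Receiver communication completes.

    Assumes the Receiver's downstream Send completes instantly.
    """
    times = []
    sender_ready = delay    # the Sender waits #Delay before its first Send
    receiver_ready = 0      # the Receiver starts out pending on Receive
    for _ in range(count):
        # The blocking Send/Receive pair completes once both sides have arrived.
        t = max(sender_ready, receiver_ready)
        times.append(t)
        sender_ready = t + delay          # next Send after another #Delay
        receiver_ready = t + fl + bl      # next Receive after #FL and #BL
    return times
```

Under these assumptions, `communication_times(2, 4, 6, 3)` describes a Sender that is repeatedly left pending on Send, while `communication_times(10, 1, 2, 3)` describes a Receiver that is repeatedly left pending on Receive. After the first token, the steady-state period is `max(delay, fl + bl)`.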

Page 6: Design Flows and Tools

Part II - Agenda

Design Flows

• Design via decomposition

• Modeling design using System Verilog

Design Automation – The Proteus-A flow

• Legacy RTL

• Added System Verilog CSP front-end

• Asynchronous optimizations

Final Flow Considerations

• Analog Verification

• Design for Test and Debug

Page 7: Design Flows and Tools

The Proteus-A Flow – Legacy RTL

[Flow diagram: Synth. RTL, Design Goals, and Constraints → Synthesis (Proteus/Sync Library) → Image Netlist → ClockFree → Async Netlist → Physical Design → Final Layout. Other labels in the diagram: Sync Library, Clock Gating, Clock Tree Synthesis, Netlist, Constraints.]

Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Back-end design style agnostic
• Up to 2X higher performance

Tool Status
• Commercialized version in production for 2+ years
• Uses proprietary QDI library
• Academic version (Proteus-A) enhanced significantly at USC

Recent Advances
• Power optimization algorithms

Page 8: Design Flows and Tools

Flow Demo – Legacy RTL

[Flow diagram: Legacy RTL Specification (Synth. RTL) → Synthesis → Synthesized Image Netlist → ClockFree → Asynchronous Gate-level Netlist → Physical Design → Final Layout]

Page 9: Design Flows and Tools

Amber23 – Proteus-A Case Study

• Download from http://opencores.com/project,amber
• ARM-compatible 32-bit RISC processor
• 3 stages: FETCH, DECODE and EXECUTE

[Block diagram: Cache, Bus interface, Decode, State machine, Register bank, Barrel shifter, ALU, Multiplexer; signals: instruction, control, read data, address, write data]

Zhang, USC Summer Research, 2012

Page 10: Design Flows and Tools

Amber23 – Performance Comparison

[Figure: Amber23 block diagram, with the same labels as on the previous slide]

Zhang, USC Summer Research, 2012

Page 11: Design Flows and Tools

The Proteus-A Flow – SVC RTL

[Flow diagram: SystemVerilog → SVC2RTL → Synth. RTL and Constraints, with Design Goals → Synthesis (Proteus/Sync Library) → Image Netlist → ClockFree → Async Netlist → Physical Design → Final Layout. Other labels in the diagram: Sync Library, Clock Gating, Clock Tree Synthesis, Verilog, Netlist, Constraints.]

Key New Features
• Supports System Verilog CSP front-end
• Enables user-defined conditional communication
• Saves power at architectural level

Tool Status
• Proprietary version starting from CAST developed at Fulcrum
• System Verilog version subsequently developed at USC
• Used in current research at USC and Technion and 40+ person async class

Page 12: Design Flows and Tools

Key to Low-Power - Conditional Communication

Conditional communication reduces token flow, saving power

• Traditionally - manually introduced via user-created decomposition

• Recent research - automatically introduced via Operand Isolation

[Figure: a DEMUX steers operands A, B according to op to either the Add/Sub unit or the Mult unit; a MUX merges the results]

Saifhashemi, PATMOS 2012
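A rough way to see why conditional communication reduces token flow is to count the tokens each functional unit receives. The sketch below is a toy model under stated assumptions: the function name `tokens_delivered`, the operation mnemonics, and the two-unit split are hypothetical, and token count is used only as a stand-in for switching activity.

```python
def tokens_delivered(ops, conditional):
    """Count operand tokens reaching each unit of a toy Add/Sub + Mult datapath.

    ops         -- sequence of operation names: "add", "sub" or "mul"
    conditional -- True: operands go only to the unit that needs them;
                   False: every unit receives every operand token
    """
    counts = {"addsub": 0, "mult": 0}
    for op in ops:
        if conditional:
            unit = "mult" if op == "mul" else "addsub"
            counts[unit] += 1
        else:
            counts["addsub"] += 1
            counts["mult"] += 1
    return counts
```

For the stream `["add", "sub", "mul", "add"]`, the unconditional version delivers four tokens to each unit, while the conditional version delivers three to Add/Sub and one to Mult.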

Page 13: Design Flows and Tools

SVC2RTL – Enables User-Defined Conditional Communication

[Figure: conditional receive and send controlled by a 0/1 select; labels: "Not received", "Dummy value", "Not sent"]

Page 14: Design Flows and Tools

Part II - Agenda

Design Flows

• Design via decomposition

• Modeling design using System Verilog

Design Automation – The Proteus-A flow

• Legacy RTL

• Added System Verilog CSP front-end

• Asynchronous optimizations

Final Flow Considerations

• Analog Verification

• Design for Test and Debug

Page 15: Design Flows and Tools

Power Optimization Overview

• Conditioning
  • Automatically add conditional communication
• Reconditioning
  • Optimize the existing conditionality

Page 16: Design Flows and Tools

Power Saving - The Opportunity

[Figure: an adder (+) annotated "Unnecessary calculation"]

Page 17: Design Flows and Tools

Our Solution - Adding Isolation Cells

• All inputs/outputs are unconditional
• Operand Isolation
  • AND-based isolation cells
  • Generated by synchronous RTL synthesizer
  • Does not prevent switching in asynchronous circuits

Isolation cells are not effective in asynchronous circuits
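The contrast above can be illustrated with a toy transition count. The sketch assumes a single-rail synchronous bus, where only bit changes between successive values cost power, and a four-phase dual-rail asynchronous encoding, where every bit of every token raises and then lowers one rail. The function names and the dual-rail assumption are illustrative and are not taken from the slides.

```python
def isolate(values, enables):
    """AND-based isolation: force the operand to 0 whenever the unit is not enabled."""
    return [v if en else 0 for v, en in zip(values, enables)]


def single_rail_toggles(values):
    """Synchronous view: count bit flips between successive values on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def dual_rail_transitions(values, width):
    """Asynchronous view (assumed four-phase dual-rail): each bit of each token
    raises one rail and returns it to zero, whatever the data value is."""
    return len(values) * width * 2


operands = [0b1010, 0b0101, 0b1111, 0b0000]
enables = [1, 0, 0, 1]
isolated = isolate(operands, enables)
```

In this toy model, isolation cuts the single-rail toggle count, but the dual-rail transition count is unchanged because a forced zero is still a token. That is consistent with the slide's point that tokens must be removed, not merely zeroed.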

Page 18: Design Flows and Tools

Our Solution - Conditioning

[Figure: conditioned adder (+) whose inputs are withheld when the result is not needed, annotated "No Activity"]

Page 19: Design Flows and Tools

Power Optimization Results

• Case study: 32-bit ALU placed and routed
• Back-annotated switching activity using a VCD file
• Results:
  • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
  • 53% power reduction when only isolating MUL (rf=0.25)
  • Area cost of isolating MUL is about 4%, with no performance penalty

Saifhashemi, PATMOS 2012
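Why isolating a small unit can be detrimental while isolating a large one pays off can be seen from a simple linear energy model. This is an assumption made for illustration, not the model used in the cited work. The unit fires for a fraction `rate` of tokens, while the added conditioning cells cost `overhead` on every token. All names and numbers below are hypothetical.

```python
def conditioned_energy(unit_energy, overhead, rate):
    """Average energy per token after conditioning (assumed linear model)."""
    return overhead + rate * unit_energy


def break_even_rate(unit_energy, overhead):
    """Usage rate above which conditioning costs more than it saves.

    Solves overhead + rate * unit_energy == unit_energy for rate.
    """
    return 1.0 - overhead / unit_energy


# Hypothetical numbers, chosen only for illustration (not measured values):
small_unit = break_even_rate(unit_energy=10.0, overhead=8.0)    # adder-like unit
large_unit = break_even_rate(unit_energy=100.0, overhead=8.0)   # multiplier-like unit
```

With these made-up numbers the small unit breaks even at a usage rate of about 0.2 and the large unit at about 0.92. The same overhead is therefore much easier to amortize over an expensive unit.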

Page 20: Design Flows and Tools

Power Savings – The Opportunity

Conditional communication is explicit and only at primary IO

[Figure: datapath with 0/1 select values at its primary inputs and outputs; two internal regions are annotated "Unnecessary activity"]

Page 21: Design Flows and Tools

The Reconditioning Problem

Definition (The Reconditioning Problem): Rearrange the locations of RECEIVE and SEND cells to minimize power consumption while preserving functional behavior.
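A toy instance of this problem can be solved by exhaustive search. The sketch assumes a linear pipeline in which a single conditional cell drops unneeded tokens. Stages upstream of the cell see every token, and stages downstream see only the fraction `use_rate` that is actually needed. The cell cannot be placed before position `earliest`, where its condition first becomes available. The model, the function name `recondition`, and the numbers are hypothetical, and the brute-force search merely stands in for the solvers used in the cited work.

```python
def recondition(stage_energy, use_rate, cell_energy, earliest):
    """Return (best_position, energy_per_token) for a toy linear pipeline.

    Position i means the conditional cell sits just before stage i;
    position len(stage_energy) leaves conditionality at the primary output.
    """
    n = len(stage_energy)
    best = None
    for i in range(earliest, n + 1):
        upstream = sum(stage_energy[:i])                # sees every token
        downstream = use_rate * sum(stage_energy[i:])   # sees only needed tokens
        energy = upstream + cell_energy + downstream
        if best is None or energy < best[1]:
            best = (i, energy)
    return best


# Hypothetical three-stage pipeline whose condition is known after stage 0.
best_position, best_energy = recondition(
    stage_energy=[5, 20, 10], use_rate=0.25, cell_energy=1, earliest=1
)
```

In this example the cell moves from the primary output (position 3) to just after the condition is known (position 1), so the two expensive stages process only a quarter of the tokens. The same tokens still reach the output, so functional behavior is preserved in the model.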

Page 22: Design Flows and Tools

Power Results

[Figure: three bar charts, each titled "Power Comparison: 32 bit", plotting Power against Operational factor (0.25, 0.5, 0.75) for Original, Greedy, and MILP]

• RECON1: Dual-mode arithmetic unit
• RECON2: Conditional multiplier
• ALU-OI: ALU after operand isolation

Saifhashemi, PhD Thesis, 2012

Page 23: Design Flows and Tools

Mode Based Conditional Slack Matching

[Figure: DEMUX (inputs A, B, op) feeding the Add/Sub and Mult units, each with S and R cells, followed by a MUX]

Najibii, 2012

Conditional Slack Matching Advantage: conditional behavior yields fewer stalls, so fewer pipeline buffers are needed

• Previously ignored: conditional behavior was conservatively modeled as unconditional

Page 24: Design Flows and Tools

Conditional Slack Matching - Results

Najibii, 2012

33% fewer buffers on average

Page 25: Design Flows and Tools

Design Flow Demo

[Flow diagram: SystemVerilog → SVC2RTL → Synth. RTL and Constraints, with Design Goals → Synthesis (Proteus/Sync Library) → Image Netlist → ClockFree → Async Netlist → Physical Design → Final Layout]

Page 26: Design Flows and Tools

Agenda

Design Flows

• Design via decomposition

• Modeling design using System Verilog

Design Automation – The Proteus-A flow

• Legacy RTL

• Added System Verilog CSP front-end

• Asynchronous optimizations

Final Flow Considerations

• Analog Verification

• Design for Test and Debug

Page 27: Design Flows and Tools

Final Flow Considerations

Static Timing Analysis
• Verifying timing constraints and performance is a must
• Trick traditional tools into working with asynchronous circuits

Analog Verification
• Domino logic used in QDI flows is sensitive to charge sharing
• Asynchronous channels cannot tolerate cross-talk glitches
• Special SPICE-based tools developed

Asynchronous Scan
• Asynchronous scan is a must but doable

Design for Silicon Debug
• Chip deadlock is still difficult to debug

Page 28: Design Flows and Tools

Conclusions

The Asynchronous Design Flow/CAD Landscape
• Synchronous design rigidity continues to hamper quality design
• Asynchronous design offers solutions but has many design flow challenges

Design Flow Requirements
• Design flows must easily integrate into synchronous designs
• Circuit quality must compete very well to warrant switching design styles

Our approach
• Proteus provides a good design framework for automation of both legacy RTL and SystemVerilog CSP
• Final considerations of analog and timing verification, scan, and debug should not be overlooked

Page 29: Design Flows and Tools

Acknowledgements

Page 30: Design Flows and Tools

http://ee.usc.edu/async2013