
North Dakota State University April 2009

Mariam Hoseini

Advisor: Dr. Chao You

Supervisor: Dr. Mark Pavicic

Committee members: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, Dr. Deying Li


• Two major styles of circuit design: synchronous and asynchronous

• Advantages of asynchronous design, in terms of:
– Clock skew
– Speed
– Metastability
– Modularity
– Power

• Disadvantages of asynchronous design:
– Hazard-free behavior and a correct ordering of operations are more difficult to design for.
– Additional hardware is needed to initiate, advance, and indicate the completion of operations.

• Asynchronous systems are specified by their handshake protocol, data encoding, and underlying delay model.


• Handshaking is the alternative to clocking in asynchronous systems.

• Data transfer between two processes is synchronized by signals that the processes themselves generate.

• Asynchronous operation can also be done without handshaking.
– Handshaking is used to separate successive uses of a component.
– It may not be necessary to separate the uses of a component, or the separation can be achieved by delaying the operations.

• Handshaking can be done at higher levels in an asynchronous system.


• Bundled data:
– Normal Boolean levels encode data values.
– Separate request and acknowledge wires are used.

• Dual rail:
– Two wires are used to carry a single bit.
– The request is encoded in the dual-rail data wires.
– Dual-rail data encoding is used in the PCC cell design.

Dual rail encoding   Meaning
00                   No data
01                   0
10                   1
11                   Forbidden
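The encoding table can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and writing each code as a two-character string is an assumption made for readability, not something the slides prescribe.

```python
# Dual-rail codes from the table, written as two-character strings.
NO_DATA, ZERO, ONE, FORBIDDEN = "00", "01", "10", "11"


def encode(bit):
    """Return the dual-rail code for a bit, or NO_DATA when bit is None."""
    if bit is None:
        return NO_DATA
    return ONE if bit else ZERO


def decode(code):
    """Return 0, 1, or None (no data yet); reject the forbidden code."""
    if code == FORBIDDEN:
        raise ValueError("'11' is not a valid dual-rail code")
    return {NO_DATA: None, ZERO: 0, ONE: 1}[code]


def has_data(code):
    """True once one rail has fired, so no separate request wire is needed."""
    return code in (ZERO, ONE)
```

Because a valid code always has exactly one rail active, the arrival of data doubles as the request, which is why no separate request wire appears in the dual-rail scheme.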


• Pulse signaling:
– Each request and acknowledge is a pulse.
– Simple, with a short cycle, like transition signaling
– Deals with levels, like level signaling
– Better noise immunity than single-track signaling
– Potential problem: robustness of sending pulses over long wires
– Pulse signaling is used in the PCC cell design, where the long-wire problem does not arise.

[Figure: one cycle of pulse signaling, with a Request pulse (start event) and an Acknowledge pulse (event done)]

• Bounded delay
– Simplest model
– Delays of circuit elements and wires are assumed to be known or bounded.

• Delay insensitive (DI)
– Both gates and wires have unbounded and unknown delays.
– A completion detection mechanism is needed at the receiver.

• Quasi delay insensitive (QDI)
– QDI = DI + isochronic forks
– Isochronic forks are capable of indication.
– All input transitions should be indicated by an output signal transition.

[Figure: a fork connecting A, B, and C, with delays d1, d2, and d3]

• In an asynchronous system, module interfaces and module internals can be designed with different timing models.

• In the PCC cell design, timing is managed as follows:
– The inside of a cell is governed by a bounded delay model.
– Communication between cells follows a QDI model.

• In synchronous systems, Boolean circuits can be constructed from a single primitive such as a NAND gate.

• Logic gates provide only logic functionality, not timing functionality, so they are not sufficient to build asynchronous circuits.

• Asynchronous systems can be made from a set of primitives.

• The set of primitives must provide both universal logic and timing functionality.

• Different sets of primitives have been introduced, such as Keller’s, Patra’s, Lee’s, and others.

The set of primitives used in a PCC cell:

• Wire
– Transfers the output of one component to the input of another.

• Fork
– The output of one component is the input to several components.

• Merge
– Sends one of its inputs to the output.

• Join
– Synchronizes data from several independent components.
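A purely behavioral sketch of the four primitives follows. It is an illustration only: a pulse is modeled as a Python value and "no pulse yet" as None, and the function names and the rule that Merge sees at most one pulse at a time are assumptions rather than details taken from the slides.

```python
def wire(i):
    """Wire: pass the single input straight to the output."""
    return i


def fork(i, n=2):
    """Fork: copy one input to n outputs."""
    return tuple(i for _ in range(n))


def merge(i1, i2):
    """Merge: forward whichever input carries a pulse.

    Assumes at most one input is active at a time.
    """
    if i1 is not None and i2 is not None:
        raise ValueError("merge expects at most one active input")
    return i1 if i1 is not None else i2


def join(*inputs):
    """Join: produce an output only after every input has arrived."""
    if any(i is None for i in inputs):
        return None  # still waiting for at least one input
    return tuple(inputs)
```

Join is the only primitive here that waits, which is what gives the set its timing (synchronization) capability in addition to routing.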

[Figure: symbols for the Wire, Fork, Merge, and Join primitives, with inputs labeled I and outputs labeled O]

• An array of cells each having a simple one-bit processing unit

• Von Neumann neighborhood for local connections

• A routing problem occurs:

• A possible solution:


• Another approach is to combine every two cells into a double cell.
– The same routing capability with fewer neighboring connections

• A further step is to group 4 cells together to make a quad cell.
– The same routing capability with simple connections to the 4 nearest neighbors

• Logic Unit Design

• Synchronization

• Pulse Regenerator

• Top Level Design

• Configuration Circuitry

• PCC Cell Simulations

– One-bit full adder

– Ring oscillator

– Shift register

• Implementing Pipelines

• There is a logic unit (LU) and an output register in each quarter

• Each LU has two inputs and one output


• Dual rail inputs

• Dual rail outputs

• Switches should be set before the inputs arrive

• 8 switches define a function

• 16 functions

• Pull-down resistors avoid floating nodes

• AND function

• D, E, F, G are “0001”

A B Z
0 0 0
0 1 0
1 0 0
1 1 1
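The AND example suggests that the LU behaves like a four-entry lookup table. The sketch below assumes that D, E, F, and G hold the outputs for AB = 00, 01, 10, and 11 in that order; this ordering is consistent with the AND truth table but is not stated on the slide, and `make_lu` is a hypothetical helper name.

```python
from itertools import product


def make_lu(config):
    """Build a two-input function from a 4-character string 'DEFG'.

    Assumption: D, E, F, G are the outputs for AB = 00, 01, 10, 11.
    """
    if len(config) != 4 or set(config) - {"0", "1"}:
        raise ValueError("config must be four characters of '0' or '1'")

    def lu(a, b):
        return int(config[2 * a + b])

    return lu


and_lu = make_lu("0001")

# Reproduce the truth table as (A, B, Z) rows.
and_table = [(a, b, and_lu(a, b)) for a, b in product((0, 1), repeat=2)]

# Four configuration bits give 2**4 = 16 possible functions.
all_configs = ["".join(bits) for bits in product("01", repeat=4)]
```

Under the same assumption, `make_lu("0110")` would give XOR and `make_lu("0111")` would give OR.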


• Wire: one output pulse triggers the LU inputs of the neighbor cell in the same direction.

• Merge: realized by 2:1 muxes; pulses make right (90-degree) turns.

• Fork: each turn triggers a neighbor quarter and also a neighbor cell, so a single computation forks into multiple parallel computations.

Join

• Completion detection circuitry

• All the participating quarters should have their LU outputs ready.

• Complements a fork by combining multiple parallel computations into a single computation.

• Enables QDI communications
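The completion-detection idea behind Join can be sketched as a simple predicate over dual-rail outputs. The quarter names, the dictionary layout, and the function names are hypothetical; the only facts taken from the slides are that Join waits for all participating quarters and that valid dual-rail data is "01" or "10".

```python
VALID_CODES = ("01", "10")  # dual-rail codes that carry a bit


def join_ready(lu_outputs, participating):
    """True when every participating quarter has a valid dual-rail output.

    lu_outputs maps a quarter name to its current dual-rail code;
    participating lists the quarters taking part in this Join.
    """
    return all(lu_outputs[q] in VALID_CODES for q in participating)


# Hypothetical snapshot: the "east" quarter has not produced data yet.
outputs = {"north": "10", "east": "00", "south": "01", "west": "00"}
ready_two = join_ready(outputs, ["north", "south"])
ready_three = join_ready(outputs, ["north", "east", "south"])
```

Here `ready_two` is true and `ready_three` is false, so a Join over north, east, and south would keep waiting regardless of how long the east quarter takes.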


• Fork1

– Only when a pulse turns

– LU should use only the turned pulse

• Fork2 & Fork4

– No timing assumptions

• Fork3 & Fork5

– Bounded delay model


• When a pulse travels through many cells, its width may increase or decrease.

• A pulse that is too short may not be detectable at all; a pulse that is too long may catch up with other pulses.

• A pulse regenerator (PRG) produces an output pulse of constant width, independent of the width of the input pulse.

• D1 is the delay by which the input pulse is stretched.

• D2 determines the width of the output pulse.

[Figure: PRG circuit with delay elements D1 and D2 and nodes A to E]

In a PCC cell: (W/L)p / (W/L)n ≈ 1.6

In an inverter:

• Equivalent resistance of a MOS transistor: R ∝ L/W

• To match PMOS and NMOS resistances: (W/L)p / (W/L)n = 3 ~ 3.5

• tpHL = 0.69 · Rn · CL and tpLH = 0.69 · Rp · CL; if Rn = Rp, then tpHL = tpLH

• A bigger PMOS improves tpLH by increasing the charging current.

• A bigger PMOS degrades tpHL by adding parasitic capacitance.

• With matched resistances, tp = (tpHL + tpLH)/2 is not minimal.

• The ratio for optimal speed equals √(Rp/Rn).

• The device can be sped up by reducing the size of the PMOS.
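As a rough numeric check, the square-root rule can be evaluated for the resistance ratios quoted above. This sketch assumes that Rp/Rn in the rule means the resistance ratio of equally sized PMOS and NMOS devices, that is, the 3 to 3.5 figure; the slides do not state this explicitly, and `optimal_ratio` is a hypothetical name.

```python
import math


def optimal_ratio(resistance_ratio):
    """Speed-optimal (W/L)p / (W/L)n for a given Rp/Rn of equal-size devices."""
    return math.sqrt(resistance_ratio)


low = optimal_ratio(3.0)   # about 1.73
high = optimal_ratio(3.5)  # about 1.87

for r, ratio in ((3.0, low), (3.5, high)):
    print(f"Rp/Rn = {r}: speed-optimal sizing ratio is about {ratio:.2f}")
```

Under that assumption the speed-optimal ratio is roughly 1.7 to 1.9, far below the resistance-matching ratio of 3 to 3.5 and close to the 1.6 used in the PCC cell.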


• Configuration bits must be loaded into each cell: 16 bits for the LU switches, 8 bits for the Merge muxes, and 4 bits for Join, for a total of 28 bits.

• Only some parts of the array may need to be configured.

• One solution is to chain the shift registers of all the cells and configure every cell.

• A better solution is to form the shift-register chain only from the cells that need to be configured.

• In each cell, a controller:
– decides whether the cell is to be configured or not
– directs the bit flow to one of the cell's neighbors
– stops the shift registers once all the intended cells are configured
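The chained-shift-register scheme can be sketched as follows. Only the 16 + 8 + 4 = 28 bit count comes from the slides; the bit ordering inside a register, the helper names, and the choice to model only the cells that are part of the chain are assumptions made for illustration.

```python
LU_BITS, MERGE_BITS, JOIN_BITS = 16, 8, 4
CELL_BITS = LU_BITS + MERGE_BITS + JOIN_BITS  # 28 bits per cell


def shift_in(chain_length, bitstream):
    """Shift a serial bitstream through a chain of 28-bit registers."""
    registers = [[0] * CELL_BITS for _ in range(chain_length)]
    for bit in bitstream:
        carry = bit
        for reg in registers:
            reg.insert(0, carry)  # new bit enters at the front
            carry = reg.pop()     # oldest bit moves on to the next cell
    return registers


def build_bitstream(configs):
    """Order the bits so that cell k in the chain ends up holding configs[k].

    The last cell's bits must be sent first because they travel farthest.
    """
    stream = []
    for cfg in reversed(configs):
        stream.extend(reversed(cfg))
    return stream


# Hypothetical configurations for a chain of two cells.
cell_a = [1, 0] * 14
cell_b = [1] * 16 + [0] * 8 + [1] * 4
loaded = shift_in(2, build_bitstream([cell_a, cell_b]))
```

With a chain formed only from the cells that need configuring, the stream length is 28 bits times the number of those cells rather than 28 bits times the size of the whole array.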

[Figure: configuration controller. Callouts: one signal shows that the shift register is filled; one shows that the cell is the last one in the shift-register chain; one determines whether the cell should be configured; one defines the neighbor to which the bits should be forwarded. The figure also shows decoders, OR gates, and data and clock lines for the four neighbors (data-N, data-S, data-W, data-E, clk-N, clk-S, clk-W, clk-E).]

• The PCC cell was implemented in TSMC 250 nm CMOS using S-Edit.

• Simulation was done in PSpice.

• The supply voltage is 5 V.

• Input pulse widths are 400 ps.

• Propagation delay through a cell is 480 ps ~ 500 ps.

• For better speed: slope ≤ gate propagation delay

• The slope of the external inputs is 12 ps.

• No overshoots or undershoots

For 20 pulses:

Voltage source = 5 V

Average current = 6 mA for 1.4 ns and 17 mA for 8.6 ns

Energy = (5 V × 6 mA × 1.4 ns) + (5 V × 17 mA × 8.6 ns) = 773 pJ

For 1 pulse (1 bit of operation):

Voltage source = 5 V, average current = 5 mA: Energy = 5 V × 5 mA × 1.5 ns = 37.5 pJ

Voltage source = 3.3 V, average current = 3 mA: Energy = 3.3 V × 3 mA × 1.8 ns = 17.8 pJ
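The energy figures for 20 pulses and for a single pulse follow from E = V × I × t, and volts × milliamps × nanoseconds conveniently gives picojoules. The sketch below recomputes them from the quoted voltages, currents, and durations; the function name and the (current, duration) segment layout are illustrative choices.

```python
def energy_pj(volts, segments):
    """Energy in pJ for segments of (average current in mA, duration in ns).

    V * mA * ns = 1e-3 * 1e-9 J = 1e-12 J, so the product is already in pJ.
    """
    return sum(volts * current_ma * duration_ns
               for current_ma, duration_ns in segments)


twenty_pulses = energy_pj(5.0, [(6, 1.4), (17, 8.6)])  # about 773 pJ
one_pulse_5v = energy_pj(5.0, [(5, 1.5)])              # about 37.5 pJ
one_pulse_3v3 = energy_pj(3.3, [(3, 1.8)])             # about 17.8 pJ
```

As a consistency check, 773 pJ spread over 20 pulses is about 38.7 pJ per pulse, in line with the 37.5 pJ single-pulse figure at 5 V.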


• Sum = A ⊕ B ⊕ C; for example, 1 ⊕ 1 ⊕ 1 = 1

• Carry = AB + BC + AC = AB + (A + B)C; for example, 1·1 + (1 + 1)·1 = 1

• Sum and carry products are ready after 0.5 ns and 1.8 ns.
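The two carry expressions can be checked exhaustively over all eight input combinations. This is a plain Boolean sketch of the full-adder equations, not a model of how the functions are mapped onto PCC cells; the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: sum is a three-way XOR, carry is the majority."""
    total = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return total, carry


def carry_factored(a, b, c):
    """Factored carry form: AB + (A + B)C."""
    return (a & b) | ((a | b) & c)


# Compare both carry forms and the arithmetic meaning for every input.
checks = [
    (full_adder(a, b, c)[1] == carry_factored(a, b, c)
     and a + b + c == full_adder(a, b, c)[0] + 2 * full_adder(a, b, c)[1])
    for a, b, c in product((0, 1), repeat=3)
]
```

The factored form needs only two-input functions (AND, OR), which suits logic units that each take two inputs.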


• Loops are important for many circuits, such as sequential circuits, iterative computations, and For, If, and While constructs.

• The ring oscillator demonstrates two capabilities of the PCC cell:
– A loop can be controlled externally (started and stopped).
– By using a Join of pulses, communications can be QDI.

[Figure: ring oscillator started by a ‘0’ start pulse; the output is always a ‘1’]

• Ring oscillator implemented in an array of PCC cells

[Figure: array of PCC cells whose logic units are configured as One, Nand, WR, XOR, and Pass]

• ‘0’ pulses are shown in blue, ‘1’ pulses are shown in red.

• The input mux is configured to receive a ‘0’ pulse only from outside the first cell and a ‘1’ pulse only from a turn.

[Figure: simulation results for the ring oscillator]

An input bit stream of “1010” is used.

Cell 1   Cell 2   Cell 3   Cell 4
D1       x        x        x
D2       D1       x        x
D3       D2       D1       x
D4       D3       D2       D1
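The table can be reproduced with a short behavioral trace. The sketch assumes that the stream "1010" is fed leftmost bit first, so D1 = 1, D2 = 0, D3 = 1, and D4 = 0, and it uses None for the "x" (empty) entries; both choices are assumptions, as is the function name.

```python
def shift_register_trace(bits, n_cells=4):
    """Return the contents of every cell after each input bit is shifted in."""
    cells = [None] * n_cells  # None plays the role of 'x' in the table
    trace = []
    for bit in bits:
        cells = [bit] + cells[:-1]  # new bit enters cell 1, the rest move right
        trace.append(list(cells))
    return trace


# Input stream "1010", assumed to arrive leftmost bit first.
trace = shift_register_trace([1, 0, 1, 0])
for step, row in enumerate(trace, start=1):
    print(step, ["x" if v is None else v for v in row])
```

After the fourth bit every cell holds data, and cell 4 holds D1, matching the last row of the table.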


• If handshaking is done for every component, the components can form a pipeline.

• Each component should supply an Ack to indicate that it is available for re-use.

Delay(1) = 3X + (n − 2)·5X + 3X = (5n − 4)X

[Figure: chain of LUs with Ack signals returned between components]

• Some cells do not handshake and are cascaded. The cascaded cells form one unit of a pipeline, so handshaking is done only at a higher level.

Delay(2) = 3X + (n − 2)·2X + 3X = (2n + 2)X

Delay(2)/Delay(1) = (2n + 2)X / (5n − 4)X ≈ 2/5 for large n
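The two delay expressions, (5n − 4)X with handshaking at every component and (2n + 2)X with cascaded cells, can be compared numerically. The sketch works in units of X and treats n as an integer of at least 2; the slides do not define n or X further, so the function names and that reading are assumptions.

```python
from fractions import Fraction


def delay_full_handshake(n):
    """Delay in units of X when every component handshakes."""
    return 3 + (n - 2) * 5 + 3  # simplifies to 5n - 4


def delay_cascaded(n):
    """Delay in units of X when cascaded cells skip handshaking."""
    return 3 + (n - 2) * 2 + 3  # simplifies to 2n + 2


def ratio(n):
    """Delay(2) / Delay(1) as an exact fraction."""
    return Fraction(delay_cascaded(n), delay_full_handshake(n))


for n in (2, 5, 10, 100, 1000):
    print(n, float(ratio(n)))
```

The ratio is exactly 1 at n = 2 and only approaches 2/5 as n grows, so the quoted 2/5 is the limiting value rather than the ratio for short chains.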

[Figure: cascaded LUs grouped into units of the pipeline, with Ack signals exchanged between units]

PCC Cell

Technology                               TSMC 250 nm
Voltage source                           5 V (3.3 V)
Transistor count                         760
Propagation delay                        500 ps (600 ps)
Minimum input pulse width                400 ps
Energy consumption for 1-bit operation   37.5 pJ (17.8 pJ)
Routing capability                       Data can be routed in 4 directions
QDI communications                       Yes, by performing Join
Implementing comb/seq circuits           Yes
Controlling a loop externally            Yes
Implementing pipelines                   Yes
