A Synthesizable Datapath-Oriented Programmable Logic Core Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton University of British Columbia and Imperial College

Jan 18, 2016

Page 1: A Synthesizable Datapath-Oriented Programmable Logic Core Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton University of British.

A Synthesizable Datapath-Oriented Programmable Logic Core

Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton

University of British Columbia and Imperial College


Embedded Programmable Logic Cores

Embed a small amount of programmable logic onto an ASIC:
- Postpone some decisions until late in the design cycle
- Fast upgrade path for products
- Embedded debug


Soft Programmable Logic Cores

[Figure: design flow: RTL of Soft PLC -> RTL Simulation -> Synthesis -> Scan Insertion -> Gate-Level Simulation -> Floorplanning -> Placement -> Clock Tree Generation -> Routing & Timing Verification -> Physical Verification]


Soft Programmable Logic Cores

Advantages:
- Easy to integrate, reduces design time
- Very flexible, can create the exact required core
- Easy to migrate to smaller technologies

Disadvantages:
- Inefficient compared to hard cores

Our thought:
- Makes sense if you only want a small core (a few hundred gates)


This talk:

A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits


Previous Synthesizable PLCs

Kim Bozman and Noha Kafafi:

LUT-Based

Unique Directional Routing Fabric

[Figure: LUT-based core: an array of 3-LUTs between INPUTS and OUTPUTS, with connection groups labeled x3 and x4. All inputs are fed into a multiplexer.]


Synthesizable Cores

Observation 1: To make it truly synthesizable, must avoid combinational loops in the unprogrammed fabric

Observation 2: Each tile need not be identical
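Observation 1 can be checked mechanically. The sketch below is an illustration, not part of the talk: it models the unprogrammed fabric as a directed graph with one edge for every connection that some configuration could enable, and reports whether that graph contains a cycle. The node names and the two example fabrics are hypothetical, and registers are assumed to be left out of the graph.

```python
from collections import deque

def has_combinational_loop(edges):
    """Return True if the directed graph given by (src, dst) pairs has a cycle.

    Each edge stands for a connection that could be enabled by some
    configuration, so a cycle is a potential combinational loop in the
    unprogrammed fabric.
    """
    successors = {}
    indegree = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        indegree[dst] = indegree.get(dst, 0) + 1
        indegree.setdefault(src, 0)

    # Kahn's algorithm: repeatedly remove nodes with no remaining inputs.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Any node that could not be removed lies on, or behind, a cycle.
    return removed != len(indegree)

# Hypothetical fabrics: signals only flow left to right in the first one.
directional = [("in", "lut0"), ("in", "lut1"), ("lut0", "lut1"),
               ("lut0", "lut2"), ("lut1", "lut2"), ("lut2", "out")]
# Adding one right-to-left connection creates a possible loop.
bidirectional = directional + [("lut2", "lut0")]
```

A directional routing fabric passes this check by construction, because every connection goes from an earlier block to a later one.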


Previous Synthesizable PLCs

Andy Yan:

Product-term Based Logic Block

Unique Directional Routing Fabric

Supported Sequential Circuits

[Figure: product-term blocks (PTB) arranged between INPUTS and OUTPUTS, separated by Level 1, Level 2 and Level 3 Interconnect Switches and an Output Interconnect Switch]


Our Architecture

Use it when the PLC is connected to a bus:

[Figure: a PLC connected to two buses]

Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed


Logic Architecture

[Figure: a Wordblock built from Bitblocks bit0 to bit N-1. Labels in the Bitblock: two 4-LUTs, a register (reg), and signals A, B, C, Cin, Cout, k1 and s.]


Logic Architecture

Key point:

- All bitblocks within a wordblock share the same set of configuration bits

- This means all bitblocks implement the same function
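A minimal sketch of what sharing configuration bits means in practice. It is a simplification, not the bitblock from the slides: each bitblock is reduced to a single 4-LUT with its fourth input tied to 0, and the carry chain and register are left out. The function names are hypothetical.

```python
def lut4(config, a, b, c, d):
    """Evaluate a 4-input LUT whose 16-bit truth table is `config`."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1

def wordblock(config, a_bus, b_bus, c_bus, width):
    """Apply ONE LUT configuration to every bit position of three buses.

    Simplification: one 4-LUT per bitblock, fourth input tied to 0,
    no carry chain and no register.
    """
    result = 0
    for bit in range(width):
        a = (a_bus >> bit) & 1
        b = (b_bus >> bit) & 1
        c = (c_bus >> bit) & 1
        result |= lut4(config, a, b, c, 0) << bit
    return result

# Truth tables built from the LUT index (bit 0 = a, bit 1 = b).
AND_AB = sum(1 << i for i in range(16) if (i & 0b11) == 0b11)
XOR_AB = sum(1 << i for i in range(16) if (i ^ (i >> 1)) & 1)
```

The same 16 configuration bits serve every bit position, so in this model the configuration storage for the LUT does not grow with the bus width.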



Routing Architecture

Key point: Signals are routed as buses

[Figure: N-bit buses routed to and from a wordblock with bits Bit 0 to Bit N-1]


Routing Architecture

Key point:
- Linear array of wordblocks
- Buses get wider as we go to the right

[Figure: one wordblock with bits Bit 0 to Bit N-1]



Routing Architecture

Key point:
- Linear array of wordblocks
- Number of buses goes up as we go to the right
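One way to picture why the number of buses grows is sketched below. It assumes, as an illustration only, that every wordblock can select from the input buses, the constant registers, the feedback registers, and one output bus from each wordblock to its left. The function names and the three-port default (inputs A, B, C) are assumptions, not figures from the talk.

```python
def buses_visible(position, input_buses, constants, feedback):
    """Source buses selectable by the wordblock at `position` (0 = leftmost).

    Assumption: each earlier wordblock contributes exactly one output bus.
    """
    return input_buses + constants + feedback + position

def mux_inputs_total(num_wordblocks, input_buses, constants, feedback, ports=3):
    """Total bus-level multiplexer inputs over the whole linear array.

    Assumption: every wordblock has `ports` input ports, each with a
    multiplexer over all buses visible at that position.
    """
    return sum(ports * buses_visible(i, input_buses, constants, feedback)
               for i in range(num_wordblocks))
```

Under these assumptions the routing cost grows roughly quadratically with the number of wordblocks, since each added wordblock both needs its own multiplexers and widens the multiplexers of every block to its right.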

[Figure: three wordblocks in a linear array, each with bits Bit 0 to Bit N-1]


Datapath Architecture

[Figure: three wordblocks (Bit 0 to Bit N-1) with three SHIFT blocks and three D/Q registers]


Multipliers

[Figure: two wordblocks (Bit 0 to Bit N-1) and two Multiply blocks, with SHIFT blocks and D/Q registers]

Two inputs instead of three

Two output buses (MSB, LSB)
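A short sketch of the multiplier's interface as described above: two N-bit inputs, and a 2N-bit product split across two N-bit output buses. Unsigned arithmetic and the function name are assumptions made for the illustration.

```python
def multiply_block(a, b, width):
    """Multiply two `width`-bit unsigned words and return (msb_bus, lsb_bus)."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)   # up to 2 * width bits
    msb_bus = (product >> width) & mask  # upper half of the product
    lsb_bus = product & mask             # lower half of the product
    return msb_bus, lsb_bus
```

Because the block drives two buses instead of one, each multiplier adds more to the routing than an ordinary wordblock does.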


Add a Control Block

[Figure: full core. A Control Block with a Status Mux and a Control Mux connects to the control and status signals of Wordblock 0 to Wordblock D-1 (each with bits 0 to N-1 and a shifter). Also shown: Input Buses (M), Constant Registers (C), Feedback Registers (F), a Feedback Mux, an Output Mux, and Output Buses (R).]

Control block is based on P-term fine-grained synthesizable core


Example Mapping

Monitor two buses:
- Count the number of times each bus matches a mask
  - includes don’t care bits
- Count the number of times both buses match the mask at the same time
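The behaviour of this example can be written down in software before it is mapped onto the core. The sketch below is a reference model only: it assumes a single mask shared by both buses, represents don't-care bits as zeros in a separate `care_bits` word, and uses hypothetical function names.

```python
def matches(word, mask_value, care_bits):
    """True when `word` equals `mask_value` on every bit set in `care_bits`.

    Bits cleared in `care_bits` are don't-care positions.
    """
    return ((word ^ mask_value) & care_bits) == 0

def monitor(samples, mask_value, care_bits):
    """Count mask matches over a sequence of (bus0, bus1) samples.

    Returns (matches on bus0, matches on bus1, simultaneous matches).
    """
    count0 = count1 = count_both = 0
    for bus0, bus1 in samples:
        hit0 = matches(bus0, mask_value, care_bits)
        hit1 = matches(bus1, mask_value, care_bits)
        count0 += hit0
        count1 += hit1
        count_both += hit0 and hit1
    return count0, count1, count_both
```

The three counters correspond to the three accumulating adders in the mapping, and the comparison corresponds to the mask stage.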

[Figure: example mapping using two input buses, two constants, three feedback paths, two MASK blocks, three ADD blocks, the Control Block (with a reset signal and a D/Q register), and the output buses]


Interesting Questions:

1. How do the various architectural parameters affect density?

2. How does this compare to a fine-grained architecture?


Architectural Parameters

D Number of Wordblocks (incl. multipliers)

N Bit Width

M Number of Input Buses

R Number of Output Buses

F Number of Feedback Paths

C Number of Constant Registers

A Number of Multipliers

P Number of Product-Term Blocks


Impact of the Number of Wordblocks and Bit Width

Key Result: Both bit-width and number of wordblocks have a significant impact on area.

[Figure: Cell Area (x10^6 µm^2, axis 0 to 1.6) versus Number of Wordblocks (D = 1 to 8), one curve each for N=8, N=16, N=24 and N=32]


Impact of the Number of Multipliers

Key result: Area increase due to more buses in the routing

[Figure: Cell Area (x10^6 µm^2, axis 0.70 to 0.90) versus Number of Multipliers (A = 0, 1, 2, 4, 8, 16, 32), for N=16, D=32]


Impact of the Size of the Control Block

Key result: The control block can dominate if it becomes too big

[Figure: Cell Area (x10^6 µm^2, axis 0 to 0.8) versus Number of Product Term Blocks in the Control Block (P = 4, 6, 9, 12, 16), one curve each for D=8 and D=16]


Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
fbly                68,190         132,339,335   9,300                 1940           7.33
dotv3               34,119          65,534,780   6,575                 1921           5.19
dscg                72,178         116,271,968   9,473                 1611           7.62
fir4                76,213         130,971,120   9,843                 1718           7.74
egcd             1,225,231          22,776,474  10,420                 18.6            117
momul              294,135          11,448,589   7,097                 38.9             41
median             142,172          10,733,962   4,420                 75.5             32
debug1              87,265           1,302,928   3,484                 14.9             25


Key result 1: Significantly better than fine-grained architecture


Key result 2: Overhead (Datapath/ASIC) roughly the same as FPGA/ASIC
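The two ratio columns follow directly from the first three. As a quick check, the sketch below recomputes them from the values in the table; the dictionary and function names are only for illustration.

```python
# Values copied from the table: (datapath, fine-grain PTerm, ASIC).
AREAS = {
    "fbly":   (68_190, 132_339_335, 9_300),
    "dotv3":  (34_119, 65_534_780, 6_575),
    "dscg":   (72_178, 116_271_968, 9_473),
    "fir4":   (76_213, 130_971_120, 9_843),
    "egcd":   (1_225_231, 22_776_474, 10_420),
    "momul":  (294_135, 11_448_589, 7_097),
    "median": (142_172, 10_733_962, 4_420),
    "debug1": (87_265, 1_302_928, 3_484),
}

def ratios(datapath, fine_grain, asic):
    """Return (Fine-Grain/Datapath, Datapath/ASIC) for one benchmark."""
    return fine_grain / datapath, datapath / asic

# Print one line per benchmark for comparison with the table.
for name, values in AREAS.items():
    fine_over_datapath, datapath_over_asic = ratios(*values)
    print(f"{name:7s} {fine_over_datapath:8.1f} {datapath_over_asic:7.2f}")
```

The first ratio is the gain over the fine-grained core (Key result 1); the second is the overhead relative to a fixed-function implementation (Key result 2).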


But these results aren’t fair:

- For each benchmark, we found the optimum set of architectural parameters.

- We need an architecture that works for a variety of circuits


Architecture Construction

Our thought:
- The number of inputs/outputs is fixed by the SoC
- The designer has an idea of the size of the programmable logic (number of wordblocks)

Fix all other parameters (as a function of # of wordblocks)
- e.g. fixed ratio between the number of multipliers and wordblocks, fixed ratio between control logic and datapath logic, etc.

We arbitrarily chose fixed ratios based on our experience
- A full architecture study is left as future work!
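The construction rule can be sketched as a small function that takes the quantities the designer knows (D, N, M, R) and derives everything else from D. The ratio values below are placeholders invented for this illustration; the ratios actually used are not given here.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CoreParams:
    D: int  # number of wordblocks (incl. multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks

def derive_params(D, N, M, R, mult_ratio=0.25, feedback_ratio=0.25,
                  const_ratio=0.25, pterm_ratio=0.5):
    """Derive F, C, A and P from D using fixed ratios.

    All four default ratios are HYPOTHETICAL placeholders.
    """
    def scaled(ratio):
        # Round up and keep at least one of each resource.
        return max(1, math.ceil(ratio * D))

    return CoreParams(D=D, N=N, M=M, R=R,
                      F=scaled(feedback_ratio), C=scaled(const_ratio),
                      A=scaled(mult_ratio), P=scaled(pterm_ratio))
```

For example, `derive_params(16, 32, 2, 2)` describes a 16-wordblock, 32-bit core with two input and two output buses, with the remaining parameters following from the chosen ratios.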


Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
fbly               332,091         132,339,335   9,300                  399           35.7
dotv3              225,518          65,534,780   6,575                  291           34.3
dscg               325,029         116,271,968   9,473                  358           34.3
fir4               307,154         130,971,120   9,843                  426           31.2
egcd             3,778,611          22,776,474  10,420                 6.02            363
momul              486,654          11,448,589   7,097                 23.5           68.5
median             194,654          10,733,962   4,420                 55.1             44
debug1             119,286           1,302,928   3,484                 10.9             34


Key result 1: Significantly better than fine-grained architecture

Key result 2: Overhead (Datapath/ASIC) roughly the same as FPGA/ASIC
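Comparing the two sets of results shows what the fixed-ratio architecture costs relative to the per-benchmark optimum. The penalty ratio computed below is derived arithmetic on the tabulated values, not a number reported in the talk, and the names are illustrative.

```python
# Datapath-core values with per-benchmark optimum parameters.
OPTIMUM = {"fbly": 68_190, "dotv3": 34_119, "dscg": 72_178, "fir4": 76_213,
           "egcd": 1_225_231, "momul": 294_135, "median": 142_172,
           "debug1": 87_265}
# Datapath-core values with the fixed-ratio architecture.
FIXED = {"fbly": 332_091, "dotv3": 225_518, "dscg": 325_029, "fir4": 307_154,
         "egcd": 3_778_611, "momul": 486_654, "median": 194_654,
         "debug1": 119_286}
# Fine-grained (PTerm) values, the same in both tables.
FINE_GRAIN = {"fbly": 132_339_335, "dotv3": 65_534_780, "dscg": 116_271_968,
              "fir4": 130_971_120, "egcd": 22_776_474, "momul": 11_448_589,
              "median": 10_733_962, "debug1": 1_302_928}

def penalty(name):
    """How much larger the fixed-ratio core is than the tuned core."""
    return FIXED[name] / OPTIMUM[name]

# Gain of the fixed-ratio datapath core over the fine-grained core.
gains = {name: FINE_GRAIN[name] / FIXED[name] for name in FIXED}
smallest_gain = min(gains.values())
largest_gain = max(gains.values())
```

The smallest and largest gains bound the improvement over the fine-grained core for this benchmark set when a single construction rule is used for all circuits.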


[Figure: image with both sides labeled 625 µm]


Conclusions

Our architecture is 6x to 426x more efficient than a fine-grained architecture

But, this is only for datapath-oriented circuits.

However, this is ok:
- In an SoC, we know, when the chip is designed, whether the inputs are buses or bits
- If there are buses, use this architecture
- If there are not buses, use Andy’s PTerm architecture

Final thought: using this architecture, the overhead is similar to that of a normal FPGA. People already accept this!