The Memory/Logic Interface in FPGA’s with Large Embedded Memory Arrays

Post on 21-Dec-2015



Steven J. E. Wilton, Member, IEEE, Jonathan Rose, Member, IEEE,

and Zvonko G. Vranesic, Senior Member, IEEE

Laboratory of Reliable Computing, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

Reference

S. J. E. Wilton, “Architectures and algorithms for field-programmable gate arrays with embedded memory,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, Ont., Canada, 1997.

Outline

Introduction

Baseline architecture

Experimental methodology and results

Enhanced architecture and its improvements

Introduction

In the past, FPGA’s were primarily used to implement small logic subcircuits. As the capacities of FPGA’s grow, they will be used to implement much larger circuits than ever before.

To address the storage requirements of large systems, FPGA’s with large embedded memory arrays are now being developed by many vendors.

Introduction

One of the challenges when embedding memory arrays into an FPGA is providing enough interconnect between the memory arrays and the logic resources.

Baseline Architecture

Memory/Logic Interconnect Block

Benchmark Circuit Generation

Benchmark circuits must be generated for this architecture, because typical circuits contain only a few memories each, and gathering hundreds of such circuits is not feasible.

The solution is to study the types of memory configurations found in real systems and develop a stochastic memory configuration generator, using circuit analysis to make sure the generated configurations are realistic.

Circuit Analysis

Memory configurations

Logic/memory clustering

Interconnect patterns: point-to-point patterns, shared-connection patterns, point-to-point with no shuffling patterns

Memory Configurations

171 circuits containing a total of 268 user memories, drawn from recent conference proceedings, recent journal articles, local designers, and a customer study conducted by Altera.

Memory Configurations

Logic Memory Clustering

Interconnect Patterns

Stochastic Circuit Generation

A stochastic circuit generator was developed using the statistics gathered during circuit analysis. The steps in generating a benchmark circuit:

Choose a logical memory configuration

Divide the logical memories into clusters

Choose an interconnect pattern for each cluster

Choose the number of data-in/data-out subcircuits for the clusters

Generate logic subcircuits and connect them to the memory arrays
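The five generation steps above can be sketched as follows. This is a minimal illustration only: the configuration list, cluster sizes, and pattern names are made-up placeholders standing in for the statistics the authors gathered during circuit analysis.

```python
import random

# Hypothetical stand-ins for the measured distributions (illustrative only):
# each config is (number of arrays, depth in words, width in bits).
MEMORY_CONFIGS = [(1, 512, 8), (2, 1024, 16), (4, 2048, 32)]
INTERCONNECT_PATTERNS = ["point-to-point", "shared-connection",
                         "point-to-point-no-shuffle"]

def generate_benchmark(rng):
    """Follow the five generation steps from the slide."""
    # 1. Choose a logical memory configuration.
    num_mems, depth, width = rng.choice(MEMORY_CONFIGS)
    memories = [{"id": i, "depth": depth, "width": width}
                for i in range(num_mems)]
    # 2. Divide the logical memories into clusters (random sizes of 1-2).
    clusters, i = [], 0
    while i < len(memories):
        size = min(rng.randint(1, 2), len(memories) - i)
        clusters.append(memories[i:i + size])
        i += size
    circuit = []
    for cluster in clusters:
        # 3. Choose an interconnect pattern for each cluster.
        pattern = rng.choice(INTERCONNECT_PATTERNS)
        # 4. Choose the number of data-in/data-out subcircuits.
        n_sub = rng.randint(1, len(cluster))
        # 5. "Generate" logic subcircuits and attach them to the arrays.
        circuit.append({"memories": cluster, "pattern": pattern,
                        "subcircuits": n_sub})
    return circuit

bench = generate_benchmark(random.Random(0))
```

Seeding the generator makes each benchmark reproducible, which matters when the same circuit must be implemented across many candidate architectures.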

Implementation Tool

Each generated benchmark circuit is “implemented” in each FPGA:

Logical-to-physical mapping

Placement: memory and logic blocks are placed simultaneously

Routing: nets to memory initially have higher priority; the nets are reordered between iterations; repeat 10 times, then increase W; determine the minimum value of W
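The routing loop above, which searches for the minimum channel width W, can be sketched as below. The `route` function is a deliberately trivial stub; the real router's net ordering and priority handling are only indicated by comments.

```python
def route(circuit, W):
    """Stand-in for the router described above: nets to memory are routed
    first and nets are reordered between attempts (stub: succeeds iff W is
    at least the circuit's hypothetical track requirement)."""
    return W >= circuit["min_tracks_needed"]

def minimum_channel_width(circuit, start_W=1, max_W=128):
    """Increase W until routing succeeds; return the minimum routable W."""
    W = start_W
    while W <= max_W:
        for attempt in range(10):   # "repeat 10 times" at each width,
            if route(circuit, W):   # reordering nets between iterations
                return W
        W += 1                      # all attempts failed: increase W
    return None                     # unroutable within the search budget
```

For example, `minimum_channel_width({"min_tracks_needed": 9})` returns 9 under this stub. With a real, nondeterministic router the 10 reordered attempts per width are what give each W a fair chance before widening the channel.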

Memory/Logic Flexibility Results


Area Results

The area of the FPGA is the sum of:

Logic blocks

Memory blocks

Routing resources: programmable switches, programming bits, metal routing segments
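The area metric is a weighted sum of the components listed above. A minimal sketch, in which every unit cost is a made-up placeholder rather than a value from the paper:

```python
def fpga_area(logic_blocks, memory_blocks, switches, bits, segments):
    """Total FPGA area as the sum of the five slide components, each
    weighted by a per-unit cost (costs are hypothetical placeholders)."""
    COST = {"logic": 100.0,   # area of one logic block
            "memory": 400.0,  # area of one memory block
            "switch": 6.0,    # one programmable switch
            "bit": 1.0,       # one programming bit
            "segment": 2.0}   # one metal routing segment
    return (logic_blocks * COST["logic"]
            + memory_blocks * COST["memory"]
            + switches * COST["switch"]
            + bits * COST["bit"]
            + segments * COST["segment"])
```

Counting routing area explicitly (switches, bits, and segments) is what lets the study charge each candidate interconnect flexibility for the silicon it actually consumes.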

Area Results

Delay Results

A delay model is used to measure the memory read time of all memories in the circuit:

CACTI: estimates the array access time

Elmore delay: models the address-in and data-out paths

Delay Results

Issues

Some nets connect more than one memory block to one or more logic blocks:

When combining small memory arrays to implement a large one

When the data-in pins of several user memories are driven by a common data bus

Such nets appear often, but unfortunately they are hard to route, especially for larger architectures. We could use a higher value of Fm for larger architectures, or…?

Further Investigation

Enhanced Architecture

The above motivates a closer study of memory-to-memory connections. An enhanced architecture:

Adds extra switches between the memory arrays to support these nets

Results: the extra switches take up negligible area, and both speed and routability improve

Enhanced Architecture

Baseline Architecture

Enhanced Architecture

Evaluation of the Enhanced Architecture

The maze routing algorithm must be restricted so that it uses the memory-to-memory switches only to implement memory-to-memory connections.

If the maze router is not modified…
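The restriction above can be sketched as an extra check inside a standard breadth-first maze router. The graph encoding here is hypothetical: each routing-graph edge carries a flag marking whether it is a memory-to-memory switch, and such edges are skipped unless the net itself is memory-to-memory.

```python
from collections import deque

def maze_route(neighbors, src, dst, net_is_mem_to_mem):
    """BFS maze routing with the slide's restriction: edges flagged as
    memory-to-memory switches are usable only for memory-to-memory nets.
    `neighbors` maps node -> [(next_node, is_mem_to_mem_edge), ...]."""
    frontier, came_from = deque([src]), {src: None}
    while frontier:
        node = frontier.popleft()
        if node == dst:                       # reached the sink: walk back
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt, is_mem_edge in neighbors[node]:
            if is_mem_edge and not net_is_mem_to_mem:
                continue                      # restriction: reserved tracks
            if nxt not in came_from:
                came_from[nxt] = node
                frontier.append(nxt)
    return None                               # unroutable under restriction
```

A memory-to-memory net may take the shortcut through the dedicated switch, while an ordinary net must route around it through general-purpose tracks, which is exactly why the unmodified router's greedy use of those switches causes trouble.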

Routing Results Using the Standard Maze Router


Modified Maze Router

Even though some tracks are wasted when a circuit contains few or no memory-to-memory connections, the modification alleviates the problem above.

Area Results


Delay Results

Conclusion

Even with this relatively unaggressive use of the memory-to-memory switches, area is improved somewhat and speed is improved significantly: the enhanced architecture reduces the channel width by 0.5 to 1 tracks and improves speed by 25%.

The development of algorithms that use these tracks more aggressively is left as future work.
