The Memory/Logic Interface in FPGA's with Large Embedded Memory Arrays
Steven J. E. Wilton, Member, IEEE, Jonathan Rose, Member, IEEE, and Zvonko G. Vranesic, Senior Member, IEEE
Laboratory of Reliable Computing, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Reference
S. J. E. Wilton, "Architectures and algorithms for field-programmable gate arrays with embedded memory," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, Ont., Canada, 1997.
Outline
Introduction
Baseline architecture
Experimental methodology and results
Enhanced architecture and its improvements
Introduction
In the past, FPGA's have been used primarily to implement small logic subcircuits.
As the capacities of FPGA's grow, they will be used to implement much larger circuits than ever before.
To address the storage requirements of large systems, FPGA's with large embedded memory arrays are now being developed by many vendors.
Introduction
One of the challenges when embedding memory arrays into an FPGA is providing enough interconnect between the memory arrays and the logic resources.
Benchmark Circuit Generation
Benchmark circuits must be generated for this architecture study, because typical circuits contain only a few memories each, and gathering hundreds of real circuits is not feasible.
The solution is to study the types of memory configurations found in real systems and develop a stochastic memory configuration generator, validated by circuit analysis to ensure the generated circuits are realistic.
Interconnect patterns:
Point-to-point patterns
Shared-connection patterns
Point-to-point patterns with no shuffling
Memory Configurations
171 circuits containing a total of 268 user memories were collected from:
Recent conference proceedings
Recent journal articles
Local designers
A customer study conducted by Altera
Memory Configurations
Logic Memory Clustering
Interconnect Patterns
Stochastic Circuit Generation
A stochastic circuit generator was developed using the statistics gathered during circuit analysis.
The steps for generating a benchmark circuit:
Choose a logical memory configuration
Divide the logical memories into clusters
Choose an interconnect pattern for each cluster
Choose the number of data-in/data-out subcircuits for the clusters
Generate the logic subcircuits and connect them to the memory arrays
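The generation steps above can be sketched as follows. This is a minimal illustration, not the paper's generator: the distributions, widths, depths, and cluster sizes below are made-up placeholders standing in for the measured statistics from the 171-circuit study.

```python
import random

# Illustrative interconnect patterns from the circuit analysis.
PATTERNS = ["point-to-point", "shared-connection", "point-to-point-no-shuffle"]

def generate_benchmark(rng=random):
    # 1. Choose a logical memory configuration (count, widths, depths).
    n_mems = rng.randint(1, 8)
    memories = [{"width": rng.choice([8, 16, 32]),
                 "depth": rng.choice([256, 512, 1024])}
                for _ in range(n_mems)]
    # 2. Divide the logical memories into clusters.
    clusters, i = [], 0
    while i < n_mems:
        size = min(rng.randint(1, 3), n_mems - i)
        clusters.append(memories[i:i + size])
        i += size
    # 3. Choose an interconnect pattern for each cluster, and
    # 4. the number of data-in/data-out logic subcircuits.
    circuit = []
    for cluster in clusters:
        circuit.append({
            "memories": cluster,
            "pattern": rng.choice(PATTERNS),
            "din_subckts": rng.randint(1, len(cluster)),
            "dout_subckts": rng.randint(1, len(cluster)),
        })
    # 5. Logic subcircuit generation and netlist stitching would follow here.
    return circuit
```

With real statistics plugged into each choice, the same skeleton yields arbitrarily many realistic benchmark circuits.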
Implementation Tool
Each generated benchmark circuit is "implemented" in each FPGA:
Logical-to-physical mapping
Placement: memory and logic blocks are placed simultaneously
Routing:
Initially, nets to memory blocks have higher priority
Between iterations the nets are reordered; this is repeated 10 times
W is then increased, and the minimum value of W is determined
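The minimum-channel-width search above can be sketched as follows. The `route_net` callback is a hypothetical stand-in for the tool's maze router; the real flow would also rip up and re-route nets rather than route each independently.

```python
def min_channel_width(nets, route_net, w_start=1, w_max=64, iters=10):
    """Return the smallest channel width W at which all nets route, or None.

    nets: list of dicts with a "to_memory" flag.
    route_net(net, W): hypothetical router call, True if the net routes.
    """
    W = w_start
    while W <= w_max:
        # Initial ordering: nets to memory blocks get higher priority.
        order = sorted(nets, key=lambda n: not n["to_memory"])
        for _ in range(iters):
            failed = [n for n in order if not route_net(n, W)]
            if not failed:
                return W  # every net routed at this W
            # Reorder between iterations: failed nets are tried first.
            order = failed + [n for n in order if n not in failed]
        W += 1  # increase W and try again
    return None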
Memory/Logic Flexibility Results
Area Results
The area of the FPGA is the sum of:
Logic blocks
Memory blocks
Routing resources:
Programmable switches
Programming bits
Metal routing segments
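The area breakdown above amounts to a simple sum over resource counts. The sketch below makes that explicit; the per-component costs are illustrative placeholders, not the paper's transistor-equivalent area model.

```python
def fpga_area(n_logic, n_memory, n_switches, n_config_bits, n_segments,
              a_logic=100.0, a_memory=2000.0, a_switch=6.0,
              a_bit=1.0, a_segment=0.5):
    """Total FPGA area = logic blocks + memory blocks + routing resources,
    where routing = programmable switches + programming bits + metal segments.
    All a_* unit costs are made-up placeholders for illustration."""
    routing = (n_switches * a_switch
               + n_config_bits * a_bit
               + n_segments * a_segment)
    return n_logic * a_logic + n_memory * a_memory + routing
```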
Area Results
Delay Results
A delay model is used to measure the memory read time of all memories in the circuit:
CACTI estimates the array access time
The Elmore delay model covers the address-in and data-out paths
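For the address-in and data-out wiring, the Elmore delay of an RC chain is the sum, over each capacitor, of that capacitance times the resistance between it and the driver. A minimal sketch (the RC values in the example are illustrative, not the paper's technology parameters):

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder from driver to sink.

    stages: list of (R, C) pairs in order along the path.
    Each capacitor C sees the total resistance upstream of its node.
    """
    delay, r_upstream = 0.0, 0.0
    for r, c in stages:
        r_upstream += r          # resistance between the driver and this node
        delay += r_upstream * c  # this node's contribution to the delay
    return delay

# Example: two identical wire segments.
# elmore_delay([(100.0, 1e-15), (100.0, 1e-15)])
```

The total read time is then the CACTI array access time plus the Elmore delays of the address and data paths.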
Delay Results
Issues
Some nets connect more than one memory block to one or more logic blocks:
When combining small memory arrays to implement a large one
When the data-in pins of several user memories are driven by a common data bus
Such nets appear often, but unfortunately they are hard to route, especially in larger architectures.
Should we simply use a higher value of Fm for larger architectures, or is there a better option?
Further Investigation
Enhanced Architecture
The above motivates a closer study of memory-to-memory connections.
An enhanced architecture:
Extra switches are added between memory arrays to support these nets
Results:
The extra switches take up negligible area
Both speed and routability improve
Enhanced Architecture
Baseline Architecture
Enhanced Architecture
Evaluation of the Enhanced Architecture
The maze routing algorithm must be restricted so that it uses the memory-to-memory switches only to implement memory-to-memory connections.
If the maze router is not modified…
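The restriction above can be sketched as a breadth-first maze router over a routing graph whose edges may be tagged as memory-to-memory switches. The graph representation here is a hypothetical adjacency dict, not the tool's actual data structure.

```python
from collections import deque

def maze_route(graph, src, dst, net_is_mem_to_mem):
    """Shortest path from src to dst by BFS.

    graph: {node: [(neighbor, is_mem_switch), ...]} (illustrative format).
    Memory-to-memory switch edges are usable only when the net itself
    connects memory blocks to memory blocks.
    """
    prev = {src: None}
    q = deque([src])
    while q:
        node = q.popleft()
        if node == dst:
            path = []            # reconstruct the path back to the source
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr, is_mem_switch in graph.get(node, []):
            if is_mem_switch and not net_is_mem_to_mem:
                continue          # the restriction: skip these switches
            if nbr not in prev:
                prev[nbr] = node
                q.append(nbr)
    return None                   # unroutable under the restriction
```

An unmodified router would happily expand through the memory-to-memory switch edges for every net, consuming them before the memory-to-memory nets are routed.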
Routing Results Using the Standard Maze Router
Modified Maze Router
Even though some tracks will be wasted if a circuit contains no or few memory-to-memory connections, the modification alleviates the problem above.
Area Results
Delay Results
Conclusion
Even with this relatively unaggressive use of the memory-to-memory switches, area improves somewhat and speed improves significantly.
The enhanced architecture reduces the channel width by 0.5 to 1 tracks and improves speed by about 25%.
The development of algorithms that use these tracks more aggressively is left as future work.