Data-Flow Analysis in the Memory Management of Real-Time Multimedia Processing Systems
Florin Balasa
University of Illinois at Chicago
Introduction
Real-time multimedia processing systems (video and image processing, real-time 3D rendering, audio and speech coding, medical imaging, etc.)
A large part of the power dissipation is due to data transfer and data storage
Fetching operands from an off-chip memory for an addition consumes 33 times more power than the computation itself [Catthoor 98]
Area cost is often largely dominated by memories
Introduction
In the early years of high-level synthesis, memory management tasks were tackled at the scalar level
Algebraic techniques -- similar to those used in modern compilers -- allow handling memory management at the non-scalar level
Requirement: addressing the entire class of affine specifications
multidimensional signals with (complex) affine indexes
loop nests whose boundaries are affine functions of the iterators
conditions: relational and/or logical operators applied to affine functions
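To make the "affine specification" notion concrete, here is a minimal sketch (the matrix/vector encoding and function name are illustrative, not the actual framework's API): an affine array reference maps the iterator vector to an index vector via index = A·iters + b. The reference A[2i+3j+1][5i+j+2][4i+6j+3] used on a later slide encodes as:

```python
# Affine reference A[2i+3j+1][5i+j+2][4i+6j+3] as index = A @ (i, j) + b.
# A and b below are the coefficient matrix and offset vector of that map.
A = [[2, 3],
     [5, 1],
     [4, 6]]
b = [1, 2, 3]

def affine_index(iters):
    """Evaluate the affine index map at one iterator point (i, j)."""
    return [sum(a * x for a, x in zip(row, iters)) + c
            for row, c in zip(A, b)]

print(affine_index([1, 0]))  # [3, 7, 7]
```

Loop bounds and conditions fit the same mold: each is an affine inequality over the iterators, which is what makes polyhedral (lattice-based) analysis applicable.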
Outline
Memory size computation using data-flow analysis
Hierarchical memory allocation based on data reuse analysis
Data-flow driven data partitioning for on/off-chip memories
Conclusions
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      … B[i+k][j+k] …

How many memory locations are necessary to store the array references A[2i+3j+1][5i+j+2][4i+6j+3] and B[i+k][j+k]?
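For B[i+k][j+k] the question can be answered by counting the distinct index tuples the loop nest touches. A brute-force sketch (not the slides' polyhedral method, which counts lattice points symbolically): instead of walking the 512^3 iteration space, walk the candidate index space and check whether some legal k produces each tuple.

```python
# Count distinct index tuples (i+k, j+k) touched by B[i+k][j+k]
# over the loop nest 0 <= i, j, k <= 511.
N = 511
count = 0
for a in range(2 * N + 1):          # a = i + k ranges over 0..1022
    for b in range(2 * N + 1):      # b = j + k ranges over 0..1022
        # (a, b) is reachable iff some k satisfies
        # 0 <= k <= N, 0 <= a - k <= N, 0 <= b - k <= N
        lo = max(0, a - N, b - N)
        hi = min(N, a, b)
        if lo <= hi:
            count += 1
print(count)  # 784897
```

So storing the reference B[i+k][j+k] needs 784,897 locations, far fewer than the 1023 x 1023 bounding box of the index space; the polyhedral machinery obtains such counts without enumeration.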
[Figure: memory size variation during the motion detection algorithm]
Memory size computation
To handle high-throughput applications:
Extract the (largely hidden) parallelism from the initially specified code
Find the lowest degree of parallelism that meets the throughput/hardware requirements
Perform memory size computation for code with explicit parallelism instructions
Hierarchical memory allocation
A large part of the power dissipation in data-dominated applications is due to data transfers and data storage
Power cost reduction: a memory hierarchy exploiting the temporal locality of the data accesses
Power dissipation = f(memory size, access frequency)
Hierarchical memory allocation
Power dissipation = f(memory size, access frequency)
Lower power consumption by accessing data from smaller memories
Higher power consumption due to additional transfers
Larger area to store copies of data
Additional area overhead (addressing logic)
Hierarchical memory allocation
Synthesis of a multilevel memory architecture, optimized for area and/or power, subject to performance constraints
1. Data reuse exploration
Which intermediate copies of data are necessary for accessing data in a power- and area-efficient way?
Array partitions to be considered as copy candidates: the LBLs (linearly bounded lattices) resulting from the recursive intersection of the array references
Cost = α · Σ P_read/write(N_bits, N_words, f_read/write) + β · Σ Area(N_bits, N_words, N_ports, technology)
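The cost function above can be sketched in a few lines. This is a hedged toy model: the weights alpha/beta, the layer descriptions, and the power/area formulas below are placeholders standing in for technology-calibrated models, not the actual library models.

```python
def power(n_bits, n_words, f_access):
    # Toy power model: grows with word line length and access frequency.
    return n_bits * (n_words ** 0.5) * f_access

def area(n_bits, n_words, n_ports):
    # Toy area model: proportional to capacity and port count.
    return n_bits * n_words * n_ports

def memory_cost(layers, alpha=1.0, beta=1e-6):
    """Weighted power + area cost summed over all memory layers."""
    return (alpha * sum(power(l["bits"], l["words"], l["freq"]) for l in layers)
          + beta * sum(area(l["bits"], l["words"], l["ports"]) for l in layers))

layers = [
    {"bits": 16, "words": 256,   "freq": 1e6, "ports": 1},  # small, hot copy layer
    {"bits": 16, "words": 65536, "freq": 1e4, "ports": 1},  # large background memory
]
print(memory_cost(layers))
```

Data reuse exploration then amounts to searching over candidate copy hierarchies (which LBLs get a layer, and of what size) for the assignment minimizing this cost under the performance constraints.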
Partitioning for on/off- chip memories
[Figure: memory architecture — the CPU accesses an on-chip cache and an on-chip SRAM in 1 cycle; the off-chip DRAM behind the cache costs 10-20 cycles; the SRAM and the cached DRAM share the memory address space]
Optimal data mapping to the SRAM/DRAM to maximize the performance of the application
Partitioning for on/off- chip memories
Total conflict factor: the total number of array accesses exposed to cache conflicts
It quantifies the importance of mapping an array to the on-chip SRAM
Using the polyhedral data-dependence graph: precise information about the relative lifetimes of the different parts of the arrays
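A minimal illustration of why part-of-array lifetime information matters (the explicit Python sets below stand in for the symbolic polyhedra of the actual framework): the dependence between a producer and a consumer loop is the exact intersection of their index sets, which tells precisely which array elements must stay live in between.

```python
# Producer loop writes A[i][j] for 0 <= i < 8, 0 <= j < 8.
produced = {(i, j) for i in range(8) for j in range(8)}

# A later consumer loop reads A[i+1][j] for 0 <= i < 6, 0 <= j < 8.
consumed = {(i + 1, j) for i in range(6) for j in range(8)}

# Exact per-element dependence: only these elements must remain in
# memory between the two loops (rows 1..6, i.e. 48 of the 64 elements).
live_between = produced & consumed
print(len(live_between))  # 48
```

A scalar-level analysis would have to keep the whole 64-element array alive; the polyhedral view shrinks the requirement to the 48 elements actually carried by the dependence, and the same intersection machinery drives the SRAM/DRAM partitioning decision.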
Conclusions
Algebraic techniques are powerful non-scalar instruments in the memory management of multimedia signal processing
Data-dependence analysis at the polyhedral level is useful in many memory management tasks
memory size computation for behavioral specifications