Scratchpad-oriented address generation for low-power embedded VLIW processors Guillermo Talavera Velilla Departament de Microelectrònica i Sistemes Electrònics Universitat Autònoma de Barcelona Thesis supervisor: Jordi Carrabina Ph.D. Defense Presentation October 15th, 2009
Embedded Processors Memories Optimization AGUs Optimization Conclusions ? 2/50
Let’s talk about…
… conclusions.
Conclusions
Thesis contributions (1)
• Address generation unit template for the embedded multimedia domain
– Improvements between 12% and 35% on several benchmarks and applications (cycles and energy)
– Improvements on a real application (MPEG4) of 51% on energy consumption (with respect to the previous optimization step)
– Global improvements over 90% applying a complete optimization methodology
Thesis contributions (2)
• Quantitative comparison of different platforms commonly used in the embedded domain
• Systematic classification of address generators
• Review of literature on address generation optimization according to the classification
• Introduction of the reconfigurable AGU framework results into the COFFEE framework
• Application of a complete methodology to optimize energy consumption on a real data-flow application, including address generation steps
Open issues:
• Support for more loops and control
• Bit calculation
• Merging of index expressions
• Extension to other benchmarks and applications
• Heterogeneous distributed AGUs
• Distributed loop buffers with different speeds
• Complete DTSE optimization
End of presentation and open discussion
Publications
Journal papers:
• G. Talavera, M. Jayapala, J. Carrabina, and F. Catthoor, “Address generation optimization for embedded high-performance processors: A survey”, Journal of Signal Processing Systems for Signal Image and Video Technology (formerly the Journal of VLSI Signal Processing Systems for Signal Image and Video Technology), May 2008 (online), December 2008 (printed version).
• G. Talavera, A. Portero, P. Raghavan, M. Jayapala, J. Carrabina, and F. Catthoor, “Power exploration and address generation optimization of multimedia applications on VLIW processors”, Planned for re-submission to the IEEE Transactions on Image Processing.
• A. Portero, G. Talavera, J. Carrabina, and F. Catthoor, “Methodology for multimedia applications in multiplatform implementation for energy-flexibility space exploration”, Planned for re-submission to the IEEE Transactions on Computers.
• A. Portero, G. Talavera, J. Carrabina, and F. Catthoor, “Data-dominant application implementation in multi-platform for energy-flexibility space exploration”, Planned for re-submission to the IEEE Transactions on Image Processing.
Conference papers
• A. Lambrechts, T. Vander Aa, M. Jayapala, A. Leroy, G. Talavera, A. Shickova, F. Barat, F. Catthoor, D. Verkest, G. Deconinck, H. Corporaal, F. Robert, and J. C. Bordoll, “Design style case study for compute nodes of a heterogeneous NoC platform”, in 25th IEEE Real-Time Systems Symposium (RTSS), December 2004.
• G. Talavera, V. Nollet, J.-Y. Mignolet, D. Verkest, S. Vernalde, R. Lauwereins, and J. Carrabina, “Hardware-Software debugging techniques for reconfigurable Systems-on-Chip”, in IEEE International Conference on Industrial Technology (IEEE ICIT '04), vol. 3, Dec. 2004, pp. 1402-1407.
• G. Talavera, V. Nollet, J.-Y. Mignolet, D. Verkest, S. Vernalde, R. Lauwereins, and J. Carrabina, “Métodos de depuración HW-SW para sistemas on chip reconfigurables”, in Jornadas Sobre Computación Reconfigurable y Aplicaciones (JCRA), Barcelona, Spain, September 2004, pp. 251-258.
• A. Lambrechts, P. Raghavan, A. Leroy, G. Talavera, T. Vander Aa, M. Jayapala, F. Catthoor, D. Verkest, G. Deconinck, H. Corporaal, F. Robert, and J. Carrabina, “Power breakdown analysis for a heterogeneous NoC platform running a video application”, in 16th IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP), July 2005, pp. 179-184.
• A. Portero, G. Talavera, M. Monton, B. Martinez, and J. Carrabina, “NoC system for MPEG-4 SP using heterogeneous tiles”, in Design of Circuits and Integrated Systems (DCIS), San Diego, California, USA, December 2006.
• A. Portero, G. Talavera, M. Monton, B. Martinez, M. Moreno, F. Catthoor, and J. Carrabina, “Energy-aware MPEG-4 single profile in HW-SW multiplatform implementation”, in IEEE International SOC Conference, Austin, Texas, USA, Sept. 2006, pp. 13-16.
• A. Portero, G. Talavera, M. Monton, B. Martinez, F. Catthoor, and J. Carrabina, “Dynamic voltage scaling for power efficient MPEG4-SP implementation”, in Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP). Washington, DC, USA: IEEE Computer Society, 2006, pp. 257-260.
• A. Portero, G. Talavera, F. Catthoor, and J. Carrabina, “A study of a MPEG-4 codec in a multiprocessor platform”, in IEEE International Symposium on Industrial Electronics (ISIE), 2006, vol. 1, July 2006, pp. 661-666.
Teaching publications
• G. Talavera, J. Saiz, and J. Carrabina, “Dispositivos y plataformas para docencia de informática y electrónica”, in Jornadas Sobre Computación Reconfigurable y Aplicaciones (JCRA), Barcelona, Spain, September 2004, pp. 711-717.
• G. Talavera, B. Lorente, M. Monton, B. Martinez, J. Oliver, C. Ferrer, L. Ribas, J. Aguilo, and E. Valderrama, “Nuevas metodologías docentes y autoaprendizaje en la enseñanza técnica universitaria”, in Congreso Internacional de Docencia Universitaria e Innovación (CIDUI), Barcelona, Spain, 2006.
• B. Lorente, G. Talavera, L. Ribas, and E. Valderrama, “Implantació d'una nova metodologia docent a les pràctiques de fonaments de computadors d'enginyeria informàtica”, in Congreso Internacional de Docencia Universitaria e Innovación (CIDUI), Barcelona, Spain, 2006.
• G. Talavera, X. Fitó, B. Lorente, A. Portero, M. Montón, B. Martínez, J. Oliver, C. Ferrer, L. Ribas, J. Aguiló, and E. Valderrama, “Adaptación metodológica a las nuevas directrices del EEES en la enseñanza técnica universitaria”, in Tecnologías Aplicadas a la Enseñanza de la Electrónica (TAEE), Madrid, Spain. 2006.
• A. Portero, J. Saiz, G. Talavera, R. Aragonés, M. Rullán, J. Aguiló, and E. Valderrama, “Aplicación del plan piloto en sistemas digitales en ingeniería informática siguiendo las directivas del EEES”, in Tecnologías Aplicadas a la Enseñanza de la Electrónica (TAEE), Madrid, Spain, 2006.
• G. Talavera, F. X. Fitó, B. Lorente, M. Montón, B. Martínez, C. Ferrer, and E. Valderrama, “Cas pràctic d'adaptació metodològica a les directrius EEES d'una assignatura d'enginyeria informàtica”, in III Jornada de Campus d'Innovació Docent, UAB, Barcelona, Spain, 20 September 2006.
• E. Valderrama, G. Talavera, M. Montón, B. Martínez, J. M. Fernández, and J. Muñoz, “Comparación de dos metodologías docentes utilizadas en los seminarios de fundamentos de computadores”, in XIV Jornadas de Enseñanza Universitaria de la Informática (JENUI), 2008.
Results: Energy
More than 90% improvement with respect to the first straightforward implementation
Reconfigurable AGU template
AGUs
The loop buffer operation: An Illustration
Source loop:

for (..) {
  …
  if (..) {…}
  else {…}
  …
}

VLIW schedule (slots: FU1, FU2, FU3, BR):

    LBON <offset>
S:  OP11  OP21  OP31  NOP
    NOP   OP22  OP32  BNZ ‘x’
    OP12  NOP   NOP   BR ‘y’     (if block)
X:  OP13  NOP   OP33  NOP        (else block)
Y:  OP14  OP23  NOP   BNZ ‘s’
The loop buffer operation: An Illustration
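IROC: START_ADDR, END_ADDR, IR_USE, NEW_PC, PC

Local loop buffer contents:
FU1: OP11, OP12, OP13, OP14
FU2: OP21, OP22, OP23
FU3: OP31, OP32, OP33
BR:  BNZ ‘x’, BR ‘y’, BNZ ‘s’

Controller table, one entry per schedule row and slot, written as NEW_PC/IR_USE (“-” means no local address):

    Schedule row (after LBON <offset>)          FU1   FU2   FU3   BR
S:  OP11  OP21  OP31  NOP                       0/1   0/1   0/1   -/0
    NOP   OP22  OP32  BNZ ‘x’                   -/0   1/1   1/1   0/1
    OP12  NOP   NOP   BR ‘y’     (if block)     1/1   -/0   -/0   1/1
X:  OP13  NOP   OP33  NOP        (else block)   2/1   -/0   2/1   -/0
Y:  OP14  OP23  NOP   BNZ ‘s’                   3/1   2/1   -/0   2/1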
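The illustration above can be replayed in software. The following is a minimal Python sketch, not the controller presented in the thesis: it compacts each slot of the schedule into its own local buffer (dropping NOPs), records a NEW_PC/IR_USE pair per schedule row, and then walks one loop iteration. The function names (build_tables, run_iteration), the placement of the labels s, x and y, the reading of BNZ ‘x’ as "jump to the else block when taken", and the absence of branch delay slots are all assumptions made for illustration.

```python
# Hypothetical sketch of the loop buffer illustration; all names are illustrative.
NOP = "NOP"
SLOTS = ["FU1", "FU2", "FU3", "BR"]

# VLIW schedule from the slide: one row per instruction word, one column per slot.
SCHEDULE = [
    ["OP11", "OP21", "OP31", NOP],
    [NOP, "OP22", "OP32", "BNZ x"],
    ["OP12", NOP, NOP, "BR y"],       # if block
    ["OP13", NOP, "OP33", NOP],       # else block
    ["OP14", "OP23", NOP, "BNZ s"],
]
LABELS = {"s": 0, "x": 3, "y": 4}     # assumed label placement (row indices)


def build_tables(schedule):
    """Compact each slot: keep only non-NOP operations in a local buffer and
    record, for every schedule row, the pair (NEW_PC, IR_USE)."""
    buffers = {slot: [] for slot in SLOTS}
    tables = {slot: [] for slot in SLOTS}
    for row in schedule:
        for slot, op in zip(SLOTS, row):
            if op == NOP:
                tables[slot].append((None, 0))   # slot idle in this row
            else:
                tables[slot].append((len(buffers[slot]), 1))
                buffers[slot].append(op)
    return buffers, tables


def run_iteration(buffers, tables, take_else):
    """Walk one loop iteration and return the operations each slot fetched.
    Assumption: BNZ x is taken exactly when the else block must run."""
    fetched = {slot: [] for slot in SLOTS}
    row = LABELS["s"]
    while True:
        for slot in SLOTS:
            new_pc, ir_use = tables[slot][row]
            if ir_use:                            # idle slots skip the fetch
                fetched[slot].append(buffers[slot][new_pc])
        new_pc, ir_use = tables["BR"][row]
        branch = buffers["BR"][new_pc] if ir_use else NOP
        if branch == "BNZ s":                     # end of the loop body
            return fetched
        if branch == "BR y":
            row = LABELS["y"]
        elif branch == "BNZ x" and take_else:
            row = LABELS["x"]
        else:
            row += 1


if __name__ == "__main__":
    buffers, tables = build_tables(SCHEDULE)
    for slot in SLOTS:
        print(slot, buffers[slot], tables[slot])
    print(run_iteration(buffers, tables, take_else=False))
```

Under these assumptions the generated FU1 table is (0,1), (-,0), (1,1), (2,1), (3,1), which matches the digit sequence shown for FU1 on the slide. Slots whose IR_USE flag is 0 perform no fetch in that row, which is where a loop buffer organized this way would avoid activity for NOPs.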