Embedded Systems Group (http://www.cse.iitd.ac.in/esproject) Department of Computer Science & Engineering Indian Institute of Technology Delhi June 11, 2002 Ph.D. Research Plan Presentation Anup Gangwar

Embedded Systems Group (http://www.cse.iitd.ac.in/esproject)

Department of Computer Science & Engineering

Indian Institute of Technology Delhi

June 11, 2002

Ph.D. Research Plan Presentation

Anup Gangwar

Presentation Outline

Introduction and motivation

Specialization opportunities in VLIW processors

Methodology

Validation framework (supporting tools required)

Work plan

Status of work

References

Introduction

Why customize architectures?
    General-purpose computing domain vs. embedded domain
    Customization leads to cheaper design solutions

Architectural choices for exploiting ILP

Superscalar processors
    Try to extract ILP at run time, so the hardware is complex
    Limited clock speeds and high power dissipation
    Not suited to embedded applications

VLIW processors
    The compiler has detailed knowledge of the hardware
    The compiler extracts ILP statically, so the hardware is simpler
    Higher clock speeds are attainable
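
The idea that a VLIW compiler extracts ILP statically can be illustrated with a minimal sketch of greedy list scheduling. This is not the scheduler used in this work: the operation names, the dependence graph, the single-cycle latencies, and the issue width of 2 are all hypothetical, and a real compiler would also model FU types and multi-cycle latencies.

```python
def schedule(ops, deps, width):
    """Greedily pack operations into fixed-width VLIW instruction words.

    ops   : operation names in program order
    deps  : dict mapping an op to the set of ops it depends on
    width : issue slots per word (hypothetical machine parameter)
    Every op is assumed to complete in one cycle.
    """
    done = set()            # ops whose results are already available
    remaining = list(ops)
    words = []
    while remaining:
        # An op is ready once all its predecessors issued in earlier cycles.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependence")
        issued = ready[:width]
        # Slots the compiler cannot fill become explicit NOPs.
        words.append(issued + ["NOP"] * (width - len(issued)))
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return words


# Hypothetical dependence graph: c needs a and b, d needs c.
ops = ["a", "b", "c", "d"]
deps = {"c": {"a", "b"}, "d": {"c"}}
words = schedule(ops, deps, width=2)
for cycle, word in enumerate(words):
    print(cycle, word)
```

Only the first word is fully used in this example; the dependent chain c, d leaves one slot empty in each later word, which the schedule has to fill with a NOP.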

Introduction - Problems with VLIW Processors

Complex compiler required for extracting ILP

Adequate hardware support needed for compiler controlled execution

Code size expansion due to explicit NOPs if
    The application does not contain enough parallelism
    The compiler is not able to extract parallelism from the application

Need for good instruction encoding and NOP compression schemes

Specialization Opportunities -> FUs

Specialization Opportunities -> FUs (contd...)

Functional Unit Types
    MISO: Multiple Input Single Output
    MIMO: Multiple Input Multiple Output
    MIMO with LD/ST: MIMOs with memory interaction
    Rigid or flexible I/O timeshapes

Name              Inputs and sources           Outputs and destinations     I/O policy
MISO              Multiple (regfile)           Single (regfile)             Flexible or rigid
MIMO              Multiple (regfile)           Multiple (regfile)           Flexible or rigid
MIMO with LD/ST   Multiple (regfile or mem.)   Multiple (regfile or mem.)   Flexible or rigid for registers; block LD/ST for memory

Specialization Opportunities -> Reg. File

Single register file organization doesn’t scale well
    Area grows as N^3
    Delay grows as N^(3/2)
    Power grows as N^3
    where N is the number of functional units connected to the register file

Clustered VLIW architectures are the solution
    Each FU can read from/write to only a subset of registers
    Data copying may increase execution latency
    Powerful application analysis is required to overcome the above-mentioned problems
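
The growth rates above can be turned into a back-of-the-envelope comparison. The sketch below makes two assumptions that are not from the slides: the FUs are split evenly across clusters, and each cluster's register file obeys the same growth law with N/c FUs attached. Inter-cluster interconnect and copy overhead are ignored, so the figures overstate the benefit of clustering.

```python
def relative_cost(n_fus, n_clusters=1):
    """Relative register-file cost in arbitrary units.

    Applies area ~ N^3, delay ~ N^(3/2), power ~ N^3 per register file,
    with N = n_fus / n_clusters (even split, an assumption).
    """
    per_cluster = n_fus / n_clusters
    return {
        "area": n_clusters * per_cluster ** 3,    # summed over all clusters
        "delay": per_cluster ** 1.5,              # one register file access
        "power": n_clusters * per_cluster ** 3,   # summed over all clusters
    }


mono = relative_cost(8)                      # one register file, 8 FUs
clustered = relative_cost(8, n_clusters=4)   # 4 clusters of 2 FUs each
print("area ratio :", mono["area"] / clustered["area"])
print("delay ratio:", mono["delay"] / clustered["delay"])
```

Under these assumptions the total area and power shrink by a factor of c^2 and the access delay by c^(3/2), where c is the number of clusters.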

Specialization Opportunities -> Reg. File (contd...)

A Clustered VLIW Architecture

Specialization Opportunities -> Interconnect

Clustering FUs together requires deciding the interconnection network (ICN)
    between different clusters
    between clusters and memory

Analysis of data access patterns required for evaluating cost-performance tradeoffs

Current ASIP vendors do not offer customizable interconnects
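
One simple form of data access pattern analysis is counting how many values cross cluster boundaries for a given assignment of operations to clusters, since each such value needs a copy over the ICN. The sketch below is only an illustration: the dataflow edges, the operation names, and the cluster assignment are hypothetical.

```python
from collections import Counter


def intercluster_traffic(edges, cluster_of):
    """Count producer-to-consumer transfers that cross clusters.

    edges      : iterable of (producer, consumer) operation names
    cluster_of : dict mapping each operation to its cluster id
    Returns a Counter keyed by (source cluster, destination cluster).
    """
    traffic = Counter()
    for producer, consumer in edges:
        src, dst = cluster_of[producer], cluster_of[consumer]
        if src != dst:                 # same-cluster values need no copy
            traffic[(src, dst)] += 1
    return traffic


# Hypothetical dataflow graph and two-cluster assignment.
edges = [("ld1", "mul1"), ("ld2", "mul1"), ("mul1", "add1"), ("add1", "st1")]
cluster_of = {"ld1": 0, "ld2": 1, "mul1": 0, "add1": 1, "st1": 1}
traffic = intercluster_traffic(edges, cluster_of)
print(dict(traffic))
```

Per-pair counts like these indicate which cluster pairs would benefit from a direct link and which could share a cheaper path.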

Specialization Opportunities -> Encoding

Instruction encoding/decoding scheme affects
    Code size
    Object code compatibility
    Branch misprediction penalty
    Hardware cost
    Address specification in code size

Each UniOp is equivalent to a RISC/CISC instruction

[Figure: a MultiOp composed of four UniOps]

Specialization Opportunities -> Encoding (contd...)

[Figure: NOPs in a MultiOp]
    Slot        IALU.0   IALU.1   FALU.0   BU.0
    Operation   ADD      NOP      FMUL     NOP

[Figure: VLIW processor pipeline with instruction decompressor]
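
The MultiOp above wastes two of its four slots on NOPs. One common family of NOP compression schemes, assumed here for illustration and not necessarily the scheme studied in this work, stores a presence mask with one bit per slot followed by only the non-NOP operations; the decompressor re-inserts the NOPs before decode. Function names and the mask layout (bit i for slot i) are hypothetical.

```python
NOP = "NOP"


def compress(multiop):
    """Return (presence mask, non-NOP ops); bit i is set if slot i is used."""
    mask = 0
    ops = []
    for slot, op in enumerate(multiop):
        if op != NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def decompress(mask, ops, width):
    """Rebuild the full-width MultiOp by re-inserting NOPs."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << slot) else NOP
            for slot in range(width)]


multiop = ["ADD", "NOP", "FMUL", "NOP"]   # slots IALU.0, IALU.1, FALU.0, BU.0
mask, ops = compress(multiop)
restored = decompress(mask, ops, width=len(multiop))
print(bin(mask), ops, restored)
```

The stored form holds two operations plus a 4-bit mask instead of four operations, at the cost of a decompression stage in the fetch path.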

Specialization Opportunities -> Summary

Existing Methodologies -> Simulation Driven

VLIW ASIP Synthesis Methodology

[Figure: VLIW ASIP synthesis flow, with the following blocks]
    Task Set and Constraints
    Architecture Description
    Application Parameter Extraction
    Architecture Design Space Exploration
    Retargetable Compiler
    Instruction Encoding Specialization
    Validation (simulation with encoded instructions)
    Architecture Description (output to synthesizer)

Validation Framework -> Trimaran

[Figure: Trimaran flow, with the following blocks: C Program, IMPACT, Bridge Code, ELCOR, HMDES Machine Description, ELCOR IR, Simulator Generator, Generated Simulator (Statistics)]

IMPACT
    ANSI C parsing
    Code profiling
    Classical machine-independent optimizations
    Block formation

ELCOR
    Machine-dependent code optimizations
    Code scheduling
    Register allocation

Simulator Generator
    ELCOR IR to low-level C files
    HPL-PD virtual machine
    Cache simulation
    Performance statistics

Generated Simulator (Statistics)
    Compute and stall cycles
    Cache stats
    Spill code info

Validation Framework -> Trimaran (contd...)

[Figure: Trimaran simulator generation, with the following blocks]
    REBEL
    HMDES
    Code Processor
    Low-level C files
    C libraries
    Emulation Library
    Native Compiler
    Executable for the host platform

Validation Framework -> Retargetable Assembler

[Figure: retargetable assembler flow, with the following blocks]
    Instruction Encoding Description
    Toolkit Generator
    Generated Assembler
    Assembly Instructions
    Object Code
    To Simulator (for simulation with encoded instructions)

Work Plan -> Interconnect/RF/FU Specialization

Initially model the interconnect problem as an integer linear program (ILP), and later move to other solution techniques

Code selection problem in compilers is similar to identifying compute intensive parts for AFUs

The number and types of FUs have not been properly explored

RF clustering problem has not been dealt with elsewhere

Jacome et al.
    Deal with interconnect/RF/FU specialization simultaneously
    Operation chaining is not considered

Work Plan -> Encoding/Decoding Specialization

Goal is to be able to generate encoding schemes automatically

Work of Shail Aditya et al.
    Basically a parameterized encoding scheme
    Techniques are specific to the HPL-PD architecture
    Does not address dynamic code size minimization
    Encoding template is fixed; exploration is limited to the template's design space

Various encoding templates need to be explored; the template itself may also be derived from the application

Dynamic code size minimization needs to be considered

Work Status -> Specialized FUs in Trimaran

Modeling MISOs
    Model as external function calls
    Locate the calls in the Trimaran bridge code and replace them with the AFU op
    Model the new AFU in MDES with the required ops
    Introduce the semantics in the simulator op definitions file

Modeling MIMOs
    Model as external function calls returning void
    Locate the calls in the Trimaran bridge code and replace them with the AFU op
    Explicitly reserve registers in the C code for returning values
    Introduce the operation semantics in the simulator op definitions file

Work Status -> Specialized FUs in Trimaran (contd...)

Modeling MIMOs with LD/ST
    Model as regular MIMOs
    Memory interaction with block LD/ST at the beginning and end of execute cycles

Additionally
    Possible to impose register file constraints
    Various I/O timeshapes, rigid or flexible
    Possible to introduce pipelined functional units

Work Status -> Instruction Enc. in Trimaran

Work Status -> Instruction Enc. in Trimaran (contd...)

New Jersey Machine Code Toolkit (NJMC)
    Deals with bits at a symbolic level
    Can be used to write assemblers/disassemblers
    Specification in SLED (Specification Language for Encoding/Decoding)

Model the instruction decompressor in HMDES

Instrument ELCOR to generate assembly code

Encoding is done using procedures generated by NJMC

Problems with NJMC
    VLIW instructions need to be broken up into 32-bit tokens
    Encoded instructions must end on an 8-bit boundary
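
The two restrictions listed above can be illustrated in plain Python, without using NJMC or SLED. The helper below is hypothetical: it takes an instruction word as an integer plus its bit width, pads it to the next byte boundary, and cuts it into tokens of at most 32 bits. Padding with zeros on the low end and emitting the most significant token first are assumptions made for this sketch.

```python
def split_into_tokens(value, width, token_bits=32):
    """Split a wide instruction into (token value, token size in bits) pairs.

    value      : the encoded instruction as a non-negative integer
    width      : number of meaningful bits in value
    token_bits : maximum token size (32 for the restriction described above)
    """
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    padded_width = -(-width // 8) * 8        # round up to a byte boundary
    value <<= padded_width - width           # zero padding on the low end
    tokens = []
    remaining = padded_width
    while remaining > 0:
        size = min(token_bits, remaining)
        remaining -= size
        tokens.append(((value >> remaining) & ((1 << size) - 1), size))
    return tokens


# Hypothetical 37-bit instruction word with all bits set.
tokens = split_into_tokens(0x1FFFFFFFFF, width=37)
for token, size in tokens:
    print(f"{size:2d} bits: {token:#x}")
```

The 37-bit word is padded to 40 bits and emitted as one 32-bit token followed by one 8-bit token, so three bits of every such instruction are spent on padding.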

Work Status -> Code Gen. for Clustered ASIPs

ELCOR
    Disadvantages
        Heavily oriented towards the HPL-PD architecture
        Does not support clustered VLIW architectures
    Advantages
        Strong optimizing compiler
        Rich library to deal with the IR

IMPACT compiler system offers another choice for building a backend

A feasibility study is being carried out to decide the direction of work

References

Bhuvan Middha, Varun Raj, Anup Gangwar, M. Balakrishnan, Anshul Kumar and Paolo Ienne, “A Trimaran based framework for exploring design space of VLIW ASIPs with coarse grain FUs”, ISSS-2002.

Anup Gangwar, M. Balakrishnan and Anshul Kumar, “A framework for studying the effect of VLIW processor instruction encoding and decoding schemes”, Mini Project, Dept. of CSE.

M. Jacome and G. de Veciana, “Design challenges for new application specific processors”, IEEE Design and Test of Computers-2000.

B. Ramakrishna Rau and Michael S. Schlansker, “Embedded computer architecture and automation”, IEEE Computer-2001.

Michael S. Schlansker and B. Ramakrishna Rau, “EPIC: An architecture for instruction-level parallel processors”, HPCA-2000.

N. G. Busa, A. van der Werf and M. Bekooij, “Scheduling coarse grain operations for VLIW processors”, ASPDAC-1998.

Shail Aditya, Scott A. Mahlke and B. Ramakrishna Rau, “Code size minimization and retargetable assembly for custom EPIC and VLIW processors”, ISSS-1999.