
Dec 20, 2015


SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture

Nagendra Doddapaneni


Overview

• Harvard Architecture

• Super Harvard Architecture

• TigerSHARC processor


Outline

• Background

• Harvard Architecture
  − Why?
  − What?

• Modern CPU Chip Design

• Super Harvard Architecture

• TigerSHARC Processor


Background

• von Neumann Architecture
  − Single storage for instructions and data

• Digital Signal Processors
  − Specialized microprocessors designed specifically for digital signal processing, generally in real time


Why Harvard Architecture?

• von Neumann bottleneck (‘memory bound’)

• DSP applications

• In von Neumann architecture, the processor is at any given time doing only one of:
  − Reading an instruction
  − Reading/writing data from/to memory
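The bottleneck can be made concrete with a toy cycle count. This is a minimal sketch, assuming one bus transfer per cycle and ignoring pipeline start-up; the function names and the figures are illustrative only, not measurements of any real processor.

```python
def von_neumann_cycles(instructions, data_accesses_per_instruction):
    # One shared bus: every instruction fetch and every data access
    # needs its own cycle.
    return instructions * (1 + data_accesses_per_instruction)


def harvard_cycles(instructions, data_accesses_per_instruction):
    # Separate instruction and data buses: the next fetch overlaps the
    # current data access, so the slower of the two sets the pace.
    return instructions * max(1, data_accesses_per_instruction)


print(von_neumann_cycles(1000, 1))  # 2000
print(harvard_cycles(1000, 1))      # 1000
```

With one data access per instruction, the separate buses halve the cycle count in this model.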


Harvard Architecture (cont…)


What is Harvard Architecture?

• Physically separate storage and signal pathways for instructions and data

• The next instruction can be fetched while the current instruction executes

• Program memory can be small and wide

• Data memory can be large and narrower


Modern CPU chip design

• Incorporates features from both architectures

• ‘On chip’ cache memory is divided into an instruction cache and a data cache. The Harvard architecture is used when the CPU accesses cache memory.

• On a cache miss, ‘off chip’ main memory is accessed using the von Neumann architecture. Main memory is not separated into data and instruction sections.


Super Harvard Architecture

• Cache used to store instructions, leaving both instruction bus and data bus free to fetch operands

• Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
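The effect of the instruction cache can be sketched with a toy model of a multiply-accumulate loop. The assumptions here are not from the slides: each tap needs one instruction, one coefficient read over the instruction bus and one sample read over the data bus, and the cache holds the whole loop.

```python
def fir_loop_cycles(taps, passes, instruction_cache):
    """Count bus cycles for `passes` runs of a `taps`-long MAC loop."""
    cycles = 0
    cached = set()  # loop instructions already held in the cache
    for _ in range(passes):
        for tap in range(taps):
            if instruction_cache and tap in cached:
                # Instruction comes from the cache, so both buses are
                # free to fetch the two operands in the same cycle.
                cycles += 1
            else:
                # Instruction bus is used twice: once for the
                # instruction, once for the coefficient.
                cycles += 2
                if instruction_cache:
                    cached.add(tap)
    return cycles


print(fir_loop_cycles(8, 100, instruction_cache=False))  # 1600
print(fir_loop_cycles(8, 100, instruction_cache=True))   # 808
```

Only the first pass through the loop pays the two-cycle cost once the instructions are cached.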


TigerSHARC Processor

• Processor Architecture

• Instruction Parallelism and SIMD Operation

• Integer ALU

• Computational blocks
  − X and Y Register File
  − X and Y ALU
  − Multiplier
  − Shifter
  − CLU

• Program Sequencer

• I, J and K buses

• DMA Controller

• Applications


TigerSHARC Processor Architecture

• Three 128-bit data buses

• 2 IALUs

• 2 Computational Blocks
  − ALU (float and integer)
  − Shifter
  − Multiplier
  − CLU


TigerSHARC Instruction Parallelism and SIMD Operation

• The core can simultaneously execute one to four 32-bit instructions encoded in a single instruction line (VLIW).

• Whether instructions can execute in parallel depends on:
  − The instruction line resources each one requires
  − The source and destination registers used

• Supports SIMD operations through the use of both Computational Blocks in parallel.

• Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.


TigerSHARC Integer ALU

• 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers

• Performs integer ALU operations and data addressing

• ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP

• Status flags: zero (Z), negative (N), overflow (V), carry (C)

• Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE

• Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP)

• Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control
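The two addressing options named above, circular buffer and bit reverse, can be sketched in software. This is a generic illustration of the address arithmetic, not the IALU's actual register interface; the function names and parameters are hypothetical.

```python
def circular_next(index, step, base, length):
    """Advance an address by `step`, wrapping inside [base, base + length)."""
    return base + (index - base + step) % length


def bit_reverse(value, bits):
    """Reverse the low `bits` bits of `value` (FFT-style reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


print(hex(circular_next(0x106, 4, base=0x100, length=8)))  # 0x102
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Circular addressing keeps a delay line in place without copying samples, and bit-reversed addressing produces the access order an FFT needs.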


TigerSHARC Computational Blocks

X and Y Register File

• Register File Syntax
  − Each block has 32 x 32-bit data registers
  − Each register can store 4x8-bit, 2x16-bit or 1x32-bit words
  − Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40- or 64-bit words.
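How one 32-bit register can hold 4x8-bit or 2x16-bit words can be sketched as plain bit packing. The lane order (lane 0 in the low bits) is an assumption made for illustration, and the helper names are hypothetical.

```python
def pack(lanes, width):
    """Pack equal-width unsigned lanes into one integer, lane 0 lowest."""
    mask = (1 << width) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (i * width)
    return word


def unpack(word, width, count):
    """Split an integer back into `count` unsigned lanes of `width` bits."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


print(hex(pack([0x11, 0x22, 0x33, 0x44], 8)))  # 0x44332211
print(hex(pack([0xBEEF, 0xDEAD], 16)))         # 0xdeadbeef
```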


Volatile registers in each block

• 24 volatile data registers in each block
  − XR0 – XR23
  − YR0 – YR23

• 2 ALU summation registers in each block
  − XPR0, XPR1, YPR0, YPR1

• 5 MAC accumulate registers in each block
  − XMR0 – XMR3, YMR0 – YMR3
  − XMR4, YMR4: overflow registers


TigerSHARC X and Y ALU

• 2x64-bit input paths

• 2x64-bit output paths

• 8-, 16-, 32- or 64-bit addition/subtraction (fixed-point)

• 32- or 64-bit logical operations (fixed-point)

• 32- or 40-bit floating-point operations


Sample ALU Instruction

• Example of 16 bit addition

• XYSR1:0 = R31:30 + R25:24

• Performs addition in X and Y Compute Blocks
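A 16-bit SIMD addition on a 64-bit register pair can be modelled as four independent lane additions. This sketch assumes wrap-around on lane overflow; whether the hardware wraps or saturates depends on instruction options not covered in the slides. The function name is hypothetical.

```python
def simd_add16(a, b, lanes=4):
    """Add two packed words lane by lane; carries never cross a lane."""
    result = 0
    for i in range(lanes):
        shift = 16 * i
        # The low 16 bits of the sum depend only on the low 16 bits of
        # each shifted operand, so masking isolates this lane.
        lane_sum = ((a >> shift) + (b >> shift)) & 0xFFFF
        result |= lane_sum << shift
    return result


a = 0x0001_FFFF_0010_0002
b = 0x0001_0001_0020_0003
print(hex(simd_add16(a, b)))  # 0x2000000300005
```

In the sample instruction the same operation runs in both compute blocks at once, each on its own copy of the registers.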


TigerSHARC Multiplier

• Operates on fixed-point, floating-point and complex numbers

• Fixed-point numbers
  − 32x32 bit with 32- or 64-bit results
  − 4 (16x16 bit) with 4x16- or 4x32-bit results

• Floating-point numbers
  − 32x32 bit with 32-bit result
  − 40x40 bit with 40-bit result

• Complex numbers
  − 32x32 bit with 32-bit result
  − Fixed-point only

• Results stored in MR register


TigerSHARC Multiplier

XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;;                  // uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;

XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;;              // 40-bit multiply, 32-bit mantissa
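The role of an overflow register during accumulation can be sketched generically. This is not the documented layout of XMR4: the 64-bit accumulator width, the unsigned arithmetic and the function name are all assumptions made for illustration.

```python
def mac(acc, overflow, a, b, acc_bits=64):
    """Accumulate a*b into an `acc_bits`-wide unsigned accumulator.

    Bits that do not fit are returned separately, standing in for the
    overflow register mentioned in the comment above (an assumption).
    """
    total = (overflow << acc_bits) + acc + a * b
    return total & ((1 << acc_bits) - 1), total >> acc_bits


acc, ovf = mac(0, 0, 3, 5)
print(acc, ovf)  # 15 0
acc, ovf = mac(2**64 - 1, 0, 1, 1)
print(acc, ovf)  # 0 1
```

Keeping the spilled bits lets a long run of multiply-accumulates finish before any overflow has to be handled.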


TigerSHARC Shifter

• Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands

• Shifts and rotates bits

• Bit manipulation operations, like bit set, clear, toggle and test

• Bit FIFO operations to support bit streams
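The bit manipulation operations listed above have direct software equivalents. A minimal sketch on 32-bit values follows; the function names are illustrative and are not TigerSHARC mnemonics.

```python
MASK32 = 0xFFFFFFFF


def bit_set(x, n):
    return (x | (1 << n)) & MASK32


def bit_clear(x, n):
    return x & ~(1 << n) & MASK32


def bit_toggle(x, n):
    return (x ^ (1 << n)) & MASK32


def bit_test(x, n):
    return bool((x >> n) & 1)


def rotate_left(x, n):
    """Rotate a 32-bit value left; bits shifted out re-enter on the right."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


print(hex(bit_set(0x0, 4)))              # 0x10
print(hex(rotate_left(0x80000001, 1)))   # 0x3
```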


TigerSHARC CLU

• CLU instructions are designed to support different algorithms used in communications applications

• Algorithms supported are
  − Viterbi decoding (minimum-distance decoding algorithm)
  − Turbo-code decoding (variant of Viterbi decoding)
  − De-spreading for Code Division Multiple Access (CDMA) systems (CDMA spreads a signal across a wide pseudo-noise bandwidth)
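De-spreading, the last algorithm in the list, amounts to correlating received chips with the spreading code. The sketch below shows the idea in software; the 8-chip code is a hypothetical example, and the CLU's actual instructions are not modelled.

```python
def spread(bits, code):
    """Map bits to +/-1 symbols and multiply each by every chip of the code."""
    return [(1 if bit else -1) * chip for bit in bits for chip in code]


def despread(chips, code):
    """Correlate each code-length slice with the code; the sign gives the bit."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        corr = sum(c * k for c, k in zip(chips[start:start + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


code = [1, -1, 1, 1, -1, -1, 1, -1]  # hypothetical pseudo-noise code
tx = spread([1, 0, 1, 1], code)
print(despread(tx, code))  # [1, 0, 1, 1]
```

Because the decision uses the sign of a sum over all chips, a few corrupted chips do not change the recovered bit.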


TigerSHARC Program Sequencer

• Supplies instruction addresses to memory

• The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute

• Extracts an instruction line from the IAB and distributes it to the appropriate core component for execution

• Determines flow control for instructions like JMP, CALL

• Reduces branch delays using branch prediction and the branch target buffer (BTB)
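The idea behind a branch target buffer can be sketched as a small lookup table from branch address to predicted target. This is a generic textbook model, not the TigerSHARC BTB's real organisation or replacement policy; the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.targets = {}  # branch address -> target of the last taken branch

    def predict(self, pc, fallthrough):
        """Return the address to fetch next for the instruction at `pc`."""
        return self.targets.get(pc, fallthrough)

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has been resolved."""
        if taken:
            self.targets[pc] = target
        else:
            self.targets.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.predict(0x40, fallthrough=0x44)))  # 0x44
btb.update(0x40, taken=True, target=0x10)
print(hex(btb.predict(0x40, fallthrough=0x44)))  # 0x10
```

A correct prediction lets the sequencer keep fetching from the right address without waiting for the branch to resolve.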


TigerSHARC architecture at a glance


TigerSHARC Buses

• DRAM divided into 6 blocks of 4 Mbits

• The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection

• The internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec

• Core and I/O can access, per cycle,
  − twelve 32-bit data words
  − four 32-bit instructions
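The bus figures above are consistent with each other, as a short calculation shows. The clock rate below is derived from the stated bandwidth, taking 1 Gbyte as 10^9 bytes; it is not given in the slides.

```python
BUSES = 4
BUS_WIDTH_BITS = 128
WORD_BITS = 32
BANDWIDTH_BYTES_PER_SEC = 32e9  # 32 Gbytes/sec, as stated above

# 16 words per cycle: twelve data words plus four instructions
words_per_cycle = BUSES * BUS_WIDTH_BITS // WORD_BITS
bytes_per_cycle = BUSES * BUS_WIDTH_BITS // 8

# Clock rate implied by the bandwidth figure (derived, not from the slides)
implied_clock_hz = BANDWIDTH_BYTES_PER_SEC / bytes_per_cycle

print(words_per_cycle)         # 16
print(bytes_per_cycle)         # 64
print(implied_clock_hz / 1e6)  # 500.0
```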


TigerSHARC DMA Controller

• On-chip, with 14 DMA channels

• Provides zero-overhead data transfers

• Operates independently and invisibly to the DSP’s core


TigerSHARC Applications


Summary

• What is Harvard Architecture?

• What is Super Harvard Architecture?

• TigerSHARC processor architecture

• How is TigerSHARC ‘faster’ for targeted DSP applications?


Questions?

Thank You.