
Clare Smtih SHARC Presentation1 The SHARC Super Harvard Architecture Computer.

Dec 11, 2015


Kelsie Trillo

The SHARC

Super Harvard Architecture Computer


The SHARC

• Developed by Analog Devices

• Optimized for demanding DSP and imaging applications.

• 32-bit floating point, with 40-bit extended floating-point capabilities.

• Large on-chip memory.

• Ideal for scalable multi-processing applications.


Harvard Architecture

• Program memory can store data.

• Able to simultaneously read or write data at one location and get instructions from another place in memory.

• 2 buses:
  1. Data memory bus.
  2. Program bus.

• Either two separate memories or a single dual-port memory.


Super Harvard Architecture

• Many processors employ Harvard Architecture by having two separate memories or caches integrated into the processor chip.

• The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data. This is what makes it SUPER!!!


DSP

• Digital Signal Processor.

• High speed, low overhead data movement and rapid computations required.

• Usually has a small on-board ROM, RAM and single cycle multiply.

• Designed to run single line, serial in, serial out, signal processing applications very fast.


DSP Computations

• The inner product of two vectors is a common computation for determining energy or correlation.

• The following C code is an example:

    for (n=0; n<length; n++)
        result += x[n] * y[n];

• The processor with the lowest instruction time for this loop will have the best performance.
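
The loop on this slide can be wrapped into a complete function. This is a minimal sketch: the function name `inner_product` and the `float` element type are assumptions, not taken from the slides.

```c
#include <stddef.h>

/* Inner (dot) product of two equal-length vectors.
   Each pass does one multiply and one add, the pattern DSPs are built for. */
float inner_product(const float *x, const float *y, size_t length)
{
    float result = 0.0f;   /* the slide's loop assumes result starts at zero */
    for (size_t n = 0; n < length; n++)
        result += x[n] * y[n];
    return result;
}
```

Passing the same vector for both arguments gives its energy; passing two different vectors gives their correlation at zero lag.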


SHARC DSP

• The SHARC incorporates features aimed at optimizing such loops.

• High-Speed Floating Point Capability

• Extended Floating Point

• These features are DSP specific, meaning that performance may not be as good when they are applied to a non-DSP application.


Floating Point and Extended Floating Point

• The SHARC supports floating-point, extended floating-point and fixed-point (non-floating-point) arithmetic.

• No additional clock cycles for floating point computations.

• Data is automatically truncated and zero-padded when moved between 32-bit memory and internal registers.

• Not accurate enough for scientific algorithms. Excellent signal to noise ratio.
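
The truncate-and-pad behaviour can be modelled in a few lines of C. This is only a sketch under an assumed layout that the slides do not spell out: the 40-bit value sits in the low 40 bits of a `uint64_t`, and its upper 32 bits are the word kept in memory. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout (not given on the slides): a 40-bit extended value is held
   in the low 40 bits of a uint64_t; its upper 32 bits form the 32-bit word. */

/* Register -> 32-bit memory: truncate by dropping the 8 least significant bits. */
uint32_t store_to_memory(uint64_t reg40)
{
    return (uint32_t)((reg40 & 0xFFFFFFFFFFULL) >> 8);
}

/* 32-bit memory -> register: pad the 8 least significant bits with zeros. */
uint64_t load_to_register(uint32_t word32)
{
    return (uint64_t)word32 << 8;
}
```

A value that goes from memory to a register and back is unchanged; a value that starts in a register loses its lowest 8 bits on the way to memory.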


SHARC’s Internal Memory

• Makes SHARC unique.

• Size
  – Allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory.
  – Memory size is significantly larger than most other high-speed computational devices.

• Dual-block, dual-port
  – Optimizes the Harvard Architecture by allowing the fetch of instructions while performing data memory accesses.


Multiply and Accumulate Instructions on the SHARC

• Like most DSPs, the SHARC is able to compute a product and add the product to a running total in a single clock cycle.

• The SHARC’s “super” instruction can multiply and accumulate while also adding, subtracting, or averaging data in two other registers.

• These instructions give the SHARC its 120 megaflop rating.
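
What such an instruction computes can be written out in plain C. This is a rough model, not SHARC code: the struct and function names are hypothetical, and the work is written sequentially even though the slide describes it as happening in one clock cycle. The rating helper assumes, for illustration only, a 40 MHz clock and three floating-point results per cycle; neither figure is stated on the slides.

```c
/* Results of one modelled multifunction cycle (hypothetical struct). */
typedef struct {
    float accumulator;  /* running total after adding a * b */
    float sum;          /* c + d */
    float difference;   /* c - d */
} cycle_result;

/* Multiply-accumulate on (a, b) alongside an add and a subtract on (c, d). */
cycle_result multifunction_cycle(float acc, float a, float b, float c, float d)
{
    cycle_result r;
    r.accumulator = acc + a * b;
    r.sum = c + d;
    r.difference = c - d;
    return r;
}

/* Peak rating in megaflops = clock (MHz) * floating-point results per cycle. */
double peak_megaflops(double clock_mhz, int flops_per_cycle)
{
    return clock_mhz * flops_per_cycle;
}
```

Under those assumed figures, `peak_megaflops(40.0, 3)` reproduces the 120 megaflop rating quoted above.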


Zero Overhead Looping on the SHARC

• A single instruction outside the loop performs loop set-up, informing the SHARC that there is a loop approaching.

• The instruction also includes the iteration count and termination condition.

• This causes the pipeline to remain full during loop execution and also allows the termination condition to be tested in parallel.
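
A simple cycle-count model shows why this matters for short loops. It is an illustration only: the function names are hypothetical, and `overhead_per_iter` (the instructions a conventional loop spends each pass on updating the counter, testing it and branching) is an assumed parameter, not a SHARC figure.

```c
/* Conventional loop: every pass pays for the body plus the loop bookkeeping. */
unsigned long software_loop_cycles(unsigned long iterations,
                                   unsigned long body_instructions,
                                   unsigned long overhead_per_iter)
{
    return iterations * (body_instructions + overhead_per_iter);
}

/* Zero-overhead loop: one set-up instruction, then only the body repeats. */
unsigned long hardware_loop_cycles(unsigned long iterations,
                                   unsigned long body_instructions)
{
    return 1 + iterations * body_instructions;
}
```

With a one-instruction body and an assumed overhead of two instructions per pass, 100 iterations cost 300 cycles in the first model and 101 in the second.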


DAGs on the SHARC

• Data Address Generators are integer computation units that manage the indexing of registers.

• Allows the SHARC to fetch a value and update the index value.

• If the updated value exceeds a limit, the DAG adjusts the index so that it wraps.

• This occurs in the same clock cycle as the read or write.
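
The fetch-then-update-with-wrap behaviour can be sketched in software. The `dag_model` struct and `dag_fetch` function are hypothetical names for illustration; on the SHARC the DAG does this work in hardware, in the same cycle as the access.

```c
#include <stddef.h>

/* Hypothetical software model of one DAG index register. */
typedef struct {
    size_t index;   /* current position within the buffer */
    size_t modify;  /* step added after each access */
    size_t length;  /* buffer length; the index wraps at this limit */
} dag_model;

/* Fetch the value at the current index, then update the index, wrapping
   it when it passes the end of the buffer.
   Assumes index < length and modify <= length. */
float dag_fetch(dag_model *dag, const float *buffer)
{
    float value = buffer[dag->index];
    dag->index += dag->modify;
    if (dag->index >= dag->length)
        dag->index -= dag->length;
    return value;
}
```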


DAG Capabilities

• Circular Buffering
  – Rather than actually moving data in and out of a vector, circular buffers are used.
  – By updating the index modulo the buffer length, the oldest entry can be conveniently replaced by the newest entry.

• Bit Reverse Addressing
  – The bit pattern of a vector index is reversed.
  – Done automatically by the SHARC.
  – Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
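
Both addressing modes are easy to express in C, which also shows the work the DAG saves. This is a sketch with hypothetical function names; the SHARC performs the equivalent steps in hardware.

```c
#include <stddef.h>

/* Circular buffer: overwrite the oldest entry with the newest sample and
   advance the write index modulo the buffer length. Returns the new index. */
size_t circular_put(float *buffer, size_t length, size_t index, float sample)
{
    buffer[index] = sample;
    return (index + 1) % length;
}

/* Reverse the lowest `bits` bits of index, e.g. with 3 bits, 001 becomes 100. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```

For an 8-point FFT the indices use 3 bits, so `bit_reverse(n, 3)` for n = 0..7 gives the reordered sequence 0, 4, 2, 6, 1, 5, 3, 7.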


SHARC DSP

• What makes the SHARC unique?
  – It also has some features not directly related to optimizing numeric computations.
    • Pipelining
    • Handling Branches

• Why has this not emerged sooner?
  – Technology has only recently become available to make it economical to integrate general single computing devices.


SHARC’s Pipeline

• 3 stages:
  1. Instruction Fetch
  2. Decode
  3. Execution

• Takes three clock cycles for an instruction to propagate through the pipeline.

• The processor execution speed is one instruction per clock cycle even though each instruction requires three clock cycles.
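
The two statements above fit together once the cycles are counted. The helper below is a sketch with a hypothetical name; it ignores stalls and branches.

```c
/* Cycles to run n instructions through a pipeline with `stages` stages:
   the first instruction takes `stages` cycles, and each later one
   completes one cycle after its predecessor. */
unsigned long pipeline_cycles(unsigned long n, unsigned long stages)
{
    if (n == 0)
        return 0;
    return stages + (n - 1);
}
```

For the three-stage pipeline, one instruction takes 3 cycles but 100 instructions take 102, which approaches one instruction per clock cycle.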


SHARC’s Handling of Branches: Delayed Branching

• When a branch instruction is encountered, the two instructions that have already been loaded and decoded behind it are executed before the branch takes effect.

• This keeps the pipeline full and avoids junking those two instructions and reloading the pipeline.

• Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions is significant.


SHARC’s Handling of Branches: Non-delayed Branching

• Traditional branching.

• If the code cannot be reordered to use delayed branching, non-delayed branching saves space.

• Uses only one word of storage.

• However, it takes three cycles, because the pipeline gets reloaded.
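
The trade-off between the two branch types can be put into a small cycle model. It is illustrative only: the function names are hypothetical, `body` counts the useful instructions per pass excluding the branch, and the delayed case assumes both instructions after the branch are useful work moved from the body.

```c
/* Non-delayed branch: three cycles per branch while the pipeline reloads. */
unsigned long nondelayed_loop_cycles(unsigned long passes, unsigned long body)
{
    return passes * (body + 3);
}

/* Delayed branch: the branch costs one cycle, and the two instructions
   behind it are part of the body. Assumes body >= 2 so both can be filled. */
unsigned long delayed_loop_cycles(unsigned long passes, unsigned long body)
{
    return passes * (body + 1);
}
```

For a four-instruction body run 10 times, the model gives 70 cycles without delayed branching and 50 with it; the shorter the body, the larger the relative saving.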


Multi-processing

• SHARC is uniquely equipped for multi-processing.

• Link ports provide very powerful multi-processing capabilities.

• Two main program models depending on the application.

• Adapts well to different multi-processing architectures.


Multi-processing: SHARC Links

• SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec.

• Links designed for point-to-point connections.

• Data can be transmitted in either direction but not both simultaneously.
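
The figures above allow some quick arithmetic. The helpers below are a sketch with hypothetical names; they take 1 Mbyte as 10^6 bytes (an assumption) and give peak values only.

```c
/* Seconds to move `bytes` over one link running at `mbytes_per_sec`. */
double link_transfer_seconds(double bytes, double mbytes_per_sec)
{
    return bytes / (mbytes_per_sec * 1.0e6);
}

/* Aggregate peak rate when every link port is busy at once. */
double aggregate_mbytes_per_sec(int ports, double mbytes_per_sec)
{
    return ports * mbytes_per_sec;
}
```

With 6 ports at 40 Mbytes/sec each, the aggregate peak is 240 Mbytes/sec. Because a link carries data in only one direction at a time, a two-way exchange over a single link takes the sum of the two transfer times.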


Multi-processing Program Model: MIMD

• Multiple instruction, multiple data.

• Good for applications that require multiple instruction threads to execute concurrently.

• Processors operate individually.
  – Each processor executes different code.

• Typically used for image reconstruction and multi-channel DSP.


Multi-processing Program Model: SIMD

• Single instruction, multiple data.

• Works best when all processors execute identical instruction sequences.

• Does not require overhead for inter-processor synchronization.

• Typically used for synthetic aperture radar and automatic target recognition.
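
The difference between the two program models can be sketched in C. This is a single-threaded illustration, not multiprocessor code: each loop pass stands in for one processor, and the kernel functions and names are hypothetical.

```c
#include <stddef.h>

typedef float (*kernel_fn)(float);

/* Two hypothetical kernels standing in for per-processor programs. */
float scale_by_two(float v) { return 2.0f * v; }
float add_one(float v)      { return v + 1.0f; }

/* SIMD model: every "processor" applies the same kernel to its own datum. */
void run_simd(kernel_fn kernel, float *data, size_t processors)
{
    for (size_t p = 0; p < processors; p++)
        data[p] = kernel(data[p]);
}

/* MIMD model: processor p runs its own kernel on its own datum. */
void run_mimd(const kernel_fn *kernels, float *data, size_t processors)
{
    for (size_t p = 0; p < processors; p++)
        data[p] = kernels[p](data[p]);
}
```

In the SIMD case one instruction stream serves all of the data, which is why no inter-processor synchronization is needed; in the MIMD case each processor needs its own program.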


Multi-processing Architectures: Cluster Design

• Groups of up to 6 in a cluster.

• Most common for joining multiple SHARCs.

• All processors, global I/O and global memory connected to a common “Cluster bus.”

• Each SHARC can “drive” the bus.


Multi-processing Architectures: Mesh Design

• All SHARCs are joined by their link ports and are connected to a common bus.

• In SIMD mode, a single master SHARC drives the bus.

• In MIMD mode, a mesh architecture cannot function if the data is larger than the available on-chip memory.

• Advantageous scalability over a wider range of applications.


Summary of what makes the SHARC Super

• It performs excellently for DSP applications.

• Employs a Harvard Architecture with very large on-chip memory.

• Respectable Megaflop rating.

• Its multiprocessing capabilities.


How optimal is the SHARC for non-DSP Applications?

• It is obviously geared for DSP applications.

• While it may fare better than other processors, it is still behind those which are designed specifically for non-DSP applications.


Sources

• www.alacron.com/news/tp_mimd_simd.htm

• www.analog.com

• www.cs.seas.gwu.edu/~cs339/cs339-lecture2.pdf

• www.ixthos.aa.psiweb.com/technical/notes_articles/articles