
Throughput Oriented Architectures

Dec 05, 2014


Engineering

Nomy059

Computer architecture article related to throughput-oriented architectures.
Transcript
Page 1: Throughput Oriented Architectures

Throughput Oriented Architectures

Page 2: Throughput Oriented Architectures

Contents

• Throughput-oriented processors
• Hardware multithreading
• Many simple processing units
• SIMD execution
• GPUs
• NVIDIA GPU architecture
• Throughput-oriented programming
• Conclusion

Page 3: Throughput Oriented Architectures

Key Points:

• Throughput-oriented processors tackle problems where parallelism is abundant.

• Due to their design, programming throughput-oriented processors requires much more emphasis on parallelism and scalability than programming sequential processors.

• GPUs are the leading exemplars of modern throughput-oriented architectures.

Page 4: Throughput Oriented Architectures

Throughput-Oriented Architectures:

• Throughput and latency are two fundamental measures for processor performance.

• Traditional scalar microprocessors are latency-oriented architectures.

• Throughput-oriented processors arise from the assumption that the workloads they run will offer abundant parallelism.

• Throughput-oriented architectures rely on three key architectural features:

1. Emphasis on many simple processing cores
2. Extensive hardware multithreading
3. SIMD execution

Page 5: Throughput Oriented Architectures

Hardware Multithreading:

• A computation in which parallelism is abundant can be decomposed into a collection of concurrent sequential tasks that execute in parallel across many threads.

• A thread executes the instruction stream corresponding to a single sequential task.

• Multithreading, whether in hardware or software, provides a way of tolerating latency (see the sketch after this list).

• Hardware multi-threading as a design strategy for improving aggregate performance on parallel workloads has a long history.
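To make the decomposition above concrete, here is a minimal CUDA sketch of the execution style these slides describe. It is illustrative only: the kernel name, array sizes, and access pattern are my assumptions, not taken from the slides. Each thread runs the instruction stream of one small sequential task, and the hardware tolerates memory latency by keeping many more threads resident than there are cores.

#include <cuda_runtime.h>
#include <cstdio>

// Each thread carries out one small sequential task: fetch one element
// through an index array. Scattered loads like x[idx[i]] often miss cache;
// the hardware hides that latency by keeping many warps resident and
// issuing instructions from whichever warps are ready.
__global__ void gather(int n, const int* idx, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[idx[i]];
}

int main()
{
    const int n = 1 << 20;
    int* idx;
    float *x, *y;
    cudaMallocManaged(&idx, n * sizeof(int));
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        idx[i] = (i * 131) % n;   // scattered access pattern
        x[i] = float(i);
    }

    // Far more threads than the GPU has cores: this oversubscription is
    // what gives the hardware scheduler independent work to switch to.
    gather<<<(n + 255) / 256, 256>>>(n, idx, x, y);
    cudaDeviceSynchronize();
    printf("y[1] = %.1f\n", y[1]);   // x[idx[1]] == x[131] == 131.0

    cudaFree(idx);
    cudaFree(x);
    cudaFree(y);
    return 0;
}

The deliberate oversubscription in the launch is the point: when one warp stalls on memory, the scheduler simply issues from another, which is how hardware multithreading turns abundant parallelism into latency tolerance.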

Page 6: Throughput Oriented Architectures

Hardware Multithreading:

• Tera, Sun Niagara, and NVIDIA GPUs use multithreading for high-throughput performance.

• Simultaneous multithreading is used to improve the efficiency of superscalar sequential processors.

• HEP, Tera, and NVIDIA GPUs show characteristics of throughput-oriented processors.

Page 7: Throughput Oriented Architectures

Many simple processing units:

• High transistor density makes it possible to place many simple processing units on a single chip.

• Throughput-oriented architectures achieve a higher level of performance by using many such simple processing units.

• Instructions execute in the order in which they appear in the program (in-order execution), which keeps each core simple.

• The resulting saving in chip area allows many parallel processing units and gives higher throughput on parallel workloads.

Page 8: Throughput Oriented Architectures

SIMD execution:

• Parallel processors use some form of SIMD execution to improve aggregate throughput (see the sketch after this list).

• The two basic categories of SIMD machines are SIMD processor arrays and vector processors.

• A SIMD processor array consists of many processing units driven by a single control unit.

• A vector processor combines traditional scalar instructions with vector instructions that operate on data vectors of fixed width.
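On NVIDIA GPUs this idea appears as warps. The kernel-only fragment below (the kernel name and the scaling operation are illustrative assumptions, host setup omitted) shows the shape of it: the hardware groups 32 threads into a warp that issues one common instruction per step, which is effectively SIMD execution over 32 data lanes.

// Kernel fragment only (host setup omitted). The threads of a block are
// grouped into warps of 32 that issue one common instruction per step,
// which is effectively SIMD execution over 32 data lanes.
__global__ void scale(int n, float s, const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // each lane works on its own element
    if (i < n)                                      // a branch uniform across the warp keeps all lanes active
        out[i] = s * in[i];
}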

Page 9: Throughput Oriented Architectures

GPU:

• GPUs are similar to a computer's CPU. A GPU, however, is designed specifically for performing the complex mathematical and geometric calculations that are necessary for graphics rendering.

Page 10: Throughput Oriented Architectures

CPU and GPU:

• Difference between a CPU and a GPU (a minimal code contrast follows below):

• A CPU comprises a few cores optimized for sequential serial processing.

• A GPU comprises thousands of smaller, more efficient cores designed for handling multiple tasks concurrently.
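The contrast above, sketched in CUDA under illustrative assumptions (the function names, array sizes, and the element-wise add are mine, not from the slides): the CPU path is one sequential loop on a latency-oriented core, while the GPU path turns the loop body into a kernel executed by thousands of lightweight threads.

#include <cuda_runtime.h>
#include <cstdio>

// GPU version: the loop body becomes a kernel, and each of thousands of
// lightweight threads handles one element.
__global__ void add_gpu(int n, const float* a, const float* b, float* c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// CPU version: a single core steps through the elements in sequence,
// relying on caches and instruction-level parallelism for speed.
void add_cpu(int n, const float* a, const float* b, float* c)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 16;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    add_cpu(n, a, b, c);                              // serial path: one thread, n iterations
    add_gpu<<<(n + 255) / 256, 256>>>(n, a, b, c);    // parallel path: n threads, one element each
    cudaDeviceSynchronize();
    printf("c[0] = %.1f\n", c[0]);                    // 3.0 from either path

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}

Both paths compute the same result; the point is only the difference in structure, a few fast cores stepping through the data versus massive thread-level parallelism spread over many simple cores.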

Page 11: Throughput Oriented Architectures

CPU and GPU:

Page 12: Throughput Oriented Architectures

NVIDIA Fermi Graphics Processing Unit:

• Floating-point performance is 1000 GFLOPS.

• On-chip scratchpad memory is 48 KB per SM.

• Off-chip memory bandwidth is 100 GB/s (a rough balance check based on these numbers follows below).
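Taking the two headline figures at face value (my arithmetic, not from the slide), the implied compute-to-bandwidth balance is:

1000 GFLOP/s / 100 GB/s = 10 FLOPs per byte, i.e. about 40 FLOPs per 4-byte float fetched from off-chip memory.

Code that does less arithmetic per loaded value than this is typically bandwidth-bound, which is why the on-chip scratchpads (48 KB per SM here) matter: they let threads reuse data on chip instead of re-reading it from DRAM.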

Page 13: Throughput Oriented Architectures

NVIDIA vs. Intel:

Page 14: Throughput Oriented Architectures

Performance per watt:

Page 15: Throughput Oriented Architectures

Microarchitecture of a GPU:

Page 16: Throughput Oriented Architectures

Reduction tree:
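The slide shows a reduction-tree figure. As a hedged companion in code (the kernel name, block size, and data are illustrative assumptions, not from the slide), here is a minimal CUDA sketch of the same idea: each block reduces its slice of the input with a binary tree in shared memory, halving the number of active threads at every step, so a block of 256 elements finishes in 8 steps instead of 255 serial additions.

#include <cuda_runtime.h>
#include <cstdio>

// Each block cooperatively reduces its slice of the input to one partial sum
// using a binary tree in shared memory: at every step, half as many threads
// add in a partner element `stride` positions away.
__global__ void block_sum(const float* in, float* partial, int n)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    s[tid] = (i < n) ? in[i] : 0.0f;   // load one element per thread (0 pads the tail)
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();               // all adds at this tree level finish before the next
    }

    if (tid == 0)
        partial[blockIdx.x] = s[0];    // one partial sum per block
}

int main()
{
    const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *partial;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    block_sum<<<blocks, threads, threads * sizeof(float)>>>(in, partial, n);
    cudaDeviceSynchronize();

    float total = 0.0f;                // final pass over the per-block sums on the host
    for (int b = 0; b < blocks; ++b) total += partial[b];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in);
    cudaFree(partial);
    return 0;
}

The host sums the per-block partial results at the end; in practice that last step is often done with a second kernel launch or warp-level primitives, but the tree structure inside a block is the part the figure illustrates.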

Page 17: Throughput Oriented Architectures

Conclusion

• Throughput-oriented processors assume that parallelism is abundant rather than scarce, and their target is maximizing the total throughput of all tasks rather than minimizing the latency of a single task.

• A fully general-purpose chip cannot afford to aggressively trade away single-thread performance for increased total performance in this way.