IAF0042 Arvo Toomsalu - ttu.ee · 5 Memory Hierarchy Memory hierarchy wide-spread model A typical hierarchy consists of: 1. Register file; 2. Per-processor level 1 (L1) instruction

May 03, 2018

Lecture Notes

IAF0042

Arvo Toomsalu

Computer Architecture Introduction

Computer Model

[Figure: Computer Model. Blocks shown: I/O subsystem, processor subsystem, memory subsystem, CORE.]

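Classical Architectures

Princeton or von Neumann architecture

[Figure: Princeton (von Neumann) architecture. The CPU and a single MEMORY holding both data and instructions are connected by one system bus that carries data and instructions.]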

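Harvard architecture

[Figure: Harvard architecture. The CPU is connected by an instruction bus (I Bus) to a MEMORY for instructions and by a data bus (D Bus) to a separate MEMORY for data.]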

MEMORY SYSTEM

MSD (Memory Storage Devices)

Primary storage

    RAM: CAM, SRAM, DRAM, Molecular RAM

    ROM: ROM, PROM (OTP), RPROM, Flash ROM, EEPROM, [UVROM]

Secondary storage

    Magnetic: M tape, M disk, Magneto-optical

    Optical: CD-ROM, WORM (CD-R), CD-RW, DVD, Hologram Optical Disc

CAM – Content Addressable Memory (Associative Memory)
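
As a rough illustration of the content-addressable idea, the sketch below models a CAM in Python: a search presents a data word and gets back the addresses where it is stored, the reverse of an ordinary RAM read. The class name `TinyCAM` and its methods are hypothetical. A real CAM compares all cells in parallel in hardware, whereas this model scans them one after another.

```python
class TinyCAM:
    """Toy content-addressable memory (hypothetical model, not a real device)."""

    def __init__(self, size):
        self.cells = [None] * size  # None marks an empty cell

    def write(self, address, word):
        self.cells[address] = word

    def search(self, word):
        # Hardware compares every cell at once; here we scan sequentially.
        return [addr for addr, stored in enumerate(self.cells) if stored == word]


cam = TinyCAM(4)
cam.write(0, 0xBEEF)
cam.write(2, 0xCAFE)
cam.write(3, 0xBEEF)
matches = cam.search(0xBEEF)  # addresses holding the word 0xBEEF
```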

Memory Hierarchy

Widespread memory hierarchy model

A typical hierarchy consists of:

1. Register file;

2. Per-processor level 1 (L1) instruction and data cache;

3. On-chip, shared unified level 2 (L2) cache;

4. Off-chip level 3 (L3) cache;

5. Main memory;

6. Hard disc for virtual memory.
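
One way to see why such a hierarchy pays off is to estimate the average access time when each miss falls through to the next level. The sketch below is only a back-of-the-envelope model: the hit times and hit rates are invented for illustration and are not taken from these notes, and the function name `average_access_time` is hypothetical.

```python
# (name, hit_time_in_cycles, hit_rate) per level; all numbers are illustrative assumptions.
LEVELS = [
    ("L1 cache", 4, 0.90),
    ("L2 cache", 12, 0.80),
    ("L3 cache", 40, 0.50),
    ("Main memory", 200, 1.00),  # assumed to always hit (no paging to disk)
]


def average_access_time(levels):
    """Expected cycles per access when each miss falls through to the next level."""
    total = 0.0
    reach_probability = 1.0  # probability that an access gets this far down
    for _name, hit_time, hit_rate in levels:
        total += reach_probability * hit_time
        reach_probability *= 1.0 - hit_rate
    return total


amat = average_access_time(LEVELS)
```

With these assumed numbers the average comes out at about 8 cycles, far closer to the L1 hit time than to the main-memory latency.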

[Figure: Extended memory hierarchy model]

Internal (Architectural) Organization of Memories

DRAM – dynamic RAM

SDRAM – synchronous dynamic RAM

DDR-SDRAM – double-data-rate SDRAM

MDRAM – multi-bank DRAM

ESDRAM – cache-enhanced DRAM

etc.

MULTIPROCESSOR SYSTEMS

Flynn-Johnson taxonomy

                               Single data stream    Multiple data stream

Single instruction stream      SISD                  SIMD

Multiple instruction stream    MISD                  MIMD

SISD Architecture

[Figure: SISD architecture with one CU, one EU and one MU. I – instructions; D – data.]
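
The Flynn-Johnson taxonomy depends only on whether there is one or more than one instruction stream and one or more than one data stream. A minimal Python sketch of that rule follows; the function name `flynn_class` is hypothetical.

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn class for the given stream counts (each must be >= 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"  # single or multiple instruction streams
    d = "S" if data_streams == 1 else "M"         # single or multiple data streams
    return f"{i}I{d}D"
```

For example, one control unit driving eight execution units that each work on their own data would be classified by `flynn_class(1, 8)`.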

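SIMD Architecture

[Figure: SIMD architecture. One CU issues instructions (I) to several EUs, each exchanging data (D) with its own MU.]

MISD Architecture

[Figure: MISD architecture. Several CUs, each issuing instructions (I) to its own EU; data (D) passes through the EUs, with one MU.]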

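MIMD Architecture

[Figure: MIMD architecture. Several CUs, EUs and MUs, each set with its own instruction (I) and data (D) streams.]

SM – Shared Memory

LM – Local Memory

[Figure: UMA model. Processors (PR) are connected through a system interconnect (bus, crossbar, network) to the shared memories (SM) and the I/O unit (IO).]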

PR – processor; IO – input-output unit; SM – shared memory; LM – local memory;

GSM – global shared memory; CSM – cluster shared memory;

CIN – cluster interconnection network.

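[Figure: NUMA model (cluster). Cluster 1 to Cluster n each contain processors (PR) and cluster shared memories (CSM) joined by a cluster interconnection network (CIN); a global interconnection network connects the clusters to the global shared memories (GSM).]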

UMA versus NUMA

[Figure: UMA. Four CPUs with caches reach four memory modules (MEM) through an interconnection network; memory latency is uniform.]

[Figure: NUMA. Four CPUs with caches, each with its own memory module (MEM); local memory latency is short, while access through the interconnection network has long memory latency.]
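
The practical difference between the two models can be sketched as a weighted average: in NUMA the mean latency depends on how often a processor finds its data in local memory. The latency figures below are invented for illustration, and the function name `mean_latency` is hypothetical.

```python
def mean_latency(local_fraction, local_ns, remote_ns):
    """Average memory latency when a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Illustrative assumptions, not measurements.
UMA_NS = 100          # every access costs the same in UMA
NUMA_LOCAL_NS = 60    # short local memory latency
NUMA_REMOTE_NS = 180  # long latency across the interconnection network

uma = mean_latency(1.0, UMA_NS, UMA_NS)
numa_mostly_local = mean_latency(0.9, NUMA_LOCAL_NS, NUMA_REMOTE_NS)
numa_mostly_remote = mean_latency(0.3, NUMA_LOCAL_NS, NUMA_REMOTE_NS)
```

Under these assumptions NUMA beats UMA when most accesses are local and loses when most are remote, which is why data placement matters on NUMA machines.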

Microprocessor system capabilities related to system processing include:

Cost-performance

Throughput (operations per time unit)

Resource sharing

Example

The Newisys ASIC implementation HORUS

Summary

Taxonomy of Mono- and Multiprocessor Organizations

[Figure: Processor Organization tree.

    SISD: uniprocessor

    SIMD: vector processor, array processor

    MISD

    MIMD:

        shared memory (tightly coupled): symmetric multiprocessor (SMP), nonuniform memory access (NUMA)

        distributed memory (loosely coupled): clusters

Other labels in the figure: Serial, Parallel, Multi ALU, Overlapped operations.]

Literature

Arthur W. Burks, Herman H. Goldstine, John von Neumann. Preliminary Discussion of the Logical Design of an Electronic Computing Instrument.

Online: http://www.cs.unc.edu/~adyilie/comp265/vonNeumann.html

Network Processors

A network processor is a programmable CPU chip that is optimized for networking and communications functions.

Two common approaches (a, b) to parallelism in network processors:

a. Input packets are distributed among multiple processing units to divide the load.

b. Input packets flow through a pipeline of processing elements.
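
The two approaches above can be sketched in a few lines of Python. This is only a sequential model of the data flow, not of real parallel hardware: the packet layout (a dict with `id` and `ttl`), the stage functions and the round-robin policy are all assumptions made for illustration.

```python
def distribute(packets, n_units):
    """Approach (a): deal packets round-robin to n_units parallel processing units."""
    queues = [[] for _ in range(n_units)]
    for index, packet in enumerate(packets):
        queues[index % n_units].append(packet)
    return queues


def pipeline(packets, stages):
    """Approach (b): pass every packet through the same ordered list of stages."""
    results = []
    for packet in packets:
        for stage in stages:
            packet = stage(packet)
        results.append(packet)
    return results


# Hypothetical stages; each returns a new dict and leaves its input untouched.
def parse(packet):
    return {**packet, "parsed": True}


def decrement_ttl(packet):
    return {**packet, "ttl": packet["ttl"] - 1}


packets = [{"id": n, "ttl": 64} for n in range(5)]
queues = distribute(packets, 2)                         # approach (a)
processed = pipeline(packets, [parse, decrement_ttl])   # approach (b)
```

In approach (a) every unit runs the whole processing task on its share of the packets, while in approach (b) every element runs one step of the task on all packets.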

Graphics Processor

A graphics processor (video card, graphics accelerator card, display adapter) is a special-purpose microprocessor designed specifically to generate the signals that drive a video monitor.

In graphics applications, complex shapes and structures are formed through the sampling, interconnection and rendering of simpler objects (primitives).

Graphics primitives may include lines, characters, areas (triangles and ellipses), and shapes (polygons, spheres, cylinders and the like).

These primitives are formed by the interconnection of individual pixels.

In 3D graphics images there are three dimensions, including the dimension of depth (the Z dimension).

Modern computers typically produce graphical output using a sequence of tasks known as a graphics pipeline.

NVIDIA GeForce 6800 Features

1. High performance;

2. Multiple small independent memory partitions for improved latency;

3. Early culling and clipping, which cull non-visible primitives at a high rate;

4. Rasterization that supports aliased and anti-aliased primitives (triangles, etc.);

5. Z-Cull, which allows high-speed removal of hidden surfaces;

6. Occlusion Query, which keeps a record of the number of fragments passing or failing the depth test and reports it to the CPU.
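
The bookkeeping behind an occlusion query can be sketched with a software depth buffer. The following Python model is an assumption-laden illustration, not the GeForce 6800 implementation: it uses a "less than" depth test, represents a fragment as an `(x, y, z)` tuple with smaller `z` meaning closer to the viewer, and the function name `occlusion_query` is hypothetical.

```python
def occlusion_query(depth_buffer, fragments):
    """Count fragments that pass or fail a 'less than' depth test.

    depth_buffer maps (x, y) to the nearest depth seen so far;
    fragments is an iterable of (x, y, z) with smaller z meaning closer.
    """
    passed = failed = 0
    for x, y, z in fragments:
        if z < depth_buffer.get((x, y), float("inf")):
            depth_buffer[(x, y)] = z  # fragment is visible: update the buffer
            passed += 1
        else:
            failed += 1  # fragment is hidden behind an earlier one
    return passed, failed


depth = {}
occluder = [(0, 0, 0.2), (1, 0, 0.2)]  # drawn first, close to the viewer
occlusion_query(depth, occluder)

# A farther object covering three pixels, two of them behind the occluder.
passed, failed = occlusion_query(depth, [(0, 0, 0.5), (1, 0, 0.5), (2, 0, 0.5)])
```

An application could use such counts to skip drawing an expensive object when none of the fragments of its bounding shape pass the test.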

Pyramid3D Real-time Graphics Processor (TriTech Microelectronics, Inc.)

Multiprocessor architecture

Single-chip 3D graphics solution, which consists of:

• Geometry Processor

• Primitive Processor

• Pixel Processor

Multimedia Processors

Multimedia is media and content that uses a combination of different content forms. It integrates multiple forms of media: text, graphics, audio, video, communication, etc.

Characteristics of Multimedia Applications

The most important ones are:

• Real-time response.

• Processing of streaming data.

• Significant fine- and coarse-grained data parallelism.

• Data reorganization.

• Small loops.

• High memory bandwidth requirement. The applications process large data sets, putting a severe burden on the memory system.

• Small data types.

• Multimedia applications (MMAs) perform significantly more arithmetic operations than general-purpose applications (GPAs).

Classification of Processor Architectures that Support Multimedia

Dedicated multimedia processors

Dedicated processors are typically custom-designed architectures intended to perform specific multimedia functions. Some advanced multimedia processors also provide support for 2D and 3D graphics applications.

Designs of dedicated multimedia processors range from fully custom architectures with minimal programmability, referred to as function specific architectures, to fully programmable architectures.

A. Function specific architectures

Function specific dedicated multimedia architectures provide limited programmability, because they use dedicated architectures for a specific encoding or decoding standard.

B1. Flexible programmable architectures

These processors can have moderate to high flexibility and are based on the coprocessor concept as well as on parallel datapaths and deeply pipelined designs.

TI’s Multimedia Video Processor

B2. Adapted programmable architectures

These processors provide increased efficiency by adapting the architecture to the specific requirements of video coding applications.

C-Cube’s VideoRISC processor

Modern advanced dedicated multimedia processors use SIMD and VLIW architectural schemes and their variations to achieve very high parallelism.

Philips TriMedia CPU64

Philips TriMedia CPU64 TM1x00 with VLIW-core

The processor's main characteristics are:

1. A 5-issue VLIW architecture with a 32-bit word size;

2. 27 functional units;

3. Any operation can be guarded to provide conditional execution without branching;

4. Instruction set and functional units optimized with respect to media processing;

5. A single multi-ported register file with bypass network, allowing 1-cycle latency operations;

6. 32 kB, 8-way instruction cache;

7. 16 kB, 8-way, quasi-dual ported, data cache;

8. A variable-length (compressed) instruction set design.
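
Guarded (predicated) execution, item 3 above, can be illustrated with a toy interpreter in Python. Every operation names a guard register and writes its result only if the guard's low bit is 1, so an if/else becomes two guarded operations with no jump between them. The tuple format, register names and opcode set are hypothetical and are not the TriMedia instruction set.

```python
def run(program, registers):
    """Execute (guard, dest, op, src_a, src_b) tuples in order, with no jumps.

    An operation writes its result only when the guard register's low bit is 1.
    """
    ops = {
        "sub": lambda a, b: a - b,
        "lt": lambda a, b: int(a < b),
        "ge": lambda a, b: int(a >= b),
        "mov": lambda a, _b: a,
    }
    for guard, dest, op, src_a, src_b in program:
        result = ops[op](registers[src_a], registers[src_b])
        if registers[guard] & 1:  # write-back gate, not a program branch
            registers[dest] = result
    return registers


# abs = |x|: both guarded operations are issued; only one takes effect.
ABS_PROGRAM = [
    ("one", "neg", "lt", "x", "zero"),   # neg = (x < 0)
    ("one", "pos", "ge", "x", "zero"),   # pos = (x >= 0)
    ("neg", "abs", "sub", "zero", "x"),  # if neg: abs = 0 - x
    ("pos", "abs", "mov", "x", "zero"),  # if pos: abs = x
]


def absolute(x):
    regs = {"zero": 0, "one": 1, "x": x, "neg": 0, "pos": 0, "abs": 0}
    return run(ABS_PROGRAM, regs)["abs"]
```

The program counter in this model only ever moves forward, which is the property that lets a VLIW machine keep its issue slots filled across what would otherwise be a branch.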

Example

ZMS-08 Media Processor (ZiiLABS Pte Ltd.)

Typical Applications

o Web tablets

o Netbooks

o Connected TVs

o Portable infotainment

o Digital media hubs

o Point of service terminals

o Video conferencing systems

Main Features

o Blu-ray quality 1080p H.264 video decode

o 1080p H.264 video encode

o 720p H.264 video conferencing

o Multi-format media codecs

o ARM Cortex-A8 at 1 GHz

o Accelerated graphics and compositing

o Advanced image signal processing

o Rich peripheral integration and connectivity

Performance

o Blu-ray quality 1080p H.264 video decode at 40 Mbps

o Simultaneous 720p H.264 video encode and decode

o 1080p H.264 video encode

o ARM Cortex-A8 at 1 GHz

General-purpose (GP) processors

GP processors support multimedia by including multimedia instructions in the instruction set.
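
Multimedia instructions of this kind commonly treat one register as several small data elements and operate on all of them at once. The Python sketch below imitates a packed add of four unsigned 8-bit lanes with saturation inside a 32-bit word. It is an illustration only: the function names are hypothetical and do not correspond to any particular instruction set, and real hardware performs the four lane additions in a single instruction rather than in a loop.

```python
def pack_u8(lanes):
    """Pack four 0..255 values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for index, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * index)
    return word


def unpack_u8(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * index)) & 0xFF for index in range(4)]


def padd_u8_saturate(word_a, word_b):
    """Add four unsigned 8-bit lanes, clamping each lane to 255."""
    sums = [min(a + b, 255) for a, b in zip(unpack_u8(word_a), unpack_u8(word_b))]
    return pack_u8(sums)


pixels = pack_u8([10, 200, 255, 0])
brighten = pack_u8([50, 100, 1, 0])
result = unpack_u8(padd_u8_saturate(pixels, brighten))
```

Saturation keeps a brightened pixel at 255 instead of letting it wrap around to a dark value, which is why multimedia instruction sets usually offer it alongside ordinary wrap-around arithmetic.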

Multimedia Processors Architecture Development Trends

There are three new architectural concepts:

1. Reconfigurable computing;

2. Simultaneous multithreading (SMT);

[Figure: SMT-based multimedia architecture]

3. Associative controlling.