COSC 6385 Computer Architecture Introduction and ...gabriel/courses/cosc6385_s18/CA_01_Intro.pdf · Computer Architecture Introduction and Organizational Issues ... John L. Hennessy,

Sep 07, 2018

COSC 6385

Computer Architecture

Introduction and Organizational Issues

Edgar Gabriel

Spring 2018

Organizational issues (I)

• Classes:

– Tuesday, 11.30am – 1.00pm, F162

– Thursday, 11.30am – 1.00pm, F162

• Evaluation as planned right now

– 1 homework: 25%

– 3 quizzes: 75% (25% each)

• In case of questions:

– email: [email protected]

– Tel: (713) 743 3358

– Office hours: PGH 228, Monday, 11am-11.45am or by appointment

• All slides available on the website:

– http://www.cs.uh.edu/~gabriel/courses/cosc6385_s18/

– Videos of some lectures will be posted on the course web page

Organizational Issues (III)

• TAs for the course:

– Guangli Dai, email: [email protected]

• Dates for the quizzes:

– 1st quiz: Thursday, Feb 15

– 2nd quiz: Tuesday, March 27

– 3rd quiz: Thursday, April 26

• Homework

– Announced: Tuesday, Feb 13

– Due on: Tuesday, March 6

Contents

• Textbook:

John L. Hennessy, David A. Patterson: “Computer Architecture – A Quantitative Approach”, 6th Edition, Morgan Kaufmann Publishers

Contents (II)

• Most of chapters 1 – 5, and 7

– Memory Hierarchy Design

– Instruction Level Parallelism

– Data Level Parallelism

– Thread Level Parallelism

– Domain Specific Architectures

• Appendix B, C

– Review of Memory Hierarchies

– Pipelining

• Selected literature on multi-core processors, GPUs, and storage systems

Why learn about Computer Architecture?

for (i=0; i<n; i++ ) {
    c[i] = a[i] + b[i];
}

• Every loop iteration requires 3 memory operations

– 2 loads

– 1 store

• For a micro-processor having a frequency of 2 GHz this loop requires

$3 \cdot 4 \cdot 2 \cdot 10^{9}\ \text{Bytes/s} = 24\ \text{GBytes/s}$

to satisfy one Floating Point Unit (FPU)

• Most modern processors have 2 FPUs and two or more Integer Units which could work in parallel

• Most modern processors have more than one core that can operate in parallel
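The 24 GBytes/s estimate can be reproduced with a small helper. This is only a sketch: the function name and parameters are illustrative, and it assumes 4-byte operands (the factor 4 in the estimate) and one loop iteration completed per clock cycle.

```c
#include <stddef.h>

/* Bytes per second that one FPU would consume if it completed one
 * c[i] = a[i] + b[i] iteration per clock cycle.
 *   mem_ops_per_iter  : loads + stores per iteration (3 on the slide)
 *   bytes_per_operand : operand size in bytes (4 is an assumption that
 *                       matches the factor 4 in the slide's estimate)
 *   clock_hz          : processor frequency in Hz */
double required_bandwidth(int mem_ops_per_iter, size_t bytes_per_operand,
                          double clock_hz)
{
    return (double)mem_ops_per_iter * (double)bytes_per_operand * clock_hz;
}
```

Calling `required_bandwidth(3, 4, 2.0e9)` gives 2.4e10 bytes per second, i.e. the 24 GBytes/s quoted above for a single FPU.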

Memory technology (www.kingston.com/newtech)

• Memory Bandwidth

$SB_{max} = SB_{BUS} \cdot f_{BUS} \cdot \text{Op/Cycle}$

with

$SB_{max}$: max. memory bandwidth

$SB_{BUS}$: Bandwidth of the memory bus (64 Bit = 8 Bytes)

$f_{BUS}$: Frequency of the memory bus
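As a sketch of how the bandwidth formula is applied, the helper below multiplies the three factors. The function name is illustrative. The example values are those of a DDR3-1600 module (64-bit bus, 800 MHz bus clock, two transfers per cycle); they are used here as an assumption for illustration and are not taken from the slide itself.

```c
/* SB_max = SB_BUS * f_BUS * Op/Cycle
 *   bus_width_bytes : width of the memory bus in bytes (8 for 64 bit)
 *   bus_freq_hz     : frequency of the memory bus in Hz
 *   ops_per_cycle   : transfers per bus cycle (2 for double data rate) */
double peak_memory_bandwidth(double bus_width_bytes, double bus_freq_hz,
                             double ops_per_cycle)
{
    return bus_width_bytes * bus_freq_hz * ops_per_cycle;
}
```

With these values, `peak_memory_bandwidth(8.0, 800.0e6, 2.0)` evaluates to 12.8e9 bytes per second, which is well below the 24 GBytes/s that the vector-add loop would need for a single FPU.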

Memory modules

Source: https://www.kingston.com/us/memory/resources/ddr3_1600

Memory hierarchies

Level                          Size           Access time [cycles]
Backup (tape)                  TB, PB, EB
Primary data storage (disk)    ~ 10s TB       > 10^6
Main memory                    ~ 4-512 GB     100 - 1000
Caches                         ~ 1-32 MB      2 - 50
Register                       < 256 Words    1 - 2

Memory hierarchies

• Do I have to care about memory hierarchies?

• Example: Matrix-multiply of two dense matrices

– “Trivial” code

/* accumulates the product a * b into c; all matrices are dim x dim */
for ( i=0; i<dim; i++ ) {
    for ( j=0; j<dim; j++ ) {
        for ( k=0; k<dim; k++) {
            c[i][j] += a[i][k] * b[k][j];
        }
    }
}

Matrix-multiply

• Performance of the trivial implementation on a 2.2 GHz AMD Opteron

Matrix dimension    Execution time [sec]    Performance [MFLOPS]
256x256             0.118                   284
512x512             2.05                    130
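The MFLOPS column follows from the execution time if each inner-loop iteration is counted as two floating point operations (one multiply, one add), giving 2 * dim^3 operations in total. A minimal sketch, with an illustrative function name:

```c
/* Floating point rate of a dim x dim matrix multiply in MFLOPS,
 * counting 2 * dim^3 operations (one multiply and one add per
 * inner-loop iteration). seconds is the measured execution time. */
double matmul_mflops(int dim, double seconds)
{
    double flops = 2.0 * (double)dim * (double)dim * (double)dim;
    return flops / seconds / 1.0e6;
}
```

For the two rows of the table, `matmul_mflops(256, 0.118)` and `matmul_mflops(512, 2.05)` evaluate to about 284 and 131 MFLOPS, matching the reported figures up to rounding.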

Matrix-multiply (II)

• Peak floating point performance of the processor

$2 \cdot (2.2 \cdot 10^{9})\ \text{Floating point operations/sec} = 4.4 \cdot 10^{9} = 4.4\ \text{GFLOPS}$

– 2: Number of floating point units

– 2.2 * 10^9: Frequency of the processor → assuming that each FPU can finish one operation per cycle

– 4.4 GFLOPS: Theoretical floating point peak performance of the processor

• Where are the missing FLOPS between theoretical peak and achieved performance?

– Memory wait time
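The peak figure and the gap to the measured rates can be checked with two small helpers. This is a sketch with illustrative function names; it uses the slide's assumption that each FPU finishes one operation per cycle.

```c
/* Theoretical peak in floating point operations per second:
 * number of FPUs * clock frequency * operations per FPU per cycle. */
double peak_flops(int num_fpus, double clock_hz, double ops_per_fpu_per_cycle)
{
    return (double)num_fpus * clock_hz * ops_per_fpu_per_cycle;
}

/* Fraction of the theoretical peak reached by a measured rate
 * that is given in MFLOPS. */
double fraction_of_peak(double achieved_mflops, double peak_flops_per_sec)
{
    return achieved_mflops * 1.0e6 / peak_flops_per_sec;
}
```

With `peak_flops(2, 2.2e9, 1.0)` as the peak, the measured 284 and 130 MFLOPS of the trivial implementation correspond to roughly 6.5% and 3% of the theoretical peak.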

Blocked code

/* blocked (tiled) version; assumes dim is a multiple of block */
for ( i=0; i<dim; i+=block ) {
    for ( j=0; j<dim; j+=block ) {
        for ( k=0; k<dim; k+=block) {
            for (ii=i; ii<(i+block); ii++) {
                for (jj=j; jj<(j+block); jj++) {
                    for (kk=k; kk<(k+block);kk++) {
                        c[ii][jj] += a[ii][kk] * b[kk][jj];
                    }
                }
            }
        }
    }
}
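A self-contained variant of the blocked loop nest might look as follows. It is a sketch, not the code used for the measurements: the function names, the flat row-major `double` arrays, and the clamping of tile bounds (so that `dim` need not be a multiple of `block`) are assumptions added here. `block` must be greater than zero.

```c
#include <stddef.h>

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference: c += a * b for dim x dim row-major matrices. */
void matmul_trivial(size_t dim, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++)
                c[i * dim + j] += a[i * dim + k] * b[k * dim + j];
}

/* Blocked: same result, computed tile by tile. Tile bounds are clamped
 * to dim, so dim does not have to be a multiple of block. */
void matmul_blocked(size_t dim, size_t block,
                    const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < dim; i += block) {
        for (size_t j = 0; j < dim; j += block) {
            for (size_t k = 0; k < dim; k += block) {
                size_t i_end = min_sz(i + block, dim);
                size_t j_end = min_sz(j + block, dim);
                size_t k_end = min_sz(k + block, dim);
                for (size_t ii = i; ii < i_end; ii++)
                    for (size_t jj = j; jj < j_end; jj++)
                        for (size_t kk = k; kk < k_end; kk++)
                            c[ii * dim + jj] +=
                                a[ii * dim + kk] * b[kk * dim + jj];
            }
        }
    }
}
```

Both functions accumulate into `c`, so the caller is expected to zero `c` first. Blocking changes the order in which the products are summed, so with general floating point inputs the two results can differ in the last bits.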

Performance of the blocked code

Matrix dimension    Block    Execution time [sec]    Performance [MFLOPS]    “trivial” [MFLOPS]
256x256               4      0.065                   513                     284
                      8      0.046                   726
                     16      0.051                   657
                     32      0.043                   777
                     64      0.049                   677
                    128      0.113                   296
512x512               4      0.686                   391                     130
                      8      0.422                   635
                     16      0.447                   599
                     32      0.501                   535
                     64      1.00                    266
                    128      0.994                   269
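One way to reason about the block sizes in the table is the amount of data the innermost three loops touch: one block x block tile each of a, b and c. The helper below is a sketch. Its name is illustrative, and the element size is an assumption, since the slides do not state whether the matrices hold `float` or `double` values.

```c
#include <stddef.h>

/* Bytes touched by one tile step of the blocked loop nest: one
 * block x block tile each of a, b and c.
 * elem_size is an assumption (8 for double, 4 for float). */
size_t tile_working_set_bytes(size_t block, size_t elem_size)
{
    return 3 * block * block * elem_size;
}
```

Assuming 8-byte elements, a block size of 32 gives 24576 bytes (24 KiB) and a block size of 128 gives 393216 bytes (384 KiB). These numbers would have to be compared with the cache sizes of the processor used, which the slides do not list.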

Top 500 List

Trends: Cores and Threads per Chip

Source: SICS Multicore Day ’14

Slide source: Andy Semin ‘Intel processors and platforms roadmap for energy efficient HPC solutions’

http://academy.hpc-russia.ru/files/intel_hpc_public.pptx

„Big Core“ – „Small Core“

Different Optimization Points

• Intel® Xeon® Processor

– Simply aggregating more cores generation after generation is not sufficient

– Performance per core/thread must increase each generation, be as fast as possible

– Power envelopes should stay flat or go down each generation

– Balanced platform (Memory, I/O, Compute)

• Intel® Xeon Phi™ Coprocessor

– Optimized for highest compute per watt

– Willing to trade performance per core/thread for aggregate performance

– Power envelopes should also stay flat or go down every generation

– Optimized for highly parallel workloads

Common Programming Models and Architectural Elements

• Both: Cores, Threads, Caches, SIMD

For illustration only

Slide source: Andy Semin ‘Intel processors and platforms roadmap for energy efficient HPC solutions’

http://academy.hpc-russia.ru/files/intel_hpc_public.pptx

Slide source: A. Ramirez et al., ‘Are mobile processors ready for HPC?’

http://www.montblanc-project.eu/sites/default/files/publications/Are%20mobile%20processors%20ready%20for%20HPC.pdf
