Multithreaded Programming Concepts
2010. 3. 12, Myongji University, Sugwon Hong
Page 1: Multithreaded Programming Concepts
2010. 3. 12, Myongji University
Sugwon Hong

Page 2: Why Multi-Core?

Until recently, increasing the clock frequency was the holy grail for processor designers seeking to boost performance.

But clock-speed scaling has hit a dead end because of power consumption and overheating.

Designers realized that it is much more efficient to run several cores at a lower frequency than a single core at a much higher frequency.

Page 3: Power and Frequency

(source : Intel Academy program)

Page 4: A Little Bit of History

In the past, performance scaling in single-core processors was achieved by increasing the clock frequency.

As processors shrank and clock frequencies rose, two problems emerged:
Excess power consumption and overheating
Memory access time failed to keep pace with the increasing clock frequencies

Page 5: Instruction/Data-Level Parallelism

Since 1993, processor designers have supported parallel execution at the instruction and data levels.

Instruction-level parallelism
Out-of-order execution pipelines and multiple functional units execute instructions in parallel.

Data-level parallelism
Multimedia Extension (MMX) in 1997
Streaming SIMD Extension (SSE)

Page 6: Hyper-Threading

In 2002, Intel utilized additional copies of execution resources to execute two separate threads simultaneously on the same processor core.

This multithreading idea eventually led to the introduction of the dual-core processor in 2005.

Page 7: Evolution of Multi-Core Technology

(source : Intel Academy program)

Page 8: Multi-Processor Architectures

Shared-memory multiprocessor (SMP)
Non-shared-memory architectures:
Massively Parallel Processor (MPP)
Cluster

[Diagram: in the SMP, several CPUs connect to one shared memory; in the MPP, each CPU has its own memory and the nodes communicate over an interconnect.]

Page 9: Multi-Processors vs. Multi-Cores

Shared-memory multiprocessors (SMP)
Multiple threads on a single core (SMT)
Multiple threads on multiple cores (CMT)

Tricky acronyms:
CMP (Chip Multi-Processor)
SMT (Simultaneous MultiThreading)
CMT (Chip-level MultiThreading)

Page 10: CMT Processor Products

1st generation: Sun Microsystems (late 2005)
Intel Dual-Core Xeon (2005)
Intel Quad-Core Xeon (late 2006)
AMD Quad-Core Opteron (2007)
8-Core (??)

Page 11: Thread

A thread is a sequential flow of instructions executed within a program.

Thread vs. Process
A process always has one main thread, which initializes the process and begins executing the instructions.

Any thread can create other threads within the process; the threads share the code and data segments, but each thread has its own stack.

Page 12: Threads in a Process

Page 13: Why Use Threads?

Threads are intended to improve the performance and responsiveness of a program.

Quick turnaround time: completing a single job in the smallest amount of time possible.
High throughput: finishing the most tasks in a fixed amount of time.

Page 14: Risks of Using Threads

If threads are not used properly, they can degrade performance and cause unpredictable behavior and error conditions:
Data races (race conditions)
Deadlock

They also impose extra burdens:
Code complexity
Portability issues
Testing and debugging difficulty

Page 15: Race Condition

A race condition occurs when two or more threads access a shared variable and at least one of them writes to it.

"It is nondeterministic!"

For example, suppose Thread A and Thread B both execute the statement:

area = area + 4.0 / (1.0 + x*x)

Page 16: (source: Intel Academy program)

Page 17: How to Deal with Race Conditions

Synchronization
Critical region
Mutual exclusion

Page 18: Concurrency vs. Parallelism

The two terms are often used interchangeably, but conventional wisdom draws the following distinction.

Concurrency: two or more threads are in progress simultaneously, normally on a single processor.
Parallelism: two or more threads are executed simultaneously on multiple cores.

Page 19: Performance Criteria

Speedup
Efficiency
Granularity
Load balance

Page 20: Speedup

The most noticeable quantitative measure compares the execution time of the best serial algorithm with that of the parallel algorithm.

Speedup = Ts / Tp
(Ts = serial time, Tp = parallel time)

Amdahl's Law:

Speedup = 1 / [S + (1 - S)/n + H(n)]

S: the fraction of time spent executing the serial portion
n: the number of cores
H(n): parallel overhead

Page 21: Example

Consider painting a fence. Suppose it takes 30 min to get ready to paint and 30 min to clean up afterwards. Painting one picket takes 1 min, and there are 300 pickets. What are the speedups when 1, 2, 10, and 100 painters do the job? What is the maximum speedup?

What if you use a spray gun to paint the fence? What happens if the fence owner uses a spray gun to paint the 300 pickets in 1 hour?

Page 22: Parallel Efficiency

A measure of how efficiently core resources are used during parallel computation.

In the previous example, suppose you learned that the painters were busy for an average of less than 6% of the total job time but were still paid for the whole time. Were you getting your money's worth from the 100 painters?

Efficiency = (Speedup / Number of Threads) * 100%

Page 23: Granularity

The ratio of computation to synchronization.

Coarse-grained: concurrent threads do a large amount of computation between synchronization events.
Fine-grained: concurrent threads do very little computation between synchronization events.

Page 24: Load Balance

Balancing the workload among multiple threads.

If more work is assigned to some threads, the more lightly loaded threads will sit idle until the heavily loaded ones finish.

All the cores must be kept busy to get maximum performance.

For load balancing, which task size is better: large or small?

Page 26: Computer Memory Hierarchy

[Diagram: CPU with per-core L1 caches, an L2 cache, main memory, and disk.]

Approximate access times:
L1 cache: a few cycles
L2 cache: up to ~10 cycles
Main memory: ~100s of cycles
Disk: ~1000s of cycles

Page 27: Architecture Considerations (1)

To obtain better performance, we need to understand how the work is done inside.

Cache
Cache line (cache block), e.g. 64 bytes: data moves between memory and the caches one cache line at a time.
Caches may be shared or separate between cores.
A cache miss is very costly.
Cache coherency must be maintained when caches are separate.
Replacement policies such as LRU

Page 28: Architecture Considerations (2)

Memory management
Paging
Translation look-aside buffer (TLB)

Inside the CPU
Registers

Page 29: False Sharing

Assume the cache line is 64 bytes. What happens if two threads try to execute the following at the same time?

Shared data:
int a[1000];
int b[1000];

Thread 1:
while (...)
    a[998] = i * 1000;

Thread 2:
while (...)
    b[0] = i;

Page 30: Poor Cache Utilization

What is the difference between the following two code fragments?

int a[1000][1000];
for (i = 0; i < 100; ++i)
    for (j = 0; j < 1000; ++j)
        a[i][j] = i*j;

int b[1000][1000];
for (i = 0; i < 100; ++i)
    for (j = 0; j < 1000; ++j)
        b[j][i] = i*j;

Page 31: Poor Cache Utilization - with eggs

(source : Intel Academy program)

Page 32: Good Cache Utilization – with eggs

(source : Intel Academy program)