HOW MANY CORES WILL WE NEED? IN SEARCH OF PARALLEL KILLER APPS CHIEN-PING LU, PHD MEDIATEK INC

Keynote (Dr Chien-Ping Lu) - How Many Cores Will We Need? - by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc.

Jan 13, 2015



Keynote presentation, How Many Cores Will We Need?, by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc., at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Transcript
Page 1: Keynote (Dr Chien-Ping Lu) - How Many Cores Will We Need? - by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc.

HOW MANY CORES WILL WE NEED? IN SEARCH OF PARALLEL KILLER APPS

CHIEN-PING LU, PHD MEDIATEK INC

Page 2

| HOW MANY CORES WILL WE NEED? | DECEMBER 11, 2013 | CONFIDENTIAL 2

A GROUP OF HIPPOS IS CALLED …

A Crash

Page 3

A GROUP OF CROWS IS CALLED …

A Murder

Page 4

A GROUP OF GIRAFFES IS CALLED …

A Tower

From Wikipedia

Page 5

SO, IT IS NOT SURPRISING THAT WE USE

“A Parade” of elephants “An Army” of ants “A Herd” of sheep

Page 6

FROM FREQUENCY TO MULTICORE SCALING

[Figure: performance vs. time. Single-core scaling stops at the power wall (2005) and multi-core scaling takes over; curves for power and frequency.]

Page 7

IT SEEMS INEVITABLE THAT WE WILL NEED A MASSIVE NUMBER OF CORES

[Figure: performance vs. time, moving from a moderate to a massive number of cores]

Page 8

DARK SILICON (OR DARK CORES)?

[Figure: performance vs. time, annotated with the multipliers 2x; 4x, 3x; 8x, 4x; 16x, 4x]

Page 9

HOW TO LIGHT UP THE CORES?

[Figure: power vs. degree of parallelism, bounded by a power ceiling and a parallelism wall, with big cores, little cores and SIMT "cores" placed along the parallelism axis; example workloads: H.264 encoding, ray tracing]

Redefine the cores to be heterogeneous

Search for parallel killer apps

Page 10

ARMY OF ANTS: SIMT CORES FOR SIMT (SINGLE-INSTRUCTION-MULTIPLE-THREAD) EXECUTION

[Figure: three clusters, each with one front end driving a row of ALUs, plus a wider SIMT variant and specialized processing engines (SPEs)]

SIMT is the execution model of HSA and is implemented in modern GPUs; it combines MIMD flexibility with SIMD efficiency

A cluster of SIMT cores shares one front end in a SIMD manner

A SIMT core runs 1 iteration of the parallel loop

A branch is emulated through divergence

Parallel.For (…)
    If (…) then … Else
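To make "a branch is emulated through divergence" concrete, here is a minimal cost-model sketch in Python. It assumes a hypothetical cluster in which every lane receives every instruction issued by the shared front end, and lanes whose condition does not match the path being issued are masked off but still occupy a slot. The names `simt_if_else` and `BranchCost` and the cycle counts are illustrative only, not taken from HSA or any GPU's documentation.

```python
from dataclasses import dataclass


@dataclass
class BranchCost:
    cycles: int        # cycles the shared front end spends issuing the If/Else
    useful_slots: int  # lane-cycles spent on instructions the lane actually needs
    issued_slots: int  # lane-cycles issued in total (useful + masked-off)

    @property
    def efficiency(self) -> float:
        return self.useful_slots / self.issued_slots


def simt_if_else(conds: list[bool], then_cycles: int, else_cycles: int) -> BranchCost:
    """Hypothetical cost of one If/Else on a SIMT cluster with one front end.

    conds[i] is lane i's branch condition. The front end issues the THEN path
    if any lane needs it, then the ELSE path if any lane needs it; during each
    path, the lanes that did not take it are masked off.
    """
    width = len(conds)
    n_then = sum(conds)
    n_else = width - n_then
    cycles = 0
    useful = 0
    if n_then:  # at least one lane takes THEN
        cycles += then_cycles
        useful += n_then * then_cycles
    if n_else:  # at least one lane takes ELSE
        cycles += else_cycles
        useful += n_else * else_cycles
    return BranchCost(cycles, useful, cycles * width)


if __name__ == "__main__":
    uniform = simt_if_else([True] * 8, then_cycles=10, else_cycles=10)
    divergent = simt_if_else([True, False] * 4, then_cycles=10, else_cycles=10)
    print(uniform.cycles, uniform.efficiency)      # 10 1.0
    print(divergent.cycles, divergent.efficiency)  # 20 0.5
```

When all lanes agree, the cluster pays for one path at full efficiency; when they split, it pays for both paths and half of the issued slots do no useful work. Under these assumptions, that is the price of sharing one front end across many "cores".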

Page 11

MASSIVELY PARALLEL WORKLOADS

• The problem size N can keep growing

• The visible serial workload s can be kept constant

• The parallel workload is sped up by a factor of P, the number of cores

• The reduction overhead is proportional to log P, with factor r (that is, r log P)

• The workload is "embarrassingly" parallel when there is no reduction overhead (r = 0)

[Figure: execution time s + N on one core versus s + N/P + r log P on P cores; the difference is the time saved by P cores]

Page 12

REVISITING AMDAHL'S LAW

[Figure: three log-log plots of speedup vs. degree of parallelism (P = 1 to 8192), labeled s1 = 50%, r = 50% (first two) and s = 50%, r = 50% (third), with curves for N = 16, N = 64 and N = 256; the second and third plots also mark P = N.]

With the problem size normalized to N = 1:

$$\text{Speedup} = \frac{s + 1}{s + \frac{1}{P} + r \log P}$$

With problem size N:

$$\text{Speedup} = \frac{s + N}{s + \frac{N}{P} + r \log P}$$
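A minimal Python sketch of the problem-size-N formula, useful for reproducing curves like the ones plotted. Two assumptions: the logarithm is base 2 (a binary reduction tree; the slide does not state the base), and s = 0.5 and r = 0.5 are plugged in directly to mirror the 50% labels, since the slide does not say how s and r are normalized.

```python
import math


def speedup(s: float, n: float, p: int, r: float) -> float:
    """Speedup of P cores over 1 core: (s + N) / (s + N/P + r*log P).

    Assumption: log is base 2 (binary reduction tree); the slide does
    not state the base.
    """
    if p < 1:
        raise ValueError("P must be at least 1")
    t_one = s + n                          # serial work + all parallel work on 1 core
    t_many = s + n / p + r * math.log2(p)  # serial + parallel split P ways + reduction
    return t_one / t_many


if __name__ == "__main__":
    s, r = 0.5, 0.5
    for n in (16, 64, 256):
        row = [speedup(s, n, p, r) for p in (1, 4, 16, 64, 256, 1024, 8192)]
        print(f"N={n:3d}: " + "  ".join(f"{x:6.2f}" for x in row))
```

With these assumed parameters, the computed speedup for each N rises until roughly P = N to 2N and then falls as the r log P reduction term dominates. Setting r = 0 (embarrassingly parallel) removes the fall-off, and the curve keeps rising toward (s + N)/s.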

Page 13

GRAPHICS KEEP MOVING

Pac-Man, 1980

GL benchmark 2.1 Egypt

GL benchmark 2.5 Egypt

GFX bench 2.7 T-Rex

GFX bench 3.0 Manhattan

Mobile 3D Graphics

Pac-Man: the highest-grossing video game of all time, recognized by 94% of American consumers

Page 14

MEDIATEK FACE BEAUTIFICATION

WHEN IT COMES TO BEAUTY, THERE SEEMS TO BE NO LIMIT

[Images: Before; Skin tone adjustment; Wrinkle removal; Thinner face, bigger eyes]

Page 15

HIGH-PERFORMANCE COMPUTING (HPC) KEEPS SCALING OUT

HPC from 1993 to 2012:
• GFLOPS ~ 130,000x
• Cores ~ 11,000x
• GHz ~ 10x

Higher grid resolution

More time steps

More atoms

[Figure: top system of the Top500 list, 1993 to 2012: GFLOPS, cores and GHz relative to 1993, on a log scale]
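The three growth factors can be combined into a rough per-core figure. This is simple division of the slide's rounded numbers and ignores how the factors interact:

$$\frac{\text{GFLOPS growth}}{\text{core-count growth}} \approx \frac{130{,}000}{11{,}000} \approx 12\times \text{ per core}, \qquad \text{vs. } \approx 10\times \text{ in GHz}$$

On this rough reading, per-core throughput grew by about the same order as clock frequency, and nearly all of the remaining growth came from adding cores.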

Page 16

THE MISSING LINKS

IN SEARCH OF PARALLEL KILLER APPS

[Diagram: Moore’s law; higher frequency; more cores; more complex software; bigger problems; bigger data; better user experience]

What bigger problems can we solve with bigger data?

How does solving bigger problems lead to a better user experience?

Mining bigger data with Machine Learning

Page 17

MACHINE LEARNING: TREND PREDICTION WITH POWERFUL MODELS

Powerful models (with many knobs) tend to over-fit the noise if the data set is not sufficiently large

The explosive growth of data has made powerful models feasible

In the Google Brain experiment, a model with 1 billion knobs, trained on 10 million images from YouTube, figured out the concepts of cats and human faces by itself

[Figure: linear, 2nd-order polynomial and 6th-order polynomial fits to 4 data samples (x from 0 to 6). The 6th-order polynomial undulates excessively with only 4 samples.]

Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning
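To illustrate the knobs-versus-samples point, here is a minimal pure-Python sketch using four hypothetical noisy samples (made up for illustration, not the slide's data). A cubic has four knobs, as many as there are samples, so it passes through every sample exactly and absorbs the noise; a two-knob line cannot. The slide's 6th-order fit has even more knobs than samples. Lagrange interpolation and closed-form least squares stand in for whatever fitting tool produced the chart.

```python
def fit_exact(xs, ys):
    """Lagrange interpolation: the unique polynomial of degree len(xs)-1 through all samples.

    It has as many knobs (coefficients) as there are samples, so it fits them exactly.
    """
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def fit_line(xs, ys):
    """Closed-form least-squares line: only two knobs (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def sse(model, xs, ys):
    """Sum of squared errors on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Four hypothetical noisy samples of a roughly linear trend (made up for illustration).
XS = [0.0, 2.0, 4.0, 6.0]
YS = [1.0, 2.9, 5.2, 6.8]

if __name__ == "__main__":
    cubic, line = fit_exact(XS, YS), fit_line(XS, YS)
    print("training SSE  cubic:", sse(cubic, XS, YS), " line:", sse(line, XS, YS))
    print("at x = 8      cubic:", cubic(8.0), " line:", line(8.0))
```

The cubic's training error is zero, yet just outside the samples it bends away from the trend (about 6.6 at x = 8, versus about 8.9 for the line). More samples, not fewer knobs, is what lets a powerful model avoid this.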

Page 18

HOW TO DISTINGUISH CATS FROM DOGS?

ASIRRA (Animal Species Image Recognition for Restricting Access), from Microsoft Research

Page 19

CAN ASIRRA BE CRACKED?

Page 20

WHY IS IT HARD?

Source: training set of Kaggle.com Dogs vs. Cats competition

Page 21

IS THERE A MODEL THAT CAN FIGURE OUT THAT THESE ARE THE SAME DOG?

Prancer, a 5-year-old toy poodle, before and after grooming

Page 22

MINE THE SOLUTIONS FROM THE DATA

Dog-cat classifier

Theory of the differences between dogs and cats?

Machine Learning: learn from many (12,500) photos labeled as dogs or cats

Page 23

SMART AND SMARTER CLIENTS IN THE ERA OF BIG DATA

[Diagram: a client (sensing, connectivity) gets an answer from a powerful model built by machine learning on big data (big training set); a smarter client (better sensing, better connectivity) gets a better answer from a bigger model built by bigger machine learning on bigger data (bigger training set). Also labeled: input data; local machine learning, in the cloud or the clients.]

Page 24

PARALLEL COMPUTING IN THE CLOUD AND AT THE CLIENTS

Samples $(x_n, y_n)$, knobs $a_i$, model $y = f(x; a_i)$

Machine Learning: tweak the knobs $a_i$ to minimize the error between $y_n$ and $f(x_n; a_i)$

Examples:
• Dog/cat photos → dog or cat
• Sensor readings → jogging, walking or driving

Cloud: parallel computing with more samples

Client: parallel computing with more knobs
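"Tweak the knobs to minimize the error" can be sketched with plain batch gradient descent on mean squared error, one common way to do it; the slide does not say which optimizer is meant. The two-knob linear model, the samples, the learning rate and the step count below are all hypothetical. The comment marks where the parallelism lives: each sample's residual is independent work, and summing them is a reduction.

```python
def model(x: float, knobs: list[float]) -> float:
    """Hypothetical two-knob model f(x; a) = a0 + a1 * x."""
    return knobs[0] + knobs[1] * x


def mse(samples, knobs):
    """Mean squared error between y_n and f(x_n; a) over all samples."""
    return sum((y - model(x, knobs)) ** 2 for x, y in samples) / len(samples)


def train(samples, lr=0.05, steps=2000):
    """Batch gradient descent: repeatedly nudge each knob against its error gradient."""
    knobs = [0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        # Each sample's residual is independent (parallel across samples);
        # summing them into a gradient is a reduction.
        residuals = [y - model(x, knobs) for x, y in samples]
        grad0 = sum(-2 * e for e in residuals) / n
        grad1 = sum(-2 * e * x for e, (x, _) in zip(residuals, samples)) / n
        knobs[0] -= lr * grad0
        knobs[1] -= lr * grad1
    return knobs


if __name__ == "__main__":
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # hypothetical, y = 1 + 2x
    a = train(data)
    print(a, mse(data, a))
```

More samples lengthen the per-sample loop (the cloud case), while more knobs add gradient components that can each be computed independently (the client case).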

Page 25

WHY HSA?

• Machine learning happens in the cloud and at the clients
• Models run in the cloud or at the clients
• We need the same ease of programming, and write-once-run-everywhere, for heterogeneous cores

MediaTek is one of the cofounders of the HSA Foundation

MediaTek is the first to introduce in a mobile SoC:
• True Octa-Core
• Heterogeneous Multiprocessing (HMP)

Page 26

SCALE OUT AND SCALE IN WITH HETEROGENEOUS CORES

• Both the cloud and mobile clients are limited by power
• Mobile devices need to keep cool in our palms
• Data centers need to keep our environment clean
• The carbon footprint of US datacenters is at the same level as that of the airline industry
• A 1,000 m² datacenter draws 1.5 MW, enough to power 1,000 typical US homes

[Image: 1,000 typical homes in the US]

In order to scale out, we need to scale in with heterogeneous cores in the cloud and in our palms
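A quick consistency check on the datacenter figure, assuming the 1.5 MW is drawn continuously for a full year (8,760 hours) and split evenly across the 1,000 homes:

$$\frac{1.5\ \text{MW}}{1{,}000\ \text{homes}} = 1.5\ \text{kW per home}, \qquad 1.5\ \text{kW} \times 8{,}760\ \text{h} = 13{,}140\ \text{kWh} \approx 13\ \text{MWh per home per year}$$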

Page 27

BACKUP

Page 28

THE NEW VIRTUOUS CYCLE

[Diagram: Moore’s law and beyond; more heterogeneous cores; mining bigger data with Machine Learning; bigger data; better user experience]

PERHAPS, LEADING TO COMPUTING LIKE OUR BRAIN

Page 29

MASSIVELY PARALLEL WORKLOADS

• The problem size N can keep growing

• The serial workload s can be kept constant

• The parallel workload is sped up by a factor of P, the number of cores

• The reduction overhead is proportional to log P, with factor r (that is, r log P)

• The workload is "embarrassingly" parallel when there is no reduction overhead (r = 0)

[Figure: execution time s + N on one core versus s + N/P + r log P on P cores; the difference is the time saved by P cores]

Page 30

THE ELEPHANTS: CPU CORES FOR MULTIPLE-INSTRUCTION-MULTIPLE-DATA (MIMD) EXECUTION

[Figure: two groups of six CPU cores, each core with its own front end and ALU]

A CPU core runs 1 iteration of the parallel loop

The same color means the same piece of code

Retrofitted for moderately parallel workloads; not very efficient for massively parallel workloads

Parallel.For (i)
    If (…) … Else
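A minimal Python sketch of the MIMD model: each iteration of a hypothetical `Parallel.For` body is handed to a worker that follows only its own branch, with no masking. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` to stay self-contained. In CPython, threads share the global interpreter lock, so this shows the programming model rather than true multi-core speedup; a `ProcessPoolExecutor` under an `if __name__ == "__main__":` guard is the usual choice for CPU-bound work. The loop body `body` is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def body(i: int) -> int:
    """One iteration of a hypothetical Parallel.For body with a data-dependent branch."""
    if i % 3 == 0:   # THEN path: only iterations that take it execute it
        return i * i
    else:            # ELSE path
        return -i


def parallel_for(n: int, workers: int = 4) -> list[int]:
    """Run iterations 0..n-1 on a pool of workers; each follows only its own branch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))  # map returns results in iteration order


if __name__ == "__main__":
    print(parallel_for(10))
```

Because each worker has its own instruction stream, a divergent branch costs nothing extra, but every worker carries the full cost of a front end, which is why the slide calls these cores inefficient for massively parallel workloads.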

Page 31

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.