EUROPE 2012 (LCE12) big.LITTLE mini-summit Amit Kucheria, Power Management Tech Lead LCE 2012, Copenhagen
LCE12: big.LITTLE Mini-Summit

Jun 13, 2015

Technology

Linaro

Resource: LCE12
Name: big.LITTLE Mini-Summit
Date: 01-11-2012
Speaker: Amit Kucheria
Transcript
Page 1: LCE12: big.LITTLE Mini-Summit

EUROPE 2012 (LCE12)

big.LITTLE mini-summit
Amit Kucheria, Power Management Tech Lead

LCE 2012, Copenhagen

Page 2: LCE12: big.LITTLE Mini-Summit

www.linaro.org

Asymmetric cores

Capacity
Energy
Latencies
Operating points
Cache types

Page 3: LCE12: big.LITTLE Mini-Summit

In-kernel Switcher (IKS)

Pros:
Minimal kernel changes
Available now through Linaro

Cons:
Half the cores used

Page 4: LCE12: big.LITTLE Mini-Summit

Heterogeneous MP (HMP)

Pros:
All cores can be used

Cons:
Large changes to Linux kernel
Production-ready only next year

Basic feature-set for partners 1Q 2013
Upstreaming - several months
Optimisations

Page 5: LCE12: big.LITTLE Mini-Summit

Being a catalyst...

Solving long-standing problems
Better CPU quiesce
Better scheduling

Useful for SMP (A9, A15)

Page 6: LCE12: big.LITTLE Mini-Summit

Mini-summit agenda

Plenary – Robin Randhawa
Whirlwind tour of experimental results on TC2

Session 1 (09:00 – 09:55)
Status overview
Making Linux work with asymmetric systems

Session 2 (10:00 – 10:45)
The Bluesky session: What would the ideal power-aware kernel do? (45 mins)

Session 3 (11:00 – 11:55)
Back to reality: What do we have today and the sequence of steps to get to where we want to be (55 mins)

Session 4 (12:00 – 13:00)
Workloads and Test Automation (30 mins)
General Discussions on further work and Wrap-Up (30 mins)

Page 7: LCE12: big.LITTLE Mini-Summit

big.LITTLE on TC2

Robin Randhawa

Page 8: LCE12: big.LITTLE Mini-Summit

ARM’s Test Chip 2 (TC#2): An Overview

A publicly available Versatile Express core tile

Capabilities:

2 x A15 (r2p1) @ up to 1.2 GHz

3 x A7 (r0p1) @ up to 1 GHz

CCI/DMC/GIC/ADB (r0p0)

DMA (PL330)

2 GB external DDR2 memory @ 400 MHz

64 KB internal SRAM

CoreSight debug (including JTAG and ITM trace but no STM)

No GPU

cpufreq support: Independent for each cluster with limited voltage scaling

cpuidle support: Cluster power gating

TC2

Page 9: LCE12: big.LITTLE Mini-Summit

IKS: CPU Migration

big.LITTLE extends DVFS

DVFS algorithm monitors load on each CPU

When load is low it can be handled on a LITTLE processor

When load is high the context is transferred to a big processor

The unused processor can be powered down

When all processors in a cluster are inactive the cluster and its L2 cache can be powered down
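
As a rough illustration of the migration policy on this slide, the sketch below models one logical CPU backed by a LITTLE core and a big core, and picks which of the two should run based on measured load. The one-to-one pairing, the thresholds `UP_THRESHOLD` and `DOWN_THRESHOLD`, and every name in the code are assumptions made for illustration; none of them is taken from the Linaro switcher code.

```python
from dataclasses import dataclass

# Hypothetical thresholds in percent load; real systems tune these per platform.
UP_THRESHOLD = 80    # above this, transfer the context to the big core
DOWN_THRESHOLD = 30  # below this, transfer the context back to the LITTLE core


@dataclass
class CorePair:
    """One logical CPU backed by a LITTLE core and a big core (assumed pairing).

    Only the core named in `active` is powered; the other one can be off.
    """
    active: str = "LITTLE"

    def update(self, load_percent: float) -> str:
        """Choose the core for the next interval from the measured load."""
        if self.active == "LITTLE" and load_percent > UP_THRESHOLD:
            self.active = "big"
        elif self.active == "big" and load_percent < DOWN_THRESHOLD:
            self.active = "LITTLE"
        return self.active


def cluster_can_power_down(pairs, cluster):
    """A cluster and its L2 cache can be powered down when none of its cores is active."""
    return all(pair.active != cluster for pair in pairs)


pairs = [CorePair(), CorePair()]
for load in (10, 95, 50, 20):
    pairs[0].update(load)
```

The gap between the two thresholds gives hysteresis, so a load hovering around a single value does not bounce the context back and forth between cores.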

Page 11: LCE12: big.LITTLE Mini-Summit

IKS: Results for Audio on TC2

Power compared to executing the use case on A15

IKS does not use A15s during Audio run

70% saving

TC2: A15 up to 1.2 GHz, A7 up to 1 GHz. Better results expected on representative silicon.

Page 12: LCE12: big.LITTLE Mini-Summit

IKS: Results for BBench + Audio on TC2

Performance is measured from BBench page-loading times

Results are normalised to the power consumed and performance achieved by the same use case run on A15 only

BBench page + Audio

TC2: A15 up to 1.2 GHz, A7 up to 1 GHz. Better results expected on representative silicon.

Page 13: LCE12: big.LITTLE Mini-Summit

IKS: Hispeed2

Page 14: LCE12: big.LITTLE Mini-Summit

IKS: Results: BBench + Audio

Power improves with no performance cost

BBench page + Audio

TC2: A15 up to 1.2 GHz, A7 up to 1 GHz. Better results expected on representative silicon.

Page 15: LCE12: big.LITTLE Mini-Summit

MP solution – more details

Scheduler modifications:

Treat big and LITTLE CPUs as separate scheduling domains.

Use PJT's load-tracking patches to track individual task load.

Migrate tasks between the big and the LITTLE domains based on task load.

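[Figure: LITTLE (L) and big (B) scheduling domains, each with its own load balancing, and load-based task migration between them. Plot of task load against task state (executing, sleep), showing load decay.]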
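
The idea of a per-task load that grows while the task executes and decays while it sleeps, with migration driven by that load, can be sketched as follows. This is a simplified model, not the scheduler code: the geometric decay with a half-life of 32 periods is an assumption chosen to resemble per-entity load tracking, and the thresholds and function names are hypothetical.

```python
# Assumption: the tracked load decays geometrically, halving every 32 periods.
DECAY = 0.5 ** (1 / 32)

# Hypothetical migration thresholds on a 0..1 load scale.
UP_THRESHOLD = 0.8
DOWN_THRESHOLD = 0.2


def update_load(load: float, running: bool) -> float:
    """Advance the tracked load by one period.

    A running task pulls the load towards 1.0; a sleeping task lets it decay
    towards 0.0.
    """
    contribution = (1.0 - DECAY) if running else 0.0
    return load * DECAY + contribution


def pick_domain(load: float, current: str) -> str:
    """Migrate between the LITTLE and big domains based on task load."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current


def simulate(states, domain="LITTLE", load=0.0):
    """Run through a sequence of per-period states (True = executing)."""
    for running in states:
        load = update_load(load, running)
        domain = pick_domain(load, domain)
    return load, domain


# A task that executes for 96 periods and then sleeps for 96 periods.
busy_load, busy_domain = simulate([True] * 96)
final_load, final_domain = simulate([True] * 96 + [False] * 96)
```

With these assumed parameters, a task has to run for a while before it is moved to the big domain, and it moves back only after its load has decayed during sleep.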

Page 16: LCE12: big.LITTLE Mini-Summit

MP: ARM TC2: Audio

Workload: Audio (mp3 playback)

Performance/Energy target: A7 energy

Status: Audio-related tasks do not use the A15s, but the power consumption is still significantly more than A7 alone.

MP not as power efficient as IKS yet

Todo: Target spurious wake-ups on A15. All the extra power comes from the A15s, which shouldn't be used at all.

Energy: A7 30.79%, MP 39.86%

[Figure: bar chart "Audio"; y-axis: Energy (0 to 100); bars: A15, A7 2CPU, IKS, MP]
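
To put the two energy figures side by side, the short calculation below works out how much more energy MP uses than A7 alone. It assumes both percentages are on the same scale, as they appear on the same chart; the function name is made up for this example.

```python
def extra_energy(a7_percent: float, mp_percent: float):
    """Return the MP overhead in percentage points and relative to A7."""
    points = mp_percent - a7_percent
    relative = mp_percent / a7_percent - 1.0
    return points, relative


# Values taken from the slide.
points, relative = extra_energy(30.79, 39.86)
print(f"MP overhead: {points:.2f} points, {relative:.1%} more than A7 alone")
```

On these numbers MP uses about 9 percentage points more energy, which is roughly 29% more than A7 alone.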

Page 17: LCE12: big.LITTLE Mini-Summit

MP: Audio workload analysis

Where is the extra energy spent with MP?

Need to look at why the A15s consume power when they are not needed

We see unwarranted wake-ups on A15

No user threads running on A15

Tend to favour CPU0

Examples:

tick_sched_timer (99.7% on CPU0)

Hrtimers

Workqueue

[Figure: bar chart "Audio energy breakdown"; y-axis: Energy (0 to 1.6); bars: A7, MP; series: A15 cluster, A7 cluster]
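
A figure such as "99.7% on CPU0" is a per-CPU share of events for one wake-up source. The sketch below shows one way such a share could be computed from a list of `(cpu, source)` pairs. The input format and the function name are hypothetical, and the sample events are invented for illustration; they are not measurements from TC2.

```python
from collections import Counter, defaultdict


def share_per_cpu(events):
    """Return {source: {cpu: fraction of that source's events seen on cpu}}.

    `events` is an iterable of (cpu, source) pairs, for example extracted
    from a trace by some other tool (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for cpu, source in events:
        counts[source][cpu] += 1
    return {
        source: {cpu: n / sum(per_cpu.values()) for cpu, n in per_cpu.items()}
        for source, per_cpu in counts.items()
    }


# Invented sample data: three timer events on CPU0, one on CPU1.
sample = [(0, "tick_sched_timer")] * 3 + [(1, "tick_sched_timer")]
shares = share_per_cpu(sample)
```

A source whose share is concentrated on a CPU in the A15 cluster is a candidate for the spurious wake-ups described above.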

Page 18: LCE12: big.LITTLE Mini-Summit

MP – Top Issues

Spurious wakeups: A15s are woken up by scheduler ticks (mainly)

Workqueues

Timers

RCU

Scheduler ticks

CPU wakeup prioritisation: pick the cheapest target CPU

Balancing

Scale invariance

Load accumulation rate

Spread load to A7s when A15s are overloaded

Pack vs. spread

Cluster aware cpufreq governors
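
The "pick the cheapest target CPU" item can be illustrated with a small cost model. Everything in the sketch below is an assumption made for illustration: the `Cpu` fields, the relative cost values and the tie-breaking rule are invented, and a real implementation would need measured wake-up costs.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    cluster: str           # "A7" or "A15"
    idle: bool             # True if the CPU is currently idle
    cluster_powered: bool  # False if the whole cluster is power-gated


def wakeup_cost(cpu: Cpu) -> int:
    """Hypothetical relative cost of running new work on this CPU."""
    if not cpu.idle:
        return 0  # already running, no wake-up needed
    cost = 1 if cpu.cluster == "A7" else 4  # assumed: waking an A15 costs more
    if not cpu.cluster_powered:
        cost += 10  # assumed penalty for powering a cluster and its L2 cache
    return cost


def cheapest_target(cpus):
    """Pick the CPU with the lowest wake-up cost, lowest id on ties."""
    return min(cpus, key=lambda c: (wakeup_cost(c), c.cpu_id))


cpus = [
    Cpu(0, "A15", idle=True, cluster_powered=False),
    Cpu(1, "A15", idle=True, cluster_powered=False),
    Cpu(2, "A7", idle=True, cluster_powered=True),
    Cpu(3, "A7", idle=True, cluster_powered=True),
]
target = cheapest_target(cpus)
```

With this model an idle A7 in a powered cluster is preferred over an A15 in a power-gated cluster, which is the behaviour the audio workload needs. The model ignores load, so it does not address the pack vs. spread question on its own.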