Power management in embedded multi-core architectures
Karim Ben Chehida, Raphaël David
CEA LIST, Saclay

May 11, 2018


Power management in embedded multi-core architectures

Karim Ben Chehida, Raphaël David

CEA LIST, Saclay

Outline

•  Application and architectural trends

•  The SCMP architecture

•  LIST's former power management solutions

•  New challenges

Embedded application trends

•  Embedded systems must support various applications

•  They need more computing power

[Figure: computing power needed by embedded applications, on a scale from 0.1 GOPS to 1 TOPS. Domains: Multimedia, Telecom. Application labels: HD Audio, MPEG2, H264, Digital TV, Mobile multimedia, 3D Graphics, OpenGL 1.1, OpenGL 2.0, GSM, GPRS, EDGE, UMTS, WIMAX, 3GPP-LTE, DVB-S2, SDR.]

Task management in MPSoC architectures for embedded applications

•  Applications are more and more dynamic
  –  Dynamic control flow
  –  Data-dependent processing

•  Systems are becoming less and less predictable
  –  Variability
  –  Defects
  –  Aging

•  Dynamic load balancing is needed to improve the processor utilization rate
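As an illustration of why dynamic load balancing helps with data-dependent workloads, the sketch below compares a static partition with a greedy dynamic assignment. The task durations and both function names are hypothetical and do not come from the slides; only the setting of 128 tasks on 8 processors mirrors the connected-component labeling experiment in this deck.

```python
import heapq


def static_makespan(task_times, n_pe):
    """Each PE gets a fixed contiguous block of tasks, decided before execution."""
    chunk = -(-len(task_times) // n_pe)  # ceiling division
    return max(sum(task_times[i:i + chunk])
               for i in range(0, len(task_times), chunk))


def dynamic_makespan(task_times, n_pe):
    """Each task goes to whichever PE becomes free first (greedy list scheduling)."""
    free_at = [0.0] * n_pe
    heapq.heapify(free_at)
    for duration in task_times:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


# Hypothetical data-dependent workload: 128 tasks, the first 16 are 5x heavier.
TASKS = [5.0] * 16 + [1.0] * 112

print(static_makespan(TASKS, 8))   # 80.0
print(dynamic_makespan(TASKS, 8))  # 24.0
```

With this made-up workload the static split finishes only when its most loaded processor does, while the dynamic assignment reaches the ideal 192 / 8 = 24 time units.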

[Figure: connected-component labeling algorithm. Execution time (ms, axis from 1 to 4) per image over roughly 1,500 images, for a parallelization on 8 processors (128 tasks). Execution time ranges from 1.3 ms to 3.8 ms, a factor of about 3.]

Task management implementation issues

•  Software management of tasks is, however, not free
  –  Low reactivity
  –  Low transistor and silicon efficiency
  –  Overhead that is hard to predict because it depends on the workload

•  Benefits of hardware acceleration
  –  Overlap between control and computation activities
  –  Determinism
  –  Reactivity
  –  Low cost

[Figure: the scheduler and time-tick processing overheads in MicroC/OS-II on a PowerPC. Source: A Configurable Scheduler for Real-Time Systems, ERSA03]

[Figure: task management layers. Application tasks; system interface (API, interrupt/signaling, HMI); OS services: allocation and scheduling, synchronizations, memory management, file management, system messaging, task management, time management, I/O management, internal communication management, resource sharing; HAL; processors. Hardware view: OS (host processor) and HW-RTOS for control, PEs and memories on a NoC for computing.]

Hardware support for task management

•  Full hardware solution
  –  For asymmetric approaches
  –  May need several hundred kgates, but supports very aggressive real-time scheduling approaches

•  HW-accelerated SW solution
  –  For SMP systems
  –  Less than 100 kgates
  –  Not as smart, but allows secure sharing of system information and centralized signaling schemes

•  Mixed approach
  –  For multi-purpose asymmetric approaches
  –  Based on a small RISC processor with an optimized coprocessor interface

[Figure: hardware task-management controller. Block labels: PE and memory allocation, selection, control interface / CPL (CI), task execution and synchronization, CPU management, scheduling, PE control, fault-tolerance manager, event/interrupt manager, programmable notifier, synchronization registers updater, power management, internal registers, and the associated C/S registers (interrupt manager, fault-tolerance, programmable notifier, synchronization, power). Cluster side: cores with L1 caches, interconnect, cluster monitors, on/off control, and a memory interface to L2, L3 and external memory.]

SCMP: a new architecture for dynamic applications

•  New execution and programming model
  –  Simplify the task parallelism management
  –  Optimize the PE occupancy (load balancing)
  –  Asymmetric architecture with a global control
  –  Fast preemption and migration of tasks
  –  Explicit separation of control and computation execution
    –  Low control overhead (HW accelerator)
  –  Specific memory management
    –  Physically distributed and logically shared
    –  Write-exclusive accesses
    –  Data and instruction virtualization

[Figure: SCMP architecture. Block labels: CPU with real-time operating system, system bus, SCMP, OSoC (HW controller), PEs with instruction and data memories, interconnection resource, shared memories, Memory Configuration and Management Unit, GA blocks, I/O controllers.]

Power Management strategies in SCMP

•  Use all available hardware support
  –  Idle modes
  –  Variable voltage/frequency

•  Adapt power management strategies to application needs
  –  Real time
  –  Best effort in dataflow applications

[Figure: power versus time for a task with arrival time ai, WCET si and deadline di, leaving an idle period TIDLE. Panel 1, choice of the ideal low-power mode: TIDLE is spent in state PSMi, between a sleep and a wake-up transition. Panel 2, choice of the ideal slow-down factor: execution is slowed down or sped up to absorb TIDLE.]

ai: arrival time

si: WCET

di: deadline
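A minimal sketch of the two quantities behind the "Power Management strategies in SCMP" slide, using its notation (ai arrival time, si WCET, di deadline). It assumes that the task is alone on its processor between ai and di and that execution time scales inversely with clock frequency. The function names and the frequency table are hypothetical.

```python
def idle_time(a, s, d):
    """TIDLE left before the deadline when the task runs at full speed."""
    return (d - a) - s


def slow_down_factor(a, s, d):
    """Fraction of full speed at which the task finishes exactly at its deadline."""
    window = d - a
    if s > window:
        raise ValueError("task cannot meet its deadline even at full speed")
    return s / window


def pick_frequency(eta, f_max, levels):
    """Lowest available frequency that still provides the fraction eta of f_max."""
    feasible = [f for f in levels if f >= eta * f_max]
    if not feasible:
        raise ValueError("no frequency level is fast enough")
    return min(feasible)


# Hypothetical frequency levels in MHz, chosen only for illustration.
LEVELS_MHZ = [104, 208, 312, 416, 520, 624]

eta = slow_down_factor(a=0, s=4, d=10)
print(idle_time(0, 4, 10))                   # 6
print(pick_frequency(eta, 624, LEVELS_MHZ))  # 312
```

Because only discrete frequency levels exist, the selected level is usually faster than the ideal one, so some idle time remains and can still be spent in a low-power mode.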

The DPM techniques – Generalities

–  Low-power state parameters:

•  PS = power consumption in the state

•  ETR = transition energy consumption

•  TTR = transition latency

•  TBE = break-even time: the minimum time to spend in state S for the transition to be worthwhile

–  What is the idle period on processor P? (This requires predicting the future.)

–  Which low-power mode S is ideal for the predicted idle time?

[Figure: PXA270 power state machine (PSM); power versus time, showing an IDLE period spent in state S between a sleep and a wake-up transition]
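The break-even reasoning behind PS, ETR, TTR and TBE can be sketched as follows. This is a first-order model, not the authors' algorithm: it assumes the processor would otherwise stay idle at a constant power p_idle greater than PS, that ETR and TTR cover both entering and leaving the state, and that PS applies for the rest of the idle period. The state table values are invented for illustration and are not PXA270 figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowPowerState:
    name: str
    p_s: float   # PS: power drawn while in the state (W)
    e_tr: float  # ETR: energy spent entering and leaving the state (J)
    t_tr: float  # TTR: total transition latency (s)

    def break_even(self, p_idle):
        """TBE: shortest idle period for which entering the state saves energy."""
        t_energy = (self.e_tr - self.p_s * self.t_tr) / (p_idle - self.p_s)
        return max(self.t_tr, t_energy)

    def energy(self, t_idle):
        """Energy spent over t_idle when the state is entered."""
        return self.e_tr + self.p_s * (t_idle - self.t_tr)


def choose_state(states, p_idle, t_idle):
    """Return the state that minimises energy over t_idle, or None to stay idle."""
    best, best_energy = None, p_idle * t_idle
    for state in states:
        if t_idle >= state.break_even(p_idle) and state.energy(t_idle) < best_energy:
            best, best_energy = state, state.energy(t_idle)
    return best


# Hypothetical states, for illustration only.
STATES = [
    LowPowerState("standby", p_s=0.05, e_tr=0.002, t_tr=0.001),
    LowPowerState("sleep", p_s=0.001, e_tr=0.05, t_tr=0.1),
]

for predicted_idle in (0.005, 0.05, 2.0):
    chosen = choose_state(STATES, p_idle=0.3, t_idle=predicted_idle)
    print(predicted_idle, chosen.name if chosen else "stay idle")
```

The quality of the decision depends entirely on the predicted idle time: an over-estimate selects a deep state whose transition cost is never recovered.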


Idle mode management

•  In real-time systems
  –  Characterize offline (using WCET) how the parallelism rate of each application varies
  –  Detect these variations online and deduce the idle periods
  –  Activate the ideal low-power mode (modified LEA) and predict the corresponding wake-up time

•  In dataflow mode
  –  Switch to idle mode as soon as the data buffer fill level drops to a critically low threshold
  –  Wake up once the data buffer has been filled enough
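The dataflow idle-mode rule (sleep when the input buffer runs critically low, wake up once it has been refilled enough) can be sketched as a two-threshold controller. Using two distinct thresholds to avoid rapid sleep/wake toggling is an assumption here, as are the class name and the threshold values.

```python
class BufferGovernor:
    """Decides whether a consumer PE should be idle, based on buffer fill level."""

    def __init__(self, capacity, low, high):
        if not 0 <= low < high <= capacity:
            raise ValueError("thresholds must satisfy 0 <= low < high <= capacity")
        self.low = low      # at or below this fill level: go to idle mode
        self.high = high    # at or above this fill level: wake up
        self.asleep = False

    def update(self, fill):
        """Return True if the PE should be in idle mode for this fill level."""
        if not self.asleep and fill <= self.low:
            self.asleep = True
        elif self.asleep and fill >= self.high:
            self.asleep = False
        return self.asleep


# Hypothetical trace of buffer fill levels (in items).
governor = BufferGovernor(capacity=8, low=2, high=6)
trace = [governor.update(fill) for fill in [5, 3, 2, 1, 4, 6, 5, 2]]
print(trace)  # [False, False, True, True, True, False, False, True]
```

The gap between the two thresholds sets how long each idle period lasts, which matters because short idle periods may not amortize the transition cost of a low-power mode.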

Voltage and frequency scaling

•  In real-time applications
  –  Take advantage of slack times
  –  Limit the number of voltage/frequency changes
    –  Accumulate slack until DVFS can be activated for a long period of time
  –  Assign the slack to the next task on the same processor (with respect to resource dependencies) to avoid wasting slack time at join points in the application

•  In dataflow applications
  –  Adapt production and consumption rates to avoid stalls when the pipeline is unbalanced

[Figure: power (mW) versus time (0 to 14 s) for tasks T0, T1, T2 with deadlines D0, D1, D2, at power levels of 925 mW and 116 mW; one panel is labeled "Excess".]

Eth = 11 s × 925 mW = 10.175 J

EDVFS = 4.5 s × 925 mW + 6.5 s × 116 mW = 4.916 J (-52% energy)
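The energy figures on the voltage and frequency scaling slide can be re-derived with a few lines of arithmetic. The durations and power levels are the ones given on the slide; the helper name is hypothetical.

```python
P_FULL_W = 0.925  # 925 mW, higher power level on the slide
P_SLOW_W = 0.116  # 116 mW, lower power level on the slide


def energy_j(segments):
    """Sum duration (s) x power (W) over a list of (duration, power) segments."""
    return sum(duration_s * power_w for duration_s, power_w in segments)


e_th = energy_j([(11.0, P_FULL_W)])
e_dvfs = energy_j([(4.5, P_FULL_W), (6.5, P_SLOW_W)])
saving = 1.0 - e_dvfs / e_th

print(f"Eth = {e_th:.2f} J, EDVFS = {e_dvfs:.2f} J, saving = {saving:.0%}")
```

The saving comes from spending 6.5 s at the lower power level: total execution takes the same 11 s, but most of it is at roughly one eighth of the power.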

SCMP prototype

•  Complete FPGA prototype
  –  75 MHz prototypes on FCM4 boards from Scaleo Chip (Stratix III based)
  –  Including OSoC and additional power management accelerators
    –  For real-time DPM and DVFS management
  –  4 processing elements
    –  Based on SPARC processors
    –  With data prefetch engines + standard cache memories
  –  Multibanked memory system
    –  With HW support for dynamic allocation

[Figure: prototype block diagram. Block labels: PE0 to PE3, instruction caches ($I), data caches ($D), DPS, TLB, selection, multi-bus, RAM, ITMP, UGM, and an AHB with GPTimer, UART, debug and host interfaces.]

[Figure: running time (0 to 70 ms) versus PE number (1, 2, 4, 8). Legend: scheduling, wait for data, running labelling.]

Outline

•  Application and architectural trends

•  The SCMP architecture

•  LIST's former power management solutions

•  New challenges

Application domain specific systems

•  Functional heterogeneity
  –  How can load balancing be supported in this context?
    –  Data access modes are IP-specific
    –  Binaries are IP-specific
    –  Performance predictions are IP-specific
    –  …
  –  How can power be managed in this context?
    –  Very simple and drastic (on/off)
    –  Selective clock and power gating at the application sub-system level

From multi- to many-cores

•  A many-core fabric
  –  Homogeneous or heterogeneous resources
  –  Not clustered by application

•  Dynamic management of the fabric makes it possible to support:
  –  Faults (permanent or transient, aging…)
  –  Complex applications that are not fully predictable at compile time
    –  Load balancing
    –  Power management

•  Possible approaches for complexity management
  –  Globally static, locally dynamic management:
    –  Dynamic application deployment
    –  No migration between clusters
  –  Globally dynamic management:
    –  Dynamic application deployment with possible inter-cluster migration, or
    –  Dynamic task deployment (at task creation)

[Figure: many-core fabric made of identical clusters (the same cluster labels repeat 20 times in the extracted text). Labels per cluster: 8 cores, each with an L1 cache; L2; Synchronizer; XFC NI.]

From multi- to many-cores

•  Advanced process variability
  –  From a design based on the worst case to a design based on the average case with online adaptation
    –  For each processor, the frequency is evaluated and corrected online
    –  Implies a GALS operating model
    –  Support for DVFS modes tends to become widespread in complex systems
    –  Need for low-footprint HW support for voltage and frequency adaptation (from DC-DC VDD hopping)
  –  High performance heterogeneity
    –  Need to reconsider the performance homogeneity hypothesis made in the large body of scheduling literature
    –  Tighter coupling of power management policies with the allocation phase:
      –  With the load balancing policies
      –  With the system monitoring
      –  With the thermal management policies

•  Effect of temperature on static power consumption, and therefore on overall power consumption

•  Effect of the (frequency, voltage) operating point on temperature

Conclusion

•  The transition from multi-core to many-core systems is not straightforward as far as dynamic system management is concerned:
  –  HW acceleration of control primitives can lead to high controllability
    –  At the lower hierarchy level, for which it can be well dimensioned

•  Power management policies targeting many-core systems must take process variability into account
  –  Reactive and proactive policies can be considered

•  Dynamic power management in many-core systems is more tightly coupled to the task allocation phase
  –  With or without load balancing techniques

•  The effect of temperature on power consumption, and vice versa, is not well estimated for 32 nm and beyond…
  –  Benefits of coupling the thermal and power management techniques…