Energy-Optimal Software Partitioning in Heterogeneous Multiprocessor Embedded Systems. Michel Goraczko, Jie Liu (Microsoft Research, Redmond); Dimitrios Lymberopoulos (Yale University); Slobodan Matic (UC Berkeley); Bodhi Priyantha, Feng Zhao (Microsoft Research, Redmond). Presentation at DAC 2008, Anaheim, CA, June 10th, 2008.


Mar 27, 2015

Transcript
Page 1

Michel Goraczko, Jie Liu (Microsoft Research, Redmond)

Dimitrios Lymberopoulos (Yale University)

Slobodan Matic (UC Berkeley)

Bodhi Priyantha, Feng Zhao (Microsoft Research, Redmond)

Presentation at DAC 2008, Anaheim, CA

June 10th, 2008


Energy-Optimal Software Partitioning in Heterogeneous Multiprocessor Embedded Systems

Page 2

Energy Usage in Embedded Applications

Low duty cycle monitoring for long battery life

High throughput for real-time processing of critical events

Mobile devices, patient monitoring, smart environments

Page 3

Energy Performance Diversity

• A single processor with dynamic voltage and frequency scaling (DVFS) may not be flexible enough:

― Energy efficiency varies widely across embedded processors

― Wake-up latency and energy costs are non-trivial

Benchmark  Platform  Execution time  Energy     Relative speed  Relative efficiency
FFT        PXA255    24.8 us         12.9 uJ    207.7           28.4
FFT        ARM7      330 us          49.2 uJ    15.6            7.5
FFT        Atmega    5.15 ms         367 uJ     1               1
CRC-32     PXA255    325 us          166.9 uJ   516.9           70.7
CRC-32     ARM7      4.8 ms          699 uJ     35              17.6
CRC-32     Atmega    168 ms          11.8 mJ    1               1
FIR        PXA255    94.5 us         45.8 uJ    153.4           20.4
FIR        ARM7      1.2 ms          187 uJ     12.1            5
FIR        Atmega    14.5 ms         934 uJ     1               1
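The relative columns appear to be ratios against the Atmega baseline, since the Atmega row is 1 and 1. Under that reading, relative speed = Atmega time / platform time and relative efficiency = Atmega energy / platform energy. This interpretation is our assumption. The small Python sketch below checks it against the FFT rows. The function name `relative_metrics` is hypothetical.

```python
# FFT measurements from the table: (execution time in microseconds, energy in microjoules).
FFT = {
    "PXA255": (24.8, 12.9),
    "ARM7": (330.0, 49.2),
    "Atmega": (5150.0, 367.0),
}


def relative_metrics(results, baseline="Atmega"):
    """Return {platform: (relative_speed, relative_efficiency)} against a baseline.

    Assumed definitions: speed = baseline time / platform time,
    efficiency = baseline energy / platform energy.
    """
    base_time, base_energy = results[baseline]
    return {
        name: (base_time / time, base_energy / energy)
        for name, (time, energy) in results.items()
    }


if __name__ == "__main__":
    for name, (speed, eff) in relative_metrics(FFT).items():
        print(f"{name:7s} relative speed {speed:6.1f}  relative efficiency {eff:5.1f}")
```

For the PXA255 this gives 207.7 and 28.4, and for the ARM7 15.6 and 7.5, matching the FFT rows above.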

Page 4

Heterogeneous Multi-Processor Platforms

UCLA LEAP Platform, MSR mPlatform

Page 5

Outline

Introduction

Design Flow

Power State Machine

ILP Formulation and Optimization

A Sound Source Localization Case Study

Page 6

Software Partitioning Problem

Given a time-sensitive application, allocate software components to different processors to minimize energy consumption without violating timing constraints.

Figure: design flow. Timing analysis of the tasks on each processor mode produces task timing. Partitioning combines the task timing, the application structure/requirements, and the power model to produce task-processor-mode assignments.
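To make the problem statement concrete, here is a minimal Python sketch that solves a toy instance by exhaustive search. This is not the ILP the slides use. It rests on several assumptions:

- The two tasks (`filter`, `detect`), the processors (`big`, `little`) and all numbers are hypothetical.
- The tasks form a chain that runs back to back.
- Energy is active power times execution time.
- Wake-up costs, idle power and parallelism are ignored.

```python
from itertools import product

# Hypothetical toy instance (not from the slides). Each task lists its
# options as (processor, mode, time_ms, power_mw). The tasks form a chain,
# so the end-to-end time is the sum of the chosen execution times.
OPTIONS = {
    "filter": [("big", "fast", 2.0, 100.0), ("little", "slow", 10.0, 5.0)],
    "detect": [("big", "fast", 3.0, 100.0), ("little", "slow", 20.0, 5.0)],
}


def best_partition(options, deadline_ms):
    """Exhaustive search over task -> (processor, mode) choices.

    Returns (energy_uj, assignment) for the lowest-energy choice whose total
    time meets the deadline, or None if no choice is feasible.
    """
    tasks = list(options)
    best = None
    for choice in product(*(options[t] for t in tasks)):
        total_ms = sum(opt[2] for opt in choice)
        if total_ms > deadline_ms:
            continue  # violates the end-to-end deadline
        energy_uj = sum(opt[2] * opt[3] for opt in choice)  # ms * mW = uJ
        if best is None or energy_uj < best[0]:
            best = (energy_uj, {t: opt[:2] for t, opt in zip(tasks, choice)})
    return best


if __name__ == "__main__":
    for deadline in (30.0, 25.0, 10.0, 4.0):
        print(deadline, best_partition(OPTIONS, deadline))
```

The results for this toy instance:

- With a 30 ms deadline, both tasks take the slow option (150 uJ).
- At 25 ms, `filter` is forced onto the fast option (300 uJ).
- At 4 ms, nothing is feasible.

The search visits every combination, so its cost grows as (options per task)^(number of tasks). That growth is why realistic instances call for a formulation such as ILP.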

Page 7

Power State Machines

Figure: ARM7 power state machine.

States: STBY (~0 mW), IDLE (0.25 mW), 60 MHz (141 mW), 30 MHz (72 mW), 7.5 MHz (20 mW).

Labeled transition costs: 1.53 mJ / 24.5 ms, 0.1 mJ / 1.4 ms, and 1.47 mJ / 23.8 ms. The remaining transitions are marked negligible.
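To see why wake-up costs matter, here is a minimal Python sketch of the energy of one period. The processor runs a single task and then either stays in IDLE or drops to STBY and pays a wake-up. The sketch makes these assumptions:

- The 1.53 mJ / 24.5 ms transition is STBY to 60 MHz, consistent with the "to full speed" row of the hardware model.
- IDLE-to-active transitions cost nothing.
- The period is 128 ms.

The function `period_energy_mj` is hypothetical.

```python
# ARM7 figures from the power state machine slide.
RUN_60MHZ_MW = 141.0          # active power at 60 MHz
IDLE_MW = 0.25                # IDLE state
STANDBY_MW = 0.0              # STBY is "~0 mW"; treated as exactly 0 here
WAKE_TO_60MHZ = (1.53, 24.5)  # (energy mJ, latency ms); assumed to be STBY -> 60 MHz


def period_energy_mj(run_ms, period_ms, run_mw, rest_mw, wake=(0.0, 0.0)):
    """Energy (mJ) for one period: optional wake-up, one run, then rest.

    Assumptions: a single task per period, the wake-up energy already covers
    the wake-up latency, and IDLE <-> active transitions are free.
    """
    wake_mj, wake_ms = wake
    rest_ms = period_ms - wake_ms - run_ms
    if rest_ms < 0:
        raise ValueError("run time plus wake-up latency exceeds the period")
    # mW * ms = microjoules, so divide by 1000 to get millijoules.
    return wake_mj + (run_mw * run_ms + rest_mw * rest_ms) / 1000.0


if __name__ == "__main__":
    fft_ms, period = 7.8, 128.0  # FFT at 60 MHz (task profiling slide); assumed period
    stay_idle = period_energy_mj(fft_ms, period, RUN_60MHZ_MW, IDLE_MW)
    sleep = period_energy_mj(fft_ms, period, RUN_60MHZ_MW, STANDBY_MW, WAKE_TO_60MHZ)
    print(f"idle between runs: {stay_idle:.3f} mJ, standby + wake-up: {sleep:.3f} mJ")
```

With these numbers, staying in IDLE costs about 1.13 mJ per period and dropping to standby about 2.63 mJ. Standby only pays off once the idle time per period exceeds roughly 1.53 mJ / 0.25 mW, which is about 6.1 s.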

Page 8

Software Model

Directed acyclic graph of tasks

Single-rate periodic execution

Known release time

Known end-to-end deadline

Worst-case execution time $T_{t,p,m}$ of task $t$ on processor $p$ in mode $m$

Pre-assignments
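This software model supports a quick necessary check before any partitioning: the longest path through the task DAG must fit within the end-to-end deadline. The software model is a DAG of tasks with a known release time, worst-case execution times and an end-to-end deadline.

The minimal Python sketch below uses the standard-library `graphlib`. It assumes every task has a processor to itself and communication is free, so the result is only a lower bound on any real schedule. The four-task graph, its WCETs and the function `earliest_finish` are hypothetical.

```python
from graphlib import TopologicalSorter


def earliest_finish(preds, wcet_ms, release_ms=0.0):
    """Earliest finish time (ms) of every task in a DAG.

    preds maps a task to the set of tasks it depends on; wcet_ms gives each
    task's worst-case execution time. Assumes no processor contention and
    zero communication delay, so this is a lower bound on real finish times.
    """
    finish = {}
    for task in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(task, ())), default=release_ms)
        finish[task] = start + wcet_ms[task]
    return finish


if __name__ == "__main__":
    # Hypothetical graph: a fans out to b and c, which join at d.
    preds = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    wcet = {"a": 2.0, "b": 3.0, "c": 5.0, "d": 1.0}
    finish = earliest_finish(preds, wcet)
    deadline = 10.0
    print(finish, "meets deadline:", max(finish.values()) <= deadline)
```

Here the path a -> c -> d sets the earliest completion at 8 ms. If that already exceeded the deadline, no task-processor-mode assignment with these WCETs could succeed.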

Page 9

ILP: Variables and Objective

Core binary variables, $O(n \cdot n_{p,m})$: task-to-processor assignment; task-to-mode assignment; task transition assignment.

Core integer variables, $O(n)$: task start time instances.

Derived variables, $O(n^2 n_p n_m)$: auxiliary variables needed to express the problem as an ILP.

Objective: minimize total energy per iteration.

Here $n$ is the number of tasks, $n_p$ and $n_m$ are the numbers of processors and modes, and $n_{p,m}$ is the number of processor-mode pairs.
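As an illustration only, the core of such an objective can be written as below. The notation is ours, not the slides':

- $x_{t,p,m} \in \{0,1\}$ is 1 when task $t$ runs on processor $p$ in mode $m$.
- $P_{p,m}$ is that mode's active power.
- $T_{t,p,m}$ is the task's worst-case execution time.
- $w_p$ is a hypothetical integer count of wake-ups of processor $p$ per iteration.
- $E^{\text{wake}}_p$ is its wake-up energy.

Because $P_{p,m} T_{t,p,m}$ is a constant, the objective stays linear in the variables. Idle energy and the scheduling constraints are left out of this sketch. The equality constraint corresponds to the one-processor, one-mode rule among the ILP constraints.

```latex
\begin{aligned}
\min_{x,\,w}\quad & \sum_{t}\sum_{(p,m)} P_{p,m}\,T_{t,p,m}\,x_{t,p,m} \;+\; \sum_{p} E^{\text{wake}}_{p}\,w_{p} \\
\text{s.t.}\quad & \sum_{(p,m)} x_{t,p,m} = 1 \quad \forall t, \\
& x_{t,p,m} \in \{0,1\}, \qquad w_{p} \in \mathbb{Z}_{\ge 0}.
\end{aligned}
```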

Page 10

ILP: Constraints

Each task is assigned to exactly one processor and one mode;

A processor can execute only one task at a time;

Waking up from sleep modes takes time;

Total utilization of each processor must be less than 1;

Tasks have dependencies within an iteration;

Tasks have dependencies across iteration boundaries;

No task can start before its release time;

All tasks must finish by the deadline.
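Several of these constraints can be checked directly on a candidate schedule. The sketch below is a hypothetical feasibility checker, not the ILP itself. It tests single-assignment, one-task-at-a-time, precedence within an iteration, release time and deadline for one iteration. Single assignment is enforced by keying the schedule by task. It deliberately leaves out wake-up latency, utilization and cross-iteration dependencies. The function name, task names, processor `p1` and times are all made up.

```python
def check_schedule(schedule, wcet_ms, preds, release_ms, deadline_ms):
    """Return a list of violated constraints for one iteration (empty = feasible).

    schedule: task -> (processor, mode, start_ms); keying by task means each
    task has exactly one processor and one mode. wcet_ms: (task, processor,
    mode) -> execution time. Wake-up latency, utilization, and cross-iteration
    dependencies are not modeled in this sketch.
    """
    errors = []
    finish = {}
    jobs_by_proc = {}
    for task, (proc, mode, start) in schedule.items():
        end = start + wcet_ms[(task, proc, mode)]
        finish[task] = end
        jobs_by_proc.setdefault(proc, []).append((start, end, task))
        if start < release_ms.get(task, 0.0):
            errors.append(f"{task} starts before its release time")
        if end > deadline_ms:
            errors.append(f"{task} misses the deadline")
    # One task at a time per processor: consecutive jobs must not overlap.
    for proc, jobs in jobs_by_proc.items():
        jobs.sort()
        for (_, end1, t1), (start2, _, t2) in zip(jobs, jobs[1:]):
            if start2 < end1:
                errors.append(f"{t1} and {t2} overlap on {proc}")
    # Precedence within the iteration.
    for task, deps in preds.items():
        for dep in sorted(deps):
            if schedule[task][2] < finish[dep]:
                errors.append(f"{task} starts before {dep} finishes")
    return errors


# Hypothetical two-task chain on one processor.
WCET = {("a", "p1", "fast"): 40.0, ("b", "p1", "fast"): 30.0}
SCHEDULE = {"a": ("p1", "fast", 0.0), "b": ("p1", "fast", 40.0)}
PREDS = {"b": {"a"}}

if __name__ == "__main__":
    print(check_schedule(SCHEDULE, WCET, PREDS, {}, 100.0))
    print(check_schedule(SCHEDULE, WCET, PREDS, {}, 60.0))
```

A checker like this is a convenient way to sanity-check a schedule produced by a solver or a heuristic before deploying it.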

Page 11

Case Study: Sound Source Localization

S – Audio Sampling

FFT – Fast Fourier Transform

SC – Noise Estimation & Signal Classification

HT – Hypothesis Testing

VOTE – Sound detection voting

Figure: task graph with four FFT tasks, each followed by an SC task, whose results feed VOTE and HT.

Page 12

Hardware Model

Power mode   ARM7 @ 2.5 V, 60 MHz full speed (mW)   MSP430 @ 3 V, 6 MHz full speed (mW)
Full speed   141                                    10.8
1/2 speed    72                                     2.7
1/8 speed    20                                     1.4
Idle         0.25                                   ~0
Standby      ~0                                     ~0

Wake-up        ARM7 energy (mJ)   ARM7 time (ms)   MSP430 energy (mJ)   MSP430 time (ms)
To full speed  1.5                24.5             ~0                   0.006
To 1/8 speed   0.1                1.4              ~0                   ~0

Page 13

Task Profiling

Processor  Mode      FFT (ms)  SC (ms)  HT (ms)
ARM7       60 MHz    7.8       4.4      111
ARM7       30 MHz    15.6      9.0      222
ARM7       7.5 MHz   39.6      23.3     567
MSP430     6 MHz     99.2      37.2     -
MSP430     3 MHz     196       76       -
MSP430     1.5 MHz   394       152      -
MSP430     0.75 MHz  792       300      -
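Combining the hardware model and the task profiling table gives the active energy of each task in each mode: power times time, where mW times ms equals uJ. The minimal Python sketch below rests on these assumptions:

- The 1/2 and 1/8 speed rows correspond to 30 MHz / 7.5 MHz on the ARM7 and 3 MHz / 0.75 MHz on the MSP430.
- The 1.5 MHz MSP430 row is skipped because no power figure is given for it.
- Wake-up and idle energy are ignored.

The helper names are hypothetical.

```python
# Active power (mW) from the hardware model table. Mapping the 1/2 and 1/8
# speed rows to 30/7.5 MHz (ARM7) and 3/0.75 MHz (MSP430) is an assumption.
POWER_MW = {
    ("ARM7", "60MHz"): 141.0,
    ("ARM7", "30MHz"): 72.0,
    ("ARM7", "7.5MHz"): 20.0,
    ("MSP430", "6MHz"): 10.8,
    ("MSP430", "3MHz"): 2.7,
    ("MSP430", "0.75MHz"): 1.4,
}

# Execution time (ms) from the task profiling table. MSP430 at 1.5 MHz is
# omitted because the hardware model gives no power figure for 1/4 speed.
TIME_MS = {
    ("ARM7", "60MHz"): {"FFT": 7.8, "SC": 4.4, "HT": 111.0},
    ("ARM7", "30MHz"): {"FFT": 15.6, "SC": 9.0, "HT": 222.0},
    ("ARM7", "7.5MHz"): {"FFT": 39.6, "SC": 23.3, "HT": 567.0},
    ("MSP430", "6MHz"): {"FFT": 99.2, "SC": 37.2},
    ("MSP430", "3MHz"): {"FFT": 196.0, "SC": 76.0},
    ("MSP430", "0.75MHz"): {"FFT": 792.0, "SC": 300.0},
}


def task_energy_uj(setting, task):
    """Active energy of one run in microjoules (mW * ms); wake-up and idle ignored."""
    return POWER_MW[setting] * TIME_MS[setting][task]


def cheapest_setting(task):
    """(processor, mode) with the lowest active energy for task, ignoring deadlines."""
    options = [s for s in TIME_MS if task in TIME_MS[s]]
    return min(options, key=lambda s: task_energy_uj(s, task))


if __name__ == "__main__":
    for task in ("FFT", "SC", "HT"):
        best = cheapest_setting(task)
        print(task, best, round(task_energy_uj(best, task), 1), "uJ")
```

For both FFT and SC, the MSP430 at half speed uses less energy than at full speed or at 1/8 speed, so the lowest-energy mode is not simply the slowest one. Once deadlines and wake-up costs are added, the choice becomes the combinatorial problem that the ILP addresses.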

Page 14

Partitioning Results (1)

Deadline: 128ms

Need 4 MSP430

ARM7 @ 60MHz

Total energy/iteration: 21.7mJ

Average power: 166.7mW

Figure: schedule (time axis 0 to 150 ms). ARM7 @ 60 MHz runs HT; MSP-1 to MSP-4 @ 6 MHz each run an FFT followed by an SC.

Page 15

Scheduling Results (2)

Deadline: 256ms

Need 2 MSP430

ARM7 @ 30MHz

Total energy/iteration: 22.1mJ

Average power: 86.4mW

Figure: schedule (time axis 0 to 256 ms). ARM @ 30 MHz runs HT; rows MSP1 to MSP4 @ 6 MHz show FFT and SC tasks.

Page 16

Scheduling Results (3)

Figure: schedule (time axis 0 to 1000 ms). ARM7 @ 7.5 MHz runs the four FFTs (4xFFT) and HT; rows MSP1 to MSP4 @ 6 MHz show SC tasks.

Deadline: 1000ms

Need 2 MSP430

ARM7 @ 7.5MHz

Total energy/iteration: 16.2mJ

Average power: 16.2mW

Page 17

Conclusion

Processor diversity can help save energy.

Wake-up time and energy must be considered in software partitioning.

Optimal software partitioning is NP-hard, but it can be formulated as an ILP problem.

Page 18

Limitations & Future Work

Execution time variations

Aperiodic tasks

Lightweight heuristics for online scheduling