A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC
PATMOS 2010: 7-10 September 2010, Grenoble, France
T. Sassolas, N. Ventroux, G. Blanc
CEA LIST, Embedded Computing Laboratory
contact: [email protected]
Table of contents
• Context
• Previous works
• Proposed solution
• Implementation
• Results
• Conclusion
Context of the Study
• Embedded systems must support
  - Various application domains
  - More computation-intensive applications
  - Applications that become more and more dynamic
• Move to multiprocessor architectures
[Figure: computing requirements of embedded applications, on a scale from 0.1 GOPS to 1 TOPS. Multimedia: HD Audio, MPEG2, H264, Digital TV, mobile multimedia. 3D graphics: OpenGL 1.1, OpenGL 2.0. Telecom: GSM, GPRS, EDGE, UMTS, WIMAX, 3GPP-LTE, SDR, DVB-S2.]
Multiprocessor issues
• Need to maximize resource usage
  - Increase task parallelism
  - Streaming applications: set of tasks with data dependencies
  - Scheduling of dependent tasks
  - Execution speed determined by the slowest task
• Need to reduce power consumption
  - Worst-case execution: some energy savings can be performed
  - Real-case execution: dynamism implies loss of energy
  - Need for dynamic control
[Figure: pipeline in -> T1 -> T2 -> T3 -> out, and its schedule over time on processors P0 to P2 for data D1 to D5, with slack periods between task executions.]
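The slack in such a schedule can be quantified with a minimal sketch, assuming one task per PE and a steady-state period equal to the execution time of the slowest task. The function name `pipeline_slack` and the task durations are hypothetical and are not taken from the slides.

```python
def pipeline_slack(durations):
    """Return (period, slack per task) for a pipeline with one task per PE.

    durations: dict mapping task name -> execution time per data item.
    In steady state the pipeline period is set by the slowest task, so every
    other task idles for (period - its own duration) on each data item.
    """
    period = max(durations.values())
    slack = {task: period - d for task, d in durations.items()}
    return period, slack


# Hypothetical execution times (arbitrary time units), for illustration only.
period, slack = pipeline_slack({"T1": 4, "T2": 10, "T3": 6})
print(period)  # 10
print(slack)   # {'T1': 6, 'T2': 0, 'T3': 4}
```

Here T2 sets the throughput, while T1 and T3 accumulate slack on every data item; that slack is what a power-aware scheduler can trade for energy.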
DVFS vs DPM
• Dynamic Voltage and Frequency Scaling (DVFS)
  - Low mode switching penalty
  - Reduces mainly dynamic power consumption
• Dynamic Power Management (DPM)
  - High energy and time switching penalty
  - Reduces both static and dynamic power consumption
• Optimal operating points are highly dependent on the technological process
[Figure: power versus time for tasks T1 and T2, under DVFS and under DPM.]
Streaming scheduling: offline solutions
• Scheduling on a multiprocessor is an NP-complete problem [1]
• Adding power optimization adds complexity
• Monoprocessor solutions [2] [3] …
  - Find the minimum power consumption given the data production rate and the communication buffer sizes
  - With DPM or DVFS functionalities
  - Variable production rate following a probability law
• Multiprocessor solutions
  - Minimize energy consumption by finding the optimum number of resources and their speed to meet QoS requirements [4]
  - Various models: communication costs, consumption model, optimization techniques… [5]
• But a regular workload was assumed: application dynamism imposes online solutions
Streaming scheduling: Online solutions
• Monoprocessor
  - Slack time reclamation: GSR [6]
  - Offline and online partitioning
• Multiprocessor
  - Many solutions for independent tasks -> do not apply
  - Partitioning -> apply a monoprocessor solution to every processor [7] [8]
[Figure: task graph in -> T0 ... T3 -> out, and schedules of tasks T1 to T4 on data D2 over time on processors P0 and P1, annotated "Resulting execution: no slack time!" and "Buffer added".]
Power-aware streaming application scheduling
• Properties
  - Throughput constrained by the slowest task
    » Other tasks can be slowed down to reach the same throughput -> DVFS
  - Tasks deeper in the pipeline can be blocked waiting for available data
    » Preemption mechanisms are required for a higher resource usage rate
    » Unused resources can be shut down -> DPM
• Objective: keep the throughput while making substantial energy savings
Static Priorities
• If PE number < task number: need to specify a static priority
  - Describes the position in the pipeline
  - Allows the oldest data to be executed first
  - Prevents buffering data instead of executing critical tasks
[Figure: task graph from in to out over tasks T0 to T4, annotated with static priorities Prio = 0 to Prio = 3 by pipeline position.]
Buffer monitors
• Buffer full threshold: preempt the writer
• Buffer empty threshold: preempt the reader(s)
• Buffer filling threshold: reduce the DVFS couple of the writer (change QoS)
• Buffer emptying threshold: increase the DVFS couple of the writer (change QoS)
• Priority impact
  - Task is blocked
  - Task executes at fastest speed
  - Application priority
  - Task priority
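The four buffer thresholds can be illustrated with a minimal sketch that maps a buffer fill level to a scheduling demand. The names `Action` and `monitor_buffer`, the threshold fractions (0.75 and 0.25), and the use of fill levels rather than threshold-crossing events are assumptions made for illustration; the slides do not give these details.

```python
from enum import Enum


class Action(Enum):
    PREEMPT_WRITER = "preempt writer"
    PREEMPT_READERS = "preempt reader(s)"
    SLOW_WRITER = "reduce DVFS couple of writer"
    SPEED_WRITER = "increase DVFS couple of writer"
    NONE = "no change"


def monitor_buffer(fill, capacity, filling=0.75, emptying=0.25):
    """Map a buffer fill level to a scheduling demand.

    fill, capacity: number of stored items and buffer size.
    filling, emptying: hypothetical threshold fractions (assumptions).
    """
    if fill >= capacity:
        return Action.PREEMPT_WRITER   # buffer full: the writer cannot produce
    if fill <= 0:
        return Action.PREEMPT_READERS  # buffer empty: the readers have no data
    ratio = fill / capacity
    if ratio >= filling:
        return Action.SLOW_WRITER      # buffer filling up: the writer is too fast
    if ratio <= emptying:
        return Action.SPEED_WRITER     # buffer draining: the writer is too slow
    return Action.NONE


for level in (8, 7, 4, 1, 0):
    print(level, monitor_buffer(level, 8).value)
```

With an 8-item buffer, a level of 7 asks for a slower writer and a level of 1 asks for a faster one, while the extreme levels trigger preemption.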
Consumption model
• SESAM [9] simulation environment
  - SystemC
  - AT-TLM
  - IP: NoC, caches, memories…
  - Processors: ArchC ISS [10]
  - Statistics
• Modified ArchC models
  - MIPS32 ISS annotated with the PXA270 [11] PSM
  - Mode power consumption
  - Execution speed variation
  - Mode switching penalties
    » Energy
    » Time
[Figure: power state machine. Turbo (A=1, B=1): 923 mW. Half-Turbo (A=1, B=2): 390 mW. Deep Idle (A=0, B=1): 64 mW. Mode transitions are labelled with switching times of 1 µs, 2 µs, and 3 µs.]
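As a rough illustration of how such a mode-annotated model can be used, the sketch below compares the energy of processing one data item per period in Turbo and in Half-Turbo, with the PE in Deep Idle for the rest of the period. The mode powers are those listed for the power state machine; the relative speed of Half-Turbo (half of Turbo), the omission of switching penalties, and the name `energy_nj` are assumptions.

```python
# Mode powers in mW, as listed for the power state machine.
POWER_MW = {"turbo": 923.0, "half_turbo": 390.0, "deep_idle": 64.0}
# Assumption: relative execution speed of each active mode (Turbo = 1.0).
SPEED = {"turbo": 1.0, "half_turbo": 0.5}


def energy_nj(work_us, period_us, mode):
    """Energy (nJ) to process one data item per period in the given mode.

    work_us: execution time at Turbo speed. The PE sits in Deep Idle for
    the rest of the period. Mode switching penalties are ignored here.
    """
    busy_us = work_us / SPEED[mode]
    if busy_us > period_us:
        raise ValueError("mode too slow to sustain the throughput")
    idle_us = period_us - busy_us
    # mW * us = nJ
    return POWER_MW[mode] * busy_us + POWER_MW["deep_idle"] * idle_us


print(energy_nj(40, 100, "turbo"))       # 40760.0
print(energy_nj(40, 100, "half_turbo"))  # 32480.0
```

Under these assumptions, a task that needs 40 µs of Turbo time every 100 µs consumes about 20% less energy when stretched at Half-Turbo than when run at Turbo and then idled. A fuller model would add the energy and time penalties of each mode switch.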
Implemented platform
[Figure: implemented platform with a central memory, a CPU controller running the scheduling algorithm, processing elements, and shared memory banks. Tasks T1 and T2 exchange data D1 through a buffer, with a "Threshold reached" event.]
The scheduling loop
• Inputs: buffer statuses, task statuses
• Update dynamic task priorities
• Order tasks according to priority
• Keep already allocated tasks on the same PE
• Allocate remaining tasks on remaining PEs
• Update PE consumption mode according to buffer status
• Outputs: execution / preemption demands, mode switching demands
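The allocation part of this loop can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `schedule`, the data structures, and the convention that a larger priority value is served first (tasks deeper in the pipeline hold the oldest data) are assumptions. The consumption-mode update is left out.

```python
def schedule(ready_tasks, priority, current_alloc, n_pe):
    """One pass of a priority-based allocation with PE affinity.

    ready_tasks: iterable of task names that are neither blocked nor finished.
    priority: dict task -> priority (assumption: larger value is served first).
    current_alloc: dict PE index -> task currently running on that PE.
    n_pe: number of processing elements.
    Returns a new dict PE index -> task.
    """
    # Order tasks by priority and keep at most one task per PE.
    selected = sorted(ready_tasks, key=lambda t: priority[t], reverse=True)[:n_pe]
    chosen = set(selected)

    # Keep already allocated tasks on the same PE to avoid migrations.
    new_alloc = {pe: t for pe, t in current_alloc.items()
                 if pe < n_pe and t in chosen}
    placed = set(new_alloc.values())

    # Allocate the remaining tasks on the remaining PEs.
    free_pes = [pe for pe in range(n_pe) if pe not in new_alloc]
    remaining = [t for t in selected if t not in placed]
    for pe, task in zip(free_pes, remaining):
        new_alloc[pe] = task
    return new_alloc


# Hypothetical example: 4 ready tasks, 2 PEs, T0 and T1 currently running.
prio = {"T0": 0, "T1": 1, "T2": 1, "T3": 2}
print(schedule(["T0", "T1", "T2", "T3"], prio, {0: "T0", 1: "T1"}, 2))
# {1: 'T1', 0: 'T3'}
```

In this example T1 stays on PE 1, while T0 is preempted so that the higher-priority T3 can run on PE 0.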
The WCDMA test case
• Wideband Code Division Multiple Access application [12]
  - 13 tasks
  - Variable workload: pilot frame once every 10 frames
  - Irregular pipeline task lengths
Results – Energy saving
• 3 scheduling solutions: Standard, DPM only and DVFS + DPM
• Substantial energy saving
[Figure: energy saving (%) and PE effective occupancy (%), from 0 to 100, versus number of PEs (1, 2, 4, 8, 13, 16). Series: Standard, DPM only, DPM + DVFS, energy saving DPM only, energy saving DPM + DVFS.]
Results – Execution time
• No deviation in execution time
Results – Pipeline balancing
• Blocked states are reduced by the use of DVFS
• More could be achieved with other DVFS couples
Conclusion
• Power reduction for variable pipelines
  - Substantial power savings when PE load drops: 45% on 13 processors
  - No performance loss
  - Light execution to reduce control overhead
• This work was partly funded by project SCALOPES (ARTEMIS)
• Upcoming work
  - Implementation on a hardware multiprocessor platform
  - Evaluation with other applications from various domains
  - Evaluation of optimal buffer sizes
References
[1] M. L. Dertouzos and A. K. Mok. Multiprocessor Online Scheduling of Hard-Real-Time Tasks. IEEE Transactions on Software Engineering, 15(12):1497-1506, 1989.
[2] Y.-H. Lu, L. Benini, and G. De Micheli. Dynamic Frequency Scaling with Buffer Insertion for Mixed Workloads. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(5):1284-1305, 2002.
[3] N. Pettis, L. Cai, and Y.-H. Lu. Statistically Optimal Dynamic Power Management for Streaming Data. IEEE Transactions on Computers, 55(7):800-814, 2006.
[4] R. Xu, R. Melhem, and D. Mosse. Energy-Aware Scheduling for Streaming Applications on Chip Multiprocessors. In Proceedings of the 28th IEEE International Real-Time Systems Symposium (RTSS), pages 25-38, 2007.
[5] L. Benini, D. Bertozzi, A. Guerri, and M. Milano. Allocation, Scheduling and Voltage Scaling on Energy Aware MPSoCs. In Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR), pages 44-58, 2006.
[6] D. Mosse, H. Aydin, B. Childers, and R. Melhem. Compiler-Assisted Dynamic Power-Aware Scheduling for Real-Time Applications. In Workshop on Compilers and Operating Systems for Low Power, 2000.
[7] P. Choudhury, P. P. Chakrabarti, and R. Kumar. Online Dynamic Voltage Scaling using Task Graph Mapping Analysis for Multiprocessors. In International Conference on VLSI Design (VLSID), pages 89-94, 2007.
[8] S. Hua, G. Qu, and S. S. Bhattacharyya. Energy-Efficient Embedded Software Implementation on Multiprocessor System-on-Chip with Multiple Voltages. ACM Transactions on Embedded Computing Systems (TECS), 5(2):321-341, 2006.
[9] N. Ventroux, A. Guerre, T. Sassolas, L. Moutaoukil, C. Bechara, and R. David. SESAM: an MPSoC Simulation Environment for Dynamic Application Processing. In IEEE International Conference on Embedded Software and Systems (ICESS), 2010.
[10] R. Azevedo, S. Rigo, M. Bartholomeu, G. Araujo, C. Araujo, and E. Barros. The ArchC Architecture Description Language and Tools. International Journal of Parallel Programming, 33(5):453-484, 2005.
[11] Intel PXA27x Processor Family, Electrical, Mechanical, and Thermal Specification, 2005.
[12] A. Richardson. WCDMA Design Handbook. 2006.
Thank you for your attention
We value your opinion and questions