Programming Tools for Embedded Multicore
Jakob Engblom
Technical Marketing Manager – Simics, Wind River
http://blogs.windriver.com/engblom/

Wind River · ICES Seminar 2010-11-24

Apr 26, 2021



Disclaimer

These are my personal views on multicore and embedded

Nothing in this presentation should be interpreted as indicating the plan (or lack of plan) for products and product features in Wind River products

Wind River - Programming Embedded Multicore - ICES Seminar 2010-11-24


Embedded Multicore

Some Advantages


Software Dominates Development

[Figure: Software-dominated systems industry. Log-scale vertical axis from 1 to 10^12 showing number of gates and lines of code; horizontal axis from 1960 to 2020. Annotated growth rates: gates/chip 2x / 18 months; SW/chip 2x / 10 months; SW productivity 2x / 5 years.]
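To make the gap between these growth rates concrete, here is a small arithmetic sketch. It assumes the chart's annotations read as gates per chip doubling every 18 months, software per chip doubling every 10 months, and software productivity doubling every 5 years. The 10-year horizon is an arbitrary choice for illustration, not a number from the slide.

```python
def growth_factor(years, doubling_months):
    """Factor by which a quantity grows in `years` if it doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)

YEARS = 10  # hypothetical horizon, chosen only for illustration

gates = growth_factor(YEARS, 18)         # gates/chip: 2x / 18 months
software = growth_factor(YEARS, 10)      # SW/chip: 2x / 10 months
productivity = growth_factor(YEARS, 60)  # SW productivity: 2x / 5 years

print("gates/chip   x%.0f" % gates)
print("SW/chip      x%.0f" % software)
print("productivity x%.0f" % productivity)
print("SW growth outpaces productivity by x%.0f" % (software / productivity))
```

Under these assumptions the amount of software grows far faster than the ability to produce it, which is the sense in which software dominates development.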


Embedded Multicore Advantage

When it comes to multicore, there are certain advantages to the embedded tools field

– Embedded debug tools tend to be better at dealing with timing errors and at debugging low-level code

– Operating-system/application interfaces have better debug support

– Hardware-supported debug far beyond what desktops and servers can do

– OS awareness in external debug tools

Debuggers and tools are starting to catch up, including awareness of cores, systems, threads, domains, …

– But it gets pretty complex pretty quickly…


Multiple Context Debugging

Multiple Targets
• One Wind River Workbench instance
• Target manager
• Multiple simultaneous connections, including shared connections
• Multiple OS types supported simultaneously
• Multiple target processors supported simultaneously

Multiple Contexts
• Core, process, or thread
• Each context has a set of views:
  • Source
  • Stack
  • Registers

Processes/Threads
• Qualify breakpoints on a process or specific thread
• Stop the entire process or an individual thread

Target boards may be any mix of physical, logical, or virtual boards and any mix of uniprocessors and multicore running SMP or UP with Hypervisor, VxWorks, Wind River Linux and bare metal software.

[Figure: a host system connected to a target system made up of several Bay Networks boards with function processors and control processors]


Hardware Trace

On-chip trace
– Added feature of hardware
– Costs some chip area; some designers, and some customers, do not consider it worth the cost
– Mostly for processors and their buses
– Being added for other parts of the system, as they become more important
  • Performance counters common in complex devices today
– Interface bandwidth limitations can put a limit on effectiveness

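[Figure: block diagram of Board 1 with Flash, DDR RAM, and an SoC containing two CPUs with L1 caches, a shared L2 cache, a memory interface, and peripherals (two Eth, PIC, Timer, Serial, PCIe). Markers T and P are placed on individual components.]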


Hardware Triggering

Cross-triggering
– Coordination across the chip
– Cause action in one place based on events occurring elsewhere in the system
  • Stop execution, start tracing, stop tracing, interrupt, ...

Requires logic on the chip
– Basically, it is a little programmable on-chip supervisor processor

Conclusion: wise users buy hardware with good debug support

[Figure: the same Board 1 block diagram (Flash, DDR RAM, SoC with two CPUs, L1 and L2 caches, memory interface, and peripherals), with markers B placed on individual components.]


Trace, Trace, Trace

There seems to be a growing consensus that trace is a key tool for debugging large-scale multicore software

– Software stacks adding tracing as a feature
– Hardware support for extracting traces from software
– Hardware actually tracing its own operation
– Simulator hooks for getting data and key points out

Only way to get an overview of the system

Trace long runs…

– Trace processing and analysis of the data stream is a key technology for the future; manual inspection does not suffice

And drop back to a debugger around a problem

http://jakob.engbloms.se/archives/1251
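As a rough illustration of automated trace analysis, the sketch below scans a trace for the longest period of silence on each core and flags cores that stay quiet longer than a threshold. The event format `(timestamp, core, event)`, the sample trace, and the threshold of 500 are all hypothetical. Real trace formats and analyses are far richer.

```python
def longest_gaps(trace):
    """Return {core: (gap, start_time)} for the longest silence of each core.

    `trace` is an iterable of (timestamp, core, event) tuples sorted by timestamp.
    """
    last_seen = {}
    worst = {}
    for timestamp, core, _event in trace:
        if core in last_seen:
            gap = timestamp - last_seen[core]
            if gap > worst.get(core, (0, None))[0]:
                worst[core] = (gap, last_seen[core])
        last_seen[core] = timestamp
    return worst

# Hypothetical trace: core 1 goes silent between t=30 and t=900.
trace = [
    (10, 0, "irq"), (20, 1, "irq"), (30, 1, "lock"),
    (40, 0, "send"), (50, 0, "irq"), (300, 0, "send"),
    (600, 0, "irq"), (900, 1, "unlock"), (910, 0, "send"),
]

# Flag cores whose longest silence exceeds an (arbitrary) threshold.
suspects = {core: gap for core, gap in longest_gaps(trace).items() if gap[0] > 500}
print(suspects)
```

The flagged start time is the kind of pointer that lets a developer drop back to a debugger around the problem instead of reading the whole trace by hand.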


Overhead vs Efficiency

Common complaint about debug hooks in hardware and software: it costs too much power / performance / throughput / chip area / money / …

Cary Millsap, Thinking Clearly about Performance 2, CACM Oct 2010
http://mags.acm.org/communications/201010#pg40


Embedded Multicore

Software Architecture and Hypervisors


More Than Just SMP

[Figure: primary configurations.
Single core: "Traditional" (one OS on one core) and Core Virtualization (a hypervisor hosting two OSes on one core).
Multi-core: SMP (one OS spanning Core 1 and Core 2) and AMP (one OS per core on Core 1 and Core 2).]

OS: Could be VxWorks, Wind River Linux, or other executive or OS

Combinations of these primary configurations can be used to create more advanced configurations.


Example: Consolidation with Hypervisor

[Figure: three separate units are consolidated into one.
Unit 1: single-core hardware running OS 1 and App 1.
Unit 2: single-core hardware running a bare-metal application.
Unit 3: multicore hardware running OS 3 and App 3.
Consolidated unit: multicore hardware running Wind River Hypervisor, which hosts OS 1 with App 1, the bare-metal application, and OS 3 with App 3.]

– Single-core apps keep running as single-core, avoiding the risk of breakage due to true concurrency
– Single hardware = easier to manage, reduced manufacturing cost, more units fit in the same space. Most of the multicore gain with very limited pain!
– Hypervisor provides isolation between guests; virtual boards keep running as-is


Example: Back to Basics

[Figure: Wind River Hypervisor on multicore hardware. One core runs a control-plane OS (management, control); several other cores each run a network stack on a WRE.]

WRE – Wind River Executive. Clear trend to provide sub-RTOS “executives” to provide very high performance for applications with no need for a full OS. Typically per-core.

Hypervisor can simplify the coordination between OS instances and provide a simpler programming interface for a WRE.


Simics and Multicore

Debug


Wind River Simics

Wind River Simics: Full Simulation of Any Electronic System

Virtual Platform

An adaptive virtual platform that enables customers to define, develop, and deploy electronics systems more efficiently

Aerospace and Defense, Industrial and Medical, Mobile and Consumer, Network Equipment, Automotive


System-Level Features

Checkpoint and restore
Multicore, processor, board
Real-world connections
Repeatable fault injection on any system component
Scripting
Mixed endianness, word sizes, heterogeneity

con0.wait-for-string "$"
con0.record-start
con0.input "./ptest.elf 5\n"
con0.wait-for-string "."
$r = con0.record-stop
if ($r == "fail.") {
    echo "test failed"
}


Full-System Insight


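Hypervisor is just any Software Stack

[Figure: Simics runs on a 32/64-bit PC under Linux or Windows and simulates the multicore hardware. On that simulated hardware, Wind River Hypervisor hosts OS 1 with App 1, a bare-metal application, and OS 3 with App 3.]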


Simics Debugging Features

Synchronous stop for entire system
Determinism and repeatability
Reverse execution
Unlimited and powerful breakpoints
Trace anything
Insight into all devices

break -x 0x0000->0x1F00
break-io uart0
break-exception int13


Repeatability and Reverse Debugging

Repeat any run trivially
– No need to rerun and hope for bug to reoccur

Stop and go back in time
– No rerunning program from start
– Breakpoints and watchpoints backward in time
– Investigate exactly what happened this time

This control and reliable repeatability is very powerful for parallel code.

On hardware, only some runs reproduce an error:
Discover bug -> Rerun, bug doesn’t show up -> Rerun, bug doesn’t show up -> Rerun, different bug -> Rerun, initial bug occurs

On virtual hardware, debugging is much easier:
Discover bug -> Reverse execute and find source of bug

http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html


Transporting Bugs

[Figure: virtual platform diagram with markers P, R (reporter), and D (developer). Items passed between them: checkpoint; software package, load, or configuration; hardware configuration or reconfiguration.]

The software user finds a bug and needs to report it to the developer. This makes him or her the reporter R

A developer D creates a piece of software and passes it on for testing and use

The developer and reporter are both using a virtual platform to run software

The reporter uses virtual platform checkpointing to pass the bug to the developer. This ensures perfect replication and that the complete target state is communicated.


Replaying Target Stimuli

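[Figure: timeline of a session on the reporter's (R) virtual platform: Boot..., Configure..., Run tests..., Bug!, with checkpoints labeled RC, R0, R1, R2, ..., Rn taken along the way, and the resulting bug report passed to the developer (D).]

Figure annotations:
– Note that many different tests can be started from this checkpoint
– Recording of last few inputs: inputs occurring after the last checkpoint was taken, but before the bug hits
– Checkpoint merge
– The merged checkpoint and the recording are the bug report contents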
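The idea of a bug report made of a checkpoint plus a recording of the last inputs can be sketched with a toy deterministic target. Everything here is hypothetical (the `ToyTarget` class, its checksum state, the bug condition, and the checkpoint interval) and has nothing to do with how Simics stores checkpoints. The point is only that restoring the saved state and replaying the recorded inputs reproduces the failure exactly.

```python
import copy

class ToyTarget:
    """Deterministic toy 'target': its state depends only on the inputs it has seen."""
    def __init__(self):
        self.state = {"count": 0, "checksum": 0}

    def step(self, value):
        """Consume one input; return True when the (made-up) bug condition hits."""
        self.state["count"] += 1
        self.state["checksum"] = (self.state["checksum"] * 31 + value) % 65521
        return self.state["checksum"] % 7 == 0

    def checkpoint(self):
        return copy.deepcopy(self.state)

    @classmethod
    def restore(cls, saved):
        target = cls()
        target.state = copy.deepcopy(saved)
        return target

def run_until_bug(target, inputs, checkpoint_every=4):
    """Reporter side: return (last checkpoint, inputs recorded since it) when the bug hits."""
    saved, recording = target.checkpoint(), []
    for i, value in enumerate(inputs):
        if i % checkpoint_every == 0:
            saved, recording = target.checkpoint(), []
        recording.append(value)
        if target.step(value):
            return saved, recording
    return None

def replay(saved, recording):
    """Developer side: restore the checkpoint and feed the recorded inputs again."""
    target = ToyTarget.restore(saved)
    return [target.step(value) for value in recording], target.state

report = run_until_bug(ToyTarget(), range(1, 11))
if report is not None:
    saved, recording = report
    print("bug report:", saved, recording)
    print("replay:", replay(saved, recording))
```

Because the toy target is deterministic, the developer's replay ends in the same state as the reporter's run, which is the property the slide relies on.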


Debug Multicore Hang: Problem

Multithreaded program, stable on existing system
OS changed; hardware and the rest of the software stack not changed
Started to freeze occasionally (1 run in 20)
– Change of OS exposed a latent bug in the code
Reporter captured bug as a checkpoint + script
Passed checkpoint and script to developer for analysis

Before (stable): MPC8641 8 core, Glibc 2.5.1, Linux 2.6.23, Rule30_threaded.elf
After (occasional freeze): MPC8641 8 core, Glibc 2.5.1, Linux 2.6.27 (WR Linux 3.0), Rule30_threaded.elf


Debug Multicore Hang: Debug

Reproduction of bug trivial with checkpoint and script

Developer used OS awareness and source code debug to set breakpoints inside target program
– On data accesses to shared work queue used by all threads
– Unintrusive: does not change the behavior of the target system in any way

Custom script catches breakpoints
– Diagnostics: state of queue (read target memory, perform calculations), queue control variable being accessed, source line, thread ID
– For both successful and failing runs -> spotted the difference

[Figure: the developer (D) loads the reporter's (R) checkpoint in Simics (MPC8641 8 core, Glibc 2.5.1, Linux 2.6.27 (WR Linux 3.0), Rule30_threaded.elf) and adds OS awareness, source code debug, a custom script, and debug information for the binary program, all kept outside the target.]


Debug Multicore Hang: Example Diagnostic Output

[bp] Thread 918, writing variable empty with value 1.
     At rule30_packet_queue_get, line 157
     Prev. state: Done: 1 Empty: 0 Full: 0 Tail: 0 Head: 0 Elems: 0
[bp] Thread 918, writing variable full with value 0.
     At rule30_packet_queue_get, line 158
     Prev. state: Done: 1 Empty: 1 Full: 0 Tail: 0 Head: 0 Elems: 0
...
[bp] Thread 921, writing variable done with value 1.
     At rule30_packet_queue_signal_done, line 62
     Prev. state: Done: 0 Empty: 0 Full: 0 Tail: 0 Head: 98 Elems: 2

The Bug

// - It only wakes up one thread...
pthread_cond_signal (&(q->notEmpty));
// To be correct:
//pthread_cond_broadcast (&(q->notEmpty));
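The effect of waking one waiter instead of all of them can be reproduced with a small Python analogue. This is not the Rule30 program or its C code: `WorkQueue` and `run_demo` are hypothetical names, and `threading.Condition.notify()` and `notify_all()` stand in for `pthread_cond_signal` and `pthread_cond_broadcast`. The sketch makes sure every worker is already waiting before "done" is signalled, then counts how many of them ever wake up.

```python
import threading
import time

class WorkQueue:
    """Toy stand-in for the shared queue: workers block until `done` is set."""
    def __init__(self):
        self.not_empty = threading.Condition()
        self.done = False
        self.waiting = 0

    def worker(self, finished):
        with self.not_empty:
            self.waiting += 1
            while not self.done:
                self.not_empty.wait()
        finished.append(threading.current_thread().name)

    def signal_done(self, wake_all):
        with self.not_empty:
            self.done = True
            if wake_all:
                self.not_empty.notify_all()  # analogue of pthread_cond_broadcast
            else:
                self.not_empty.notify()      # analogue of pthread_cond_signal

def run_demo(wake_all, workers=4, timeout=1.0):
    """Return how many workers woke up and finished within `timeout` seconds."""
    queue, finished = WorkQueue(), []
    threads = [threading.Thread(target=queue.worker, args=(finished,), daemon=True)
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    # Wait until every worker is blocked inside wait().
    while True:
        with queue.not_empty:
            if queue.waiting == workers:
                break
        time.sleep(0.001)
    queue.signal_done(wake_all)
    deadline = time.monotonic() + timeout
    for thread in threads:
        thread.join(max(0.0, deadline - time.monotonic()))
    count = len(finished)
    # Release any workers that are still hanging so the demo ends cleanly.
    with queue.not_empty:
        queue.not_empty.notify_all()
    return count

if __name__ == "__main__":
    print("notify:    ", run_demo(wake_all=False), "of 4 workers finished")
    print("notify_all:", run_demo(wake_all=True), "of 4 workers finished")
```

With a single notification the remaining workers wait forever even though `done` is set, which mirrors the hang described on the slide. In the real program the hang was intermittent because it depended on how many threads happened to be waiting at that moment.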


Analyzer Looking at the Program

Nice speedup with 1 to 3 worker threads

With four worker threads, the program uses only two cores

With five worker threads, the efficiency is horrible and two of the worker threads are left hanging!


Simics and Multicore

Evaluating Software Scalability on Flexible Virtual Hardware


Scalable AMP Hardware

Scales to any number of cores
Configurable in several dimensions

[Figure: scalable virtual Power Architecture multicore machine. Each node has a PPC440 core, local memory, an interrupt controller, and a serial port. The nodes are connected through an interrupt network and share a global memory. Configurable parameters: clock frequency, memory sizes, number of cores, access delay, and contention to global memory.]


Memory Speed Impact

Varying memory latency of shared memory

Parallel processing benchmark
– Shared memory restricted to single access and high latencies
– Testing two different transfer modes, 1 packet and 4 packets per transmission
– Scalability quite different

[Figure: Scaling as Worker Nodes are Added. Horizontal axis: number of worker nodes (1 to 9). Vertical axis: performance relative to one worker node (0.00 to 10.00). Series: perfect memory; 100, 200, and 500 cycles with a single port; and the same four memory settings with 4 packets/trans.]
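One way to see why a slow, single-ported shared memory caps scaling, and why batching packets helps, is a simple bottleneck model. This is a toy model under stated assumptions, not the benchmark behind the plot: each packet costs a fixed number of compute cycles, each transaction occupies the single memory port for the full latency, and a transaction's cost is shared by the packets it carries. The value of 1000 compute cycles is hypothetical.

```python
def speedup(workers, compute_cycles, mem_latency, packets_per_trans=1):
    """Upper bound on speedup relative to one worker in a single-port memory model."""
    mem_per_packet = mem_latency / packets_per_trans
    if mem_per_packet == 0:
        return float(workers)  # perfect memory: scaling is limited only by worker count
    per_packet = compute_cycles + mem_per_packet
    # The port serves one transaction at a time, so it saturates at this many workers.
    saturation = per_packet / mem_per_packet
    return min(float(workers), saturation)

COMPUTE = 1000  # hypothetical compute cycles per packet
for latency in (0, 100, 200, 500):
    for batch in (1, 4):
        row = [speedup(n, COMPUTE, latency, batch) for n in range(1, 10)]
        print("latency=%3d batch=%d:" % (latency, batch),
              " ".join("%.1f" % value for value in row))
```

In this model, sending four packets per transaction quarters the port time per packet, so the saturation point moves out accordingly. The numbers it prints are illustrative only and should not be read as the measured curves.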


Simics and Multicore

Speeding up development by smart tricks


OS Prototyping: Xtratum Timebase

A multicore OS needs consistent time across all cores
– First task in development is to establish such a timebase
– On hardware, tricky timing loops are needed

With Simics, we can prototype using scripting
– Mark time sync point with a magic instruction
– Script triggers on the magic instructions, and resets all local times to the same time

A complex but non-value-added task becomes trivial
– Shorten time to interesting experiments using Simics

http://www.tentech.ca/index.php/2010/09/easy-multi-core-powerpc-timebase-synchronization-with-simics/

The Code: OS and Script

static void __VBOOT synchronize_clocks(void)
{
    /* Only core 0 marks the sync point with the magic instruction */
    if (0 == GET_CPU_ID()) {
        MAGIC(4);
    }
    BarrierWait(&g_smpPartitionInitBarrier);
}

def synchronize_ppc_timebase():
    # Get number of CPUs from system 0.
    # Using some assumptions
    num_cpus = conf.sim.cpu_info[0][1]

    # Iterate through all the cores
    for cpu_id in range(num_cpus):
        cpu = getattr(conf, "cpu%d" % cpu_id)

        # Simply reset the timebase
        cpu.tbu = 0
        cpu.tbl = 0

    print("Synchronized the CPU timebases at cpu0 cycle count %d" % SIM_cycle_count(conf.cpu0))
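The slide's script does not show how the reset gets triggered by the magic instruction. The mock below sketches that mechanism without any simulator: `ToyCpu`, `ToyMachine`, `on_magic`, and `execute_magic` are hypothetical names invented for this illustration and are not Simics APIs. A handler is registered for magic number 4, and it runs when the (mock) target executes that magic instruction.

```python
class ToyCpu:
    """Stand-in for a simulated core with a split timebase (tbu:tbl)."""
    def __init__(self, tbl):
        self.tbu = 0
        self.tbl = tbl

class ToyMachine:
    """Minimal mock of a simulated machine that dispatches magic instructions."""
    def __init__(self, timebases):
        self.cpus = [ToyCpu(value) for value in timebases]
        self.magic_handlers = {}

    def on_magic(self, number, handler):
        """Register a script callback for a given magic instruction number."""
        self.magic_handlers[number] = handler

    def execute_magic(self, number):
        """Called when target code runs a magic instruction."""
        handler = self.magic_handlers.get(number)
        if handler is not None:
            handler(self)

def synchronize_timebases(machine):
    """Reset every core's timebase to the same value, as the slide's script does."""
    for cpu in machine.cpus:
        cpu.tbu = 0
        cpu.tbl = 0

# Hypothetical starting values: the cores came up at slightly different times.
machine = ToyMachine([1000, 1017, 993, 1042])
machine.on_magic(4, synchronize_timebases)
machine.execute_magic(4)  # what MAGIC(4) on core 0 would trigger
print([(cpu.tbu, cpu.tbl) for cpu in machine.cpus])
```

Since the simulator stops all cores while the handler runs, the reset is atomic from the target's point of view, which is what removes the need for timing loops on hardware.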
