
COMPANY PROPRIETARY

October 2012

The Embedded Software Challenge Gwyn Fisher, CTO, Klocwork


About me…

• Guide technical direction and strategy at Klocwork

• 20+ years of global technology experience
  • Original passion is compiler theory
  • Background in formal grammars and computational linguistics
  • Spent most time in search and natural language domains
  • Held senior and exec positions at Hummingbird, Fulcrum Technologies, PC DOCS and LumaPath

• Thanks to our friends at VDC Research for the data presented in this deck!!


You want the fridge to do what?


Great idea. This should work out well… right?


Oh boy. So, how big is the freight train?

Bad news: kinda big and scary. Good news: it’s all manageable.


Software feature development in action...

First, identify what marketing wants...

1. Features

2. More Features.

3. Yesterday.


Software feature development in action...

Second, let development figure out the details...

1. Ah, we might get hacked...

2. Do we have time to build the quality we want?

3. Can our platform even support all these new features?

Ah… we might get hacked?


Yes you might get hacked...

No need for more scare slides, here are the key issues:

• How do we develop a threat model?

• What are the risks to my embedded device?

• How do I make my development team better at this?


What is Threat Modeling?

• Based on the assumption that every system has intrinsic value worth protecting

• Involves thinking about system threats and how the system can be attacked

• Secure systems start with understanding the threats

• Threats are not vulnerabilities

• Threats live forever; they are the attacker's goal

[Diagram labels: Threat, Mitigation, Vulnerability, Attacker]


Embedded Devices: Use/Abuse Cases to Consider

• What data does the device hold, or control access to?

• What are the inputs to the system?
  • More obvious: wired and wireless networking, Bluetooth, GPS signals, cellular voice/data, etc.
  • Less obvious (and may appear to be connection-free): remote controls

• The kinds of environments where it will be installed

• Physical security is often assumed for servers, but embedded devices don't necessarily have that luxury

• Devices may be deployed in ways that are outside of normal expectations

These help you devise potential threats and attack vectors for the device and the entire system


Embedded Systems: Threats & Attacks

• Typically resource-constrained devices
  • Memory capacity, power consumption, and processing power must strike a balance with security requirements
  • A product can be effective in production by carefully considering a threat model and designing a system to work within the resource limitations

• Crypto implementation is a common weakness
  • Vulnerable to power and timing analysis
  • “Side-channel” attacks
  • Risk is private/secret key exposure

• Users often change firmware
  • Can be thwarted by using cryptographically signed updates
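One concrete form of the timing weakness above is comparing a secret (a MAC, a digest, a key) with a routine that stops at the first mismatching byte, so the response time reveals how many leading bytes were right. The following is a minimal sketch of a comparison that always touches every byte. The name `ct_equal` is made up for illustration, and an aggressive compiler could in principle undo the effect, so treat it as a sketch rather than a vetted crypto primitive.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two equal-length buffers without an early exit.
 * Returns 1 if they match, 0 otherwise.
 * Hypothetical helper: not taken from any particular library. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate differences, never branch on data */
    return diff == 0;
}
```

The loop count depends only on `len`, which is normally public, so the running time does not depend on where the buffers differ.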


• An embedded system is less likely to be online all the time
  • Harder to keep up-to-date (often need physical presence to update)
  • It's more likely to depend on a single other node for its network access (single point of failure/attack)
  • Design system with update mechanism, and fail-safe if update mechanism not available
  • This is both a cost and attack threat

• Need to be very concerned about physical access
  • Often easy to steal

• Hardware-based and physical presence-based attacks more relevant

• An attacker could potentially swap one embedded device for another and use it to get false information into the system



• If the device is an ID or User Access device
  • Whoever has physical possession can get access to assets
  • The countermeasures are more of a system issue than a device issue
  • Personal and business data at risk
  • Threats to consider include device being lost, stolen, and spoofed

• Difficult to update security information if the device has…
  • Limited network access
  • Little storage
  • No accurate clock (on-board or online)

• Replay attacks are common
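Without an accurate clock, timestamps cannot be used to reject old messages, which is one reason replay attacks are common on such devices. A frequently used alternative is a message counter that must strictly increase. The sketch below assumes the counter travels inside the authenticated part of each message and that `last_seen` is kept in storage that survives a reboot; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer replay state. */
typedef struct {
    uint64_t last_seen;   /* highest counter accepted so far */
} replay_guard;

/* Accept a message only if its counter is strictly newer than any seen before.
 * Assumes the counter is covered by the message's authentication tag. */
bool replay_guard_accept(replay_guard *g, uint64_t counter)
{
    if (counter <= g->last_seen)
        return false;          /* replayed or out-of-date message */
    g->last_seen = counter;
    return true;
}
```

A captured message that is sent again carries a counter the device has already passed, so it is rejected without any notion of wall-clock time.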



• Defensive Coding
  • Use automation to remove weak coding practices & vulnerabilities; static analysis is the most common technique
  • Can be performed on every check-in, every build, or some other interval that works for your development process

• Code Review
  • One of the most impactful steps you can take toward more secure code
  • While design bugs are the most expensive to fix, implementation bugs are the most common

• Security Testing
  • Functional test techniques cannot uncover security bugs
  • Designed to understand what the application is NOT supposed to do
  • Specific attacks should be applied to uncover vulnerabilities
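To make the security testing point concrete: a functional test checks that a well-formed message is parsed correctly, while a security test feeds the parser input it is NOT supposed to accept. The sketch below assumes a made-up wire format (one length byte followed by the payload) and a hypothetical `read_payload` function; neither comes from the presentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: one length byte, then that many payload bytes.
 * Returns the payload length, or -1 if the input must be rejected. */
int read_payload(const uint8_t *buf, size_t buf_len, uint8_t *out, size_t out_cap)
{
    if (buf == NULL || out == NULL || buf_len < 1)
        return -1;
    size_t claimed = buf[0];
    if (claimed > buf_len - 1)   /* sender claims more bytes than arrived */
        return -1;
    if (claimed > out_cap)       /* payload would overflow the destination */
        return -1;
    memcpy(out, buf + 1, claimed);
    return (int)claimed;
}
```

Attack-style test cases for this function would include a length byte larger than the data that follows, a payload larger than the destination buffer, and an empty input, each of which should return -1 rather than copy anything.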

How to make the threat model work for you...

Do we have time to build the quality we want?


The Costs of Bug Containment

[Figure annotations: “THE key development milestone for most”; “80% of defects introduced here”]


The payoff… deep code analysis at the developer desktop

[Figure annotations: “3rd Generation Source Code Analysis”; “2nd Generation Source Code Analysis”; “THE key development milestone for most”; “Move Defect Detection HERE”]


Realities of Software Risk in Safety Critical Systems

• Safety and security incidents resulting from software defects are well documented

• Critical defects and security vulnerabilities:
  • slow development productivity and reduce time for innovation
  • impact competitive advantage and reputation
  • cost the U.S. economy almost $60 billion annually

• 80% of development costs are consumed by software developers identifying and correcting defects

• Traditional approaches to address risk are ineffective
  • Code analysis performed after check-in
    • 80% of defects introduced in implementation and build phases of SDLC
  • In-person code reviews
    • Difficult to schedule, time-consuming, unpleasant process


For example… Ariane 5 Flight 501

• Ariane 5 is an expendable launch system (no crew) used to deliver payloads into geostationary transfer orbit or low Earth orbit

• June 1996: 1st test flight, unsuccessful
  • The rocket veered off its flight path 37 seconds after launch and was destroyed

• The Cause:
  • An error in the software design (inadequate protection from integer overflow)

• The Result:
  • A loss of more than US$370 million
  • The subsequent automated analysis of the Ariane code was the first example of large-scale static code analysis by abstract interpretation
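The overflow in question is widely reported to have occurred when a 64-bit floating-point value was converted to a 16-bit signed integer. The flight software was not written in C, so the following is only an illustrative sketch of what "protection from integer overflow" can look like for that kind of conversion; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a double to int16_t, reporting failure instead of overflowing.
 * Hypothetical helper for illustration only. */
bool to_int16_checked(double value, int16_t *out)
{
    /* Written so that NaN also fails the check (all comparisons with NaN are false). */
    if (!(value >= (double)INT16_MIN && value <= (double)INT16_MAX))
        return false;
    *out = (int16_t)value;   /* truncates toward zero; in range by the check above */
    return true;
}
```

The caller is forced to decide what an out-of-range reading means (saturate, flag the sensor, fall back) instead of letting the conversion fail in an uncontrolled way.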


What’s the answer?

• Important to reduce the ‘rinse-repeat’ cycle of rework
  • Allows testing team to focus more on ensuring the software meets requirements

• An ideal solution should include:
  • Ability to find defects while coding (in-phase bug containment)
  • Quality & reliability defects, security vulnerabilities, maintainability issues
  • Configuration and tuning capabilities
  • Must be a part of the regular build process

• Rethink code reviews
  • Bottom-up, rather than imposed, top-down
  • Asynchronous, rather than “around the table”
  • Defects found through SCA integrated into the code review

Can our platform support all these new features?


The short answer is probably “No”

[Chart: # of processors used in 2 years. Labels and values in extraction order: Multi, 74%; Single, 51%]


Oh, but we kinda knew that…


…but that’s just for those Google & Apple guys, right?

[Chart: Projected 2 year multi-core/processor growth. Values in extraction order: 62%, 68%, 75%. Categories in extraction order: Mil/Aero, Medical, Industrial]


Hope you like working with software engineers

[Chart: values in extraction order: 29%, 43%. Legend: Software Engineers, Other. Categories: Single Core Project, Multi-Core Project]


So, what are some of the specific risks?

• Race conditions

• Deadlocks

• Design “maladies”


How to mitigate these risks?

Developer Education

• What is a deadlock?
  • How localized or distributed are they?
  • When one happens, what happens?

• What is a race condition?
  • Why does it matter?
  • How does it happen?
  • Do I need multiple cores to have one?
  • What about a multi-processor package?

Tools

• Profile-based analysis
  • AKA dynamic analysis
  • Code instrumentation
  • Execute-under-duress

• Static analysis
  • Model-based simulation
  • Akin to compilation technology
  • Looks for a variety of problems in code
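As a small aid to the education questions above, here is a sketch of how a race condition happens on a plain `counter++`. It simulates two threads by hand, splitting the increment into the load, add, and store steps a CPU performs, so the result is deterministic. All names are invented for the illustration. It also suggests an answer to "do I need multiple cores to have one?": the bad interleaving only requires one thread to be preempted between its load and its store, which can happen on a single core.

```c
/* One simulated thread's private register. */
typedef struct { int reg; } thread_ctx;

static int counter;

/* The three steps hidden inside `counter++`. */
static void load(thread_ctx *t)  { t->reg = counter; }
static void add(thread_ctx *t)   { t->reg += 1; }
static void store(thread_ctx *t) { counter = t->reg; }

/* Both increments run back to back: no update is lost. */
int run_sequential(void)
{
    thread_ctx a, b;
    counter = 0;
    load(&a); add(&a); store(&a);
    load(&b); add(&b); store(&b);
    return counter;
}

/* Thread A is preempted after its load; B runs fully;
 * A then resumes with a stale value and overwrites B's update. */
int run_interleaved(void)
{
    thread_ctx a, b;
    counter = 0;
    load(&a);
    load(&b); add(&b); store(&b);
    add(&a); store(&a);
    return counter;
}
```

`run_sequential()` returns 2 while `run_interleaved()` returns 1: two increments happened, but one was lost.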

Give me some tools…


Tools

• Profile-based analysis
  • Code instrumentation
  • Execute-under-duress

• Static analysis
  • Model-based simulation

• Custom
  • In many ways, only you can figure this one out


Dynamic testing

• Testing the running code
  • Instrumented for some goal
    • Identifying conflicting thread operations
    • Identifying harmful delays
    • Identifying memory leakage/waste/reuse

• Seriously cool when it works
  • “Your code produces the following conflict…”
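To give a feel for what "instrumented for some goal" can mean, here is a toy sketch of one kind of instrumentation: wrappers that record the order in which each thread acquires locks and flag an acquisition that inverts an order seen earlier. Real tools do far more than this; the function names, the fixed table sizes, and the use of small integer ids for threads and locks are all assumptions made for the example, and the tables themselves are not thread-safe.

```c
#include <stdbool.h>

#define MAX_LOCKS   8
#define MAX_THREADS 4

/* order[a][b] is true once some thread acquired lock b while holding lock a. */
static bool order[MAX_LOCKS][MAX_LOCKS];
/* held[t][l] is true while thread t holds lock l. */
static bool held[MAX_THREADS][MAX_LOCKS];

/* Record that thread `tid` acquires `lock`.
 * Returns true if this acquisition inverts an order seen earlier. */
bool tracked_acquire(int tid, int lock)
{
    bool inversion = false;
    for (int h = 0; h < MAX_LOCKS; h++) {
        if (!held[tid][h] || h == lock)
            continue;
        if (order[lock][h])      /* someone took h while holding lock */
            inversion = true;
        order[h][lock] = true;   /* remember: h was held when lock was taken */
    }
    held[tid][lock] = true;
    return inversion;
}

/* Record that thread `tid` releases `lock`. */
void tracked_release(int tid, int lock)
{
    held[tid][lock] = false;
}
```

If thread 0 takes lock 1 then lock 2, and thread 1 later takes lock 2 then lock 1, the second acquisition by thread 1 is reported even if the two threads never actually collided during the run.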


Dynamic testing

• Downside is reproduction
  • Instrumented code is fundamentally different
  • Production code runs without hindrance
  • Maybe that conflict only occurs within specific conditions (time, memory, environment)
    • How do you reproduce that?

• Identification leads to re-design
  • If you can find it (or even if you can’t), easiest way to mitigate is simply re-design


Dynamic testing

• What’s available (and viable)?
  • Valgrind / Helgrind
    • C/C++ and Fortran, bizarrely enough
    • Can identify race conditions and deadlocks
    • Works off profiler core
  • Critical Blue
    • Modeling for single -> multiple core transitions
    • Race conditions, cache contention, memory flaws, …
  • Others…
    • IDE-bundled, mostly profilers


Static analysis

• Tests code-as-written
  • Symbolic execution
  • Abstract interpretation
  • Pick your poison…

• Exercises all code paths
  • Complete coverage, no test cases required
  • Attempts proof of problem, not evidence


Static analysis

• Downside is accuracy
  • May be 1,000s of ways a problem could occur
  • But none of them will ever happen
  • Environmental controls, design constraints, all the assumptions that go into code-as-written

• But, identification leads to fix
  • Stick with it, and you get a specific path that needs help, a root cause perhaps


Static analysis

• What’s available (and viable)?
  • Coverity Prevent
  • GrammaTech CodeSonar
  • Klocwork Insight

• Deadlocks

• Race conditions

• Dumb mistakes


Example concurrency detection

• Example taken from SQLite, years-old version
  • Locking documented as “seldom required”, with code being “mostly lock insensitive”
  • So only singleton locking was required (i.e. one global lock)
  • But… should be recursive
    • Allow a thread to reserve the lock when it already owns it

• Common development pattern for speed-sensitive code
  • Everything useful accomplished via asserts, not conditions
  • Including thread owner check

• But that’s not all…
  • To implement recursive locks, a second lock is used to guard a reference count on the actual lock object
  • So how do you lock the lock that’s guarding the lock you want to lock?
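For contrast with the hand-rolled scheme described above, here is a sketch of the same requirement (one global lock that its owner may reserve again) met with a platform-provided recursive mutex. It assumes POSIX threads are available, which the slides do not state, and it is not how the SQLite version under discussion was written. The names `global_lock_init`, `enter`, and `leave` are chosen for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>

static pthread_mutex_t global_lock;

/* Initialise the single global lock as recursive. Returns 0 on success. */
int global_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&global_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* The owning thread may call enter() again without blocking;
 * each enter() must be matched by one leave(). */
int enter(void) { return pthread_mutex_lock(&global_lock); }
int leave(void) { return pthread_mutex_unlock(&global_lock); }
```

The ownership check and the nesting count live inside the mutex implementation, so there is no second lock and no separately guarded reference count to get wrong.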


Example, cont.

• Strip away all the surroundings and you’re left with two fundamental functions
  • One reserves a lock
  • One releases it

• Everything is reference counted, so the function performing a reserve may not really reserve the lock, may just increment the ref count, etc.

• Couple this with a function that uses these locking capabilities perfectly correctly, but which has the potential to be interrupted
  • Add a data race, just for fun

• Net takeaway: an incredibly difficult-to-debug deadlock


Example, cont.

// global lock objects: lock1 guards access to lock2
lock_t lock1, lock2;
// global reference count, supports recursive access to lock2
int refCount = 0;

// Function to enter the locked section
void enter() {
    reserve_lock(lock1);
    if( refCount == 0 )
        reserve_lock(lock2);    // may block while still holding lock1
    release_lock(lock1);
    refCount++;                 // updated after lock1 is released: the data race
}

// Function to leave the locked section
void leave() {
    reserve_lock(lock1);
    refCount--;
    if( refCount == 0 )
        release_lock(lock2);
    release_lock(lock1);
}


Example, cont.

Thread A::foo()
  enter()
    reserve(lock1)
    refCount == 0
    reserve(lock2)
    release(lock1)
    (interrupted)
    refCount++
  leave()
    reserve(lock1)    blocked

Thread B::foo()
  enter()
    reserve(lock1)
    refCount == 0
    reserve(lock2)    blocked


Example, cont.

Thread B can get lock1, blocks attempting to reserve lock2 in enter()

Thread A has lock2, blocks attempting to regain lock1 in leave()
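The interleaving above can also be found mechanically. The following is a toy sketch in the spirit of "model-based simulation": it models two threads that each call enter() then leave(), tries every possible interleaving of their steps, and reports whether a state is reachable in which neither thread can move. It is not how any of the tools named earlier work internally, and the model, the step numbering, and the `count_under_lock` variant (which moves refCount++ before release(lock1)) are assumptions made for the illustration.

```c
#include <stdbool.h>

enum { FREE = -1, DONE = 10 };

/* Hypothetical model state for the slide's enter()/leave(), two threads. */
typedef struct {
    int pc[2];      /* next step of each thread, 0..DONE */
    int lock1;      /* owner of lock1, or FREE */
    int lock2;      /* owner of lock2, or FREE */
    int refCount;
} State;

/* Try to run one step of thread t. Returns false if the thread is blocked. */
static bool step(State *s, int t, bool count_under_lock)
{
    int pc = s->pc[t];
    int next = pc + 1;
    /* Steps 3 and 4 are release(lock1) and refCount++; the flag swaps them. */
    if (count_under_lock && (pc == 3 || pc == 4))
        pc = (pc == 3) ? 4 : 3;
    switch (pc) {
    case 0: /* enter: reserve(lock1) */
    case 5: /* leave: reserve(lock1) */
        if (s->lock1 != FREE) return false;
        s->lock1 = t;
        break;
    case 1: /* enter: if (refCount == 0) */
        if (s->refCount != 0) next = 3;   /* skip reserve(lock2) */
        break;
    case 2: /* enter: reserve(lock2) */
        if (s->lock2 != FREE) return false;
        s->lock2 = t;
        break;
    case 3: /* enter: release(lock1) */
    case 9: /* leave: release(lock1) */
        s->lock1 = FREE;
        break;
    case 4: s->refCount++; break;
    case 6: s->refCount--; break;
    case 7: /* leave: if (refCount == 0) */
        if (s->refCount != 0) next = 9;   /* skip release(lock2) */
        break;
    case 8: s->lock2 = FREE; break;
    }
    s->pc[t] = next;
    return true;
}

/* Depth-first search over all interleavings of the two threads. */
bool deadlock_reachable(State s, bool count_under_lock)
{
    bool progressed = false;
    for (int t = 0; t < 2; t++) {
        if (s.pc[t] == DONE) continue;
        State n = s;
        if (!step(&n, t, count_under_lock)) continue;
        progressed = true;
        if (deadlock_reachable(n, count_under_lock)) return true;
    }
    /* Nobody could move: deadlock unless both threads already finished. */
    return !progressed && !(s.pc[0] == DONE && s.pc[1] == DONE);
}
```

Starting from `{ {0, 0}, FREE, FREE, 0 }`, the search reports a reachable deadlock for the code as written, and none in this simplified model once refCount is updated while lock1 is still held.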


Summary

• Devices are increasingly connected, and you’re not ready
  • Take fundamental process shifts, use acknowledged techniques
  • Protect yourself, because somebody will take advantage

• Devices are increasingly complex, and you’re not ready
  • More cores, more apps, more requirements, more code
  • Code is already hard enough to manage, and it’s getting worse
  • Reliability isn’t a buzz-word

• Our consumers cannot be expected to understand
  • … and you’re not ready

© Copyright Klocwork Inc. 2011. All Rights Reserved.
