The Embedded Software Challenge
Gwyn Fisher, CTO, Klocwork
October 2012
COMPANY PROPRIETARY
About me…
• Guide technical direction and strategy at Klocwork
• 20+ years of global technology experience
• Original passion is compiler theory
• Background in formal grammars and computational linguistics
• Spent most time in search and natural language domains
• Held senior and exec positions at Hummingbird, Fulcrum Technologies, PC DOCS and LumaPath
• Thanks to our friends at VDC Research for the data presented in this deck!!
You want the fridge to do what?
Great idea. This should work out well… right?
Oh boy. So, how big is the freight train?
Bad news: kinda big and scary. Good news: it’s all manageable.
Software feature development in action...
First, identify what marketing wants...
1. Features
2. More Features.
3. Yesterday.
Software feature development in action...
Second, let development figure out the details...
1. Ah, we might get hacked...
2. Do we have time to build the quality we want?
3. Can our platform even support all these new features?
Ah… we might get hacked?
Yes, you might get hacked...
No need for more scare slides; here are the key issues:
• How do we develop a threat model?
• What are the risks to my embedded device?
• How do I make my development team better at this?
What is Threat Modeling?
• Based on the assumption that every system has intrinsic value worth protecting
• Involves thinking about the threats to a system and how it can be attacked
• Secure systems start with understanding the threats
• Threats are not vulnerabilities
• Threats live forever; they are the attacker's goal
[Diagram: Threat, Mitigation, Vulnerability, Attacker]
Embedded Devices: Use/Abuse Cases to Consider
• What data does the device hold or control access to?
• What are the inputs to the system?
  • More obvious: wired and wireless networking, Bluetooth, GPS signals, cellular voice/data, etc.
  • Less obvious (and may appear to be connection-free): remote controls
• The kinds of environments where it will be installed
• Physical security is often assumed for servers, but embedded devices don't necessarily have that luxury
• Devices may be deployed in ways that are outside of normal expectations
These help you devise potential threats and attack vectors
... for the device and the entire system
Embedded Systems: Threats & Attacks
• Typically resource-constrained devices
  • Memory capacity, power consumption, and processing power must strike a balance with security requirements
  • A product can be effective in production if the threat model is considered carefully and the system is designed to work within the resource limitations
• Crypto implementation is a common weakness
  • Vulnerable to power and timing analysis
  • “Side-channel” attacks
  • Risk is private/secret key exposure
• Users often change firmware
  • Can be thwarted by using cryptographically signed updates
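To make the timing-analysis point concrete, here is a minimal sketch of a comparison routine for secrets such as MAC tags. It assumes the leak comes from an early-exit comparison (as with `memcmp`), and the name `ct_equal` is hypothetical. Compilers and hardware can still introduce timing differences, so treat this as an illustration rather than a vetted implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffers are equal, 0 otherwise.
   Every byte is always inspected, so the loop does not stop early
   at the first mismatch the way a naive comparison would. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any differing bits */
    }
    /* Map diff == 0 to 1 and any other value to 0 without branching:
       (0 - 1) wraps to 0xFFFFFFFF, whose bit 8 is set; 1..255 minus 1 stays below 256. */
    return (int)(1u & (((uint32_t)diff - 1u) >> 8));
}
```

A caller would compare a received tag against the expected one with `ct_equal(expected, received, TAG_LEN)`, where `TAG_LEN` is whatever tag size the protocol uses.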
Embedded Systems: Threats & Attacks
• An embedded system is less likely to be online all the time
  • Harder to keep up-to-date (often need physical presence to update)
  • It's more likely to depend on a single other node for its network access (single point of failure/attack)
  • Design the system with an update mechanism, and a fail-safe if the update mechanism is not available
  • This is both a cost and an attack threat
• Need to be very concerned about physical access
  • Often easy to steal
  • Hardware-based and physical presence-based attacks are more relevant
  • An attacker could potentially swap one embedded device for another and use it to get false information into the system
Embedded Systems: Threats & Attacks
• If the device is an ID or User Access device
  • Whoever has physical possession can get access to assets
  • The countermeasures are more of a system issue than a device issue
  • Personal and business data at risk
  • Threats to consider include the device being lost, stolen, or spoofed
• Difficult to update security information if the device has…
  • Limited network access
  • Little storage
  • No accurate clock (on-board or online)
• Replay attacks are common
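One common defence against replay on a device with no accurate clock is a strictly increasing message counter. The sketch below assumes each message carries a counter that is already covered by an authentication check, which is not shown. The names `replay_guard_t` and `accept_message` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tracks the highest counter value accepted so far.
   Assumption: this value is persisted across reboots by the caller. */
typedef struct {
    uint32_t last_counter;
} replay_guard_t;

/* Accepts a message only if its counter is strictly newer than anything
   seen before; a replayed (or reordered) message is rejected. */
static bool accept_message(replay_guard_t *guard, uint32_t counter) {
    if (counter <= guard->last_counter) {
        return false;               /* old or repeated counter: possible replay */
    }
    guard->last_counter = counter;  /* remember the newest accepted value */
    return true;
}
```

The design choice is that a counter needs only a few bytes of storage and no time source, which fits the constraints listed above. Counter wrap-around and storage wear are left out of this sketch.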
How to make the threat model work for you...
• Defensive Coding
  • Use automation to remove weak coding practices & vulnerabilities; static analysis is the most common technique
  • Can be performed on every check-in, every build, or some other interval that works for your development process
• Code Review
  • One of the most impactful steps you can take toward more secure code
  • While design bugs are the most expensive to fix, implementation bugs are the most common
• Security Testing
  • Functional test techniques cannot uncover security bugs
  • Designed to understand what the application is NOT supposed to do
  • Specific attacks should be applied to uncover vulnerabilities
Do we have time to build the quality we want?
The Costs of Bug Containment
[Figure annotations: “THE key development milestone for most”; “80% of defects introduced here”]
The payoff… deep code analysis at the developer desktop
[Figure annotations: “3rd Generation Source Code Analysis”; “2nd Generation Source Code Analysis”; “THE key development milestone for most”; “Move Defect Detection HERE”]
Realities of Software Risk in Safety Critical Systems
• Safety and security incidents resulting from software defects are well documented
• Critical defects and security vulnerabilities:
  • Slow development productivity and reduce time for innovation
  • Impact competitive advantage and reputation
  • Cost the U.S. economy almost $60 billion annually
• 80% of development costs are consumed by software developers identifying and correcting defects
• Traditional approaches to address risk are ineffective
  • Code analysis performed after check-in
    • 80% of defects are introduced in the implementation and build phases of the SDLC
  • In-person code reviews
    • Difficult to schedule, time-consuming, unpleasant process
For example… Ariane 5 Flight 501
• Ariane 5 is an expendable launch system (no crew) used to deliver payloads into geostationary transfer orbit or low Earth orbit
• June 1996: 1st test flight - Unsuccessful
  • The rocket veered off its flight path 37 seconds after launch and was destroyed
• The Cause:
  • An error in the software design (inadequate protection from integer overflow)
• The Result:
  • A loss of more than US$370 million
  • The subsequent automated analysis of the Ariane code was the first example of large-scale static code analysis by abstract interpretation
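The overflow behind Flight 501 was an unprotected conversion of a 64-bit floating-point value to a 16-bit signed integer. The flight code was Ada, so the C below is only a sketch of the kind of range check that was missing. The name `to_int16_checked` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts value to int16_t only when it fits.
   Returns false and leaves *out untouched for out-of-range input.
   The range test is written so that NaN also fails it. */
static bool to_int16_checked(double value, int16_t *out) {
    if (!(value >= (double)INT16_MIN && value <= (double)INT16_MAX)) {
        return false;           /* would overflow: let the caller decide what to do */
    }
    *out = (int16_t)value;      /* in range: fractional part is truncated toward zero */
    return true;
}
```

The point of returning a status instead of converting blindly is that the caller is forced to choose a policy for the overflow case (saturate, flag the reading as invalid, and so on), so the failure cannot propagate silently.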
What’s the answer?
• Important to reduce the ‘rinse-repeat’ cycle of rework
  • Allows the testing team to focus more on ensuring the software meets requirements
• An ideal solution should include:
  • Ability to find defects while coding (in-phase bug containment)
  • Coverage of quality & reliability defects, security vulnerabilities, maintainability issues
  • Configuration and tuning capabilities
  • Integration into the regular build process
• Rethink code reviews
  • Bottom-up, rather than imposed, top-down
  • Asynchronous, rather than “around the table”
  • Defects found through SCA integrated into the code review
Can our platform support all these new features?
The short answer is probably “No”
[Chart: # of processors used in 2 years. Multi: 74%, Single: 51%]
Oh, but we kinda knew that…
…but that’s just for those Google & Apple guys, right?
[Chart: projected 2-year multi-core/processor growth. Categories: Mil/Aero, Medical, Industrial; bars labeled 62%, 68%, 75%]
Hope you like working with software engineers
[Chart: Software Engineers vs. Other, for Single Core Project and Multi-Core Project; labeled values 29% and 43%]
So, what are some of the specific risks?
• Race conditions
• Deadlocks
• Design “maladies”
How to mitigate these risks?
Developer Education
• What is a deadlock?
  • How localized or distributed are they?
  • When one happens, what happens?
• What is a race condition?
  • Why does it matter?
  • How does it happen?
  • Do I need multiple cores to have one?
  • What about a multi-processor package?
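On the question of whether a race condition needs multiple cores: it does not. A pre-emptive scheduler on a single core is enough. The sketch below replays one unlucky interleaving by hand, single-threaded, so the outcome is deterministic. All names (`task_t`, `step_load`, `step_store`) are hypothetical and stand in for the load and store halves of an unprotected `counter++`.

```c
#include <assert.h>

static int counter = 0;            /* shared state, no lock protecting it */

typedef struct { int local; } task_t;   /* per-task copy of the value it read */

/* counter++ split into the two steps a scheduler can interleave. */
static void step_load(task_t *t)  { t->local = counter; }
static void step_store(task_t *t) { counter = t->local + 1; }

/* Two tasks increment once each, with a context switch between
   task A's load and its store. */
static int run_interleaved(void) {
    task_t a, b;
    counter = 0;
    step_load(&a);    /* A reads 0 */
    step_load(&b);    /* switch: B also reads 0 */
    step_store(&a);   /* A writes 1 */
    step_store(&b);   /* B writes 1, so A's update is lost */
    return counter;
}

/* Same work with each increment completed before the other starts. */
static int run_serialized(void) {
    task_t a, b;
    counter = 0;
    step_load(&a); step_store(&a);
    step_load(&b); step_store(&b);
    return counter;
}
```

Here `run_interleaved()` ends with a counter of 1 while `run_serialized()` ends with 2, which is the lost-update symptom that a lock or an atomic increment is meant to prevent.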
Tools
• Profile-based analysis
  • AKA dynamic analysis
  • Code instrumentation
  • Execute-under-duress
• Static analysis
  • Model-based simulation
  • Akin to compilation technology
  • Looks for a variety of problems in code
Give me some tools…
Tools
• Profile-based analysis
  • Code instrumentation
  • Execute-under-duress
• Static analysis
  • Model-based simulation
• Custom
  • In many ways, only you can figure this one out
Dynamic testing
• Testing the running code
  • Instrumented for some goal
    • Identifying conflicting thread operations
    • Identifying harmful delays
    • Identifying memory leakage/waste/reuse
• Seriously cool when it works
  • “Your code produces the following conflict…”
Dynamic testing
• Downside is reproduction
  • Instrumented code is fundamentally different
  • Production code runs without hindrance
  • Maybe that conflict only occurs within specific conditions (time, memory, environment)
  • How do you reproduce that?
• Identification leads to re-design
  • If you can find it (or even if you can’t), the easiest way to mitigate is simply to re-design
Dynamic testing
• What’s available (and viable)?
  • Valgrind / Helgrind
    • C/C++ and Fortran, bizarrely enough
    • Can identify race conditions and deadlocks
    • Works off profiler core
  • Critical Blue
    • Modeling for single -> multiple core transitions
    • Race conditions, cache contention, memory flaws, …
  • Others…
    • IDE-bundled, mostly profilers
Static analysis
• Tests code-as-written
  • Symbolic execution
  • Abstract interpretation
  • Pick your poison…
• Exercises all code paths
  • Complete coverage, no test cases required
  • Attempts proof of problem, not evidence
Static analysis
• Downside is accuracy
  • May be 1,000s of ways a problem could occur
  • But none of them will ever happen
  • Environmental controls, design constraints, all the assumptions that go into code-as-written
• But, identification leads to fix
  • Stick with it, and you get a specific path that needs help, a root cause perhaps
Static analysis
• What’s available (and viable)?
  • Coverity Prevent
  • GrammaTech CodeSonar
  • Klocwork Insight
• Deadlocks
• Race conditions
• Dumb mistakes
Example concurrency detection
• Example taken from SQLite, years-old version
  • Locking documented as “seldom required”, with code being “mostly lock insensitive”
  • So only singleton locking was required (i.e. one global lock)
  • But… it should be recursive
    • Allow a thread to reserve the lock when it already owns it
• Common development pattern for speed-sensitive code
  • Everything useful accomplished via asserts, not conditions
  • Including thread owner check
• But that’s not all…
  • To implement recursive locks, a second lock is used to guard a reference count on the actual lock object
  • So how do you lock the lock that’s guarding the lock you want to lock?
Example, cont.
• Strip away all the surroundings and you’re left with two fundamental functions
  • One reserves a lock
  • One releases it
• Everything is reference counted, so the function performing a reserve may not really reserve the lock, may just increment the ref count, etc.
• Couple this with a function that uses these locking capabilities perfectly correctly, but which has the potential to be interrupted
  • Add a data race, just for fun
• Net takeaway: an incredibly difficult-to-debug deadlock
Example, cont.
// global lock objects: lock1 guards access to lock2
lock_t lock1, lock2;
// global reference count, supports recursive access to lock2
int refCount = 0;

// Function to enter the locked section
void enter() {
    reserve_lock(lock1);
    if( refCount == 0 )
        reserve_lock(lock2);
    release_lock(lock1);
    refCount++;    // updated outside lock1: this is the data race
}

// Function to leave the locked section
void leave() {
    reserve_lock(lock1);
    refCount--;
    if( refCount == 0 )
        release_lock(lock2);
    release_lock(lock1);
}
Example, cont.
Thread A::foo()                  Thread B::foo()
enter()
  reserve(lock1)
  refCount == 0
  reserve(lock2)
  release(lock1)
  interrupted
                                 enter()
                                   reserve(lock1)
                                   refCount == 0
                                   reserve(lock2)  blocked
  refCount++
leave()
  reserve(lock1)  blocked
Example, cont.
Thread B can get lock1, blocks attempting to reserve lock2 in enter()
Thread A has lock2, blocks attempting to regain lock1 in leave()
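The circular wait described here can be replayed deterministically. The sketch below is a single-threaded model, not SQLite code: locks are just owner fields, and every name (`owner`, `waiting_on`, `try_reserve`, `replay_schedule`) is hypothetical. It walks the same schedule as the slides and then checks whether each thread is blocked on a lock the other one holds.

```c
#include <assert.h>
#include <stdbool.h>

enum { NONE = -1 };
enum { THREAD_A = 0, THREAD_B = 1 };
enum { LOCK1 = 0, LOCK2 = 1 };

static int owner[2];        /* which thread holds each lock, or NONE */
static int waiting_on[2];   /* which lock each thread is blocked on, or NONE */
static int ref_count;       /* models the shared refCount */

/* Take the lock if it is free; otherwise record that the thread is blocked. */
static bool try_reserve(int thread, int lock) {
    if (owner[lock] == NONE) {
        owner[lock] = thread;
        waiting_on[thread] = NONE;
        return true;
    }
    waiting_on[thread] = lock;
    return false;
}

static void release(int thread, int lock) {
    if (owner[lock] == thread) owner[lock] = NONE;
}

/* Circular wait: each thread is blocked on a lock held by the other. */
static bool deadlocked(void) {
    return waiting_on[THREAD_A] != NONE && waiting_on[THREAD_B] != NONE
        && owner[waiting_on[THREAD_A]] == THREAD_B
        && owner[waiting_on[THREAD_B]] == THREAD_A;
}

/* Replays enter()/leave() for A and enter() for B.
   interrupt_a == true pre-empts A just before its refCount++. */
static bool replay_schedule(bool interrupt_a) {
    owner[LOCK1] = owner[LOCK2] = NONE;
    waiting_on[THREAD_A] = waiting_on[THREAD_B] = NONE;
    ref_count = 0;

    /* Thread A: enter() up to and including release(lock1). */
    try_reserve(THREAD_A, LOCK1);
    if (ref_count == 0) try_reserve(THREAD_A, LOCK2);
    release(THREAD_A, LOCK1);

    if (!interrupt_a) {
        /* A finishes enter() and runs all of leave() before B starts. */
        ref_count++;
        try_reserve(THREAD_A, LOCK1);
        ref_count--;
        if (ref_count == 0) release(THREAD_A, LOCK2);
        release(THREAD_A, LOCK1);
    }

    /* Thread B: enter(). */
    try_reserve(THREAD_B, LOCK1);
    if (ref_count == 0 && !try_reserve(THREAD_B, LOCK2)) {
        /* B is blocked on lock2 while holding lock1.
           A resumes: finishes enter(), then leave() asks for lock1. */
        ref_count++;
        try_reserve(THREAD_A, LOCK1);
        return deadlocked();
    }
    release(THREAD_B, LOCK1);
    ref_count++;
    return deadlocked();
}
```

Calling `replay_schedule(true)` reproduces the deadlock, while `replay_schedule(false)` completes normally. The only difference is the pre-emption point, which is why the defect is so hard to reproduce on demand.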
Summary
• Devices are increasingly connected, and you’re not ready
  • Make fundamental process shifts, use acknowledged techniques
  • Protect yourself, because somebody will take advantage
• Devices are increasingly complex, and you’re not ready
  • More cores, more apps, more requirements, more code
  • Code is already hard enough to manage, and it’s getting worse
  • Reliability isn’t a buzz-word
• Our consumers cannot be expected to understand
  • … and you’re not ready
© Copyright Klocwork Inc. 2011. All Rights Reserved.