FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms
Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson
University of California, Berkeley

Feb 25, 2016

Page 1: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson

University of California, Berkeley

Page 2: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Presentation Outline

1. Introduction
– Objective/Motivation
– Background

2. Methods
– Implementation
– Test setup

3. Evaluation
– Test results
– Conclusions

Page 3: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

The Berkeley/Stanford ROC Project

• Purpose: investigating novel techniques for building highly-dependable Internet services

• Example techniques:
– Advanced support for operator undo
– Stability through targeted restarts
– Integrated root cause analysis
– Online verification of recovery mechanisms

Page 4: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

FIG Project Objective/Motivation

Objective:
• Develop a lightweight, extensible tool for injecting errors to test recovery code/mechanisms

Motivation:
• Testing and production environments are always different
• Large systems will require recovery code, which should be tested as part of normal operation

Page 5: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

“Software’s Invisible Users”

[Figure: diagram with boxes labeled User Input, User interface, Application, Other libraries, Other apps, System libraries (libc), and OS]

Concept: Jim Whittaker, Florida Institute of Technology

Page 6: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Related Testing Methods

1. Ballista (DeVale, Koopman, Siewiorek)
• “Top-down” testing of POSIX-compliant OS and library interfaces

2. Fuzz (Miller, Fredriksen, So)
• Tested UNIX applications by feeding them random input streams

3. Holodeck (Whittaker et al.)
• Similar approach to ours, but only for Windows 2000/XP

Page 7: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

FIG Implementation

• Thin stub library between app & libraries
• Traps API calls
– Logs them
– Inserts faults
• Can be inserted into any app without modification
– Uses LD_PRELOAD

[Figure: call stack, top to bottom: Application; libfig.so; libc.so, other libs; OS. Legend: Normal call path, Injected fault]
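
The trap / log / inject sequence on this slide can be modeled in a few lines. FIG itself is a shared library loaded through LD_PRELOAD; the Python sketch below is only an illustration of the control flow, not FIG's code. The names `make_stub`, `should_inject`, and `fake_open` are hypothetical.

```python
import errno
import os


def make_stub(name, real_func, should_inject, fault_errno, log):
    """Wrap real_func the way a stub sits between the app and a library.

    Illustrative model only: the real tool interposes on C library calls.
    """
    calls = 0

    def stub(*args, **kwargs):
        nonlocal calls
        calls += 1
        log.append((name, calls, args))        # trap and log the call
        if should_inject(calls):               # injected-fault path
            raise OSError(fault_errno, os.strerror(fault_errno))
        return real_func(*args, **kwargs)      # normal call path

    return stub


# Hypothetical example: the third call to a fake "open" fails with EMFILE.
call_log = []
fake_open = make_stub(
    "open",
    real_func=lambda path: f"handle:{path}",
    should_inject=lambda n: n == 3,
    fault_errno=errno.EMFILE,
    log=call_log,
)
```

The application-side code calls `fake_open` exactly as it would call the real function, which mirrors the slide's point that the stub needs no changes to the application.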

Page 8: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Extensibility

• API stubs are automatically generated
• Very easy to add new APIs to log
• Fault injection is under script control
• Can simulate multiple fault models (e.g., memory pressure)

Sample control file:

MALLOC_INDEX
    interval 82 to infinity return 0 errno ENOMEM probability 0.03

OPEN_INDEX
    // device out of space.
    interval 100 to infinity return -1 errno ENOSPC probability 0.001
    // kernel out of memory.
    interval 100 to 120 return -1 errno ENOMEM probability 0.1
    // too many files open.
    callnumber 108 return -1 errno EMFILE probability 1.0
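
One way to read the rules in the sample control file is sketched below in Python. This is an interpretation under stated assumptions, not FIG's parser: the slide does not specify the grammar beyond the sample, and the first-matching-rule-wins order, the `Rule` tuple, and the function names `parse_rule` and `pick_fault` are all hypothetical.

```python
import errno
import math
import random
from collections import namedtuple

# Hypothetical in-memory form of one control-file rule.
Rule = namedtuple("Rule", "first last retval err probability")


def parse_rule(line):
    """Parse one 'interval ...' or 'callnumber ...' rule line."""
    # The slide text may carry a typographic dash in "-1"; normalize it.
    tok = line.replace("\u2013", "-").split()
    if tok[0] == "interval":          # interval A to B ...
        first = int(tok[1])
        last = math.inf if tok[3] == "infinity" else int(tok[3])
        rest = tok[4:]
    elif tok[0] == "callnumber":      # callnumber N ...
        first = last = int(tok[1])
        rest = tok[2:]
    else:
        raise ValueError(f"unknown trigger: {tok[0]}")
    fields = dict(zip(rest[0::2], rest[1::2]))   # keyword/value pairs
    return Rule(first, last, int(fields["return"]),
                getattr(errno, fields["errno"]),
                float(fields["probability"]))


def pick_fault(rules, call_number, rng=random):
    """Return (retval, errno) for the first rule that fires, else None.

    Assumption: rules are tried in file order and the first hit wins.
    """
    for r in rules:
        if r.first <= call_number <= r.last and rng.random() < r.probability:
            return r.retval, r.err
    return None
```

Under this reading, the `callnumber 108 ... probability 1.0` rule makes the 108th call to open() fail with EMFILE every time, while the interval rules fire only occasionally within their call-number range.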

Page 9: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Test Setup: Applications

• GNU file utilities (ls, mv, etc.)
• Emacs 20.7.1 – with and without X
• Apache 1.3.22
• Berkeley DB 4.0.14
• Netscape Navigator 4.76
• MySQL server 3.23.36

Page 10: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Test Setup: Instrumented Calls & Their Errors

• malloc() – memory exhaustion
• read() – I/O error, system call was interrupted
• write() – I/O error, no space left on device, call interrupted
• open() – memory exhaustion, no space on device, too many files open
• select() – memory exhaustion

Page 11: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Test Results: Client Apps

               read()          write()          select()     malloc()
               EINTR   EIO     ENOSPC   EIO     ENOMEM       ENOMEM
Emacs – no X   o.k.    exit    warn     warn    o.k.         crash
Emacs – w/X    o.k.    crash   o.k.     crash   crash/exit   crash
Netscape       warn    exit    exit     exit    n/a          exit

Page 12: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Test Results: Server Apps

                        read()                     write()                   select()   malloc()
                        EINTR        EIO           ENOSPC       EIO          ENOMEM     ENOMEM
Berkeley DB – Xact      retry        detect        Xact abort   Xact abort   n/a        Xact abort
Berkeley DB – no Xact   retry        detect        data loss    data loss    n/a        detect, or data loss
MySQL Server            Xact abort   retry, warn   Xact abort   Xact abort   retry      restart process
Apache                  o.k.         req. drop     req. drop    req. drop    o.k.       n/a

Page 13: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Netscape Reacts

Page 14: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Test Results: Overhead

                         Time (s)   Overhead
No FIG                   33.46      N/A
FIG, no logging          34.28      2.5%
Logging w/o timestamps   47.83      42.9%
Logging w/timestamps     61.74      84.5%
strace (all syscalls)    112.85     237.3%

Timing using Berkeley DB (non-transactional) to read, sort and write one million words.

• Note: FIG communicates with a separate logging daemon through shared memory to reduce logging overhead.
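
The Overhead column follows from the Time column as the relative slowdown against the "No FIG" run. A short Python check of that arithmetic, using only the timings from the slide (the function name `overhead_percent` is hypothetical):

```python
BASELINE = 33.46  # seconds, the "No FIG" row

# Timings in seconds, copied from the slide.
timings = {
    "FIG, no logging": 34.28,
    "Logging w/o timestamps": 47.83,
    "Logging w/timestamps": 61.74,
    "strace (all syscalls)": 112.85,
}


def overhead_percent(seconds, baseline=BASELINE):
    """Relative slowdown versus the baseline run, in percent."""
    return (seconds - baseline) / baseline * 100.0


for name, seconds in timings.items():
    print(f"{name}: {overhead_percent(seconds):.1f}%")
```

For example, the FIG run without logging costs (34.28 - 33.46) / 33.46, which rounds to the 2.5% shown in the table.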

Page 15: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Strategies for Reliable Services:

• Intelligent retry
– ls: “bounded retry” of malloc()
• Resource preallocation
– Apache: allocates buffer pool at startup
• Degraded service
– Apache: deactivates logging if disk full
• Process pools
– Apache and MySQL
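
The "bounded retry" strategy can be sketched as follows. The slide only states that ls retries malloc() a bounded number of times; the attempt limit, the delay, the set of retryable errors, and the name `bounded_retry` are assumptions made for illustration, and this Python model is not the code used by ls.

```python
import errno
import time


def bounded_retry(operation, max_attempts=3, delay_s=0.0,
                  retry_on=(errno.ENOMEM, errno.EINTR)):
    """Call operation(), retrying transient failures a limited number of times.

    Errors outside retry_on, or a failure on the last attempt, are re-raised
    so the caller can fall back to another recovery path.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as exc:
            if exc.errno not in retry_on or attempt == max_attempts:
                raise
            time.sleep(delay_s)  # optional pause before the next attempt
```

The bound matters: an unbounded retry loop would hang under persistent memory pressure, which is exactly the kind of fault model the injection rules can simulate.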

Page 16: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

FIG as a Prototype for Online Error Injection

• Low run-time overhead
• Easy to enable/disable
• Easy to configure
• Extensible
• Can simulate multiple fault models

Page 17: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

A Case for Online Error Injection

• Recovery code is not usually exercised during normal operation

• Deployed environments tend to differ from testing environments

• Can run error injection tests on a subset of deployed systems

• FIG can simulate common environmental errors

Page 18: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Conclusions

• FIG exposed a variety of deficiencies in how our test applications handled environmental errors
• Server apps are generally more robust than client applications
• FIG exhibits low overhead
• FIG is suitable for online error injection

Page 19: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms
Page 20: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Future Directions

• Limitations of FIG:
– Only for UNIX-like OSes
– Limited to app/library interface (proxy for app/OS interaction)
• Make FIG part of a larger test suite
• Include clock time and event based error triggers
• Greater flexibility in configuration file

Page 21: FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Other Related Work

1. Xept (Vo et al.)
• Instruments object code to ensure that error handling code exists

2. Processor & memory errors
• DOCTOR, HYBRID, DEFINE

3. Process memory corruption
• FERRARI, DEFINE