
DeepXplore: Automated Whitebox Testing of Deep Learning Systems

Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana
SOSP 2017

05 February 2018


Table of Contents

1 Introduction

2 Background

3 Overview

4 Methodology

5 Implementation

6 Evaluation Setup

7 Results


Introduction

DL systems, despite their impressive capabilities, often demonstrate unexpected or incorrect behaviors in corner cases for several reasons, such as biased training data, overfitting, and underfitting of the models.

Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs.

The authors design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems.


Introduction

They address two main problems:

Generating inputs that trigger different parts of a DL system’s logic.

Identifying incorrect behaviors of DL systems without manual effort.

First, they introduce neuron coverage. At a high level, neuron coverage of DL systems is similar to code coverage of traditional systems.

However, code coverage itself is not a good metric for estimating coverage of DL systems.

Even a single randomly picked test input was able to achieve 100% code coverage while the neuron coverage was less than 10%.


Introduction

Next, they show how multiple DL systems with similar functionality (e.g., self-driving cars by Google, Tesla, and GM) can be used as cross-referencing oracles to identify erroneous corner cases without manual checks.

For example, if one self-driving car decides to turn left while others turn right for the same input, one of them is likely to be incorrect (differential testing).
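The cross-referencing idea can be illustrated with a minimal sketch. The three "models" below are hypothetical stand-ins (simple threshold rules on a scalar input), not the DNNs from the paper; the only point is that an input is flagged, without any manual label, when the systems disagree on it.

```python
def find_disagreements(models, inputs):
    """Return (input, predictions) pairs on which the models do not all agree."""
    flagged = []
    for x in inputs:
        predictions = [model(x) for model in models]
        if len(set(predictions)) > 1:  # at least two models differ
            flagged.append((x, predictions))
    return flagged


# Hypothetical steering "models": each maps a scalar input to a decision.
def model_a(x):
    return "left" if x < 0.5 else "right"


def model_b(x):
    return "left" if x < 0.5 else "right"


def model_c(x):
    return "left" if x < 0.7 else "right"


disagreements = find_disagreements([model_a, model_b, model_c], [0.1, 0.6, 0.9])
for x, preds in disagreements:
    print(x, preds)
```

A disagreement only says that at least one system is likely wrong on that input; it does not say which one.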


Introduction

Finally, they demonstrate how the problem of generating test inputs that maximize neuron coverage of a DL system while also exposing as many differential behaviors (i.e., differences between multiple similar DL systems) as possible can be formulated as a joint optimization problem.

They design, implement, and evaluate DeepXplore, the first efficient whitebox testing framework for large-scale DL systems.


DL Systems

We define a DL system to be any software system that includes at least one Deep Neural Network (DNN) component.

Note that some DL systems might consist solely of DNNs (e.g., self-driving car DNNs predicting steering angles without any manual rules), while others may have some DNN components interacting with other traditional software to produce the final output.

Figure: Comparison between traditional and ML system development processes.


DNN Architecture

DNNs are inspired by human brains with millions of interconnected neurons.

A DNN usually has at least three (often more) layers: one input, one output, and one or more hidden layers.

DNNs can be trained using different training algorithms, but gradient descent using backpropagation is by far the most popular training algorithm for DNNs.


DNN Architecture

Figure: A simple DNN and the computations performed by each of its neurons.
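Since the figure itself is not reproduced in this transcript, the following is a minimal sketch of the usual per-neuron computation: a weighted sum of the inputs plus a bias, passed through an activation function. The ReLU activation and all weights are assumptions chosen for illustration; the figure may use a different activation.

```python
def relu(z):
    """Rectified linear unit, a common activation (assumed here)."""
    return max(0.0, z)


def neuron_output(weights, bias, inputs, activation=relu):
    """Weighted sum of the inputs plus bias, followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def layer_output(weight_rows, biases, inputs):
    """Outputs of all neurons in one fully connected layer."""
    return [neuron_output(w, b, inputs) for w, b in zip(weight_rows, biases)]


# Hypothetical two-neuron hidden layer applied to a two-dimensional input.
hidden = layer_output([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], [2.0, 1.0])
print(hidden)
```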


Limitations of Existing DNN Testing: Expensive Labeling Effort

Existing DNN testing techniques require prohibitively expensive human effort to provide correct labels/actions for a target task (e.g., driving a car, image classification, and malware detection).

For complex and high-dimensional real-world inputs, human beings, even domain experts, often have difficulty in efficiently performing a task correctly for a large dataset.


Limitations of Existing DNN Testing: Low Test Coverage

None of the existing DNN testing schemes even try to cover different rules of the DNN.

Therefore, the test inputs often fail to uncover different erroneous behaviors of a DNN.


Limitations of Existing DNN Testing: Problems with low-coverage DNN tests

Figure: Comparison between program flows of a traditional program and a neural network. The nodes in gray denote the corresponding basic blocks or neurons that participated while processing an input.

The figure shows the similarity between traditional software and DNNs.


Limitations of Existing DNN Testing: Problems with low-coverage DNN tests

Of course, unlike traditional software, DNNs do not have explicit branches, but a neuron’s influence on the downstream neurons decreases as the neuron’s output value gets lower.

Note that randomly picked inputs are highly unlikely to set high output values for an unlikely combination of neurons.

For example, if an image causes neurons labeled as “Nose” and “Red” to produce high output values and the DNN misclassifies the input image as a car, such a behavior will never be seen during regular testing, as the chance of an image containing a red nose (e.g., a picture of a clown) is very small.


Overview: Workflow

Figure: DeepXplore workflow.

DeepXplore solves a joint optimization problem that maximizes both differential behaviors and neuron coverage.


Overview: A working example

Figure: Inputs inducing different behaviors in two similar DNNs.


Overview: A working example

Consider that we have two DNNs to test. Both perform similar tasks, i.e., classifying images into cars or faces, but they are trained independently with different datasets and parameters. Therefore, the DNNs will learn similar but slightly different classification rules.

The joint optimization algorithm iteratively performs gradient ascent to find a modified input that satisfies all of the goals described.


Overview: A working example

Figure: Gradient ascent starting from a seed input and gradually finding the difference-inducing test inputs.


Methodology: Definitions

Neuron coverage: The neuron coverage of a set of test inputs is the ratio of the number of unique activated neurons for all test inputs to the total number of neurons in the DNN.

$$\mathrm{NCov}(T, x) = \frac{|\{n \mid \forall x \in T,\ \mathrm{out}(n, x) > t\}|}{|N|}$$

N = {n1, n2, ...}: all neurons of a DNN.
T = {x1, x2, ...}: all test inputs.
out(n, x): a function that returns the output value of neuron n in the DNN for a given test input x.
t: the neuron activation threshold.
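A minimal sketch of how neuron coverage could be computed from recorded neuron outputs. It assumes the prose reading of the definition, i.e. a neuron counts as covered if at least one test input drives its output above t; the activation values are made up for illustration.

```python
def neuron_coverage(activations, threshold):
    """Fraction of neurons activated above `threshold` by at least one input.

    activations: one list per test input, holding out(n, x) for every neuron n.
    """
    num_neurons = len(activations[0])
    covered = set()
    for outputs in activations:
        for n, value in enumerate(outputs):
            if value > threshold:
                covered.add(n)
    return len(covered) / num_neurons


# Hypothetical outputs of four neurons for two test inputs.
acts = [
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.0, 0.3],
]
cov_high = neuron_coverage(acts, threshold=0.5)
cov_low = neuron_coverage(acts, threshold=0.25)
print(cov_high, cov_low)
```

Lowering the threshold can only keep or raise the coverage of a fixed test set, which is why the threshold has to be reported along with the coverage number.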


Methodology: Definitions

Gradient: The parametric function performed by a neuron can be represented as y = f(θ, x), where f is a function.

The gradient of f(θ, x) with respect to input x can be defined as:

$$G = \nabla_x f(\theta, x) = \frac{\partial y}{\partial x}$$

θ: parameters of the DNN.
x: test input of the DNN.
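To make the definition concrete, here is a small sketch that approximates the gradient with respect to the input by central finite differences while the parameters stay fixed. Real frameworks obtain this gradient by automatic differentiation; the finite-difference helper and the toy neuron are assumptions used only to keep the example dependency-free.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def toy_neuron(x):
    """Hypothetical neuron y = 3*x0 - 2*x1 + 1 with fixed parameters."""
    return 3.0 * x[0] - 2.0 * x[1] + 1.0


g = numerical_gradient(toy_neuron, [0.5, 0.25])
print(g)  # close to the analytic gradient [3, -2]
```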


Methodology: DeepXplore Algorithm

The authors formulate the test generation process as an optimization problem, so it can be solved efficiently using gradient ascent.

Figure: Test input generation via joint optimization


Methodology: DeepXplore Algorithm

Maximizing differential behaviors: The first objective of the optimization problem is to generate test inputs that can induce different behaviors in the tested DNNs.

Suppose we have n DNNs:

$$F_{k \in 1..n} : x \to y$$

where
F_k: function modeled by the k-th neural network.
x: the input.
y: output class probability vectors.


Methodology: DeepXplore

Let F_k(x)[c] be the class probability that F_k predicts x to be c.

They maximize the following objective function:

$$obj_1(x) = \sum_{k \neq j} F_k(x)[c] - \lambda_1 \cdot F_j(x)[c]$$

λ1: a parameter to balance the objective terms between the DNNs.

obj1 can be maximized with gradient ascent.
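A minimal sketch of evaluating obj1 for given probability vectors. It assumes that j indexes the one DNN whose confidence in class c should be lowered while the remaining DNNs keep predicting c; the probability values are hypothetical.

```python
def obj1(prob_vectors, j, c, lambda1):
    """Sum of F_k(x)[c] over all k != j, minus lambda1 * F_j(x)[c]."""
    others = sum(p[c] for k, p in enumerate(prob_vectors) if k != j)
    return others - lambda1 * prob_vectors[j][c]


# Hypothetical class-probability outputs of three DNNs for the same input.
probs = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
value = obj1(probs, j=1, c=0, lambda1=2.0)
print(value)
```

The value grows when the other DNNs become more confident in class c and when DNN j becomes less confident, which is the differential behavior being sought.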


Methodology: DeepXplore

Maximizing neuron coverage: The second objective is to generate inputs that maximize neuron coverage.

They want to maximize

$$obj_2(x) = f_n(x) \quad \text{such that} \quad f_n(x) > t$$

t: the neuron activation threshold.
f_n(x): the function modeled by neuron n, which takes x as input and produces the output of neuron n.


Methodology: DeepXplore

Joint optimization: The two objectives obj1 and f_n described above are maximized jointly through the following function:

$$obj_{joint} = \Big(\sum_{k \neq j} F_k(x)[c] - \lambda_1 \cdot F_j(x)[c]\Big) + \lambda_2 \cdot f_n(x)$$

λ2: a parameter for balancing between the two objectives of the joint optimization process.
n: the inactivated neuron that is randomly picked at each iteration.
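The joint objective can be tried out on a toy problem. The sketch below is not the authors' implementation: the two "DNNs" are hypothetical logistic models, the neuron is a fixed linear unit rather than a randomly picked inactivated one, and the gradient is approximated numerically instead of by backpropagation. It only illustrates the loop of climbing the joint objective until the two models disagree.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical classifiers: probability that input x belongs to class c.
def model_1(x):
    return sigmoid(2.0 * x[0] + 1.0 * x[1])


def model_2(x):
    return sigmoid(2.0 * x[0] - 1.0 * x[1])


def neuron_n(x):
    """Hypothetical neuron whose output should be increased."""
    return 0.5 * x[1]


def obj_joint(x, lambda1, lambda2):
    # model_2 plays the role of F_j; model_1 is the remaining DNN.
    return model_1(x) - lambda1 * model_2(x) + lambda2 * neuron_n(x)


def numerical_gradient(f, x, eps=1e-6):
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def generate_test_input(seed, lambda1=1.0, lambda2=1.0, step=0.1, max_iters=1000):
    """Gradient ascent on obj_joint until the two models predict different classes."""
    x = list(seed)
    for _ in range(max_iters):
        if (model_1(x) > 0.5) != (model_2(x) > 0.5):
            return x  # difference-inducing input found
        grad = numerical_gradient(lambda v: obj_joint(v, lambda1, lambda2), x)
        x = [xi + step * gi for xi, gi in zip(x, grad)]
    return None


found = generate_test_input([0.5, 0.0])
print(found, model_1(found), model_2(found))
```

Both toy models assign the seed to class c; after a number of ascent steps the second model no longer does, while the first one still does.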


Methodology: DeepXplore

Domain-specific constraints: One important aspect of the optimization process is that the generated test inputs need to satisfy several domain-specific constraints to be realistic.

They designed a simple rule-based method to ensure that the generated tests satisfy the custom domain-specific constraints.


Methodology: DeepXplore

Hyperparameters in the algorithm:

λ1: Larger λ1 puts higher priority on lowering the prediction value/confidence of a particular DNN, while smaller λ1 puts more weight on maintaining the other DNNs’ predictions.

λ2: Larger λ2 focuses more on covering different neurons, while smaller λ2 generates more difference-inducing test inputs.

s (step size): Larger s may lead to oscillation around the local optimum, while smaller s may need more iterations to reach the objective.

t: Finding inputs that activate a neuron becomes increasingly harder as t increases.


Implementation

Their code is built on TensorFlow/Keras but does not require any modifications to these frameworks.

Their experiments were run on a Linux laptop running Ubuntu 16.04 (one Intel i7-6700HQ 2.60GHz processor with 4 cores, 16GB of memory, and an NVIDIA GTX 1070 GPU).


Evaluation Setup: Test Datasets and DNNs

They evaluate DeepXplore on three DNNs for each dataset (i.e., a total of fifteen trained DNNs).

MNIST: large handwritten digit dataset containing 28x28 pixel images with class labels from 0 to 9 (60000 training and 10000 testing).

ImageNet: large image dataset with over 10000000 hand-annotated images that are crowdsourced and labeled manually.

Driving: Udacity self-driving car challenge dataset that contains images captured by a camera of a driving car and the simultaneous steering wheel angle applied by the human driver for each image (101396 training and 5614 testing).

Contagio/VirusTotal: dataset containing different benign and malicious PDF documents (5000 + 12205 training from Contagio, 5000 + 5000 testing from VirusTotal; 135 static features from PDFrate).

Drebin: dataset with 129013 Android applications, among which 123453 are benign and 5560 are malicious. There is a total of 545333 binary features categorized into eight sets.


Evaluation Setup: Test Datasets and DNNs

Figure: Details of the DNNs and datasets used to evaluate DeepXplore


Evaluation Setup: Domain-Specific Constraints

Image Constraints: three different types of constraints for simulating different environment conditions of images:

1 lighting effects for simulating different intensities of lights,

2 occlusion by a single small rectangle for simulating an attacker potentially blocking some parts of a camera,

3 occlusion by multiple tiny black rectangles for simulating effects of dirt on camera lens.
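One plausible way to enforce the first two image constraints is to restrict the gradient before it is added to the image, so that each ascent step stays within the allowed kind of change. The sketch below is an assumption about how such rules could look, not the authors' code: lighting shifts every pixel by the same amount, and single-rectangle occlusion changes pixels only inside one chosen rectangle. Images and gradients are plain nested lists, and the rectangle position is a hypothetical parameter.

```python
def constrain_lighting(grad):
    """Replace every entry by the mean gradient so all pixels change equally."""
    flat = [g for row in grad for g in row]
    mean = sum(flat) / len(flat)
    return [[mean for _ in row] for row in grad]


def constrain_occlusion(grad, top, left, height, width):
    """Keep the gradient inside one rectangle and zero it everywhere else."""
    return [
        [
            g if top <= r < top + height and left <= c < left + width else 0.0
            for c, g in enumerate(row)
        ]
        for r, row in enumerate(grad)
    ]


# Hypothetical 3x3 gradient of the objective with respect to the image.
grad = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
lit = constrain_lighting(grad)
occ = constrain_occlusion(grad, top=0, left=1, height=2, width=2)
print(lit)
print(occ)
```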


Results

Figure: Number of difference-inducing inputs found by DeepXplore for each tested DNN, obtained by randomly selecting 2,000 seeds from the corresponding test set for each run.


Results

Figure: The features added to the manifest file by DeepXplore for generating two sample malware inputs which Android app classifiers (Drebin) incorrectly mark as benign.

Figure: The top-3 most in(de)cremented features for generating two sample malware inputs which PDF classifiers incorrectly mark as benign.


Results: Benefits of Neuron Coverage

It has recently been shown that each neuron in a DNN tends to independently extract a specific feature of the input instead of collaborating with other neurons for feature extraction.

This finding intuitively explains why neuron coverage is a good metric for DNN testing comprehensiveness.


Results: Benefits of Neuron Coverage

Neuron coverage vs. code coverage: The threshold t for neuron coverage is set to 0.75.

Figure: Comparison of code coverage and neuron coverage for 10 randomly selected inputs from the original test set of each DNN.


Results: Benefits of Neuron Coverage

Effect of neuron coverage on the difference-inducing inputs found by DeepXplore: They evaluate the effectiveness of neuron coverage at generating diverse difference-inducing inputs.

Figure: The increase in diversity (L1-distance) in the difference-inducing inputs found by DeepXplore while using neuron coverage as part of the optimization goal. This experiment uses 2,000 randomly picked seed inputs from the MNIST dataset. Higher values denote larger diversity. NC denotes the neuron coverage (with t = 0.25) achieved under each setting.


Results: Benefits of Neuron Coverage

They measure the diversity of the generated difference-inducing inputs in terms of the average L1 distance between all difference-inducing inputs generated from the same seed and the original seed. The L1 distance is the sum of the absolute differences between the pixel values of the generated image and the original one.

Also, fewer difference-inducing inputs are generated with λ2 = 1 than with λ2 = 0, because setting λ2 = 1 causes DeepXplore to focus on finding diverse differences rather than simply increasing the number of differences with the same underlying root cause.
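The L1-based diversity measure described above can be sketched in a few lines. Images are represented as flat lists of pixel values, and the numbers are made up for illustration.

```python
def l1_distance(a, b):
    """Sum of absolute per-pixel differences between two images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def average_diversity(seed, generated):
    """Average L1 distance between a seed and the inputs generated from it."""
    return sum(l1_distance(g, seed) for g in generated) / len(generated)


# Hypothetical four-pixel seed image and two inputs generated from it.
seed = [0, 10, 20, 30]
generated = [[0, 12, 20, 30], [5, 10, 15, 30]]
div = average_diversity(seed, generated)
print(div)
```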


Results: Benefits of Neuron Coverage

Activation of neurons for different classes of inputs: The figure shows the results, which confirm the authors’ hypothesis that inputs coming from the same class share more activated neurons than those coming from different classes.

Figure: Average number of overlaps among activated neurons for a pair of inputs of the same class and different classes. Inputs of different classes tend to activate different neurons.
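The overlap count in this experiment can be sketched as the size of the intersection between the sets of neurons that two inputs activate. The neuron outputs and the threshold below are hypothetical and only illustrate the computation.

```python
def activated(outputs, threshold):
    """Indices of neurons whose output exceeds the threshold."""
    return {n for n, value in enumerate(outputs) if value > threshold}


def overlap(outputs_a, outputs_b, threshold):
    """Number of neurons activated by both inputs."""
    return len(activated(outputs_a, threshold) & activated(outputs_b, threshold))


# Hypothetical neuron outputs: two inputs of one class, one of another class.
same_class_1 = [0.9, 0.8, 0.1, 0.0]
same_class_2 = [0.7, 0.9, 0.0, 0.1]
other_class = [0.0, 0.1, 0.9, 0.8]

same_overlap = overlap(same_class_1, same_class_2, threshold=0.5)
diff_overlap = overlap(same_class_1, other_class, threshold=0.5)
print(same_overlap, diff_overlap)
```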


Results: Performance

They evaluate DeepXplore’s performance using two metrics: neuron coverage of the generated tests and execution time for generating difference-inducing inputs.

Neuron coverage: In this experiment, they compare the neuron coverage achieved by the same number of tests generated by three different approaches:

1 DeepXplore

2 adversarial testing

3 random selection from the original test set.


Results: Performance

Figure (not reproduced): neuron coverage achieved by the three approaches; the number of tests is 1% of the original test set.


Results: Performance

Execution time and number of seed inputs: They measure the execution time of DeepXplore to generate difference-inducing inputs with 100% neuron coverage for all the tested DNNs.

Figure: Averaged over 10 runs.


ResultsPerformance

Different choices of hyperparameters: They evaluate how the choices of different hyperparameters influence DeepXplore’s performance.

Figure: The variation in DeepXplore runtime (in seconds) while generating the first difference-inducing input for the tested DNNs with different step size choices. All numbers are averaged over 10 runs. The fastest time for each dataset is highlighted in gray.

Figure: The variation in DeepXplore runtime (in seconds) while generating the first difference-inducing input for the tested DNNs with different λ1. Higher λ1 values indicate prioritization of minimizing one DNN’s outputs over maximizing the outputs of the other DNNs showing differential behavior. The fastest time for each dataset is highlighted in gray.

Figure: The variation in DeepXplore runtime (in seconds) while generating the first difference-inducing input for the tested DNNs with different λ2. Higher λ2 values indicate higher priority for increasing coverage. All numbers are averaged over 10 runs. The fastest time for each dataset is highlighted in gray.
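
To make the roles of the step size, λ1, and λ2 concrete, here is a minimal sketch of one gradient-ascent step on a joint objective of the form "outputs of the other DNNs for class c, minus λ1 times the chosen DNN's output for class c, plus λ2 times the activation of a target neuron". The toy softmax models, the `tanh` stand-in neuron, the finite-difference gradient, and all names (`joint_objective`, `ascent_step`, `lam1`, `lam2`) are assumptions made for illustration; the real system differentiates through actual DNNs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_objective(x, models, j, c, neuron, lam1, lam2):
    """models: callables x -> class probabilities; neuron: callable x -> activation."""
    others = sum(m(x)[c] for k, m in enumerate(models) if k != j)
    # lam1 weights pushing model j away from class c; lam2 weights neuron coverage.
    return others - lam1 * models[j](x)[c] + lam2 * neuron(x)

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences; a stand-in for backpropagation in this toy."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return grad

def ascent_step(x, f, step_size):
    """One gradient-ascent update of the input with step size s."""
    return x + step_size * numerical_grad(f, x)

# Toy setup: two linear-softmax "DNNs" over a 4-dimensional input, 3 classes.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
models = [lambda x: softmax(W1 @ x), lambda x: softmax(W2 @ x)]
neuron = lambda x: np.tanh(W1[0] @ x)  # hypothetical not-yet-covered neuron
x = rng.normal(size=4)

f = lambda v: joint_objective(v, models, j=0, c=1, neuron=neuron, lam1=1.0, lam2=0.1)
before = f(x)
x_new = ascent_step(x, f, step_size=0.01)
after = f(x_new)
print(before, after)  # the objective should not decrease for a small enough step
```

In this framing, a larger step size moves the input further per iteration, while λ1 and λ2 shift the balance between the three terms, which is why each can change how quickly the first difference-inducing input is found.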

Testing very similar models with DeepXplore: DeepXplore may fail to find any difference-inducing inputs within a reasonable time in some cases, especially for DNNs with very similar decision boundaries.

They control three types of differences between two DNNs and measure the change in the number of iterations required to generate the first difference-inducing input in each case.

Figure: Changes in the number of iterations DeepXplore takes, on average, to find the first difference-inducing input as the type and number of differences between the tested DNNs increase.

Results: Improving DNNs with DeepXplore

They demonstrate two additional applications of the error-inducing inputs generated by DeepXplore:

augmenting the training set to improve a DNN’s accuracy

detecting potentially corrupted training data.

Augmenting training data to improve accuracy:

Figure: Improvement in accuracy of three LeNet DNNs when the training set is augmented with the same number of inputs generated by random selection (“random”), adversarial testing (“adversarial”), and DeepXplore.

Detecting training data pollution attack: They use two LeNet-5 DNNs: one trained on 60,000 handwritten digits from the MNIST dataset, and the other trained on an artificially polluted version of the same dataset in which 30% of the images originally labeled as digit 9 are mislabeled as 1. They use DeepXplore to generate error-inducing inputs that are classified as the digits 9 and 1 by the unpolluted and polluted versions of the LeNet-5 DNN, respectively. They then search the training set for the samples that are closest to the DeepXplore-generated inputs in terms of structural similarity and identify them as polluted data. Using this process, they correctly identify 95.6% of the polluted samples.
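
The nearest-sample search described above can be sketched roughly as follows. The slide only says "structural similarity", so this example assumes a simplified, single-window (global) form of the SSIM index rather than the usual sliding-window version; `global_ssim`, `flag_suspects`, `top_k`, and the toy 8x8 images are hypothetical names and data introduced for illustration.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM between two images (a simplification of windowed SSIM)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def flag_suspects(train_images, generated_images, top_k=1):
    """Indices of training images among the top_k most similar to any generated input."""
    suspects = set()
    for g in generated_images:
        scores = np.array([global_ssim(t, g) for t in train_images])
        suspects.update(np.argsort(scores)[::-1][:top_k].tolist())
    return sorted(suspects)

# Toy data: four random 8x8 "training images"; the generated input is a
# slightly perturbed copy of training image 2.
rng = np.random.default_rng(1)
train = rng.random((4, 8, 8))
generated = [train[2] + 0.01 * rng.normal(size=(8, 8))]
print(flag_suspects(train, generated))  # [2]
```

In the experiment described on this slide, the flagged training samples would be the ones reported as polluted; how many neighbors to keep per generated input (`top_k` here) is a free choice in this sketch.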

Discussions from Yoav Hollander

https://blog.foretellix.com/2017/06/06/deepxplore-and-new-ideas-for-verifying-ml-systems/
