Feature-Guided Black-Box Safety Testing of Deep Neural Networks

Youcheng Sun, Xiaowei Huang, and Daniel Kroening
Arxiv 2018

05 May 2018

Table of Contents

1 Introduction

2 Background

3 Adequacy Criteria for Testing Deep Neural Networks

4 Automated Test Case Generation

5 Experiments

Introduction

Artificial intelligence systems are typically implemented in software.

However, (white-box) testing for traditional software cannot be directly applied to DNNs, because the software that implements DNNs does not have a suitable structure.

In particular, DNNs do not have a traditional flow of control, and thus it is not obvious how to define criteria such as branch coverage for them.

In this paper, the authors bridge this gap by proposing a novel (white-box) testing methodology for DNNs, including both test coverage criteria and test case generation algorithms.

Any approach to testing DNNs needs to consider the distinct features of DNNs, such as

The syntactic connections between neurons in adjacent layers (neurons in a given layer interact with each other and then pass information to higher layers)

The ReLU activation functions, and

The semantic relationship between layers (e.g., neurons in deeper layers represent more complex features)

The contributions of this paper are three-fold.

First, they propose four test criteria, inspired by the MC/DC test criteria from traditional software testing, that fit the distinct features of DNNs.

Two coverage criteria for DNNs already exist: neuron coverage and safety coverage, both of which have been proposed recently.

Neuron coverage is too coarse: 100% coverage can be achieved by a simple test suite comprising a few input vectors from the training dataset.

Safety coverage is black-box and too fine: computing a test suite in reasonable time is computationally too expensive.

Their four proposed criteria are incomparable with each other, and complement each other in guiding the generation of test cases.

Second, they develop an automatic test case generation algorithm for each of their criteria.

The algorithms produce a new test case by perturbing a given one using linear programming (LP).

LP can be solved efficiently in practice, and thus their test case generation algorithms can generate a test suite with low computational cost.

Finally, they implement their testing approaches in a software tool named DeepCover (available), and validate it by conducting experiments on a set of DNNs obtained by training on the MNIST dataset.

Background

Figure: Given one particular input x, we say that the neural network N is instantiated, and we use N[x] to denote this instance of the network.
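
As a rough illustration of what an instance N[x] contains, the sketch below runs a tiny ReLU network on one input and records each neuron's value before and after activation, together with the resulting activation pattern. The weights, biases, and the names `instantiate` and `activation_pattern` are hypothetical and chosen only for this example; applying ReLU on the last layer is also an assumption.

```python
# Hypothetical two-layer ReLU network; weights and biases are made up for illustration.
WEIGHTS = [
    [[1.0, -1.0], [0.5, 0.5]],   # layer 1: 2 inputs -> 2 neurons (one row per neuron)
    [[1.0, 1.0]],                # layer 2: 2 neurons -> 1 neuron
]
BIASES = [[0.0, -1.0], [0.1]]


def instantiate(x):
    """Return N[x] as a list of (pre_activation, post_activation) per layer."""
    instance = []
    v = list(x)
    for W, b in zip(WEIGHTS, BIASES):
        u = [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]
        v = [max(0.0, ui) for ui in u]   # ReLU (assumed on every layer here)
        instance.append((u, v))
    return instance


def activation_pattern(instance):
    """Sign of every neuron: True if activated (u > 0), else False."""
    return [[ui > 0 for ui in u] for u, _ in instance]


inst = instantiate([2.0, 1.0])
pattern = activation_pattern(inst)
```

Two inputs with the same activation pattern pass through the same linear pieces of the network, which is what the LP-based test generation later relies on.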

Adequacy Criteria for Testing Deep Neural Networks

Test Coverage and MC/DC

Let N be a set of neural networks, R the set of requirements, and T the set of test suites.

Usually, the greater the value of the adequacy measure M(N, R, T), the more adequate the testing.

Their new criteria for DNNs are inspired by established practices in software testing, in particular MC/DC test coverage, but are designed for the specific features of neural networks.

Modified Condition/Decision Coverage (MC/DC) is a method of ensuring adequate testing for safety-critical software.

At its core is the idea that if a choice can be made, all the possible factors (conditions) that contribute to that choice (decision) must be tested.

The first two test cases already satisfy both the condition coverage (i.e., all possibilities of the conditions are exploited) and the decision coverage (i.e., all possibilities of the decision d are exploited).

The last four cases are needed because for MC/DC each condition should evaluate to true and false at least once and should also affect the decision outcome.
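
To make the independence requirement of MC/DC concrete, here is a minimal sketch that enumerates, for each condition, the pairs of test cases that differ only in that condition and flip the decision. The decision `(a or b) and c` is a hypothetical example chosen for illustration and is not claimed to be the one shown on the slide; `independence_pairs` is likewise an invented helper name.

```python
from itertools import product


def decision(a, b, c):
    # Hypothetical decision used only for illustration.
    return (a or b) and c


def independence_pairs(fn, n_conditions):
    """For each condition, list test-case pairs that differ only in that
    condition and yield different decision outcomes."""
    cases = list(product([False, True], repeat=n_conditions))
    pairs = {i: [] for i in range(n_conditions)}
    for t in cases:
        for i in range(n_conditions):
            if t[i]:
                continue  # visit each unordered pair once, from its False side
            flipped = t[:i] + (True,) + t[i + 1:]
            if fn(*t) != fn(*flipped):
                pairs[i].append((t, flipped))
    return pairs


pairs = independence_pairs(decision, 3)
mcdc_achievable = all(pairs[i] for i in pairs)
```

An MC/DC test suite picks at least one such pair per condition, which is why it generally needs more cases than condition coverage plus decision coverage alone.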

Decisions and Conditions in DNNs

The information represented by a neuron in the next layer can be seen as a summary (implemented by the layer function, the weights, and the bias) of the information in the current layer.

The core idea of their criteria is that testing must cover not only the presence of a feature but also the effects of less complex features on a more complex feature.

(absolute change, relative change)

Covering Methods

The SS Cover is designed to provide evidence that a change of the activation sign of a condition neuron n_{k,l} independently affects the sign of the decision neuron n_{k+1,j} in the next layer.
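
One possible reading of the SS Cover as a predicate over two inputs is sketched below. It assumes that "independently" means every other neuron in layer k keeps its sign; the function name `ss_covered` and its argument layout (lists of activation values for layers k and k+1 under each input) are hypothetical.

```python
def sign(u):
    """Activation sign of a neuron: True if activated, False otherwise."""
    return u > 0


def ss_covered(i, j, uk_x1, uk_x2, uk1_x1, uk1_x2):
    """Check whether the pair (condition neuron i in layer k, decision neuron j
    in layer k+1) is SS-covered by two inputs, given the layers' activation values."""
    condition_flips = sign(uk_x1[i]) != sign(uk_x2[i])
    # Assumption: all other neurons in layer k must keep their sign.
    others_fixed = all(
        sign(a) == sign(b)
        for l, (a, b) in enumerate(zip(uk_x1, uk_x2))
        if l != i
    )
    decision_flips = sign(uk1_x1[j]) != sign(uk1_x2[j])
    return condition_flips and others_fixed and decision_flips
```

For example, `ss_covered(0, 0, [1.0, 2.0], [-1.0, 3.0], [0.5], [-0.2])` holds, while the same call with the second layer-k neuron also changing sign does not.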

Intuitively, for the DS Cover, the first condition describes the distance change of neurons in layer k and the second condition requests the sign change of the neuron n_{k+1,j}.

They expect that their criteria can guide the test case generation algorithms toward unsafe cases by working with two adjacent layers, which is finer than the input-output relation.

They notice that the label change in the output layer is the direct result of the changes to the activation values in the penultimate layer.

Intuitively, the SV Cover observes a significant change of a decision neuron's value, caused by independently modifying the sign of one of its condition neurons.

Intuitively, a DV cover targets the scenario that there is no sign change for a neuron pair, but the decision neuron's value is changed significantly.
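
The value-based covers can be sketched in the same style. The slides do not give the value function g or the distance function d, so the ratio threshold in `value_change` and the L-infinity bound in `distance_change` below are placeholder assumptions, as are all function names.

```python
def sign(u):
    return u > 0


def value_change(u1, u2, ratio=2.0):
    """Hypothetical g: same sign, but magnitude changes by at least `ratio`."""
    if sign(u1) != sign(u2):
        return False
    lo, hi = sorted((abs(u1), abs(u2)))
    return hi > 0 and hi >= ratio * lo


def distance_change(layer1, layer2, bound=1.0):
    """Hypothetical d: no sign change in the layer, L-infinity distance within `bound`."""
    same_signs = all(sign(a) == sign(b) for a, b in zip(layer1, layer2))
    close = max(abs(a - b) for a, b in zip(layer1, layer2)) <= bound
    return same_signs and close


def sv_covered(i, j, uk_x1, uk_x2, uk1_x1, uk1_x2):
    """Only condition neuron i flips sign; decision neuron j changes value a lot."""
    only_i_flips = all(
        (sign(a) != sign(b)) == (l == i)
        for l, (a, b) in enumerate(zip(uk_x1, uk_x2))
    )
    return only_i_flips and value_change(uk1_x1[j], uk1_x2[j])


def dv_covered(j, uk_x1, uk_x2, uk1_x1, uk1_x2):
    """No sign change in layer k; decision neuron j changes value a lot."""
    return distance_change(uk_x1, uk_x2) and value_change(uk1_x1[j], uk1_x2[j])
```

A DS check would combine `distance_change` on layer k with a sign flip of the decision neuron, mirroring how SS and SV differ only in what is observed at layer k+1.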

Test Requirements and Criteria

$F = \{cov_{SS},\ cov^{d}_{DS},\ cov^{g}_{SV},\ cov^{d,g}_{DV}\}$

Intuitively, a test requirement R_f asks that all neuron pairs are covered by at least two test cases in T_f with respect to the covering method f.

Intuitively, the test criterion computes the percentage of the neuron pairs that are covered by test cases in T with respect to the covering method f.
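
The percentage described above can be sketched as a small function that is generic in the covering method. The signature `covers(pair, x1, x2)` and the toy predicate are assumptions made for this example; returning 1.0 for an empty set of neuron pairs is also a choice of this sketch.

```python
def coverage(neuron_pairs, test_suite, covers):
    """Fraction of neuron pairs covered by at least one pair of test cases.

    `covers(pair, x1, x2)` is any covering predicate (hypothetical signature).
    """
    if not neuron_pairs:
        return 1.0  # assumption: nothing to cover counts as fully covered
    covered = sum(
        1
        for pair in neuron_pairs
        if any(covers(pair, x1, x2) for x1 in test_suite for x2 in test_suite)
    )
    return covered / len(neuron_pairs)


# Toy data: a "pair" is covered only by the two distinct test cases it names.
toy_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
toy_cover = lambda pair, x1, x2: x1 != x2 and (x1, x2) == pair
ratio = coverage(toy_pairs, [0, 1], toy_cover)
```

The quadratic scan over test-case pairs is kept for clarity; a practical tool would record covered pairs as test cases are generated.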

Automated Test Case Generation

In this paper, they consider approaches based on constraint solving, specifically linear programming (LP).

The function f̂ represented by a DNN is highly non-linear and cannot be encoded with linear programming (LP) in general.

In this paper, for the efficient generation of a test case x′, they consider (1) an LP-based approach by fixing the activation pattern ap[x] according to a given input x, and (2) encoding a prefix of the network, instead of the entire network, with respect to a given neuron pair.

LP Model of a DNN Instance

The variables used in the LP model are distinguished in bold.

Given an input x, the input variable 𝐱 (an LP variable, shown in bold), whose value is to be synthesized with LP, is required to have the same activation pattern as x, i.e., ap[𝐱] = ap[x].

Please note that the resulting LP model C[x] = C1[x] ∩ C2[x] represents a symbolic set of inputs that have the same activation pattern as x.
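
The idea that fixing the activation pattern makes the network linear can be illustrated by building the sign constraints explicitly. The sketch below is not the authors' encoding: it eliminates the intermediate variables and writes every constraint directly over the input, uses non-strict inequalities (LP cannot express strict ones), and stops short of calling an LP solver. The network and all function names are hypothetical.

```python
def affine_prefix(weights, biases, pattern):
    """For a fixed activation pattern, express every pre-activation value as an
    affine function of the input: u = A x + c. Returns one (A, c) per layer."""
    n_in = len(weights[0][0])
    # Start with the identity map: v_0 = x.
    A = [[1.0 if r == col else 0.0 for col in range(n_in)] for r in range(n_in)]
    c = [0.0] * n_in
    result = []
    for W, b, active in zip(weights, biases, pattern):
        A_u = [[sum(w * A[m][col] for m, w in enumerate(row)) for col in range(n_in)]
               for row in W]
        c_u = [sum(w * c[m] for m, w in enumerate(row)) + bi for row, bi in zip(W, b)]
        result.append((A_u, c_u))
        # With the pattern fixed, ReLU is linear: keep active rows, zero the rest.
        A = [row if on else [0.0] * n_in for row, on in zip(A_u, active)]
        c = [ci if on else 0.0 for ci, on in zip(c_u, active)]
    return result


def sign_constraints(weights, biases, pattern):
    """Rows (a, rhs) meaning a . x <= rhs, one per neuron, that pin the pattern."""
    rows = []
    for (A_u, c_u), active in zip(affine_prefix(weights, biases, pattern), pattern):
        for a, ci, on in zip(A_u, c_u, active):
            if on:   # u >= 0  <=>  -a . x <= c
                rows.append(([-t for t in a], ci))
            else:    # u <= 0  <=>   a . x <= -c
                rows.append((list(a), -ci))
    return rows


def satisfies(rows, x, tol=1e-9):
    """True if x meets every constraint row (up to a small tolerance)."""
    return all(sum(t * xi for t, xi in zip(a, x)) <= rhs + tol for a, rhs in rows)


# Hypothetical network and the activation pattern of the input [2.0, 1.0].
WEIGHTS = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 1.0]]]
BIASES = [[0.0, -1.0], [0.1]]
PATTERN = [[True, True], [True]]
rows = sign_constraints(WEIGHTS, BIASES, PATTERN)
```

The rows are in the `A_ub x <= b_ub` form that common LP solvers accept, so a test generator could add an objective such as minimizing the distance to the original input and then relax or negate the rows of the neurons whose sign it wants to flip.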

Operations on activation patterns

In this section, they discuss a safety requirement that is independent of the test criteria. This is to check automatically whether a given test case x is a bug.
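
A check of this kind might look like the sketch below, which flags a test case as a bug when it lies close to a reference input but receives a different label. The slide does not state the distance measure or the bound, so the L-infinity norm, the default bound of 0.1, and the `classify` callable are all assumptions of this example.

```python
def is_bug(x, x_prime, classify, bound=0.1):
    """Return True if x_prime is close to x yet receives a different label.

    `classify` is a hypothetical callable mapping an input to its label;
    closeness is measured with the L-infinity norm (an assumption).
    """
    distance = max(abs(a - b) for a, b in zip(x, x_prime))
    return distance <= bound and classify(x_prime) != classify(x)


# Toy classifier used only to exercise the check.
toy_classify = lambda v: int(sum(v) > 1.0)
found = is_bug([0.5, 0.45], [0.55, 0.5], toy_classify)
```

Because the check only needs the two inputs and the network's labels, it can be applied to any generated test case regardless of which covering method produced it.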

Automatic Test Generation Algorithms

Experiments

They use the well-known MNIST Handwritten Image Dataset to train a set of 10 fully connected DNNs to perform classification.

Each DNN has an input layer of 28 x 28 = 784 neurons and an output layer of 10 neurons.

The number of hidden layers for each DNN is randomly sampled from the set {3, 4, 5}, and at each hidden layer the number of neurons is uniformly selected from 20 to 100.

Every DNN is trained until an accuracy of at least 97.0% is reached on the MNIST validation data.
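
The architecture sampling described above could be reproduced along these lines. The inclusive bounds on the layer width and the fixed seed are assumptions; `sample_architecture` is a hypothetical helper, and training itself is omitted.

```python
import random


def sample_architecture(rng):
    """Layer sizes for one fully connected MNIST classifier."""
    n_hidden = rng.choice([3, 4, 5])
    hidden = [rng.randint(20, 100) for _ in range(n_hidden)]  # inclusive bounds
    return [28 * 28] + hidden + [10]


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
architectures = [sample_architecture(rng) for _ in range(10)]
```

Each entry is a list of layer sizes, starting with the 784 input neurons and ending with the 10 output neurons.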

Hypothesis 1: Neuron Coverage is Easy

In particular, a test suite with high neuron coverage is not sufficient to increase confidence in the neural network in safety-critical domains.

To demonstrate this, for each DNN tested, they randomly pick 25 images from the MNIST test dataset.

For each selected image, to maximize the neuron coverage, if an input neuron is not activated (i.e., its activation value is equal to 0), they sample its value from [0, 0.1].

Then they measure the neuron coverage of the DNN by using the generated test suite of 25 images. As a result, for all 10 DNNs, they obtain almost 100% neuron coverage.

This simple experiment demonstrates that it is straightforward to obtain a trivial test suite that has high neuron coverage but does not provide any adversarial examples.
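
Neuron coverage itself is cheap to compute, as the following sketch on a toy network suggests. It counts a neuron as covered when at least one test input gives it a positive activation and includes the input neurons in the count; both choices, the tiny network, and the function names are assumptions of this example.

```python
def forward(x, weights, biases):
    """Activation values of every layer (input included) of a ReLU network."""
    v = list(x)
    layers = [v]
    for W, b in zip(weights, biases):
        v = [max(0.0, sum(w * vi for w, vi in zip(row, v)) + bi)
             for row, bi in zip(W, b)]
        layers.append(v)
    return layers


def neuron_coverage(test_suite, weights, biases):
    """Fraction of neurons activated (value > 0) by at least one test input."""
    if not test_suite:
        return 0.0
    activated = None
    for x in test_suite:
        flags = [val > 0 for layer in forward(x, weights, biases) for val in layer]
        activated = flags if activated is None else [
            a or f for a, f in zip(activated, flags)]
    return sum(activated) / len(activated)


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
WEIGHTS = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 1.0]]]
BIASES = [[0.0, -1.0], [0.1]]
full = neuron_coverage([[2.0, 1.0]], WEIGHTS, BIASES)
partial = neuron_coverage([[0.0, 1.0]], WEIGHTS, BIASES)
```

On this toy network a single input already reaches full coverage, which mirrors the point of the hypothesis: the metric says little about how thoroughly the network's behavior was exercised.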

Hypothesis 2: Random Testing is Inefficient

For each DNN, they first choose an image from the MNIST test dataset, denoted by x.

Subsequently, they randomly sample 10^5 inputs in the region bounded by x ± 0.1, and they check whether an adversarial example exists for the original image x.

Using this process, they obtained adversarial examples for only one of the 10 DNNs. For the other nine DNNs, they did not observe any adversarial examples among the 10^5 randomly generated images.
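
The random-testing baseline can be sketched as a plain sampling loop. Uniform sampling per pixel, the absence of clipping to the valid pixel range, the seed, and the `classify` callable are assumptions; the slides only give the sample count and the ± 0.1 region.

```python
import random


def random_search(x, classify, n_samples=10**5, radius=0.1, seed=0):
    """Sample inputs uniformly from the box x ± radius and return the first one
    whose label differs from that of x, or None if no such sample is found."""
    rng = random.Random(seed)
    label = classify(x)
    for _ in range(n_samples):
        candidate = [xi + rng.uniform(-radius, radius) for xi in x]
        if classify(candidate) != label:
            return candidate
    return None


# Toy classifier whose decision boundary is far from the sampled region.
far_boundary = lambda v: int(sum(v) > 10.0)
result = random_search([0.5] * 4, far_boundary, n_samples=1000)
```

With a high-dimensional input such as a 784-pixel image, uniformly sampled points rarely land on the narrow directions that change the label, which is consistent with the outcome reported above.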

SS, SV, DS, and DV Cover

1- DNN Bug finding

2- DNN safety analysis

3- SS Cover with top weights

4- Layerwise behavior

5- Cost of LP call

6- Convolutional Neural Networks
