Output Range Analysis for Deep Feedforward Neural Networks

Souradeep Dutta¹, Susmit Jha², Sriram Sankaranarayanan¹ and Ashish Tiwari²
¹ University of Colorado, Boulder, USA   ² SRI International, Menlo Park, USA
{souradeep.dutta,sriram.sankaranarayanan}@colorado.edu, {tiwari,susmit.jha}@csl.sri.com
Abstract. Given a neural network (NN) and a set of possible inputs to the net-
work described by polyhedral constraints, we aim to compute a safe over-approximation
of the set of possible output values. This operation is a fundamental primitive
enabling the formal analysis of neural networks that are extensively used in a
variety of machine learning tasks such as perception and control of autonomous
systems. Increasingly, they are deployed in high-assurance applications, leading
to a compelling use case for formal verification approaches. In this paper, we
present an efficient range estimation algorithm that iterates between an expensive
global combinatorial search using mixed-integer linear programming problems,
and a relatively inexpensive local optimization that repeatedly seeks a local op-
timum of the function represented by the NN. We implement our approach and
compare it with Reluplex, a recently proposed solver for deep neural networks.
We demonstrate applications of our approach to computing flowpipes for neural
network-based feedback controllers. We show that the use of local search in con-
junction with mixed-integer linear programming solvers effectively reduces the
combinatorial search over possible combinations of active neurons in the network
by pruning away suboptimal nodes.
1 Introduction
Deep neural networks have emerged as a versatile and popular representation for ma-
chine learning models. This is due to their ability to approximate complex functions, as
well as the availability of efficient methods for learning these from large data sets. The
black-box nature of NN models and the absence of effective methods for their analy-
sis have confined their use to systems with low integrity requirements. However, more
recently, deep NNs are also being adopted in high-assurance systems, such as the auto-
mated control and perception pipelines of autonomous vehicles [13] or aircraft collision
avoidance [12]. While traditional system design approaches include rigorous system
verification and analysis techniques to ensure the correctness of systems deployed in
safety-critical applications [1], the inclusion of complex machine learning models in
the form of deep NNs has created a new challenge to verify these models. In this pa-
per, we focus on the range estimation problem, wherein, given a neural network N and
a polyhedron φ(x) representing a set of inputs to the network, we wish to estimate
a range, denoted range(li, φ), for each of the network’s outputs li, that subsumes all
possible outputs and is tight within a given tolerance δ. We restrict our attention to feed-
forward deep NNs. While we focus on NNs that use rectified linear units (ReLUs) [17]
as activation functions, we also discuss extensions to other activation functions through
piecewise linear approximations.
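As a minimal illustration of such an approximation (a sketch only, not the sound construction developed in the paper), a sigmoid can be replaced by linear interpolation between fixed breakpoints:

```python
import numpy as np

def pwl_sigmoid(z, breakpoints=np.linspace(-8.0, 8.0, 17)):
    """Piecewise-linear interpolation of the sigmoid on fixed breakpoints.

    Illustrative only: a sound over-approximation, as needed for
    verification, would additionally bracket the interpolation error,
    which is bounded on each segment because the sigmoid is smooth.
    """
    values = 1.0 / (1.0 + np.exp(-breakpoints))  # exact sigmoid at the breakpoints
    return np.interp(z, breakpoints, values)     # linear between breakpoints
```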
Our approach is based on augmenting a mixed-integer linear programming (MILP)
solver. First of all, we use a sound piecewise linearization of the nonlinear activation
function to define an encoding of the neural network semantics into mixed-integer con-
straints involving real-valued variables and binary variables that arise from the (piece-
wise) linearized activation functions. The encoding into MILP is a standard approach to
handling piecewise linear functions [28]. The input constraints φ(x) are added
to the MILP, and the output variable is then separately maximized and minimized to
infer a range. Our approach combines the MILP solver with a local search that exploits
the local continuity and differentiability properties of the function represented by the
network. These properties are not implicit in the MILP encoding that typically relies on
a branch-and-cut approach to solve the problem at hand. On the other hand, local search
alone may get “stuck” in local minima. Our approach handles local minima by using
the MILP solver to search for a solution that is “better” than the current local minimum,
or to conclude that no such solution exists. Thus, by alternating between inexpensive local
search iterations and relatively expensive MILP solver calls, we seek an approach that
can exploit local properties of the neural network function but at the same time avoid
the problem of local minima.
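The following schematic sketch illustrates this alternation for the upper-bound case; the two helper procedures are placeholders for the local search and the MILP query described above, not their precise implementations:

```python
def estimate_upper_bound(f, x0, delta, local_search_ascent, milp_exceeding):
    """Alternate cheap local search with expensive global MILP queries.

    f: the function computed by the network; x0: a feasible input;
    delta: the tolerance. local_search_ascent(f, x) is assumed to climb
    to a local maximum of f within the input polyhedron; milp_exceeding(f, v)
    is assumed to ask the MILP solver for an input with f(input) > v,
    returning None if no such input exists.
    """
    x = x0
    while True:
        x = local_search_ascent(f, x)              # inexpensive local iterations
        witness = milp_exceeding(f, f(x) + delta)  # expensive global query
        if witness is None:
            # No input beats f(x) + delta, so f(x) + delta is a valid,
            # delta-tight upper bound on the maximum of f over the input set.
            return f(x) + delta
        x = witness                                # escape the local maximum
```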
The range estimation problem has several applications. For instance, a safety-focused
application arises when a deep neural network implements a controller. In this case,
the range estimation problem enables
us to prove bounds on the output of the NN controller. This is important because out-of-
bounds outputs can drive the physical system into undesirable configurations, such as
the locking of a robotic arm or commanding a car’s throttle beyond its rated limits. Finding
these errors through verification will enable design-time detection of potential failures
instead of relying on runtime monitoring, which can incur significant overhead and
may not allow graceful recovery. Additionally, range analysis can be useful in proving
the safety of a closed loop system by integrating the action of a neural network con-
troller with that of a plant model. In this paper, we focus on applying range
estimation to prove the safety of several neural network plant models combined with
neural network feedback controllers. Other applications include proving the robustness
of classifiers by showing that all possible input perturbations within some range do not
change the output classification of the network.
Related Work The importance of analytical certification methods for neural networks
has been well recognized in the literature. Neural networks have been observed to be very
sensitive to slight perturbations in their inputs, producing incorrect outputs [26, 21]. This
creates a pressing need for techniques that provide formal guarantees on neural net-
works. The verification of neural networks is a hard problem, and even proving simple
properties about them is known to be NP-complete [14]. The complexity of verifying
neural networks arises primarily from two sources: the nonlinear activation functions
used in the network as elementary neural units, and the structural complexity that can
be measured using the depth and size of the network. Kurd [16] presented one of the first
categorizations of verification goals for NNs used in safety-critical applications. The
approach proposed here targets a subset of these goals, G4 and G5, which aim at ensur-
ing the robustness of NNs to disturbances in inputs, and ensuring that the outputs of NNs
are not hazardous.
Recently, there has been a surge of interest in formal verification tools for neural
networks [14, 10, 23, 22, 30, 31, 8, 25, 18]. A detailed discussion of these approaches to
neural networks with piecewise linear activation functions, and empirical evaluations
over benchmark networks has been carried out by Bunel et al. [5]. Our approach relies
on a piecewise linearization of the nonlinear activation function. This idea has been
studied in the past, notably by Pulina et al. [22, 23]. The key differences include: (a) our
approach does not perform a refinement operation. As such, no refinement is needed for
networks with piecewise linear activation functions, since the activation functions are
encoded precisely. For other kinds of functions such as sigmoid or tanh, a refinement
may be needed to improve the inferred ranges, but is not considered in our work. (b)
We do not rely on existing Satisfiability-Modulo Theory (SMT) solvers [2]. Instead, our
approach uses a mixed-integer linear programming (MILP) solver in combination with
a local search. Recently, Lomuscio and Maganti presented an approach that encodes neu-
ral networks into MILP constraints [18]. A similar encoding is also presented by Tjeng
and Tedrake [27] for verifying robustness of neural network classifiers under a class
of perturbations. These encodings are similar to ours. The optimization problems are
solved directly using an off-the-shelf MILP solver [28, 4]. Additionally, our approach
augments the MILP solver with a local search scheme. We note that the use of local
search can potentially speed up our approach, since neural networks represent contin-
uous, piecewise-differentiable functions. On the flip side, these functions may have a
large number of local minima/maxima. Nevertheless, depending on the network, the
function it approximates and the input range, the local search used in conjunction with
a MILP solver can yield rapid improvements to the objective function.
Augmenting existing LP solvers has been at the center of two recent approaches
to the problem. The Reluplex approach by Katz et al. focuses on ReLU feed-forward
networks [14]. Their work augments the Simplex algorithm with special functions and
rules that handle the constraints involving ReLU activation functions. The linear pro-
gramming approach used for comparison in the Reluplex work performs significantly less
efficiently, according to the experiments reported in [14]. Note, however, that the scenarios
used by Katz et al. are different from those studied here, and were not publicly available
for comparison at the time of writing. Ehlers augments an LP solver with a SAT solver
that maintains partial assignments to decide the linear region for each individual neuron.
The solver is instantiated using facts inferred from a convexification of the activation
function [8], much in the style of conflict clauses and lemmas used by SAT solvers. In
fact, many ideas used by Ehlers can be potentially used to complement our approach
in the form of cuts that are specific to neural networks. Such specialized cuts are very
commonly used in MILP solvers.
A related goal of finding adversarial inputs for deep NNs has received a lot of at-
tention; it can be viewed as a testing approach to NNs, in contrast to the verification
method discussed in this paper. A linear programming based approach for finding adversarial
inputs is presented in [3]. A related approach for finding adversarial inputs using SMT
solvers that relies on a layer-by-layer analysis is presented in [10]. Simulation-based
approaches [30] for neural network verification have also been proposed in the literature.
These rely on turning the reachable set estimation problem into a neural network max-
imal sensitivity computation, which is solved using a sequence of convex optimization
problems. In contrast, our proposed approach combines numerical gradient-based opti-
mization with mixed-integer linear programming for more efficient verification.
Contributions We present a novel algorithm for propagating convex polyhedral in-
puts through a feedforward deep neural network with ReLU activation units to estab-
lish ranges for the outputs of the network. We have implemented our approach in a
tool called SHERLOCK [6]. We compare SHERLOCK with a recently proposed deep NN
verification engine, Reluplex [14]. We demonstrate the application of SHERLOCK to
establish output ranges of deep NN controllers. Our approach scales consistently
to neural networks ranging from about 100 neurons to over 6000 neurons.
2 Preliminaries
We present the preliminary notions including deep neural networks, polyhedra, and
mixed integer linear programs.
We will study feedforward neural networks (NN) throughout this paper, with n > 0
inputs and m > 0 outputs. For simplicity, we will present our techniques primarily for
the single-output case (m = 1), explaining how they can be extended to networks with
multiple outputs.
Let x ∈ ℝⁿ denote the inputs and y ∈ ℝ the output of the network. Structurally,
a NN N consists of k > 0 hidden layers, wherein we assume that each layer has the
same number of neurons N > 0. We use Nij to denote the jth neuron of the ith layer, for
j ∈ {1, …, N} and i ∈ {1, …, k}.
Definition 1 (Neural Network). A k layer neural network with N neurons per hidden
layer is described by matrices: (W0, b0), …, (Wk−1, bk−1), (Wk, bk), wherein (a) W0, b0
are N × n and N × 1 matrices denoting the weights connecting the inputs to the first
hidden layer, (b) Wi, bi for i ∈ [1, k−1] connect layer i to layer i+1, and (c) Wk, bk
connect the last layer k to the output.
Each neuron is defined using its activation function σ linking its input value to the
output value. Although this can be any function, there are a few common activation
functions:
1. ReLU: The ReLU unit is defined by the activation function σ(z) := max(z, 0).
2. Sigmoid: The sigmoid unit is defined by the activation function σ(z) := 1/(1 + e^(−z)).
3. Tanh: The activation function for this unit is σ(z) := tanh(z).
Figure 1 shows these functions graphically. We will assume that all the neurons of
the network N have the same activation function σ. Furthermore, we assume that σ is
a continuous function and differentiable almost everywhere.
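For concreteness, these three activation functions transcribe directly into a few lines of NumPy:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)        # sigma(z) = max(z, 0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigma(z) = 1 / (1 + e^(-z))

def tanh(z):
    return np.tanh(z)                # sigma(z) = tanh(z)
```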
Given a neural network N as described above, the function F : ℝⁿ → ℝ computed
by the neural network is given by the composition F := Fk ∘ ··· ∘ F0, wherein Fi(z) :=
σ(Wi z + bi) is the function computed by the ith hidden layer, F0 the function linking the
inputs to the first layer, and Fk linking the last layer to the output.

Fig. 1: Activation functions commonly used in neural networks (ReLU(z), sigmoid(z), and tanh(z), plotted for z from −9 to 9).
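A minimal NumPy sketch of this composition, using ReLU activations and randomly chosen weights with the shapes from Definition 1 (here n = 2 inputs and k = 2 hidden layers of N = 3 neurons; the output layer is treated as affine, a common convention, though the definition above leaves open whether Fk also applies σ):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n, N, k = 2, 3, 2                                    # inputs, neurons per layer, hidden layers
Ws = [rng.standard_normal((N, n))] \
   + [rng.standard_normal((N, N)) for _ in range(k - 1)] \
   + [rng.standard_normal((1, N))]                   # W0: N x n, Wi: N x N, Wk: 1 x N
bs = [rng.standard_normal(N) for _ in range(k)] + [rng.standard_normal(1)]

def F(x):
    """F = Fk o ... o F0 with Fi(z) = relu(Wi z + bi) on the hidden layers."""
    z = np.asarray(x, dtype=float)
    for W, b in zip(Ws[:-1], bs[:-1]):
        z = relu(W @ z + b)                          # hidden layers apply the activation
    return (Ws[-1] @ z + bs[-1]).item()              # affine output layer

print(F([1.0, -0.5]))
```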
It is easy to see that the function F computed by a NN N is continuous and, in
general, nonlinear, due to the activation function σ. For the case of neu-
ral networks with ReLU units, this function is piecewise affine, and differentiable al-
most everywhere in ℝⁿ. For smooth activation functions such as tanh and sigmoid, the
function is differentiable as well. Where it exists, we denote the gradient of this function
∇F := (∂x1 F, …, ∂xn F). Computing the gradient can be performed efficiently (as de-
scribed subsequently).
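Where no closed-form gradient routine is at hand, a central finite-difference approximation suffices at points of differentiability (a sketch; backpropagation is the efficient method alluded to above):

```python
import numpy as np

def numerical_gradient(F, x, h=1e-6):
    """Central-difference approximation of (dF/dx1, ..., dF/dxn).

    Valid wherever F is differentiable; for ReLU networks that is
    almost everywhere. F is any scalar-valued function of a vector,
    e.g. the network function sketched above.
    """
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (F(x + e) - F(x - e)) / (2.0 * h)
    return g
```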
2.1 Mixed Integer Linear Programs
Throughout this paper, we will formulate linear optimization problems with integer
variables. We briefly recall these optimization problems, their computational complex-
ity and solution techniques used in practice.
Definition 2 (Mixed Integer Program). A mixed integer linear program (MILP) in-
volves a set of real-valued variables x and integer-valued variables w, and takes the
following form:

    max  aᵀx + bᵀw
    s.t. Ax + Bw ≤ c
         x ∈ ℝⁿ, w ∈ ℤᵐ
The problem is called a linear program (LP) if there are no integer variables w. The
special case wherein w ∈ {0, 1}ᵐ is called a binary MILP. Finally, the case without an
explicit objective function is called an MILP feasibility problem.
It is well known that MILPs are NP-hard problems: the best known algorithms,
thus far, have exponential worst-case time complexity. We will later briefly review the
popular branch-and-cut class of algorithms for solving MILPs at a high level. These
algorithms along with the associated heuristics underlie highly successful, commercial
MILP solvers such as Gurobi [9] and CPLEX [11].
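As a small, self-contained instance of Definition 2, the sketch below solves a binary MILP with SciPy's milp interface (SciPy ≥ 1.9, backed by the open-source HiGHS solver; chosen here only to keep the example dependency-light, whereas the experiments in this paper use a commercial solver):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# maximize 3x + 2w  subject to  2x + w <= 5,  0 <= x <= 2,  w in {0, 1}.
# milp minimizes, so the objective vector is negated; variables are [x, w].
c = np.array([-3.0, -2.0])
constraints = LinearConstraint(np.array([[2.0, 1.0]]), -np.inf, 5.0)
integrality = np.array([0, 1])                  # 0: continuous, 1: integer
bounds = Bounds([0.0, 0.0], [2.0, 1.0])

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
print(res.x, -res.fun)                          # optimum: x = 2, w = 1, value 8
```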
3 Problem Definition and MILP Encoding
Let N be a neural network with inputs x ∈ ℝⁿ, output y ∈ ℝ, and weights (W0, b0), …,
(Wk, bk), with activation function σ for each neuron unit, defining the function FN : ℝⁿ → ℝ.
Definition 3 (Range Estimation Problem). The problem is defined as follows:
– INPUTS: A neural network N , and input constraints P : Ax ≤ b that are compact, i.e.,
closed and bounded in ℝⁿ, along with a tolerance parameter δ > 0 (a real number).
– OUTPUT: An interval [ℓ, u] such that (∀ x ∈ P) FN(x) ∈ [ℓ, u], i.e., [ℓ, u] contains the
range of FN over inputs x ∈ P. Furthermore, the interval is δ-tight:

    u − δ ≤ max_{x ∈ P} FN(x)   and   ℓ + δ ≥ min_{x ∈ P} FN(x).
Without loss of generality, we will focus on estimating the upper bound u. The case
for the lower bound will be entirely analogous.
3.1 MILP Encoding
We will first describe the MILP encoding when σ is defined by a ReLU unit. The
treatment of more general activation functions will be described subsequently. The real-
valued variables of the MILP are as follows:
1. x ∈ ℝⁿ: the inputs to the network, comprising n variables.
2. z1, …, zk−1: the outputs of the hidden layers, wherein each zi ∈ ℝᴺ.
3. y ∈ ℝ: the overall output of the network.
Additionally, we introduce binary (0/1) variables t1, …, tk−1, wherein each vector
ti ∈ ℤᴺ (the same size as zi). These variables will be used to model the piecewise
behavior of the ReLU units.
Next, we encode the constraints. The first set of constraints ensures that x ∈ P: sup-
posing P is defined as Ax ≤ b, we simply add the constraints C0 : Ax ≤ b.
For each hidden layer i, we require that zi+1 = σ(Wi zi + bi). Since σ is not linear,
we use the binary variables ti+1 to encode the same behavior:
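For a single ReLU, the constraint z = max(w, 0) admits a standard big-M encoding with one binary variable t, assuming a known bound M ≥ |w|. The sketch below builds these inequalities; this is the textbook encoding for piecewise-linear functions [28], not necessarily the exact constraint set derived here:

```python
import numpy as np

def relu_big_m_rows(M):
    """Rows (A, ub) with A @ [w, z, t] <= ub encoding z = max(w, 0),
    where t is binary and |w| <= M is assumed. t = 1 selects the
    active phase (z = w >= 0); t = 0 the inactive phase (z = 0, w <= 0).
    """
    A = np.array([
        [ 0.0, -1.0, 0.0],   #      -z         <= 0   (z >= 0)
        [ 1.0, -1.0, 0.0],   #  w -  z         <= 0   (z >= w)
        [ 0.0,  1.0,  -M],   #       z - M*t   <= 0   (z <= M*t)
        [-1.0,  1.0,   M],   # -w +  z + M*t   <= M   (z <= w + M*(1 - t))
    ])
    ub = np.array([0.0, 0.0, 0.0, M])
    return A, ub
```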