Topic 2. Validation
Analysis, Validation and Debugging (AVD)
Germán Vidal, Alicia Villanueva. DSIC, ETSInf. Academic year 2015-2016

Contents: Introduction · Software testing · Generation based on symbolic execution (Review: standard semantics; Symbolic semantics; Test case generation) · Advances and challenges (Lazy initialization; Abstract subsumption; Concolic execution) · Bibliography
Introduction (2016. 6. 19.): automatic software testing as part of the product validation process.
Validation
Review process that verifies that the software system produced meets its specifications and fulfils its purpose.
• Validation is the name given to the set of techniques applied to this end:
  • External validation
  • Software testing at different development stages
  • Informal/manual validation
  • ...
Some figures that support the need for validation (1/2):
Barron 2002: Approximately 22% of PCs and 25% of laptops fail every year, compared with 9% of VCRs, 7% of large-screen televisions, 7% of clothes dryers, and 8% of refrigerators.
US 2002: Software defects cost the US economy an estimated $59.5 billion every year (federal study carried out by RTI).
Beizer 1990: Of the effort spent developing a working program, typically 50% goes to testing and debugging.
Some figures that support the need for validation (2/2):
Hailpern 2002: Validation activities (debugging, testing and verification) can easily account for 50% to 75% of the total development cost.
Gould 1975: In a group of experienced programmers, the three best at debugging found the defects in 30% of the time needed by the three worst, and made only 40% as many errors.
RTI 2002: Developers estimate that improvements in testing and debugging could reduce the cost of software errors by a third.
Software testing
Basic rules of software testing (Zeller 2005):
• Specify: A program cannot be correct by itself; it is correct only with respect to a specification that describes its purpose.
• Test early: Do not wait until the system is fully developed before testing it.
• Test first: Write the test cases before implementing the corresponding unit of code. Test cases can serve as a specification.
• Test often: Test at least with every release of the software. Ideally, test with every change to the program (automation).
• Test enough: Measure the coverage of the tests.
• Have others test: Testing to find problems is a destructive process. An independent person will be more objective when testing.
Scope of testing
Testing should take place at different stages of software development:
• unit tests: small pieces of code are tested
• integration tests: check that different pieces of code (possibly developed by different programmers) work well together
• system tests: the functionality of the system is tested
• acceptance tests: the user checks that the system meets their needs
• regression tests: carried out during the maintenance phase, after a change or update. The goal is to check that the system still provides its previous functionality.
• stress tests: the system is tested under extreme conditions.
• White-box (or glass-box) tests: the system is examined by inspecting its internal details (the code).
In this setting it is possible to analyze execution paths: sequences of control points and instructions that appear in the code.
• Black-box tests: the tester has no access to the program code and therefore relies on the information available through the system interface.
Early tests (unit or integration) usually follow the white-box technique, whereas acceptance tests more commonly follow the black-box technique.
Coverage criterion
A criterion that defines sets of possible executions such that executions in the same set are likely to exhibit the same error.
• IDEA: In the testing phase, at least one execution from each set is tested.
• Each tested execution is called a test case.
• Full coverage: achieved when one test case is run for each set defined by the coverage criterion.
• Full coverage does not guarantee the absence of errors.
• Based on the control flow graph:
  • Statement coverage
  • Edge coverage (also called branch or decision coverage)
  • Condition coverage
  • Condition/decision coverage
  • Multiple condition coverage
  • Path coverage
• Based on data flow:
  • Live variable analysis
  • The code is analyzed backwards
  • all-defs, all-uses, etc.
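The difference between the first control-flow criteria can be seen on a tiny unit. The sketch below is only an illustration: the function `classify`, the trace labels and the three test sets are hypothetical and do not come from the slides. The run is instrumented by hand so that the statements, decision outcomes and condition outcomes exercised by each test set can be compared.

```python
def classify(a, b, trace):
    """Toy unit under test; `trace` records what a run exercised."""
    trace.add("stmt:init")
    result = "other"
    c1 = a > 0                      # condition C1
    trace.add(("C1", c1))
    c2 = False
    if c1:                          # mimics short-circuit `and`: C2 is only evaluated if C1 holds
        c2 = b > 0                  # condition C2
        trace.add(("C2", c2))
    decision = c1 and c2            # decision D = C1 and C2
    trace.add(("D", decision))
    if decision:
        trace.add("stmt:then")
        result = "both positive"
    return result


def run(tests):
    """Run every test case and return everything that was exercised."""
    trace = set()
    for a, b in tests:
        classify(a, b, trace)
    return trace


statement_tests = [(1, 1)]                    # reaches every statement
edge_tests = [(1, 1), (0, 1)]                 # decision taken both ways
condition_tests = [(1, 1), (0, 1), (1, 0)]    # every condition both True and False

for name, tests in [("statement", statement_tests),
                    ("edge", edge_tests),
                    ("condition", condition_tests)]:
    print(name, sorted(map(str, run(tests))))
```

In this example the single statement test never takes the decision to False, and the edge test set never makes C2 false, which is why the stronger criteria need the extra test cases.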
Write a set of test cases that achieves full coverage under each of the criteria seen:
• Statement coverage:
• Edge coverage:
• Condition coverage:
• Edge/condition coverage:
• Multiple condition coverage:
• Criteria based on the control flow graph have some limitations
  • Even with full coverage, some errors may go undetected (testing never guarantees the absence of errors)
  • Since the internal structure of the program is used, the test can be biased: the program is tested against how it was implemented, not against how it should have been implemented.
• It is hard to assess the adequacy of coverage criteria in general. We would need to know what a typical program looks like.
• None of the criteria above handles loops well (except path coverage, which is of little practical use)
• There are some ad hoc strategies for testing loops:
  • Test the case where the loop is not entered
  • Test the case where the loop body is executed once
  • Test the case where the loop body is executed a typical number of times
  • If the loop is known to have an iteration limit n, test the cases of n − 1 iterations and n + 1 iterations
• Everything gets harder with nested loops: the number of test cases needed can grow exponentially.
  • Exhaustiveness is given up when loops are present
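The ad hoc loop strategies can be written down as a small table of inputs. This is a minimal sketch under stated assumptions: the function `sum_prefix`, its limit of 5 iterations and the helper `loop_boundary_inputs` are hypothetical names chosen for illustration.

```python
def sum_prefix(values, limit=5):
    """Add at most `limit` leading elements; also report the iterations executed."""
    total, i = 0, 0
    while i < len(values) and i < limit:
        total += values[i]
        i += 1
    return total, i


def loop_boundary_inputs(limit):
    """One input per ad hoc strategy for a loop bounded by `limit` iterations."""
    return {
        "zero iterations": [],
        "one iteration": [1],
        "typical": [1] * (limit // 2),
        "n - 1 iterations": [1] * (limit - 1),
        "n + 1 attempted": [1] * (limit + 1),   # tries to exceed the limit
    }


iterations = {name: sum_prefix(values)[1]
              for name, values in loop_boundary_inputs(5).items()}
print(iterations)
```

The last case asks for one iteration more than the limit allows, so it checks that the bound is actually enforced.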
• They are based on how variables are used in the statements of the program.
• Example of a basic criterion:
all-defs coverage
For each variable x and each statement s that defines x, the test set includes a path that reaches a node s′ where x is used (for example, its value is assigned to another variable), such that s′ is reachable from s and x is not redefined in the intermediate nodes.
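The all-defs requirement can be computed on a control flow graph by searching, for each definition, a definition-clear path to some use. The sketch below assumes a hypothetical six-node program and hypothetical helper names (`def_clear_path`, `all_defs`); it is not taken from the slides.

```python
from collections import deque


def def_clear_path(cfg, defs, uses, var, start):
    """Breadth-first search for a path start -> ... -> s' where s' uses `var`
    and no intermediate node redefines it. Returns None if there is none."""
    queue = deque([[start]])
    seen = set()
    while queue:
        path = queue.popleft()
        for nxt in cfg.get(path[-1], []):
            if var in uses.get(nxt, set()):
                return path + [nxt]
            if nxt in seen or var in defs.get(nxt, set()):
                continue            # already explored, or the definition is killed here
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


def all_defs(cfg, defs, uses):
    """Map every definition (variable, node) to one covering path (or None)."""
    return {(var, node): def_clear_path(cfg, defs, uses, var, node)
            for node, variables in defs.items() for var in variables}


# Hypothetical program:
# 1: x = input()   2: y = 0   3: if x > 0:   4: y = x   5: (else) x = 0   6: print(y)
cfg = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6]}
defs = {1: {"x"}, 2: {"y"}, 4: {"y"}, 5: {"x"}}
uses = {3: {"x"}, 4: {"x"}, 6: {"y"}}

paths = all_defs(cfg, defs, uses)
for definition, path in sorted(paths.items()):
    print(definition, path)
```

In this example the definition of x at node 5 has no use at all, so no test case can cover it; a criterion like all-defs makes such dead definitions visible.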
• One of the most expensive tasks is defining test cases that guarantee good coverage
• The definition of those test cases can be automated by exploring the control/data flow graphs following some strategy
Basic idea
For each path, find an assignment of values to variables that guarantees that the path is executed.
• Strategy: propagation of the conditions that ...
• Evaluation of an expression. A derivation tree is built.
Example
Given the state σ0 = {X ↦ 0, Y ↦ 0, ...} in which every variable is assigned the integer 0, develop the evaluation of the expression a ≡ (Y + 5) + (7 + 9).
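The derivation for this example can be reproduced mechanically. The following sketch assumes expressions are encoded as nested tuples `(op, left, right)`; the encoding and the names `evaluate` and `show` are illustrative choices, not part of the course material. Each judgment of the derivation tree is recorded bottom-up.

```python
def show(a):
    """Pretty-print an expression tree."""
    if isinstance(a, tuple):
        op, left, right = a
        return f"({show(left)} {op} {show(right)})"
    return str(a)


def evaluate(a, sigma, trace):
    """Evaluate expression `a` in state `sigma`, appending one judgment per subterm."""
    if isinstance(a, int):
        n = a
    elif isinstance(a, str):
        n = sigma.get(a, 0)         # in sigma0 every variable is 0
    else:
        op, left, right = a
        n0 = evaluate(left, sigma, trace)
        n1 = evaluate(right, sigma, trace)
        if op == "+":
            n = n0 + n1
        elif op == "-":
            n = n0 - n1
        elif op == "*":
            n = n0 * n1
        else:
            raise ValueError(f"unknown operator {op}")
    trace.append(f"<{show(a)}, s0> -> {n}")
    return n


sigma0 = {"X": 0, "Y": 0}
a = ("+", ("+", "Y", 5), ("+", 7, 9))
trace = []
value = evaluate(a, sigma0, trace)
print("\n".join(trace))
```

The last line of the trace is the root of the derivation tree; the lines before it are its premises, with the leaves first.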
• Execution of instructions:
  • Executing an instruction causes a change of state
  • In the initial state σ0 of a program execution every variable has the value 0: σ0(X) = 0 for every variable X
  • The execution of a program may terminate or diverge (no final state is reached)
• We define the relation ⟨c, σ⟩ → ⟨c′, σ′⟩, meaning: a small-step execution of instruction c in state σ leaves instruction c′ to be executed in state σ′, where σ′ is the state that results from updating σ with the effects of executing c.
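A small-step relation of this kind can be prototyped as a function from a configuration to the next configuration. The rules used below are the usual ones for a simple imperative language (assignment, sequence, conditional, while) and are an assumption, since the rule figures of the slides are not reproduced here; the tuple encoding and the names `step` and `run` are also hypothetical.

```python
SKIP = ("skip",)


def ev(e, s):
    """Evaluate an arithmetic or boolean expression in state s."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return s.get(e, 0)          # every variable starts at 0
    op, a, b = e
    x, y = ev(a, s), ev(b, s)
    return {"+": x + y, "-": x - y, "<": x < y, "<=": x <= y}[op]


def step(c, s):
    """One small step: <c, s> -> <c', s'>."""
    kind = c[0]
    if kind == "assign":
        _, var, e = c
        return SKIP, {**s, var: ev(e, s)}
    if kind == "seq":
        _, c0, c1 = c
        if c0 == SKIP:
            return c1, s
        c0_next, s_next = step(c0, s)
        return ("seq", c0_next, c1), s_next
    if kind == "if":
        _, b, c0, c1 = c
        return (c0 if ev(b, s) else c1), s
    if kind == "while":
        _, b, body = c
        return ("if", b, ("seq", body, c), SKIP), s   # unfold the loop once
    raise ValueError("skip cannot take a step")


def run(c, s, fuel=1000):
    """Apply `step` until skip is reached; `fuel` guards against divergence."""
    while c != SKIP and fuel > 0:
        c, s = step(c, s)
        fuel -= 1
    return c, s


# X := 3; while 0 < X do (Y := Y + X; X := X - 1)
program = ("seq", ("assign", "X", 3),
           ("while", ("<", 0, "X"),
            ("seq", ("assign", "Y", ("+", "Y", "X")),
                    ("assign", "X", ("-", "X", 1)))))
final_command, final_state = run(program, {})
print(final_state)
```

Printing each configuration inside `run` gives the execution trace asked for in the exercises.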
Write a version of the conditional rule in which, in the same step, the condition is evaluated and the corresponding branch also starts executing (takes one step).
• You will have to consider the case in which the corresponding branch cannot make progress (the skip case)
Standard execution
Given a program, its execution consists of applying the rules of the standard semantics from the initial state until the final state is reached (if it exists).
Example
Write the execution trace of the following program:
• Unlike standard execution, the initial state does not hold concrete values for the variables but symbolic values.
Symbolic execution
Given a program, its symbolic execution consists of exploring the possible execution paths, so that the result is not a single final state but all the possible final states together with the constraints that must be satisfied to reach them.
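• Conditional (2/2): When neither p ⇒ exp nor p ⇒ ¬exp can be proved, the tree branches by applying the following two rules:

$$\frac{\langle b, \sigma\rangle \to exp}{\langle \mathtt{if}\ b\ \mathtt{then}\ c_0\ \mathtt{else}\ c_1,\ \sigma,\ p\rangle \to \langle c_0,\ \sigma,\ p \wedge exp\rangle}$$

$$\frac{\langle b, \sigma\rangle \to exp}{\langle \mathtt{if}\ b\ \mathtt{then}\ c_0\ \mathtt{else}\ c_1,\ \sigma,\ p\rangle \to \langle c_1,\ \sigma,\ p \wedge \neg exp\rangle}$$

• A logic engine can be used to check whether p ∧ exp (and p ∧ ¬exp) are satisfiable. In this way infeasible executions can be pruned.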
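The two branching rules and the pruning step can be sketched for a loop-free fragment. Everything below is a simplification made for illustration: path conditions are lists of (expression, expected truth value) pairs, and satisfiability is decided by brute force over a small integer range (the hypothetical `solve` function) instead of a real logic engine.

```python
from itertools import product


def subst(e, sigma):
    """Replace program variables in e by their symbolic values."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return sigma[e]
    op, a, b = e
    return (op, subst(a, sigma), subst(b, sigma))


def concrete(e, env):
    """Evaluate a symbolic expression under concrete inputs."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    x, y = concrete(a, env), concrete(b, env)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "<":
        return x < y
    raise ValueError(op)


def solve(pc, inputs, domain=range(-3, 4)):
    """Bounded stand-in for a solver: first input assignment satisfying pc, or None."""
    for values in product(domain, repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if all(concrete(exp, env) == wanted for exp, wanted in pc):
            return env
    return None


def sym_exec(c, sigma, pc, inputs):
    """Yield (final symbolic state, path condition) for every feasible path."""
    kind = c[0]
    if kind == "skip":
        yield sigma, pc
    elif kind == "assign":
        _, var, e = c
        yield {**sigma, var: subst(e, sigma)}, pc
    elif kind == "seq":
        _, c0, c1 = c
        for s1, p1 in sym_exec(c0, sigma, pc, inputs):
            yield from sym_exec(c1, s1, p1, inputs)
    elif kind == "if":
        _, b, c0, c1 = c
        exp = subst(b, sigma)
        for branch, wanted in ((c0, True), (c1, False)):
            new_pc = pc + [(exp, wanted)]            # p AND exp, or p AND NOT exp
            if solve(new_pc, inputs) is not None:    # prune infeasible branches
                yield from sym_exec(branch, sigma, new_pc, inputs)


# if X < Y then (if Y < X then Z := 1 else Z := 2) else Z := 3
program = ("if", ("<", "X", "Y"),
           ("if", ("<", "Y", "X"), ("assign", "Z", 1), ("assign", "Z", 2)),
           ("assign", "Z", 3))
inputs = ["x0", "y0"]
sigma0 = {"X": "x0", "Y": "y0", "Z": 0}
paths = list(sym_exec(program, sigma0, [], inputs))
test_inputs = [solve(pc, inputs) for _, pc in paths]
for (state, pc), test in zip(paths, test_inputs):
    print(state["Z"], pc, test)
```

The branch Z := 1 never appears because its path condition x0 < y0 ∧ y0 < x0 is unsatisfiable; each remaining path condition is solved to obtain one test case for that path. The bounded search is only adequate for this toy example.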
• Necessary for modern programming languages
• Lazy initialization:
  • Initializes the components of method inputs on demand, without bounding the maximum input size.
  • The different possibilities are explored (branching in the execution tree). For example, when a pointer is accessed there are three cases: the pointer is null, it refers to a new object, or it refers to an existing one
Lazy initialization. Strategy
Symbolic fields are initialized the first time the code accesses them during symbolic execution. To execute method m of class C:
1 An object o of class C is created with uninitialized fields
2 o.m() is invoked and the semantics is followed, except that:
  • if an uninitialized field is found and it holds a reference f, then f is nondeterministically initialized to:
    • null, or
    • a new object of class T (with uninitialized fields), or
    • an existing object of class T (created earlier)
    If the field has a primitive type, it is initialized to a symbolic value
  • if a choice point is found whose condition involves primitive fields, the path condition is nondeterministically updated with the condition or with its negation
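The nondeterministic choice for a reference field can be enumerated explicitly. In this sketch the heap is a dictionary from object names to their fields, `"?"` marks an uninitialized field, and the function name `initialize_next` is hypothetical; only a single field `next` is modelled.

```python
UNINIT = "?"   # marker for a field that has not been initialized yet


def clone(heap):
    """Copy a heap so that each choice gets its own successor state."""
    return {name: dict(fields) for name, fields in heap.items()}


def initialize_next(heap, obj):
    """Return one successor heap per choice for the uninitialized field obj.next."""
    assert heap[obj]["next"] == UNINIT
    successors = []

    h = clone(heap)                       # choice 1: null
    h[obj]["next"] = None
    successors.append(h)

    fresh = f"n{len(heap)}"               # choice 2: a new object with uninitialized fields
    h = clone(heap)
    h[fresh] = {"next": UNINIT}
    h[obj]["next"] = fresh
    successors.append(h)

    for existing in heap:                 # choice 3: any previously created object
        h = clone(heap)
        h[obj]["next"] = existing
        successors.append(h)
    return successors


start = {"n0": {"next": UNINIT}}
first = initialize_next(start, "n0")       # null, new n1, or n0 itself
second = initialize_next(first[1], "n1")   # two objects exist at this point
print(len(first), len(second))
```

With one object in the heap there are three successors; once a second object exists, a later initialization yields four, since the field may alias either existing object.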
Example (from Generalized Symbolic Execution for Model Checking and Testing, p. 559)
[Figure: Fig. 4. Symbolic execution tree (excerpts), using notation described in Section 2. Tree nodes show the list reachable from t through next fields, with elements E0 and E1; edges are annotated with lazy initialization steps (Initialize "next" in stmt 1, Initialize "elem" and "next.elem" in stmt 2, Initialize "t.next" in stmt 4) and path conditions (PC: E0<=E1, PC: E0>E1).]
... and corresponds to either execution of a statement of swapNode or to a lazy initialization step. Branching in the tree corresponds to a nondeterministic choice that is introduced to handle aliasing or build a path condition.

The algorithm creates a new node object and invokes swapNode on the object. Line (1) accesses the uninitialized next field and causes it to be initialized. The algorithm explores three possibilities: either the field is null, or the field points to a new symbolic object, or the field points to a previously created object of the same type (with the only option being itself). Intuitively, this means that, at this point in the execution, we make three different assumptions about the configuration of the input list, according to different aliasing possibilities. Another initialization happens during execution of statement (4), which results in four possibilities, as there are two Node objects at that point in the execution.

When a condition involving primitive fields is symbolically executed, e.g., statement (2), the execution tree has a branch corresponding to each possible outcome of the condition's evaluation. Evaluation of a condition involving reference fields does not cause branching unless uninitialized fields are accessed.

If swapNode has the precondition that its input should be acyclic, the algorithm does not explore the transitions marked with an "X". The input list corresponding to the output list pointed to by t in the bottom-most tree node is shown on the bottom row of Figure 1.
• Lazy initialization makes it possible to handle dynamic data structures, but in general loop detection is not addressed
  • Checking whether two states match is undecidable in general when no bound is imposed on the size of the dynamic data structures
• To guarantee termination, the number of loop iterations is bounded
Example (2/2) (from S. Anand, C.S. Pasareanu, and W. Visser, p. 168)
[Figure: Fig. 2. State space generated during symbolic execution of find (excerpts). States s1 to s13 show the list reachable from this through next fields, the position of n, and path conditions over v and v1, v2, v3 (starting from PC: true); transitions are annotated "Update PC at line 3" and "Initialize next at line 4", and nodes are marked "Summary" and "Matched".]
... to a “cloud”, while on the other branch, the cloud is replaced with null (e.g. states s4 and s5). Note that if we had not imposed the precondition that the input list is acyclic, there would have been a third branch corresponding to next pointing to itself.

The (symbolic) state space for this example is infinite and there is no subsumption between the generated symbolic states. However, if we use abstraction, the symbolic state space becomes finite. The list abstraction summarizes contiguous node segments that are not pointed to by local variables into a summary node. Since the number of local variables is finite, the number of abstract heap configurations is also finite. For the example, two nodes in state s12 are mapped to a summary node. As a result, the abstract state is subsumed by previously stored state s8, at which point the model checker backtracks. The analysis terminates reporting that there are no null pointer exceptions. Note that due to abstract matching, the model checker might miss feasible behaviors. However, for this example, the abstraction is in fact exact: there is no loss of precision
• Symbolic states represent sets of concrete states
• A symbolic state s1 subsumes another state s2 if the set of states represented by s1 is a superset of those represented by s2
Idea
The heap is represented as a graph in which every object or variable holding a reference is a node, and the edges are the relations between objects and references.
• special nodes: uninit and null
States contain the heap, the program execution point and the path condition:
Subsumption of symbolic heaps
A heap H2 subsumes another heap H1 (H1 ⊑ H2) iff the set of concrete heaps represented by H2 contains those represented by H1
• State subsumption is only checked when the states are at the same execution point
• For a state s1 to be subsumed by another (more general) state s2, the path condition of s1 must imply (be stronger than) that of s2
[Figure: Fig. 4. Matched and unmatched heap shapes. Pairs of tree-shaped heaps with left and right fields and null leaves, one pair marked "matched" and one marked "unmatched". Source: Saswat Anand et al.: Symbolic Execution with Abstraction, p. 8.]
where $l$ is a labeling of the nodes $N^H_O$ (a node without a label is mapped to $nolbl$) and $v_s$ denotes all the symbolic names that are used in symbolic states; this includes both the values stored in the heap and the values that appear in the path condition.
As an example, consider two symbolic states in Figure 5, where s1 is subsumed by s2. The matched nodes from the two heaps have matching labels l1 and l2. The valuations for the two nodes labeled l1 and l2 in the left-hand side list are $e_1 = v_1$ and $e_2 = v_3$ respectively; $e_1$ and $e_2$ are the names computed by the function fn. Similarly, valuations for the two nodes with labels l1 and l2 in the right-hand side list are $e_1 = v_1$ and $e_2 = v_2$ respectively. The state constraints for s1 and s2 are respectively $\exists v_1, v_3, v_2 : e_1 = v_1 \wedge e_2 = v_3 \wedge v_1 < v_3 \wedge v_3 < v_2$ and $\exists v_1, v_2, v_5 : e_1 = v_1 \wedge e_2 = v_2 \wedge v_1 \le v_5 \wedge v_5 \le v_2$. Note that the path conditions may contain symbolic values that are not stored in the heap (e.g. $v_5$ in s2) according to the program path that led to the symbolic state.
Definition 6. Let $Sol(SC^l_s)$ denote the set of satisfying solutions for the state constraint $SC^l_s$ of state $s$ for a labeling $l$. $SC^l_{s_2}$ subsumes $SC^l_{s_1}$ iff $Sol(SC^l_{s_1}) \subseteq Sol(SC^l_{s_2})$ for the same labeling $l$ of the nodes $N^{H_1}_O \cup N^{H_2}_O$.
Since in general it may be computationally expensive or impossible to enumerate all the solutions of $SC_1$ and $SC_2$ and check for set inclusion, we rather check $SC_1 \Rightarrow SC_2$, which if valid ensures that $Sol(SC_1) \subseteq Sol(SC_2)$.
Now we combine the definitions of heap shape subsumption and state constraint subsumption to define state subsumption as follows:
Definition 7. A state $s_1$ is subsumed by another state $s_2$ (or $s_2$ subsumes $s_1$) iff $H_2 \sqsupseteq_l H_1$ and $SC^l_{s_1} \Rightarrow SC^l_{s_2}$.
In the example from Figure 5, as described before, Algorithm 1 returns true indicating that the heap shape of s2 subsumes that of s1. Matching nodes from the two states are labeled with l1 and l2. Notice that the third node in s1 is not labeled due to the uninit node in s2. To check for subsumption of state constraints we check whether the state constraint of s1 implies that of s2. The state constraints of s1 and s2 simplify to $e_1 < e_2$ and $e_1 \le e_2$ respectively. Since $e_1 < e_2 \Rightarrow e_1 \le e_2$ is valid, s2 subsumes s1.
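The inclusion Sol(SC1) ⊆ Sol(SC2) from Definition 6 can be checked by enumeration on the Figure 5 example. This is a bounded sketch, not a proof: the domain 0..4, the dictionary encoding of valuations and the helper `solutions` are assumptions made for illustration, and a real implementation would ask a decision procedure whether the implication is valid.

```python
from itertools import product

DOMAIN = range(0, 5)   # small finite domain, chosen only for illustration


def solutions(constraint, bound, free=("e1", "e2")):
    """Values of the free names for which some choice of the bound
    (existentially quantified) names satisfies the constraint."""
    sols = set()
    for values in product(DOMAIN, repeat=len(free)):
        env = dict(zip(free, values))
        for witness in product(DOMAIN, repeat=len(bound)):
            if constraint({**env, **dict(zip(bound, witness))}):
                sols.add(values)
                break
    return sols


def sc_s1(m):
    # exists v1, v3, v2 : e1 = v1 and e2 = v3 and v1 < v3 and v3 < v2
    return (m["e1"] == m["v1"] and m["e2"] == m["v3"]
            and m["v1"] < m["v3"] and m["v3"] < m["v2"])


def sc_s2(m):
    # exists v1, v2, v5 : e1 = v1 and e2 = v2 and v1 <= v5 and v5 <= v2
    return (m["e1"] == m["v1"] and m["e2"] == m["v2"]
            and m["v1"] <= m["v5"] and m["v5"] <= m["v2"])


sol_s1 = solutions(sc_s1, bound=("v1", "v3", "v2"))
sol_s2 = solutions(sc_s2, bound=("v1", "v2", "v5"))
print(sol_s1 <= sol_s2, sol_s2 <= sol_s1)
```

On this domain every solution of s1 is a solution of s2 but not the other way round; for instance e1 = e2 = 1 satisfies only the constraint of s2. Because the domain is bounded, solutions of s1 that would need a v2 above the upper limit are cut off, which does not affect the inclusion shown here.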
The complexity for one subsumption step includes the complexity of heap traversal (O(n) where n is the size of the heap) and the complexity for checking numeric constraints. While the cost of checking numerical constraints cannot be avoided, we believe that the cost of heap traversal can be somewhat alleviated if it is performed during garbage collection. However, we need to experiment further with this idea.
5 Symbolic Execution with (Abstract) Subsumption Checking
Algorithm 2 illustrates the procedure for performing symbolic execution with (abstract) subsumption checking. The procedure checks if the input program P can reach an error state $\phi$ from initial state $s_0$. The procedure uses a depth-first search order for state exploration and it maintains a set of VisitedStates for the states visited so far and a Stack for storing the states to be explored. The procedure is similar to “classical” model checking state exploration, except that the explored states are symbolic, rather than concrete. The path condition on numeric data is checked for satisfiability to ensure exploration of feasible paths.

As discussed, we use state subsumption to determine if a state was visited before. Performing symbolic execution and subsumption checking during model checking may yield an unbounded symbolic state space. Therefore, we use abstractions to limit the model checker's search space. For each explored symbolic state $s'$, the model checker computes an abstract state $\alpha(s')$, which is then stored for state comparison. Subsumption checking is used to compare the abstracted states, to determine if an abstract state is being re-visited. This effectively explores an under-approximation of the feasible paths through the program. Therefore, all the reported
[Figure: Fig. 5. State Subsumption. State s1: list reachable from this through next fields, with nodes labeled l1 and l2 and values v1, v3, v2; valuation: $e_1 = v_1 \wedge e_2 = v_3$; PC: $v_1 < v_3 \wedge v_3 < v_2$. State s2: nodes labeled l1 and l2 with values v1, v2; valuation: $e_1 = v_1 \wedge e_2 = v_2$; PC: $v_1 \le v_5 \wedge v_5 \le v_2$.]
Data: Program P and error state φ
Result: Counterexample if φ is reachable
begin
    add(α(s0), VisitedStates);
    push(s0, Stack);
    while Stack is not empty do
        s := pop(Stack);
        if s = φ then return counterexample;
        foreach transition t enabled in s do
            s′ := successor(s, t);
            if PathCondition(s′) is not satisfiable then continue;
            if there exists s″ ∈ VisitedStates s.t. α(s′) subsumed by s″ then continue;
            // s′ not subsumed by any of the visited states
            add(α(s′), VisitedStates);
            push(s′, Stack);
        end
    end
end
Algorithm 2: Symbolic Execution with (Abstract) Subsumption Checking
errors correspond to real errors in the analyzed program. Note however that the analysis might miss some errors, due to the imprecision of the abstraction.
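Algorithm 2 can be transcribed almost line by line. The sketch below keeps the search generic and plugs in a toy instantiation that is entirely hypothetical: a symbolic state is a triple (location, lo, hi) standing for a counter x in [lo, hi], the loop increments x, an error is reached when x equals a chosen bad value, and the abstraction `alpha` forgets the upper bound once it reaches K.

```python
INF = float("inf")
K = 3   # hypothetical abstraction threshold


def explore(s0, successors, is_error, satisfiable, alpha, subsumed_by):
    """Depth-first symbolic search with (abstract) subsumption checking."""
    visited = [alpha(s0)]
    stack = [(s0, [s0])]
    while stack:
        s, path = stack.pop()
        if is_error(s):
            return path                               # counterexample
        for s_next in successors(s):
            if not satisfiable(s_next):
                continue                              # infeasible path condition
            abstract = alpha(s_next)
            if any(subsumed_by(abstract, v) for v in visited):
                continue                              # abstract state already covered
            visited.append(abstract)
            stack.append((s_next, path + [s_next]))
    return None


def make_successors(bad):
    def successors(state):
        loc, lo, hi = state
        if loc != "loop":
            return []
        return [("loop", lo + 1, hi + 1),             # x := x + 1
                ("err", max(lo, bad), min(hi, bad))]  # branch taken when x == bad
    return successors


def satisfiable(state):
    return state[1] <= state[2]


def is_error(state):
    return state[0] == "err"


def alpha(state):
    loc, lo, hi = state
    return (loc, min(lo, K), hi if hi < K else INF)


def subsumed_by(a, b):
    return a[0] == b[0] and b[1] <= a[1] and a[2] <= b[2]


start = ("loop", 0, 0)
found = explore(start, make_successors(2), is_error, satisfiable, alpha, subsumed_by)
missed = explore(start, make_successors(10), is_error, satisfiable, alpha, subsumed_by)
print(found, missed)
```

With the bad value 2 the search returns a real counterexample. With the bad value 10 the error is feasible in the concrete program, but the search stops earlier because the abstract state for x = 4 is subsumed by the stored one for x = 3: the exploration terminates at the price of missing that error, which is the under-approximation described in the text.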
6 Abstractions

6.1 Abstraction for Singly Linked Lists

The abstraction that we have implemented is inspired by [22, 31] and it is based on the idea of summarizing all the nodes in a maximally uninterrupted list segment with a summary node. The main difference between [22, 31] and the abstraction presented here is that we also summarize the numeric data stored in the summarized nodes and we give special treatment to un-initialized nodes. The numeric data stored in the abstracted list is summarized by setting the valuation for the summary node to be a disjunction of the valuations of the summarized nodes. Intuitively, the numeric data stored in a summary node can be equal to that of any of the summarized nodes.

Shape subsumption between abstract states is done by Algorithm 1 as before, which treats a summary node as any other node in the heap. For checking subsumption between numeric constraints, we introduce a new valuation function for the summary nodes as described before.
Definition 8. A node n is defined as an interrupting node, or simply an interruption, if n satisfies at least one of the following conditions:

1. $n = null$
2. $n = uninit$
3. $n \in \{m \text{ such that } (r, m) \in E_R\}$, i.e., n is pointed to by at least one reference variable.
4. $\exists n_1, n_2$ such that $(n_1, next, n), (n_2, next, n) \in E_F$. In other words, n is pointed to by at least two nodes (cyclic list).

An uninterrupted list segment is a segment of the list that does not contain an interruption. An uninterrupted list segment $[u, v]$ is maximal if $(a, next, u) \in E_F \Rightarrow a$ is an interruption and $(v, next, b) \in E_F \Rightarrow b$ is an interruption.
The abstraction for linked lists replaces all maximally uninterrupted list segments in heap H with a summary node in the abstract state. If $[u, v]$ is a maximally uninterrupted list segment in H, the following transformations on H produce its abstract mapping.

1. A new summary node $n_{sum}$ is added to the set of nodes $N^H_O$.
2. If there is an edge $(a, next, u) \in E^H_F$, a new edge $(a, next, n_{sum})$ is added to $E^H_F$.
3. If there is an edge $(v, next, b) \in E^H_F$, a new edge $(n_{sum}, next, b)$ is added to $E^H_F$.
4. All nodes m in the list segment $[u, v]$, and all edges incident on or going out of each m, are removed from H.
Note that the edges between the nodes in the list segment, which are summarized by a summary node, are not represented in the abstract state. With this abstraction, Algorithm 1 is used to check subsumption of shapes for abstracted heaps.

In order to check subsumption of numeric constraints, we define a valuation function for the summary nodes as follows. Let $N_S$, $N_S \subset N_O$, denote the set of summary nodes introduced in the heap during abstraction.
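The summarization of maximal uninterrupted segments can be sketched for the simplest case. The code below makes several simplifying assumptions that are not in the text: the list is acyclic and given as a Python list of node names in next order (so null and uninit are implicit terminators), the only interruptions considered are nodes pointed to by reference variables, and segments of a single node are left as they are. The name `abstract_list` is hypothetical.

```python
def abstract_list(nodes, pointed):
    """Replace each maximal run of nodes not in `pointed` by one summary node.

    nodes   : node names in `next` order (acyclic list)
    pointed : nodes referenced by at least one reference variable
    """
    result, segment = [], []

    def flush():
        if len(segment) > 1:
            result.append(("sum", tuple(segment)))   # summary node for the segment
        else:
            result.extend(segment)                   # size-one segments stay as they are
        segment.clear()

    for node in nodes:
        if node in pointed:      # interruption: closes the current segment
            flush()
            result.append(node)
        else:
            segment.append(node)
    flush()
    return result


# `this` points to a, the local variable n points to d.
abstracted = abstract_list(["a", "b", "c", "d"], pointed={"a", "d"})
unchanged = abstract_list(["a", "b", "c"], pointed={"a", "c"})
print(abstracted, unchanged)
```

Keeping the summarized node names inside the summary node is only a convenience for computing its valuation as a disjunction over those nodes; the edges between them are not represented.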
Definition 9. The “valuation” of a summary node $n_{sum} \in N_S$ in state $s$, with respect to a labeling $l$ of the nodes $N_O$, is defined as:

$$val_s(n_{sum}, l) := \bigvee_{\substack{t \in sumnodes(n_{sum}) \\ f \in primflds(t)}} fn(l(n_{sum}), f) = v_s(t, f)$$

where $sumnodes(n_{sum})$ denotes the set of nodes that are summarized by $n_{sum}$.
6.1.1 Example

To illustrate the approach, let us go back to the example presented in Section 3. Figure 6 depicts the abstract heap shape and the valuations of matched nodes for state s12. The abstracted state is subsumed by state s8 as there is a subsumption of heap shape, as represented by labelings of respective matching nodes, and a valid implication between the normalized numeric constraints of the two states. Note that we don't explicitly summarize list segments of size one (e.g. the second list element in s8); the abstracted and the un-abstracted states for s8 are in fact the same.

6.1.2 Discussion

Note that the list abstraction ensures that the number of possible abstract heap configurations is finite; however, it is still possible to have an infinite number of states due to the numeric constraints. To address this issue, we plan to use predicate abstraction in conjunction with the abstractions presented here, to further abstract the numeric constraints. This is the subject of future work. Also note that the focus here is on abstracting heap structures. Therefore we ignored the numeric values of local program variables, which may also be unbounded (they are currently discarded in the abstracted state). Predicate abstraction can also be used for the local numeric variables.
6.2 Abstraction for Arrays

We extended our framework with subsumption checking and an abstraction for arrays of integers. The basic idea is to represent symbolic arrays as singly linked lists and to apply the (abstract) subsumption checking methods developed for lists. Specifically, we maintain the arrays as singly linked lists; nodes in the list represent individual array cells and their ordering in the list corresponds to the order of indices of the array cells they represent. Consecutive (initialized) array elements are represented as linked nodes. Summary nodes are introduced between array elements that are not consecutive. These summary nodes model zero or more uninitialized array elements that may possibly exist in the (concrete) array.
With the list representation of arrays we determine subsumption of program states with arrays as before. However, the roots are now integer program variables that are used to index the array, and the special summary nodes representing uninitialized array segments are treated as any other node in the heap $N^H_O$ while checking for shape subsumption in Algorithm 1. Abstraction is applied in a way similar to abstraction for linked lists. The definition of interruption is extended to contain the special summary nodes.

We must note that this is only one particular abstraction, and there may be others, for example abstractions based on array representations as ordered sequences of updates. We adopt this particular representation because in this way we can leverage our abstraction techniques for lists. Note that subsumption becomes “approximate”, i.e., we might miss the fact that a state subsumes another.
6.2.1 Array representation

A symbolic array A is represented by a symbolic value len representing the array length and an association list of array cells. Each array cell c is a pair (index, elem): index is a symbolic value representing the index in the array and elem is a symbolic value representing the value stored in the array at position index.

The array cells are stored in a singly linked list which is sorted according to the relative order of the indices of the cells. Each list element corresponds to an array cell in A. Given array cell c, let index(c) and elem(c) denote the index and the value of c respectively; also let next(c) denote the cell that is next to c in the list.

The following invariants hold for the list in a program state with path condition PC.

1. $PC \Rightarrow index(f) \ge 0$ is valid, where f is the first cell in the list.
10 Saswat Anand et al.: Symbolic Execution with Abstraction
Definition 9. The “valuation” of a summary node $n_{sum} \in N_S$ in state $s$, with respect to a labeling $l$ of the nodes in $N_O$, is defined as:

$$val_s(n_{sum}, l) := \bigvee_{t \in sumnodes(n_{sum})} \; \bigwedge_{f \in primflds(t)} fn(l(n_{sum}), f) = v_s(t, f)$$

where $sumnodes(n_{sum})$ denotes the set of nodes that are summarized by $n_{sum}$.
6.1.1 Example
To illustrate the approach, let us go back to the example presented in Section 3. Figure 6 depicts the abstract heap shape and the valuations of matched nodes for state s12. The abstracted state is subsumed by state s8 as there is a subsumption of heap shape, as represented by labelings of respective matching nodes, and a valid implication between the normalized numeric constraints of the two states. Note that we don’t explicitly summarize list segments of size one (e.g. the second list element in s8); the abstracted and the un-abstracted states for s8 are in fact the same.
6.1.2 Discussion
Note that the list abstraction ensures that the number of possible abstract heap configurations is finite; however, it is still possible to have an infinite number of states due to the numeric constraints. To address this issue, we plan to use predicate abstraction in conjunction with the abstractions presented here, to further abstract the numeric constraints. This is the subject of future work.

Also note that the focus here is on abstracting heap structures. Therefore we ignored the numeric values of local program variables, which may also be unbounded (they are currently discarded in the abstracted state). Predicate abstraction can also be used for the local numeric variables.
6.2 Abstraction for Arrays
We extended our framework with subsumption checking and an abstraction for arrays of integers. The basic idea is to represent symbolic arrays as singly linked lists and to apply the (abstract) subsumption checking methods developed for lists. Specifically, we maintain the arrays as singly linked lists; nodes in the list represent individual array cells and their ordering in the list corresponds to the order of indices of the array cells they represent. Consecutive (initialized) array elements are represented as linked nodes. Summary nodes are introduced between array elements that are not consecutive. These summary nodes model zero or more uninitialized array elements that may possibly exist in the (concrete) array.

With the list representation of arrays we determine subsumption of program states with arrays as before. However, the roots are now integer program variables that are used to index the array, and the special summary nodes representing uninitialized array segments are treated as any other node in the heap $N_O^H$ while checking for shape subsumption in Algorithm 1. Abstraction is applied in a way similar to abstraction for linked lists. The definition of interruption is extended to contain the special summary nodes.

We must note that this is only one particular abstraction, and there may be others, for example abstractions based on array representations as ordered sequences of updates. We adopt this particular representation because in this way we can leverage our abstraction techniques for lists. Note that subsumption becomes “approximate”, i.e., we might miss the fact that a state subsumes another.
6.2.1 Array representation
A symbolic array A is represented by a symbolic value len representing the array length and an association list of array cells. Each array cell c is a pair (index, elem): index is a symbolic value representing the index in the array and elem is a symbolic value representing the value stored in the array at position index.

The array cells are stored in a singly linked list which is sorted according to the relative order of the indices of the cells. Each list element corresponds to an array cell in A. Given array cell c, let index(c) and elem(c) denote the index and the value of c respectively; also let next(c) denote the cell that is next to c in the list.

The following invariants hold for the list in a program state with path condition PC.

1. PC ⇒ index(f) ≥ 0 is valid, where f is the first cell in the list.
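The list representation of arrays described above can be sketched in C as follows. This is only an illustration under a strong simplifying assumption: indices and elements are concrete ints standing in for symbolic values, so ordering is decided by plain comparison instead of by checking implications against the path condition. The names sym_array, array_cell, sym_array_store, sym_array_gaps and sym_array_free are hypothetical and do not come from the paper.

```c
#include <assert.h>
#include <stdlib.h>

/* One initialized array cell: the pair (index, elem) plus the next(c) link.
   Assumption: concrete ints stand in for symbolic values. */
typedef struct array_cell {
    int index;
    int elem;
    struct array_cell *next;
} array_cell;

/* An array: its length len and the sorted list of initialized cells. */
typedef struct {
    int len;
    array_cell *first;
} sym_array;

/* Stores elem at index, keeping the list sorted by index.
   Returns 0 on success and -1 if allocation fails. */
static int sym_array_store(sym_array *a, int index, int elem) {
    assert(index >= 0 && index < a->len);   /* implies index(f) >= 0 for the first cell */
    array_cell **link = &a->first;
    while (*link != NULL && (*link)->index < index)
        link = &(*link)->next;
    if (*link != NULL && (*link)->index == index) {
        (*link)->elem = elem;               /* cell already present: overwrite */
        return 0;
    }
    array_cell *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->index = index;
    c->elem = elem;
    c->next = *link;
    *link = c;
    return 0;
}

/* Counts adjacent list cells whose indices are not consecutive, i.e. the
   places where a summary node would sit between two initialized cells. */
static int sym_array_gaps(const sym_array *a) {
    int gaps = 0;
    for (const array_cell *c = a->first; c != NULL && c->next != NULL; c = c->next)
        if (c->next->index != c->index + 1)
            gaps++;
    return gaps;
}

/* Releases all cells of the list. */
static void sym_array_free(sym_array *a) {
    while (a->first != NULL) {
        array_cell *c = a->first;
        a->first = c->next;
        free(c);
    }
}
```

For instance, storing at indices 5, 2 and 3 of an array of length 10 yields the list 2, 3, 5, with a single gap between 3 and 5 where a summary node for uninitialized elements would be placed.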
pointer operations. For example, pointers may have aliases. Because alias analysis may only be approximate in the presence of pointer arithmetic, using symbolic values to precisely track such pointers may result in constraints whose satisfaction is undecidable. This makes the generation of test inputs by solving such constraints infeasible. In this paper, we provide a method for representing and solving approximate pointer constraints to generate test inputs. Our method is thus applicable to a broad class of sequential programs.

The key idea of our method is to represent inputs for the unit under test using a logical input map that represents all inputs, including (finite) memory graphs, as a collection of scalar symbolic variables and then to build constraints on these inputs by symbolically executing the code under test.

We first instrument the code being tested by inserting function calls which perform symbolic execution. We then repeatedly run the instrumented code as follows. The logical input map I is used to generate concrete memory input graphs for the program and two symbolic states, one for pointer values and one for primitive values. The code is run concretely on the concrete input graph and symbolically on the symbolic states, collecting constraints (in terms of the symbolic variables in the symbolic state) that characterize the set of inputs that would (likely) take the same execution path as the current execution path. As in [11], one of the collected constraints is negated. The resulting constraint system is solved to obtain a new logical input map I′ that is similar to I but (likely) leads the execution through a different path. We then set I = I′ and repeat the process. Since the goal of this testing approach is to explore feasible execution paths as much as possible, it can be seen as Explicit Path Model-Checking.

An important contribution of our work is separating pointer constraints from integer constraints and keeping the pointer constraints simple to make our symbolic execution light-weight and our constraint solving procedure not only tractable but also efficient. The pointer constraints are conceptually simplified using the logical input map to replace complex symbolic expressions involving pointers with simple symbolic pointer variables (while maintaining the precise pointer relations in the logical input map). For example, if p is an input pointer to a struct with a field f, then a constraint on p->f will be simplified to a constraint on f0, where f0 is the symbolic variable corresponding to the input value p->f. Although this simplification introduces some approximations that do not precisely capture all executions, it results in simple pointer constraints of the form x = y or x != y, where x and y are either symbolic pointer variables or the constant NULL. These constraints can be efficiently solved, and the approximations seem to suffice in practice.
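To see why constraints of the form x = y and x != y are cheap to decide, the following C sketch checks the satisfiability of a conjunction of such constraints with a union-find structure: equalities merge equivalence classes, and the conjunction is unsatisfiable exactly when some disequality relates two members of the same class. This is a standard technique shown for illustration, not necessarily the procedure implemented in CUTE. The names ptr_constraint, ptr_constraints_sat, NULL_VAR and MAX_VARS are hypothetical; variable 0 is assumed to stand for the constant NULL.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARS 16
#define NULL_VAR 0              /* assumption: variable 0 plays the role of NULL */

/* One pointer constraint: lhs = rhs if is_eq is nonzero, lhs != rhs otherwise. */
typedef struct {
    int lhs;
    int rhs;
    int is_eq;
} ptr_constraint;

static int parent[MAX_VARS];

/* Representative of the equivalence class of x (with path halving). */
static int find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns 1 if the conjunction of the n constraints over variables
   0..nvars-1 is satisfiable, and 0 otherwise. */
static int ptr_constraints_sat(const ptr_constraint *cs, size_t n, int nvars) {
    assert(nvars > 0 && nvars <= MAX_VARS);
    for (int i = 0; i < nvars; i++)
        parent[i] = i;
    /* First pass: merge the classes related by an equality. */
    for (size_t i = 0; i < n; i++)
        if (cs[i].is_eq)
            parent[find(cs[i].lhs)] = find(cs[i].rhs);
    /* Second pass: a disequality inside one class is a contradiction. */
    for (size_t i = 0; i < n; i++)
        if (!cs[i].is_eq && find(cs[i].lhs) == find(cs[i].rhs))
            return 0;
    return 1;
}
```

With p0 as variable 1, the conjunction p0 != NULL is satisfiable, whereas p0 = NULL together with p0 != NULL is not. A satisfying assignment can be read off by giving each class other than that of NULL its own freshly allocated cell.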
We implemented our method in a tool called CUTE (Concolic Unit Testing Engine, where Concolic stands for cooperative Concrete and symbolic execution). CUTE is available at http://osl.cs.uiuc.edu/~ksen/cute/. CUTE implements a solver for both arithmetic and pointer constraints to incrementally generate test inputs. The solver exploits the domain of this particular problem to implement three novel optimizations which help to improve the testing time by several orders of magnitude. Our experimental results confirm that CUTE can efficiently explore paths in C code, achieving high branch coverage and detecting bugs. In particular, it exposed software bugs that result in assertion violations, segmentation faults, or infinite loops.
    typedef struct cell {
        int v;
        struct cell *next;
    } cell;

    int f(int v) {
        return 2*v + 1;
    }

    int testme(cell *p, int x) {
        if (x > 0)
            if (p != NULL)
                if (f(x) == p->v)
                    if (p->next == p)
                        ERROR;
        return 0;
    }
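Input 1: p = NULL, x = 236
Input 2: p points to a cell with v = 634 and next = NULL, x = 236
Input 3: p points to a cell with v = 3 and next = NULL, x = 1
Input 4: p points to a cell with v = 3 and next = p, x = 1

Figure 1: Example C code and inputs that CUTE generates for testing the function testme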
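The four inputs of Figure 1 can be replayed directly against testme to check which of them reaches the error. The sketch below assumes a stand-in for ERROR (a macro that increments a counter, since the figure does not define it) and introduces two hypothetical helpers, reaches_error and replay_figure1.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int v;
    struct cell *next;
} cell;

/* Assumption: ERROR is modelled as a counter increment so that
   reaching it can be observed without aborting the program. */
static int error_hits = 0;
#define ERROR (error_hits++)

static int f(int v) {
    return 2 * v + 1;
}

static int testme(cell *p, int x) {
    if (x > 0)
        if (p != NULL)
            if (f(x) == p->v)
                if (p->next == p)
                    ERROR;
    return 0;
}

/* Runs testme once and reports whether ERROR was reached (1) or not (0). */
static int reaches_error(cell *p, int x) {
    int before = error_hits;
    (void)testme(p, x);
    return error_hits != before;
}

/* Replays Inputs 1 to 4; bit i-1 of the result is set when Input i reaches ERROR. */
static int replay_figure1(void) {
    cell c2 = {634, NULL};      /* Input 2 */
    cell c3 = {3, NULL};        /* Input 3 */
    cell c4 = {3, NULL};        /* Input 4: next is made to point back to the cell */
    int mask = 0;

    c4.next = &c4;
    assert(f(1) == 3);          /* why v = 3 pairs with x = 1 in Inputs 3 and 4 */

    mask |= reaches_error(NULL, 236) << 0;
    mask |= reaches_error(&c2, 236) << 1;
    mask |= reaches_error(&c3, 1) << 2;
    mask |= reaches_error(&c4, 1) << 3;
    return mask;
}
```

Only the fourth input satisfies every branch condition: x > 0, p is non-NULL, f(1) equals p->v, and p->next points back to p.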
This paper presents two case studies of testing code using CUTE. The first study involves the C code of the CUTE tool itself. The second case study found two previously unknown errors (a segmentation fault and an infinite loop) in SGLIB [25], a popular C data structure library used in a commercial tool. We reported the SGLIB errors to the SGLIB developers who fixed them in the next release.
2. EXAMPLE

We use a simple example to illustrate how CUTE performs testing. Consider the C function testme shown in Figure 1. This function has an error that can be reached given some specific values of the input. In a narrow sense, the input to testme consists of the values of the arguments p and x. However, p is a pointer, and thus the input includes the memory graph reachable from that pointer. In this example, the graph is a list of cell allocation units.
For the example function testme, CUTE first non-randomly generates NULL for p and randomly generates 236 for x. Figure 1 shows this input to testme. As a result, the first execution of testme takes the then branch of the first if statement and the else branch of the second if. Let p0 and x0 be the symbolic variables representing the values of p and x, respectively, at the beginning of the execution. CUTE collects the constraints from the predicates of the branches executed in this path: x0 > 0 (for the then branch of the first if) and p0 = NULL (for the else branch of the second if). The predicate sequence ⟨x0 > 0, p0 = NULL⟩ is called a path constraint.
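The collection of a path constraint can be sketched in C by hand-instrumenting testme so that it records, for every branch it evaluates, the predicate and whether it held. This is a simplified illustration, not CUTE's instrumentation: predicates are kept as strings rather than symbolic expressions, and the names predicate, path_constraint, pc_append, trace_testme and pc_negate_last are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int v;
    struct cell *next;
} cell;

#define MAX_PREDS 8

/* A branch predicate in positive form and whether it held in this run. */
typedef struct {
    const char *text;
    int holds;
} predicate;

/* A path constraint: the sequence of predicates seen along one execution. */
typedef struct {
    predicate preds[MAX_PREDS];
    size_t len;
} path_constraint;

static void pc_append(path_constraint *pc, const char *text, int holds) {
    assert(pc->len < MAX_PREDS);
    pc->preds[pc->len].text = text;
    pc->preds[pc->len].holds = holds;
    pc->len++;
}

/* Follows the branches of testme on (p, x) and records each predicate. */
static void trace_testme(path_constraint *pc, const cell *p, int x) {
    int b;
    pc->len = 0;
    b = (x > 0);
    pc_append(pc, "x0 > 0", b);
    if (!b) return;
    b = (p != NULL);
    pc_append(pc, "p0 != NULL", b);
    if (!b) return;
    b = (2 * x + 1 == p->v);
    pc_append(pc, "2*x0 + 1 == v0", b);
    if (!b) return;
    b = (p->next == p);
    pc_append(pc, "n0 == p0", b);
}

/* Returns a copy of pc whose last predicate is negated; solving it would
   drive the next execution down the other side of that branch. */
static path_constraint pc_negate_last(const path_constraint *pc) {
    path_constraint next = *pc;
    assert(next.len > 0);
    next.preds[next.len - 1].holds = !next.preds[next.len - 1].holds;
    return next;
}
```

Tracing the first input (p = NULL, x = 236) records x0 > 0 as holding and p0 != NULL as not holding, which is the path constraint ⟨x0 > 0, p0 = NULL⟩. Negating its last predicate gives the constraint that requires p0 to be non-NULL.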
CUTE next solves the path constraint ⟨x0 > 0, p0 != NULL⟩, obtained by negating the last predicate, to drive the next execution along an alternative path. The solution that CUTE proposes is {p0 ↦ non-NULL, x0 ↦ 236}, which requires that CUTE make p point to an allocated cell that introduces two new components, p->v and p->next, to the reachable graph. Accordingly, CUTE randomly generates 634 for p->v and non-randomly generates NULL for p->next for the next execution. In the second execution, testme takes the then branch of the first and the second if and the else branch of the third if. For this execution, CUTE generates the path constraint ⟨x0 > 0, p0 != NULL, 2 · x0 + 1 != v0⟩, where p0, v0, n0, and x0 are the symbolic values of p, p->v, p->next, and x, respectively. Note that CUTE computes the expression
Generation based on concolic execution:
• Koushik Sen, Darko Marinov, Gul Agha. CUTE: a concolic unit testing engine for C. ESEC/SIGSOFT FSE 2005: 263-272
• Patrice Godefroid, Nils Klarlund, Koushik Sen. DART: directed automated random testing. PLDI 2005: 213-223
• Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler. EXE: Automatically Generating Inputs of Death. ACM Trans. Inf. Syst. Secur. 12(2) (2008)

Generation based on code mutants:
• T. A. Budd, R. J. Lipton, R. A. DeMillo, F. G. Sayward. Theoretical and empirical studies on using program mutation to test the functional correctness of programs. Proceedings of the 7th conference on Principles of Programming Languages, January 1980, 220-233.
• R. A. DeMillo, A. J. Offutt. Constraint-based automatic test data generation. IEEE Transactions on Software Engineering, 17 (9), 1991.
• Patrice Godefroid, Aditya V. Nori, Sriram K. Rajamani, SaiDeep Tetali. Compositional may-must program analysis: unleashing the power of alternation. POPL 2010: 43-56