Synthesis and Inductive Learning – Part 2
Sanjit A. Seshia
EECS Department, UC Berkeley
NSF ExCAPE Summer School, June 23-25, 2015
Acknowledgments to several Ph.D. students, postdoctoral researchers, and collaborators, and to the students of EECS 219C, Spring 2015, UC Berkeley
– 2 –
Questions of Interest for this Tutorial
How can inductive synthesis be used to solve other (non-synthesis) problems?
– Reducing a problem to synthesis
How does inductive synthesis compare with machine learning?
What are the common themes amongst various inductive synthesis efforts?
Given:
– Domain of examples D
– Concept class C
– Formal specification Φ ⊆ D
– Oracle O that can answer queries of type Q
Find, by only querying O, an f ∈ C that satisfies Φ
– 13 –
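The definition above suggests a simple learner-oracle loop. Below is a minimal sketch in Python; all names and the toy concept class are illustrative (my own, not from the slides). The learner enumerates candidates from a finite concept class, and the oracle answers equivalence queries with counterexamples.

```python
# Minimal sketch of the loop implied by the OGIS definition above.
# All names (ogis, make_candidate, ...) are illustrative.
def ogis(candidates, equivalence_query):
    """Enumerate concepts from a finite class C, keeping only candidates
    consistent with the counterexamples the oracle has returned so far."""
    examples = []                       # (x, correct label) counterexamples
    for f in candidates:
        if all(f(x) == label for x, label in examples):
            cex = equivalence_query(f)  # None means f satisfies the spec
            if cex is None:
                return f
            examples.append(cex)
    return None                         # concept class exhausted

# Toy instance: learn the predicate "x >= 6" over the domain D = {0,...,9}.
def make_candidate(t):
    return lambda x: x >= t

def equivalence_query(f):
    for x in range(10):
        if f(x) != (x >= 6):
            return (x, x >= 6)          # counterexample with its correct label
    return None

learned = ogis([make_candidate(t) for t in range(10)], equivalence_query)
```

A richer oracle would also support the other query types listed on the next slide; equivalence queries alone suffice for this toy class.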
Common Oracle Query Types
Queries from the LEARNER and responses from the ORACLE:
– Positive witness: → some x ∈ Φ, if one exists, else ⊥
– Negative witness: → some x ∉ Φ, if one exists, else ⊥
– Membership: Is x ∈ Φ? → Yes / No
– Equivalence: Is f = Φ? → Yes / No + counterexample x ∈ f ⊕ Φ
– Subsumption/Subset: Is f ⊆ Φ? → Yes / No + counterexample x ∈ f \ Φ
– Distinguishing input: given f and X ⊆ f, return f' s.t. f' ≠ f and X ⊆ f', if it exists; otherwise ⊥
– 14 –
Examples of OGIS
L* algorithm to learn DFAs:
– Membership + equivalence queries
CEGIS used in SKETCH/SyGuS solvers:
– (Positive) witness + equivalence/subsumption queries
CEGIS used in Reactive Model Predictive Control:
– Covered in Vasu Raman's lecture
Two different examples:
– Learning Programs from Distinguishing Inputs [Jha et al., ICSE 2010]
– Learning LTL Properties for Synthesis from Counterstrategies [Li et al., MEMOCODE 2011]
– 15 –
Reverse Engineering Malware by Program Synthesis
Obfuscated code (input: y; output: modified value of y):
a=1; b=0; z=1; c=0;
while (1) {
  if (a == 0) {
    if (b == 0) {
      y = z + y; a = ~a; b = ~b; c = ~c;
      if (~c) break;
    } else {
      z = z + y; a = ~a; b = ~b; c = ~c;
      if (~c) break;
    }
  } else if (b == 0) {
    z = y << 2; a = ~a;
  } else {
    z = y << 3; a = ~a; b = ~b;
  }
}
FROM CONFICKER WORM
What it does:
y = y * 45
We solve this using program synthesis.
Paper: S. Jha et al., “Oracle-Guided Component-Based Program Synthesis”, ICSE 2010.
– 16 –
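As a sanity check (my addition, not from the slides), a direct Python transcription of the snippet confirms the multiply-by-45 behavior, under the assumption that a, b, c are 1-bit flags, so that ~ toggles between 0 and 1 and "if (~c)" means "if c == 0":

```python
# Python transcription of the obfuscated snippet above, treating a, b, c
# as 1-bit flags: "~x" becomes a toggle (x ^ 1), "if (~c)" becomes "if c == 0".
def obfuscated(y):
    a, b, z, c = 1, 0, 1, 0
    while True:
        if a == 0:
            if b == 0:
                y = z + y; a ^= 1; b ^= 1; c ^= 1
                if c == 0:
                    break
            else:
                z = z + y; a ^= 1; b ^= 1; c ^= 1
                if c == 0:
                    break
        elif b == 0:
            z = y << 2; a ^= 1          # z = 4y
        else:
            z = y << 3; a ^= 1; b ^= 1  # z = 8 * (current y)
    return y

# Confirms the slide's claim: the snippet computes y * 45.
assert all(obfuscated(y) == 45 * y for y in range(100))
```

Tracing it by hand: z becomes 4y, then y becomes 5y, then z becomes 8·5y = 40y, and finally y becomes 40y + 5y = 45y, at which point the loop breaks.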
Class of Programs: "Loop-Free"
Programs implementing functions: I → O
P(I):
  O1 = f1(V1)
  O2 = f2(V2)
  …
  On = fn(Vn)
where f1, f2, …, fn are functions from a given component library.
Functions could be if-then-else definitions and hence, the above represents any loop-free code.
– 18 –
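For instance, the y*45 computation from the previous slide can be expressed in exactly this loop-free form. The particular decomposition below (45y = 40y + 5y, using shift and add components) is my own illustration, not the one produced by the tool:

```python
# A loop-free program in the form above: each Oi applies one library
# component (shift-by-2, shift-by-3, add) to previously computed values.
def loop_free_45(y):
    o1 = y << 2    # o1 = f1(y)       = 4y
    o2 = o1 + y    # o2 = f2(o1, y)   = 5y
    o3 = o2 << 3   # o3 = f3(o2)      = 40y
    o4 = o3 + o2   # o4 = f4(o3, o2)  = 45y
    return o4

assert all(loop_free_45(y) == 45 * y for y in range(50))
```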
Program Learning as Set Cover
Space of all possible programs; each dot represents a semantically unique program.
[Goldman & Kearns, 1995]
– 19 –
Program Learning as Set Cover
Each I/O example (i1, o1) rules out a set of programs: those inconsistent with it.
– 21 –
Program Learning as Set Cover
Label the examples (i1, o1), (i2, o2), …, (in, on) as E1, E2, …, En, where Ej is the set of incorrect programs ruled out by example (ij, oj).
Theorem [Goldman & Kearns, 1995]: The smallest set of I/O examples needed to learn the correct program is a minimum-size subset of {E1, E2, …, En} that covers all the incorrect programs.
– 22 –
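Minimum set cover is NP-hard, so even the offline version of this theorem would be approximated in practice. The sketch below (my own toy, not from the slides) selects examples with the standard greedy heuristic, given the sets Ej:

```python
# Offline illustration of the theorem: each example (ij, oj) rules out a
# set Ej of incorrect programs; greedily pick examples until every
# incorrect program is covered.
def greedy_example_selection(incorrect_programs, eliminated_by):
    """eliminated_by maps each example to the set Ej it rules out."""
    uncovered = set(incorrect_programs)
    chosen = []
    while uncovered:
        # pick the example whose Ej eliminates the most remaining programs
        best = max(eliminated_by, key=lambda e: len(eliminated_by[e] & uncovered))
        if not eliminated_by[best] & uncovered:
            raise ValueError("examples cannot cover all incorrect programs")
        chosen.append(best)
        uncovered -= eliminated_by[best]
    return chosen

# Toy instance: four incorrect programs, three candidate examples.
E = {
    ("i1", "o1"): {"p1", "p2"},
    ("i2", "o2"): {"p2", "p3"},
    ("i3", "o3"): {"p3", "p4"},
}
cover = greedy_example_selection({"p1", "p2", "p3", "p4"}, E)
```

On this instance the greedy choice needs only two of the three examples. The next slides explain why even this offline view is unrealistic: the sets Ej cannot be enumerated up front.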
Program Learning as Set Cover
Practical challenge: we can't enumerate all inputs and compute the set Ej for each.
– 23 –
Program Learning as Set Cover
ONLINE set cover: in each step, choose some (ij, oj) pair; the set Ej of incorrect programs it eliminates is only disclosed afterwards.
– 24 –
Program Learning as Set Cover
Our heuristic: ensure |Ej| ≥ 1, i.e., each chosen example identifies at least one incorrect program.
– 25 –
Approach: Learning based on Distinguishing Inputs
Space of all possible programs; each dot represents a semantically unique program.
[Jha et al., ICSE’10]
– 26 –
Space of all possible programs
Make (positive) Witness queryExample I/O set E := {(i1,o1)}
Learning from Distinguishing InputsLearning from Distinguishing Inputs
– 27 –
Learning from Distinguishing Inputs
The learner synthesizes a program P1 consistent with E using an SMT solver.
– 28 –
Learning from Distinguishing Inputs
A second SMT query searches for another program P2, also consistent with E, that differs semantically from P1.
– 29 –
Learning from Distinguishing Inputs
If such a P2 exists, the SMT solver also produces a distinguishing input i2 on which P1 and P2 disagree.
– 30 –
Learning from Distinguishing Inputs
Querying the oracle for the correct output on i2 yields a new example (i2, o2). This is implemented as a single distinguishing-input query: given P1, find P2 s.t. P1 ≠ P2, together with the distinguishing input (i2, o2).
– 31 –
Learning from Distinguishing Inputs
The example set grows: E := E ∪ {(i2, o2)}, and the process repeats.
– 32 –
Learning from Distinguishing Inputs
E := E ∪ {(in, on)}. Eventually no distinguishing input exists: a single semantically unique program consistent with E remains. But is it the correct program?
– 35 –
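The iteration above can be sketched as follows, with brute-force search over a small finite program space standing in for the two SMT queries; the toy programs and oracle are purely illustrative:

```python
# Sketch of the distinguishing-input learning loop. Brute-force search
# over a finite program space stands in for the SMT queries.
def learn_by_distinguishing_inputs(programs, inputs, io_oracle):
    E = []                                   # I/O examples gathered so far
    def consistent(p):
        return all(p(i) == o for i, o in E)
    while True:
        p1 = next((p for p in programs if consistent(p)), None)
        if p1 is None:
            return None                      # component library insufficient
        # Distinguishing-input query: is there another consistent program
        # that disagrees with p1 on some input?
        dist = None
        for p2 in programs:
            if consistent(p2):
                dist = next((i for i in inputs if p2(i) != p1(i)), None)
                if dist is not None:
                    break
        if dist is None:
            return p1                        # semantically unique w.r.t. E
        E.append((dist, io_oracle(dist)))    # ask the oracle for the output

# Toy instance: three candidate programs, oracle implements x -> 2x.
programs = [lambda x: x + x, lambda x: x * x, lambda x: 2 * x + (x == 2)]
learned = learn_by_distinguishing_inputs(programs, range(5), lambda x: 2 * x)
```

Note that the returned program is only guaranteed unique with respect to E, which is exactly the soundness caveat discussed on the next slide.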
Soundness
Is the library of components sufficient?
– YES: the learner returns a correct design.
– NO: either the I/O pairs show infeasibility (infeasibility is reported), or an incorrect design may be returned.
Soundness is thus conditional on the validity of the structure hypothesis.
Can put this learning approach within an outer CEGIS loop.
– 38 –
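The outer CEGIS loop mentioned on this slide can be sketched like this, under an assumed setup (mine, not from the slides) where a verifier checks a candidate on every input of a small finite domain and feeds counterexamples back as new I/O pairs:

```python
# Sketch of an outer CEGIS loop: a verifier checks the learned candidate
# against the full specification; any counterexample input becomes a new
# I/O example. Names and the toy instance are illustrative.
def cegis(programs, inputs, spec):
    E = []                                       # accumulated I/O examples
    while True:
        candidate = next(
            (p for p in programs if all(p(i) == o for i, o in E)), None)
        if candidate is None:
            return None                          # infeasible with this library
        cex = next((i for i in inputs if candidate(i) != spec(i)), None)
        if cex is None:
            return candidate                     # verified on every input
        E.append((cex, spec(cex)))               # counterexample-guided step

# Toy instance: synthesize x*3 over inputs 0..9 from three candidates.
programs = [lambda x: x + 1, lambda x: x * x, lambda x: x + x + x]
result = cegis(programs, range(10), lambda x: 3 * x)
```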
Example 2: Assumption Synthesis
Reactive Synthesis from LTL
39
The system requirements s and environment assumptions e are given to a synthesis tool, which either returns an FSM implementing the specification e → s (realizable) or reports "Unrealizable".
Unrealizability is often due to incomplete environment assumptions!
Example
40
Inputs: request r and cancel c
Outputs: grant g
System specification s:
- G (r → X F g)
- G (c ∨ g → X ¬g)
Environment assumption e:
- true
Is it realizable?
Not realizable, because the environment can force c to be high all the time.
Counter-strategy and counter-trace
41
A counter-strategy is a strategy for the environment to force violation of the specification.
A counter-trace is a fixed input sequence such that the specification is violated regardless of the outputs generated by the system.
System s:
- G (r → X F g)
- G (c ∨ g → X ¬g)
A counter-trace (the last value repeats forever):
r: 1 1 (1)
c: 1 1 (1)
CounterStrategy-Guided Environment Assumption Synthesis [Li et al., MEMOCODE 2011]
42
Start with the formal specification and run the synthesis tool. If the specification is realizable, we are done. If it is unrealizable, compute a counterstrategy and mine new assumptions from it, guided by specification templates and user scenarios; add them to the specification and repeat.
Instance of CEGIS.
Concept class C: all LTL formulas over I and O "constraining I"
Domain of examples D: all finite-state transducers with input O and output I (environments)
Formal specification Φ: the set of environment transducers for which there exists an implementation of Sys satisfying s
The oracle supports a subsumption query: does the current assumption rule out the environments outside Φ? If not, give one such environment as a counterexample.
In the block diagram, Sys has input I and output O; the environment Env closes the loop, with input O and output I.
Assumption Synthesis Algorithm
46
Generate a candidate φ (e.g. G F p) from specification templates (e.g. G F ?) and user scenarios, checking consistency with existing assumptions.
Invoke a model checker: does the counter-strategy E satisfy ¬φ, i.e., does φ rule E out?
– YES: accept φ; done when the specification becomes realizable.
– NO: eliminate φ.
VERSION SPACES:
– Retain candidates consistent with examples
– Traverse weakest to strongest
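One way to realize the candidate check for the G F ? template is sketched below. A simplifying assumption of mine: the counterstrategy is abstracted as an ultimately periodic "lasso" trace of input valuations, so only the loop part matters for G F.

```python
# Sketch of the candidate check for "G F literal" templates. A candidate
# is accepted iff the counterstrategy VIOLATES it, i.e. the assumption
# rules that environment behavior out.
def holds_GF(literal, loop):
    """Does 'G F literal' hold on a lasso trace with this loop part?"""
    var, positive = literal                  # e.g. ("c", False) means "not c"
    return any(step[var] == positive for step in loop)

def accepted(literal, loop):
    return not holds_GF(literal, loop)

# Counterstrategy from the running example: environment keeps c high forever.
loop = [{"r": True, "c": True}]              # one-state loop: r=1, c=1
assert accepted(("c", False), loop)          # G F (not c): violated, accepted
assert not accepted(("c", True), loop)       # G F c: satisfied, eliminated
```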
Version Space Learning
Candidate assumptions form a lattice ordered by logical strength, traversed from weakest (most general) to strongest (most specific), e.g.:
– true (WEAKEST, most general)
– G F p1, G F p2, …, G F pn
– G p1, G p2, …, G pn; G X p1, G X p2, …, G X pn; G (p1 → X p2), …
– G (p1 ∧ p2), G (p2 ∧ p3), …, G (pn-1 ∧ pn)
– false (STRONGEST, most specific)
Originally due to [Mitchell, 1978]
Example
48
System s:
- G (r → X F g)
- G (c ∨ g → X ¬g)
A counter-trace:
r: 1 1 (1)
c: 1 1 (1)
Test assumption candidates by checking their negations against the counter-trace:
- G F c: satisfied by the counter-trace, so eliminated
- G F ¬c: violated by the counter-trace, so accepted
Mined environment assumption e:
- G F ¬c
Theoretical Results
49
Theorem 1 [Completeness]: If there exist environment assumptions under our structure hypothesis that make the spec realizable, then the procedure finds them (terminates successfully). This is a "conditional completeness" guarantee.
Theorem 2 [Soundness]: The procedure never adds inconsistent environment assumptions.
– 50 –
Summary of Lecture 2
Differences between Formal Inductive Synthesis and Machine Learning
Common features across various inductive synthesis methods
Oracle-Guided Inductive Synthesis framework
Two instances:
– Learning Programs from Distinguishing Inputs: http://www.eecs.berkeley.edu/~sseshia/pubs/b2hd-jha-icse10.html
– Learning Assumptions from Counterstrategies