
Making Neural Programming Architectures Generalize via Recursion

ICLR 2017 Katy@Datalab

Background

• AGI: Artificial General Intelligence

Background

• Training neural networks to synthesize robust programs from a small number of examples is a challenging task.

• The space of possible programs is extremely large, and composing a program that performs robustly on the infinite space of possible inputs is difficult, because it is impractical to obtain enough training examples to disambiguate amongst all possible programs.

Motivation

• Curriculum training?

• The network still might not learn the true program semantics; as in NPI, generalization becomes poor beyond a threshold level of complexity.

Related Work

• Scott Reed and Nando de Freitas. Neural programmer-interpreters. ICLR, 2016.

NPI Model

• The neural network learns spurious dependencies on specific characteristics of the training examples that are irrelevant to the true program semantics, such as the length of the training inputs, and thus fails to generalize to more complex inputs.

Main Idea

• Explicitly incorporating recursion into neural architectures.

Why Recursion?

• Recursion divides the problem into smaller pieces and drastically reduces the domain of each neural network component, making it tractable to prove guarantees about the overall system’s behavior.

Why Recursion?

• By nature, recursion reduces the complexity of a problem to simpler instances.
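As a minimal illustration (ours, not from the slides), consider how a recursive formulation shrinks what each component must handle: every call only has to be correct on a base case and one reduction step, independent of the overall input size.

```python
# Minimal generic illustration (not from the paper): recursion reduces
# "sum a list of length n" to one step plus "sum a list of length n-1".
def rec_sum(xs):
    if not xs:                          # base case: empty list
        return 0
    return xs[0] + rec_sum(xs[1:])      # one step, then a strictly smaller instance

assert rec_sum([3, 1, 4, 1, 5]) == 14
```

Because only the base case and a single reduction step need to be checked, correct behavior on inputs of arbitrary length follows by induction; this is the kind of guarantee the paper argues recursion makes tractable for neural architectures.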

Model

• Using an NPI (Neural Programmer-Interpreter)-like model, except that a program can call itself.

• Let the model learn recursive programs

• Achieve perfect generalization
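A rough sketch (our illustration; `core_step`, `"ACT"`, and the argument handling are assumptions, not the authors' code) of an NPI-style execution loop in which the predicted next program may be the currently running program itself, i.e. a recursive call:

```python
# Hypothetical NPI-style inference loop that allows self-calls.
# `core_step` stands in for the learned core (e.g. an LSTM) that, given its
# state, the current program id, and an environment observation, predicts
# the next subprogram, its arguments, and whether to stop.
def run(program_id, env, core_step, max_steps=100):
    state = None                                    # fresh core state for this call
    for _ in range(max_steps):
        state, next_prog, args, stop = core_step(state, program_id, env.observe())
        if stop:                                    # end-of-program predicted
            return
        if next_prog == "ACT":                      # primitive action on the environment
            env.act(args)
        else:                                       # subprogram call; next_prog may
            run(next_prog, env, core_step)          # equal program_id (recursion)
```

Since each call, including a self-call, starts from a fresh core state, a single invocation's behavior does not depend on recursion depth, which keeps the set of behaviors that must be verified small.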

Partial (Tail) and Full Recursion

Experiment

Bubble sort on NPI
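As a point of reference, here is a plain-Python analogue of a recursive bubble sort decomposition (function names are ours, not the paper's NPI subprogram identifiers): one pass bubbles the largest element of the unsorted prefix into place, then a tail-recursive call handles the strictly smaller remainder.

```python
# Illustrative analogue of recursive bubble sort (names are ours, not the
# paper's NPI subprograms).
def bubble(a, n):
    # One pass over a[:n]: compare-swap adjacent pairs, pushing the max to a[n-1].
    for i in range(n - 1):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]

def bubblesort(a, n=None):
    if n is None:
        n = len(a)
    if n <= 1:                        # base case: a single element is sorted
        return a
    bubble(a, n)                      # one bubbling pass over the unsorted prefix
    return bubblesort(a, n - 1)       # tail-recursive call on a smaller instance

assert bubblesort([5, 2, 4, 1, 3]) == [1, 2, 3, 4, 5]
```

In the NPI setting, the same decomposition is expressed as execution traces in which the sorting program calls itself on the reduced problem instead of looping over the full input.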

Conclusion

• Simple idea

• Proven to achieve 100% generalization

• The trained model has learned the correct program semantics

• Recursion is very important for neural programming architectures

Future Work

• Reduce the amount of supervision:

• Train with only partial or non-recursive traces, and integrate a notion of recursion into the models themselves by constructing novel neural programming architectures.

Future Work

• Try the approach on the MNIST dataset?
