arXiv:1810.00873v5 [cs.LG] 11 Apr 2021
Compiling Stan to Generative Probabilistic Languages
and Extension to Deep Probabilistic Programming
Guillaume Baudart
INRIA Paris
École normale supérieure – PSL University
France
Javier Burroni
UMass Amherst
USA
Martin Hirzel
MIT-IBM Watson AI Lab, IBM Research
USA
Louis Mandel
MIT-IBM Watson AI Lab, IBM Research
USA
Avraham Shinnar
MIT-IBM Watson AI Lab, IBM Research
USA
Abstract
Stan is a probabilistic programming language that is popular
in the statistics community, with a high-level syntax for
expressing probabilistic models. Stan differs by nature from
generative probabilistic programming languages like Church,
Anglican, or Pyro. This paper presents a comprehensive com-
pilation scheme to compile any Stan model to a generative
language and proves its correctness. We use our compilation
scheme to build two new backends for the Stanc3 compiler
targeting Pyro and NumPyro. Experimental results show
that the NumPyro backend yields a 2.3x speedup compared
to Stan in geometric mean over 26 benchmarks. Building on
Pyro we extend Stan with support for explicit variational
inference guides and deep probabilistic models. That way,
users familiar with Stan get access to new features without
having to learn a fundamentally new language.
CCS Concepts: • Software and its engineering → Compilers; • Theory of computation → Probabilistic computation.
Keywords: Probabilistic programming, Semantics, Stan, Pyro
1 Introduction
Probabilistic Programming Languages (PPLs) are designed
to describe probabilistic models and run inference on them.
There exists a variety of PPLs. BUGS [21], JAGS [26], and
Stan [7] focus on efficiency, constraining what is expressible
to a subset of models which support fast inference techniques.
These languages enjoy broad adoption by the statistics and
social sciences communities [6, 10, 11]. Generative languages
like Church [12], Anglican [32], WebPPL [13], Pyro [3], and
Gen [9] describe generative models, i.e., stochastic procedures
that simulate the data generation process. Coming from a
core programming languages heritage, generative PPLs typically
support rich control constructs and models over structured
data. Generative PPLs are increasingly used in machine-learning
research and are rapidly incorporating new ideas,
such as Stochastic Gradient Variational Inference (SVI), in
what is now called Deep Probabilistic Programming [2, 3, 33].
While the semantics of probabilistic languages have been
extensively studied [14, 15, 18, 30], to the best of our knowl-
edge little is known about the relationship between Stan and
generative PPLs. We show that a simple 1:1 translation is
incorrect or incomplete for a set of subtle but widely-used
Stan features, such as left expressions or implicit priors.
This paper formalizes the relationship between Stan and
generative PPLs and introduces, with correctness proof, a
comprehensive compilation scheme that can compile any
Stan program to a generative PPL. This enables leverag-
ing the rich set of existing Stan models for testing, bench-
marking, or experimenting with new features or inference
techniques. Based on this compilation scheme we imple-
mented two new backends for the Stanc3 compiler target-
ing Pyro [3] and NumPyro [25], a JAX [5] based version of
Pyro. Both Pyro and NumPyro runtimes offer NUTS [16] (No
U-Turn Sampler), an optimized Hamiltonian Monte-Carlo
(HMC) algorithm that is the preferred inference method for
Stan. We can thus validate our approach against Stan. Re-
sults show that models compiled using our NumPyro back-
end yield equivalent results while being 2.3x faster than
their Stan counterpart in the geometric mean over 26 bench-
marks. Our compiler and runtime library are open-source at
https://github.com/deepppl.
In addition, recent probabilistic languages offer new features
to program and reason about complex models. Our
compilation scheme combined with conservative extensions
of Stan can be used to make these benefits available to Stan
users. As a proof of concept, based on our Pyro backend,
this paper introduces DeepStan: Stan extended with sup-
port for explicit variational guides and deep probabilistic
models. Variational inference was central in the design of
Pyro and programmers can easily craft their own inference
guides to run variational inference on probabilistic mod-
els. Pyro is built on top of PyTorch [24]. Programmers can
thus seamlessly import neural networks designed with the
state-of-the-art API provided by PyTorch.
This paper makes the following contributions:
• A comprehensive compilation scheme from Stan to a gen-
erative PPL (Section 2).
• Correctness proof of the compilation scheme (Section 3).
• An open-source implementation of two new backends for
Stanc3 targeting Pyro and NumPyro (Section 4).
• An extension of Stan with explicit variational inference
guides and deep probabilistic models (Section 5).
The fundamental new result of this paper is that every Stan
program can be expressed as a generative probabilistic pro-
gram. Besides advancing the understanding of probabilistic
programming languages at a fundamental level, this paper
aims to provide practical benefits to the communities of both
Stan and Pyro. From the perspective of the Stan community,
this paper presents a new competitive compiler backend and
additional capabilities while retaining familiar syntax and
semantics. This compiler can thus be used to migrate existing
Stan codebases to Pyro and NumPyro. From the perspective
of the Pyro community, this paper presents a new compiler
frontend that unlocks many existing real-world models as
examples and benchmarks.
This paper is a version with appendices presenting the
proofs and the evaluation results of the paper published at
PLDI 2021 [1].
2 Overview
This section shows how to compile Stan [7], which specifies a
joint probability distribution, to a generative PPL like Church,
Anglican, or Pyro. This translation also demonstrates that
Stan’s expressive power is at most as large as that of genera-
tive languages, a fact that was not clear before our paper.
As a running example, consider the biased coin model in
Figure 1. Stan’s data block defines observed variables for 𝑁
coin flips 𝑥𝑖, 𝑖 ∈ [1 : 𝑁], which can be 0 for tails or 1 for heads.
The parameters block introduces a latent variable 𝑧 ∈ [0, 1]
for the bias of the coin. The model block sets the prior of
the bias 𝑧 to Beta(1, 1), i.e., a uniform distribution over [0, 1].
data {
  int N;
  int<lower=0,upper=1> x[N];
}
parameters {
  real<lower=0,upper=1> z;
}
model {
  z ~ beta(1, 1);
  for (i in 1:N)
    x[i] ~ bernoulli(z);
}

[Graphical model: 𝑧 → 𝑥, with 𝑥 in a plate of size 𝑁; posterior 𝑝(𝑧 | 𝑥1, . . . , 𝑥𝑁)]
Figure 1. Biased coin model in Stan.
def model(N, x):
    z = sample(beta(1.,1.))
    for i in range(0, N):
        observe(bernoulli(z), x[i])
    return z

(a) Generative scheme

def model(N, x):
    z = sample(uniform(0.,1.))
    observe(beta(1.,1.), z)
    for i in range(0, N):
        observe(bernoulli(z), x[i])
    return z

(b) Comprehensive scheme
Figure 2. Compiled coin model of Figure 1.
The for loop indicates that coin flips 𝑥𝑖 are independent and
identically distributed (IID) and depend on 𝑧 via a Bernoulli
distribution. Given concrete observed coin flips, inference
yields a posterior distribution for 𝑧 conditioned on 𝑥1, . . . , 𝑥𝑁 .
2.1 Generative translation
Generative PPLs are general-purpose languages extended
with two probabilistic constructs [14, 30, 34]: sample(𝐷)
generates a sample from a distribution 𝐷 and factor(𝑣)
assigns a score 𝑣 to the current execution trace. Typically,
factor is used to condition the model on input data [32].
We also introduce observe(𝐷,𝑣) as a syntactic shortcut
for factor(𝐷pdf(𝑣)) where 𝐷pdf denotes the probability
density function of 𝐷 . This construct penalizes executions
according to the score of 𝑣 w.r.t. 𝐷 which captures the as-
sumption that the observed data 𝑣 follows the distribution 𝐷 .
Compilation. Stan uses the same syntax v ~ 𝐷 for both
observed and latent variables. The distinction comes from
the kind of the left-hand-side variable: observed variables are
declared in the data block, latent variables are declared in the
parameters block. A straightforward generative translation
compiles a statement v ~ 𝐷 into v = sample(𝐷) if v is
a parameter or observe(𝐷, v) if v is data. For example,
Figure 2a shows the compiled (using the generative scheme)
version of the Stan model of Figure 1 in Python syntax.
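The generative translation above can be exercised with a toy runtime. The following is a minimal sketch, not the runtime used by the paper: the `Trace`, `Beta`, and `Bernoulli` classes and the `posterior_mean` helper are hypothetical names introduced here, scores are kept in log-space (an assumption made for numerical convenience), and inference is plain likelihood weighting rather than NUTS.

```python
import math
import random


def _log(p):
    # log with the convention log(0) = -inf, so impossible events get weight 0.
    return math.log(p) if p > 0.0 else -math.inf


class Beta:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def draw(self, rng):
        return rng.betavariate(self.a, self.b)


class Bernoulli:
    def __init__(self, p):
        self.p = p

    def log_prob(self, x):
        return _log(self.p) if x == 1 else _log(1.0 - self.p)


class Trace:
    """Accumulates the log-score of one execution of a model."""

    def __init__(self, rng):
        self.rng = rng
        self.log_weight = 0.0

    def sample(self, dist):
        # sample(D): draw a value from D; the score is unchanged.
        return dist.draw(self.rng)

    def factor(self, log_score):
        # factor: add a (log) score to the current execution trace.
        self.log_weight += log_score

    def observe(self, dist, value):
        # observe(D, v): shortcut for factor with the density of D at v.
        self.factor(dist.log_prob(value))


def model(t, N, x):
    # Figure 2a, with the trace passed explicitly.
    z = t.sample(Beta(1.0, 1.0))
    for i in range(0, N):
        t.observe(Bernoulli(z), x[i])
    return z


def posterior_mean(model, N, x, runs=20000, seed=0):
    # Likelihood weighting: average z weighted by exp(log_weight).
    rng = random.Random(seed)
    total_w = total_wz = 0.0
    for _ in range(runs):
        t = Trace(rng)
        z = model(t, N, x)
        w = math.exp(t.log_weight)
        total_w += w
        total_wz += w * z
    return total_wz / total_w
```

With three heads and one tail, the exact posterior is Beta(4, 2), whose mean is 2/3, so `posterior_mean(model, 4, [1, 1, 0, 1])` should land close to that value.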
2.2 Non-generative features
In Stan, a model represents the unnormalized density of the
joint distribution of the parameters defined in the parameters
block given the data defined in the data block [7, 15]. A Stan
program can thus be viewed as a function from parameters
and data to the value of a special variable target that represents
the log-density of the model. A Stan model can be
described using classic imperative statements, plus two special
statements that modify the value of target. The first
one, target += 𝑒, increments the value of target by 𝑒. The
second one, e ~ D, is equivalent to target += 𝐷lpdf(e) [15]
where 𝐷lpdf denotes the log probability density function of 𝐷.
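To make the "function from parameters and data to target" reading concrete, the coin model of Figure 1 can be written by hand as a log-density function. This is an illustrative sketch only: `beta_lpdf`, `bernoulli_lpmf`, and `coin_target` are hypothetical helpers written for this example, not functions of Stan or of the compiler.

```python
import math


def beta_lpdf(z, a, b):
    # Log-density of Beta(a, b) at z in (0, 1).
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(z) + (b - 1.0) * math.log(1.0 - z) - log_norm


def bernoulli_lpmf(x, p):
    # Log-probability of the outcome x (0 or 1) under Bernoulli(p).
    return math.log(p) if x == 1 else math.log(1.0 - p)


def coin_target(z, N, x):
    """Parameters and data in, value of target out."""
    target = 0.0
    target += beta_lpdf(z, 1.0, 1.0)        # z ~ beta(1, 1);
    for i in range(N):                      # for (i in 1:N)
        target += bernoulli_lpmf(x[i], z)   #   x[i] ~ bernoulli(z);
    return target
```

Each `~` statement becomes one `target +=` update, so for `z = 0.75` and data `[1, 1, 0, 1]` the result is `3 * log(0.75) + log(0.25)`.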
Unfortunately, these constructs allow the definition of
models that cannot be translated using the generative trans-
lation defined above. Table 1 lists the Stan features that are
not handled correctly. A left expression is where the left-hand-side
of ~ is an arbitrary expression. Multiple updates occur
when the same parameter appears on the left-hand-side of
multiple ~ statements. An implicit prior occurs when there
is no explicit ~ statement in the model for a parameter.
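A small numerical check illustrates why multiple updates defeat the generative translation. The fragment below is hypothetical (it is not taken from Table 1): a parameter `theta` appears in two `~` statements, `theta ~ normal(0, 1)` and `theta ~ normal(1, 1)`. Under Stan's semantics both statements add to target, so the densities multiply; a naive translation into two `sample` calls would simply overwrite the first draw. The helper names are assumptions made for this sketch.

```python
import math


def normal_lpdf(y, mu, sigma):
    # Log-density of Normal(mu, sigma) at y.
    return (-0.5 * ((y - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def target_multiple_updates(theta):
    # Stan view of the hypothetical fragment
    #   theta ~ normal(0, 1);
    #   theta ~ normal(1, 1);
    target = 0.0
    target += normal_lpdf(theta, 0.0, 1.0)
    target += normal_lpdf(theta, 1.0, 1.0)
    return target


def target_last_sample_only(theta):
    # Density described by a naive translation into two sample statements:
    # the second draw overwrites the first, leaving only normal(1, 1).
    return normal_lpdf(theta, 1.0, 1.0)


def offset_from_product(log_density, theta):
    # Difference to the log-density of Normal(0.5, sqrt(0.5)), the normalized
    # product of the two Gaussians. A constant offset means "same distribution".
    return log_density(theta) - normal_lpdf(theta, 0.5, math.sqrt(0.5))
```

The offset of `target_multiple_updates` is the same at every `theta`, whereas the offset of `target_last_sample_only` varies, which shows that the two readings define different posteriors.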
The “%” column of Table 1 indicates the percentage of Stan
models that exercise each of the non-generative features
among the 531 valid files in https://github.com/stan-dev/example-models. The example column contains illus-
trative excerpts from such models. Since these are official
and long-standing examples, we assume that they use the
non-generative features on purpose. Comments in the source
code further corroborate that the programmer knowingly
used the features. While some features only occur in a mi-
nority of models, their prevalence is too high to ignore.
2.3 Comprehensive translation
The previous section illustrates that Stan is centered around
the definition of target, not around generating samples for
parameters, which is required by generative PPLs. The com-
prehensive translation adds an initialization step to generate
samples for all the parameters and compiles all Stan ~ state-
ments as observations. Parameter initialization draws from
the uniform distribution in their definition domain. For the
biased coin example, the result of this translation is shown in
Figure 2b: The parameter z is first sampled uniformly on its
definition domain and then conditioned with an observation.
The compilation column of Table 1 illustrates the transla-
tion of non-generative features. Left expression and multiple
updates are simply compiled into observations. Parameter
initialization uses the uniform distribution over its definition
domain. For unbounded domains, we introduce new distri-
butions (e.g., improper_uniform) with a constant density
that can be normalized away during inference. Section 3.3
details the complete compilation scheme.
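The comprehensive scheme of Figure 2b can be run directly in a few lines. The sketch below is an assumption-laden illustration rather than the compiler's output: scores are accumulated in log-space, the helper names are hypothetical, and likelihood weighting stands in for the inference engines of Pyro and NumPyro.

```python
import math
import random


def _xlog(k, v):
    # k * log(v) with the conventions 0 * log(0) = 0 and log(0) = -inf.
    if k == 0.0:
        return 0.0
    return k * math.log(v) if v > 0.0 else -math.inf * k


def beta_lpdf(z, a, b):
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return _xlog(a - 1.0, z) + _xlog(b - 1.0, 1.0 - z) - log_norm


def bernoulli_lpmf(x, p):
    return _xlog(1.0, p) if x == 1 else _xlog(1.0, 1.0 - p)


def comprehensive_model(rng, N, x):
    """One execution of Figure 2b; returns (z, accumulated log-score)."""
    log_weight = 0.0
    z = rng.random()                           # z = sample(uniform(0., 1.))
    log_weight += beta_lpdf(z, 1.0, 1.0)       # observe(beta(1., 1.), z)
    for i in range(0, N):
        log_weight += bernoulli_lpmf(x[i], z)  # observe(bernoulli(z), x[i])
    return z, log_weight


def posterior_mean(N, x, runs=20000, seed=0):
    # Weighted average of z, with weights exp(log_weight).
    rng = random.Random(seed)
    total_w = total_wz = 0.0
    for _ in range(runs):
        z, lw = comprehensive_model(rng, N, x)
        w = math.exp(lw)
        total_w += w
        total_wz += w * z
    return total_wz / total_w
```

The only sampling step is the uniform initialization; every `~` statement of the Stan source shows up as a score update, so the accumulated log-score of an execution equals the value of target for the sampled `z`.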
Intuition of correctness. The semantics of Stan as de-
scribed in [15] is a classic imperative semantics. Its envi-
ronment includes the special variable target, the unnor-
malized log-density of the model. On the other hand, the
semantics of a generative PPL as described in [30] defines a
kernel mapping an environment to a measurable function.
Our compilation scheme adds uniform initializations for all
parameters which comes down to the Lebesgue measure
on the parameters space, and translates all ~ statements to
observe statements. We can then show that a succession
of observe statements yields a distribution with the same
log-density as the Stan semantics. Section 3.4 details the
correctness proof.
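For the biased coin of Figure 1, this intuition can be spelled out as a worked equation. The notation below (indicator function, pdf and pmf subscripts) is chosen for this illustration and is not taken from the paper's formal development.

```latex
\begin{align*}
p(z \mid x_1, \ldots, x_N)
  &\propto \underbrace{\mathbb{1}_{[0,1]}(z)}_{\text{uniform initialization}}
     \times \underbrace{\mathrm{Beta}_{\mathrm{pdf}}(z;\, 1, 1)}_{\texttt{observe(beta(1.,1.), z)}}
     \times \prod_{i=1}^{N} \underbrace{\mathrm{Bernoulli}_{\mathrm{pmf}}(x_i;\, z)}_{\texttt{observe(bernoulli(z), x[i])}} \\
  &= \exp(\mathtt{target}) \quad \text{for } z \in [0, 1].
\end{align*}
```

The uniform initialization contributes a constant density on the definition domain, so the product of the observe scores is exactly the exponential of the target accumulated by the Stan program.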
Implementation. The comprehensive compilation scheme
can compile any Stan program to a generative PPL. Section 4
discusses the implementation of two new backends for the
Stanc3 compiler targeting Pyro [3] – a PPL in the line of
WebPPL [13] – and NumPyro – a JAX [5] based version of
Pyro. Section 6 experimentally validates that our backends
can compile most existing Stan models. Results also show
that models compiled using our NumPyro backend outper-
form their Stan counterpart on existing benchmarks.
Extensions. Pyro is a deep universal probabilistic programming
language with native support for variational inference.
Building on Pyro, we use our compiler to extend Stan with
support for explicit variational guides (Section 5.1) and deep
neural networks to capture complex relations between pa-
rameters (Section 5.2).
3 Semantics and Compilation
This section, building on previous work, first formally defines
the semantics of Stan (Section 3.1) and the semantics of
GProb, a small generative probabilistic language (Section 3.2).
It then defines the compilation function from Stan to GProb
(Section 3.3) and proves its correctness (Section 3.4).
3.1 Stan: a Declarative Probabilistic Language
The Stan language is informally described in [7]. A Stan program
is a sequence of blocks which in order: declares functions,
declares input names and types, pre-processes input
data, declares the parameters to infer, defines transformations
on parameters, defines the model, and post-processes
the parameters. The only mandatory block is model. Variables
declared in a block are visible in subsequent blocks.
In the following, an environment 𝛾 : Var → Val is a
mapping from variables to values, 𝛾(𝑥) returns the value of
the variable 𝑥 in an environment 𝛾, 𝛾[𝑥 ← 𝑣] returns the
environment 𝛾 where the value of 𝑥 is set to 𝑣, and 𝛾1, 𝛾2
denotes the union of two environments.
The notation $\int_X \mu(dx)\, f(x)$ is the integral of $f$ w.r.t. the
measure $\mu$ where $x \in X$ is the integration variable. When $\mu$
is the Lebesgue measure we also write $\int_X f(x)\, dx$.
Following [15], we define the semantics of the model block
as a deterministic function that takes an initial environment
containing the input data and the parameters, and returns
an updated environment where the value of the variable
target is the un-normalized log-density of the model.
We can then define the semantics of a Stan program as
a kernel [18, 30, 31], that is, a function {[𝑝]} : D → Σ𝑋 → [0, ∞]
where Σ𝑋 denotes the 𝜎-algebra of the parameter domain
𝑋, that is, the set of measurable sets of the product
space of parameter values. Given an environment 𝐷 containing
the input data, {[𝑝]}𝐷 is a measure that maps a measurable
set of parameter values 𝑈 to a score in [0, ∞] obtained by
integrating the density of the model, exp(target), over all
the possible parameter values in 𝑈.

$$\{[p]\}_D = \lambda U.\ \int_U \exp\big(\llbracket \mathrm{model}(p) \rrbracket_{D[\mathrm{params}(p) \leftarrow \theta]}(\mathtt{target})\big)\, d\theta$$
Given the input data, the posterior distribution of a Stan program
is obtained by normalizing the measure {[𝑝]}𝐷. As the
integrals are often intractable, the runtime uses approximate
inference schemes to compute the posterior distribution.
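For a one-parameter model the kernel can be approximated by direct numerical integration, which makes the roles of the measure and of its normalization visible. The sketch below assumes the biased coin of Figure 1 and intervals 𝑈 = [lo, hi] inside [0, 1]; `coin_target`, `measure`, and `posterior_prob` are hypothetical names, and the midpoint rule is used only because it is simple.

```python
import math


def coin_target(z, x):
    # target for the coin model; beta(1, 1) contributes 0 on (0, 1).
    return sum(math.log(z) if xi == 1 else math.log(1.0 - z) for xi in x)


def measure(x, lo, hi, steps=10000):
    """Midpoint-rule approximation of the integral of exp(target) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h   # midpoints never touch 0 or 1
        total += math.exp(coin_target(z, x))
    return total * h


def posterior_prob(x, lo, hi):
    # Normalizing the measure by its total mass gives the posterior probability.
    return measure(x, lo, hi) / measure(x, 0.0, 1.0)
```

For the data `[1, 1, 0, 1]` the unnormalized density is z³(1 − z), whose integral over [0, 1] is 1/20 and over [0, 0.5] is 3/320, so the posterior probability of a bias at most 0.5 is 0.1875.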
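$$
\begin{aligned}
\llbracket \mathtt{for}\ (x\ \mathtt{in}\ e_1{:}e_2)\ \{s\} \rrbracket_\gamma &= \text{let } n_1 = \llbracket e_1 \rrbracket_\gamma \text{ in let } n_2 = \llbracket e_2 \rrbracket_\gamma \text{ in} \\
&\phantom{=}\ \text{if } n_1 > n_2 \text{ then } \gamma \text{ else } \llbracket \mathtt{for}\ (x\ \mathtt{in}\ n_1 + 1{:}n_2)\ \{s\} \rrbracket_{\llbracket s \rrbracket_{\gamma[x \leftarrow n_1]}} \\
\llbracket \mathtt{while}\ (e)\ \{s\} \rrbracket_\gamma &= \text{if } \llbracket e \rrbracket_\gamma = 0 \text{ then } \gamma \text{ else } \llbracket \mathtt{while}\ (e)\ \{s\} \rrbracket_{\llbracket s \rrbracket_\gamma} \\
\llbracket \mathtt{if}\ (e)\ s_1\ \mathtt{else}\ s_2 \rrbracket_\gamma &= \text{if } \llbracket e \rrbracket_\gamma \neq 0 \text{ then } \llbracket s_1 \rrbracket_\gamma \text{ else } \llbracket s_2 \rrbracket_\gamma \\
\llbracket \mathtt{skip} \rrbracket_\gamma &= \gamma \\
\llbracket \mathtt{target}\ \mathtt{+=}\ e \rrbracket_\gamma &= \gamma[\mathtt{target} \leftarrow \gamma(\mathtt{target}) + \llbracket e \rrbracket_\gamma] \\
\llbracket e_1 \sim e_2 \rrbracket_\gamma &= \text{let } D = \llbracket e_2 \rrbracket_\gamma \text{ in } \llbracket \mathtt{target}\ \mathtt{+=}\ D_{\mathrm{lpdf}}(e_1) \rrbracket_\gamma
\end{aligned}
$$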
Figure 3. Semantics of statements
We now detail the semantics of statements and expres-
sions in a model block. This formalization is similar to the
semantics proposed in [15] but expressed denotationally.
Statements. The semantics of a statement ⟦𝑠⟧ : (Var → Val) → (Var → Val) is a function from an environment 𝛾 to
an updated environment. Figure 3 gives the semantics of Stan
statements. The initial environment contains the input data,
the parameters, and the reserved variable target initialized
to 0. An assignment updates the value of a variable or of a cell
of an indexed structure in the environment. A sequence 𝑠1; 𝑠2
evaluates 𝑠2 in the environment produced by 𝑠1. A for loop
on ranges first evaluates the value of the bounds 𝑛1 and 𝑛2
and then repeats the execution of the body 1 + 𝑛2 − 𝑛1 times.
Iterations over indexed structures (for (𝑥 in 𝑒) {𝑠}) are
syntactic sugar over loops on ranges. The behavior depends
on the underlying type. For vectors and arrays, iteration is
limited to one dimension.
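$$
\begin{aligned}
\llbracket \mathtt{for}\ (x\ \mathtt{in}\ e)\ \{s\} \rrbracket_\gamma &= \text{let } v = \llbracket e \rrbracket_\gamma \text{ in} \quad (i \text{ is a fresh variable}) \\
&\phantom{=}\ \llbracket \mathtt{for}\ (i\ \mathtt{in}\ 1{:}\mathtt{length}(v))\ \{x = v[i];\ s\} \rrbracket_\gamma
\end{aligned}
$$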
For matrices, iteration is over the two dimensions:
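$$
\begin{aligned}
\llbracket \mathtt{for}\ (x\ \mathtt{in}\ e)\ \{s\} \rrbracket_\gamma &= \text{let } v = \llbracket e \rrbracket_\gamma \text{ in} \quad (i \text{ and } j \text{ are fresh variables}) \\
&\phantom{=}\ \llbracket \mathtt{for}\ (i\ \mathtt{in}\ 1{:}\mathtt{length}(v))\ \mathtt{for}\ (j\ \mathtt{in}\ 1{:}\mathtt{length}(v[i]))\ \{x = v[i][j];\ s\} \rrbracket_\gamma
\end{aligned}
$$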
A while loop repeats the execution of its body while the
condition is not 0. An if statement executes one of the two
branches depending on the value of the condition. A skip
leaves the environment unchanged. A statement target += 𝑒
adds the value of 𝑒 to target in the environment. Finally, a
statement 𝑒1 ~ 𝑒2 evaluates the expression 𝑒2 into a probabil-
ity distribution 𝐷 and updates the target with the value of
the log-density of 𝐷 at 𝑒1.
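The statement semantics described above reads almost directly as an interpreter. The following sketch encodes statements as tuples and expressions as Python functions from an environment to a value; this encoding, the `run` function, and the `bernoulli` and `beta_1_1` helpers are assumptions made for the example, not part of the paper's formalization.

```python
import math


def run(stmt, env):
    """Return the environment produced by executing stmt in env."""
    kind = stmt[0]
    if kind == "skip":
        return env
    if kind == "assign":                    # ("assign", x, e)
        _, x, e = stmt
        return {**env, x: e(env)}
    if kind == "seq":                       # ("seq", s1, s2)
        _, s1, s2 = stmt
        return run(s2, run(s1, env))
    if kind == "for":                       # ("for", x, e1, e2, s)
        _, x, e1, e2, s = stmt
        n1, n2 = e1(env), e2(env)           # bounds are evaluated once
        for n in range(n1, n2 + 1):         # 1 + n2 - n1 iterations
            env = run(s, {**env, x: n})
        return env
    if kind == "while":                     # ("while", e, s)
        _, e, s = stmt
        while e(env) != 0:
            env = run(s, env)
        return env
    if kind == "if":                        # ("if", e, s1, s2)
        _, e, s1, s2 = stmt
        return run(s1 if e(env) != 0 else s2, env)
    if kind == "target+=":                  # ("target+=", e)
        _, e = stmt
        return {**env, "target": env["target"] + e(env)}
    if kind == "~":                         # ("~", e1, e2); e2 yields an lpdf
        _, e1, e2 = stmt
        lpdf = e2(env)
        return run(("target+=", lambda g: lpdf(e1(g))), env)
    raise ValueError(f"unknown statement: {kind}")


def bernoulli(p):
    # Log-probability function of Bernoulli(p).
    return lambda v: math.log(p) if v == 1 else math.log(1.0 - p)


def beta_1_1(_env):
    # Log-density of beta(1, 1), which is 0 everywhere on (0, 1).
    return lambda z: 0.0


# Model block of Figure 1; x is indexed with i - 1 because Python lists
# are 0-based while Stan arrays are 1-based.
coin = ("seq",
        ("~", lambda g: g["z"], beta_1_1),
        ("for", "i", lambda g: 1, lambda g: g["N"],
         ("~", lambda g: g["x"][g["i"] - 1], lambda g: bernoulli(g["z"]))))

env = run(coin, {"N": 4, "x": [1, 1, 0, 1], "z": 0.75, "target": 0.0})
```

After the run, `env["target"]` holds `3 * log(0.75) + log(0.25)`, the un-normalized log-density of the coin model at `z = 0.75`.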
Expressions. The semantics of an expression ⟦𝑒⟧ : (Var → Val) → Val is a function from an environment to values. Figure 4
gives the semantics of Stan expressions. Constants
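$$
\begin{aligned}
\llbracket c \rrbracket_\gamma &= c & \llbracket \{e_1, \ldots, e_n\} \rrbracket_\gamma &= \{\llbracket e_1 \rrbracket_\gamma, \ldots, \llbracket e_n \rrbracket_\gamma\} \\
\llbracket x \rrbracket_\gamma &= \gamma(x) & \llbracket [e_1, \ldots, e_n] \rrbracket_\gamma &= [\llbracket e_1 \rrbracket_\gamma, \ldots, \llbracket e_n \rrbracket_\gamma] \\
\llbracket e_1[e_2] \rrbracket_\gamma &= \llbracket e_1 \rrbracket_\gamma[\llbracket e_2 \rrbracket_\gamma] & \llbracket f(e) \rrbracket_\gamma &= f(\llbracket e \rrbracket_\gamma)
\end{aligned}
$$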
Figure 4. Semantics of expressions
evaluate to themselves. Variables are looked up in the envi-
ronment. Arrays, vectors, and matrix expressions evaluate
all their components. Indexing expressions obtain the corre-
sponding value in the associated data. Function calls apply
the function to the value of the arguments. Functions are
built-ins like + or normal (user-defined functions are inlined).
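The expression rules can be sketched as a small recursive evaluator. The tuple encoding of expressions, the `evaluate` function, and the `BUILTINS` table are hypothetical and exist only for this illustration; indexing subtracts one because Stan containers are 1-based while Python lists are 0-based.

```python
import operator


def evaluate(expr, env, builtins):
    """Evaluate a tuple-encoded expression in the environment env."""
    kind = expr[0]
    if kind == "const":                 # ("const", c): constants evaluate to themselves
        return expr[1]
    if kind == "var":                   # ("var", x): look the variable up
        return env[expr[1]]
    if kind == "array":                 # ("array", e1, ..., en): evaluate all components
        return [evaluate(e, env, builtins) for e in expr[1:]]
    if kind == "index":                 # ("index", e1, e2): 1-based, as in Stan
        container = evaluate(expr[1], env, builtins)
        position = evaluate(expr[2], env, builtins)
        return container[position - 1]
    if kind == "call":                  # ("call", f, e1, ..., en): apply a built-in
        args = [evaluate(e, env, builtins) for e in expr[2:]]
        return builtins[expr[1]](*args)
    raise ValueError(f"unknown expression: {kind}")


BUILTINS = {"+": operator.add, "*": operator.mul}

# x[2] + 10 in an environment where x = {5, 7, 9}
expr = ("call", "+", ("index", ("var", "x"), ("const", 2)), ("const", 10))
result = evaluate(expr, {"x": [5, 7, 9]}, BUILTINS)
```

Here `result` is 17, since `x[2]` selects the second element under 1-based indexing.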
Limitations. We consider only terminating programs
which means in particular that all loops perform a bounded
number of iterations. We also limit the access and update of
target to the statements target += 𝑒 and 𝑒1 ~ 𝑒2.
Assumption 1. All programs terminate.
Assumption 2. Expressions cannot depend on target.
3.2 GProb: a Simple Generative PPL
To formalize the compilation, we first define the target language:
GProb, a simple generative probabilistic language
similar to the one defined in [30]. GProb is an expression
The proof is done by induction on the structure of stmt
and the finite number of loop iterations (Assumption 1). The
hypothesis 𝛾(target) = 0 simplifies the induction by removing
the need to keep an accumulator of the value of target. Resetting
the value of target in the environment 𝛾′[target ← 0] for
the evaluation of the continuation 𝑘 is thus necessary for
the inductive step. The proof is given in Appendix B.
Correctness. We now have all the elements to prove that
the comprehensive compilation is correct. That is, generated
code yields the same un-normalized measure up to a constant
factor that will be normalized away by the inference.
Theorem 3.3. For all Stan programs 𝑝, the semantics of the
source and compiled programs are equal up to a constant:

$$\{[p]\}_D \propto \{[\mathcal{C}(p)]\}_D$$
Proof. The proof is a direct consequence of Lemmas 3.1
and 3.2 and the definition of the two semantics. With stmt =
model(𝑝) and P = params(𝑝):
Proof. The proof is done by induction on the structure of stmt
and the number of reductions using the definition of the compilation
function (Section 3.3) and the semantics of GProb.
Assignment. Evaluating 𝑥 = 𝑒 does not update target
and its initial value is 0 by hypothesis. With 𝛾′ = ⟦𝑥 = 𝑒⟧𝛾 we
have 𝛾′[target ← 0] = 𝛾′ and exp(𝛾′(target)) = 1. Then
from GProb’s semantics we have:
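\begin{align*}
\{[\mathcal{C}_k(x = e)]\}_\gamma
&= \{[\mathtt{let}\ x = \mathtt{return}(e)\ \mathtt{in}\ k]\}_\gamma && \text{by definition of } \mathcal{C}_k(\cdot) \\
&= \lambda U.\ \int_X \{[\mathtt{return}(e)]\}_\gamma(dv) \times \{[k]\}_{\gamma[x \leftarrow v]}(U) && \text{by definition of the semantics} \\
&= \lambda U.\ \int_X \delta_{\llbracket e \rrbracket_\gamma}(dv) \times \{[k]\}_{\gamma[x \leftarrow v]}(U) && \text{by definition of the semantics} \\
&= \lambda U.\ 1 \times \{[k]\}_{\gamma[x \leftarrow \llbracket e \rrbracket_\gamma]}(U) && \text{by the integration of the } \delta \text{ distribution} \\
&= \lambda U.\ 1 \times \{[k]\}_{\llbracket x = e \rrbracket_\gamma}(U) && \text{by the semantics of } \llbracket x = e \rrbracket_\gamma \\
&= \lambda U.\ \exp(\gamma'(\mathtt{target})) \times \{[k]\}_{\gamma'[\mathtt{target} \leftarrow 0]}(U) && \text{by definition of } \gamma'
\end{align*}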
Target update. Since the evaluation of target += 𝑒 only
updates the value of target and its initial value is 0, with
𝛾′ = ⟦target += 𝑒⟧𝛾 we have 𝛾 = 𝛾′[target ← 0], and
𝛾′(target) = ⟦𝑒⟧𝛾. Then from GProb’s semantics we have:
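\begin{align*}
\{[\mathcal{C}_k(\mathtt{target}\ \mathtt{+=}\ e)]\}_\gamma
&= \{[\mathtt{let}\ () = \mathtt{factor}(e)\ \mathtt{in}\ k]\}_\gamma && \text{by definition of } \mathcal{C}_k(\cdot) \\
&= \lambda U.\ \int_{()} \exp(\llbracket e \rrbracket_\gamma)\, \delta_{()}(dv) \times \{[k]\}_\gamma(U) && \text{by definition of the semantics} \\
&= \lambda U.\ \exp(\llbracket e \rrbracket_\gamma) \times \{[k]\}_\gamma(U) && \text{by the integration of the } \delta \text{ distribution} \\
&= \lambda U.\ \exp(\llbracket \mathtt{target}\ \mathtt{+=}\ e \rrbracket_\gamma(\mathtt{target})) \times \{[k]\}_\gamma(U) && \text{by the semantics of } \llbracket \mathtt{target}\ \mathtt{+=}\ e \rrbracket_\gamma \\
&= \lambda U.\ \exp(\gamma'(\mathtt{target})) \times \{[k]\}_{\gamma'[\mathtt{target} \leftarrow 0]}(U) && \text{by definition of } \gamma'
\end{align*}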
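Sequence. If $\gamma_1 = \llbracket stmt_1 \rrbracket_\gamma$ and $\gamma_2 = \llbracket stmt_2 \rrbracket_{\gamma_1[\mathtt{target} \leftarrow 0]}$,
the induction hypothesis and the semantics of GProb yield:

\begin{align*}
\{[\mathcal{C}_k(stmt_1;\ stmt_2)]\}_\gamma
&= \{[\mathcal{C}_{\mathcal{C}_k(stmt_2)}(stmt_1)]\}_\gamma \\
&= \lambda U.\ \exp(\gamma_1(\mathtt{target})) \times \{[\mathcal{C}_k(stmt_2)]\}_{\gamma_1[\mathtt{target} \leftarrow 0]}(U) && \text{by induction on } stmt_1 \text{ and definition of } \gamma_1 \\
&= \lambda U.\ \exp(\gamma_1(\mathtt{target})) \times \exp(\gamma_2(\mathtt{target})) \times \{[k]\}_{\gamma_2[\mathtt{target} \leftarrow 0]}(U) && \text{by induction on } stmt_2 \text{ and definition of } \gamma_2
\end{align*}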