Page 1
Trustworthy, Useful Languages for
Probabilistic Modeling and Inference
Neil Toronto
Dissertation Defense
Brigham Young University
2014/06/11
Page 2
Master’s Research: Super-Resolution
Toronto et al. Super-Resolution via Recapture and Bayesian Effect Modeling. CVPR 2009
Page 3
Master’s Research: Super-Resolution
• Model and query: Half a page of beautiful math
Page 4
Master’s Research: Super-Resolution
• Query implementation: 600 lines of Python
Page 8
Main Results: Super-Resolution
• Competitor and BEI on 4x super-resolution:
[Figure: output of Resolution Synthesis beside output of Bayesian Edge Inference]
• Beat state-of-the-art on “objective” measures
• Was capable of other reconstruction tasks with few changes
Page 12
Only Mostly Satisfying
Problem 1: Still not sure the program is right
Problem 2: smooth edges instead of discontinuous
“To approximate blurring with a spatially varying point-spread function (PSF), we assign each facet a Gaussian PSF and convolve each analytically before combining outputs.”
i.e. “We can’t model it correctly so here’s a hack.”
Page 14
Solution Idea: Probabilistic Language
• Also somehow let me model correctly
Page 24
Prior Work
Defined by an implementation:
• Designed for Bayesian practice
• Mimic human translation
• Can’t tell error from feature
• Limited: usually no recursion or loops; conditions
Defined by a semantics (i.e. mathematically):
• Designed for functional programmers or FP theorists
• May not be implemented
• Behavior is well-defined
• Limited: usually finite distributions, no conditioning
Best of all worlds: define language using functional programming theory, make it for Bayesians, and remove limitations
Page 29
Thesis Statement
Functional programming theory and measure-theoretic probability provide a solid foundation
for trustworthy, useful languages for constructive probabilistic modeling and inference.
• Useful: let you think abstractly and handle details for you
• Trustworthy: defined mathematically
• Functional programming theory has the tools to define programming languages mathematically
• Measure-theoretic probability is the most complete account of probability; should allow shedding common limitations
Page 33
Simple Example Process
• Example process: Normal-Normal
• Intuition: Sample , then sample using
• Density model :
• Compute query by integrating:
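As a hedged illustration of "compute query by integrating", the sketch below assumes the Normal-Normal process is X ~ Normal(0, 1) followed by Y ~ Normal(X, 1). The slide's own parameters and symbols were not recovered, so these, along with the function names, are assumptions made for illustration only. It approximates a probability query about Y by numerically integrating the joint density.

```python
import math
from statistics import NormalDist

def joint_density(x, y):
    """Assumed density model: p(x, y) = Normal(x; 0, 1) * Normal(y; x, 1)."""
    return NormalDist(0.0, 1.0).pdf(x) * NormalDist(x, 1.0).pdf(y)

def prob_y_between(lo, hi, n=200, x_limit=8.0):
    """Approximate P[lo < Y < hi] by a midpoint Riemann sum over x and y."""
    dx = 2.0 * x_limit / n
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = -x_limit + (i + 0.5) * dx
        for j in range(n):
            y = lo + (j + 0.5) * dy
            total += joint_density(x, y)
    return total * dx * dy

approx = prob_y_between(-1.0, 1.0)

# Under the assumed parameters, Y is marginally Normal(0, sqrt(2)),
# which gives a closed form to compare the integral against.
marginal_y = NormalDist(0.0, math.sqrt(2.0))
exact = marginal_y.cdf(1.0) - marginal_y.cdf(-1.0)
```

Even for this two-variable model the query is a double integral, which is the cost the later slides try to avoid.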
Page 39
Conditional Queries
• Compute query using Bayes’ law:
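A worked instance of Bayes' law for densities may help here. It assumes the Normal-Normal parameters X ~ Normal(0, 1) and Y given X = x distributed as Normal(x, 1); the slide's own formulas were not recovered, so the symbols below are illustrative rather than the author's.

```latex
p(x \mid y)
  = \frac{p(y \mid x)\, p(x)}{\int_{-\infty}^{\infty} p(y \mid x')\, p(x')\, dx'}
  \;\propto\; \exp\!\left(-\tfrac{1}{2}(y - x)^2\right) \exp\!\left(-\tfrac{1}{2}x^2\right)
  = \exp\!\left(-\left(x - \tfrac{y}{2}\right)^2 - \tfrac{y^2}{4}\right)
```

Completing the square shows that, under these assumptions, X given Y = y is Normal with mean y/2 and variance 1/2.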
Page 46
So What Can’t Densities Model?
• Tons of useful things that are easy to write down
Distributions given non-axial, zero-probability conditions
Discontinuous change of variable (e.g. a thermometer)
Distributions of variable-dimension random variables
Nontrivial distributions on infinite products
• Tricks to get around limitations aren’t general enough
Page 50
Measure-Theoretic Probability
• Main ideas:
Don’t assign probability-like quantities to values, assign probabilities to sets; the probability query is king
Confine assumed randomness to one place by making random variables deterministic functions that observe a random source
• Measure-theoretic model of example process:
Page 54
Measure-Theoretic Queries
• Specific query:
• Generalized:
• Conditional query: if then
Can we avoid densities when ?
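The formulas on this slide were lost in extraction. For orientation, the standard measure-theoretic forms of a probability query and a positive-probability conditional query are sketched below. The symbols (random source Ω with probability measure P, random variables X and Y, sets A and B) are assumed names and may differ from the slide's.

```latex
\mathrm{P}[Y \in B] = P\left(Y^{-1}(B)\right)
                    = P\left(\{\omega \in \Omega : Y(\omega) \in B\}\right),
\qquad
\mathrm{P}[X \in A \mid Y \in B]
  = \frac{P\left(X^{-1}(A) \cap Y^{-1}(B)\right)}{P\left(Y^{-1}(B)\right)}
  \quad \text{if } P\left(Y^{-1}(B)\right) > 0 .
```

The quotient is undefined when the condition has probability zero, which is the case the closing question on the slide asks about.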
Page 61
Zero-Probability Conditions (Axial)
Page 66
Zero-Probability Conditions (Circular)
Page 71
Contribution: Don’t Integrate, Compute Backwards (1)
• Integration is hard!
• But random variables and are an abstraction boundary hiding and , so we can choose convenient ones
A uniform random source model:
where is the Normal CDF
• Stretches instead of integrates
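A minimal sketch of a uniform random source model follows. It assumes the example process is X ~ Normal(0, 1), Y ~ Normal(X, 1), and that the random source is a point in the unit square; the names `X`, `Y`, `sample_omega`, and `STD_NORMAL` are hypothetical. Each random variable is a deterministic function that stretches a uniform coordinate through the inverse Normal CDF.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # its inv_cdf is the inverse of the Normal CDF

def sample_omega(rng):
    """Draw a point from the uniform random source on the open unit square."""
    def open_unit():
        u = rng.random()
        while u == 0.0:  # inv_cdf is undefined at 0, so redraw
            u = rng.random()
        return u
    return (open_unit(), open_unit())

def X(omega):
    """Deterministic function of the random source (assumed: X ~ Normal(0, 1))."""
    return STD_NORMAL.inv_cdf(omega[0])

def Y(omega):
    """Deterministic function of the random source (assumed: Y ~ Normal(X, 1))."""
    return X(omega) + STD_NORMAL.inv_cdf(omega[1])

rng = random.Random(0)
ys = [Y(sample_omega(rng)) for _ in range(20000)]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
```

Under these assumptions the sample mean and variance of Y should be close to 0 and 2. All randomness lives in `sample_omega`, and `X` and `Y` are pure functions.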
Page 76
Contribution: Don’t Integrate, Compute Backwards (2)
• Generalized query:
i.e. output distributions are defined by preimages
• For a uniform random source model,
Compute probabilities by computing preimage areas
Compute conditional probabilities as quotients of preimage areas
• Is this really more feasible than integrating?
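The two "compute" bullets above can be sketched as follows, again assuming X ~ Normal(0, 1), Y ~ Normal(X, 1) expressed over a uniform random source on the unit square. The helper `preimage_area` is a hypothetical name, and the midpoint grid is a stand-in for an exact area computation, not the method the talk goes on to develop.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def X(w0, w1):
    return STD_NORMAL.inv_cdf(w0)

def Y(w0, w1):
    return STD_NORMAL.inv_cdf(w0) + STD_NORMAL.inv_cdf(w1)

def preimage_area(event, n=200):
    """Approximate the area of {omega in (0,1)^2 : event(omega)} on an n-by-n midpoint grid."""
    hits = 0
    for i in range(n):
        w0 = (i + 0.5) / n
        for j in range(n):
            w1 = (j + 0.5) / n
            if event(w0, w1):
                hits += 1
    return hits / (n * n)

# Probability as a preimage area: P[Y > 0].
p_y_pos = preimage_area(lambda w0, w1: Y(w0, w1) > 0.0)

# Conditional probability as a quotient of preimage areas: P[X > 0 | Y > 0].
p_both = preimage_area(lambda w0, w1: X(w0, w1) > 0.0 and Y(w0, w1) > 0.0)
p_x_pos_given_y_pos = p_both / p_y_pos
```

For these assumed parameters the exact answers are 1/2 and 3/4, since the preimage of Y > 0 is the triangle above the anti-diagonal of the unit square and the joint preimage is its right half. The grid values land close to both.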
Page 81
Queries Using Preimages
[Figure panels: Uniform Random Source; Original Model]
Page 87
Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from a random source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
Proof of correctness w.r.t. standard interpretation
• Completely infeasible! But...
Page 89
What About Approximating?
Conservative approximation with rectangles:
Page 101
What About Approximating?
Restricting preimages to rectangular subdomains:
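One way to picture a conservative rectangular approximation is the sketch below. It assumes the example program Y = F⁻¹(ω0) + F⁻¹(ω1) over the unit square, with F the Normal CDF, and uses the fact that Y is increasing in both coordinates to bound Y over a rectangle. The splitting strategy and all names are illustrative assumptions, not the algorithm from the dissertation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def inv_cdf(p):
    """Inverse Normal CDF extended to the closed interval [0, 1]."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return STD_NORMAL.inv_cdf(p)

def bounds_of_y(rect):
    """Lower and upper bounds of Y over a rectangle (Y is increasing in each coordinate)."""
    (a0, b0), (a1, b1) = rect
    return inv_cdf(a0) + inv_cdf(a1), inv_cdf(b0) + inv_cdf(b1)

def split(rect):
    """Halve a rectangle along its longer side."""
    (a0, b0), (a1, b1) = rect
    if b0 - a0 >= b1 - a1:
        m = (a0 + b0) / 2.0
        return [((a0, m), (a1, b1)), ((m, b0), (a1, b1))]
    m = (a1 + b1) / 2.0
    return [((a0, b0), (a1, m)), ((a0, b0), (m, b1))]

def cover_preimage(lo, hi, depth):
    """Return rectangles whose union contains {omega : lo < Y(omega) < hi}."""
    kept = []
    work = [(((0.0, 1.0), (0.0, 1.0)), depth)]
    while work:
        rect, d = work.pop()
        y_lo, y_hi = bounds_of_y(rect)
        if y_hi <= lo or y_lo >= hi:
            continue  # bounds miss the target, so the rectangle holds no preimage points
        if d == 0 or (lo < y_lo and y_hi < hi):
            kept.append(rect)  # out of splitting budget, or entirely inside the preimage
        else:
            work.extend((half, d - 1) for half in split(rect))
    return kept

def total_area(rects):
    return sum((b0 - a0) * (b1 - a1) for (a0, b0), (a1, b1) in rects)

coarse = total_area(cover_preimage(-1.0, 1.0, 8))
fine = total_area(cover_preimage(-1.0, 1.0, 14))
```

The cover never loses preimage points, so its area can only overestimate the probability; allowing more splits tightens the overestimate.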
Page 103
What About Approximating?
Sampling: exponential to quadratic (e.g. days to minutes)
Page 109
Crazy Idea is Actually Feasible If...
• Standard interpretation of programs as pure functions from a random source
• Efficient way to compute approximate preimage subsets
• Efficient representation of approximating sets
• Efficient way to sample uniformly in preimage sets
Efficient domain partition sampling
Efficient way to determine whether a domain sample is actually in the preimage (just use standard interpretation)
• Proof of correctness w.r.t. standard interpretation
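The sampling requirements above can be sketched as rejection sampling inside a rectangular cover. The example assumes the program X = F⁻¹(ω0), Y = X + F⁻¹(ω1) over the unit square and a fixed grid partition; the function names and the grid are illustrative assumptions, not the partition sampler from the dissertation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def inv_cdf(p):
    """Inverse Normal CDF extended to the closed interval [0, 1]."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return STD_NORMAL.inv_cdf(p)

def run_forward(omega):
    """Standard interpretation of the assumed program: returns (X, Y)."""
    x = inv_cdf(omega[0])
    return x, x + inv_cdf(omega[1])

def covering_cells(lo, hi, n):
    """Cells of an n-by-n grid whose bounds on Y overlap the target interval (lo, hi)."""
    cells = []
    for i in range(n):
        for j in range(n):
            a0, b0, a1, b1 = i / n, (i + 1) / n, j / n, (j + 1) / n
            y_lo = inv_cdf(a0) + inv_cdf(a1)
            y_hi = inv_cdf(b0) + inv_cdf(b1)
            if y_hi > lo and y_lo < hi:
                cells.append((a0, b0, a1, b1))
    return cells

def sample_given_y_between(lo, hi, count, rng, n=64):
    """Draw (X, Y) pairs conditioned on lo < Y < hi."""
    cells = covering_cells(lo, hi, n)
    samples = []
    while len(samples) < count:
        # Cells have equal area, so a uniform choice of cell is area-proportional.
        a0, b0, a1, b1 = rng.choice(cells)
        omega = (rng.uniform(a0, b0), rng.uniform(a1, b1))
        x, y = run_forward(omega)
        if lo < y < hi:  # membership test: just run the program forward
            samples.append((x, y))
    return samples

rng = random.Random(1)
samples = sample_given_y_between(0.9, 1.1, 3000, rng)
mean_x = sum(x for x, _ in samples) / len(samples)
```

Accepted points are uniform in the preimage, so the resulting (X, Y) pairs follow the conditional distribution. Under the assumed parameters E[X | Y = y] = y/2, so `mean_x` should be near 0.5.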
Page 113
Standard Interpretation
• Grammar:
• Semantic function
• Math has no general recursion, so (i.e. interpretation of program ) is a λ-calculus term
• Easy implementation in any language with lambdas
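To illustrate "easy implementation in any language with lambdas", here is a small sketch of a semantic function that maps terms to pure functions of a random source. The slide's grammar was not recovered, so the term tags (`const`, `random`, `add`, `pair`, `normal_inv`) and the name `meaning` are hypothetical.

```python
from statistics import NormalDist

def meaning(term):
    """Map a term (a nested tuple) to a pure function from a random source omega."""
    tag = term[0]
    if tag == "const":
        c = term[1]
        return lambda omega: c
    if tag == "random":
        i = term[1]
        return lambda omega: omega[i]  # observe coordinate i of the random source
    if tag == "add":
        f, g = meaning(term[1]), meaning(term[2])
        return lambda omega: f(omega) + g(omega)
    if tag == "pair":
        f, g = meaning(term[1]), meaning(term[2])
        return lambda omega: (f(omega), g(omega))
    if tag == "normal_inv":
        f = meaning(term[1])
        return lambda omega: NormalDist(0.0, 1.0).inv_cdf(f(omega))
    raise ValueError(f"unknown term: {term!r}")

# The assumed Normal-Normal example as a program returning the pair (X, Y).
x_term = ("normal_inv", ("random", 0))
y_term = ("add", x_term, ("normal_inv", ("random", 1)))
program = ("pair", x_term, y_term)

run = meaning(program)
result = run((0.5, 0.5))
```

Each case builds its result only from the meanings of its immediate subterms, which is the compositional style that allows proofs by structural induction.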
Page 120
Compositional Semantics
• Compositional: every term’s meaning depends only on its immediate subterms’ meanings
• Advantage: proofs about all programs by structural induction
• Example: meaning of
• Nonexample:
• Can preimages be computed compositionally?
Page 125
Pair Preimages
Page 127
Nonstandard Interpretation: Computing Preimages
• Preimage computation:
Page 133
Nonstandard Interpretation: Preimages Under Pairing
• Pairing types:
Theorem (correctness under pairing). If
computes preimages under
computes preimages under
then computes preimages under .
Proof sketch. Preimages distribute over Cartesian products.
• Similar theorems for every kind of term
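The proof sketch can be checked on small finite sets. The following Python sketch is illustrative only: `preimage`, `pair`, and the functions `f` and `g` are hypothetical names chosen here, not part of the talk's calculus. It checks that the preimage of a product A × B under the pairing x ↦ (f(x), g(x)) equals the intersection of the preimage of A under f with the preimage of B under g.

```python
from itertools import product

def preimage(f, domain, target):
    """Return the set of points in `domain` that `f` maps into `target`."""
    return {x for x in domain if f(x) in target}

def pair(f, g):
    """Pairing of two functions: x -> (f(x), g(x))."""
    return lambda x: (f(x), g(x))

domain = set(range(-5, 6))
f = lambda x: x * x        # hypothetical first component
g = lambda x: x % 3        # hypothetical second component

A = {0, 1, 4}
B = {0, 1}
rectangle = set(product(A, B))   # the Cartesian product A x B

# Preimage of the product under the pairing ...
lhs = preimage(pair(f, g), domain, rectangle)
# ... versus the intersection of the component preimages.
rhs = preimage(f, domain, A) & preimage(g, domain, B)
print(sorted(lhs), lhs == rhs)
```

On finite sets the equality is exact; the theorem on the slide is about the uncountable case, which this check does not cover.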
Page 134
Nonstandard Interpretation: Correctness
Theorem. For all programs , computes preimages under.
Proof. By structural induction on program terms.
Page 141
Wait a Minute
• Q. Don’t the interpretations of do uncountable things?
A. Yes. Yes, they do.
• Q. Where do I get a computer that runs them?
A. Nowhere, but we’ll approximate them soon.
• Q. Why interpret programs as uncomputable functions, then?
A. So we know exactly what to approximate.
• Q. Where did you get a λ-calculus that could operate on arbitrary, possibly infinite sets, anyway?
A. Well...
Page 147
Lambda-ZFC
λ-calculus
+
Infinite sets and operations
=
λZFC
• Contemporary math, but with lambdas and general recursion; or functional programming, but with infinite sets
• Can express uncountably infinite operations, can’t solve its own halting problem
• Can use contemporary mathematical theorems directly
Page 152
Rectangular Approximation
• A rectangle is
An interval or union of intervals
for rectangles and
• Easy representation; easy intersection and join (union-like) operation, empty test, other operations
• Recall:
• Define:
• Derive
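The rectangle operations listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the talk's implementation: a rectangle is taken to be a product of single closed intervals (no unions of intervals), `None` stands for the empty set, and the names `intersect`, `join`, and `is_empty` are chosen here.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]          # closed interval [lo, hi]
Rect = Tuple[Interval, ...]             # product of intervals, one per axis

def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Axis-wise intersection; None stands for the empty set."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)

def join(a: Optional[Rect], b: Optional[Rect]) -> Optional[Rect]:
    """Smallest rectangle containing both arguments (union-like; may overapproximate)."""
    if a is None:
        return b
    if b is None:
        return a
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))

def is_empty(r: Optional[Rect]) -> bool:
    """Empty test for the None-as-empty representation."""
    return r is None

a = ((0.0, 2.0), (0.0, 1.0))
b = ((1.0, 3.0), (0.5, 4.0))
c = ((5.0, 6.0), (0.0, 1.0))
print(intersect(a, b))
print(join(a, b))
print(is_empty(intersect(a, c)))
```

The join is the reason the approximation is conservative: the union of two rectangles is generally not a rectangle, so the smallest enclosing rectangle is returned instead.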
Page 155
In Theory...
Theorem (sound). computes overapproximations of the preimages computed by .
• Consequence: Sampling within preimages doesn’t leave anything out
Theorem (monotone). is monotone.
• Consequence: Partitioning the domain never increases approximate preimages
Theorem (decreasing). never returns preimages larger than the given subdomain.
• Consequence: Refining preimage partitions never explodes
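The three properties can be observed on one concrete primitive. The Python sketch below is an assumption-laden illustration, not the talk's calculus: `refine_sum` is a hypothetical rectangular preimage approximation for (x, y) ↦ x + y, and soundness is only spot-checked against randomly drawn points rather than proved.

```python
import random

def refine_sum(rect, target):
    """Approximate preimage of `target` under (x, y) -> x + y, restricted to `rect`.

    Returns ((x_lo, x_hi), (y_lo, y_hi)), or None for the empty set.
    """
    (x_lo, x_hi), (y_lo, y_hi) = rect
    t_lo, t_hi = target
    # x must lie in target - y for some y in the rectangle, and vice versa.
    nx_lo, nx_hi = max(x_lo, t_lo - y_hi), min(x_hi, t_hi - y_lo)
    ny_lo, ny_hi = max(y_lo, t_lo - x_hi), min(y_hi, t_hi - x_lo)
    if nx_lo > nx_hi or ny_lo > ny_hi:
        return None
    return ((nx_lo, nx_hi), (ny_lo, ny_hi))

def contains(rect, point):
    """True if `point` lies in the (possibly empty) rectangle."""
    return rect is not None and all(lo <= v <= hi for (lo, hi), v in zip(rect, point))

def subset(inner, outer):
    """True if rectangle `inner` is contained in rectangle `outer`."""
    if inner is None:
        return True
    if outer is None:
        return False
    return all(olo <= ilo and ihi <= ohi
               for (ilo, ihi), (olo, ohi) in zip(inner, outer))

rng = random.Random(0)
domain = ((0.0, 1.0), (0.0, 1.0))
target = (0.9, 1.1)
approx = refine_sum(domain, target)

# Sound: every true preimage point found by brute force lies in the approximation.
points = [(rng.random(), rng.random()) for _ in range(10_000)]
true_points = [p for p in points if target[0] <= p[0] + p[1] <= target[1]]
sound = all(contains(approx, p) for p in true_points)

# Monotone: a smaller subdomain yields a smaller (or equal) approximation.
half = ((0.0, 0.5), (0.0, 1.0))
monotone = subset(refine_sum(half, target), approx)

# Decreasing: the result never leaves the given subdomain.
decreasing = subset(refine_sum(half, target), half)
print(approx, sound, monotone, decreasing)
```

On the whole unit square the approximation is the square itself, which is sound but loose; on the left half it tightens the y-range, which is why partitioning pays off.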
Page 156
In Practice...
Theorems prove this always works:
Page 165
Importance Sampling
• Alternative to arbitrarily low-rate rejection sampling:
First, refine using preimage computation
Second, randomly choose from arbitrarily fine partition
Third, refine again
Fourth, sample uniformly
Do process “in the limit”; i.e. choose :
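The refine, choose, refine, sample steps can be sketched for a toy problem. Everything below is an assumption made for illustration, not the talk's algorithm: the prior is uniform on the unit square, the condition is x + y ∈ [0.99, 1.01], the partition is a fixed-depth binary split, and `refine`, `split`, and `sample` are hypothetical names. Each draw carries the importance weight prior density / proposal density, which for this prior is the leaf's area divided by the probability of the choices that led to it.

```python
import random

def refine(rect, target):
    """Overapproximate the part of `rect` where x + y lies in `target` (None if empty)."""
    (x_lo, x_hi), (y_lo, y_hi) = rect
    t_lo, t_hi = target
    x_lo, x_hi = max(x_lo, t_lo - y_hi), min(x_hi, t_hi - y_lo)
    y_lo, y_hi = max(y_lo, t_lo - x_hi), min(y_hi, t_hi - x_lo)
    if x_lo > x_hi or y_lo > y_hi:
        return None
    return ((x_lo, x_hi), (y_lo, y_hi))

def split(rect):
    """Halve the rectangle along its longest axis."""
    (x_lo, x_hi), (y_lo, y_hi) = rect
    if x_hi - x_lo >= y_hi - y_lo:
        mid = (x_lo + x_hi) / 2
        return [((x_lo, mid), (y_lo, y_hi)), ((mid, x_hi), (y_lo, y_hi))]
    mid = (y_lo + y_hi) / 2
    return [((x_lo, x_hi), (y_lo, mid)), ((x_lo, x_hi), (mid, y_hi))]

def area(rect):
    (x_lo, x_hi), (y_lo, y_hi) = rect
    return (x_hi - x_lo) * (y_hi - y_lo)

def sample(rng, target, depth=8):
    """Return (point, weight); a weight of 0 means the draw missed the condition."""
    rect = refine(((0.0, 1.0), (0.0, 1.0)), target)
    if rect is None:
        return None, 0.0
    q = 1.0                      # probability of the partition choices made so far
    for _ in range(depth):
        # Split, refine each half, and drop halves whose approximation is empty.
        parts = [r for r in (refine(h, target) for h in split(rect)) if r is not None]
        if not parts:
            return None, 0.0
        rect = rng.choice(parts)
        q /= len(parts)
    (x_lo, x_hi), (y_lo, y_hi) = rect
    point = (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
    inside = target[0] <= point[0] + point[1] <= target[1]
    # Prior density is 1; proposal density at the point is q / area(rect).
    return point, (area(rect) / q if inside else 0.0)

rng = random.Random(1)
target = (0.99, 1.01)
draws = [sample(rng, target) for _ in range(20_000)]
total = sum(w for _, w in draws)
mean_x = sum(p[0] * w for p, w in draws if w > 0) / total
accept_rate = sum(1 for _, w in draws if w > 0) / len(draws)
print(round(mean_x, 3), round(accept_rate, 3))
```

By symmetry the conditional mean of x is 0.5, which gives a sanity check on the weights. Plain rejection from the prior would accept roughly 2% of draws here (the band's area is about 0.02), and that rate shrinks further as the band gets thinner.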
Page 173
What About Recursion?
• General recursion, programs that halt with probability 1; e.g.
(define/drbayes (geometric p)
  (if (bernoulli p)
      0
      (+ 1 (geometric p))))
• Consider programs as being fully inlined (thus infinite):
(if (bernoulli p)
    0
    (+ 1 (if (bernoulli p)
             0
             (+ 1 (if (bernoulli p)
                      0
                      (+ 1 ...))))))
• Random domain needs to be big enough and the right shape
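A direct Python transliteration of `geometric` shows what "halts with probability 1" means in practice. This is a sketch under one assumption: `(bernoulli p)` is taken to return true with probability p, modeled here by `rng.random() < p`. No fixed number of random draws is enough for every run, which is why the random domain has to be unbounded.

```python
import random

def geometric(p, rng):
    """Count failures before the first success; mirrors the recursive slide definition."""
    if rng.random() < p:          # plays the role of (bernoulli p)
        return 0
    return 1 + geometric(p, rng)  # each recursive call consumes a fresh random value

rng = random.Random(42)
p = 0.25
samples = [geometric(p, rng) for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean, (1 - p) / p)          # empirical mean vs. the theoretical mean (1 - p) / p
```

The recursion depth is unbounded in principle but geometrically unlikely to be large; with p = 0.25 it stays far below Python's default recursion limit.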
Page 177
Program Domain Values
• Values are infinite binary trees:
• Every expression in a program is assigned a node
• Implemented using lazy trees of random values
• No probability density for domain, but there is a measure
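A lazy tree of random values can be sketched as follows. The class `LazyTree` and the node assignment are assumptions made for illustration (the branch test reads the current node, the recursive call is given the right subtree); the talk's actual indexing scheme is not shown on the slides.

```python
import random

class LazyTree:
    """Infinite binary tree of uniform random values, created on demand."""

    def __init__(self, rng):
        self._rng = rng
        self._value = None
        self._left = None
        self._right = None

    @property
    def value(self):
        # Draw the value the first time it is read, then remember it.
        if self._value is None:
            self._value = self._rng.random()
        return self._value

    @property
    def left(self):
        if self._left is None:
            self._left = LazyTree(self._rng)
        return self._left

    @property
    def right(self):
        if self._right is None:
            self._right = LazyTree(self._rng)
        return self._right

def geometric(p, node):
    """Each call reads its own node; the recursive call is assigned the right subtree."""
    if node.value < p:
        return 0
    return 1 + geometric(p, node.right)

def count_forced(node):
    """Number of tree nodes that have been created so far."""
    if node is None:
        return 0
    return 1 + count_forced(node._left) + count_forced(node._right)

tree = LazyTree(random.Random(7))
first = geometric(0.5, tree)
second = geometric(0.5, tree)     # same tree value, so the same program output
print(first, second, count_forced(tree))
```

Because the tree is a fixed (if lazily revealed) value, the program is a deterministic function of it, and only the finitely many nodes a run touches are ever created.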
Page 181
Demo: Normal-Normal With Circular Condition
• Normal-Normal process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e
  (let* ([x (normal 0 1)]
         [y (normal x 1)])
    (list x y (sqrt (+ (sqr x) (sqr y))))))
• Goal: Sample in the preimage of
(set-list reals reals (interval (- 1 ε) (+ 1 ε)))
Page 182
Demo: Normal-Normal With Circular Condition
For ε = 0.01:
[Figure: Preimage rectangles]
Page 185
Demo: Normal-Normal With Circular Condition
For ε = 0.01:
[Figures: Preimage samples; Output samples]
• Works fine with much smaller ε
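For comparison, here is the plain rejection-sampling baseline for the same condition, written in Python. This is not how the demo computes its samples; it is a sketch that forward-samples the process and discards draws outside the annulus, assuming the second argument of `normal` is a standard deviation.

```python
import math
import random

def prior_sample(rng):
    """Forward-sample the Normal-Normal process from the slide."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(x, 1.0)
    return x, y, math.sqrt(x * x + y * y)

def rejection_sample(rng, eps, n):
    """Keep only the prior draws whose radius lies in [1 - eps, 1 + eps]."""
    kept = [s for s in (prior_sample(rng) for _ in range(n))
            if 1.0 - eps <= s[2] <= 1.0 + eps]
    return kept, len(kept) / n

rng = random.Random(0)
kept, rate = rejection_sample(rng, eps=0.01, n=200_000)
print(len(kept), rate)
```

The acceptance rate is proportional to ε, so it can be made arbitrarily low by tightening the condition; this is the failure mode that sampling inside approximate preimages is meant to avoid.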
Page 189
Demo: Thermometer
• Normal-Normal thermometer process:
• Objective: Find the distribution of
• Implementation:
(define/drbayes e
  (let* ([x (normal 90 10)]
         [y (normal x 1)])
    (list x (if (> y 100) 100 y))))
• Goal: Sample in the preimage of
(set-list reals (interval 100 100))
Page 190
Demo: Thermometer
[Figure: Preimage rectangles]
Page 193
Demo: Thermometer
[Figures: Preimage samples; Density of]
Calculated from samples: mean 105.1, stddev 4.6
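The reported mean and standard deviation can be cross-checked with a plain forward simulation in Python. This is a sketch under two assumptions, neither confirmed by the slides: the second argument of `normal` is a standard deviation, and the quantity summarized is the true temperature x given that the thermometer reads exactly 100 (that is, the reading saturated).

```python
import math
import random

def prior_sample(rng):
    """Forward-sample the thermometer process: true temperature x, clipped reading."""
    x = rng.gauss(90.0, 10.0)
    y = rng.gauss(x, 1.0)
    return x, (100.0 if y > 100.0 else y)

rng = random.Random(0)
draws = [prior_sample(rng) for _ in range(400_000)]
# Condition on the reading being exactly 100, i.e. the thermometer saturated.
xs = [x for x, reading in draws if reading == 100.0]
mean = sum(xs) / len(xs)
stddev = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))
print(round(mean, 1), round(stddev, 1))
```

Rejection works here only because the condition has positive probability (about 16% of draws saturate); the circular condition with small ε does not have that luxury.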
Page 194
Demo: Stochastic Ray Tracing
• Idea: Model light transmission and reflection, condition on paths that pass through aperture
Page 197
Demo: Stochastic Ray Tracing
• Part of the implementation (totals ~50 lines):
(define/drbayes (ray-plane-intersect p0 v n d)
  (let ([denom (- (vec-dot v n))])
    (if (positive? denom)
        (let ([t (/ (+ d (vec-dot p0 n)) denom)])
          (if (positive? t)
              (collision t (vec+ p0 (vec-scale v t)) n)
              #f))
        #f)))
• Constrained light path outputs:
[Figures: Paths Through Aperture; Projected and Accumulated]
Page 199
Other Inference Tasks
• Typical
Hierarchical models
Bayesian regression
Model selection
• Atypical
Programs that halt with probability < 1, or never halt
Probabilistic program verification (sample in preimage of error condition)
Page 202
Thesis Statement
Functional programming theory and measure-theoretic probability provide a solid foundation
for trustworthy, useful languages for constructive probabilistic modeling and inference.
True.
• Was it falsifiable?
Page 207
Measurability
• Only measurable sets can have probabilities
• Computing preimages under must preserve measurability; we say itself is measurable
Theorem (measurability). For all programs , is measurable, regardless of errors or nontermination, if language primitives are measurable.
• Primitives include uncomputable operations like limits
• Applies to all probabilistic programming languages
Page 209
What I Did
The core calculus for this:
Page 212
Future Work
• Expressiveness
Lambdas and macros
Exceptions, parameters (or continuations and marks)
• Optimization
Direct implementation is in depth; cut to
Incremental computation
Adaptive sampling algorithms
Static analysis
• Branching out: investigate preimage computation connection with type systems and predicate transformer semantics