Inference 1 Wikipedia


Contents

1 Adverse inference
  1.1 References

2 Arbitrary inference
  2.1 See also
  2.2 References

3 Biological network inference
  3.1 Biological networks
    3.1.1 Transcriptional regulatory networks
    3.1.2 Signal transduction
    3.1.3 Metabolic
    3.1.4 Protein-protein interaction
  3.2 See also
  3.3 References

4 Constraint inference
  4.1 See also
  4.2 References

5 Correspondent inference theory
  5.1 Attributing intention
  5.2 Non-common effects
  5.3 Low social desirability
  5.4 Expectancies
  5.5 Choice
  5.6 See also
  5.7 References
  5.8 External links

6 Deep inference
  6.1 Notes
  6.2 Further reading

7 Dictum de omni et nullo
  7.1 See also
  7.2 References
  7.3 Notes
  7.4 External links

8 Downward entailing
  8.1 Strawson-DE
  8.2 See also
  8.3 References

9 Grammar induction
  9.1 Grammar Classes
  9.2 Learning Models
  9.3 Methodologies
    9.3.1 Grammatical inference by trial-and-error
    9.3.2 Grammatical inference by genetic algorithms
    9.3.3 Grammatical inference by greedy algorithms
    9.3.4 Distributional Learning
    9.3.5 Learning of Pattern languages
    9.3.6 Pattern theory
  9.4 Applications
  9.5 See also
  9.6 Notes
  9.7 References

10 Implicature
  10.1 Types of implicature
    10.1.1 Conversational implicature
    10.1.2 Conventional implicature
  10.2 Implicature vs entailment
  10.3 See also
  10.4 References
  10.5 Bibliography
  10.6 Further reading
  10.7 External links

11 Inductive functional programming
  11.1 See also

12 Inductive probability
  12.1 History
    12.1.1 Minimum description/message length
    12.1.2 Inference based on program complexity
    12.1.3 Universal artificial intelligence
  12.2 Probability
    12.2.1 Comparison to deductive probability
    12.2.2 Probability as estimation
    12.2.3 Combining probability approaches
  12.3 Probability and information
    12.3.1 Combining information
    12.3.2 The internal language of information
  12.4 Probability and frequency
    12.4.1 Conditional probability
    12.4.2 The frequentist approach applied to possible worlds
    12.4.3 The law of total probability
    12.4.4 Alternate possibilities
    12.4.5 Negation
    12.4.6 Implication and conditional probability
  12.5 Bayesian hypothesis testing
    12.5.1 Set of hypotheses
  12.6 Boolean inductive inference
    12.6.1 Generalization and specialization
    12.6.2 Newton’s use of induction
    12.6.3 Probabilities for inductive inference
  12.7 Derivations
    12.7.1 Derivation of inductive probability
    12.7.2 A model for inductive inference
  12.8 Key people
  12.9 See also
  12.10 References
  12.11 External links

13 Inference
  13.1 Examples
    13.1.1 Example for definition #2
  13.2 Incorrect inference
  13.3 Automatic logical inference
    13.3.1 Example using Prolog
    13.3.2 Use with the semantic web
    13.3.3 Bayesian statistics and probability logic
    13.3.4 Nonmonotonic logic
  13.4 See also
  13.5 References
  13.6 Further reading
  13.7 External links

14 Inference engine
  14.1 Architecture
  14.2 Implementations
  14.3 See also
  14.4 References

15 Inference objection
  15.1 Example
  15.2 References

16 Logical hexagon
  16.1 Summary of relationships
  16.2 Interpretations of the logical hexagon
    16.2.1 Modal logic
  16.3 Further extension
  16.4 Further reading
  16.5 See also
  16.6 References

17 Material inference
  17.1 Examples
  17.2 Material inferences vs. enthymemes
  17.3 Non-monotonic inference
  17.4 Notes
  17.5 References

18 Resolution inference
  18.1 Example
  18.2 Notes

19 Rule of inference
  19.1 The standard form of rules of inference
  19.2 Axiom schemas and axioms
  19.3 Example: Hilbert systems for two propositional logics
  19.4 Admissibility and derivability
  19.5 See also
  19.6 References

20 Scalar implicature
  20.1 Origin
  20.2 Examples of scalar implicature
  20.3 References
  20.4 Endnotes
  20.5 See also

21 Solomonoff’s theory of inductive inference
  21.1 Origin
    21.1.1 Philosophical
    21.1.2 Mathematical
  21.2 Modern applications
    21.2.1 Artificial intelligence
    21.2.2 Turing machines
  21.3 See also
  21.4 Notes
  21.5 References
  21.6 External links

22 Square of opposition
  22.1 Summary
  22.2 The problem of existential import
  22.3 Modern squares of opposition
  22.4 Logical hexagons and other bi-simplexes
  22.5 Square of opposition (or logical square) and modal logic
  22.6 See also
  22.7 References
  22.8 External links

23 Strong inference
  23.1 The single hypothesis problem
  23.2 Strong Inference
  23.3 Limitations
  23.4 Strong inference plus
  23.5 References

24 Type inference
  24.1 Nontechnical explanation
  24.2 Technical description
  24.3 Example
  24.4 Hindley–Milner type inference algorithm
  24.5 References
  24.6 External links

25 Uncertain inference
  25.1 Definitions
  25.2 Example
  25.3 Further work
  25.4 See also
  25.5 References

26 Veridicality
  26.1 Veridicality in semantic theory
    26.1.1 Analysis
    26.1.2 Nonveridical operators
    26.1.3 Downward entailment
    26.1.4 Non-monotone quantifiers
    26.1.5 Hardly and barely
    26.1.6 Questions
    26.1.7 Future
    26.1.8 Habitual aspect
    26.1.9 Generic sentences
    26.1.10 Modal verbs
    26.1.11 Imperatives
    26.1.12 Protasis of conditionals
    26.1.13 Directive intensional verbs
  26.2 References
  26.3 Text and image sources, contributors, and licenses
    26.3.1 Text
    26.3.2 Images
    26.3.3 Content license


Chapter 1

Adverse inference

Adverse inference is a legal inference, adverse to the concerned party, drawn from silence or absence of requested evidence. It is part of evidence codes based on common law in various countries.

According to LawVibe, “the 'adverse inference' can be quite damning at trial. Essentially, when plaintiffs try to present evidence on a point essential to their case and can’t because the document has been destroyed (by the defendant), the jury can infer that the evidence would have been adverse to (the defendant), and adopt the plaintiff’s reasonable interpretation of what the document would have said...” [1]

The United States Court of Appeals for the Eighth Circuit pointed out in 2004, in a case involving spoliation (destruction) of evidence, that "...the giving of an adverse inference instruction often terminates the litigation in that it is 'too difficult a hurdle' for the spoliating party to overcome. The court therefore concluded that the adverse inference instruction is an 'extreme' sanction that should 'not be given lightly'...”. [2]

This rule applies not only to evidence which is destroyed, but also to evidence which exists but which the party refuses to produce, and to evidence which the party has under his control and which is not produced. See Notice to produce. This adverse inference is based upon the presumption that the party who controls the evidence would have produced it if it had been supportive of his or her position. It can also apply to a witness who is known to exist but whom the party refuses to identify or produce.

After a change in the law in 1994, the right to silence under English law was curtailed because the court and jury were allowed to draw an adverse inference from such a silence.[3] Under English law, when the police caution someone they say “You do not have to say anything. But it may harm your defence if you do not mention, when questioned, something which you later rely on in court,” because under English law the court and jury can draw an adverse inference from the fact that someone did not mention a defence when given the chance to do so when charged with an offence.[3][4]

1.1 References

[1] Virgin Gets Hammered by Adverse Inference, LawVibe.com, April 4, 2007.

[2] Morris v. Union Pacific R. R., 373 F.3d 896, 900 (8th Cir.2004)

[3] Baksi, Catherine (24 May 2012), “Going “no comment": a delicate balancing act”, Law Society Gazette

[4] CPS (26 September 2014), Adverse Inferences, Crown Prosecution Service


Chapter 2

Arbitrary inference

In clinical psychology, arbitrary inference is a type of cognitive bias in which a person quickly draws a conclusion without the requisite evidence.[1] It commonly appears in Aaron Beck's work in cognitive therapy.

2.1 See also

• Aaron T. Beck

• Clinical Psychology

• Cognitive bias

• Cognitive therapy

• Jumping to conclusions

2.2 References

[1] Sundberg, Norman (2001). Clinical Psychology: Evolving Theory, Practice, and Research. Englewood Cliffs: Prentice Hall. ISBN 0-13-087119-2.


Chapter 3

Biological network inference

Biological network inference is the process of making inferences and predictions about biological networks.

3.1 Biological networks

In a topological sense, a network is a set of nodes and a set of directed or undirected edges between the nodes. Many types of biological networks exist, including transcriptional, signalling and metabolic networks. Few such networks are known in anything approaching their complete structure, even in the simplest bacteria. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a eukaryotic cell or bacterial organism at a given point in the future. Systems biology, in this sense, is still in its infancy.

There is great interest in network medicine for the modelling of biological systems. This article focuses on a necessary prerequisite to dynamic modeling of a network: inference of the topology, that is, prediction of the “wiring diagram” of the network. More specifically, we focus here on inference of biological network structure using the growing sets of high-throughput expression data for genes, proteins, and metabolites. Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that indicate causal influence.[1][2] Such patterns of partial correlations found in the high-throughput data, possibly combined with other supplemental data on the genes or proteins in the proposed networks, or combined with other information on the organism, form the basis upon which such algorithms work. Such algorithms can be of use in inferring the topology of any network where the change in state of one node can affect the state of other nodes.

3.1.1 Transcriptional regulatory networks

Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an RNA or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms take as primary input data measurements of mRNA expression levels of the genes under consideration for inclusion in the network, returning an estimate of the network topology. Such algorithms are typically based on linearity, independence or normality assumptions, which must be verified on a case-by-case basis.[3] Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments, in particular to select sets of genes as candidates for network nodes.[4]

The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification – for example, to classify subtypes of cancer, or to predict differential responses to a drug (pharmacogenomics). But to understand the relationships between the genes, that is, to more precisely define the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network. This can be done by data integration in dynamic models supported by background literature, or information in public databases, combined with the clustering results.[5] The modelling can be done by a Boolean network, by ordinary differential equations or linear regression models, e.g. least-angle regression, by a Bayesian network, or based on information theory approaches.[6]

For instance, it can be done by the application of a correlation-based inference algorithm, an approach which has had increasing success as the size of the available microarray sets keeps increasing.[1][7][8]
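To make the correlation-based idea concrete, here is a minimal toy sketch. It is not any published algorithm: it uses plain Pearson correlation rather than the partial correlations or conditional probabilities the methods above rely on, and the function and gene names are invented for the example.

```python
import numpy as np

def infer_network(expr, gene_names, threshold=0.8):
    """Infer an undirected network from an expression matrix.

    expr: array of shape (n_samples, n_genes); each column is one
    gene's expression profile across experiments.
    Returns (gene_a, gene_b, r) triples whose absolute Pearson
    correlation r meets the threshold.
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
    edges = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                edges.append((gene_names[i], gene_names[j], round(corr[i, j], 2)))
    return edges

# Toy data: geneB closely tracks geneA; geneC is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
expr = np.column_stack([a, a + 0.1 * rng.normal(size=50), rng.normal(size=50)])
edges = infer_network(expr, ["geneA", "geneB", "geneC"])
```

On the toy data the only recovered edge links geneA and geneB. Real methods must additionally separate direct from indirect correlations (A–B and B–C both strong need not mean an A–C edge), which is exactly where partial correlation enters.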

3.1.2 Signal transduction

Signal transduction networks are very important in the biology of cancer. Proteins are the nodes and directed edges represent interactions in which the biochemical conformation of the child is modified by the action of the parent (e.g. mediated by phosphorylation, ubiquitylation, methylation, etc.). Primary input into the inference algorithm would be data from a set of experiments measuring protein activation / inactivation (e.g., phosphorylation / dephosphorylation) across a set of proteins. Inference for such signalling networks is complicated by the fact that total concentrations of signalling proteins will fluctuate over time due to transcriptional and translational regulation. Such variation can lead to statistical confounding. Accordingly, more sophisticated statistical techniques must be applied to analyse such datasets.[9]

3.1.3 Metabolic

In metabolite networks, metabolites are the nodes and the edges are directed. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.

3.1.4 Protein-protein interaction

Protein-protein interaction networks are also under very active study. However, reconstruction of these networks does not use correlation-based inference in the sense discussed for the networks already described (interaction does not necessarily imply a change in protein state), and a description of such interaction network reconstruction is left to other articles.

3.2 See also

• Cytoscape tool

• Bayesian probability

• Network medicine

3.3 References

[1] Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, The DREAM5 Consortium, Kellis M, Collins JJ, Stolovitzky G (2012). “Wisdom of crowds for robust gene network inference”. Nature Methods 9 (8): 796–804. doi:10.1038/nmeth.2016. PMC 3512113. PMID 22796662.

[2] Spirtes, P; Glymour, C; Scheines, R (2000). Causation, Prediction, and Search: Adaptive Computation and Machine Learning (2nd ed.). MIT Press.

[3] Oates, C.J.; Mukherjee, S. (2012). “Network Inference and Biological Dynamics”. To appear in Ann. Appl. Stat. arXiv:1112.1047. Bibcode:2011arXiv1112.1047O.

[4] Guthke, R; et al. (2005). “Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection”. Bioinformatics 21 (8): 1626–34. doi:10.1093/bioinformatics/bti226. PMID 15613398.

[5] Hecker, M; et al. (2009). “Gene regulatory network inference: Data integration in dynamic models – A review”. Biosystems 96 (1): 86–103. doi:10.1016/j.biosystems.2008.12.004. PMID 19150482.

[6] van Someren, E; et al. (2002). “Genetic network modeling”. Pharmacogenomics 3 (4): 507–525. doi:10.1517/14622416.3.4.507. PMID 12164774.

[7] Faith, JJ; et al. (2007). “Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles”. PLoS Biology 5 (1): 54–66. doi:10.1371/journal.pbio.0050008. PMC 1764438. PMID 17214507.


[8] Hayete, B; Gardner, TS; Collins, JJ (2007). “Size matters: network inference tackles the genome scale”. Molecular Systems Biology 3 (1): 77. doi:10.1038/msb4100118. PMC 1828748. PMID 17299414.

[9] Oates, C.J. and Mukherjee, S. (2012). “Structural inference using nonlinear dynamics”. CRiSM Working Paper 12 (7).


Chapter 4

Constraint inference

In constraint satisfaction, constraint inference is a relationship between constraints and their consequences. A set of constraints D entails a constraint C if every solution to D is also a solution to C. In other words, if V is a valuation of the variables in the scopes of the constraints in D and all constraints in D are satisfied by V, then V also satisfies the constraint C.

Some operations on constraints produce a new constraint that is a consequence of them. Constraint composition operates on a pair of binary constraints ((x, y), R) and ((y, z), S) with a common variable. The composition of two such constraints is the constraint ((x, z), Q) that is satisfied by every evaluation of the two non-shared variables for which there exists a value of the shared variable y such that the evaluation of these three variables satisfies the two original constraints ((x, y), R) and ((y, z), S).

Constraint projection restricts the effects of a constraint to some of its variables. Given a constraint (t, R), its projection to a subset t′ of its variables is the constraint (t′, R′) that is satisfied by an evaluation if the evaluation can be extended to the other variables in such a way that the original constraint (t, R) is satisfied.

Extended composition is similar in principle to composition, but allows for an arbitrary number of possibly non-binary constraints; the generated constraint is on an arbitrary subset of the variables of the original constraints. Given constraints C1, . . . , Cm and a list A of their variables, the extended composition of them is the constraint (A, R) where an evaluation of A satisfies this constraint if it can be extended to the other variables so that C1, . . . , Cm are all satisfied.
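The composition and projection operations can be sketched concretely by representing a constraint extensionally as a pair of a variable tuple and a set of satisfying value tuples. This is an illustrative sketch only; the function names are ours, not standard API.

```python
from itertools import product

# A constraint is (variables, relation): a tuple of variable names
# plus the set of value tuples that satisfy the constraint.

def compose(c1, c2):
    """Compose ((x, y), R) and ((y, z), S) into ((x, z), Q):
    (x, z) is allowed iff some value of the shared y satisfies both."""
    (x, y1), r = c1
    (y2, z), s = c2
    assert y1 == y2, "constraints must share their middle variable"
    q = {(a, c) for (a, b) in r for (b2, c) in s if b == b2}
    return ((x, z), q)

def project(constraint, keep):
    """Project (t, R) onto the subset `keep` of its variables."""
    variables, rel = constraint
    idx = [variables.index(v) for v in keep]
    return (tuple(keep), {tuple(t[i] for i in idx) for t in rel})

# Example: x < y and y < z over the domain {1, 2, 3}.
lt = {(a, b) for a, b in product(range(1, 4), repeat=2) if a < b}
xz = compose((("x", "y"), lt), (("y", "z"), lt))
# Only x = 1, z = 3 admits an intermediate y, so Q = {(1, 3)}.
```

Note that the composed constraint Q = {(1, 3)} is strictly stronger than "x < z" here, because it also requires room for the shared variable y; this illustrates that composition yields a consequence of the original pair, not merely the transitive closure of the ordering.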

4.1 See also

• Constraint satisfaction problem

4.2 References

• Dechter, Rina (2003). Constraint processing. Morgan Kaufmann. ISBN 1-55860-890-7

• Apt, Krzysztof (2003). Principles of constraint programming. Cambridge University Press. ISBN 0-521-82583-0

• Marriott, Kim; Peter J. Stuckey (1998). Programming with constraints: An introduction. MIT Press. ISBN 0-262-13341-5


Chapter 5

Correspondent inference theory

Correspondent inference theory is a psychological theory proposed by Edward E. Jones and Keith Davis (1965) that “systematically accounts for a perceiver’s inferences about what an actor was trying to achieve by a particular action.”[1] The purpose of this theory is to explain why people make internal or external attributions. People compare their actions with alternative actions to evaluate the choices that they have made, and by looking at various factors they can decide if their behaviour was caused by an internal disposition. The covariation model is used here as well, more specifically the degree to which one attributes behaviour to the person as opposed to the situation. These factors are: did the person have a choice in partaking in the action, is their behaviour expected of their social role, and is their behaviour a consequence of their normal behaviour?

5.1 Attributing intention

The problem of accurately defining intentions is a difficult one. For every observed act, there are a multitude of possible motivations. If a person buys someone a drink in the pub, he may be trying to curry favour, his friend may have bought him a drink earlier, or he may be doing a favour for a friend with no cash.

The work done by Jones and Davis only deals with how people make attributions to the person; they do not deal with how people make attributions about situational or external causes.

Jones and Davis make the assumption that, in order to infer that any effects of an action were intended, the perceiver must believe that (1) the actor knew the consequences of the actions (e.g., the technician who pushed that button at Chernobyl did not know the consequences of that action), (2) the actor had the ability to perform the action (could Lee Harvey Oswald really have shot John Kennedy?), and (3) the actor had the intention to perform the action.

5.2 Non-Common effects

The consequences of a chosen action must be compared with the consequences of possible alternative actions. The fewer effects the possible choices have in common, the more confident one can be in inferring a correspondent disposition. Or, put another way, the more distinctive the consequences of a choice, the more confidently you can infer intention and disposition.

Suppose you are planning to go on a postgraduate course, and you short-list two colleges: University College London and the London School of Economics. You choose UCL rather than the LSE. What can the social perceiver learn from this? First there are a lot of common effects: urban environment, same distance from home, same exam system, similar academic reputation, etc. These common effects do not provide the perceiver with any clues about your motivation. But if the perceiver believes that UCL has better sports facilities, or easier access to the University Library, then these non-common or unique effects can provide a clue to your motivation. But suppose you had short-listed UCL and the University of Essex and you chose UCL. Now the perceiver is faced with a number of non-common effects: size of city, distance from home, academic reputation, exam system. The perceiver would then be much less confident about inferring a particular intention or disposition, since there are a lot of non-common effects. The fewer the non-common effects, the more certain the attribution of intent.


5.3 Low-Social desirability

People usually intend socially desirable outcomes, hence socially desirable outcomes are not informative about a person’s intention or disposition. The most that you can infer is that the person is normal, which is not saying anything very much. But socially undesirable actions are more informative about intentions and dispositions. Suppose you asked a friend for a loan of £1 and it was given (a socially desirable action): the perceiver couldn't say a great deal about your friend’s kindness or helpfulness, because most people would have done the same thing. If, on the other hand, the friend refused to lend you the money (a socially undesirable action), the perceiver might well feel that your friend is rather stingy, or even miserly.

In fact, social desirability, although an important influence on behaviour, is really only a special case of the more general principle that behaviour which deviates from the normal, usual, or expected is more informative about a person’s disposition than behaviour that conforms to the normal, usual, or expected. So, for example, when people do not conform to group pressure we can be more certain that they truly believe the views they express than when people conform to the group. Similarly, when people in a particular social role (e.g. doctor, teacher, salesperson, etc.) behave in ways that are not in keeping with the role demands, we can be more certain about what they are really like than when people behave in role.

5.4 Expectancies

Only behaviours that disconfirm expectancies are truly informative about an actor. There are two types of expectancy. Category-based expectancies are those derived from our knowledge about particular types or groups of people. For example, if you were surprised to hear a wealthy businessman extolling the virtues of socialism, your surprise would rest on the expectation that businessmen (a category of people) are not usually socialist.

Target-based expectancies derive from knowledge about a particular person. To know that a person is a supporter of Margaret Thatcher sets up certain expectations and associations about their beliefs and character.

5.5 Choice

Another factor in inferring a disposition from an action is whether the behaviour of the actor is constrained by situational forces or whether it occurs from the actor’s choice. If you were assigned to argue a position in a classroom debate (e.g. for or against Neoliberalism), it would be unwise of your audience to infer that your statements in the debate reflect your true beliefs, because you did not choose to argue that particular side of the issue. If, however, you had chosen to argue one side of the issue, then it would be appropriate for the audience to conclude that your statements reflect your true beliefs.

Although choice ought to have an important effect on whether or not people make correspondent inferences, research shows that people do not take choice sufficiently into account when judging another person’s attributes or attitudes. There is a tendency for perceivers to assume that when an actor engages in an activity, such as stating a point of view or attitude, the statements made are indicative of the actor’s true beliefs, even when there may be clear situational forces affecting the behaviour. In fact, earlier psychologists had foreseen that something like this would occur; they thought that the actor-act relation was so strong, like a perceptual Gestalt, that people would tend to over-attribute actions to the actor even when there are powerful external forces on the actor that could account for the behaviour.

5.6 See also

• Edward E. Jones

• Attribution theory

• Revealed preferences


5.7 References

[1] Berkowitz, Leonard (1965). Advances in Experimental Social Psychology Vol 2, p. 222. Academic Press. ISBN 978-0-12-015202-5.

5.8 External links

• Gilbert, D. T. (1998). Speeding with Ned: A personal view of the correspondence bias. In J. M. Darley & J. Cooper (Eds.), Attribution and social interaction: The legacy of E. E. Jones. Washington, DC: APA Press. PDF.


Chapter 6

Deep inference

Deep inference names a general idea in structural proof theory that breaks with the classical sequent calculus by generalising the notion of structure to permit inference to occur in contexts of high structural complexity. The term deep inference is generally reserved for proof calculi where the structural complexity is unbounded; in this article we will use non-shallow inference to refer to calculi that have structural complexity greater than the sequent calculus, but not unboundedly so, although this is not at present established terminology.

Deep inference is not important in logic outside of structural proof theory, since the phenomena that lead to the proposal of formal systems with deep inference are all related to the cut-elimination theorem. The first calculus of deep inference was proposed by Kurt Schütte,[1] but the idea did not generate much interest at the time.

Nuel Belnap proposed display logic in an attempt to characterise the essence of structural proof theory. The calculus of structures was proposed in order to give a cut-free characterisation of noncommutative logic.

6.1 Notes

[1] Kurt Schütte. Proof Theory. Springer-Verlag, 1977.

6.2 Further reading

• Kai Brünnler, “Deep Inference and Symmetry in Classical Proofs” (Ph.D. thesis 2004), also published in book form by Logos Verlag (ISBN 978-3-8325-0448-9).

• Deep Inference and the Calculus of Structures: intro and reference web page about ongoing research in deep inference.


Chapter 7

Dictum de omni et nullo

In Aristotelean logic, dictum de omni et nullo (Latin: “the maxim of all and none”) is the principle that whatever is affirmed or denied of a whole kind K may be affirmed or denied (respectively) of any subkind of K. This principle is fundamental to syllogistic logic in the sense that all valid syllogistic argument forms are reducible to applications of the two constituent principles dictum de omni and dictum de nullo.[1]

Dictum de omni (sometimes misinterpreted as universal instantiation) is the principle that whatever is universally affirmed of a kind is affirmable as well for any subkind of that kind. Example:

(1) Dogs are mammals.
(2) Mammals have livers.
Therefore (3) dogs have livers.

Premise (1) states that “dog” is a subkind of the kind “mammal”. Premise (2) is a (universal affirmative) claim about the kind “mammal”. Statement (3) concludes that what is true of the kind “mammal” is true of the subkind “dog”.

Dictum de nullo is the related principle that whatever is denied of a kind is likewise denied of any subkind of that kind. Example:

(1) Dogs are mammals.
(4) Mammals do not have gills.
Therefore (5) dogs do not have gills.

Premise (1) states that “dog” is a subkind of the kind “mammal”. Premise (4) is a (universal negative) claim about the kind “mammal”. Statement (5) concludes that what is denied of the kind “mammal” is denied of the subkind “dog”.

Each of these two principles is an instance of a valid argument form known as universal hypothetical syllogism in first-order predicate logic. In Aristotelean syllogistic, they correspond respectively to the two argument forms Barbara and Celarent.
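Read extensionally, the two dicta amount to monotonicity of universal affirmation and denial under the subkind (subset) relation. A toy sketch, where kinds are modeled as sets of individuals and the individuals are invented for illustration:

```python
# Extensional sketch of dictum de omni / dictum de nullo: kinds are sets
# of individuals; a subkind is a subset. (Toy data, purely illustrative.)

mammals = {"rex", "lassie", "willy"}    # the kind "mammal"
dogs = {"rex", "lassie"}                # a subkind of "mammal"

has_liver = {"rex", "lassie", "willy"}  # universally affirmed of mammals
has_gills = set()                       # universally denied of mammals

assert dogs <= mammals                  # premise (1): dogs are mammals

# Dictum de omni: what is affirmed of every mammal holds of every dog.
assert mammals <= has_liver and dogs <= has_liver

# Dictum de nullo: what is denied of every mammal is denied of every dog.
assert not (mammals & has_gills) and not (dogs & has_gills)
```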

7.1 See also• Aristotle

• Syllogism

• Term logic

• Class (philosophy)

• Class (set theory)


• Natural kind

• Type (metaphysics)

• Downward entailing

• Monotonic function

7.2 References

• Aristotle, Prior Analytics, 24b, 28-30.

7.3 Notes

[1] John Stuart Mill (15 January 2001). System of Logic Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation. Elibron.com. p. 114. ISBN 978-1-4021-8157-3. Retrieved 6 March 2011.

7.4 External links

• Logical Form (Stanford Encyclopedia of Philosophy)


Chapter 8

Downward entailing

In linguistic semantics, a downward entailing (DE) propositional operator is one that denotes a monotone decreasing function. A downward entailing operator reverses the relation of semantic strength among expressions. An expression like “run fast” is semantically stronger than the expression “run” since “run fast” is true of fewer things than the latter. Thus the proposition “John ran fast” entails the proposition “John ran”.

Examples of DE contexts include “not”, “nobody”, “few people”, “at most two boys”. They reverse the entailment relation of sentences formed with the predicates “run fast” and “run”, for example. The proposition “Nobody ran” entails that “Nobody ran fast”. The proposition “At most two boys ran” entails that “At most two boys ran fast”.

Conversely, an upward entailing operator is one that preserves the relation of semantic strength among a set of expressions (for example, “more”). A context that is neither downward nor upward entailing is non-monotone, such as “exactly”.

Ladusaw (1980) proposed that downward entailment is the property that licenses polarity items. Indeed, “Nobody saw anything” is downward entailing and admits the negative polarity item anything, while *“I saw anything” is ungrammatical (the upward entailing context does not license such a polarity item). This approach explains many but not all typical cases of polarity item sensitivity. Subsequent attempts to describe the behavior of polarity items rely on a broader notion of nonveridicality.
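The definition can be made concrete in a small set-theoretic model: predicates denote sets of individuals, a stronger predicate denotes a subset (as “run fast” does of “run”), and a quantifier is downward entailing when truth with the weaker predicate implies truth with the stronger one. A sketch under these assumptions; the domain and helper names are invented for the example:

```python
from itertools import combinations

domain = {"john", "mary", "sue"}

def subsets(s):
    """All subsets of s, as sets."""
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def downward_entailing(quant):
    """quant is DE iff for every P ⊆ Q, quant(Q) implies quant(P)."""
    return all(quant(P) or not quant(Q)
               for Q in subsets(domain)
               for P in subsets(Q))

nobody = lambda p: len(p) == 0            # "Nobody ran"
somebody = lambda p: len(p) > 0           # "Somebody ran"
at_most_two = lambda p: len(p) <= 2       # "At most two boys ran"

assert downward_entailing(nobody)         # DE context
assert downward_entailing(at_most_two)    # DE context
assert not downward_entailing(somebody)   # upward entailing instead
```

The exhaustive check over all subset pairs is feasible only because the toy domain is tiny, but it mirrors the definition directly.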

8.1 Strawson-DE

Downward entailment does not explain the licensing of any in certain contexts such as with only:

Only John ate any vegetables for breakfast.

This is not a downward entailing context because the above proposition does not entail “Only John ate kale for breakfast” (John may have eaten spinach, for example).

Von Fintel (1999) claims that although only does not exhibit the classical DE pattern, it can be shown to be DE in a special way. He defines a notion of Strawson-DE for expressions that come with presuppositions. The reasoning scheme is as follows:

1. P → Q

2. [[ only John ]] (P) is defined.

3. [[ only John ]] (Q) is true.

4. Therefore, [[ only John ]] (P) is true.

Here, (2) is the intended presupposition. For example:

1. Kale is a vegetable.


2. Somebody ate kale for breakfast.

3. Only John ate any vegetables for breakfast.

4. Therefore, only John ate kale for breakfast.

Hence only is Strawson-DE and therefore licenses any.

Giannakidou (2002) argues that Strawson-DE allows not just the presupposition of the evaluated sentence but any arbitrary proposition to count as relevant. This results in over-generalization that validates the use of any in contexts where it is, in fact, ungrammatical, such as clefts, preposed exhaustive focus, and each/both:

* It was John who talked to anybody.
* JOHN talked to anybody.
* Each student who saw anything reported to the Dean.
* Both students who saw anything reported to the Dean.

8.2 See also

• Entailment (pragmatics)

• Veridicality

• Polarity item

8.3 References

• Ladusaw, William (1980). Polarity Sensitivity as Inherent Scope Relations. Garland, NY.

• Von Fintel, Kai (1999). “NPI-Licensing, Strawson-Entailment, and Context-Dependency”. Journal of Semantics (16): 97–148.

• Giannakidou, Anastasia (2002). “Licensing and sensitivity in polarity items: from downward entailment to nonveridicality”. In Maria Andronis; Anne Pycha; Keiko Yoshimura. CLS 38: Papers from the 38th Annual Meeting of the Chicago Linguistic Society, Parasession on Polarity and Negation. Retrieved 2011-12-15.


Chapter 9

Grammar induction

Grammar induction, also known as grammatical inference or syntactic pattern recognition, refers to the process in machine learning of learning a formal grammar (usually as a collection of re-write rules or productions, or alternatively as a finite state machine or automaton of some kind) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is the branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs. There is now a rich literature on learning different types of grammar and automata, under various learning models and using various methodologies.

9.1 Grammar Classes

Grammatical inference has often been very focused on the problem of learning finite state machines of various types (see the article Induction of regular languages for details on these approaches), since there have been efficient algorithms for this problem since the 1980s.

More recently these approaches have been extended to the problem of inference of context-free grammars and richer formalisms, such as multiple context-free grammars and parallel multiple context-free grammars. Other classes of grammars for which grammatical inference has been studied are contextual grammars and pattern languages.

9.2 Learning Models

The simplest form of learning is where the learning algorithm merely receives a set of examples drawn from the language in question, but other learning models have been studied. One frequently studied alternative is the case where the learner can ask membership queries, as in the exact query learning model or minimally adequate teacher model introduced by Angluin.

9.3 Methodologies

There are a wide variety of methods for grammatical inference. Two of the classic sources are Fu (1977) and Fu (1982). Duda, Hart & Stork (2001) also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below. For approaches to infer subclasses of regular languages in particular, see Induction of regular languages. A more recent textbook is de la Higuera (2010),[1] which covers the theory of grammatical inference of regular languages and finite state automata. D'Ulizia, Ferri and Grifoni[2] provide a survey that explores grammatical inference methods for natural languages.


9.3.1 Grammatical inference by trial-and-error

The method proposed in Section 8.7 of Duda, Hart & Stork (2001) suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as “hypothesis testing” and bears some similarity to Mitchell’s version space algorithm. The Duda, Hart & Stork (2001) text provides a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.
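A toy version of this guess-and-test loop can be sketched as follows, with a deliberately tiny grammar encoding (right-hand sides for a single start symbol S) and an invented candidate pool; it illustrates the idea of hypothesis testing, not the textbook's actual code:

```python
# Trial-and-error grammar induction sketch: accept a candidate rule set
# only if it generates every positive example and no negative example.
# (Grammar encoding and candidate pool are invented for illustration.)

def language(rules, max_depth=6):
    """All terminal strings derivable from 'S' within max_depth expansions."""
    out = set()
    def expand(form, depth):
        if "S" not in form:
            out.add(form)
            return
        if depth == 0:
            return
        for rhs in rules:
            expand(form.replace("S", rhs, 1), depth - 1)
    expand("S", max_depth)
    return out

positives = {"ab", "aabb", "aaabbb"}
negatives = {"ba", "abab"}

candidates = [
    {"ab"},           # too weak: cannot generate "aabb"
    {"ab", "aSb"},    # generates a^n b^n: consistent with the data
    {"ab", "Sab"},    # overgenerates the negative "abab": discarded
]
accepted = [r for r in candidates
            if positives <= language(r)
            and not (negatives & language(r))]
```

Only the middle candidate survives the test; the bounded derivation depth stands in for the generate-and-check machinery a real implementation would need.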

9.3.2 Grammatical inference by genetic algorithms

Grammatical induction using evolutionary algorithms is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as a tree structure of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach.

Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the Lisp code. Similar analogues between the tree structured Lisp representation and the representation of grammars as trees made the application of genetic programming techniques possible for grammar induction.

In the case of grammar induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent node corresponds to a non-terminal symbol (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.

9.3.3 Grammatical inference by greedy algorithms

Like all greedy algorithms, greedy grammar inference algorithms make, in an iterative manner, decisions that seem to be the best at that stage. These decisions usually concern things like creating a new rule, removing an existing rule, choosing which rule to apply, or merging some existing rules. Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms.

These context-free grammar generating algorithms make the decision after every read symbol:

• Lempel-Ziv-Welch algorithm creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar.

• Sequitur and its modifications.

These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions:

• Byte pair encoding and its optimizations.
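As an illustration of the second group, here is a bare-bones sketch of byte pair encoding used as a grammar generator: it reads the whole sequence first, then repeatedly replaces the most frequent adjacent pair of symbols with a fresh nonterminal, recording one production per replacement. The symbol naming and stopping rule are choices made for the example, not part of any standard implementation.

```python
from collections import Counter

def bpe_grammar(text):
    """Greedy BPE: return (compressed sequence, productions)."""
    seq = list(text)
    rules = {}                      # nonterminal -> the pair it rewrites
    next_sym = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:               # stop once no pair repeats
            break
        sym = f"N{next_sym}"
        next_sym += 1
        rules[sym] = pair
        # Rewrite the sequence, replacing each occurrence of the pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

seq, rules = bpe_grammar("abababab")
# seq == ["N1", "N1"] with rules N0 -> ("a", "b"), N1 -> ("N0", "N0")
```

Expanding the productions from the compressed sequence reconstructs the original string, which is why such grammars double as lossless compressors.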

9.3.4 Distributional Learning

A more recent approach is based on distributional learning. Algorithms using these approaches have been applied to learning context-free grammars and mildly context-sensitive languages and have been proven to be correct and efficient for large subclasses of these grammars.[3]


9.3.5 Learning of Pattern languages

Angluin defines a pattern to be a string of constant symbols from Σ and variable symbols from a disjoint set. The language of such a pattern is the set of all its nonempty ground instances, i.e. all strings resulting from consistent replacement of its variable symbols by nonempty strings of constant symbols.[note 1] A pattern is called descriptive for a finite input set of strings if its language is minimal (with respect to set inclusion) among all pattern languages subsuming the input set.

Angluin gives a polynomial algorithm to compute, for a given input string set, all descriptive patterns in one variable x.[note 2] To this end, she builds an automaton representing all possibly relevant patterns; using sophisticated arguments about word lengths, which rely on x being the only variable, the state count can be drastically reduced.[4]

Erlebach et al. give a more efficient version of Angluin’s pattern learning algorithm, as well as a parallelized version.[5]

Arimura et al. show that a language class obtained from limited unions of patterns can be learned in polynomial time.[6]
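For one-variable patterns, the membership problem (though not the harder problem of finding descriptive patterns) has a compact illustration: requiring every occurrence of x to be replaced by the same nonempty constant string is exactly what a regex backreference expresses. A sketch with an invented token encoding:

```python
import re

def pattern_to_regex(pattern):
    """pattern is a list of tokens; the token 'x' is the single variable.
    The first occurrence of x becomes a capture group, later ones
    backreferences, so all occurrences must match the same string."""
    parts, seen_x = [], False
    for tok in pattern:
        if tok == "x":
            parts.append(r"\1" if seen_x else r"(.+)")
            seen_x = True
        else:
            parts.append(re.escape(tok))
    return re.compile("^" + "".join(parts) + "$")

p = pattern_to_regex(["a", "x", "b", "x"])   # language of the pattern a x b x
assert p.match("acbc")         # x = "c"
assert p.match("accbcc")       # x = "cc"
assert not p.match("acbd")     # the two occurrences of x would differ
assert not p.match("abb")      # no nonempty x fits the lengths
```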

9.3.6 Pattern theory

Pattern theory, formulated by Ulf Grenander,[7] is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language.

In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to:

• Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time.

• Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph.

• Study the randomness and variability of these graphs.

• Create the basic classes of stochastic models applied by listing the deformations of the patterns.

• Synthesize (sample) from the models, not just analyze signals with it.

Broad in its mathematical coverage, pattern theory spans algebra and statistics, as well as local topological and global entropic properties.

9.4 Applications

The principle of grammar induction has been applied to other aspects of natural language processing, among many other problems to morpheme analysis, and even place name derivations. Grammar induction has also been used for lossless data compression and statistical inference via MML and MDL principles.

9.5 See also

• Artificial grammar learning
• Syntactic pattern recognition

• Inductive inference

• Straight-line grammar

• Kolmogorov complexity
• Automatic distillation of structure

• Inductive programming


9.6 Notes

[1] The language of a pattern with at least two occurrences of the same variable is not regular due to the pumping lemma.

[2] x may occur several times, but no other variable y may occur

9.7 References

[1] de la Higuera, Colin (2010). Grammatical Inference: Learning Automata and Grammars. Cambridge: Cambridge University Press.

[2] D’Ulizia, A., Ferri, F., Grifoni, P. (2011) “A Survey of Grammatical Inference Methods for Natural Language Learning”, Artificial Intelligence Review, Vol. 36, No. 1, pp. 1-27.

[3] Clark and Eyraud (2007), Journal of Machine Learning Research; Ryo Yoshinaka (2011), Theoretical Computer Science.

[4] Dana Angluin (1980). “Finding Patterns Common to a Set of Strings” (PDF). Journal of Computer and System Sciences 21: 46–62. doi:10.1016/0022-0000(80)90041-0.

[5] T. Erlebach, P. Rossmanith, H. Stadtherr, A. Steger, T. Zeugmann (1997). “Learning One-Variable Pattern Languages Very Efficiently on Average, in Parallel, and by Asking Queries”. In M. Li and A. Maruoka. Proc. 8th International Workshop on Algorithmic Learning Theory (ALT'97). LNAI 1316. Springer. pp. 260–276.

[6] Hiroki Arimura, Takeshi Shinohara, Setsuko Otsuki (1994). “Finding Minimal Generalizations for Unions of Pattern Languages and Its Application to Inductive Inference from Positive Data”. Proc. STACS 11. LNCS 775. Springer. pp. 649–660.

[7] Grenander, Ulf, and Michael I. Miller. Pattern Theory: From Representation to Inference. Vol. 1. Oxford: Oxford University Press, 2007.

• Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001), Pattern Classification (2 ed.), New York: John Wiley & Sons

• Fu, King Sun (1982), Syntactic Pattern Recognition and Applications, Englewood Cliffs, NJ: Prentice-Hall

• Fu, King Sun (1977), Syntactic Pattern Recognition, Applications, Berlin: Springer-Verlag

• Horning, James Jay (1969), A Study of Grammatical Inference (Ph.D. thesis), Stanford: Stanford University Computer Science Department

• Gold, E. Mark (1967), “Language Identification in the Limit” (PDF), Information and Control 10, pp. 447–474; see also the corresponding Wikipedia article.


Chapter 10

Implicature

Implicature is a technical term in the pragmatics subfield of linguistics, coined by H. P. Grice, which refers to what is suggested in an utterance, even though neither expressed nor strictly implied (that is, entailed) by the utterance.[1] For example, the sentence “Mary had a baby and got married” strongly suggests that Mary had the baby before the wedding, but the sentence would still be strictly true if Mary had her baby after she got married. Further, if we add the qualification “— not necessarily in that order” to the original sentence, then the implicature is cancelled even though the meaning of the original sentence is not altered.

“Implicature” is an alternative to “implication”, which has additional meanings in logic and informal language.

10.1 Types of implicature

10.1.1 Conversational implicature

Paul Grice identified three types of general conversational implicatures:

1. The speaker deliberately flouts a conversational maxim to convey an additional meaning not expressed literally. For instance, a speaker responds to the question “How did you like the guest lecturer?” with the following utterance:

Well, I’m sure he was speaking English.

If the speaker is assumed to be following the cooperative principle,[2] in spite of flouting the Maxim of Quantity, then the utterance must have an additional nonliteral meaning, such as: “The content of the lecturer’s speech was confusing.”

2. The speaker’s desire to fulfill two conflicting maxims results in his or her flouting one maxim to invoke the other. For instance, a speaker responds to the question “Where is John?” with the following utterance:

He’s either in the cafeteria or in his office.

In this case, the Maxim of Quantity and the Maxim of Quality are in conflict. A cooperative speaker does not want to be ambiguous but also does not want to give false information by giving a specific answer in spite of his uncertainty. By flouting the Maxim of Quantity, the speaker invokes the Maxim of Quality, leading to the implicature that the speaker does not have the evidence to give a specific location where he believes John is.

3. The speaker invokes a maxim as a basis for interpreting the utterance. In the following exchange:

Do you know where I can get some gas?
There’s a gas station around the corner.

The second speaker invokes the Maxim of Relevance, resulting in the implicature that “the gas station is open and one can probably get gas there”.


Scalar implicature

According to Grice (1975), another form of conversational implicature is also known as a scalar implicature. This concerns the conventional uses of words like “all” or “some” in conversation.

I ate some of the pie.

This sentence implies “I did not eat all of the pie.” While the statement “I ate some pie” is still true if the entire pie was eaten, the conventional meaning of the word “some” and the implicature generated by the statement is “not all”.

10.1.2 Conventional implicature

Conventional implicature is independent of the cooperative principle and its four maxims. A statement always carries its conventional implicature.

Donovan is poor but happy.

This sentence implies that poverty and happiness are not compatible, but that in spite of this Donovan is still happy. The conventional interpretation of the word “but” will always create the implicature of a sense of contrast. So Donovan is poor but happy will always necessarily imply “Surprisingly, Donovan is happy in spite of being poor”.

10.2 Implicature vs entailment

This can be contrasted with cases of entailment. For example, the statement “The president was assassinated” not only suggests that “The president is dead” is true, but requires that it be true. The first sentence could not be true if the second were not true; if the president were not dead, then whatever it is that happened to him would not have counted as a (successful) assassination. Similarly, unlike implicatures, entailments cannot be cancelled; there is no qualification that one could add to “The president was assassinated” which would cause it to cease entailing “The president is dead” while also preserving the meaning of the first sentence.

10.3 See also

• Allofunctional implicature

• Cooperative principle

• Gricean maxims

• Entailment, or implication, in logic

• Entailment (pragmatics)

• Explicature

• Indirect speech act

• Intrinsic and extrinsic properties

• Presupposition

10.4 References

[1] Blackburn 1996, p. 189.

[2] Kordić 1991, pp. 89–92.


10.5 Bibliography

• Blackburn, Simon (1996). “implicature”, The Oxford Dictionary of Philosophy, Oxford, pp. 188-89.

• P. Cole (1975) “The synchronic and diachronic status of conversational implicature.” In Syntax and Semantics, 3: Speech Acts (New York: Academic Press), ed. P. Cole & J. L. Morgan, pp. 257–288. ISBN 0-12-785424-X.

• A. Davison (1975) “Indirect speech acts and what to do with them.” ibid, pp. 143–184.

• G. M. Green (1975) “How to get people to do things with words.” ibid, pp. 107–141. New York: Academic Press.

• H. P. Grice (1975) “Logic and conversation.” ibid. Reprinted in Studies in the Way of Words, ed. H. P. Grice, pp. 22–40. Cambridge, MA: Harvard University Press (1989) ISBN 0-674-85270-2.

• Michael Hancher (1978) “Grice’s “Implicature” and Literary Interpretation: Background and Preface” Twentieth Annual Meeting, Midwest Modern Language Association.

• Kordić, Snježana (1991). “Konverzacijske implikature” [Conversational implicatures]. Suvremena lingvistika (in Serbo-Croatian) 17 (31–32): 87–96. ISSN 0586-0296. Archived from the original (PDF) on 2 September 2012. Retrieved 6 September 2012.

• John Searle (1975) “Indirect speech acts.” ibid. Reprinted in Pragmatics: A Reader, ed. S. Davis, pp. 265–277. Oxford: Oxford University Press. (1991) ISBN 0-19-505898-4.

10.6 Further reading

• Bach, Kent (2006). “The Top 10 Misconceptions about Implicature” (PDF). In: Birner, B.; Ward, G. A Festschrift for Larry Horn. Amsterdam: John Benjamins.

10.7 External links

• “Implicature” in the Stanford Encyclopedia of Philosophy

• The Top 10 Misconceptions about Implicature by Kent Bach (2005)


Chapter 11

Inductive functional programming

Inductive Functional Programming (IFP) is a special kind of inductive programming that uses functional programs as the representation for examples, programs and background knowledge. The term is frequently used to make a distinction from inductive logic programming, which uses logic programs.

11.1 See also

• Inductive reasoning

• Inductive logic programming

• Inductive programming

• Functional programming


Chapter 12

Inductive probability

Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world.

There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Only inference establishes new facts from data.

The basis of inference is Bayes’ theorem. But this theorem is sometimes hard to apply and understand. A simpler way to understand inference is in terms of quantities of information.

Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in a computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements.

Occam’s razor says the “simplest theory, consistent with the data, is most likely to be correct”. The “simplest theory” is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct.

12.1 History

Probability and statistics were originally focused on probability distributions and tests of significance. Probability was formal, well defined, but limited in scope. In particular, its application was limited to situations that could be defined as an experiment or trial, with a well defined population.

Bayes’ theorem is named after Rev. Thomas Bayes (1701–1761). Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes’ theorem always depended on prior probabilities to generate new probabilities. It was unclear where these prior probabilities should come from.

Circa 1964, Ray Solomonoff developed algorithmic probability, which gave an explanation for what randomness is and how patterns in the data may be represented by computer programs that give shorter representations of the data. Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later, Jorma Rissanen developed the minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes’ theorem, but which gives a source and explanation for the role of prior probabilities.

Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory for the Pareto optimal behavior of an intelligent agent, circa 1998.


12.1.1 Minimum description/message length

The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length[1] and minimum description length[2] methods.

At first sight Bayes’ theorem appears different from the minimum message/description length principle. On closer inspection it turns out to be the same. Bayes’ theorem is about conditional probabilities: what is the probability that event A happens, given that event B happens?

P (A ∧B) = P (B) · P (A | B) = P (A) · P (B | A)

Becomes in terms of message length L,

L(A ∧B) = L(B) + L(A | B) = L(A) + L(B | A)

What this means is that, in describing an event, if all the information describing the event is given, then the length of that information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given.[3][4]
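As a sketch of this identity, the probabilities below are arbitrary illustrative values; the point is only that both decompositions of L(A ∧ B) give the same message length:

```python
import math

def length(p):
    # Message length in bits for an event of probability p: L = -log2(p).
    return -math.log2(p)

# Hypothetical probabilities for two events A and B (illustrative values only).
p_a, p_b_given_a = 0.25, 0.5
p_ab = p_a * p_b_given_a          # P(A ∧ B) = P(A) · P(B | A)
p_b = 0.30                        # P(B), chosen arbitrarily
p_a_given_b = p_ab / p_b          # P(A | B) = P(A ∧ B) / P(B)

# Both decompositions give the same total message length L(A ∧ B).
lhs = length(p_a) + length(p_b_given_a)   # L(A) + L(B | A)
rhs = length(p_b) + length(p_a_given_b)   # L(B) + L(A | B)
assert abs(lhs - rhs) < 1e-9
assert abs(lhs - length(p_ab)) < 1e-9
```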

Overfitting

Overfitting is where the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted, then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means more probable.

However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms that has a worse fit, but needs less information to describe the polynomial.

12.1.2 Inference based on program complexity

The method used here to give probabilities for inductive inference is based on Solomonoff’s theory of inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x.

Detecting patterns in the data

If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely that the next bit is 1 also. This is described as learning from, or detecting a pattern in, the data.

Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program K is L(K) bits, then its prior probability is,

P (K) = 2−L(K)

The length of the shortest program that represents the string of bits is called the Kolmogorov complexity.

Kolmogorov complexity is not computable. This is related to the halting problem: when searching for the shortest program, some programs may go into an infinite loop.
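A minimal sketch of the prior P(K) = 2−L(K). The program names and bit lengths below are made up for illustration (real Kolmogorov complexities are uncomputable):

```python
# Hypothetical lengths in bits of three candidate programs that
# reproduce an observed bit string (names are illustrative only).
program_lengths = {"all_ones": 5, "alternating": 7, "table_lookup": 20}

# Prior probability of each program: P(K) = 2^-L(K).
priors = {k: 2.0 ** -L for k, L in program_lengths.items()}

# Shorter programs get exponentially larger prior probability.
assert priors["all_ones"] > priors["alternating"] > priors["table_lookup"]
```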


Considering all theories

The Greek philosopher Epicurus is quoted as saying “If more than one theory is consistent with the observations, keep all theories”.[5]

As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits.

Programs that are already longer than n have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2−n.

Each program that produces the sequence of bits, but is shorter than n, is a theory/pattern about the bits with a probability of 2−k where k is the length of the program.

The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x.[6][7][8]

Universal priors

The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be factorized.

Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number.

As a thought experiment, an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior ensures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant.

So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, either using a natural language or another, the language has encoded in it our prior expectations. So some reliance on prior probabilities is inevitable.

A problem arises where an intelligent agent’s prior expectations interact with the environment to form a self-reinforcing feedback loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem.

12.1.3 Universal artificial intelligence

The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence.[9]

It is a fundamental theory of intelligence, which optimizes the agent’s behavior in,

• Exploring the environment; performing actions to get responses that broaden the agent’s knowledge.

• Competing or co-operating with another agent; games.

• Balancing short and long term rewards.

In general no agent will always provide the best actions in all situations. A particular choice made by an agent may be wrong, and the environment may provide no way for the agent to recover from an initial bad choice. However, the agent is Pareto optimal in the sense that no other agent will do better than this agent in this environment, without doing worse in another environment. No other agent may, in this sense, be said to be better.

At present the theory is limited by incomputability (the halting problem). Approximations may be used to avoid this. Processing speed and combinatorial explosion remain the primary limiting factors for artificial intelligence.


12.2 Probability

Probability is the representation of uncertain or partial knowledge about the truth of statements. Probabilities are subjective and personal estimates of likely outcomes based on past experience and inferences made from the data.

This description of probability may seem strange at first. In natural language we refer to “the probability” that the sun will rise tomorrow. We do not refer to “your probability” that the sun will rise. But in order for inference to be correctly modeled, probability must be personal, and the act of inference generates new posterior probabilities from prior probabilities.

Probabilities are personal because they are conditional on the knowledge of the individual. Probabilities are subjective because they always depend, to some extent, on prior probabilities assigned by the individual. Subjective should not be taken here to mean vague or undefined.

The term intelligent agent is used to refer to the holder of the probabilities. The intelligent agent may be a human or a machine. If the intelligent agent does not interact with the environment, then the probability will converge over time to the frequency of the event.

If however the agent uses the probability to interact with the environment, there may be feedback, so that two agents in the identical environment, starting with only slightly different priors, end up with completely different probabilities. In this case optimal decision theory, as in Marcus Hutter’s universal artificial intelligence, will give Pareto optimal performance for the agent. This means that no other intelligent agent could do better in one environment without doing worse in another environment.

12.2.1 Comparison to deductive probability

In deductive probability theories, probabilities are absolutes, independent of the individual making the assessment.But deductive probabilities are based on,

• Shared knowledge.

• Assumed facts, that should be inferred from the data.

For example, in a trial the participants are aware of the outcomes of all the previous history of trials. They also assume that each outcome is equally probable. Together this allows a single unconditional value of probability to be defined.

But in reality each individual does not have the same information. And in general the probability of each outcome is not equal. The dice may be loaded, and this loading needs to be inferred from the data.

12.2.2 Probability as estimation

The principle of indifference has played a key role in probability theory. It says that if N statements are symmetric, so that one condition cannot be preferred over another, then all statements are equally probable.[10]

Taken seriously, in evaluating probability this principle leads to contradictions. Suppose there are 3 bags of gold in the distance and you are asked to select one. Because of the distance you can’t see the bag sizes. You estimate using the principle of indifference that each bag has equal amounts of gold: each bag has one third of the gold.

Now, while you are not looking, I take one of the bags and divide it into 3 bags. Now there are 5 bags of gold. The principle of indifference now says each bag has one fifth of the gold. A bag that was estimated to have one third of the gold is now estimated to have one fifth of the gold.

Taken as a value associated with the bag, the values are different and therefore contradictory. But taken as an estimate given under a particular scenario, both values are separate estimates given under different circumstances, and there is no reason to believe they are equal.

Estimates of prior probabilities are particularly suspect. Estimates will be constructed that do not follow any consistent frequency distribution. For this reason prior probabilities are considered as estimates of probabilities rather than probabilities.

A full theoretical treatment would associate with each probability,

• The statement


• Prior knowledge

• Prior probabilities

• The estimation procedure used to give the probability.

12.2.3 Combining probability approaches

Inductive probability combines two different approaches to probability.

• Probability and information

• Probability and frequency

Each approach gives a slightly different viewpoint. Information theory is used in relating probabilities to quantities of information. This approach is often used in giving estimates of prior probabilities.

Frequentist probability defines probabilities as objective statements about how often an event occurs. This approach may be stretched by defining the trials to be over possible worlds. Statements about possible worlds define events.

12.3 Probability and information

Whereas logic represents only two values, true and false, as the values of a statement, probability associates a number between 0.0 and 1.0 with each statement. If the probability of a statement is 0, the statement is false. If the probability of a statement is 1, the statement is true.

In considering some data as a string of bits, the prior probabilities of a 1 and a 0 are equal. Therefore each extra bit halves the probability of a sequence of bits. This leads to the conclusion that,

P (x) = 2−L(x)

Where

• P (x) is the probability of a string of bits x

• L(x) is the length of the string of bits x.

• 2−L(x) means 1 divided by 2 to the power of the length of the string of bits x.

The prior probability of any statement is calculated from the number of bits needed to state it. See also information theory.
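A one-line sketch of P(x) = 2−L(x); the bit strings are arbitrary examples:

```python
def prior(bits: str) -> float:
    # P(x) = 2^-L(x): each additional bit halves the prior probability.
    return 2.0 ** -len(bits)

assert prior("1") == 0.5
assert prior("10") == 0.25
# Appending one bit halves the probability.
assert prior("101") == prior("10") / 2
```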

12.3.1 Combining information

Two statements A and B may be represented by two separate encodings. Then the length of the encoding is,

L(A ∧B) = L(A) + L(B)

or in terms of probability,

P (A ∧B) = P (A)P (B)

But this law is not always true, because there may be a shorter method of encoding B if we assume A. So the above probability law applies only if A and B are “independent”.


12.3.2 The internal language of information

The primary use of the information approach to probability is to provide estimates of the complexity of statements. Recall that Occam’s razor states that “all things being equal, the simplest theory is the most likely to be correct”. In order to apply this rule, first there needs to be a definition of what “simplest” means. Information theory defines simplest to mean having the shortest encoding.

Knowledge is represented as statements. Each statement is a Boolean expression. Expressions are encoded by a function that takes a description (as against the value) of the expression and encodes it as a bit string.

The length of the encoding of a statement gives an estimate of the probability of a statement. This probability estimate will often be used as the prior probability of a statement.

Technically this estimate is not a probability, because it is not constructed from a frequency distribution. The probability estimates given by it do not always obey the law of total probability. Applying the law of total probability to various scenarios will usually give a more accurate estimate of the prior probability than the estimate from the length of the statement.

Encoding expressions

An expression is constructed from sub-expressions,

• Constants (including function identifiers).

• Application of functions.

• Quantifiers.

A Huffman code must distinguish the 3 cases. The length of each code is based on the frequency of each type of sub-expression.

Initially, constants are all assigned the same length/probability. Later, constants may be assigned a probability using the Huffman code, based on the number of uses of the function id in all expressions recorded so far. In using a Huffman code the goal is to estimate probabilities, not to compress the data.

The length of a function application is the length of the function identifier constant plus the sum of the sizes of the expressions for each parameter.

The length of a quantifier is the length of the expression being quantified over.
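A possible sketch of building such a code. The frequency counts and the helper `huffman_lengths` are hypothetical illustrations, not taken from the text; the point is only that more frequent kinds of sub-expression receive shorter codes:

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    # Build a Huffman tree and return the code length (in bits) per symbol.
    tick = count()  # tie-breaker so heap never compares dicts
    heap = [(f, next(tick), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, next(tick), merged))
    return heap[0][2]

# Hypothetical counts of the three kinds of sub-expression.
freqs = {"constant": 50, "application": 30, "quantifier": 5}
lengths = huffman_lengths(freqs)

# The most frequent kind gets the shortest code.
assert lengths["constant"] <= lengths["application"] <= lengths["quantifier"]
```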

Distribution of numbers

No explicit representation of natural numbers is given. However, natural numbers may be constructed by applying the successor function to 0, and then applying other arithmetic functions. A distribution of natural numbers is implied by this, based on the complexity of constructing each number.

Rational numbers are constructed by the division of natural numbers. The simplest representation has no common factors between the numerator and the denominator. This allows the probability distribution of natural numbers to be extended to rational numbers.

12.4 Probability and frequency

The probability of an event may be interpreted as the frequency of outcomes where the statement is true, divided by the total number of outcomes. Technically, the outcomes may form a continuum, in which case the frequency must be replaced with a measure.

Events are sets of outcomes. Statements may be related to events. A Boolean statement B about outcomes defines a set of outcomes b,

b = {x : B(x)}


12.4.1 Conditional probability

Each probability is always associated with the state of knowledge at a particular point in the argument. Probabilities before an inference are known as prior probabilities, and probabilities after are known as posterior probabilities.

Probability depends on the facts known. The truth of a fact limits the domain of outcomes to the outcomes consistent with the fact. Prior probabilities are the probabilities before a fact is known. Posterior probabilities are after a fact is known. The posterior probabilities are said to be conditional on the fact. Conditional probabilities are written,

P (B | A)

This means the probability that B is true given that A is true.

All probabilities are in some sense conditional. The prior probability of B is,

P (B) = P (B | true)

12.4.2 The frequentist approach applied to possible worlds

In the frequentist approach, probabilities are defined as the ratio of the number of outcomes within an event to the total number of outcomes. In the possible world model each possible world is an outcome, and statements about possible worlds define events. The probability of a statement being true is the number of possible worlds in which the statement is true, divided by the total number of worlds.

The total number of worlds may be infinite. In this case, instead of counting the elements of the set, a measure must be used. In general the cardinality |S|, where S is a set, is a measure.

The probability of a statement A being true about possible worlds is then,

P(A) = |{x : A(x)}| / |{x : true}|

For a conditional probability.

P(B | A) = |{x : A(x) ∧ B(x)}| / |{x : A(x)}|

then

P(A ∧ B)

= |{x : A(x) ∧ B(x)}| / |{x : true}|

= (|{x : A(x) ∧ B(x)}| / |{x : A(x)}|) · (|{x : A(x)}| / |{x : true}|)

= P(A) · P(B | A)

Using symmetry this equation may be written out as Bayes’ law.

P (A ∧B) = P (A)P (B | A) = P (B)P (A | B)

This law describes the relationship between prior and posterior probabilities when new facts are learnt.

Written as quantities of information, Bayes’ theorem becomes,
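Bayes’ law can be checked by brute-force counting over a toy set of possible worlds. The events A and B below are arbitrary examples chosen for illustration:

```python
from itertools import product

# Possible worlds: all assignments to three coin flips.
worlds = list(product([0, 1], repeat=3))

def P(pred):
    # Frequentist probability: worlds satisfying pred / total worlds.
    return sum(1 for w in worlds if pred(w)) / len(worlds)

def P_cond(pred, given):
    # Conditional probability: restrict to worlds where `given` holds.
    sub = [w for w in worlds if given(w)]
    return sum(1 for w in sub if pred(w)) / len(sub)

A = lambda w: w[0] == 1            # "the first flip is heads"
B = lambda w: sum(w) >= 2          # "at least two heads"
AandB = lambda w: A(w) and B(w)

# Bayes' law: P(A ∧ B) = P(A) P(B | A) = P(B) P(A | B)
assert abs(P(AandB) - P(A) * P_cond(B, A)) < 1e-9
assert abs(P(AandB) - P(B) * P_cond(A, B)) < 1e-9
```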


L(A ∧B) = L(A) + L(B | A) = L(B) + L(A | B)

Two statements A and B are said to be independent if knowing the truth of A does not change the probability of B.Mathematically this is,

P (B) = P (B | A)

then Bayes’ Theorem reduces to,

P (A ∧B) = P (A)P (B)

12.4.3 The law of total probability

For a set of mutually exclusive possibilities Ai , the sum of the posterior probabilities must be 1.

∑i P(Ai | B) = 1

Substituting using Bayes’ theorem gives the law of total probability

∑i P(B | Ai) P(Ai) = ∑i P(Ai | B) P(B)

P(B) = ∑i P(B | Ai) P(Ai)

This result is used to give the extended form of Bayes’ theorem,

P(Ai | B) = P(B | Ai) P(Ai) / ∑j P(B | Aj) P(Aj)

This is the usual form of Bayes’ theorem used in practice, because it guarantees the sum of all the posterior probabilities for Ai is 1.
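A short sketch of the extended form, with hypothetical priors and likelihoods for three mutually exclusive possibilities:

```python
def posterior(priors, likelihoods):
    # Extended form of Bayes' theorem: normalize prior × likelihood
    # so the posteriors sum to 1.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical mutually exclusive possibilities A1, A2, A3:
# priors P(Ai) and likelihoods P(B | Ai) are illustrative values.
post = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.4])

# The posteriors sum to 1 by construction.
assert abs(sum(post) - 1.0) < 1e-9
```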

12.4.4 Alternate possibilities

For mutually exclusive possibilities, the probabilities add.

P (A ∨B) = P (A) + P (B) if P (A ∧B) = 0

Using

A ∨B = (A ∧ ¬(A ∧B)) ∨ (B ∧ ¬(A ∧B)) ∨ (A ∧B)

Then the alternatives

A ∧ ¬(A ∧B)

B ∧ ¬(A ∧B)


A ∧B

are all mutually exclusive. Also,

(A ∧ ¬(A ∧B)) ∨ (A ∧B) = A

P (A ∧ ¬(A ∧B)) + P (A ∧B) = P (A)

P (A ∧ ¬(A ∧B)) = P (A)− P (A ∧B)

so, putting it all together,

P (A ∨B)

= P ((A ∧ ¬(A ∧B)) ∨ (B ∧ ¬(A ∧B)) ∨ (A ∧B))

= P(A ∧ ¬(A ∧ B)) + P(B ∧ ¬(A ∧ B)) + P(A ∧ B)

= P (A)− P (A ∧B) + P (B)− P (A ∧B) + P (A ∧B)

= P (A) + P (B)− P (A ∧B)
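The same identity, P(A ∨ B) = P(A) + P(B) − P(A ∧ B), can be confirmed by counting possible worlds (a toy example; the events are chosen arbitrarily):

```python
from itertools import product

# Possible worlds: all assignments to four coin flips.
worlds = list(product([0, 1], repeat=4))

def P(pred):
    # Frequentist probability over the finite set of worlds.
    return sum(1 for w in worlds if pred(w)) / len(worlds)

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1

lhs = P(lambda w: A(w) or B(w))
rhs = P(A) + P(B) - P(lambda w: A(w) and B(w))
assert abs(lhs - rhs) < 1e-9
```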

12.4.5 Negation

As,

A ∨ ¬A = true

then

P (A) + P (¬A) = 1

12.4.6 Implication and conditional probability

Implication is related to conditional probability by the following equation,

A → B ⇐⇒ P (B | A) = 1

Derivation,

A → B

⇐⇒ P (A → B) = 1

⇐⇒ P (A ∧B ∨ ¬A) = 1

⇐⇒ P (A ∧B) + P (¬A) = 1

⇐⇒ P (A ∧B) = P (A)

⇐⇒ P (A) · P (B | A) = P (A)

⇐⇒ P (B | A) = 1


12.5 Bayesian hypothesis testing

Bayes’ theorem may be used to estimate the probability of a hypothesis or theory H, given some facts F. The posterior probability of H is then

P(H | F) = P(H) P(F | H) / P(F)

or in terms of information,

P (H | F ) = 2−(L(H)+L(F |H)−L(F ))

By assuming the hypothesis is true, a simpler representation of the statement F may be given. The length of the encoding of this simpler representation is L(F | H).

L(H) + L(F | H) represents the amount of information needed to represent the facts F, if H is true. L(F) is the amount of information needed to represent F without the hypothesis H. The difference is how much the representation of the facts has been compressed by assuming that H is true. This is the evidence that the hypothesis H is true.

If L(F) is estimated from the encoding length, then the probability obtained will not be between 0 and 1. The value obtained is proportional to the probability, without being a good probability estimate. The number obtained is sometimes referred to as a relative probability, being how much more probable the theory is than not holding the theory.

If a full set of mutually exclusive hypotheses that provide evidence is known, a proper estimate may be given for the prior probability P(F).
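A sketch of the relative probability computed from encoding lengths; the lengths below are invented for illustration:

```python
def relative_probability(L_H, L_F_given_H, L_F):
    # 2^-(L(H) + L(F|H) - L(F)): how much assuming H compresses
    # the encoding of the facts F.
    return 2.0 ** -(L_H + L_F_given_H - L_F)

# Hypothetical encoding lengths in bits (illustrative values only).
L_H, L_F_given_H, L_F = 10, 25, 40
r = relative_probability(L_H, L_F_given_H, L_F)

# Assuming H saves 40 - (10 + 25) = 5 bits, so the theory is
# 2^5 = 32 times more probable in this relative sense.
assert r == 32.0
```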

12.5.1 Set of hypotheses

Probabilities may be calculated from the extended form of Bayes’ theorem. Given all mutually exclusive hypotheses Hi which give evidence, such that,

L(Hi) + L(F | Hi) < L(F )

and also the hypothesis R, that none of the hypotheses is true, then,

P(Hi | F) = P(Hi) P(F | Hi) / (P(F | R) + ∑j P(Hj) P(F | Hj))

P(R | F) = P(F | R) / (P(F | R) + ∑j P(Hj) P(F | Hj))

In terms of information,

P(Hi | F) = 2−(L(Hi)+L(F|Hi)) / (2−L(F|R) + ∑j 2−(L(Hj)+L(F|Hj)))

P(R | F) = 2−L(F|R) / (2−L(F|R) + ∑j 2−(L(Hj)+L(F|Hj)))

In most situations it is a good approximation to assume that F is independent of R,

P (F | R) = P (F )

giving,


P(Hi | F) ≈ 2−(L(Hi)+L(F|Hi)) / (2−L(F) + ∑j 2−(L(Hj)+L(F|Hj)))

P(R | F) ≈ 2−L(F) / (2−L(F) + ∑j 2−(L(Hj)+L(F|Hj)))

12.6 Boolean inductive inference

Abductive inference[11][12][13][14] starts with a set of facts F, which is a statement (Boolean expression). Abductive reasoning is of the form,

A theory T implies the statement F. As the theory T is simpler than F, abduction says that there is a probability that the theory T is implied by F.

The theory T, also called an explanation of the condition F, is an answer to the ubiquitous factual “why” question. For example, for the condition F, “Why do apples fall?", the answer is a theory T that implies that apples fall;

F = G m1 m2 / r²

Inductive inference is of the form,

All observed objects in a class C have a property P. Therefore there is a probability that all objects in aclass C have a property P.

In terms of abductive inference, “all objects in a class C have a property P” is a theory that implies the observed condition, “all observed objects in a class C have a property P”.

So inductive inference is a special case of abductive inference. In common usage the term inductive inference is often used to refer to both abductive and inductive inference.

12.6.1 Generalization and specialization

Inductive inference is related to generalization. Generalizations may be formed from statements by replacing a specific value with membership of a category, or by replacing membership of a category with membership of a broader category. In deductive logic, generalization is a powerful method of generating new theories that may be true. In inductive inference, generalization generates theories that have a probability of being true.

The opposite of generalization is specialization. Specialization is used in applying a general rule to a specific case. Specializations are created from generalizations by replacing membership of a category by a specific value, or by replacing a category with a sub-category.

The Linnaean classification of living things and objects forms the basis for generalization and specialization. The ability to identify, recognize and classify is the basis for generalization. Perceiving the world as a collection of objects appears to be a key aspect of human intelligence. It is the object oriented model, in the non computer science sense.

The object oriented model is constructed from our perception. In particular, vision is based on the ability to compare two images and calculate how much information is needed to morph or map one image into another. Computer vision uses this mapping to construct 3D images from stereo image pairs.

Inductive logic programming is a means of constructing a theory that implies a condition. Plotkin’s[15][16] "relative least general generalization (rlgg)" approach constructs the simplest generalization consistent with the condition.

12.6.2 Newton’s use of induction

Isaac Newton used inductive arguments in constructing his law of universal gravitation.[17] Starting with the statement,


• The center of an apple falls towards the center of the earth.

Generalizing by replacing apple with object, and earth with another object gives, in a two body system,

• The center of an object falls towards the center of another object.

The theory explains all objects falling, so there is strong evidence for it. The second observation,

• The planets appear to follow an elliptical path.

After some complicated mathematical calculus, it can be seen that if the acceleration follows the inverse square law then objects will follow an ellipse. So induction gives evidence for the inverse square law.

Using Galileo’s observation that all objects drop with the same acceleration,

F1 = m1 a1 = (m1 k1 / r²) i1

F2 = m2 a2 = (m2 k2 / r²) i2

where i1 and i2 are unit vectors towards the center of the other object. Then using Newton’s third law, F1 = −F2,

F = G m1 m2 / r²

12.6.3 Probabilities for inductive inference

Implication determines conditional probability as,

T → F ⇐⇒ P (F | T ) = 1

So,

P (F | T ) = 1

L(F | T ) = 0

This result may be used in the probabilities given for Bayesian hypothesis testing. For a single theory, H = T and,

P(T | F) = P(T) / P(F)

or in terms of information, the relative probability is,

P (T | F ) = 2−(L(T )−L(F ))

Note that this estimate for P(T | F) is not a true probability. If L(Ti) < L(F) then the theory has evidence to support it. Then for a set of theories Ti = Hi, such that L(Ti) < L(F),

P(Ti | F) = P(Ti) / (P(F | R) + ∑j P(Tj))


P(R | F) = P(F | R) / (P(F | R) + ∑j P(Tj))

giving,

P(Ti | F) ≈ 2−L(Ti) / (2−L(F) + ∑j 2−L(Tj))

P(R | F) ≈ 2−L(F) / (2−L(F) + ∑j 2−L(Tj))

12.7 Derivations

12.7.1 Derivation of inductive probability

Make a list of all the shortest programs Ki that each produce a distinct infinite string of bits, and satisfy the relation,

Tn(R(Ki)) = x

where,

R(Ki) is the result of running the program Ki.

Tn truncates the string after n bits.

The problem is to calculate the probability that the source is produced by program Ki, given that the truncated source after n bits is x. This is represented by the conditional probability,

P (s = R(Ki) | Tn(s) = x)

Using the extended form of Bayes’ theorem

P(Ai | B) = P(B | Ai) P(Ai) / ∑j P(B | Aj) P(Aj)

where,

B = (Tn(s) = x)

Ai = (s = R(Ki))

The extended form relies on the law of total probability. This means that the Ai must be distinct possibilities, which is given by the condition that each Ki produces a different infinite string. Also, one of the conditions Ai must be true. This must be true, as in the limit as n tends to infinity, there is always at least one program that produces Tn(s).

Then using the extended form and substituting for B and Ai gives,

P(s = R(Ki) | Tn(s) = x) = P(Tn(s) = x | s = R(Ki)) P(s = R(Ki)) / ∑_j P(Tn(s) = x | s = R(Kj)) P(s = R(Kj))

As Ki are chosen so that Tn(R(Ki)) = x , then,


P (Tn(s) = x | s = R(Ki)) = 1

The a priori probability of the string being produced from the program, given no information about the string, is based on the size of the program,

P (s = R(Ki)) = 2−I(Ki)

giving,

P(s = R(Ki) | Tn(s) = x) = 2^−I(Ki) / ∑_j 2^−I(Kj)

Programs that are the same length as or longer than x provide no predictive power. Separating them out gives,

P(s = R(Ki) | Tn(s) = x) = 2^−I(Ki) / (∑_{j: I(Kj) < n} 2^−I(Kj) + ∑_{j: I(Kj) ≥ n} 2^−I(Kj))

Then identify the two probabilities as,

Probability that x has a pattern = ∑_{j: I(Kj) < n} 2^−I(Kj)

The opposite of this,

Probability that x is a random set of bits = ∑_{j: I(Kj) ≥ n} 2^−I(Kj)

But the prior probability that x is a random set of bits is 2−n . So,

P(s = R(Ki) | Tn(s) = x) = 2^−I(Ki) / (2^−n + ∑_{j: I(Kj) < n} 2^−I(Kj))

The probability that the source is random, or unpredictable is,

P(random(s) | Tn(s) = x) = 2^−n / (2^−n + ∑_{j: I(Kj) < n} 2^−I(Kj))
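As a numeric illustration (the lengths below are invented for the sketch, not taken from the text), the two formulas above can be evaluated directly:

```python
# Suppose the observed prefix x has n = 16 bits, and we have (hypothetically)
# found two short programs, of 6 and 9 bits, whose output begins with x.
# Programs of length >= n are folded into the random case, whose prior is 2^-n.
n = 16
program_lengths = [6, 9]  # I(K_j) for the j with I(K_j) < n

denom = 2.0 ** -n + sum(2.0 ** -l for l in program_lengths)
p_program = [2.0 ** -l / denom for l in program_lengths]  # P(s = R(K_i) | Tn(s) = x)
p_random = 2.0 ** -n / denom                              # P(random(s) | Tn(s) = x)

print(p_program, p_random)  # the shortest program dominates
```

Shorter programs always receive more weight, and the random term 2^−n becomes negligible as soon as any program much shorter than n reproduces x.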

12.7.2 A model for inductive inference

A model of how worlds are constructed is used in determining the probabilities of theories,

• A random bit string is selected.

• A condition is constructed from the bit string.

• A world is constructed that is consistent with the condition.


If w is the bit string then the world is created such that R(w) is true. An intelligent agent has some facts about the world, represented by the bit string c, which gives the condition,

C = R(c)

The set of bit strings whose condition is equivalent to x is E(x),

∀x,E(x) = {w : R(w) ≡ x}

A theory is a simpler condition that explains (or implies) C. The set of all such theories is called T,

T (C) = {t : t → C}

Applying Bayes’ theorem

The extended form of Bayes' theorem may be applied,

P(Ai | B) = P(B | Ai) P(Ai) / ∑_j P(B | Aj) P(Aj)

where,

B = E(C)

Ai = E(t)

To apply Bayes’ theorem the following must hold,

• Ai is a partition of the event space.

For T(C) to be a partition, no bit string n may belong to two theories. To prove this, assume that it can and derive a contradiction,

N ∈ T ∧ M ∈ T ∧ N ≠ M ∧ n ∈ E(N) ∧ n ∈ E(M)

⟹ N ≠ M ∧ R(n) ≡ N ∧ R(n) ≡ M

⟹ false

Secondly, prove that T includes all outcomes consistent with the condition. As all theories consistent with C are included, R(w) must be in this set.

So Bayes' theorem may be applied as specified, giving,

∀t ∈ T(C), P(E(t) | E(C)) = P(E(t)) · P(E(C) | E(t)) / ∑_{j ∈ T(C)} P(E(j)) · P(E(C) | E(j))

Using the implication and conditional probability law, the definition of T(C) implies,

∀t ∈ T (C), P (E(C) | E(t)) = 1


The probability of each theory in T is given by,

∀t ∈ T(C), P(E(t)) = ∑_{n: R(n) ≡ t} 2^−L(n)

so,

∀t ∈ T(C), P(E(t) | E(C)) = ∑_{n: R(n) ≡ t} 2^−L(n) / ∑_{j ∈ T(C)} ∑_{m: R(m) ≡ j} 2^−L(m)

Finally, the probabilities of the events may be identified with the probabilities of the condition which the outcomes in the event satisfy,

∀t ∈ T (C), P (E(t) | E(C)) = P (t | C)

giving

∀t ∈ T(C), P(t | C) = ∑_{n: R(n) ≡ t} 2^−L(n) / ∑_{j ∈ T(C)} ∑_{m: R(m) ≡ j} 2^−L(m)

This is the probability of the theory t after observing that the condition C holds.
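The whole derivation can be exercised on a toy example. Everything here is hypothetical: the mapping R, the theory names, and the implication set are invented purely to make the sums concrete:

```python
# Toy illustration of P(t | C): bit strings encode conditions via a made-up
# mapping R, and each string n carries prior weight 2^-L(n).
R = {
    "0":   "t1",  # 1-bit encoding of theory t1
    "10":  "t1",  # a longer, equivalent encoding of t1
    "11":  "t2",
    "010": "C",   # an encoding of the raw condition itself
}

def P_E(t):
    """P(E(t)) = sum of 2^-L(n) over strings n whose condition is equivalent to t."""
    return sum(2.0 ** -len(n) for n, cond in R.items() if cond == t)

# Theories that explain (imply) C; here the implication set is simply asserted,
# with C trivially implying itself.
T_C = ["t1", "t2", "C"]

total = sum(P_E(j) for j in T_C)
posterior = {t: P_E(t) / total for t in T_C}

print(posterior)  # the theory with the shortest encodings gets the most weight
```

The posterior favors t1 because it has the shortest encodings, which is exactly the Occam's-razor behavior the derivation formalizes.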

Removing theories without predictive power

Theories that are less probable than the condition C have no predictive power. Separating them out gives,

∀t ∈ T(C), P(t | C) = P(E(t)) / ((∑_{j ∈ T(C) ∧ P(E(j)) > P(E(C))} P(E(j))) + (∑_{j ∈ T(C) ∧ P(E(j)) ≤ P(E(C))} P(E(j))))

The probability of the theories without predictive power on C is the same as the probability of C. So,

P(E(C)) = ∑_{j ∈ T(C) ∧ P(E(j)) ≤ P(E(C))} P(E(j))

So the probability is,

∀t ∈ T(C), P(t | C) = P(E(t)) / (P(E(C)) + ∑_{j ∈ T(C) ∧ P(E(j)) > P(E(C))} P(E(j)))

and the probability of no prediction for C, written as random(C) ,

P(random(C) | C) = P(E(C)) / (P(E(C)) + ∑_{j ∈ T(C) ∧ P(E(j)) > P(E(C))} P(E(j)))

The probability of a condition was given as,

∀t, P(E(t)) = ∑_{n: R(n) ≡ t} 2^−L(n)


Bit strings for theories that are more complex than the bit string given to the agent as input have no predictive power. Their probabilities are better included in the random case. To implement this, a new definition F is given,

∀t, P(F(t, c)) = ∑_{n: R(n) ≡ t ∧ L(n) < L(c)} 2^−L(n)

Using F, an improved version of the abductive probabilities is,

∀t ∈ T(C), P(t | C) = P(F(t, c)) / (P(F(C, c)) + ∑_{j ∈ T(C) ∧ P(F(j, c)) > P(F(C, c))} P(F(j, c)))

P(random(C) | C) = P(F(C, c)) / (P(F(C, c)) + ∑_{j ∈ T(C) ∧ P(F(j, c)) > P(F(C, c))} P(F(j, c)))

12.8 Key people

• William of Ockham

• Thomas Bayes

• Ray Solomonoff

• Andrey Kolmogorov

• Chris Wallace

• D. M. Boulton

• Jorma Rissanen

• Marcus Hutter

12.9 See also

• Abductive reasoning

• Algorithmic probability

• Algorithmic information theory

• Bayesian inference

• Information theory

• Inductive inference

• Inductive logic programming

• Inductive reasoning

• Learning

• Minimum message length

• Minimum description length

• Occam’s razor

• Solomonoff’s theory of inductive inference

• Universal artificial intelligence


12.10 References

[1] Wallace, Chris; Boulton, D. M. (1968). "An information measure for classification". Computer Journal 11 (2): 185–194.

[2] Rissanen, J. (1978). "Modeling by shortest data description". Automatica 14 (5): 465–471. doi:10.1016/0005-1098(78)90005-5.

[3] Allison, Lloyd. “Minimum Message Length (MML) – LA’s MML introduction”.

[4] Oliver, J. J.; Baxter, Rohan A. "MML and Bayesianism: Similarities and Differences (Introduction to Minimum Encoding Inference – Part II)".

[5] Li, M. and Vitanyi, P., An Introduction to Kolmogorov Complexity and Its Applications, 3rd Edition, Springer Science and Business Media, N.Y., 2008, p. 347

[6] Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma., Feb 4, 1960, revision, Nov. 1960.

[7] Solomonoff, R., "A Formal Theory of Inductive Inference, Part I", Information and Control, Vol 7, No. 1, pp 1–22, March 1964.

[8] Solomonoff, R., "A Formal Theory of Inductive Inference, Part II", Information and Control, Vol 7, No. 2, pp 224–254, June 1964.

[9] Hutter, Marcus (1998). Sequential Decisions Based on Algorithmic Probability. Springer. ISBN 3-540-22139-5.

[10] Carnap, Rudolf. “STATISTICAL AND INDUCTIVE PROBABILITY”.

[11] “Abduction”.

[12] Pfeifer, Niki; Kleiter, Gernot D. (2006). "INFERENCE IN CONDITIONAL PROBABILITY LOGIC". Kybernetika 42 (4): 391–404.

[13] “Conditional Probability”. Artificial Intelligence - Foundations of computational agents.

[14] “Introduction to the theory of Inductive Logic Programming (ILP)".

[15] Plotkin, Gordon D. (1970). Meltzer, B.; Michie, D., eds. "A Note on Inductive Generalization". Machine Intelligence (Edinburgh University Press) 5: 153–163.

[16] Plotkin, Gordon D. (1971). Meltzer, B.; Michie, D., eds. "A Further Note on Inductive Generalization". Machine Intelligence (Edinburgh University Press) 6: 101–124.

[17] Isaac Newton: "In [experimental] philosophy particular propositions are inferred from the phenomena and afterwards rendered general by induction": "Principia", Book 3, General Scholium, at p. 392 in Volume 2 of Andrew Motte's English translation published 1729.

12.11 External links

• Rathmanner, S and Hutter, M., "A Philosophical Treatise of Universal Induction" in Entropy 2011, 13, 1076–1136: A very clear philosophical and mathematical analysis of Solomonoff's Theory of Inductive Inference.

• C.S. Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer-Verlag (Information Science and Statistics), ISBN 0-387-23795-X, May 2005 – chapter headings, table of contents and sample pages.


Chapter 13

Inference

Inference is the act or process of deriving logical conclusions from premises known or assumed to be true.[1] The conclusion drawn is also called an inference. The laws of valid inference are studied in the field of logic.

Alternatively, inference may be defined as the non-logical, but rational, means of indirectly seeing new meanings and contexts for understanding through observation of patterns of facts. Of particular use to this application of inference are anomalies and symbols. Inference, in this sense, does not draw conclusions but opens new paths for inquiry. (See the second set of examples below.) In this definition of inference, there are two types of inference: inductive inference and deductive inference. Unlike the definition in the first paragraph above, word meanings are not tested but meaningful relationships are articulated.

Human inference (i.e. how humans draw conclusions) is traditionally studied within the field of cognitive psychology; artificial intelligence researchers develop automated inference systems to emulate human inference.

Statistical inference uses mathematics to draw conclusions in the presence of uncertainty. This generalizes deterministic reasoning, with the absence of uncertainty as a special case. Statistical inference uses quantitative or qualitative (categorical) data which may be subject to random variation.

13.1 Examples

Greek philosophers defined a number of syllogisms, correct three-part inferences, that can be used as building blocks for more complex reasoning. We begin with a famous example:

1. All men are mortal

2. Socrates is a man

3. Therefore, Socrates is mortal.

The reader can check that the premises and conclusion are true, but logic is concerned with inference: does the truth of the conclusion follow from that of the premises?

The validity of an inference depends on the form of the inference. That is, the word "valid" does not refer to the truth of the premises or the conclusion, but rather to the form of the inference. An inference can be valid even if the parts are false, and can be invalid even if some parts are true. But a valid form with true premises will always have a true conclusion.

For example, consider the form of the following symbolic argument:

1. All meat comes from animals.

2. Beef is a type of meat.

3. Therefore, beef comes from an animal.


If the premises are true, then the conclusion is necessarily true, too.

Now we turn to an invalid form.

1. All A are B.

2. C is a B.

3. Therefore, C is an A.

To show that this form is invalid, we demonstrate how it can lead from true premises to a false conclusion.

1. All apples are fruit. (Correct)

2. Bananas are fruit. (Correct)

3. Therefore, bananas are apples. (Wrong)

A valid argument with false premises may lead to a false conclusion:

1. All tall people are Greek.

2. John Lennon was tall.

3. Therefore, John Lennon was Greek. (wrong)

When a valid argument is used to derive a false conclusion from false premises, the inference is valid because it follows the form of a correct inference.

A valid argument can also be used to derive a true conclusion from false premises:

1. All tall people are musicians. (wrong)

2. John Lennon was tall. (right)

3. Therefore, John Lennon was a musician. (right)

In this case we have two false premises that imply a true conclusion.

13.1.1 Example for definition #2

Evidence: It is the early 1950s and you are an American stationed in the Soviet Union. You read in the Moscow newspaper that a soccer team from a small city in Siberia starts winning game after game. The team even defeats the Moscow team. Inference: The small city in Siberia is not a small city anymore. The Soviets are working on their own nuclear or high-value secret weapons program.

Knowns: The Soviet Union is a command economy: people and material are told where to go and what to do. The small city was remote and historically had never distinguished itself; its soccer season was typically short because of the weather.

Explanation: In a command economy, people and material are moved where they are needed. Large cities might field good teams due to the greater availability of high quality players; and teams that can practice longer (weather, facilities) can reasonably be expected to be better. In addition, you put your best and brightest in places where they can do the most good, such as on high-value weapons programs. It is an anomaly for a small city to field such a good team. The anomaly (i.e. the soccer scores and great soccer team) indirectly described a condition by which the observer inferred a new meaningful pattern: that the small city was no longer small. Why would you put a large number of your best and brightest in the middle of nowhere? To hide them, of course.

13.2 Incorrect inference

An incorrect inference is known as a fallacy. Philosophers who study informal logic have compiled large lists of them, and cognitive psychologists have documented many biases in human reasoning that favor incorrect reasoning.


13.3 Automatic logical inference

AI systems first provided automated logical inference and these were once extremely popular research topics, leading to industrial applications in the form of expert systems and later business rule engines. More recent work on automated theorem proving has had a stronger basis in formal logic.

An inference system's job is to extend a knowledge base automatically. The knowledge base (KB) is a set of propositions that represent what the system knows about the world. Several techniques can be used by that system to extend the KB by means of valid inferences. An additional requirement is that the conclusions the system arrives at are relevant to its task.

13.3.1 Example using Prolog

Prolog (for "Programming in Logic") is a programming language based on a subset of predicate calculus. Its main job is to check whether a certain proposition can be inferred from a KB (knowledge base) using an algorithm called backward chaining.

Let us return to our Socrates syllogism. We enter into our knowledge base the following piece of code:

mortal(X) :- man(X).
man(socrates).

(Here :- can be read as "if". Generally, if P → Q (if P then Q) then in Prolog we would code Q :- P (Q if P).) This states that all men are mortal and that Socrates is a man. Now we can ask the Prolog system about Socrates:

?- mortal(socrates).

(where ?- signifies a query: can mortal(socrates). be deduced from the KB using the rules?) gives the answer "Yes".

On the other hand, asking the Prolog system the following:

?- mortal(plato).

gives the answer "No".

This is because Prolog does not know anything about Plato, and hence defaults to any property about Plato being false (the so-called closed world assumption). Finally, ?- mortal(X) (is anything mortal?) would result in "Yes" (and in some implementations: "Yes": X=socrates).

Prolog can be used for vastly more complicated inference tasks. See the corresponding article for further examples.
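The same backward-chaining idea can be sketched in a few lines of Python (a toy illustration, not how Prolog is implemented; the tuple-based encoding of facts and rules is invented for this sketch):

```python
# Tiny backward chainer over the Socrates knowledge base.
# Closed-world assumption: goals that cannot be proved are treated as false.
facts = {("man", "socrates")}
rules = [(("mortal", "X"), ("man", "X"))]  # head :- body, single-condition rules

def prove(goal):
    """Return True if goal is a known fact or follows from a rule whose body is provable."""
    if goal in facts:
        return True
    pred, arg = goal
    for (head_pred, _), (body_pred, _) in rules:
        if head_pred == pred:
            # Bind the rule's variable to the queried argument and recurse on the body.
            if prove((body_pred, arg)):
                return True
    return False

print(prove(("mortal", "socrates")))  # True
print(prove(("mortal", "plato")))     # False (closed world)
```

As with the Prolog query, mortal(socrates) succeeds because man(socrates) is a fact, while mortal(plato) fails for lack of any supporting fact.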

13.3.2 Use with the semantic web

Automated reasoners have recently found a new field of application in the Semantic Web. Being based upon description logic, knowledge expressed using one variant of OWL can be logically processed, i.e., inferences can be made upon it.

13.3.3 Bayesian statistics and probability logic

Philosophers and scientists who follow the Bayesian framework for inference use the mathematical rules of probability to find this best explanation. The Bayesian view has a number of desirable features; one of them is that it embeds deductive (certain) logic as a subset (this prompts some writers to call Bayesian probability "probability logic", following E. T. Jaynes).

Bayesians identify probabilities with degrees of belief, with certainly true propositions having probability 1, and certainly false propositions having probability 0. To say that "it's going to rain tomorrow" has a 0.9 probability is to say that you consider the possibility of rain tomorrow as extremely likely.

Through the rules of probability, the probability of a conclusion and of alternatives can be calculated. The best explanation is most often identified with the most probable (see Bayesian decision theory). A central rule of Bayesian inference is Bayes' theorem.

See Bayesian inference for examples.
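A minimal numeric sketch of Bayesian updating (the prior and likelihoods below are invented for illustration):

```python
# Prior belief that it will rain tomorrow, and how likely we are to observe
# dark clouds under each hypothesis (all numbers are hypothetical).
p_rain = 0.3
p_clouds_given_rain = 0.9
p_clouds_given_dry = 0.2

# P(clouds) via the law of total probability.
p_clouds = p_clouds_given_rain * p_rain + p_clouds_given_dry * (1 - p_rain)

# Bayes' theorem: P(rain | clouds) = P(clouds | rain) P(rain) / P(clouds)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds

print(p_rain_given_clouds)  # about 0.659: the evidence raises the 0.3 prior
```

Observing clouds more than doubles the degree of belief in rain, which is exactly the sense in which Bayes' theorem selects the better-supported explanation.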


13.3.4 Nonmonotonic logic[2]

A relation of inference is monotonic if the addition of premises does not undermine previously reached conclusions; otherwise the relation is nonmonotonic. Deductive inference is monotonic: if a conclusion is reached on the basis of a certain set of premises, then that conclusion still holds if more premises are added.

By contrast, everyday reasoning is mostly nonmonotonic because it involves risk: we jump to conclusions from deductively insufficient premises. We know when it is worth or even necessary (e.g. in medical diagnosis) to take the risk. Yet we are also aware that such inference is defeasible; new information may undermine old conclusions. Various kinds of defeasible but remarkably successful inference have traditionally captured the attention of philosophers (theories of induction, Peirce's theory of abduction, inference to the best explanation, etc.). More recently logicians have begun to approach the phenomenon from a formal point of view. The result is a large body of theories at the interface of philosophy, logic and artificial intelligence.

13.4 See also

• Reasoning

• Abductive reasoning

• Deductive reasoning

• Inductive reasoning

• Retroductive reasoning

• Reasoning System

• Entailment

• Epilogism

• Analogy

• Axiom

• Bayesian inference

• Frequentist inference

• Business rule

• Business rules engine

• Expert system

• Fuzzy logic

• Immediate inference

• Inference engine

• Inferential programming

• Inquiry

• Logic

• Logic of information

• Logical assertion

• Logical graph

• Nonmonotonic logic

• Rule of inference


• List of rules of inference

• Theorem

• Transduction (machine learning)

• Sherlock Holmes

13.5 References

[1] http://www.thefreedictionary.com/inference

[2] Fuhrmann, André. Nonmonotonic Logic (PDF). Archived from the original (PDF) on 9 December 2003.

13.6 Further reading

• Hacking, Ian (2011). An Introduction to Probability and Inductive Logic. Cambridge University Press. ISBN 0-521-77501-9.

• Jaynes, Edwin Thompson (2003). Probability Theory: The Logic of Science. Cambridge University Press. ISBN 0-521-59271-2.

• MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1.

• Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2

• Tijms, Henk (2004). Understanding Probability. Cambridge University Press. ISBN 0-521-70172-4.

Inductive inference:

• Carnap, Rudolf; Jeffrey, Richard C., eds. (1971). Studies in Inductive Logic and Probability 1. The University of California Press.

• Jeffrey, Richard C., ed. (1979). Studies in Inductive Logic and Probability 2. The University of California Press.

• Angluin, Dana (1976). An Application of the Theory of Computational Complexity to the Study of Inductive Inference (Ph.D.). University of California at Berkeley.

• Angluin, Dana (1980). "Inductive Inference of Formal Languages from Positive Data" (PDF). Information and Control 45: 117–135. doi:10.1016/s0019-9958(80)90285-5.

• Angluin, Dana; Smith, Carl H. (Sep 1983). "Inductive Inference: Theory and Methods" (PDF). Computing Surveys 15 (3): 237–269. doi:10.1145/356914.356918.

• Gabbay, Dov M.; Hartmann, Stephan; Woods, John, eds. (2009). Inductive Logic. Handbook of the History of Logic 10. Elsevier.

• Goodman, Nelson (1973). Fact, Fiction, and Forecast. Bobbs-Merrill Co. Inc.

Abductive inference:

• O'Rourke, P.; Josephson, J., eds. (1997). Automated abduction: Inference to the best explanation. AAAI Press.

• Psillos, Stathis (2009). Gabbay, Dov M.; Hartmann, Stephan; Woods, John, eds. An Explorer upon Untrodden Ground: Peirce on Abduction (PDF). Handbook of the History of Logic 10. Elsevier. pp. 117–152.

• Ray, Oliver (Dec 2005). Hybrid Abductive Inductive Learning (Ph.D.). University of London, Imperial College. CiteSeerX: 10.1.1.66.1877.


Psychological investigations about human reasoning:

• deductive:

• Johnson-Laird, Philip Nicholas; Byrne, Ruth M. J. (1992). Deduction. Erlbaum.

• Byrne, Ruth M. J.; Johnson-Laird, P. N. (2009). ""If" and the Problems of Conditional Reasoning" (PDF). Trends in Cognitive Science 13 (7): 282–287. doi:10.1016/j.tics.2009.04.003.

• Knauff, Markus; Fangmeier, Thomas; Ruff, Christian C.; Johnson-Laird, P. N. (2003). "Reasoning, Models, and Images: Behavioral Measures and Cortical Activity" (PDF). Journal of Cognitive Neuroscience 15 (4): 559–573. doi:10.1162/089892903321662949.

• Johnson-Laird, Philip N. (1995). Gazzaniga, M. S., ed. Mental Models, Deductive Reasoning, and the Brain (PDF). MIT Press. pp. 999–1008.

• Khemlani, Sangeet; Johnson-Laird, P. N. (2008). "Illusory Inferences about Embedded Disjunctions". Proceedings of the 30th Annual Conference of the Cognitive Science Society. Washington/DC (PDF). pp. 2128–2133.

• statistical:

• McCloy, Rachel; Byrne, Ruth M. J.; Johnson-Laird, Philip N. (2009). "Understanding Cumulative Risk" (PDF). The Quarterly Journal of Experimental Psychology: 18.

• Johnson-Laird, Philip N. (1994). "Mental Models and Probabilistic Thinking" (PDF). Cognition 50: 189–209. doi:10.1016/0010-0277(94)90028-0.

• analogical:

• Burns, B. D. (1996). "Meta-Analogical Transfer: Transfer Between Episodes of Analogical Reasoning". Journal of Experimental Psychology: Learning, Memory, and Cognition 22 (4): 1032–1048. doi:10.1037/0278-7393.22.4.1032.

• spatial:

• Jahn, Georg; Knauff, Markus; Johnson-Laird, P. N. (2007). "Preferred mental models in reasoning about spatial relations" (PDF). Memory & Cognition 35 (8): 2075–2087. doi:10.3758/bf03192939.

• Knauff, Markus; Johnson-Laird, P. N. (2002). "Visual imagery can impede reasoning" (PDF). Memory & Cognition 30 (3): 363–371. doi:10.3758/bf03194937.

• Waltz, James A.; Knowlton, Barbara J.; Holyoak, Keith J.; Boone, Kyle B.; Mishkin, Fred S.; de Menezes Santos, Marcia; Thomas, Carmen R.; Miller, Bruce L. (Mar 1999). "A System for Relational Reasoning in Human Prefrontal Cortex" (PDF). Psychological Science 10 (2): 119–125. doi:10.1111/1467-9280.00118.

• moral:

• Bucciarelli, Monica; Khemlani, Sangeet; Johnson-Laird, P. N. (Feb 2008). "The Psychology of Moral Reasoning" (PDF). Judgment and Decision Making 3 (2): 121–139.

13.7 External links

• Inference at PhilPapers

• Inference at the Indiana Philosophy Ontology Project


Chapter 14

Inference engine

An inference engine is a tool from artificial intelligence. The first inference engines were components of expert systems. The typical expert system consisted of a knowledge base and an inference engine. The knowledge base stored facts about the world. The inference engine applied logical rules to the knowledge base and deduced new knowledge. This process would iterate as each new fact in the knowledge base could trigger additional rules in the inference engine. Inference engines work primarily in one of two modes: forward chaining and backward chaining. Forward chaining starts with the known facts and asserts new facts. Backward chaining starts with goals, and works backward to determine what facts must be asserted so that the goals can be achieved.[1]

14.1 Architecture

The logic that an inference engine uses is typically represented as IF-THEN rules. The general format of such rules is IF <logical expression> THEN <logical expression>. Prior to the development of expert systems and inference engines, artificial intelligence researchers focused on more powerful theorem prover environments that offered much fuller implementations of first-order logic, for example, general statements that included universal quantification (for all X some statement is true) and existential quantification (there exists some X such that some statement is true). What researchers discovered is that the power of these theorem-proving environments was also their drawback. It was far too easy to create logical expressions that could take an indeterminate or even infinite time to terminate. For example, it is common in universal quantification to make statements over an infinite set such as the set of all natural numbers. Such statements are perfectly reasonable and even required in mathematical proofs, but when included in an automated theorem prover executing on a computer they may cause the computer to fall into an infinite loop. Focusing on IF-THEN statements (what logicians call modus ponens) still gave developers a very powerful general mechanism to represent logic, but one that could be used efficiently with computational resources. What is more, there is some psychological research that indicates humans also tend to favor IF-THEN representations when storing complex knowledge.[2]

A simple example of modus ponens often used in introductory logic books is "If you are human then you are mortal". This can be represented in pseudocode as:

Rule1: Human(x) => Mortal(x)

A trivial example of how this rule would be used in an inference engine is as follows. In forward chaining, the inference engine would find any facts in the knowledge base that matched Human(x) and for each fact it found would add the new information Mortal(x) to the knowledge base. So if it found an object called Socrates that was Human it would deduce that Socrates was Mortal. In backward chaining the system would be given a goal, e.g. answer the question: is Socrates Mortal? It would search through the knowledge base and determine if Socrates was Human and if so would assert he is also Mortal. However, in backward chaining a common technique was to integrate the inference engine with a user interface. In that way, rather than simply being automated, the system could now be interactive. In this trivial example, if the system was given the goal to answer the question if Socrates was Mortal and it didn't yet know if he was human, it would generate a window to ask the user the question "Is Socrates Human?" and would then use that information accordingly.

This innovation of integrating the inference engine with a user interface led to the second early advancement of expert systems: explanation capabilities. The explicit representation of knowledge as rules rather than code made it possible


to generate explanations to users, both explanations in real time and after the fact. So if the system asked the user "Is Socrates Human?" the user may wonder why she was being asked that question, and the system would use the chain of rules to explain why it was currently trying to ascertain that bit of knowledge: i.e., it needs to determine if Socrates is Mortal and to do that needs to determine if he is Human. At first these explanations were not much different than the standard debugging information that developers deal with when debugging any system. However, an active area of research was utilizing natural language technology to ask, understand, and generate questions and explanations using natural languages rather than computer formalisms.[3]

An inference engine cycles through three sequential steps: match rules, select rules, and execute rules. The execution of the rules will often result in new facts or goals being added to the knowledge base, which will trigger the cycle to repeat. This cycle continues until no new rules can be matched.

In the first step, match rules, the inference engine finds all of the rules that are triggered by the current contents of the knowledge base. In forward chaining the engine looks for rules where the antecedent (left hand side) matches some fact in the knowledge base. In backward chaining the engine looks for antecedents that can satisfy one of the current goals.

In the second step, select rules, the inference engine prioritizes the various rules that were matched to determine the order to execute them. In the final step, execute rules, the engine executes each matched rule in the order determined in step two and then iterates back to step one again. The cycle continues until no new rules are matched.[4]
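The cycle above can be sketched for the Human/Mortal rule (a toy illustration; the pair-based fact encoding is an assumption of this sketch, and rule selection is trivial because there is only one rule):

```python
# Forward chaining with a match / select / execute cycle over a tiny KB.
facts = {("human", "socrates"), ("human", "plato")}
rules = [("human", "mortal")]  # IF human(x) THEN mortal(x)

changed = True
while changed:  # repeat until a full pass adds no new fact
    changed = False
    for antecedent, consequent in rules:          # step 1: match rules
        for pred, obj in list(facts):             # (step 2, select, is trivial here)
            if pred == antecedent and (consequent, obj) not in facts:
                facts.add((consequent, obj))      # step 3: execute the rule
                changed = True

print(("mortal", "socrates") in facts)  # True
```

After one pass both mortal facts are asserted; the second pass matches no new rule instances, so the cycle terminates, just as described above.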

14.2 Implementations

Early inference engines focused primarily on forward chaining. These systems were usually implemented in the Lisp programming language. Lisp was a frequent platform for early AI research due to its strong capability to do symbolic manipulation. Also, as an interpreted language it offered productive development environments appropriate to debugging complex programs. A necessary consequence of these benefits was that Lisp programs tended to be slower and less robust than compiled languages of the time such as C. A common approach in these early days was to take an expert system application, remove the inference engine used for that system, and package it as a re-usable tool other researchers could use for the development of other expert systems. For example, MYCIN was an early expert system for medical diagnosis and EMYCIN was an inference engine extrapolated from MYCIN and made available for other researchers.[1]

As expert systems moved from research prototypes to deployed systems there was more focus on issues such as speed and robustness. One of the first and most popular forward chaining engines was OPS5, which used the Rete algorithm to optimize the efficiency of rule firing. Another very popular technology that was developed was the Prolog logic programming language. Prolog focused primarily on backward chaining and also featured various commercial versions and optimizations for efficiency and robustness.[5]

As expert systems prompted significant interest from the business world, various companies, many of them started or guided by prominent AI researchers, created productized versions of inference engines. For example, Intellicorp was initially guided by Edward Feigenbaum. These inference engine products were also often developed in Lisp at first. However, demands for more affordable and commercially viable platforms eventually made personal computer platforms very popular.

14.3 See also

• Action selection mechanism

• Backward chaining

• Expert system

• Forward chaining

• Inductive inference


14.4 References

[1] Hayes-Roth, Frederick; Donald Waterman; Douglas Lenat (1983). Building Expert Systems. Addison-Wesley. ISBN 0-201-10686-8.

[2] Feigenbaum, Edward; Avron Barr (September 1, 1986). The Handbook of Artificial Intelligence, Volume I. Addison-Wesley. p. 195. ISBN 0201118114.

[3] Barzilay, Regina; McCullough, Daryl; Rambow, Owen; DeCristofaro, Jonathan; Korelsky, Tanya; Lavoie, Benoit. "A New Approach to Expert System Explanations". USAF Rome Laboratory Report.

[4] Griffin, N.L. "A Rule-Based Inference Engine which is Optimal and VLSI Implementable" (PDF). www.cs.uky.edu. University of Kentucky. Retrieved 6 December 2013.

[5] Sterling, Leon; Ehud Shapiro (1986). The Art of Prolog. Cambridge, MA: MIT. ISBN 0-262-19250-0.


Chapter 15

Inference objection

In informal logic, an inference objection is an objection to an argument based not on any of its stated premises, but rather on the relationship between premise and contention. For a given simple argument, if the assumption is made that its premises are correct, fault may be found in the progression from these to the conclusion of the argument. This can often take the form of an unstated co-premise, as in begging the question. In other words, it may be necessary to make an assumption in order to conclude anything from a set of true statements. This assumption must also be true in order that the conclusion follow logically from the initial statements.

15.1 Example

In the example to the left, the objector can't find anything contentious in the stated premises of the argument supporting the conclusion that “There is no danger in NASA’s Stardust Mission bringing material from the Wild 2 comet back to Earth”, but still disagrees with the conclusion. The objection is therefore placed beside the main premise and exactly corresponds to an unstated or 'hidden' co-premise. This is demonstrated by the argument map to the right, in which the full pattern of reasoning relating to the contention is set out.

15.2 References

[1] “Doom in the sky?”, New Scientist, 24 January 2004.


An example of an inference objection based on NASA's Stardust Mission.[1]


The same argument with the originally unstated co-premise included.


Chapter 16

Logical hexagon

The logical hexagon extends the square of opposition to six statements.

The logical hexagon (also called the hexagon of opposition) is a conceptual model of the relationships between the truth values of six statements. It is an extension of Aristotle's square of opposition. It was discovered independently by both Augustin Sesmat and Robert Blanché.[1]

This extension consists in introducing two statements U and Y. Whereas U is the disjunction of A and E, Y is the conjunction of the two traditional particulars I and O.

16.1 Summary of relationships

The traditional square of opposition demonstrates two sets of contradictories A and O, and E and I (i.e. they cannot both be true and cannot both be false), two contraries A and E (i.e. they can both be false, but cannot both be true), and two subcontraries I and O (i.e. they can both be true, but cannot both be false), according to Aristotle’s definitions. However, the logical hexagon provides that U and Y are also contradictory.

16.2 Interpretations of the logical hexagon

The logical hexagon may be interpreted in various ways, including as a model of traditional logic, quantification, modal logic, order theory, or paraconsistent logic.

The statement A may be interpreted as “Every man is white.”

(∀x)(Mx → Wx) ∧ (∃x)(Mx)

The statement E may be interpreted as “Every man is non-white.”

(∀x)(Mx → ¬Wx)

The statement I may be interpreted as “Some man is white.”

(∃x)(Mx ∧ Wx)

The statement O may be interpreted as “Not every man is white.”

(∃x)(Mx ∧ ¬Wx) ∨ ¬(∃x)(Mx)

The statement U may be interpreted as “Either every man is white or every man is non-white.”

(∀x)(Mx → Wx) ∨ (∀x)(Mx → ¬Wx)

The statement Y may be interpreted as “Some man is white and some man is non-white.”

(∃x)(Mx ∧ Wx) ∧ (∃x)(Mx ∧ ¬Wx)
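These six readings can be checked mechanically against the claimed oppositions by enumerating every model over a small domain. The following script is illustrative only (the encoding and names are assumptions, not part of the article); it uses exactly the first-order formulas above, including the existential-import conjunct in A:

```python
from itertools import product

# Evaluate the six statements A, E, I, O, U, Y in every model over a
# two-element domain, then check the oppositions claimed in the text.
domain = [0, 1]

def statements(man, white):
    exists_man = any(man[x] for x in domain)
    A = all(white[x] for x in domain if man[x]) and exists_man
    E = all(not white[x] for x in domain if man[x])
    I = any(man[x] and white[x] for x in domain)
    O = any(man[x] and not white[x] for x in domain) or not exists_man
    U = all(white[x] for x in domain if man[x]) or E
    Y = I and any(man[x] and not white[x] for x in domain)
    return A, E, I, O, U, Y

models = [statements(dict(zip(domain, m)), dict(zip(domain, w)))
          for m in product([False, True], repeat=2)
          for w in product([False, True], repeat=2)]

def contradictory(i, j):  # never the same truth value
    return all(s[i] != s[j] for s in models)

def contrary(i, j):       # never both true
    return not any(s[i] and s[j] for s in models)

def subcontrary(i, j):    # never both false
    return not any(not s[i] and not s[j] for s in models)

A, E, I, O, U, Y = range(6)
assert contradictory(A, O) and contradictory(E, I) and contradictory(U, Y)
assert contrary(A, E) and not contradictory(A, E)      # A, E can both be false
assert subcontrary(I, O) and not contradictory(I, O)   # I, O can both be true
print("hexagon oppositions verified on a 2-element domain")
```

The check confirms, in particular, the hexagon's additional claim that U and Y are contradictory.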

16.2.1 Modal logic

The logical hexagon may be interpreted as a model of modal logic such that

• A is interpreted as necessity

• E is interpreted as impossibility

• I is interpreted as possibility

• O is interpreted as 'not necessarily'

• U is interpreted as non-contingency

• Y is interpreted as contingency


16.3 Further extension

It has been proven that both the square and the hexagon, followed by a “logical cube”, belong to a regular series of n-dimensional objects called “logical bi-simplexes of dimension n.” The pattern also goes even beyond this.[2]

16.4 Further reading

• Jean-Yves Beziau (2012), “The power of the hexagon”, Logica Universalis 6, 2012, 1-43. doi:10.1007/s11787-012-0046-9

• Blanché (1953)

• Blanché (1957)

• Blanché Structures intellectuelles (1966)

• Gallais, P.: (1982)

• Gottschalk (1953)

• Kalinowski (1972)

• Monteil, J.F.: The logical square of Aristotle or square of Apuleius. The logical hexagon of Robert Blanché in Structures intellectuelles. The triangle of Indian logic mentioned by J.M. Bochenski. (2005)

• Moretti (2004)

• Moretti (Melbourne)

• Pellissier, R.: “'Setting' n-opposition” (2008)

• Sesmat (1951)

• Smessaert (2009)

16.5 See also

• Alessio Moretti

• Jean-Yves Béziau, New Light on the Square of Oppositions and its Nameless Corner

16.6 References

[1] N-opposition theory: logical hexagon

[2] Moretti, Pellissier


Chapter 17

Material inference

In logic, inference is the process of deriving logical conclusions from premises known or assumed to be true. Checking a logical inference for formal validity considers the meaning of only its logical vocabulary, whereas checking it for material validity considers the meaning of both its logical and extra-logical vocabulary.

17.1 Examples

For example, the inference "Socrates is a human, and each human must eventually die, therefore Socrates must eventually die" is a formally valid inference; it remains valid if the nonlogical vocabulary "Socrates", "is human", and "must eventually die" is arbitrarily, but consistently, replaced.[note 1] In contrast, the inference "Montreal is north of New York, therefore New York is south of Montreal" is materially valid only; its validity relies on the extra-logical relations "is north of" and "is south of" being converse to each other.[note 2]

17.2 Material inferences vs. enthymemes

Classical formal logic considers the above “north/south” inference as an enthymeme, that is, as an incomplete inference; it can be made formally valid by supplementing the tacitly used conversity relationship explicitly: "Montreal is north of New York, and whenever a location x is north of a location y, then y is south of x; therefore New York is south of Montreal". In contrast, the notion of a material inference has been developed by Wilfrid Sellars[1] in order to emphasize his view that such supplements are not necessary to obtain a correct argument.

17.3 Non-monotonic inference

Robert Brandom adopted Sellars’ view,[2] arguing that everyday (practical) reasoning is usually non-monotonic, i.e. additional premises can turn a practically valid inference into an invalid one, e.g.:

1. “If I rub this match along the striking surface, then it will ignite.” (p→q)

2. “If p, but the match is inside a strong electromagnetic field, then it will not ignite.” (p∧r→¬q)

3. “If p and r, but the match is in a Faraday cage, then it will ignite.” (p∧r∧s→q)

4. “If p and r and s, but there is no oxygen in the room, then the match will not ignite.” (p∧r∧s∧t→¬q)

5. ...

Therefore, practically valid inference is different from formally valid inference (which is monotonic: the above argument that Socrates must eventually die cannot be undermined by any additional information) and is better modelled by materially valid inference. A classical logician could add a ceteris paribus clause to 1. to make it usable in formally valid inferences:


1. “If I rub this match along the striking surface, then, ceteris paribus,[note 3] it will ignite.”

However, Brandom doubts that the meaning of such a clause can be made explicit, and prefers to consider it a hint of non-monotonicity rather than a miracle cure to establish monotonicity.

Moreover, the “match” example shows that a typical everyday inference can hardly ever be made formally complete. In a similar way, Lewis Carroll's dialogue "What the Tortoise Said to Achilles" demonstrates that the attempt to make every inference fully complete can lead to an infinite regress.[3]
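The defeasible pattern in the match example can be sketched as a specificity-ordered rule list, where the most specific applicable rule wins. This is a minimal illustration of non-monotonic inference under that assumption, not Brandom's own formalism:

```python
# Each rule pairs a set of required premises with a conclusion about
# whether the match ignites. The most specific applicable rule (largest
# premise set) defeats the others, so adding premises can flip the
# conclusion -- the hallmark of non-monotonic inference.
rules = [
    ({"p"},                "ignites"),          # rubbed on striking surface
    ({"p", "r"},           "does not ignite"),  # ...inside a strong EM field
    ({"p", "r", "s"},      "ignites"),          # ...but inside a Faraday cage
    ({"p", "r", "s", "t"}, "does not ignite"),  # ...and no oxygen in the room
]

def conclude(facts):
    # Collect applicable rules, keep the one with the largest premise set.
    applicable = [(len(pre), concl) for pre, concl in rules if pre <= facts]
    return max(applicable)[1] if applicable else "no conclusion"

print(conclude({"p"}))                 # ignites
print(conclude({"p", "r"}))            # does not ignite
print(conclude({"p", "r", "s", "t"}))  # does not ignite
```

Each added fact can overturn the previous conclusion, which is exactly why the inference is not monotonic.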

17.4 Notes

[1] A completely fictitious, but formally valid inference obtained by consistent replacement is e.g. "Buckbeak is a unicorn, and each unicorn has gills, therefore Buckbeak has gills".

[2] A completely fictitious, but materially (and formally) invalid inference obtained by consistent replacement is e.g. "Hagrid is younger than Albus, therefore Albus is larger than Hagrid". Consistent replacement doesn't respect conversity.

[3] literally: "all other things being equal"; here: "assuming a typical situation"

17.5 References

Stanford Encyclopedia of Philosophy on Sellars’s view

[1] Wilfrid Sellars (1980). J. Sicha, ed. Inference and Meaning. pp. 261f.

[2] Robert Brandom (2000). Articulating Reasons: An Introduction to Inferentialism. Harvard University Press. ISBN 0-674-00158-3; Sect. 2.III-IV.

[3] Carroll, Lewis (Apr 1895). “What the Tortoise Said to Achilles”. Mind, n.s. 4 (14): 278–280.


Chapter 18

Resolution inference

In propositional logic, a resolution inference is an instance of the following rule:[1]

Γ₁ ∪ {ℓ}    Γ₂ ∪ {ℓ̄}
──────────────────── |ℓ|
Γ₁ ∪ Γ₂

where ℓ̄ denotes the complement of the literal ℓ.

We call:

• the clauses Γ₁ ∪ {ℓ} and Γ₂ ∪ {ℓ̄} the inference’s premises;

• Γ₁ ∪ Γ₂ (the resolvent of the premises) its conclusion;

• the literal ℓ the left resolved literal;

• the literal ℓ̄ the right resolved literal;

• |ℓ| the resolved atom or pivot.

This rule can be generalized to first-order logic as:[2]

Γ₁ ∪ {L₁}    Γ₂ ∪ {L₂}
────────────────────── ϕ
(Γ₁ ∪ Γ₂)ϕ

where ϕ is a most general unifier of L₁ and the complement of L₂, and Γ₁ and Γ₂ have no common variables.

18.1 Example

The clauses P(x), Q(x) and ¬P(b) can be resolved using this rule with [b/x] as unifier. Here x is a variable and b is a constant.

P(x), Q(x)    ¬P(b)
─────────────────── [b/x]
Q(b)

Here we see that

• The clauses P(x), Q(x) and ¬P(b) are the inference’s premises

• Q(b) (the resolvent of the premises) is its conclusion.


• The literal P(x) is the left resolved literal,

• The literal ¬P(b) is the right resolved literal,

• P is the resolved atom or pivot.

• [b/x] is the most general unifier of the resolved literals.
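Returning to the propositional form of the rule, it can be sketched in code by encoding a clause as a frozenset of literals and a literal as a nonzero integer whose sign marks negation. This encoding and the function name are illustrative assumptions, not part of the article:

```python
# Propositional binary resolution. A clause is a frozenset of literals;
# a literal is a nonzero integer whose sign marks negation, so the
# resolved atom (the pivot) is abs(literal).
def resolve(c1, c2, pivot):
    """Resolve c1 (containing pivot) with c2 (containing -pivot)."""
    assert pivot in c1 and -pivot in c2, "pivot must occur with opposite signs"
    # Drop the resolved literals and union what remains: the resolvent.
    return (c1 - {pivot}) | (c2 - {-pivot})

# Example: resolving (p ∨ q) with (¬p ∨ r) on the pivot p yields (q ∨ r).
p, q, r = 1, 2, 3
print(sorted(resolve(frozenset({p, q}), frozenset({-p, r}), p)))  # [2, 3]
```

Repeated application of this single operation, driving toward the empty clause, is the basis of resolution-based theorem provers.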

18.2 Notes

[1] Fontaine, Pascal; Merz, Stephan; Woltzenlogel Paleo, Bruno. Compression of Propositional Resolution Proofs via Partial Regularization. 23rd International Conference on Automated Deduction, 2011.

[2] Enrique P. Arís, Juan L. González and Fernando M. Rubio, Lógica Computacional, Thomson, 2005.


Chapter 19

Rule of inference

In logic, a rule of inference, inference rule, or transformation rule is a logical form consisting of a function which takes premises, analyzes their syntax, and returns a conclusion (or conclusions). For example, the rule of inference called modus ponens takes two premises, one in the form “If p then q” and another in the form “p”, and returns the conclusion “q”. The rule is valid with respect to the semantics of classical logic (as well as the semantics of many other non-classical logics), in the sense that if the premises are true (under an interpretation), then so is the conclusion.

Typically, a rule of inference preserves truth, a semantic property. In many-valued logic, it preserves a general designation. But a rule of inference’s action is purely syntactic, and does not need to preserve any semantic property: any function from sets of formulae to formulae counts as a rule of inference. Usually only rules that are recursive are important; i.e. rules such that there is an effective procedure for determining whether any given formula is the conclusion of a given set of formulae according to the rule. An example of a rule that is not effective in this sense is the infinitary ω-rule.[1]

Popular rules of inference in propositional logic include modus ponens, modus tollens, and contraposition. First-order predicate logic uses rules of inference to deal with logical quantifiers.

19.1 The standard form of rules of inference

In formal logic (and many related areas), rules of inference are usually given in the following standard form:

  Premise#1
  Premise#2
  ...
  Premise#n
  Conclusion

This expression states that whenever in the course of some logical derivation the given premises have been obtained, the specified conclusion can be taken for granted as well. The exact formal language that is used to describe both premises and conclusions depends on the actual context of the derivations. In a simple case, one may use logical formulae, such as in:

  A → B
  A
  B

This is the modus ponens rule of propositional logic. Rules of inference are often formulated as schemata employing metavariables.[2] In the rule (schema) above, the metavariables A and B can be instantiated to any element of the universe (or sometimes, by convention, a restricted subset such as propositions) to form an infinite set of inference rules.

A proof system is formed from a set of rules chained together to form proofs, also called derivations. Any derivation has only one final conclusion, which is the statement proved or derived. If premises are left unsatisfied in the derivation, then the derivation is a proof of a hypothetical statement: "if the premises hold, then the conclusion holds.”
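Since a rule of inference is a purely syntactic function from formulae to formulae, modus ponens can be sketched directly in code. The tuple encoding below is an illustrative assumption, not prescribed by the article:

```python
# Formulae as nested tuples: an atom is a string, and an implication is
# ("->", antecedent, consequent). Modus ponens inspects syntax only --
# it never evaluates truth values.
def modus_ponens(premise1, premise2):
    """From A -> B and A, return B; raise if the rule does not apply."""
    if isinstance(premise1, tuple) and premise1[0] == "->" and premise1[1] == premise2:
        return premise1[2]
    raise ValueError("modus ponens does not apply to these premises")

# From "p -> q" and "p", conclude "q".
print(modus_ponens(("->", "p", "q"), "p"))  # q
```

Note that the function returns a formula without knowing whether it is true; truth preservation is a separate, semantic fact about the rule.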


19.2 Axiom schemas and axioms

Inference rules may also be stated in this form: (1) zero or more premises, (2) a turnstile symbol ⊢, which means “infers”, “proves”, or “concludes”, and (3) a conclusion. This form usually embodies the relational (as opposed to functional) view of a rule of inference, where the turnstile stands for a deducibility relation holding between premises and conclusion.

An inference rule containing no premises is called an axiom schema or, if it contains no metavariables, simply an axiom.[2]

Rules of inference must be distinguished from axioms of a theory. In terms of semantics, axioms are valid assertions. Axioms are usually regarded as starting points for applying rules of inference and generating a set of conclusions. Or, in less technical terms: rules are statements about the system, axioms are statements in the system. For example:

• The rule that from ⊢ p you can infer ⊢ Provable(p) is a statement that says that if you have proven p, then it is provable that p is provable. This rule holds in Peano arithmetic, for example.

• The axiom p → Provable(p) would mean that every true statement is provable. This axiom does not hold in Peano arithmetic.

Rules of inference play a vital role in the specification of logical calculi as they are considered in proof theory, such as the sequent calculus and natural deduction.

19.3 Example: Hilbert systems for two propositional logics

In a Hilbert system, the premises and conclusion of the inference rules are simply formulae of some language, usually employing metavariables. For graphical compactness of the presentation and to emphasize the distinction between axioms and rules of inference, this section uses the sequent notation (⊢) instead of a vertical presentation of rules.

The formal language for classical propositional logic can be expressed using just negation (¬), implication (→) and propositional symbols. A well-known axiomatization, comprising three axiom schemata and one inference rule (modus ponens), is:

(CA1) ⊢ A → (B → A)
(CA2) ⊢ (A → (B → C)) → ((A → B) → (A → C))
(CA3) ⊢ (¬A → ¬B) → (B → A)
(MP) A, A → B ⊢ B

It may seem redundant to have two notions of inference in this case, ⊢ and →. In classical propositional logic, they indeed coincide; the deduction theorem states that A ⊢ B if and only if ⊢ A → B. There is however a distinction worth emphasizing even in this case: the first notation describes a deduction, that is, an activity of passing from sentences to sentences, whereas A → B is simply a formula made with a logical connective, implication in this case. Without an inference rule (like modus ponens in this case), there is no deduction or inference. This point is illustrated in Lewis Carroll's dialogue called "What the Tortoise Said to Achilles".[3]
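The semantic soundness of this axiomatization can be checked mechanically by enumerating truth assignments. The script below is an illustrative sanity check, not part of the standard presentation; it verifies that CA1-CA3 are tautologies and that modus ponens preserves truth:

```python
from itertools import product

def imp(a, b):
    """Material implication over Booleans."""
    return (not a) or b

# Over all eight truth assignments to A, B, C:
for A, B, C in product([False, True], repeat=3):
    assert imp(A, imp(B, A))                                  # CA1 is a tautology
    assert imp(imp(A, imp(B, C)),
               imp(imp(A, B), imp(A, C)))                     # CA2 is a tautology
    assert imp(imp(not A, not B), imp(B, A))                  # CA3 is a tautology
    assert not (A and imp(A, B)) or B                         # MP preserves truth

print("CA1-CA3 are tautologies and modus ponens is sound")
```

Completeness (that every tautology is derivable from these schemata by MP) is the deeper theorem and of course cannot be checked this way.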

For some non-classical logics, the deduction theorem does not hold. For example, the three-valued logic Ł3 of Łukasiewicz can be axiomatized as:[4]

(CA1) ⊢ A → (B → A)
(LA2) ⊢ (A → B) → ((B → C) → (A → C))
(CA3) ⊢ (¬A → ¬B) → (B → A)
(LA4) ⊢ ((A → ¬A) → A) → A
(MP) A, A → B ⊢ B

This sequence differs from classical logic by the change in axiom 2 and the addition of axiom 4. The classical deduction theorem does not hold for this logic; however, a modified form does hold, namely A ⊢ B if and only if ⊢ A → (A → B).[5]


19.4 Admissibility and derivability

Main article: Admissible rule

In a set of rules, an inference rule could be redundant in the sense that it is admissible or derivable. A derivable rule is one whose conclusion can be derived from its premises using the other rules. An admissible rule is one whose conclusion holds whenever the premises hold. All derivable rules are admissible. To appreciate the difference, consider the following set of rules for defining the natural numbers (the judgment n nat asserts the fact that n is a natural number):

─────        n nat
0 nat       ────────
            s(n) nat

The first rule states that 0 is a natural number, and the second states that s(n) is a natural number if n is. In this proof system, the following rule, demonstrating that the second successor of a natural number is also a natural number, is derivable:

  n nat
───────────
s(s(n)) nat

Its derivation is the composition of two uses of the successor rule above. The following rule for asserting the existence of a predecessor for any nonzero number is merely admissible:

s(n) nat
────────
 n nat

This is a true fact of natural numbers, as can be proven by induction. (To prove that this rule is admissible, assume a derivation of the premise and induct on it to produce a derivation of n nat.) However, it is not derivable, because it depends on the structure of the derivation of the premise. Because of this, derivability is stable under additions to the proof system, whereas admissibility is not. To see the difference, suppose the following nonsense rule were added to the proof system:

s(−3) nat

In this new system, the double-successor rule is still derivable. However, the rule for finding the predecessor is no longer admissible, because there is no way to derive −3 nat. The brittleness of admissibility comes from the way it is proved: since the proof can induct on the structure of the derivations of the premises, extensions to the system add new cases to this proof, which may no longer hold.

Admissible rules can be thought of as theorems of a proof system. For instance, in a sequent calculus where cut elimination holds, the cut rule is admissible.
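The contrast between the two notions can be made concrete by encoding derivations as data. The encoding below is an illustrative assumption (derivations as nested tuples of successor-rule applications), not part of the article:

```python
# Encode a derivation of "n nat": the axiom derives 0 nat, and ("s", d)
# is one use of the successor rule on a sub-derivation d.
ZERO = "zero"

def succ(d):
    """The successor rule: from a derivation of n nat, derive s(n) nat."""
    return ("s", d)

# Derivable rule: double successor is just two uses of succ. It composes
# existing rules, so it survives any extension of the system.
def double_succ(d):
    return succ(succ(d))

# Admissible rule: predecessor works by *inspecting the structure* of the
# given derivation. It would break if a new premise-less rule such as
# "s(-3) nat" were added, since not every derivation then ends in succ
# applied to a derivable sub-derivation.
def pred(d):
    assert isinstance(d, tuple) and d[0] == "s", "premise must end in the successor rule"
    return d[1]

three = succ(succ(succ(ZERO)))
print(pred(three) == succ(succ(ZERO)))        # True
print(double_succ(ZERO) == succ(succ(ZERO)))  # True
```

`double_succ` never looks inside its argument, which is exactly why derivability is stable under extensions while the inverting `pred` is not.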

19.5 See also

• Inference objection

• Immediate inference

• Law of thought

• List of rules of inference

• Logical truth

• Structural rule


19.6 References

[1] Boolos, George; Burgess, John; Jeffrey, Richard C. (2007). Computability and Logic. Cambridge: Cambridge University Press. p. 364. ISBN 0-521-87752-0.

[2] John C. Reynolds (2009) [1998]. Theories of Programming Languages. Cambridge University Press. p. 12. ISBN 978-0-521-10697-9.

[3] Kosta Dosen (1996). “Logical consequence: a turn in style”. In Maria Luisa Dalla Chiara, Kees Doets, Daniele Mundici, Johan van Benthem. Logic and Scientific Methods: Volume One of the Tenth International Congress of Logic, Methodology and Philosophy of Science, Florence, August 1995. Springer. p. 290. ISBN 978-0-7923-4383-7. Preprint (with different pagination).

[4] Bergmann, Merrie (2008). An introduction to many-valued and fuzzy logic: semantics, algebras, and derivation systems. Cambridge University Press. p. 100. ISBN 978-0-521-88128-9.

[5] Bergmann, Merrie (2008). An introduction to many-valued and fuzzy logic: semantics, algebras, and derivation systems. Cambridge University Press. p. 114. ISBN 978-0-521-88128-9.


Chapter 20

Scalar implicature

In pragmatics, scalar implicature, or quantity implicature,[1] is an implicature that attributes an implicit meaning beyond the explicit or literal meaning of an utterance, and which suggests that the utterer had a reason for not using a more informative or stronger term on the same scale. The choice of the weaker characterization suggests that, as far as the speaker knows, none of the stronger characterizations in the scale holds. This is commonly seen in the use of 'some' to suggest the meaning 'not all', even though 'some' is logically consistent with 'all'.[2] If Bill says 'I have some of my money in cash', this utterance suggests to a hearer (though the sentence uttered does not logically imply it) that Bill does not have all his money in cash.

20.1 Origin

Scalar implicatures typically arise where the speaker qualifies or scales their statement with language that conveys to the listener an inference or implicature indicating that the speaker had reasons not to use a stronger, more informative term.[3] For example, where a speaker uses the term “some” in the statement “Some students can afford a new car”, the use of “some” gives rise to the inference or implicature that “Not all students can afford a new car.”[3]

As with pragmatic inference generally, such inferences are defeasible or cancellable: the inferred meaning may not be true, even though the literal meaning is true. This distinguishes such inferences from entailment. They are also non-detachable. A conversational implicature is said to be non-detachable when, after the replacement of what is said with another expression with the same literal meaning, the same conversational implicature remains.[4] This distinguishes them from conventional implicatures.

In a 2006 experiment with Greek-speaking five-year-olds’ interpretation of aspectual expressions, the results revealed that children have limited success in deriving scalar implicatures from the use of aspectual verbs such as “start” (which implicates non-completion).[5] However, the tested children succeeded in deriving scalar implicatures with discrete degree modifiers such as “half”, as in half finished.[5] Their ability to spontaneously compute scalar implicatures was greater than their ability to judge the pragmatic appropriateness of scalar statements.[5] In addition, the tested children were able to suspend scalar implicatures in environments where they were not supported.[5]

Griceans attempt to explain these implicatures in terms of the maxim of quantity, according to which one is to be just as informative as required. The idea is that if the speaker were in a position to make the stronger statement, they would have made it. Since they did not, they must believe that the stronger statement is not true.

20.2 Examples of scalar implicature

Some examples of scalar implicature[6] are:

1a. Bill has got some of Chomsky’s papers.
1b. The speaker believes that Bill hasn't got all of Chomsky’s papers.

2a. There will be five of us for dinner tonight.


2b. There won't be more than five of us for dinner tonight.

3a. She won't necessarily get the job.
3b. She will possibly get the job.

Uttering the sentence (a) in most cases will communicate the assumption in (b). This seems to be because the speaker did not use stronger terms such as 'there will be more than five people for dinner tonight' or 'she can't possibly get the job'. For example, if Bill really did have all of Chomsky’s papers, the speaker would have said so. However, according to the maxim of quantity, a speaker will only be as informative as is required, and will therefore not use any stronger terms unless required. The hearer, knowing this, will assume that the stronger term does not apply.

20.3 References

• Robyn Carston, “Informativeness, Relevance and Scalar Implicature”.

• Chierchia, G., Guasti, M. T., Gualmini, A., Meroni, L., Crain, S., Foppolo, F. (2004). Semantic and pragmatic competence in children's and adults' comprehension of or. In Experimental Pragmatics, eds. I. Noveck and D. Sperber, pp. 283-300, Palgrave Macmillan, New York.

• Laurence R. Horn. 1984. “A new taxonomy for pragmatic inference: Q-based and R-based implicature.” In D. Schiffrin (ed.), Meaning, Form and Use in Context (GURT '84), 11-42. Washington: Georgetown University Press.

• Laurence R. Horn, 'A natural history of negation', 1989, University of Chicago Press: Chicago.

• Kepa Korta, 'Implicitures: Cancelability and Non-detachability'.

• Angelika Kratzer, Scalar Implicatures: Are There Any? Workshop on Polarity, Scalar Phenomena, and Implicatures. University of Milan-Bicocca, June 18, 2003.

• Ira Noveck, “When children are more logical than adults: experimental investigations of scalar implicature”, Cognition 2001, vol. 78, no. 2, pp. 165-188.

• Stanford Encyclopedia of Philosophy, article “Implicature”

• Mante S. Nieuwland, Tali Ditman & Gina R. Kuperberg (2010). On the incrementality of pragmatic processing: An ERP investigation of informativeness and pragmatic abilities. Journal of Memory and Language 63 (2010), 324–346.

• Zondervan, A., Meroni, L. & Gualmini, A. (2010). Experiments on the role of the Question under Discussion for Ambiguity Resolution and Implicature Computation in Adults. In SALT 18.

20.4 Endnotes

[1] Hansen, Maj-Britt Mosegaard; Erling Strudsholm (May 1, 2008). “The semantics of particles: advantages of a contrastive and panchronic approach: a study of the polysemy of French déjà and Italian già”. Linguistics: An Interdisciplinary Journal of the Language Sciences (Walter de Gruyter) 46 (3): 471. Retrieved 2008-10-24. Moreover, the truth of a sentence like “It’s a pretty big thing in itself if I recover the outlay.” with an inherently scalar predicate allows, in principle, for the truthful application of a predicate higher up on the scale, but will, at the same time, carry a generalized conversational quantity implicature to the effect that the stronger proposition does not, in fact, hold (cf. Horn 1989; Levinson 2000): “Has Anne ever eaten squid? No, she has never eaten that.”

[2] Noveck p. 165

[3] Musolino, Julien; Jeffrey Lidz (July 1, 2006). “Why children aren't universally successful with quantification”. Linguistics: An Interdisciplinary Journal of the Language Sciences (Walter de Gruyter) 44 (4): 818. Retrieved 2008-10-24. Here we investigate experimentally the development of the semantics-pragmatics interface, focusing on Greek-speaking five-year-olds’ interpretation of aspectual expressions such as arxizo ('start') and degree modifiers such as miso ('half') and mexri ti mesi ('halfway').” “Such expressions are known to give rise to scalar inferences cross-linguistically: for instance, start, even though compatible with expressions denoting completion (e.g. finish), is typically taken to implicate non-completion. Overall, our experiments reveal that children have limited success in deriving scalar implicatures from the use of aspectual verbs but they succeed with 'discrete' degree modifiers such as 'half'.


[4] Korta p.4

[5] Staff (December 25, 2006). “Research reports from University of Delaware, Department of Psychology provide new insights into child language in children”. Health & Medicine Week (Expanded Reporting ed.) (NewsRX): 195. ISSN 1531-6459.

[6] taken from Carston

20.5 See also

• Implicature

• Cooperative principle

• Gricean maxims

• Logical consequence

• Entailment (pragmatics)

• Indirect speech act

• Implicate and Explicate Order

• Intrinsic and extrinsic properties

• Laurence Horn


Chapter 21

Solomonoff’s theory of inductive inference

Solomonoff's theory of universal inductive inference is a theory of prediction based on logical observations, such as predicting the next symbol based upon a given series of symbols. The only assumption that the theory makes is that the environment follows some unknown but computable probability distribution. It is a mathematical formalization of Occam’s razor[1][2][3][4][5] and the Principle of Multiple Explanations.[6]

Prediction is done using a completely Bayesian framework. The universal prior is taken over the class of all computable sequences; this is the universal a priori probability distribution, under which no hypothesis has a zero probability. This means that Bayes' rule can be used in predicting the continuation of any particular sequence.

21.1 Origin

21.1.1 Philosophical

The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960.[7] It is a mathematically formalized combination of Occam’s razor[1][2][3][4][5] and the Principle of Multiple Explanations.[6] All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter’s universal artificial intelligence builds upon this to calculate the expected value of an action.

21.1.2 Mathematical

The proof of the “razor” is based on the known mathematical properties of a probability distribution over a denumerable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability); thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ > 0, there is some length l such that the probability of all programs longer than l is at most ϵ. This does not, however, preclude very long programs from having very high probability.

Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes’ theorem can be used to predict the yet unseen parts of x in optimal fashion.
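The tail bound can be illustrated with a toy length-based weighting over binary strings. This is a sketch only, not the actual construction (the real universal prior weights the halting programs of a universal prefix machine by 2^−length); the specific weighting below is an assumption chosen so that the total mass is exactly one:

```python
from fractions import Fraction

# Toy prior over all binary strings: a string of length l gets weight
# 2**-(2*l + 1). The 2**l strings of length l then carry total mass
# 2**-(l + 1), so summing over all lengths gives exactly 1, and the mass
# of all programs longer than L is 2**-(L + 1) -- which can be made
# smaller than any eps > 0 by choosing L large enough.
def weight(program: str) -> Fraction:
    return Fraction(1, 2 ** (2 * len(program) + 1))

max_len = 12
total = sum(2 ** l * weight("0" * l) for l in range(max_len + 1))
tail = 1 - total  # mass of all programs longer than max_len

print(total)  # 8191/8192
print(tail)   # 1/8192
```

As the text notes, such a tail bound forces the probabilities to decrease on the whole, yet any individual long program may still receive comparatively high weight under a different admissible assignment.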

21.2 Modern applications


21.2.1 Artificial intelligence

Though Solomonoff’s inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to those of inductive inference (their mathematical limit is Solomonoff’s inductive inference).[8][9][10]

Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967, which has since developed into more and more models of learning.[11] The general scenario is the following: given a class S of computable functions, is there a learner (that is, a recursive functional) which for any input of the form (f(0), f(1), ..., f(n)) outputs a hypothesis (an index e with respect to a previously agreed-on acceptable numbering of all computable functions; the indexed function should be consistent with the given values of f)? A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable, while the class REC of all computable functions is not learnable. Many related models have been considered, and the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far-reaching extension of Gold's approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities,[12] which are kinds of super-recursive algorithms.

21.2.2 Turing machines

The third mathematically based direction of inductive inference makes use of the theory of automata and compu-tation. In this context, the process of inductive inference is performed by an abstract automaton called an inductiveTuring machine (Burgin, 2005). Inductive Turing machines represent the next step in the development of computerscience providing better models for contemporary computers and computer networks (Burgin, 2001) and forming animportant class of super-recursive algorithms as they satisfy all conditions in the definition of algorithm. Namely,each inductive Turing machines is a type of effective method in which a definite list of well-defined instructions forcompleting a task, when given an initial state, will proceed through a well-defined series of successive states, eventu-ally terminating in an end-state. The difference between an inductive Turing machine and a Turing machine is that toproduce the result a Turing machine has to stop, while in some cases an inductive Turing machine can do this withoutstopping. Kleene called procedures that could run forever without stopping by the name calculation procedure oralgorithm (Kleene 1952:137). Kleene also demanded that such an algorithm must eventually exhibit “some object”(Kleene 1952:137). This condition is satisfied by inductive Turing machines, as their results are exhibited after afinite number of steps, but inductive Turing machines do not always tell at which step the result has been obtained.Simple inductive Turing machines are equivalent to other models of computation. More advanced inductive Turingmachines are much more powerful. It is proved (Burgin, 2005) that limiting partial recursive functions, trial and errorpredicates, general Turing machines, and simple inductive Turing machines are equivalent models of computation.However, simple inductive Turing machines and general Turing machines give direct constructions of computingautomata, which are thoroughly grounded in physical machines. 
In contrast, trial and error predicates, limiting recursive functions and limiting partial recursive functions present syntactic systems of symbols with formal rules for their manipulation. Simple inductive Turing machines and general Turing machines are related to limiting partial recursive functions and trial and error predicates as Turing machines are related to partial recursive functions and the lambda calculus.

Note that only simple inductive Turing machines have the same structure (but different functioning semantics of the output mode) as Turing machines. Other types of inductive Turing machines have an essentially more advanced structure due to the structured memory and more powerful instructions. Their utilization for inference and learning allows achieving higher efficiency and better reflects how people learn (Burgin and Klinger, 2004).

Some researchers confuse computations of inductive Turing machines with non-stopping computations or with infinite-time computations. First, some computations of inductive Turing machines halt. As in the case of conventional Turing machines, some halting computations give a result, while others do not. Second, some non-stopping computations of inductive Turing machines give results, while others do not. The rules of inductive Turing machines determine when a computation (stopping or non-stopping) gives a result. Namely, an inductive Turing machine produces output from time to time, and once this output stops changing, it is considered the result of the computation. It is necessary to know that descriptions of this rule in some papers are incorrect.
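As a toy illustration of this output rule (not Burgin's formal construction; the function names and the finite stabilization window are invented for the sketch):

```python
def inductive_run(outputs, window=3):
    # Result rule of an inductive Turing machine (sketch): the machine emits
    # an output from time to time; once the output stops changing, that
    # stabilized value is the result.  A real inductive Turing machine need
    # not halt and need not announce which output is final; over a finite
    # trace we approximate "stops changing" by `window` repeats in a row.
    last, repeats = None, 0
    for out in outputs:
        repeats = repeats + 1 if out == last else 1
        last = out
        if repeats >= window:
            return last          # stabilized output observed
    return last                  # trace ended: best current output

def newton(n, steps=20):
    # Example emitter: integer square-root guesses refined by Newton steps.
    x = n
    for _ in range(steps):
        x = (x + n // x) // 2
        yield x

result = inductive_run(newton(10 ** 6))
print(result)  # 1000
```

The observer, not the machine, decides that the output has stabilized, which mirrors the point made above that an inductive Turing machine does not always tell at which step the result was obtained.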
For instance, Davis (2006: 128) formulates the rule for when a result is obtained without stopping as "… once the correct output has been produced any subsequent output will simply repeat this correct result." Third, in contrast to the widespread misconception, inductive Turing machines always give results (when this happens) after a finite number of steps (in finite time), in contrast to infinite and infinite-time computations. There are two main distinctions between conventional Turing machines and simple inductive Turing machines. The first distinction is that even simple inductive Turing machines can do much more than conventional Turing machines. The second distinction is that a conventional Turing machine always informs (by halting or by coming to a final state) when the result is obtained, while a simple inductive Turing machine in some cases does inform about reaching the result, and in other cases (where the conventional Turing machine is helpless) it does not. People have the illusion that a computer always itself informs (by halting or by other means) when the result is obtained. In contrast, users themselves often have to decide whether the computed result is what they need or whether it is necessary to continue computations. Indeed, everyday desktop computer applications like word processors and spreadsheets spend most of their time waiting in event loops, and do not terminate until directed to do so by users.

Evolutionary inductive Turing machines

The evolutionary approach to inductive inference is accomplished by another class of automata called evolutionary inductive Turing machines (Burgin and Eberbach, 2009; 2012). An evolutionary inductive Turing machine is a (possibly infinite) sequence E = {A[t]; t = 1, 2, 3, ...} of inductive Turing machines A[t], each working on generations X[t] which are coded as words in the alphabet of the machines A[t]. The goal is to build a "population" Z satisfying the inference condition. The automaton A[t], called a component, or a level automaton, of E represents (encodes) a one-level evolutionary algorithm that works with input generations of the population by applying the variation operator v and selection operator s. The first generation X[0] is given as input to E and is processed by the automaton A[1], which generates/produces the first generation X[1] as its transfer output, which goes to the automaton A[2]. For all t = 1, 2, 3, ..., the automaton A[t] receives the generation X[t − 1] as its input from A[t − 1] and then applies the variation operator v and selection operator s, producing the generation X[t] and sending it to A[t + 1] to continue evolution.
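The level-by-level scheme can be sketched as follows. The fitness function, the ±1 mutation, and the elitist selection are illustrative assumptions for the sketch, not part of Burgin and Eberbach's definition:

```python
import random

random.seed(0)

def variation(generation):
    # Variation operator v (sketch): each candidate yields a mutated copy.
    return [g + random.choice([-1, 0, 1]) for g in generation]

def selection(generation, fitness, k):
    # Selection operator s (sketch): keep the k fittest candidates.
    return sorted(generation, key=fitness, reverse=True)[:k]

def level_automaton(generation, fitness, k):
    # One component A[t]: takes X[t-1], applies v then s, emits X[t].
    pool = generation + variation(generation)   # elitism: parents stay eligible
    return selection(pool, fitness, k)

# Toy run: evolve integers toward the target value 42.
fitness = lambda g: -abs(g - 42)
X = [0, 10, 20, 30]                 # X[0], the initial generation
for t in range(1, 200):             # a finite prefix A[1], A[2], ... of E
    X = level_automaton(X, fitness, k=4)
best = max(X, key=fitness)
print(best)
```

Because parents remain in the selection pool, the best fitness in the population never decreases from one level automaton to the next.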

21.3 See also

• Algorithmic probability

• Algorithmic information theory

• Bayesian inference

• Language identification in the limit

• Inductive inference

• Inductive probability

• Mill’s methods

• Minimum description length

• Minimum message length

• Turing Machine

• For a philosophical viewpoint, see: Problem of induction and New riddle of induction

21.4 Notes

[1] JJ McCall. Induction: From Kolmogorov and Solomonoff to De Finetti and Back to Kolmogorov – Metroeconomica, 2004 – Wiley Online Library.

[2] D Stork. Foundations of Occam’s razor and parsimony in learning from ricoh.com – NIPS 2001 Workshop, 2001

[3] A.N. Soklakov. Occam's razor as a formal basis for a physical theory from arxiv.org – Foundations of Physics Letters, 2002 – Springer

[4] Jose Hernandez-Orallo (1999). “Beyond the Turing Test”. Journal of Logic, Language and Information 9.


70 CHAPTER 21. SOLOMONOFF’S THEORY OF INDUCTIVE INFERENCE

[5] M Hutter. On the existence and convergence of computable universal priors arxiv.org – Algorithmic Learning Theory, 2003 – Springer

[6] Ming Li and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, N.Y., 2008, p. 339 ff.

[7] Samuel Rathmanner and Marcus Hutter. A philosophical treatise of universal induction. Entropy, 13(6):1076–1136, 2011

[8] J. Veness, K.S. Ng, M. Hutter, W. Uther, D. Silver. “A Monte Carlo AIXI Approximation” – Arxiv preprint, 2009 arxiv.org

[9] J. Veness, K.S. Ng, M. Hutter, D. Silver. "Reinforcement Learning via AIXI Approximation" Arxiv preprint, 2010 – aaai.org

[10] S. Pankov. A computational approximation to the AIXI model from agiri.org – Artificial general intelligence, 2008: proceedings of …, 2008 – books.google.com

[11] Gold, E. Mark (1967). “Language identification in the limit”. Information and Control 10 (5): 447–474. doi:10.1016/S0019-9958(67)91165-5.

[12] J. Schmidhuber (2002). "Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit". International Journal of Foundations of Computer Science 13 (4): 587–612. doi:10.1142/S0129054102001291.

21.5 References

• Angluin, Dana; Smith, Carl H. (Sep 1983). "Inductive Inference: Theory and Methods". Computing Surveys 15 (3): 237–269. doi:10.1145/356914.356918.

• Burgin, M. (2005), Super-recursive Algorithms, Monographs in computer science, Springer. ISBN 0-387-95569-0

• Burgin, M., "How We Know What Technology Can Do", Communications of the ACM, v. 44, No. 11, 2001, pp. 82–88.

• Burgin, M.; Eberbach, E., "Universality for Turing Machines, Inductive Turing Machines and Evolutionary Algorithms", Fundamenta Informaticae, v. 91, No. 1, 2009, 53–77.

• Burgin, M.; Eberbach, E., "On Foundations of Evolutionary Computation: An Evolutionary Automata Approach", in Handbook of Research on Artificial Immune Systems and Natural Computing: Applying Complex Adaptive Technologies (Hongwei Mo, Ed.), IGI Global, Hershey, Pennsylvania, 2009, 342–360.

• Burgin, M.; Eberbach, E., "Evolutionary Automata: Expressiveness and Convergence of Evolutionary Computation", Computer Journal, v. 55, No. 9, 2012, pp. 1023–1029.

• Burgin, M.; Klinger, A. Experience, Generations, and Limits in Machine Learning, Theoretical Computer Science, v. 317, No. 1/3, 2004, pp. 71–91

• Davis, Martin (2006) "The Church–Turing Thesis: Consensus and opposition". Proceedings, Computability in Europe 2006. Lecture notes in computer science, 3988, pp. 125–132.

• Gasarch, W.; Smith, C. H. (1997) “A survey of inductive inference with an emphasis on queries”. Complexity,logic, and recursion theory, Lecture Notes in Pure and Appl. Math., 187, Dekker, New York, pp. 225–260.

• Hay, Nick. "Universal Semimeasures: An Introduction", CDMTCS Research Report Series, University of Auckland, Feb. 2007.

• Jain, Sanjay; Osherson, Daniel; Royer, James; Sharma, Arun, Systems that Learn: An Introduction to Learning Theory (second edition), MIT Press, 1999.

• Kleene, Stephen C. (1952), Introduction to Metamathematics (First ed.), Amsterdam: North-Holland.

• Li Ming; Vitanyi, Paul, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer Verlag, 1997.

• Osherson, Daniel; Stob, Michael; Weinstein, Scott, Systems That Learn, An Introduction to Learning Theory for Cognitive and Computer Scientists, MIT Press, 1986.


• Solomonoff, Ray J. (1999). "Two Kinds of Probabilistic Induction". The Computer Journal 42 (4): 256. doi:10.1093/comjnl/42.4.256.

• Solomonoff, Ray (March 1964). “A Formal Theory of Inductive Inference Part I”. Information and Control 7(1): 1–22. doi:10.1016/S0019-9958(64)90223-2.

• Solomonoff, Ray (June 1964). “A Formal Theory of Inductive Inference Part II”. Information and Control 7(2): 224–254. doi:10.1016/S0019-9958(64)90131-7.

21.6 External links

• An Intuitive Explanation of Solomonoff Induction - Less Wrong wiki

• Algorithmic probability - Scholarpedia


Chapter 22

Square of opposition

In the system of Aristotelian logic, the square of opposition is a diagram representing the different ways in whicheach of the four propositions of the system is logically related ('opposed') to each of the others. The system is alsouseful in the analysis of syllogistic logic, serving to identify the allowed logical conversions from one type to another.

22.1 Summary

In traditional logic, a proposition (Latin: propositio) is a spoken assertion (oratio enunciativa), not the meaning of an assertion, as in modern philosophy of language and logic. A categorical proposition is a simple proposition containing two terms, subject and predicate, in which the predicate is either asserted or denied of the subject.

Every categorical proposition can be reduced to one of four logical forms. These are:

• The so-called 'A' proposition, the universal affirmative (universalis affirmativa), whose form in Latin is 'omne S est P', usually translated as 'every S is a P'.

• The 'E' proposition, the universal negative (universalis negativa), Latin form 'nullum S est P', usually translated as 'no S are P'.

• The 'I' proposition, the particular affirmative (particularis affirmativa), Latin 'quoddam S est P', usually translated as 'some S are P'.

• The 'O' proposition, the particular negative (particularis negativa), Latin 'quoddam S non est P', usually translated as 'some S are not P'.

In tabular form:

    Form  Latin                 English
    A     omne S est P          every S is a P
    E     nullum S est P        no S are P
    I     quoddam S est P       some S are P
    O     quoddam S non est P   some S are not P

Aristotle states (in chapters six and seven of the Peri hermeneias (Περὶ Ἑρμηνείας, Latin De Interpretatione, English 'On Interpretation')) that there are certain logical relationships between these four kinds of proposition. He says that to every affirmation there corresponds exactly one negation, and that every affirmation and its negation are 'opposed' such that always one of them must be true, and the other false. A pair of affirmative and negative statements he calls a 'contradiction' (in medieval Latin, contradictio). Examples of contradictories are 'every man is white' and 'not every man is white', and 'no man is white' and 'some man is white'.

'Contrary' (medieval: contrariae) statements are such that both cannot at the same time be true. Examples of these are the universal affirmative 'every man is white' and the universal negative 'no man is white'. These cannot be true at the same time. However, these are not contradictories, because both of them may be false. For example, it is false that every man is white, since some men are not white. Yet it is also false that no man is white, since there are some white men.

Since every statement has a contradictory opposite, and since a contradictory is true when its opposite is false, it follows that the opposites of contraries (which the medievals called subcontraries, subcontrariae) can both be true, but they cannot both be false. Since subcontraries are negations of universal statements, they were called 'particular' statements by the medieval logicians.


[Figure: Square of opposition. In the Venn diagrams, black areas are empty and red areas are nonempty. The faded arrows and faded red areas apply in traditional logic.]

Another logical opposition implied by this, though not mentioned explicitly by Aristotle, is 'alternation' (alternatio), consisting of 'subalternation' and 'superalternation'. Alternation is a relation between a particular statement and a universal statement of the same quality such that the particular is implied by the other. The particular is the subaltern of the universal, which is the particular's superaltern. For example, if 'every man is white' is true, its contrary 'no man is white' is false. Therefore the contradictory 'some man is white' is true. Similarly the universal 'no man is white' implies the particular 'not every man is white'.[1][2]


Depiction from the 15th century

In summary:

• Universal statements are contraries: 'every man is just' and 'no man is just' cannot be true together, although one may be true and the other false, and also both may be false (if at least one man is just, and at least one man is not just).

• Particular statements are subcontraries: 'some man is just' and 'some man is not just' cannot be false together.

• The particular statement of one quality is the subaltern of the universal statement of that same quality, which is the superaltern of the particular statement, because in Aristotelian semantics 'every A is B' implies 'some A is B' and 'no A is B' implies 'some A is not B'. Note that modern formal interpretations of English sentences interpret 'every A is B' as 'for any x, x is A implies x is B', which does not imply 'some x is A'. This is a matter of semantic interpretation, however, and does not mean, as is sometimes claimed, that Aristotelian logic is 'wrong'.

• The universal affirmative and the particular negative are contradictories. If some A is not B, not every A is B. Conversely, though this is not the case in modern semantics, it was thought that if every A is not B, some A is not B. This interpretation has caused difficulties (see below). While Aristotle's Greek does not represent the particular negative as 'some A is not B', but as 'not every A is B', Boethius in his commentary on the Peri hermeneias renders the particular negative as 'quoddam A non est B', literally 'a certain A is not a B', and in all medieval writing on logic it is customary to represent the particular proposition in this way.
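These relations can be checked mechanically. The following sketch enumerates all interpretations over a small domain with a nonempty subject term (Aristotle's presupposition) and verifies the traditional square; the set-based reading of the four forms is the usual modern reconstruction, not Aristotle's own formalism:

```python
from itertools import product

def A(S, P): return all(x in P for x in S)        # every S is P
def E(S, P): return all(x not in P for x in S)    # no S is P
def I(S, P): return any(x in P for x in S)        # some S is P
def O(S, P): return any(x not in P for x in S)    # some S is not P

dom = [0, 1]

def subsets(xs):
    # Enumerate all subsets of xs via bitmasks.
    for mask in range(1 << len(xs)):
        yield {x for i, x in enumerate(xs) if mask >> i & 1}

ok = True
for S, P in product(list(subsets(dom)), repeat=2):
    if not S:                            # the traditional reading presupposes
        continue                         # a nonempty subject term
    a, e, i, o = A(S, P), E(S, P), I(S, P), O(S, P)
    ok &= (a != o) and (e != i)          # contradictories: opposite truth values
    ok &= not (a and e)                  # contraries: never both true
    ok &= (i or o)                       # subcontraries: never both false
    ok &= (not a or i) and (not e or o)  # subalternation: A implies I, E implies O
print(ok)  # True
```

All four relations hold in every interpretation, exactly as the summary above states.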


These relationships became the basis of a diagram originating with Boethius and used by medieval logicians to classifythe logical relationships. The propositions are placed in the four corners of a square, and the relations represented aslines drawn between them, whence the name 'The Square of Opposition'.

22.2 The problem of existential import

Subcontraries, which medieval logicians represented in the form 'quoddam A est B' (some particular A is B) and 'quoddam A non est B' (some particular A is not B), cannot both be false, since their universal contradictory statements (every A is B / no A is B) cannot both be true. This leads to a difficulty that was first identified by Peter Abelard. 'Some A is B' seems to imply 'something is A'. For example, 'some man is white' seems to imply that at least one thing is a man, namely the man who has to be white if 'some man is white' is true. But 'some man is not white' also seems to imply that something is a man, namely the man who is not white if 'some man is not white' is true. But Aristotelian logic requires that necessarily one of these statements is true. Both cannot be false. Therefore (since both imply that something is a man) it follows that necessarily something is a man, i.e. men exist. But (as Abelard points out, in the Dialectica) surely men might not exist?[3]

For with absolutely no man existing, neither the proposition 'every man is a man' is true nor 'some manis not a man'.[4]

Abelard also points out that subcontraries containing subject terms denoting nothing, such as 'a man who is a stone', are both false.

If 'every stone-man is a stone' is true, also its conversion per accidens is true ('some stones are stone-men'). But no stone is a stone-man, because neither this man nor that man etc. is a stone. But also this 'a certain stone-man is not a stone' is false by necessity, since it is impossible to suppose it is true.[5]

Terence Parsons argues that ancient philosophers did not experience the problem of existential import, because only the A and I forms had existential import.

Affirmatives have existential import, and negatives do not. The ancients thus did not see the incoherenceof the square as formulated by Aristotle because there was no incoherence to see.[6]

He goes on to cite medieval philosopher William of Ockham

In affirmative propositions a term is always asserted to supposit for something. Thus, if it supposits fornothing the proposition is false. However, in negative propositions the assertion is either that the termdoes not supposit for something or that it supposits for something of which the predicate is truly denied.Thus a negative proposition has two causes of truth.[7]

And he points to Boethius' translation of Aristotle's work as giving rise to the mistaken notion that the O form has existential import.

But when Boethius comments on this text he illustrates Aristotle’s doctrine with the now-famous diagram,and he uses the wording 'Some man is not just'. So this must have seemed to him to be a natural equivalentin Latin. It looks odd to us in English, but he wasn't bothered by it.[8]

22.3 Modern squares of opposition

In the 19th century, George Boole argued for requiring existential import on both terms in particular claims (I and O), but allowing all terms of universal claims (A and E) to lack existential import. This decision made Venn diagrams particularly easy to use for term logic. The square of opposition, under this Boolean set of assumptions, is often called the modern square of opposition. In the modern square of opposition, A and O claims are contradictories, as are E and I, but all other forms of opposition cease to hold; there are no contraries, subcontraries, or subalterns. Thus, from a modern point of view, it often makes sense to talk about "the" opposition of a claim, rather than insisting as older logicians did that a claim has several different opposites, which are in different kinds of opposition with the claim.

[Figure: Frege's square of opposition. The "conträr" label below is an erratum: it should read "subconträr".]

Gottlob Frege's Begriffsschrift also presents a square of oppositions, organised in an almost identical manner to the classical square, showing the contradictories, subalternates and contraries between four formulae constructed from universal quantification, negation and implication.

Algirdas Julien Greimas' semiotic square was derived from Aristotle's work.
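A short sketch of why the Boolean reading keeps only the contradictories: with an empty subject term, A is vacuously true while I is false, so subalternation and contrariety fail, but A/O and E/I still take opposite truth values. The set-based reading of the forms is an assumption of the sketch:

```python
def A(S, P): return all(x in P for x in S)      # vacuously true when S is empty
def E(S, P): return all(x not in P for x in S)  # also vacuously true when S is empty
def I(S, P): return any(x in P for x in S)      # false when S is empty
def O(S, P): return any(x not in P for x in S)  # false when S is empty

S, P = set(), {"white things"}        # empty subject term, e.g. 'stone-man'
print(A(S, P), I(S, P))               # True False: subalternation A => I fails
print(A(S, P), E(S, P))               # True True: A and E are no longer contraries
print(A(S, P) != O(S, P), E(S, P) != I(S, P))  # True True: contradictories survive
```

This is exactly the stone-man situation Abelard worried about, resolved in Boole's system by letting universal claims lack existential import.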

22.4 Logical hexagons and other bi-simplexes

Main article: Logical hexagon

The square of opposition has been extended to a logical hexagon, which includes the relationships of six statements. It was discovered independently by both Augustin Sesmat and Robert Blanché.[9] It has been proven that both the square and the hexagon, followed by a "logical cube", belong to a regular series of n-dimensional objects called "logical bi-simplexes of dimension n". The pattern also extends beyond this.[10]

22.5 Square of opposition (or logical square) and modal logic

The logical square, also called the square of opposition or square of Apuleius, has its origin in the four marked sentences to be employed in syllogistic reasoning: Every man is white, the universal affirmative, and its negation Not every man is white (or Some men are not white), the particular negative, on the one hand; Some men are white, the particular affirmative, and its negation No man is white, the universal negative, on the other. Robert Blanché published Structures intellectuelles with Vrin in 1966, and since then many scholars have thought that the logical square or square of opposition, representing four values, should be replaced by the logical hexagon, which, by representing six values, is a more potent figure because it can explain more about logic and natural language.

22.6 See also

• Boole's syllogistic

• Free logic

22.7 References

[1] Parry & Hacker, Aristotelian Logic (SUNY Press, 1990), p. 158.

[2] Cohen & Nagel, Introduction to Logic Second Edition (Hackett Publishing, 1993), p. 55.

[3] In his Dialectica, and in his commentary on the Perihermaneias

[4] Re enim hominis prorsus non existente neque ea vera est quae ait: omnis homo est homo, nec ea quae proponit: quidam homonon est homo

[5] Si enim vera est: Omnis homo qui lapis est, est lapis, et eius conversa per accidens vera est: Quidam lapis est homo qui estlapis. Sed nullus lapis est homo qui est lapis, quia neque hic neque ille etc. Sed et illam: Quidam homo qui est lapis, non estlapis, falsam esse necesse est, cum impossibile ponat

[6] in The Traditional Square of Opposition in the Stanford Encyclopedia of Philosophy

[7] (SL I.72) Loux 1974, 206

[8] The Traditional Square of Opposition

[9] N-Opposition Theory Logical hexagon

[10] Moretti, Pellissier

22.8 External links

• The Traditional Square of Opposition entry by Terence Parsons in the Stanford Encyclopedia of Philosophy

• International Congress on the Square of Opposition

• Special Issue of Logica Universalis Vol. 2 N. 1 (2008) on the Square of Opposition

• Catlogic: An open source computer script written in Ruby to construct, investigate, and compute categoricalpropositions and syllogisms


Chapter 23

Strong inference

In philosophy of science, strong inference is a model of scientific inquiry that emphasizes the need for alternative hypotheses, rather than a single hypothesis, in order to avoid confirmation bias.

The term "strong inference" was coined by John R. Platt,[1] a biophysicist at the University of Chicago. Platt notes that certain fields, such as molecular biology and high-energy physics, seem to adhere strongly to strong inference, with very beneficial results for the rate of progress in those fields.

23.1 The single hypothesis problem

The problem with single hypotheses, confirmation bias, was aptly described by Thomas Chrowder Chamberlin in 1897.

Despite the admonitions of Platt, reviewers of grant applications often require "A Hypothesis" as part of the proposal (note the singular). Peer review of research can help avoid the mistakes of single hypotheses, but only so long as the reviewers are not in the thrall of the same hypothesis. If there is a shared enthrallment among the reviewers in a commonly believed hypothesis, then innovation becomes difficult because alternative hypotheses are not seriously considered, and sometimes not even permitted.

23.2 Strong Inference

The method, very similar to the scientific method, is described as:

1. Devising alternative hypotheses;

2. Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, asnearly as possible, exclude one or more of the hypotheses;

3. Carrying out the experiment so as to get a clean result;

4. Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain,and so on.
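The steps above can be sketched as a hypothesis-elimination loop. The coin hypotheses, the experiment name, and the outcome strings are hypothetical illustrations; in real strong inference the crucial experiment is devised, not given in advance:

```python
def strong_inference(hypotheses, experiments, run):
    # Sketch of Platt's procedure: keep a set of alternative hypotheses and
    # run experiments whose outcomes exclude some of them, recycling until
    # one (or none) remains.  `run(experiment)` returns the observed outcome;
    # each hypothesis maps experiments to the outcome it predicts.
    live = set(hypotheses)
    for exp in experiments:                 # steps 2-4: crucial experiments
        if len(live) <= 1:
            break
        outcome = run(exp)                  # step 3: a clean result
        live = {h for h in live if hypotheses[h][exp] == outcome}
    return live

# Toy example: three alternative hypotheses (step 1) about a coin.
hypotheses = {
    "fair":       {"flip_many": "about half heads"},
    "two_headed": {"flip_many": "all heads"},
    "two_tailed": {"flip_many": "all tails"},
}
observed = {"flip_many": "all heads"}       # pretend laboratory result
print(strong_inference(hypotheses, ["flip_many"], observed.get))
# {'two_headed'}
```

The point of the sketch is structural: the experiment is chosen because its possible outcomes discriminate between the alternatives, not because it confirms a favorite.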

23.3 Limitations

A number of limitations of strong inference have been identified.[3][4]

23.4 Strong inference plus

The limitations of Strong-Inference can be corrected by having two preceding phases:[2]


1. An exploratory phase: at this point information is inadequate, so observations are chosen randomly or intuitively or based on scientific creativity.

2. A pilot phase: in this phase statistical power is determined by replicating experiments under identical experimental conditions.

These phases create the critical seed observation(s) upon which one can base alternative hypotheses.[2]

23.5 References

[1] John R. Platt (1964). "Strong inference". Science 146 (3642). doi:10.1126/science.146.3642.347.

[2] Don L. Jewett (1 January 2005). "What's wrong with single hypotheses? Why it is time for Strong-Inference-PLUS". Scientist (Philadelphia, Pa.) 19 (21): 10. PMC 2048741. PMID 17975652.

[3] William O'Donohue and Jeffrey A Buchanan (2001). “The weaknesses of strong inference”. Behavior and Philosophy.

[4] Rowland H. Davis (2006). "Strong Inference: rationale or inspiration?". Perspectives in Biology and Medicine 49 (2): 238–250. doi:10.1353/pbm.2006.0022. PMID 16702707.


Chapter 24

Type inference

Type inference refers to the automatic deduction of the data type of an expression in a programming language. If some, but not all, type annotations are already present, it is referred to as type reconstruction. The opposite operation of type inference is called type erasure.

It is a feature present in some strongly statically typed languages. It is often characteristic of, but not limited to, functional programming languages. Some languages that include type inference are ML, OCaml, F#, Haskell, Scala, D, Clean, Opa, Rust, Swift, Visual Basic (starting with version 9.0), C# (starting with version 3.0) and C++11. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations while still permitting type checking.

24.1 Nontechnical explanation

In most programming languages, all values have a type explicitly declared at compile time, limiting the values a particular expression can take on at run time. Increasingly, just-in-time compilation renders the distinction between run time and compile time moot. Historically, however, if the type of a value is known only at run time, the language is dynamically typed. In other languages, the type of an expression is known only at compile time; these languages are statically typed. In statically typed languages, the input and output types of functions and local variables ordinarily must be explicitly provided by type annotations. For example, in C:

    int addone(int x) {
        int result; /* declare integer result */
        result = x + 1;
        return result;
    }

The signature of this function definition, int addone(int x), declares that addone is a function that takes one argument, an integer, and returns an integer. int result; declares that the local variable result is an integer. In a hypothetical language supporting type inference, the code might be written like this instead:

    addone(x) {
        var result;  /* inferred-type variable result */
        var result2; /* inferred-type variable result #2 */
        result = x + 1;
        result2 = x + 1.0; /* this line won't work (in the proposed language) */
        return result;
    }

This is identical to how code is written in the Dart programming language, except that it is subject to some additional constraints as described below. It would be possible to infer the types of all the variables at compile time. In the example above, the compiler would infer that result and x have type integer and addone is a function int -> int. The variable result2 isn't used in a legal manner, so it wouldn't have a type.

In the imaginary language in which the last example is written, the compiler would assume that, in the absence of information to the contrary, + takes two integers and returns one integer. (This is how it works in, for example, OCaml.) From this, the type inferencer can infer that the type of x + 1 is an integer, which means result is an integer and thus the return value of addone is an integer. Similarly, since + requires that both of its arguments be of the same type, x must be an integer, and therefore addone accepts one integer as an argument.

However, in the subsequent line, result2 is calculated by adding a decimal "1.0" with floating-point arithmetic, causing a conflict in the use of x for both integer and floating-point expressions. The correct type-inference algorithm for such a situation has been known since 1958 and has been known to be correct since 1982. It revisits the prior inferences and utilizes the most general type from the outset: in this case floating-point. Frequently, however, degenerate type-inference algorithms are used that are incapable of backtracking and instead generate an error message in such a situation. An algorithm of intermediate generality implicitly declares result2 as a floating-point variable, and the addition implicitly converts x to a floating point. This can be correct if the calling contexts never supply a floating-point argument. Such a situation shows the difference between type inference, which does not involve type conversion, and implicit type conversion, which forces data to a different data type, often without restrictions.

The recent emergence of just-in-time compilation allows for hybrid approaches where the type of arguments supplied by the various calling contexts is known at compile time, and can generate a large number of compiled versions of the same function. Each compiled version can then be optimized for a different set of types. For instance, JIT compilation allows there to be at least two compiled versions of addone:

• A version that accepts an integer input and uses implicit type conversion.

• A version that accepts a floating-point number as input and utilizes floating point instructions throughout.
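The inference steps for addone, and the int/float conflict, can be sketched with a toy non-backtracking unifier. The names tx and tresult and the convention that type variables start with 't' are invented for the sketch; a real inferencer unifies structured types:

```python
def unify(a, b, subst):
    # Unify two simple types under the current substitution (toy sketch).
    # Type variables are names starting with 't'; ground types are 'int', 'float'.
    a, b = subst.get(a, a), subst.get(b, b)
    if a == b:
        return subst
    if a.startswith("t"):            # a is a type variable: bind it to b
        return {**subst, a: b}
    if b.startswith("t"):            # b is a type variable: bind it to a
        return {**subst, b: a}
    raise TypeError(f"cannot unify {a} with {b}")

# Constraints read off addone's body, with '+' taken as int -> int -> int:
subst = {}
subst = unify("tx", "int", subst)        # from result = x + 1:  x : int
subst = unify("tresult", "tx", subst)    # result has x's type, so int
print(subst)  # {'tx': 'int', 'tresult': 'int'}

# The conflicting line result2 = x + 1.0 would demand unify("tx", "float", subst),
# which raises TypeError: the non-backtracking inferencer reports an error,
# exactly the degenerate behavior described above.
```

A backtracking algorithm would instead revisit the binding of tx and generalize it to floating-point.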

24.2 Technical description

Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the type signature of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language is simple enough.

To obtain the information required to infer the type of an expression, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions, or through an implicit understanding of the type of various atomic values (e.g. true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type-inferring language is able to compile a program completely without type annotations.

In the case of complex forms of higher-order programming and polymorphism, it is not always possible for the compiler to infer as much, however, and type annotations are occasionally necessary for disambiguation. For instance, type inference with polymorphic recursion is known to be undecidable. Furthermore, explicit type annotations can be used to optimize code by forcing the compiler to use a more specific (faster/smaller) type than it had inferred.[1]

From a program analysis point of view, type inference is a special case of points-to analysis that uses a type abstractionon pointer targets.

24.3 Example

For example, let us consider the Haskell function map, which applies a function to each element of a list, and may be defined as:

    map f [] = []
    map f (first:rest) = f first : map f rest

Type inference on the map function proceeds (intuitively) as follows. map is a function of two arguments, so its type is constrained to be of the form a → b → c. In Haskell, the patterns [] and (first:rest) always match lists, so the second argument must be a list type: b = [d] for some type d. Its first argument f is applied to the argument first, which must have type d, corresponding with the type in the list argument, so f :: d → e (:: means "is of type") for some type e. The return value of map f, finally, is a list of whatever f produces, so [e].

Putting the parts together, we obtain map :: (d → e) → [d] → [e]. Nothing is special about the type variables, so we can simply relabel this as

    map :: (a → b) → [a] → [b]

It turns out that this is also the most general type, since no further constraints apply. Note that the inferred type of map is parametrically polymorphic: the types of the arguments and results of f are not inferred, but left as type variables, and so map can be applied to functions and lists of various types, as long as the actual types match in each invocation.
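The constraint solving described in this walkthrough can be mimicked with a few lines of code. The following Python sketch (an illustration under an assumed term encoding, not Haskell's actual inference engine) unifies exactly the constraints derived above and recovers the type of map:

```python
# A toy unifier solving the constraints from the map walkthrough above.
# This is an illustrative sketch, not Haskell's actual type checker;
# the term encoding is an assumption made for the example.
# Type terms: ('var', name), ('fun', arg, result), ('list', elem).

def resolve(t, subst):
    """Follow variable bindings to the current representative term."""
    while t[0] == 'var' and t[1] in subst:
        t = subst[t[1]]
    return t

def unify(t1, t2, subst):
    """Robinson-style unification (occurs check omitted for brevity);
    returns an extended substitution or raises TypeError."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if t1[0] == 'var':
        return {**subst, t1[1]: t2}
    if t2[0] == 'var':
        return {**subst, t2[1]: t1}
    if t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} with {t2}")

def substitute(t, subst):
    """Apply a substitution throughout a term."""
    t = resolve(t, subst)
    if t[0] == 'var':
        return t
    return (t[0],) + tuple(substitute(x, subst) for x in t[1:])

var = lambda n: ('var', n)

# Constraints from the text: map :: a -> b -> c, with b = [d],
# f :: d -> e, and c = [e].
s = {}
s = unify(var('b'), ('list', var('d')), s)
s = unify(var('a'), ('fun', var('d'), var('e')), s)
s = unify(var('c'), ('list', var('e')), s)

map_type = ('fun', var('a'), ('fun', var('b'), var('c')))
print(substitute(map_type, s))   # encodes (d -> e) -> [d] -> [e]
```

Running it prints the term encoding of (d → e) → [d] → [e], the type derived in the text; relabeling the remaining variables gives (a → b) → [a] → [b].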


24.4 Hindley–Milner type inference algorithm

Main article: Hindley–Milner type system

The algorithm first used to perform type inference is now informally referred to as the Hindley–Milner algorithm, although it should properly be attributed to Damas and Milner.[2]

The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus, which was devised by Haskell Curry and Robert Feys in 1958. In 1969 J. Roger Hindley extended this work and proved that their algorithm always inferred the most general type. In 1978 Robin Milner,[3] independently of Hindley’s work, provided an equivalent algorithm, Algorithm W. In 1982 Luis Damas[2] finally proved that Milner’s algorithm is complete and extended it to support systems with polymorphic references.

24.5 References

[1] Bryan O'Sullivan; Don Stewart; John Goerzen (2008). “Chapter 25. Profiling and optimization”. Real World Haskell. O'Reilly.

[2] Damas, Luis; Milner, Robin (1982), “Principal type-schemes for functional programs”, POPL '82: Proceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, ACM, pp. 207–212

[3] Milner, Robin (1978), “A Theory of Type Polymorphism in Programming”, Journal of Computer and System Sciences 17: 348–375

24.6 External links

• Archived e-mail message by Roger Hindley, explains the history of type inference

• Polymorphic Type Inference by Michael Schwartzbach, gives an overview of Polymorphic type inference.

• Basic Typechecking paper by Luca Cardelli, describes algorithm, includes implementation in Modula-2

• Implementation of Hindley-Milner type inference in Scala, by Andrew Forrest (retrieved July 30, 2009)

• Implementation of Hindley-Milner in Perl 5, by Nikita Borisov at the Wayback Machine (archived February 18, 2007)

• What is Hindley-Milner? (and why is it cool?) Explains Hindley-Milner, examples in Scala

Chapter 25

Uncertain inference

Uncertain inference was first described by C. J. van Rijsbergen[1] as a way to formally define a query and document relationship in information retrieval. This formalization is a logical implication with an attached measure of uncertainty.

25.1 Definitions

Rijsbergen proposes that the measure of uncertainty of a document d to a query q be the probability of its logical implication, i.e.:

P(d → q)

A user’s query can be interpreted as a set of assertions about the desired document. It is the system’s task to infer, given a particular document, whether the query assertions are true. If they are, the document is retrieved. In many cases the contents of documents are not sufficient to assert the queries. A knowledge base of facts and rules is needed, but some of them may be uncertain because there may be a probability associated with using them for inference. Therefore, we can also refer to this as plausible inference. The plausibility of an inference d → q is a function of the plausibility of each query assertion. Rather than retrieving a document that exactly matches the query, we should rank the documents by their plausibility with regard to that query. Since d and q are both generated by users, they are error prone; thus d → q is uncertain, which affects the plausibility of a given query. This approach accomplishes two things:

• Separate the process of revising probabilities from the logic
• Separate the treatment of relevance from the treatment of requests

Multimedia documents, like images or videos, have different inference properties for each datatype, and these differ from the properties of text documents. The framework of plausible inference allows us to measure and combine the probabilities coming from these different properties.

Uncertain inference generalizes the notions of autoepistemic logic, where truth values are either known or unknown, and when known, they are true or false.

25.2 Example

If we have a query of the form:

q = A ∧ B ∧ C

where A, B and C are query assertions, then for a document D we want the probability:

P(D → (A ∧ B ∧ C))

If we transform this into the conditional probability P((A ∧ B ∧ C) | D), and if the query assertions are independent, we can calculate the overall probability of the implication as the product of the individual assertion probabilities.
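Assuming independence, the combination is a plain product. A minimal Python sketch of this calculation (the per-assertion probabilities are invented for illustration, not taken from the text):

```python
# Combine independent query-assertion probabilities into one score for a
# document, as in P((A and B and C) | D) = P(A|D) * P(B|D) * P(C|D).
from math import prod

def implication_probability(assertion_probs):
    """Product of independent assertion probabilities for one document."""
    return prod(assertion_probs)

# Hypothetical probabilities that a document D supports assertions A, B, C:
score = implication_probability([0.9, 0.8, 0.5])
print(round(score, 2))  # 0.36
```

Documents can then be ranked by this score, matching the ranking-by-plausibility idea above.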


25.3 Further work

Croft and Krovetz[2] applied uncertain inference to an information retrieval system for office documents that they called OFFICER. In office documents the independence assumption is valid, since a query will focus on their individual attributes. Besides analysing the content of documents, one can also query, for example, the author, size, topic or collection. They devised methods to compare document and query attributes, infer their plausibility, and combine it into an overall rating for each document. The uncertainty of document and query contents also had to be addressed.

Probabilistic logic networks provide a system for performing uncertain inference; crisp true/false truth values are replaced not only by a probability, but also by a confidence level indicating the certitude of the probability.

Markov logic networks also allow uncertain inference to be performed; uncertainties are computed using the maximum entropy principle, in analogy to the way that Markov chains describe the uncertainty of finite state machines.

25.4 See also

• Fuzzy logic

• Probabilistic logic

• Plausible reasoning

• Imprecise probability

25.5 References

[1] C. J. van Rijsbergen (1986), “A non-classical logic for information retrieval”, The Computer Journal, pp. 481–485

[2] W. B. Croft; R. Krovetz (1988), Interactive retrieval of office documents

Chapter 26

Veridicality

In linguistics, veridicality is a semantic or grammatical assertion of the truth of an utterance. For example, the statement “Paul saw a snake” asserts the truthfulness of the claim, while “Paul did see a snake” is an even stronger assertion. Negation is veridical, though of opposite polarity, sometimes called antiveridical: “Paul didn't see a snake” asserts that the statement “Paul saw a snake” is false. In English, non-indicative moods are frequently used in a nonveridical sense: “Paul may have seen a snake” and “Paul would have seen a snake” do not assert that Paul actually saw a snake (and the second implies that he did not), though “Paul would indeed have seen a snake” is veridical, and some languages have separate veridical conditional moods for such cases.

26.1 Veridicality in semantic theory

The formal definition of veridicality views the context as a propositional operator.

1. A propositional operator F is veridical iff Fp entails p: Fp → p; otherwise F is nonveridical.

2. Additionally, a nonveridical operator F is antiveridical iff Fp entails not p: Fp → ¬p.
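These two clauses can be made concrete in a toy possible-worlds model; the model, the operators, and their names below are illustrative assumptions, not part of the formal definition:

```python
# Toy possible-worlds model: a proposition p is the set of worlds where it
# holds; an operator F maps a proposition to a truth value at the actual
# world. We classify F by exhaustively checking Fp -> p and Fp -> not-p.
from itertools import combinations

WORLDS = frozenset({0, 1, 2})
ACTUAL = 0

def necessarily(p):   # "it is certain that p": p holds in every world
    return p == WORLDS

def possibly(p):      # "maybe p": p holds in at least one world
    return len(p) > 0

def negation(p):      # "not p": p fails at the actual world
    return ACTUAL not in p

def all_props():
    ws = sorted(WORLDS)
    return [frozenset(c) for r in range(len(ws) + 1)
            for c in combinations(ws, r)]

def classify(op):
    """Veridical iff Fp entails p (truth at the actual world); a
    nonveridical F is antiveridical iff Fp entails not-p."""
    true_cases = [p for p in all_props() if op(p)]
    if all(ACTUAL in p for p in true_cases):
        return 'veridical'
    if all(ACTUAL not in p for p in true_cases):
        return 'antiveridical'
    return 'nonveridical'

print(classify(necessarily))  # veridical
print(classify(possibly))     # nonveridical
print(classify(negation))     # antiveridical
```

The outputs match the article's examples: certainty is veridical, "maybe" is nonveridical, and negation is antiveridical.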

For temporal and aspectual operators, the definition of veridicality is somewhat more complex:

• For operators relative to instants of time: Let F be a temporal or aspectual operator, and t an instant of time.

1. F is veridical iff for Fp to be true at time t, p must be true at a (contextually relevant) time t′ ≤ t; otherwise F is nonveridical.

2. A nonveridical operator F is antiveridical iff for Fp to be true at time t, ¬p must be true at a (contextually relevant) time t′ ≤ t.

• For operators relative to intervals of time: Let F be a temporal or aspectual operator, and t an interval of time.

1. F is veridical iff for Fp to be true of t, p must be true of all (contextually relevant) t′ ⊆ t; otherwise F is nonveridical.

2. A nonveridical operator F is antiveridical iff for Fp to be true of t, ¬p must be true of all (contextually relevant) t′ ⊆ t.

26.1.1 Analysis

Sri Dharma Pravartaka Acharya (Dr. Frank Morales, PhD) originated the term “Veridical Analysis” to suggest that a semantic argument needs to be both structurally sound and essentially true in order to be “consistent with the reality of the situation under question.”[1] In his dissertation, he explains: “For general Indian logic, arguments must be sound (true) in addition to being valid. Propositional analysis is a method for determining whether x truth-claim is structurally valid within the context of formal logical principles. Veridical analysis seeks to know, additionally, whether x truth-claim corresponds with the truth of reality. E.g.: A) All Leprechauns are Deontologists; B) Matthew is a Leprechaun; C) Therefore Matthew is a Deontologist. While such a claim is structurally sound, it is also not true, given the generally accepted non-existence of leprechauns.”

Although Dharma Pravartaka’s work focuses mainly on metaphysical epistemology, this epistemological method can be applied to any field of truth-claiming. For example, consider a claim based on the following statistics:

Nation A has a population of one billion.
Nation B has a population of eight persons.
Nation A has twenty million criminals.
Nation B has one criminal.

Without veridical analysis, one who overlooks the fact that nation A has a vastly larger population than nation B may mistakenly conclude that nation A must be a very dangerous place, judging by the sheer number of criminals compared to the latter (see also: hasty generalization), rather than acknowledging the reality of the situation under question: a nation with a very large population will under almost all circumstances have more criminals than a nation with the population of a large two-family house.
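The per-capita rates behind this reasoning can be checked directly with the example's own figures:

```python
# Crime counts vs. crime rates, using the numbers from the example above.
criminals_a, population_a = 20_000_000, 1_000_000_000   # nation A
criminals_b, population_b = 1, 8                        # nation B

rate_a = criminals_a / population_a   # 0.02  -> 2% of nation A
rate_b = criminals_b / population_b   # 0.125 -> 12.5% of nation B

# The raw counts point one way, the per-capita rates the other:
print(criminals_a > criminals_b)  # True: A has far more criminals
print(rate_a < rate_b)            # True: but a far lower crime rate
```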

26.1.2 Nonveridical operators

Nonveridical operators typically license the use of polarity items, which in veridical contexts is normally ungrammatical:

* John saw any students. (The context is veridical.)
John didn't see any students. (The context is nonveridical.)

26.1.3 Downward entailment

All downward entailing contexts are nonveridical. Because of this, theories based on nonveridicality can be seen as extending those based on downward entailment, allowing more cases of PI licensing to be explained.

Downward entailment predicts that polarity items will be licensed in the scope of negation, downward entailing quantifiers like few N, at most n N, no N, and the restriction of every:

No students saw anything.
John didn't see anything.
Few children saw anything.
Every student who saw anything should report to the police.
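The downward entailing property itself can be checked mechanically on a tiny model; the determiners and the finite domain below are illustrative assumptions:

```python
# Tiny-model check of downward entailment: a determiner Q is downward
# entailing in its scope if shrinking the scope set preserves truth.
from itertools import combinations

STUDENTS = frozenset({'ann', 'bo', 'cy'})

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def no(restrictor, scope):    # "no students <scope>"
    return not (restrictor & scope)

def some(restrictor, scope):  # "some students <scope>" (upward entailing)
    return bool(restrictor & scope)

def downward_entailing(q, domain=STUDENTS):
    for scope in subsets(domain):
        for smaller in subsets(scope):   # every subset of the scope
            if q(domain, scope) and not q(domain, smaller):
                return False
    return True

print(downward_entailing(no))    # True
print(downward_entailing(some))  # False
```

As predicted, "no" passes the check and so is expected to license any in its scope, while "some" fails it.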

26.1.4 Non-monotone quantifiers

Quantifiers like exactly three students, nobody but John, and almost nobody are non-monotone (and thus not downward entailing) but nevertheless admit any:

% Exactly three students saw anything.
Nobody but John saw anything.
Almost nobody saw anything.

26.1.5 Hardly and barely

Hardly and barely allow for any despite not being downward entailing.

John hardly talked to anybody. (Does not entail “John hardly talked to his mother”.)
John barely studied anything. (Does not entail “John barely studied linguistics”.)


26.1.6 Questions

Polarity items are quite frequent in questions, although questions are not monotone.

Did you see anything?

Although questions biased towards the negative answer, such as “Do you [even] give a damn about any books?” (tag questions based on negative sentences exhibit even more such bias), can sometimes be seen as downward entailing, this approach cannot account for the general case, such as the above example, where the context is perfectly neutral. Neither can it explain why negative questions, which naturally tend to be biased, don't license negative polarity items.

In a semantics that treats a question as the set of its true answers, the denotation of a polar question contains two possible answers:

[[Did you see John?]] = { you saw John ∨ you didn't see John }

Because disjunction p ∨ q entails neither p nor q, the context is nonveridical, which explains the admittance of any.

26.1.7 Future

Polarity items appear in future sentences.

John will buy any bottle of wine.
The children will leave as soon as they discover anything.

According to the formal definition of veridicality for temporal operators, the future is nonveridical: that “John will buy a bottle of Merlot” is true now does not entail that “John buys a bottle of Merlot” is true at any instant up to and including now. On the other hand, the past is veridical: that “John bought a bottle of Merlot” is true now entails that there is an instant preceding now at which “John buys a bottle of Merlot” is true.

26.1.8 Habitual aspect

Likewise, nonveridicality of the habitual aspect licenses polarity items.

He usually reads any book very carefully.

The habitual aspect is nonveridical because, e.g., that “He is usually cheerful” is true over some interval of time does not entail that “He is cheerful” is true over every subinterval of it. This is in contrast to, e.g., the progressive aspect, which is veridical and prohibits negative polarity items.

26.1.9 Generic sentences

Non-monotone generic sentences accept polarity items.

Any cat hunts mice.

26.1.10 Modal verbs

Modal verbs create generally good environments for polarity items:

John may talk to anybody.
Any minors must be accompanied by their parents.
The committee can give the job to any candidate.

Such contexts are nonveridical despite being non-monotone and sometimes even upward entailing (“John must tango” entails “John must dance”).


26.1.11 Imperatives

Imperatives are roughly parallel to modal verbs and intensional contexts in general.

Take any apple. (cf. “You may/must take any apple”, “I want you to take any apple”.)

26.1.12 Protasis of conditionals

The protasis of a conditional is one of the most common environments for polarity items.

If you sleep with anybody, I'll kill you.

26.1.13 Directive intensional verbs

Polarity items are licensed with directive propositional attitudes but not with epistemic ones.

John would like to invite any student.
John asked us to invite any student.
* John believes that we invited any student.
* John dreamt that we invited any student.

26.2 References

[1] Sri Dharma Pravartaka Acharya (2010). The Vedic Way of Knowing God. Dharma Sun Media. p. 25. Retrieved 7 September 2014.

• Giannakidou, Anastasia (2002). “Licensing and Sensitivity in Polarity Items: From Downward Entailment to Nonveridicality” (PDF). In Andronis, Maria; Pycha, Anne; Yoshimura, Keiko. CLS 38: Papers from the 38th Annual Meeting of the Chicago Linguistic Society, Parasession on Polarity and Negation. Retrieved December 15, 2011.

Page 96: Inference 1

26.3. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES 89

26.3 Text and image sources, contributors, and licenses

26.3.1 Text• Adverse inference Source: http://en.wikipedia.org/wiki/Adverse_inference?oldid=656917453 Contributors: The Anome, NuclearWin-

ner, PBS, Alan Liefting, Edcolins, TeaDrinker, Nikkimaria, SmackBot, Gregbard, JaGa, Superbeecat, Dubitante, 7&6=thirteen, Addbot,Bellemonde, ZéroBot and Anonymous: 3

• Arbitrary inference Source: http://en.wikipedia.org/wiki/Arbitrary_inference?oldid=640198401 Contributors: Bearcat, Rich Farm-brough, Grutness, Gregbard, Thijs!bot, 1000Faces, The Founders Intent, Addbot, Helpful Pixie Bot and Anonymous: 1

• Biological network inference Source: http://en.wikipedia.org/wiki/Biological_network_inference?oldid=659848232Contributors: Thor-wald, RJFJR, Rjwilmsi, Biochemza, Hughitt1, SmackBot, Gregbard, Headbomb, JPG-GR, Jvhertum, Anaxial, Rtaylor pnnl, DOI bot,Jncraton, Yobot, Citation bot, J04n, Gdrahnier, Citation bot 1, Trappist the monk, Razor2988, BabbaQ, ClueBot NG, Monkbot, Puccio.band Anonymous: 6

• Constraint inference Source: http://en.wikipedia.org/wiki/Constraint_inference?oldid=646826730 Contributors: Art LaPella, Tizio,JRSpriggs, Eastlaw, Gregbard, Cydebot and Monkbot

• Correspondent inference theory Source: http://en.wikipedia.org/wiki/Correspondent_inference_theory?oldid=659863648 Contribu-tors: Kku, Karada, Oliver Crow, Mikeo, Betsythedevine, Rjwilmsi, Richardbondi, SmackBot, Elonka, Jtneill, Sadads, Tim buckley,JHunterJ, Courcelles, Gregbard, Mattisse, JaGa, Lova Falk, Neha7, Ettrig, DrilBot, Lam Kin Keung, JeepdaySock, Helpful Pixie Bot,Psycoaiko and Anonymous: 17

• Deep inference Source: http://en.wikipedia.org/wiki/Deep_inference?oldid=527457726 Contributors: Michael Hardy, Silverfish, Stein-sky, Chalst, Oleg Alexandrov, CBM, Gregbard, Yobot and Anonymous: 3

• Dictum de omni et nullo Source: http://en.wikipedia.org/wiki/Dictum_de_omni_et_nullo?oldid=635731299 Contributors: Gobonobo,Iridescent, Gregbard, Cydebot, R'n'B, UnCatBot, Burket, Addbot, AgadaUrbanit, Luckas-bot, Helpful Pixie Bot, BurkeFT and Anony-mous: 3

• Downward entailing Source: http://en.wikipedia.org/wiki/Downward_entailing?oldid=604850753 Contributors: Michael Hardy, Augur,Stevey7788, Qwertyus, Volfy, Epolk, SmackBot, Imz, Pvodenski, Alexey Feldgendler, Gregbard, Hharley, Erik9bot, Lam Kin Keung,KellerST, Tomtung and Anonymous: 10

• Grammar induction Source: http://en.wikipedia.org/wiki/Grammar_induction?oldid=661963338 Contributors: Delirium, Aabs, JimHorning, NTiOzymandias, MCiura, Marudubshinki, Rjwilmsi, Koavf, SmackBot, Took, Bluebot, Rizzardi, Antonielly, Dfass, Hukkinen,Gregbard, Wikid77, Bobblehead, Erxnmedia, Tremilux, Stassa, Mgalle, KoenDelaere, Aclark17, 1ForTheMoney, Bility, Hiihammuk,Josve05a, Chire, KLBot2, BG19bot, Jochen Burghardt, Superploro and Anonymous: 7

• Implicature Source: http://en.wikipedia.org/wiki/Implicature?oldid=660643237Contributors: Radgeek, Andycjp, Lucky13pjn, Burschik,Jim Henry, Sfeldman, Rich Farmbrough, El C, Reinyday, Flamingspinach, KYPark, Authr, Mo-Al, The wub, FlaBot, Trickstar, Smack-Bot, Imz, Antonielly, Sjf, Iridescent, Thomasmeeks, Gregbard, Sloth monkey, DumbBOT, Knakts, Dawnseeker2000, Silver Edge, Bong-warrior, Yakushima, Nimic86, Nieske, PubliusNemo, Kaffeeringe.de, DragonBot, Addbot, Americanlinguist, GrouchoBot, Teamprag,Citation bot 1, Pallerti, Hriber, Shpowell, Whisky drinker, Tyranny Sue, ZéroBot, ClueBot NG, Implyer, Helpful Pixie Bot, Klas Katt,Darigon Jr., Epicgenius, Monkbot and Anonymous: 32

• Inductive functional programming Source: http://en.wikipedia.org/wiki/Inductive_functional_programming?oldid=594918884 Con-tributors: RadioFan, Fram and Superploro

• Inductive probability Source: http://en.wikipedia.org/wiki/Inductive_probability?oldid=653106098Contributors: Michael Hardy, Table-top, BiH, CmdrObot, Sunrise, Yobot, LilHelpa, John of Reading, Thepigdog, BG19bot, Mogism, OccultZone, Monkbot and Anonymous:1

• Inference Source: http://en.wikipedia.org/wiki/Inference?oldid=664073695 Contributors: The Anome, Edward, Michael Hardy, Kku,SebastianHelm, Angela, BAxelrod, EdH, Ww, Dysprosia, Markhurd, Furrykef, Hyacinth, Phil Boswell, Robbot, Shoesfullofdust, Giftlite,0x6D667061, Snowdog, Bovlb, Alan Au, Neilc, Andycjp, Gzuckier, Slartoff, Chapplek, Icairns, Discospinster, Vsmith, Bender235,Haxwell, Ntmatter, Sbarthelme, MPerel, Alansohn, Arthena, Bart133, Computerjoe, Camw, Ruud Koot, Paxsimius, BD2412, Men-daliv, Rjwilmsi, Koavf, Authr, Bhadani, Amelio Vázquez, FlaBot, Twipley, Crazycomputers, Ewlyahoocom, Chobot, Bgwhite, YurikBot,Phantomsteve, Rick Norwood, Ziel, Robert McClenon, Amakuha, Hans Oesterholt, Andrew Lancaster, Fang Aili, GraemeL, Leonar-doRob0t, Infinity0, Zvika, DVD R W, Sardanaphalus, SmackBot, KnowledgeOfSelf, NickShaforostoff, Rajah9, Gilliam, Chris the speller,MalafayaBot, Therandreedgroup, Go For It, Frap, Kjetil1001, DéRahier, KerathFreeman, Stevenmitchell, Cybercobra, StephenReed,Jon Awbrey, SashatoBot, General Ization, Gobonobo, Kransky, 16@r, Hetar, Iridescent, K, Igoldste, Tawkerbot2, JForget, Wolfdog,Gregbard, Eu.stefan, Julian Mendez, Scolobb, Letranova, Epbr123, Headbomb, Marek69, John254, Odoncaoa, Sean William, LachlanA,Mentifisto, Gioto, Venar303~enwiki, Albany NY, PhilKnight, VoABot II, Rederiksen, Twsx, Caesarjbsquitti, Whoop whoop, JaGa, Mar-tinBot, J.delanoy, Trusilver, Bogey97, Ginsengbomb, Cpiral, Chiswick Chap, NewEnglandYankee, Nadiatalent, Hanacy, Juliancolton,ACSE, VolkovBot, Aesopos, Lradrama, Philogo, Sylvank, C Chiara, Andy Dingley, Graymornings, Lova Falk, Spinningspark, Cnilep,Paracit, RHaden, Seraphita~enwiki, Newbyguesses, SieBot, Tiddly Tom, Storytellershrink, Exert, Oxymoron83, Iain99, CharlesGilling-ham, Melcombe, Escape Orbit, Sfan00 IMG, ClueBot, GorillaWarfare, The Thing That Should Not Be, Arakunem, Tomas e, Mild BillHiccup, CounterVandalismBot, Sambitbikaspal, PhySusie, AaronNGray, Versus22, Eroenj, Qwfp, Stickee, 
Gerhardvalentin, Badgernet,HexaChord, Addbot, Fgnievinski, Ronhjones, DutchDevil, Jblondin, Tide rolls, BrianKnez, JakobVoss, Legobot, Yobot, Tamiasciurus,Karnpatel18, IW.HG, Eric-Wester, Jim1138, IRP, Darolew, Ulric1313, Materialscientist, E2eamon, TheAMmollusc, Intelati, Capri-corn42, Forring, Grim23, Tuxponocrates, Govindjsk, Lancioni, Olexa Riznyk, Intelligentsium, Pinethicket, Ashimashi, Nurefsan, Aque-ousmatt, TBloemink, Hentzde, Mknomad5, Hriber, MegaSloth, El Mayimbe, DARTH SIDIOUS 2, Dhburns, Mean as custard, Johnof Reading, Honestrosewater, Jake, AsceticRose, Scandizzzle, Anir1uph, Fixblor, Lynette2c, Mr legumoto, Donner60, Peter Karlsen,Xanchester, ClueBot NG, Run54, Satellizer, Bped1985, Kevin Gorman, Masssly, Widr, Helpful Pixie Bot, Craighawkinson, Rm1271,BattyBot, GoShow, EuroCarGT, Davidlwinkler, Jochen Burghardt, Milesandkilometrestogo, Lockfox, I am One of Many, Harlem BakerHughes, DavidLeighEllis, 126 rules, Acschenkel, Sam Sailor, Thennicke, Writers Bond, Monkbot, Jasminemarie647, LadyLeodia, Vol-wen, KasparBot and Anonymous: 367

Page 97: Inference 1

90 CHAPTER 26. VERIDICALITY

• Inference engine Source: http://en.wikipedia.org/wiki/Inference_engine?oldid=659754630Contributors: B4hand, Michael Hardy, Ronz,Khym Chanur, Fredrik, Chocolateboy, Rursus, Abdull, Dan Gan, Viriditas, Burn, Fountainofignorance, Linas, Waldir, Lockley, Ceefour,YurikBot, Bovineone, SmackBot, Tom Lougheed, Verne Equinox, Chris the speller, Bluebot, Jerome Charles Potts, Veggies, A5b, 16@r,Kompere, Pvlasov, Switchercat, Gregbard, Jmchauvet, AntiVandalBot, Joydurgin, Parveson, Fordescort79, Jeff G., Technicalganesh,VanishedUserABC, Taemyr, Geldsack, Afluegge, DragonBot, Mdebellis, Rhododendrites, Cpoizat, Addbot, Luckas-bot, ChristopheS,AKappa, Erik9bot, Wireless Keyboard, LakeofConstance, Roboo.jack, GoingBatty, Njsg, AvicBot, Tutelary, MadScientistX11, Monkbot,NuteGunraysFace and Anonymous: 38

• Inference objection Source: http://en.wikipedia.org/wiki/Inference_objection?oldid=611341030 Contributors: Grumpyyoungman01,Rmessenger, Gregbard, Ohms law, DumZiBoT, Yobot, Meteor sandwich yum and Anonymous: 3

• Logical hexagon Source: http://en.wikipedia.org/wiki/Logical_hexagon?oldid=657214550 Contributors: Fuzzypeg, Gregbard, Epsilon0,Ontoraul, Stpasta, Machine Elf 1735, Jean KemperNN, Jochen Burghardt and Anonymous: 5

• Material inference Source: http://en.wikipedia.org/wiki/Material_inference?oldid=591780756Contributors: Gregbard, Ironholds, Legobot,ChrisGualtieri and Jochen Burghardt

• Resolution inference Source: http://en.wikipedia.org/wiki/Resolution_inference?oldid=598608676 Contributors: Michael Hardy, Greg-bard, Ceilican, BG19bot, Ezequiel234 and Anonymous: 1

• Rule of inference Source: http://en.wikipedia.org/wiki/Rule_of_inference?oldid=654518943 Contributors: Michael Hardy, Darkwind,Poor Yorick, Rossami, BAxelrod, Hyacinth, Ldo, Timrollpickering, Markus Krötzsch, Jason Quinn, Khalid hassani, Neilc, Quadell,CSTAR, Lucidish, MeltBanana, Elwikipedista~enwiki, EmilJ, Nortexoid, Giraffedata, Joriki, Ruud Koot, Hurricane Angel, Waldir,BD2412, Kbdank71, Emallove, Brighterorange, Algebraist, YurikBot, Rsrikanth05, Cleared as filed, Arthur Rubin, Fram, Nahaj, Elwoodj blues, Mhss, Chlewbot, Byelf2007, ArglebargleIV, Robofish, Tktktk, Jim.belk, Physis, JHunterJ, Grumpyyoungman01, Dan Gluck,CRGreathouse, CBM, Simeon, Gregbard, Cydebot, Thijs!bot, Epbr123, LokiClock, TXiKiBoT, Cliff, Eusebius, Addbot, Luckas-bot,AnomieBOT, Citation bot, GrouchoBot, RibotBOT, WillMall, Undsoweiter, Jonesey95, Gamewizard71, Onel5969, TomT0m, Tesser-act2, Tijfo098, ClueBot NG, Delphinebbd, Ginsuloft and Anonymous: 27

• Scalar implicature Source: http://en.wikipedia.org/wiki/Scalar_implicature?oldid=639090532Contributors: Topbanana, BD2412, Rjwilmsi,ENeville, Gregbard, Future Perfect at Sunrise, Hamaryns, Iain99, Ascidian, Rodhullandemu, Simplebutpowerful, Ochib, Suntag, Amer-icanlinguist, Citation bot 1, Joost.b, Pollinosisss, Widr, ChrisGualtieri, Monkbot and Anonymous: 7

• Solomonoff’s theory of inductive inference Source: http://en.wikipedia.org/wiki/Solomonoff’{}s_theory_of_inductive_inference?oldid=656685381 Contributors: Michael Hardy, Den fjättrade ankan~enwiki, Henrygb, Randomness~enwiki, Giftlite, Ben Standeven, Per-fecto, Ziggurat, Linas, Rjwilmsi, Anomie, Benja, A bit iffy, Byelf2007, Grumpyyoungman01, CRGreathouse, Gregbard, ElPoojmar,Baccyak4H, TheSeven, Touisiau, Terpsichoreus, Melcombe, Kbdankbot, Multipundit, Addbot, Yobot, AnomieBOT, Vivohobson, AlanDawrst, Miracle Pen, RjwilmsiBot, Arinelle, Logical Cowboy, 478jjjz, Erianna, Albertttt, Thepigdog, Helpful Pixie Bot, BG19bot, Bat-tyBot, Barney the barney barney, Jochen Burghardt, 90b56587, BreakfastJr, François Robere, Laundrevity, Alexiswolfish, Monkbot andAnonymous: 14

• Square of opposition Source: http://en.wikipedia.org/wiki/Square_of_opposition?oldid=647443731 Contributors: JohnOwens, Kku,Evercat, Sethmahoney, Renamed user 4, Markhurd, Hyacinth, Giftlite, Utcursch, Liflon, Paul August, Chalst, Shamilton, John Vanden-berg, Mdd, Cscott, Reinis, Mysid, Light current, SmackBot, Mhss, Skomae, Acepectif, Ohconfucius, Harryboyles, JzG, Ckatz, Cm-drObot, Gregbard, Cydebot, Viridae, Gimmetrow, Thijs!bot, Bmorton3, Amarkov, Matthew Fennell, Sunderland06, LokiClock, On-toraul, Ggenellina, Cap'n rye, Chiba13, Barbara Partee, SieBot, Asday85, Mild Bill Hiccup, Watchduck, Johnuniq, DumZiBoT, Savabub-ble, NellieBly, Addbot, Atethnekos, MrOllie, DvK, AnomieBOT, Jim1138, Dante Cardoso Pinto de Almeida, Luis Felipe Schenone, PeterDamian, Pamdhiga, Machine Elf 1735, December21st2012Freak, Ammimajus, ClueBot NG, Jean KemperNN, Movses-bot, Quisquiliae,Masssly, George Ponderevo, Brad7777, Square of opposition, Blessed Duns Scotus, Katz C-ing U, Sum of Medieval Logic, John DunsScotus, John Dunz Scotuz, Root of opposition, Molly Buckner, Edward Fullerene, Squidward Buckner, Petrus Daemonicus, LondonGreyfriars, Mr. Guye, Kephir, William Scotus, Scuns Dotus, Cake Merridew and Anonymous: 48

• Strong inference Source: http://en.wikipedia.org/wiki/Strong_inference?oldid=655372796 Contributors: Jokestress, McCart42, RichFarmbrough, David Schaich, Bender235, Orlady, Rjwilmsi, The Rambling Man, Retired username, SmackBot, Tim bates, Falk Lieder,Pgr94, Gregbard, MarshBot, Arno Matthias, Mainmre, Ravenna1961, DonLJewett, DOI bot, Yobot, Citation bot, NiginiOliveira andAnonymous: 2

• Type inference Source: http://en.wikipedia.org/wiki/Type_inference?oldid=637576532Contributors: Damian Yerrick, MarXidad, B4hand,Axlrosen, LittleDan, Dysprosia, Zoicon5, Jackson~enwiki, Ruakh, EvanED, Tobias Bergemann, Ancheta Wis, Connelly, Leonard G.,Jabowery, Neilc, Ascánder, Spayrard, Euyyn, R. S. Shaw, Koper, Alansohn, Nighthawk4211, Ruud Koot, LinkTiger, Marudubshinki,Qwertyus, TheLaughingMan, Gfxmonk, Rjwilmsi, ErikHaugen, Debajit, YurikBot, Gaius Cornelius, Dogcow, SamuelRiv, Cedar101,That Guy, From That Show!, SmackBot, Aardvark92, Mgreenbe, Episteme-jp, Cybercobra, Almkglor, Jhammerb, Talandor, JonathanS. Shapiro, Isaacdealey, Gregbard, Torc2, Thijs!bot, Oerjan, Igodard, Magioladitis, Gwern, Kyralessa, SparsityProblem, Semi Virgil,Daniel5Ko, Owengibbins, Jerryobject, AncientPC, Classicalecon, Adrianwn, Excirial, PixelBot, ChuckEsterbrook, Addbot, Ghetto-blaster, Gasper.azman, Jarble, Peni, Yobot, Ptbotgourou, Ljaun, MrBlueSky, Rubinbot, Citation bot, FrescoBot, Денис Владимирович,Sae1962, Citation bot 1, RandomDSdevel, MastiBot, Francis Lima, ProjectSHiNKiROU, Gabaix, Gf uip, GoingBatty, ClueBot NG,Clegoues, Helpful Pixie Bot, Cobalt pen, 786b6364, MatejLach and Anonymous: 98

• Uncertain inference Source: http://en.wikipedia.org/wiki/Uncertain_inference?oldid=666715803 Contributors: Kku, Beland, Linas,Chris the speller, Gregbard, Krishnachandranvn, RjwilmsiBot, Riclas and Anonymous: 1

• Veridicality Source: http://en.wikipedia.org/wiki/Veridicality?oldid=628591778 Contributors: Kwamikagami, BD2412, Bgwhite, Un-kleFester, Racklever, Alexey Feldgendler, Gregbard, Cydebot, Yobot, FrescoBot, Cerabot~enwiki, Esemee2, DharmaUser and Anony-mous: 5

26.3.2 Images• File:Ambox_important.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/b4/Ambox_important.svg License: Public do-

main Contributors: Own work, based off of Image:Ambox scales.svg Original artist: Dsmurat (talk · contribs)

Page 98: Inference 1

26.3. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES 91

• File:Arithmetic_symbols.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a3/Arithmetic_symbols.svg License: Publicdomain Contributors: Own work Original artist: This vector image was created with Inkscape by Elembis, and then manually replaced.

• File:Brain.png Source: https://upload.wikimedia.org/wikipedia/commons/7/73/Nicolas_P._Rougier%27s_rendering_of_the_human_brain.png License: GPL Contributors: http://www.loria.fr/~{}rougier Original artist: Nicolas Rougier

• File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Origi-nal artist: ?

• File:Edit-clear.svg Source: https://upload.wikimedia.org/wikipedia/en/f/f2/Edit-clear.svg License: Public domain Contributors: TheTango! Desktop Project. Original artist:The people from the Tango! project. And according to the meta-data in the file, specifically: “Andreas Nilsson, and Jakub Steiner (althoughminimally).”

• File:Frege-gegensätze.png Source: https://upload.wikimedia.org/wikipedia/commons/0/0c/Frege-gegens%C3%A4tze.pngLicense: Pub-lic domain Contributors: Frege, Gottlob, 1879. Begriffsschrift: eine der arithmetischen nachgebildete Formelsprache des reinen Denkens.Halle: L. Nebert. Original artist: Frege

• File:Johannesmagistris-square.jpg Source: https://upload.wikimedia.org/wikipedia/commons/c/ca/Johannesmagistris-square.jpg Li-cense: Public domain Contributors: Own work Original artist: Peter Damian

• File:LampFlowchart.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/LampFlowchart.svg License: CC-BY-SA-3.0 Contributors: vector version of Image:LampFlowchart.png Original artist: svg by Booyabazooka

• File:Logic_portal.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/7c/Logic_portal.svg License: CC BY-SA 3.0 Con-tributors: Own work Original artist: Watchduck (a.k.a. Tilman Piesk)

• File:Logical-hexagon.png Source: https://upload.wikimedia.org/wikipedia/commons/b/bb/Logical-hexagon.png License: Public do-main Contributors: Own work by the original uploader (Original text: I (Greg Bard (talk)) created this work entirely by myself.) Originalartist: Greg Bard (talk)

• File:Merge-arrow.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/aa/Merge-arrow.svg License: Public domain Con-tributors: ? Original artist: ?

• File:NASA_Stardust_Mission_inference_objection.png Source: https://upload.wikimedia.org/wikipedia/commons/a/a5/NASA_Stardust_Mission_inference_objection.png License: CC-BY-SA-3.0 Contributors: Transferred from en.wikipedia to Commons. Original artist:The original uploader was Grumpyyoungman01 at English Wikipedia

• File:ParseTree.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/6e/ParseTree.svg License: Public domain Contributors:en:Image:ParseTree.jpg Original artist: Traced by User:Stannered

• File:Question_book-new.svg Source: https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: CC-BY-SA-3.0 Contributors: Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist: Tkgd2007

• File:Science-symbol-2.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/75/Science-symbol-2.svg License: CC BY 3.0 Contributors: en:Image:Science-symbol2.png Original artist: en:User:AllyUnion, User:Stannered

• File:Square_of_opposition,_set_diagrams.svg Source: https://upload.wikimedia.org/wikipedia/commons/5/51/Square_of_opposition%2C_set_diagrams.svg License: Public domain Contributors: Own work Original artist: Watchduck (a.k.a. Tilman Piesk)

• File:Stardust_Mission_Inference_objection_with_co-premise_included.png Source: https://upload.wikimedia.org/wikipedia/commons/8/81/Stardust_Mission_Inference_objection_with_co-premise_included.png License: Public domain Contributors: Own work Original artist: Grumpyyoungman01

• File:Stroop_icon.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/dd/Stroop_icon.svg License: Public domain Contributors: Own work, based on File:Stroop icon.jpg Original artist: Grutness at en.wikipedia

• File:Text_document_with_red_question_mark.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)

• File:Wiktionary-logo-en.svg Source: https://upload.wikimedia.org/wikipedia/commons/f/f8/Wiktionary-logo-en.svg License: Public domain Contributors: Vector version of Image:Wiktionary-logo-en.png. Original artist: Vectorized by Fvasconcellos (talk · contribs), based on original logo tossed together by Brion Vibber

26.3.3 Content license

• Creative Commons Attribution-Share Alike 3.0