Artificial Intelligence
Learning: decision lists, evaluation, Naive Bayesian networks
Peter Antal ([email protected])
September 26, 2016
Algorithms for concept learning
◦ Best hypothesis vs. version space
PAC-learning for decision lists
The evaluation of performance
From predictions to optimal decisions
Learning Naive Bayesian networks
Each model specifies true/false for each proposition symbol.
E.g. P1,2 = false, P2,2 = true, P3,1 = false.
With these three symbols, there are 8 possible models, which can be enumerated automatically.
Rules for evaluating truth with respect to a model m:
¬S is true iff S is false
S1 ∧ S2 is true iff S1 is true and S2 is true
S1 ∨ S2 is true iff S1 is true or S2 is true
S1 ⇒ S2 is true iff S1 is false or S2 is true
  i.e., it is false iff S1 is true and S2 is false
S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true
A simple recursive process evaluates an arbitrary sentence, e.g.,
¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
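The recursive evaluation rules above can be sketched in Python. The nested-tuple sentence encoding and the `holds` name below are my own illustrative choices, not from the slides:

```python
# Recursive truth evaluation of a propositional sentence in a model.
# Sentences are nested tuples: ('not', s), ('and', s1, s2), ('or', s1, s2),
# ('implies', s1, s2), ('iff', s1, s2), or a proposition-symbol string.

def holds(sentence, model):
    """Return True iff the sentence is true in the model (a dict symbol -> bool)."""
    if isinstance(sentence, str):
        return model[sentence]
    op, *args = sentence
    if op == 'not':
        return not holds(args[0], model)
    if op == 'and':
        return holds(args[0], model) and holds(args[1], model)
    if op == 'or':
        return holds(args[0], model) or holds(args[1], model)
    if op == 'implies':  # false only when antecedent true, consequent false
        return (not holds(args[0], model)) or holds(args[1], model)
    if op == 'iff':      # true iff both implications hold
        return holds(args[0], model) == holds(args[1], model)
    raise ValueError(op)

# The slide's example: ¬P1,2 ∧ (P2,2 ∨ P3,1) in the model above
m = {'P12': False, 'P22': True, 'P31': False}
print(holds(('and', ('not', 'P12'), ('or', 'P22', 'P31')), m))  # → True
```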
Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ╞ β and β ╞ α
B1,1 ⇔ (P1,2 ∨ P2,1)
1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α):
(B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)
2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)
3. Move ¬ inwards using de Morgan's rules and double-negation:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)
4. Apply the distributivity law (∨ over ∧) and flatten:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
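Since equivalence means truth in the same models, the CNF conversion can be checked by enumerating all 8 assignments. A minimal sketch (the function names are mine):

```python
from itertools import product

def original(b11, p12, p21):
    # B1,1 ⇔ (P1,2 ∨ P2,1)
    return b11 == (p12 or p21)

def cnf(b11, p12, p21):
    # (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
    return ((not b11) or p12 or p21) and ((not p12) or b11) and ((not p21) or b11)

# Equivalent: identical truth value in every one of the 2^3 = 8 models
assert all(original(*m) == cnf(*m) for m in product([False, True], repeat=3))
print("CNF equivalent in all 8 models")
```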
Goal: selection of a logical function f: {0,1}^n → {0,1} from a function class C
which is consistent with the data D_N = {(x1, y1), .., (xN, yN)}, i.e. for i = 1..N: f(xi) = yi.
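The consistency condition is directly checkable. A toy sketch (the data set and hypothesis names are illustrative, not from the slides):

```python
def consistent(f, data):
    """True iff hypothesis f agrees with every labelled example (x, y)."""
    return all(f(x) == y for x, y in data)

# Toy data over {0,1}^2 where the target concept is logical AND
D = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
f_and = lambda x: x[0] and x[1]
f_or = lambda x: 1 if (x[0] or x[1]) else 0

print(consistent(f_and, D), consistent(f_or, D))  # → True False
```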
Predicted \ Ref.:   Ref. = 0              Ref. = 1
0                   True negative (TN)    False negative (FN)
1                   False positive (FP)   True positive (TP)
Learning method:
◦ True negative / true positive: no change
◦ False negative: generalize
◦ False positive: specialize
False negative: generalization
◦ Replace A ∧ B with A
◦ Replace A with A ∨ B
False positive: specialization
◦ Replace A with A ∧ B
◦ Replace A ∨ B with A
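For purely conjunctive hypotheses, the generalization move (drop a literal, A ∧ B → A) and the specialization move (add a literal, A → A ∧ B) can be sketched as set operations. The frozenset encoding and helper names are my own assumptions:

```python
# A conjunctive hypothesis is a frozenset of literals, e.g. {'A', 'B'} for A ∧ B.
# A literal is 'A' (positive) or '~A' (negated); an example is a dict attr -> bool.

def satisfies(example, hyp):
    """True iff the example makes every literal of the conjunction true."""
    return all(example[l.lstrip('~')] == (not l.startswith('~')) for l in hyp)

def generalizations(hyp):
    """On a false negative: drop one literal (A ∧ B → A)."""
    return [hyp - {l} for l in hyp]

def specializations(hyp, attrs):
    """On a false positive: conjoin one literal of an unused attribute (A → A ∧ B)."""
    used = {l.lstrip('~') for l in hyp}
    return [hyp | {lit} for a in attrs if a not in used for lit in (a, '~' + a)]

h = frozenset({'A', 'B'})
print(satisfies({'A': True, 'B': False}, h))        # → False
print(len(generalizations(h)))                      # → 2 (drop A, or drop B)
print(len(specializations(h, ['A', 'B', 'C'])))     # → 2 (add C or ~C)
```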
Bound the set of consistent hypotheses with two limiting sets:
◦ S: the set of most specific consistent hypotheses
◦ G: the set of most general consistent hypotheses
Learning from (xi, yi): update Si and Gi
◦ For each hypothesis in Si:
FP: delete
FN: generalize to all neighbours
◦ For each hypothesis in Gi:
FP: specialize to all neighbours
FN: delete
(Figure: the hypothesis lattice, ordered from specific to general.)
One possible representation for hypotheses
E.g., here is the “true” tree for deciding whether to wait:
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees.
How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?
Each attribute can be in (positive), in (negative), or out ⇒ 3^n distinct conjunctive hypotheses.
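These two counting arguments are easy to check numerically; a sketch with illustrative function names:

```python
def n_boolean_functions(n):
    """All decision trees over n Boolean attributes = all Boolean functions
    = all truth tables with 2^n rows, each row free to be 0 or 1."""
    return 2 ** (2 ** n)

def n_conjunctive(n):
    """Purely conjunctive hypotheses: each attribute is a positive literal,
    a negative literal, or absent."""
    return 3 ** n

print(n_boolean_functions(6))  # → 18446744073709551616 (the slide's figure)
print(n_conjunctive(6))        # → 729
```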
A more expressive hypothesis space
◦ increases the chance that the target function can be expressed
◦ increases the number of hypotheses consistent with the training set
⇒ may get worse predictions
Sequential k tests using n attributes: k-DL(n)
Number of tests (conjunctions of at most k literals over n attributes):
Conj(n, k) = Σ_{i=0}^{k} C(2n, i) = O(n^k)
Number of test sequences:
3^{Conj(n,k)}
Number of decision lists:
|k-DL(n)| ≤ 3^{Conj(n,k)} · Conj(n,k)!
Number of decision lists:
|k-DL(n)| = 2^{O(n^k log₂(n^k))}
PAC sample complexity:
m ≥ (1/ε) · (ln(1/δ) + O(n^k log₂(n^k)))
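The count Conj(n,k) and the resulting PAC sample bound m ≥ (1/ε)(ln(1/δ) + ln|H|) can be evaluated concretely. A sketch, using ln|k-DL(n)| ≤ Conj(n,k)·ln 3 + ln(Conj(n,k)!) from the bound above; the function names are mine:

```python
import math

def conj(n, k):
    """Conj(n,k) = sum_{i=0}^{k} C(2n, i): conjunctions of at most k of the
    2n possible literals over n attributes."""
    return sum(math.comb(2 * n, i) for i in range(k + 1))

def pac_sample_bound(n, k, eps, delta):
    """m >= (1/eps)(ln(1/delta) + ln|k-DL(n)|), with
    ln|k-DL(n)| <= Conj(n,k)*ln 3 + ln(Conj(n,k)!)."""
    c = conj(n, k)
    ln_h = c * math.log(3) + math.lgamma(c + 1)  # lgamma(c+1) = ln(c!)
    return math.ceil((math.log(1 / delta) + ln_h) / eps)

print(conj(10, 2))  # C(20,0) + C(20,1) + C(20,2) = 1 + 20 + 190 = 211
print(pac_sample_bound(10, 2, 0.1, 0.05))
```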
Sensitivity: p(Prediction=TRUE|Ref=TRUE)
Specificity: p(Prediction=FALSE|Ref=FALSE)
PPV: p(Ref=TRUE|Prediction=TRUE)
NPV: p(Ref=FALSE|Prediction=FALSE)
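The four measures follow directly from the confusion-matrix counts. A minimal sketch (no zero-count guards; the counts in the example are made up):

```python
def performance(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV from confusion-matrix counts."""
    return {
        'sensitivity': tp / (tp + fn),  # p(Prediction=TRUE  | Ref=TRUE)
        'specificity': tn / (tn + fp),  # p(Prediction=FALSE | Ref=FALSE)
        'PPV':         tp / (tp + fp),  # p(Ref=TRUE  | Prediction=TRUE)
        'NPV':         tn / (tn + fn),  # p(Ref=FALSE | Prediction=FALSE)
    }

m = performance(tp=40, fp=10, tn=45, fn=5)
print(m)  # sensitivity 40/45, specificity 45/55, PPV 0.8, NPV 0.9
```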
(Figure: a decision tree over Bleeding (absent/weak/strong), Onset (early/late), Regularity (regular/irregular) and Mutation (h.wild/mutated), with leaves holding conditional probabilities such as P(D|a,l,m), P(D|a,l,h.w.), P(D|a,e), P(D|w,i,m), P(D|w,i,h.w.), P(D|w,r) and P(D|Bleeding=strong).)
Decision tree: each internal node represents a (univariate) test; the leaves contain
the conditional probabilities given the values along the path.
Decision graph: if conditions are equivalent, then subtrees can be merged.
E.g. if (Bleeding=absent, Onset=late) ~ (Bleeding=weak, Regularity=irregular)
(Figure: score distributions of Healthy and Disease-present cases separated by a decision threshold t, with actions a0/a1 and outcomes o0/o1.)
Reported \ Ref.:   Ref. = 0   Ref. = 1
0                  C0|0       C0|1
1                  C1|0       C1|1
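Given this cost matrix and a posterior p = P(Ref=1 | evidence), the optimal decision reports the class with minimum expected cost. A sketch; the example cost values are hypothetical:

```python
def optimal_report(p_ref1, cost):
    """Report the class with minimum expected cost.
    cost[r][t] is C_{r|t}: the cost of reporting r when the reference class is t."""
    exp0 = cost[0][0] * (1 - p_ref1) + cost[0][1] * p_ref1
    exp1 = cost[1][0] * (1 - p_ref1) + cost[1][1] * p_ref1
    return 0 if exp0 <= exp1 else 1

# Hypothetical costs: a false negative is 10x worse than a false positive
C = [[0, 10],   # report 0: C0|0 = 0, C0|1 = 10
     [1, 0]]    # report 1: C1|0 = 1, C1|1 = 0
# The break-even posterior is p = 1/11 ≈ 0.09
print(optimal_report(0.05, C), optimal_report(0.15, C))  # → 0 1
```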
Variables (nodes):
◦ Flu: present/absent
◦ FeverAbove38C: present/absent
◦ Coughing: present/absent
Structure: Flu → Fever, Flu → Coughing
Model:
P(Flu=present)=0.001
P(Flu=absent)=1-P(Flu=present)
P(Fever=present|Flu=present)=0.6
P(Fever=absent|Flu=present)=1-0.6
P(Fever=present|Flu=absent)=0.01
P(Fever=absent|Flu=absent)=1-0.01
P(Coughing=present|Flu=present)=0.3
P(Coughing=absent|Flu=present)=1-0.3
P(Coughing=present|Flu=absent)=0.02
P(Coughing=absent|Flu=absent)=1-0.02
Assumptions:
1. Two types of nodes: a cause Y and its effects X1,..,Xn.
2. Effects are conditionally independent of each other given their cause.
Decomposition of the joint:
P(Y,X1,..,Xn) = P(Y)∏i P(Xi|Y,X1,..,Xi-1) // by the chain rule
             = P(Y)∏i P(Xi|Y)             // by the N-BN assumption
For binary variables, only 2n+1 parameters!
Diagnostic inference:
P(Y|xi1,..,xik) = P(Y)∏j P(xij|Y) / P(xi1,..,xik)
If Y is binary, then the odds:
P(Y=1|xi1,..,xik) / P(Y=0|xi1,..,xik) = [P(Y=1)/P(Y=0)] · ∏j P(xij|Y=1)/P(xij|Y=0)
E.g. for the flu network:
P(Flu=present | Fever=absent, Coughing=present)
∝ P(Flu=present) · P(Fever=absent|Flu=present) · P(Coughing=present|Flu=present)
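The diagnostic query above can be computed end-to-end with the slide's parameters; a minimal sketch, normalizing over both values of Flu (the function name is mine):

```python
# Naive Bayes diagnostic inference with the slide's flu-network parameters.
p_flu = 0.001
p_fever_given = {True: 0.6, False: 0.01}   # P(Fever=present | Flu)
p_cough_given = {True: 0.3, False: 0.02}   # P(Coughing=present | Flu)

def posterior_flu(fever_present, cough_present):
    """P(Flu=present | fever, coughing) via the naive Bayes decomposition."""
    def joint(flu):
        prior = p_flu if flu else 1 - p_flu
        pf = p_fever_given[flu] if fever_present else 1 - p_fever_given[flu]
        pc = p_cough_given[flu] if cough_present else 1 - p_cough_given[flu]
        return prior * pf * pc
    num = joint(True)
    return num / (num + joint(False))  # normalize by P(evidence)

# The slide's query: Fever=absent, Coughing=present
print(round(posterior_flu(False, True), 5))  # → 0.00603
```

Even with coughing present, the posterior stays small because the missing fever and the low prior both argue against flu; adding fever as evidence raises it sharply.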
Naive concept learning
Learning decision lists
Decision trees and graphs
Optimal decisions
Error types in classification
Cost-free performance measures
Naive Bayesian network classifiers