Classification consists in assigning classes or labels to objects. There are two basic types of classification problems:
Unsupervised: in this case the classes are unknown, so the problem consists in dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.
Supervised: the possible classes or labels are known a priori, and the problem consists in finding a function or rule that assigns each object to one of the classes.
• Supervised classification consists in assigning to a particular object, described by its attributes, A1, A2, ..., An, one of m classes, C = {c1, c2, ..., cm}, such that the probability of the class given the attributes is maximized:
\arg\max_C P(C \mid A_1, A_2, \ldots, A_n) \quad (1)
• If we denote the set of attributes as A = {A1, A2, ..., An}: \arg\max_C P(C \mid A)
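Applying the Bayes rule shows which terms must be estimated (a standard expansion, made explicit here because the next bullets refer to it):

\arg\max_C P(C \mid A) = \arg\max_C \frac{P(C)\, P(A \mid C)}{P(A)} = \arg\max_C P(C)\, P(A \mid C)

since P(A) does not depend on the class.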
• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes
• When there is an imbalance in the costs of misclassification, we must then minimize the expected cost (EC). For two classes, this is given by:

EC = FN \times P(+)\, C(- \mid +) + FP \times P(-)\, C(+ \mid -) \quad (2)

Where: FN is the false negative rate, FP is the false positive rate, P(+) is the probability of the positive class, P(−) is the probability of the negative class, C(− | +) is the cost of classifying a positive as negative, and C(+ | −) is the cost of classifying a negative as positive; each cost is weighted by the probability of the true class and the rate at which it is misclassified
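As a quick illustration (numbers invented for the example): with P(+) = 0.1, FN = 0.2, C(− | +) = 50, FP = 0.05, and C(+ | −) = 1,

EC = 0.2 \times 0.1 \times 50 + 0.05 \times 0.9 \times 1 = 1.0 + 0.045 = 1.045

so reducing the false negative rate matters far more here, even though false negatives are rarer in absolute terms (2% of instances versus 4.5%).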
• The direct application of the Bayes rule results in a computationally expensive problem
• The number of parameters in the likelihood term, P(A1, A2, ..., An | C), increases exponentially with the number of attributes; for instance, with n binary attributes the joint likelihood table requires on the order of 2^n entries per class
• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier
• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability of the class (a one-dimensional vector) and the n conditional probabilities of each attribute given the class (two-dimensional matrices)
• The space requirement is reduced from exponential to linear in the number of attributes
• The calculation of the posterior is greatly simplified, as to estimate it (unnormalized) only n multiplications are required
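A minimal sketch of such a classifier for discrete attributes (function names and the Laplace-smoothing constant are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def train_nb(X, y, n_values, n_classes, alpha=1.0):
    """X: (N, n) int array of attribute values; y: (N,) int class labels.
    Returns the class prior (1-D vector) and one conditional table per
    attribute (2-D matrices), as described above."""
    N, n = X.shape
    prior = np.array([(y == c).sum() for c in range(n_classes)], float)
    prior /= prior.sum()
    cond = []
    for i in range(n):
        # (n_classes x n_values[i]) table for P(A_i | C), Laplace-smoothed.
        table = np.full((n_classes, n_values[i]), alpha)
        for c in range(n_classes):
            vals, counts = np.unique(X[y == c, i], return_counts=True)
            table[c, vals] += counts
        table /= table.sum(axis=1, keepdims=True)
        cond.append(table)
    return prior, cond

def predict_nb(x, prior, cond):
    # Unnormalized posterior: prior times n conditional terms (in log space).
    logp = np.log(prior)
    for i, table in enumerate(cond):
        logp = logp + np.log(table[:, x[i]])
    return int(np.argmax(logp))
```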
• The Tree Augmented Bayesian Classifier, or TAN, incorporates some dependencies between the attributes by building a directed tree among the attribute variables
• The n attributes form a graph which is restricted to a directed tree that represents the dependency relations between the attributes
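The slides do not fix a learning algorithm for this tree; the classic choice (Friedman et al.'s TAN) builds a maximum-weight spanning tree over conditional mutual information, Chow-Liu style. A sketch under that assumption:

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(X, y, i, j):
    """Empirical I(A_i; A_j | C) from discrete data."""
    N = len(y)
    cmi = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        pc = len(Xc) / N
        for a in np.unique(Xc[:, i]):
            for b in np.unique(Xc[:, j]):
                p_ab = np.mean((Xc[:, i] == a) & (Xc[:, j] == b))
                p_a = np.mean(Xc[:, i] == a)
                p_b = np.mean(Xc[:, j] == b)
                if p_ab > 0:
                    cmi += pc * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def tan_tree(X, y):
    """Maximum-weight spanning tree over the attributes (Prim's algorithm),
    with arcs directed away from attribute 0 so each attribute gets at most
    one attribute parent (plus the class)."""
    n = X.shape[1]
    w = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        w[i, j] = w[j, i] = cond_mutual_info(X, y, i, j)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))  # arc i -> j
        in_tree.add(j)
    return edges
```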
• The Bayesian Network Augmented Bayesian Classifier, or BAN, considers that the dependency structure among the attributes constitutes a directed acyclic graph (DAG)
• The TAN and BAN classifiers can be considered as particular cases of a more general model, that is, Bayesian networks
• Techniques for inference and learning in Bayesian networks can be applied to obtain the posterior probabilities (inference) and the model (learning) for the TAN and BAN classifiers
• Another alternative to deal with dependent attributes is to transform the basic structure of a naive Bayesian classifier, while maintaining a star or tree-structured network
• The basic idea of the semi-naive Bayesian classifier (SNBC) is to eliminate or join attributes which are not independent given the class
• This is analogous to feature selection in machine learning
• Two alternative operations to modify the structure of an NBC: (i) node elimination, and (ii) node combination, considering that we start from a full structure
• Node elimination consists in simply eliminating an attribute, Ai, from the model; this could be because it is not relevant for the class (Ai and C are independent), or because the attribute Ai and another attribute, Aj, are not independent given the class
• Node combination consists in merging two attributes, Ai and Aj, into a new attribute Ak, such that Ak has as possible values the cross product of the values of Ai and Aj (assuming discrete attributes). For example, if Ai = {a, b, c} and Aj = {1, 2}, then Ak = {a1, a2, b1, b2, c1, c2}. This is an alternative when two attributes are not independent given the class, as sketched below
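A minimal sketch of the node-combination operation on a data table (the use of pandas and these column names are illustrative choices, not prescribed by the slides):

```python
import pandas as pd

# Merge two dependent discrete attributes Ai and Aj into one attribute Ak
# whose values are the cross product of the original values.
df = pd.DataFrame({"Ai": ["a", "b", "c", "a"], "Aj": [1, 2, 1, 2]})
df["Ak"] = df["Ai"].astype(str) + df["Aj"].astype(str)  # e.g. "a1", "b2", ...
df = df.drop(columns=["Ai", "Aj"])
```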
• A third alternative consists in adding a new attribute that makes two dependent attributes independent
• This new attribute is a kind of virtual or hidden node in the model, for which we do not have any data
• An alternative for estimating the parameters of hidden variables in Bayesian networks, such as in this case, is based on the Expectation-Maximization (EM) procedure
• Several important problems require predicting several classes simultaneously
• For example: text classification, where a document can be assigned to several topics; gene classification, as a gene may have different functions; and image annotation, as an image may include several objects
• The multi-dimensional classification problem corresponds to searching for a function h that assigns to each instance, represented by a vector of m features X = (X1, ..., Xm), a vector of d class values C = (C1, ..., Cd):

h : \Omega_{X_1} \times \cdots \times \Omega_{X_m} \rightarrow \Omega_{C_1} \times \cdots \times \Omega_{C_d}

• Multi-label classification is a particular case of multi-dimensional classification, where all class variables are binary
• Two basic approaches:
  • Binary relevance approaches transform the multi-label classification problem into d independent binary classification problems, one for each class variable, C1, ..., Cd
  • The label power-set approach transforms the multi-label classification problem into a single-class scenario by defining a new compound class variable whose possible values are all the possible combinations of values of the original classes
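A minimal sketch of both transformations (scikit-learn's LogisticRegression is just a convenient stand-in for any base classifier; the data here is random and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (N, m) features; Y: (N, d) binary label matrix (illustrative random data).
X = np.random.rand(100, 5)
Y = (np.random.rand(100, 3) > 0.5).astype(int)

# Binary relevance: d independent binary classifiers, one per class variable.
br_models = [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]
Y_pred_br = np.column_stack([m.predict(X) for m in br_models])

# Label power-set: one multiclass problem over the observed label combinations.
combos, y_ps = np.unique(Y, axis=0, return_inverse=True)
ps_model = LogisticRegression().fit(X, y_ps)
Y_pred_ps = combos[ps_model.predict(X)]  # map compound class back to labels
```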
• A multidimensional Bayesian network classifier (MBC) is a Bayesian network with a particular structure
• The set V of variables is partitioned into two sets: VC = {C1, ..., Cd}, d ≥ 1, of class variables, and VX = {X1, ..., Xm}, m ≥ 1, of feature variables (d + m = n)
• The set A of arcs is also partitioned into three sets, AC, AX, ACX, such that AC ⊆ VC × VC is composed of the arcs between the class variables, AX ⊆ VX × VX is composed of the arcs between the feature variables, and finally, ACX ⊆ VC × VX is composed of the arcs from the class variables to the feature variables
• The problem of obtaining the classification of an instance with an MBC, that is, the most likely combination of classes, corresponds to the MPE (Most Probable Explanation) or abduction problem
• This is a complex problem with a high computational cost
• Chain classifiers are an alternative method for multi-label classification that incorporates class dependencies, while keeping the computational efficiency of the binary relevance approach
• A chain classifier consists of d base binary classifiers which are linked in a chain, such that each classifier incorporates the classes predicted by the previous classifiers as additional attributes
• The feature vector for each binary classifier, Li, is extended with the labels (0/1) of all previous classifiers in the chain
• Each classifier in the chain is trained to learn the association of label li given the features augmented with all previous class labels in the chain
• For classification, it starts at L1 and propagates the predicted classes along the chain, such that for Li ∈ L (where L = {L1, L2, ..., Ld}) it predicts P(Li | X, L1, L2, ..., Li−1)
• The class vector is determined by combining the outputs of all the binary classifiers, as sketched below
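A sketch of this scheme (function names are illustrative; any base binary classifier could replace LogisticRegression):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_chain(X, Y):
    """Train d binary classifiers, each seeing the features plus the labels
    of all previous classifiers in the chain."""
    models, X_aug = [], X
    for j in range(Y.shape[1]):
        models.append(LogisticRegression().fit(X_aug, Y[:, j]))
        # During training, the true labels serve as the extra attributes.
        X_aug = np.column_stack([X_aug, Y[:, j]])
    return models

def predict_chain(models, X):
    preds, X_aug = [], X
    for m in models:
        p = m.predict(X_aug)
        preds.append(p)
        # At prediction time, the *predicted* labels are propagated.
        X_aug = np.column_stack([X_aug, p])
    return np.column_stack(preds)  # the combined class vector
```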
• If we consider the dependency relations between the class variables, and represent these relations as a directed acyclic graph (DAG), then we can simplify the previous equation by conditioning each class only on its parents in the DAG:

P(C_1, \ldots, C_d \mid X) = \prod_{i=1}^{d} P(C_i \mid \mathrm{pa}(C_i), X)

• A further simplification assumes that the most probable joint combination of classes can be approximated by concatenating the individual most probable classes
• Hierarchical classification is a type of multidimensional classification in which the classes are ordered in a predefined structure, typically a tree, or in general a directed acyclic graph (DAG)
• In hierarchical classification, an example that belongs to a certain class automatically belongs to all its superclasses
• Hierarchical classification has applications in several areas, such as text categorization, protein function prediction, and object recognition
• Chained Path Evaluation (CPE) analyzes each possible path from the root to a leaf node in the hierarchy, taking into account the level of the predicted labels to give a score to each path, and finally returns the one with the best score
• Additionally, it considers the relations of each node with its ancestors in the hierarchy, based on chain classifiers
• A local classifier is trained for each node, Ci, in the hierarchy, except the leaf nodes, to classify its child nodes
• The classifier for each node, Ci, for instance a naive Bayes classifier, is trained considering examples from all its child nodes, as well as some examples of its sibling nodes in the hierarchy
• To consider the relation with other nodes in the hierarchy, the class predicted by the parent (tree structure) or parents (DAG) is included as an additional attribute in each local classifier
• In the classification phase, the probabilities of each class for all local classifiers are obtained based on the input data
• The score for each path in the hierarchy is calculated by a weighted sum of the log of the probabilities of all local classifiers in the path:
score = \sum_{i=0}^{n} w_{C_i} \times \log P(C_i \mid X_i, \mathrm{pa}(C_i)) \quad (13)
• The purpose of these weights is to give more importance to the upper levels of the hierarchy
• Once the scores for all the paths are obtained, the path with the highest score is selected as the set of classes corresponding to the given instance; a sketch of this scoring step follows
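A small sketch of the path-scoring step; the 1/(level+1) weighting below is an illustrative assumption (the slides only require that upper levels weigh more):

```python
import numpy as np

def score_path(path, local_probs, levels):
    """path: node ids from root to leaf; local_probs[c]: P(c | x, pa(c))
    from the local classifiers; levels[c]: depth of node c (root = 0)."""
    return sum((1.0 / (levels[c] + 1)) * np.log(local_probs[c]) for c in path)

def classify(paths, local_probs, levels):
    # The best-scoring path gives the instance's class and all its superclasses.
    scores = [score_path(p, local_probs, levels) for p in paths]
    return paths[int(np.argmax(scores))]
```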
• Skin detection is a useful pre-processing stage for many applications in computer vision, such as person detection and gesture recognition, among others
• A simple and very fast way to obtain an approximate classification of the pixels in an image as skin or not-skin is based on the color attributes of each pixel
• Usually, pixels in a digital image are represented as the combination of three basic (primary) colors: Red (R), Green (G) and Blue (B), in what is known as the RGB model. There are alternative color models, such as HSV, YIQ, etc.
• An alternative is to consider a semi-naive Bayesian classifier and select the best attributes from the different color models for skin classification by eliminating or joining attributes
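For instance, the nine color attributes per pixel can be assembled from the three color models with the standard library (colorsys expects RGB values scaled to [0, 1]; this attribute layout is an assumption for illustration):

```python
import colorsys

def color_attributes(r, g, b):
    """Build the 9 color attributes (RGB + HSV + YIQ) for one pixel,
    given r, g, b in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    y, i, q = colorsys.rgb_to_yiq(r, g, b)
    return [r, g, b, h, s, v, y, i, q]
```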
• Then an initial NBC was learned based on data (examples of skin and not-skin pixels taken from several images). This initial classifier obtained a 94% accuracy when applied to other (test) images.
• The classifier was then optimized. Starting from the full NBC with 9 attributes, the method applies the variable elimination and combination stages until the simplest classifier with maximum accuracy is obtained. With this final model, accuracy improved to 98%.
• The Human Immunodeficiency Virus (HIV) is the causative agent of AIDS, a condition in which progressive failure of the immune system allows opportunistic, life-threatening infections to occur
• To combat HIV infection, several antiretroviral (ARV) drugs belonging to different drug classes, each affecting a specific step in the viral replication cycle, have been developed. It is important to select the best drug combination according to the virus's mutations in a patient.
Bielza, C., Li, G., Larrañaga, P.: Multi-dimensional Classification with Bayesian Networks. International Journal of Approximate Reasoning (2011)
Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine 57, 219–229 (2013)
Cheng, J., Greiner, R.: Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 101–108 (1999)
Silla-Jr., C. N., Freitas, A. A.: A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery 22(1-2), 31–72 (2011)
Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)
van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian Network Classifiers. Third European Workshop on Probabilistic Graphical Models, 107–114, Prague, Czech Republic (2006)