Classification consists in assigning classes or labels to objects. There are two basic types of classification problems:
Unsupervised: in this case the classes are unknown, so the problem consists in dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.
Supervised: the possible classes or labels are known a priori, and the problem consists in finding a function or rule that assigns each object to one of the classes.
• Supervised classification consists in assigning to a particular object, described by its attributes, A1, A2, ..., An, one of m classes, C = {c1, c2, ..., cm}, such that the probability of the class given the attributes is maximized:
\arg\max_C P(C \mid A_1, A_2, \ldots, A_n) \quad (1)
• If we denote the set of attributes as A = {A1, A2, ..., An}: \arg\max_C P(C \mid A)
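Applying the Bayes rule shows which terms must be estimated (a standard expansion, made explicit here because the next bullets refer to it):

\arg\max_C P(C \mid A) = \arg\max_C \frac{P(C)\, P(A \mid C)}{P(A)} = \arg\max_C P(C)\, P(A \mid C)

since P(A) does not depend on the class.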
• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes
• When there is an imbalance in the costs of misclassification, we must then minimize the expected cost (EC). For two classes, this is given by:

EC = FN \times P(+)\, C(- \mid +) + FP \times P(-)\, C(+ \mid -) \quad (2)

Where: FN is the false negative rate, FP is the false positive rate, P(+) is the probability of the positive class, P(−) is the probability of the negative class, C(− | +) is the cost of classifying a positive as negative, and C(+ | −) is the cost of classifying a negative as positive; each cost is weighted by the probability of the true class and the rate at which it is misclassified
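As a quick illustration (numbers invented for the example): with P(+) = 0.1, FN = 0.2, C(− | +) = 50, FP = 0.05, and C(+ | −) = 1,

EC = 0.2 \times 0.1 \times 50 + 0.05 \times 0.9 \times 1 = 1.0 + 0.045 = 1.045

so reducing the false negative rate matters far more here, even though false negatives are rarer in absolute terms (2% of instances versus 4.5%).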
• The direct application of the Bayes rule results in a computationally expensive problem
• The number of parameters in the likelihood term, P(A1, A2, ..., An | C), increases exponentially with the number of attributes; for instance, with n binary attributes the joint likelihood table requires on the order of 2^n entries per class
• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier
• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability of the class (a one-dimensional vector) and the n conditional probabilities of each attribute given the class (two-dimensional matrices)
• The space requirement is reduced from exponential to linear in the number of attributes
• The calculation of the posterior is greatly simplified, as to estimate it (unnormalized) only n multiplications are required
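A minimal sketch of such a classifier for discrete attributes (function names and the Laplace-smoothing constant are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def train_nb(X, y, n_values, n_classes, alpha=1.0):
    """X: (N, n) int array of attribute values; y: (N,) int class labels.
    Returns the class prior (1-D vector) and one conditional table per
    attribute (2-D matrices), as described above."""
    N, n = X.shape
    prior = np.array([(y == c).sum() for c in range(n_classes)], float)
    prior /= prior.sum()
    cond = []
    for i in range(n):
        # (n_classes x n_values[i]) table for P(A_i | C), Laplace-smoothed.
        table = np.full((n_classes, n_values[i]), alpha)
        for c in range(n_classes):
            vals, counts = np.unique(X[y == c, i], return_counts=True)
            table[c, vals] += counts
        table /= table.sum(axis=1, keepdims=True)
        cond.append(table)
    return prior, cond

def predict_nb(x, prior, cond):
    # Unnormalized posterior: prior times n conditional terms (in log space).
    logp = np.log(prior)
    for i, table in enumerate(cond):
        logp = logp + np.log(table[:, x[i]])
    return int(np.argmax(logp))
```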
• The Tree Augmented Bayesian Classifier, or TAN, incorporates some dependencies between the attributes by building a directed tree among the attribute variables
• The n attributes form a graph which is restricted to a directed tree that represents the dependency relations between the attributes
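The slides do not fix a learning algorithm for this tree; the classic choice (Friedman et al.'s TAN) builds a maximum-weight spanning tree over conditional mutual information, Chow-Liu style. A sketch under that assumption:

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(X, y, i, j):
    """Empirical I(A_i; A_j | C) from discrete data."""
    N = len(y)
    cmi = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        pc = len(Xc) / N
        for a in np.unique(Xc[:, i]):
            for b in np.unique(Xc[:, j]):
                p_ab = np.mean((Xc[:, i] == a) & (Xc[:, j] == b))
                p_a = np.mean(Xc[:, i] == a)
                p_b = np.mean(Xc[:, j] == b)
                if p_ab > 0:
                    cmi += pc * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def tan_tree(X, y):
    """Maximum-weight spanning tree over the attributes (Prim's algorithm),
    with arcs directed away from attribute 0 so each attribute gets at most
    one attribute parent (plus the class)."""
    n = X.shape[1]
    w = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        w[i, j] = w[j, i] = cond_mutual_info(X, y, i, j)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))  # arc i -> j
        in_tree.add(j)
    return edges
```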
• The Bayesian Network Augmented Bayesian Classifier, or BAN, considers that the dependency structure among the attributes constitutes a directed acyclic graph (DAG)
• The TAN and BAN classifiers can be considered as particular cases of a more general model, that is, Bayesian networks
• Techniques for inference and learning in Bayesian networks can be applied to obtain the posterior probabilities (inference) and the model (learning) for the TAN and BAN classifiers
• Another alternative to deal with dependent attributes is to transform the basic structure of a naive Bayesian classifier, while maintaining a star or tree-structured network
• The basic idea of the semi-naive Bayesian classifier (SNBC) is to eliminate or join attributes which are not independent given the class
• This is analogous to feature selection in machine learning
• Two alternative operations to modify the structure of an NBC: (i) node elimination, and (ii) node combination, considering that we start from a full structure
• Node elimination consists in simply eliminating an attribute, Ai, from the model; this could be because it is not relevant for the class (Ai and C are independent), or because the attribute Ai and another attribute, Aj, are not independent given the class
• Node combination consists in merging two attributes, Ai and Aj, into a new attribute Ak, such that Ak has as possible values the cross product of the values of Ai and Aj (assuming discrete attributes). For example, if Ai = {a, b, c} and Aj = {1, 2}, then Ak = {a1, a2, b1, b2, c1, c2}. This is an alternative when two attributes are not independent given the class, as sketched below
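A minimal sketch of the node-combination operation on a data table (the use of pandas and these column names are illustrative choices, not prescribed by the slides):

```python
import pandas as pd

# Merge two dependent discrete attributes Ai and Aj into one attribute Ak
# whose values are the cross product of the original values.
df = pd.DataFrame({"Ai": ["a", "b", "c", "a"], "Aj": [1, 2, 1, 2]})
df["Ak"] = df["Ai"].astype(str) + df["Aj"].astype(str)  # e.g. "a1", "b2", ...
df = df.drop(columns=["Ai", "Aj"])
```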
• A third alternative consists in adding a new attribute that makes two dependent attributes independent
• This new attribute is a kind of virtual or hidden node in the model, for which we do not have any data
• An alternative for estimating the parameters of hidden variables in Bayesian networks, such as in this case, is based on the Expectation-Maximization (EM) procedure
• Several important problems require predicting several classes simultaneously
• For example: text classification, where a document can be assigned to several topics; gene classification, as a gene may have different functions; and image annotation, as an image may include several objects
• The multi-dimensional classification problem corresponds to searching for a function h that assigns to each instance, represented by a vector of m features X = (X1, ..., Xm), a vector of d class values C = (C1, ..., Cd):

h : \Omega_{X_1} \times \cdots \times \Omega_{X_m} \rightarrow \Omega_{C_1} \times \cdots \times \Omega_{C_d}

• Multi-label classification is a particular case of multi-dimensional classification, where all class variables are binary
• Two basic approaches:
  • Binary relevance approaches transform the multi-label classification problem into d independent binary classification problems, one for each class variable, C1, ..., Cd
  • The label power-set approach transforms the multi-label classification problem into a single-class scenario by defining a new compound class variable whose possible values are all the possible combinations of values of the original classes
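A minimal sketch of both transformations (scikit-learn's LogisticRegression is just a convenient stand-in for any base classifier; the data here is random and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (N, m) features; Y: (N, d) binary label matrix (illustrative random data).
X = np.random.rand(100, 5)
Y = (np.random.rand(100, 3) > 0.5).astype(int)

# Binary relevance: d independent binary classifiers, one per class variable.
br_models = [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]
Y_pred_br = np.column_stack([m.predict(X) for m in br_models])

# Label power-set: one multiclass problem over the observed label combinations.
combos, y_ps = np.unique(Y, axis=0, return_inverse=True)
ps_model = LogisticRegression().fit(X, y_ps)
Y_pred_ps = combos[ps_model.predict(X)]  # map compound class back to labels
```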
• A multidimensional Bayesian network classifier (MBC) is a Bayesian network with a particular structure
• The set V of variables is partitioned into two sets: VC = {C1, ..., Cd}, d ≥ 1, of class variables, and VX = {X1, ..., Xm}, m ≥ 1, of feature variables (d + m = n)
• The set A of arcs is also partitioned into three sets, AC, AX, ACX, such that AC ⊆ VC × VC is composed of the arcs between the class variables, AX ⊆ VX × VX is composed of the arcs between the feature variables, and finally, ACX ⊆ VC × VX is composed of the arcs from the class variables to the feature variables
• The problem of obtaining the classification of an instance with an MBC, that is, the most likely combination of classes, corresponds to the MPE (Most Probable Explanation) or abduction problem
• This is a complex problem with a high computational cost
• Chain classifiers are an alternative method for multi-label classification that incorporates class dependencies, while keeping the computational efficiency of the binary relevance approach
• A chain classifier consists of d base binary classifiers which are linked in a chain, such that each classifier incorporates the classes predicted by the previous classifiers as additional attributes
• The feature vector for each binary classifier, Li, is extended with the labels (0/1) of all previous classifiers in the chain
• Each classifier in the chain is trained to learn the association of label li given the features augmented with all previous class labels in the chain
• For classification, it starts at L1 and propagates the predicted classes along the chain, such that for Li ∈ L (where L = {L1, L2, ..., Ld}) it predicts P(Li | X, L1, L2, ..., Li−1)
• The class vector is determined by combining the outputs of all the binary classifiers, as sketched below
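A sketch of this scheme (function names are illustrative; any base binary classifier could replace LogisticRegression):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_chain(X, Y):
    """Train d binary classifiers, each seeing the features plus the labels
    of all previous classifiers in the chain."""
    models, X_aug = [], X
    for j in range(Y.shape[1]):
        models.append(LogisticRegression().fit(X_aug, Y[:, j]))
        # During training, the true labels serve as the extra attributes.
        X_aug = np.column_stack([X_aug, Y[:, j]])
    return models

def predict_chain(models, X):
    preds, X_aug = [], X
    for m in models:
        p = m.predict(X_aug)
        preds.append(p)
        # At prediction time, the *predicted* labels are propagated.
        X_aug = np.column_stack([X_aug, p])
    return np.column_stack(preds)  # the combined class vector
```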
• If we consider the dependency relations between the class variables, and represent these relations as a directed acyclic graph (DAG), then we can simplify the previous equation by conditioning each class only on its parents in the DAG:

P(C_1, \ldots, C_d \mid X) = \prod_{i=1}^{d} P(C_i \mid \mathrm{pa}(C_i), X)

• A further simplification assumes that the most probable joint combination of classes can be approximated by concatenating the individual most probable classes
• Hierarchical classification is a type of multidimensional classification in which the classes are ordered in a predefined structure, typically a tree, or in general a directed acyclic graph (DAG)
• In hierarchical classification, an example that belongs to a certain class automatically belongs to all its superclasses
• Hierarchical classification has applications in several areas, such as text categorization, protein function prediction, and object recognition
• Chained Path Evaluation (CPE) analyzes each possible path from the root to a leaf node in the hierarchy, taking into account the level of the predicted labels to give a score to each path, and finally returns the one with the best score
• Additionally, it considers the relations of each node with its ancestors in the hierarchy, based on chain classifiers
• A local classifier is trained for each node, Ci, in the hierarchy, except the leaf nodes, to classify its child nodes
• The classifier for each node, Ci, for instance a naive Bayes classifier, is trained considering examples from all its child nodes, as well as some examples of its sibling nodes in the hierarchy
• To consider the relation with other nodes in the hierarchy, the class predicted by the parent (tree structure) or parents (DAG) is included as an additional attribute in each local classifier
• In the classification phase, the probabilities of each class for all local classifiers are obtained based on the input data
• The score for each path in the hierarchy is calculated by a weighted sum of the log of the probabilities of all local classifiers in the path:
score = \sum_{i=0}^{n} w_{C_i} \times \log P(C_i \mid X_i, \mathrm{pa}(C_i)) \quad (13)
• The purpose of these weights is to give more importance to the upper levels of the hierarchy
• Once the scores for all the paths are obtained, the path with the highest score is selected as the set of classes corresponding to the given instance; a sketch of this scoring step follows
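A small sketch of the path-scoring step; the 1/(level+1) weighting below is an illustrative assumption (the slides only require that upper levels weigh more):

```python
import numpy as np

def score_path(path, local_probs, levels):
    """path: node ids from root to leaf; local_probs[c]: P(c | x, pa(c))
    from the local classifiers; levels[c]: depth of node c (root = 0)."""
    return sum((1.0 / (levels[c] + 1)) * np.log(local_probs[c]) for c in path)

def classify(paths, local_probs, levels):
    # The best-scoring path gives the instance's class and all its superclasses.
    scores = [score_path(p, local_probs, levels) for p in paths]
    return paths[int(np.argmax(scores))]
```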
• Skin detection is a useful pre-processing stage for many applications in computer vision, such as person detection and gesture recognition, among others
• A simple and very fast way to obtain an approximate classification of the pixels in an image as skin or not-skin is based on the color attributes of each pixel
• Usually, pixels in a digital image are represented as the combination of three basic (primary) colors: Red (R), Green (G) and Blue (B), in what is known as the RGB model. There are alternative color models, such as HSV, YIQ, etc.
• An alternative is to consider a semi-naive Bayesian classifier and select the best attributes from the different color models for skin classification by eliminating or joining attributes
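For instance, the nine color attributes per pixel can be assembled from the three color models with the standard library (colorsys expects RGB values scaled to [0, 1]; this attribute layout is an assumption for illustration):

```python
import colorsys

def color_attributes(r, g, b):
    """Build the 9 color attributes (RGB + HSV + YIQ) for one pixel,
    given r, g, b in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    y, i, q = colorsys.rgb_to_yiq(r, g, b)
    return [r, g, b, h, s, v, y, i, q]
```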
• Then an initial NBC was learned based on data (examples of skin and not-skin pixels taken from several images). This initial classifier obtained a 94% accuracy when applied to other (test) images.
• The classifier was then optimized. Starting from the full NBC with 9 attributes, the method applies the variable elimination and combination stages until the simplest classifier with maximum accuracy is obtained. With this final model, accuracy improved to 98%.
• The Human Immunodeficiency Virus (HIV) is the causative agent of AIDS, a condition in which progressive failure of the immune system allows opportunistic, life-threatening infections to occur
• To combat HIV infection, several antiretroviral (ARV) drugs belonging to different drug classes, each affecting a specific step in the viral replication cycle, have been developed. It is important to select the best drug combination according to the virus's mutations in a patient.
Bielza, C., Li, G., Larrañaga, P.: Multi-dimensional Classification with Bayesian Networks. International Journal of Approximate Reasoning (2011)
Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine 57, 219–229 (2013)
Cheng, J., Greiner, R.: Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 101–108 (1999)
Silla-Jr., C. N., Freitas, A. A.: A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery 22(1-2), 31–72 (2011)
Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)
van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian Network Classifiers. Third European Workshop on Probabilistic Graphical Models, 107–114, Prague, Czech Republic (2006)