Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning
Yu-An Chung (1), Hsuan-Tien Lin (1), Shao-Wen Yang (2)
(1) Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
(2) Intel Labs, Intel Corporation, USA

Cost-sensitive Classification
• Motivating example: what is the status of a patient — H1N1-infected, cold-infected, or healthy?
• Cost of each kind of mis-prediction (rows: actual class; columns: predicted class):

  actual \ predicted |  H1N1 |  cold | healthy
  H1N1               |     0 |  1000 | very high
  cold               |   100 |     0 | 3000
  healthy            |   100 |    30 | 0

  - Predicting H1N1-infected as healthy: very high cost!
  - Predicting cold-infected as healthy: high cost
  - Predicting correctly: no cost
• Input: a training set S = {(x_n, y_n)}_{n=1}^N and a cost matrix C, where x_n ∈ X, y_n ∈ Y = {1, 2, ..., K}, and C(y, k) is the cost of classifying a class-y example as class k
• Goal: use S and C to train a classifier g: X → Y such that the expected cost C(y, g(x)) on a test example (x, y) is minimal

Our Goal & Contributions

                                           | shallow models (e.g., SVM) | deep learning
  regular (cost-insensitive) classification | well-studied               | popular and ongoing
  cost-sensitive classification             | well-studied               | our work lies here!

• First work to study cost-sensitive deep learning thoroughly:
  1) a novel cost-sensitive loss function for any deep model
  2) a cost-sensitive autoencoder (CAE), equipped with the loss function, for pre-training fully-connected deep models
  3) a combination of 1) and 2) as a complete cost-sensitive deep learning (CSDNN) solution

The Input-to-Cost Regression Network
• Regression network: estimate the costs C(y, k) directly from the input x
• Training the regression network:
  - any end-to-end regression loss (e.g., MSE, as in linear regression) could be applied
  - this work derives a loss function built on top of [Tu and Lin, 2010]: given the training set S = {(x_n, y_n)}_{n=1}^N and C, define

      δ_{n,k} ≡ ln(1 + exp(z_{n,k} · (r_k(x_n) − C(y_n, k)))),  where  z_{n,k} ≡ 2⟦k = y_n⟧ − 1

    and r_k(x_n) denotes the k-th output (estimated cost) of the regression network on x_n
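The smoothed one-sided regression loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes 0-indexed labels, takes the network outputs r as a given matrix, and sums δ_{n,k} over a whole mini-batch.

```python
import numpy as np

def csl(r, y, C):
    """Smoothed one-sided regression loss (CSL) over a mini-batch.

    r : (N, K) array of estimated costs r_k(x_n) from the regression network
    y : (N,)   true class labels in {0, ..., K-1}
    C : (K, K) cost matrix, C[y, k] = cost of predicting class k for a class-y example

    Returns sum_n sum_k ln(1 + exp(z_{n,k} * (r[n,k] - C[y_n, k]))),
    where z_{n,k} = +1 if k == y_n (penalize over-estimating the zero cost)
    and -1 otherwise (penalize under-estimating a nonzero cost).
    """
    N, K = r.shape
    c = C[y]  # (N, K): per-example target cost vectors c_n[k] = C(y_n, k)
    z = np.where(np.arange(K)[None, :] == y[:, None], 1.0, -1.0)  # z_{n,k} = 2[[k = y_n]] - 1
    return np.sum(np.log1p(np.exp(z * (r - c))))
```

With perfect estimates (r[n, k] = C[y_n, k]) every term reduces to ln 2 per entry, the minimum the smooth surrogate allows; the one-sided signs z_{n,k} push the network to never under-estimate the cost of a wrong class.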
• Train the regression network by minimizing the derived cost-sensitive loss (CSL) over the training set S:

    L_CSL = Σ_{n=1}^N Σ_{k=1}^K δ_{n,k}

• Prediction: g(x) ≡ argmin_{1≤k≤K} r_k(x)

Cost-sensitive Autoencoder (CAE)
• Autoencoder (AE): pre-trains a fully-connected neural network (FCNN) for regular classification
• Cost-sensitive autoencoder (CAE): pre-trains the deep network for cost-sensitive classification

  Autoencoder (AE)
  - Goal: reconstruct the original input x
  - Reconstruction error measured by the cross-entropy loss L_CE

  Cost-sensitive autoencoder (CAE)
  - Goal: reconstruct both the original input x and the cost information C(y, ·)
  - Mixture of reconstruction errors: L_CAE = (1 − β) · L_CE + β · L_CSL

Cost-aware Experiments
• FCNN: traditional fully-connected neural network for regular classification
• FCNN_CSL: the fully-connected regression network trained with the loss function L_CSL
• CSDNN: the proposed cost-sensitive deep neural network (CAE pre-training + CSL training)

Conclusions
• CSL makes any deep model cost-sensitive (see the paper for a CNN with CSL)
• CSDNN = CAE pre-training + CSL training: both techniques lead to significant improvements
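The prediction rule and the CAE mixture loss can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `predict` and `cae_loss` are hypothetical, the cross-entropy term assumes inputs scaled to [0, 1] (as with sigmoid reconstructions), and `l_csl` stands in for a precomputed L_CSL value on the cost outputs.

```python
import numpy as np

def predict(r):
    """Cost-sensitive prediction g(x): the class with the smallest estimated cost."""
    return np.argmin(r, axis=1)

def cae_loss(x, x_hat, l_csl, beta=0.1):
    """Mixture reconstruction error of the CAE: (1 - beta) * L_CE + beta * L_CSL.

    x     : original inputs, assumed scaled to [0, 1]
    x_hat : reconstructed inputs in (0, 1)
    l_csl : cost-sensitive loss on the reconstructed cost information (scalar)
    beta  : mixing weight in [0, 1]; beta = 0 recovers the ordinary autoencoder
    """
    eps = 1e-12  # numerical safety inside the logarithms
    l_ce = -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))
    return (1 - beta) * l_ce + beta * l_csl
```

The design point of the mixture is that even at pre-training time the hidden representation is shaped by the cost information, not only by reconstruction fidelity, which is what distinguishes the CAE from a plain AE.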