Bayesian network classification using spline-approximated KDE Y. Gurwicz, B. Lerner Journal of Pattern Recognition

Mar 31, 2015

Bayesian network classification using spline-approximated KDE

Y. Gurwicz, B. Lerner

Journal of Pattern Recognition

Outline

• Introduction

• Background on Naïve Bayesian Network

• Computational Issue with KDE

• Proposed solution: Spline Approximated KDE

• Experiments

• Conclusion

Introduction

• Bayesian Network (BN) classifiers have been successfully applied to a variety of domains

• They attain the asymptotically optimal classification error (i.e., the Bayes risk), provided that the conditional and prior density estimates are asymptotically consistent (e.g., KDE)

• A particular form of the BN is the Naïve BN (NBN), which has been shown to perform well in practice and can help alleviate the curse of dimensionality [Zhang 2004]

• Hence NBN is the focus of this work

Naïve Bayesian Network (NBN)

• A BN expresses joint probability distributions (nodes = RVs, edges = dependencies)

• Because estimating node densities is difficult in high dimensions (the samples become sparse), the BN can be constrained so that the attributes (RVs) are independent given the class (which increases the sample density per estimate)

• This constrained BN is called the Naïve BN

• The following introductory slides are taken from A. Moore's tutorial
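
For reference, the class-conditional independence constraint can be written as a product of one-dimensional conditionals. This is the standard Naïve Bayes factorization, written with the symbols used on the later slides (C for the class, e_i for the i-th attribute, N_f for the number of attributes); it is not copied from the paper.

```latex
P(C \mid e_1, \dots, e_{N_f}) \;\propto\; P(C) \prod_{i=1}^{N_f} P(e_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{N_f} P(e_i \mid C)
```

Each factor P(e_i | C) is a univariate density, which is why a univariate density estimator is sufficient for the NBN.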

Estimating prior and conditional probabilities

• Methods for estimating prior P(C) and conditional P(e|C) probabilities

  – Parametric

    • Gaussian forms are mainly used (CRV)

    • Fast to compute

    • May not accurately reflect the true distribution

  – Non-parametric

    • KDE

    • Slow

    • Can accurately model the true distribution

• Can we come up with a fast non-parametric method?
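
The trade-off above can be illustrated with a minimal sketch that estimates one conditional density P(e|C) both ways. The toy samples, the hand-picked bandwidth `h`, the Gaussian kernel, and all function names are assumptions made for illustration; they do not come from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gaussian(samples):
    """Parametric estimate: only the mean and standard deviation are kept."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(var)

def kde_pdf(x, samples, h):
    """Non-parametric estimate: average of Gaussian kernels of bandwidth h,
    one centered on every training sample (cost grows with len(samples))."""
    return sum(gaussian_pdf(x, s, h) for s in samples) / len(samples)

# Hypothetical bimodal training values of one attribute for one class.
samples = [-2.1, -1.9, -2.0, -1.8, -2.2, 2.0, 1.9, 2.1, 2.2, 1.8]
mean, std = fit_gaussian(samples)
h = 0.3  # hand-picked bandwidth, an assumption

p_gauss_mid = gaussian_pdf(0.0, mean, std)  # single Gaussian peaks between the modes
p_kde_mid = kde_pdf(0.0, samples, h)        # KDE sees almost no mass there
p_kde_mode = kde_pdf(2.0, samples, h)       # KDE recovers the mode near 2
```

On this toy data the single Gaussian puts its peak in the empty region between the two modes, while the KDE follows the samples, at the price of touching every training point for every query.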

Cost of calculating conditionals

• Let N_ts = number of test patterns; N_tr = number of training patterns; N_f = number of dimensions (features); N_c = number of classes

• Parametric approach: O(N_ts * N_c * N_f)

• Non-parametric approach: O(N_ts * N_tr * N_f)

• Since N_c << N_tr, the non-parametric approach is far more expensive
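
To get a feel for the gap, the two bounds can be evaluated for a hypothetical problem size. The numbers below are made up for illustration, and the counts ignore constant factors.

```python
def parametric_cost(n_ts, n_c, n_f):
    """Operation count implied by O(N_ts * N_c * N_f)."""
    return n_ts * n_c * n_f

def kde_cost(n_ts, n_tr, n_f):
    """Operation count implied by O(N_ts * N_tr * N_f)."""
    return n_ts * n_tr * n_f

# Hypothetical sizes: 1,000 test patterns, 10,000 training patterns,
# 20 features, 3 classes.
n_ts, n_tr, n_f, n_c = 1_000, 10_000, 20, 3

cost_param = parametric_cost(n_ts, n_c, n_f)
cost_kde = kde_cost(n_ts, n_tr, n_f)
ratio = cost_kde / cost_param  # equals N_tr / N_c
```

The ratio of the two counts reduces to N_tr / N_c, so the penalty of the direct KDE grows linearly with the training set size.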

Reducing N_tr: Spline approximation

• Estimate the KDE using splines

• Splines are piecewise polynomials of order P, fitted over K intervals of the domain and constrained to satisfy a smoothness property at the interval boundaries (e.g., s1'' = s2'')

• Evaluating the spline requires only O(P * log K), or O(P) if a hash function is used to locate the interval

• Usually P = 4 (cubic polynomials, four coefficients each)

• Hence significant computational savings can be attained over the direct KDE
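
The two lookup costs above can be sketched as follows: a binary search over the knots gives the O(log K) step, and for equally spaced knots a direct index computation (one simple stand-in for the hash function mentioned on the slide) gives an O(1) step. Horner's rule then evaluates the P coefficients. The function names and the local-coordinate coefficient layout are assumptions for illustration.

```python
from bisect import bisect_right

def find_interval_bisect(knots, x):
    """O(log K) lookup: index j with knots[j] <= x < knots[j + 1], clamped."""
    j = bisect_right(knots, x) - 1
    return min(max(j, 0), len(knots) - 2)

def find_interval_uniform(x_min, width, k, x):
    """O(1) lookup, valid only when the K intervals have equal width."""
    j = int((x - x_min) / width)
    return min(max(j, 0), k - 1)

def eval_poly(coeffs, t):
    """Horner's rule over P coefficients, highest power first: O(P)."""
    result = 0.0
    for c in coeffs:
        result = result * t + c
    return result

def eval_piecewise(knots, pieces, x):
    """Locate the interval, then evaluate its polynomial in t = x - knots[j]."""
    j = find_interval_bisect(knots, x)
    return eval_poly(pieces[j], x - knots[j])

# Toy check: f(x) = x^2 stored as three cubic pieces (P = 4) on [0, 3].
# In local coordinates, (t + a)^2 = t^2 + 2*a*t + a^2.
knots = [0.0, 1.0, 2.0, 3.0]
pieces = [[0.0, 1.0, 0.0, 0.0],
          [0.0, 1.0, 2.0, 1.0],
          [0.0, 1.0, 4.0, 4.0]]
value = eval_piecewise(knots, pieces, 1.5)
```

Neither lookup depends on N_tr, which is where the savings over the direct KDE come from.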

Constructing the Splines

• Calculate the endpoints of the K intervals to interpolate

  – K+1 estimates from the KDE

  – O(K * N_tr)

• Calculate the P coefficients for each of the individual splines of the K intervals

  – O(K * P)

• Once splines have been obtained, a density query can be computed in O(P) time
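
The construction steps above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, equally spaced knots, natural boundary conditions (zero second derivative at both ends), a density of zero outside the knot range, and clamping of small negative spline values. All names are hypothetical.

```python
import math

def kde(x, samples, h):
    """Direct Gaussian KDE at x: O(N_tr) per query."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def natural_cubic_spline(xs, ys):
    """Return one (a, b, c, d) tuple per interval so that
    S_j(t) = a + b*t + c*t^2 + d*t^3 with t = x - xs[j].
    Solves the tridiagonal system for a natural spline in O(K)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    c = [0.0] * (n + 1)
    pieces = [None] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d = (c[j + 1] - c[j]) / (3.0 * h[j])
        pieces[j] = (ys[j], b, c[j], d)
    return pieces

def build_spline_kde(samples, h, k, x_min, x_max):
    """K+1 KDE evaluations at the knots (O(K * N_tr)), then the coefficients."""
    width = (x_max - x_min) / k
    knots = [x_min + i * width for i in range(k + 1)]
    values = [kde(x, samples, h) for x in knots]
    return knots, width, natural_cubic_spline(knots, values)

def query(model, x):
    """Density query in O(P): direct interval index plus Horner evaluation."""
    knots, width, pieces = model
    if x < knots[0] or x > knots[-1]:
        return 0.0  # assumption: no mass outside the knot range
    j = min(int((x - knots[0]) / width), len(pieces) - 1)
    a, b, c, d = pieces[j]
    t = x - knots[j]
    return max(((d * t + c) * t + b) * t + a, 0.0)  # clamp spline undershoot

# Hypothetical training values and settings.
samples = [-2.1, -1.9, -2.0, -1.8, -2.2, 2.0, 1.9, 2.1, 2.2, 1.8]
model = build_spline_kde(samples, h=0.5, k=40, x_min=-5.0, x_max=5.0)
approx = query(model, 1.95)
exact = kde(1.95, samples, 0.5)
```

After the one-off construction, `query` never touches the training samples again, so its cost no longer depends on N_tr.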

Experiment

• Measurements

  – Approximation accuracy

  – Classification accuracy

  – Classification speed

• Classifiers

  – BN-KDE

  – BN-Spline

  – BN-Gauss

• Data sets: synthetic and real-world

Approximation Accuracy

Classification Accuracy

Classification Speed

Conclusion

• The spline-based method can approximate the standard univariate KDE well

• Speed gains can be realized over the direct KDE

• Comments

  – How should the number of intervals in the splines be determined? This is analogous to the bandwidth specification problem in KDE.

  – The method assigns static intervals, which is the same problem as the global bandwidth

  – This is an approximation of the global-bandwidth KDE. How well do the splines approximate the AKDE?

  – The proposed method works for a static data set; however, if the data distribution changes, the splines will need to be reconstructed

    • May not be directly applicable to data streams

  – Implications for LR-KDE

    • Develop multi-query algorithms (e.g., for deriving the K+1 endpoints/knots)

    • Assign dynamic spline intervals based on regularized LR, since each LR models a simple density

Reference

• H. Zhang, “The optimality of Naïve Bayes”, AAAI 2004