DC2 at GSFC - 28 Jun 05 - T. Burnett 1
DC2 C++ decision trees
Toby Burnett
Frank Golf
Quick review of classification (or decision) trees Training and testing How Bill does it with Insightful Miner Applications to the “good-energy” trees: how does it compare?
Quick Review of Decision Trees
Introduced to GLAST by Bill Atwood, using InsightfulMiner
Each branch node is a predicate, or cut on a variable, like CalCsIRLn > 4.222
If true, this defines the right branch, otherwise the left branch.
If there is no branch, the node is a leaf; a leaf contains the purity of the sample that reaches that point
Thus the tree defines a function of the event variables used, returning a value for the purity from the training sample
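The structure above can be sketched in C++. This is a minimal, hypothetical node layout (not the actual DC2 code): internal nodes hold a cut on one variable, leaves hold the purity of the training sample that reached them, and evaluating an event means walking the tree until a leaf is hit.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node: internal nodes cut on one variable,
// leaves store the purity of the training sample reaching them.
struct Node {
    int varIndex = -1;               // index of the event variable to cut on
    double cut = 0.0;                // threshold, e.g. CalCsIRLn > 4.222
    double purity = 0.0;             // meaningful only at leaves
    std::unique_ptr<Node> left, right;
    bool isLeaf() const { return !left && !right; }
};

// Walk the tree: if the predicate is true take the right branch,
// otherwise the left; return the purity stored at the leaf reached.
double evaluate(const Node& node, const std::vector<double>& event) {
    if (node.isLeaf()) return node.purity;
    const Node& next =
        event[node.varIndex] > node.cut ? *node.right : *node.left;
    return evaluate(next, event);
}
```

The tree is thus literally a function of the event variables, returning a purity estimate.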
Analyze a training sample containing a mixture of “good” and “bad” events: I use the even-numbered events in order to keep an independent set for testing
Choose a set of variables and find the optimal cut for each, such that the left and right subsets are purer than the original. There are two standard criteria for this: “Gini” and entropy; I currently use the former.
WS : sum of signal weights
WB : sum of background weights
Gini = 2 WS WB / (WS + WB)
Gini vanishes for a pure sample (WS or WB zero), so we want it to be small.
Actually we maximize the improvement: Gini(parent) - Gini(left child) - Gini(right child)
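The Gini criterion and the improvement from a split follow directly from the weighted sums above; a minimal sketch (helper names are my own, not from the DC2 code):

```cpp
#include <cassert>

// Gini impurity of a weighted sample: 2*WS*WB / (WS + WB),
// where WS and WB are the summed signal and background weights.
double gini(double ws, double wb) {
    double w = ws + wb;
    return w > 0 ? 2.0 * ws * wb / w : 0.0;
}

// Improvement from a candidate split: parent impurity minus the
// impurities of the two children. The best cut maximizes this.
double giniImprovement(double wsL, double wbL, double wsR, double wbR) {
    return gini(wsL + wsR, wbL + wbR) - gini(wsL, wbL) - gini(wsR, wbR);
}
```

A perfect split (all signal left of the cut, all background right, or vice versa) drives both child impurities to zero, so the improvement equals the parent's Gini.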
Apply this recursively until a node has too few events (100 for now). Finally, test with the odd-numbered events: measure the purity for each leaf
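Finding the optimal cut on one variable amounts to sorting the events and scanning the candidate thresholds for the largest Gini improvement. A self-contained sketch under stated assumptions (the `Event` struct and `bestCut` helper are hypothetical, and the input is assumed non-empty):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Event { double value; double weight; bool signal; };

// Scan candidate thresholds on one variable and return the cut with
// the largest Gini improvement (midpoint between adjacent values).
double bestCut(std::vector<Event> events) {
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) { return a.value < b.value; });
    auto gini = [](double ws, double wb) {
        double w = ws + wb;
        return w > 0 ? 2.0 * ws * wb / w : 0.0;
    };
    double wsTot = 0, wbTot = 0;
    for (const auto& e : events) (e.signal ? wsTot : wbTot) += e.weight;

    double wsL = 0, wbL = 0, best = 0, bestThreshold = events.front().value;
    for (std::size_t i = 0; i + 1 < events.size(); ++i) {
        (events[i].signal ? wsL : wbL) += events[i].weight;  // move event left
        double improvement = gini(wsTot, wbTot)
                           - gini(wsL, wbL)
                           - gini(wsTot - wsL, wbTot - wbL);
        if (improvement > best) {
            best = improvement;
            bestThreshold = 0.5 * (events[i].value + events[i + 1].value);
        }
    }
    return bestThreshold;
}
```

Training would call this for every variable at every node, keep the best split, and recurse on the two subsets until the stopping count (100 events here) turns a node into a leaf.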
The current “goodcal” classifier, a single-tree algorithm applied to all energies, is slightly better than the three individual IM trees. Boosting will certainly improve the result
Done: One-track vs. vertex: which estimate is better?
In progress in Seattle (as we speak): PSF tail suppression, 4 trees to start.
In progress in Padova (see F. Longo’s summary): good-gamma prediction, or background rejection