DECISION TREES & RANDOM FORESTS X
CONVOLUTIONAL NEURAL NETWORKS
Meir Dalal
Or Gorodissky
Deep Neural Decision Forests
Microsoft Research Cambridge UK , ICCV 2015
Decision Forests, Convolutional Networks and the Models in-Between
Microsoft Research Technical Report arXiv 3 Mar. 2016
MOTIVATION
DECISION TREES
RANDOM FORESTS
DECISION TREES VS CNN
OVERVIEW OF THE PRESENTATION
COMBINING DECISION TREE & CNN
MOTIVATION
Combining the CNN’s feature learning with the Random Forest’s classification capabilities
DECISION TREE - WHAT IS IT
A supervised learning algorithm used for classification
An inductive learning task: use particular facts to draw more general conclusions
A predictive model based on a branching series of tests
These smaller tests are less complex than a one-stage classifier (Divide & Conquer)
Another way to look at it: each node either predicts the answer or passes the problem to a different node
Example…
DECISION TREES - TYPICAL (NAIVE) PROBLEM
Training examples
[Table of training examples: example number, attribute values, and target class]
DECISION TREES - TYPICAL (NAIVE) PROBLEM CONT.
DECISION TREES - TYPICAL (NAIVE) PROBLEM CONT.
DECISION TREES - HOW TO CONSTRUCT
When to stop?
All the instances have the same target class
There are no more instances
There are no more attributes
The pre-defined maximum depth is reached
How to split? Decision trees are usually constructed top-down, choosing at each node the split that optimizes a criterion such as:
Gini impurity
Information gain
…
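The two split criteria above can be sketched in a few lines of Python. This is a minimal illustration; the helper names `gini`, `entropy`, and `information_gain` are hypothetical, not from the slides:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent node minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

labels = ["yes", "yes", "no", "no"]
print(gini(labels))                                              # 0.5
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

A split that separates the classes perfectly, as in the call above, yields the maximum gain of one bit.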
DECISION TREES - TERMINOLOGY
Prediction Node
Decision Node
Root Node
Splitting
DECISION TREES - STOCHASTIC ROUTING
Input space 𝒳, output space 𝒴
Decision nodes: 𝑛 ∈ 𝒩, each with a decision function 𝑑𝑛(∙; Θ)
Prediction nodes (leaves): 𝑙 ∈ ℒ, each holding a distribution 𝜋𝑙 over 𝒴
Θ - decision node parameterization
Routing function until now:
𝑑𝑛 is binary and the routing is deterministic
The leaf prediction is denoted 𝜋𝑙
Stochastic routing function:
𝑑𝑛(∙; Θ): 𝒳 → [0,1]
The routing decision is the outcome of a Bernoulli random variable with mean 𝑑𝑛(𝓍; Θ)
Each leaf node contains a probability for each class, 𝝅
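The stochastic routing above can be sketched as follows. This is a hypothetical toy illustration: the `DecisionNode`/`Leaf` classes and the example tree are invented here, and in the paper the scoring functions 𝑓𝑛 are produced by a CNN rather than hand-written lambdas:

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

class DecisionNode:
    def __init__(self, f, left, right):
        self.f = f                      # f_n(.; Theta): maps a sample to a real score
        self.left, self.right = left, right

class Leaf:
    def __init__(self, pi):
        self.pi = pi                    # class distribution pi_l over Y

def route(node, x, rng=random):
    """Stochastically route x to a leaf: at each decision node, go left
    with probability d_n(x) = sigmoid(f_n(x)) (a Bernoulli draw)."""
    while isinstance(node, DecisionNode):
        d = sigmoid(node.f(x))          # Bernoulli mean d_n(x; Theta) in [0, 1]
        node = node.left if rng.random() < d else node.right
    return node.pi

# Hypothetical depth-1 tree that scores a sample by its first feature.
tree = DecisionNode(f=lambda x: x[0],
                    left=Leaf({"A": 0.9, "B": 0.1}),
                    right=Leaf({"A": 0.2, "B": 0.8}))
print(route(tree, [50.0]))   # d = sigmoid(50) ≈ 1, so the left leaf's distribution
```

Because the routing is a smooth function of 𝑓𝑛 rather than a hard threshold, it is this formulation that later makes the tree differentiable.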
DECISION TREE - ENSEMBLE METHODS
If a decision tree is fully grown, it may lose some generalization capability
→Overfitting
How to solve it?
Ensemble methods
Combine a group of predictive models to achieve better accuracy and model stability
RANDOM FOREST
When you can’t think of any algorithm, use a random forest!
Algorithm (Bootstrap Aggregation)
1. Grow K different decision trees
1. Pick a random subset of the training examples (with replacement)
2. Pick d << D random attributes on which to split the data
3. Grow each tree to the largest extent possible, with no pruning
2. Given a new data point 𝓍
1. Classify 𝓍 using each of the trees 𝑇1 … 𝑇𝐾
2. Predict by aggregating the predictions of the K trees (i.e., majority vote for classification, average for regression)
[Figure: a forest of decision trees - averaging all the trees’ predictions]
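The bagging recipe above can be sketched with toy depth-1 "trees" (decision stumps). Everything here - the data, the stump construction, and all the names - is a hypothetical illustration of the two steps, not any library's implementation:

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (used for leaves and for step 2's vote)."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(data, rng, n_attrs):
    """Toy depth-1 tree: threshold one randomly chosen attribute
    (step 1.2 with d = 1 out of D = n_attrs attributes)."""
    attr = rng.randrange(n_attrs)
    thresh = sum(x[attr] for x, _ in data) / len(data)
    left = [y for x, y in data if x[attr] <= thresh]
    right = [y for x, y in data if x[attr] > thresh]
    fallback = majority([y for _, y in data])
    l_lab = majority(left) if left else fallback
    r_lab = majority(right) if right else fallback
    return lambda x: l_lab if x[attr] <= thresh else r_lab

def train_forest(data, k, n_attrs, seed=0):
    """Step 1: grow K trees, each on a bootstrap sample (with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data], rng, n_attrs)
            for _ in range(k)]

def predict(forest, x):
    """Step 2: classify x with every tree, then take the majority vote."""
    return majority([tree(x) for tree in forest])

# Hypothetical toy data: class "A" is low on attribute 0, class "B" is high.
data = [([0.0, 1.0], "A"), ([0.2, 0.8], "A"), ([1.0, 0.1], "B"), ([0.9, 0.0], "B")]
forest = train_forest(data, k=11, n_attrs=2)
print(predict(forest, [0.1, 0.9]))   # almost surely "A"
```

The randomness in both the bootstrap sample and the attribute choice is what decorrelates the trees, so their aggregated vote is more stable than any single tree.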
DT
Levels
Divide & Conquer
Only log2 𝑁 parameters used at test time
No features learned (at most)
Training is done layer-wise
High efficiency

CNN
Layers
High dimensionality
Uses all the parameters at test time!
Feature learning integrated with classification
Trained end-to-end with (S)GD
State-of-the-art accuracy
DECISION TREES X CONV NEURAL NETS
How to efficiently combine DT/RF with CNN?
DECISION TREE BY CNN FEATURES - ARCHITECTURE
[Architecture diagram: CNN, RF, Softmax]
DECISION TREE BY CNN FEATURES - ARCHITECTURE
[Architecture diagram: CNN feeding an RF]
DECISION TREE BY CNN FEATURES - ARCHITECTURE
[Architecture diagram: CNN feeding a forest of decision trees - averaging all the trees’ predictions]
DECISION TREE BY CNN FEATURES - ARCHITECTURE
Decision Nodes
𝑑𝑛(𝓍; Θ) = 𝜎(𝑓𝑛(𝓍; Θ))
𝜎(𝑥) = (1 + 𝑒^(−𝑥))^(−1) (sigmoid function)
𝑓𝑛(∙; Θ): 𝒳 → ℝ
Prediction Probability
Prediction for sample 𝓍: ℙ𝑇(𝓎 | 𝓍, Θ, 𝜋) = Σ𝑙∈ℒ 𝜋𝑙𝓎 𝜇𝑙(𝓍|Θ), where
𝜋𝑙𝓎 - probability that a sample reaching leaf 𝑙 takes class 𝓎
𝜇𝑙(𝓍|Θ) - probability that sample 𝓍 reaches leaf 𝑙, with Σ𝑙∈ℒ 𝜇𝑙(𝓍|Θ) = 1
Forest Of Decision Trees
Deliver a prediction for a sample 𝓍 by averaging the output of each tree:
ℙℱ(𝓎 | 𝓍) = (1/𝐾) Σ𝑘=1…𝐾 ℙ𝑇𝑘(𝓎 | 𝓍)
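The three formulas - 𝜇𝑙 as the product of routing probabilities along the path to a leaf, ℙ𝑇 as the 𝜋-weighted sum over leaves, and ℙℱ as the average over trees - can be sketched as follows. This is a toy illustration with hypothetical names; in the paper each 𝑓𝑛 is a CNN output and all parameters are trained end-to-end:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def leaf_probs(node, x, reach=1.0):
    """Return (mu_l(x|Theta), pi_l) for every leaf of a soft tree.
    A tree is ('leaf', pi) or ('node', f, left, right)."""
    if node[0] == 'leaf':
        return [(reach, node[1])]
    _, f, left, right = node
    d = sigmoid(f(x))                               # d_n(x; Theta) in [0, 1]
    return (leaf_probs(left, x, reach * d) +        # routed left w.p. d
            leaf_probs(right, x, reach * (1 - d)))  # routed right w.p. 1 - d

def tree_predict(tree, x, n_classes):
    """P_T(y | x, Theta, pi) = sum over leaves l of pi_{l,y} * mu_l(x|Theta)."""
    p = [0.0] * n_classes
    for reach, pi in leaf_probs(tree, x):
        for y in range(n_classes):
            p[y] += pi[y] * reach
    return p

def forest_predict(trees, x, n_classes):
    """P_F(y | x) = (1/K) * sum over the K trees of P_{T_k}(y | x)."""
    k = len(trees)
    acc = [0.0] * n_classes
    for t in trees:
        for y, v in enumerate(tree_predict(t, x, n_classes)):
            acc[y] += v / k
    return acc

# Hypothetical depth-1 tree on a scalar input; f_n is the identity score.
tree = ('node', lambda x: x,
        ('leaf', [0.9, 0.1]),    # pi at the left leaf
        ('leaf', [0.2, 0.8]))    # pi at the right leaf
print(tree_predict(tree, 0.0, 2))   # d = sigmoid(0) = 0.5, so ≈ [0.55, 0.45]
```

Because 𝜇𝑙 is built from sigmoids, the whole prediction is differentiable in Θ, which is exactly what allows a CNN to supply the 𝑓𝑛 and be trained jointly with the leaf distributions 𝜋.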