Agnostically Learning Decision Trees
Parikshit Gopalan (MSR-Silicon Valley, IITB’00), Adam Tauman Kalai (MSR-New England), Adam R. Klivans (UT Austin)
[Figure: a small decision tree querying X1, X2, X3 with 0/1 leaves]
Computational Learning
Learning: Predict f from examples (x, f(x)).
f: {0,1}^n → {0,1}
Valiant’s Model
Examples (x, f(x)), f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.
Halfspaces:
[Figure: points labeled + and − separated by a halfspace]
Valiant’s Model
Examples (x, f(x)), f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.
Decision Trees:
[Figure: a decision tree querying X1, X2, X3 with 0/1 leaves]
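To make the object on the slide concrete, here is a small sketch (my own illustration, not from the talk) of how a decision tree over {0,1}^3 like the one pictured can be represented and evaluated; the particular tree shape is a hypothetical example:

```python
# A decision tree over {0,1}^n as nested tuples:
# a leaf is 0 or 1; an internal node is (i, left, right),
# meaning "query bit x[i]; go left if it is 0, right if it is 1".
# This particular depth-2 tree is a made-up example.
TREE = (0,           # query x[0]
        (1, 0, 1),   # x[0] = 0: query x[1]
        (2, 1, 0))   # x[0] = 1: query x[2]

def evaluate(tree, x):
    """Evaluate a decision tree on input x (a tuple of 0/1 bits)."""
    while not isinstance(tree, int):
        i, left, right = tree
        tree = left if x[i] == 0 else right
    return tree

# Every input follows a single root-to-leaf path.
print(evaluate(TREE, (0, 1, 0)))  # x[0]=0, then x[1]=1 -> leaf 1
```

Each input determines one root-to-leaf path, so evaluation costs at most the depth of the tree.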
The Agnostic Model [Kearns-Schapire-Sellie’94]
Examples (x, f(x)), f: {0,1}^n → {0,1}.
No assumptions about f.
Learner should do as well as the best decision tree.
Decision Trees:
[Figure: a decision tree querying X1, X2, X3 with 0/1 leaves]
Agnostic Model = Noisy Learning
f: {0,1}^n → {0,1}
[Figure: decision-tree truth table + noise = received word f]
Concept: message. Truth table: encoding. Function f: received word.
Coding: Recover the Message.
Learning: Predict f.
[Figure: a decision tree querying X1, X2, X3 with 0/1 leaves]
Uniform Distribution Learning for Decision Trees
Noiseless Setting:
– No queries: n^{log n} time [Ehrenfeucht-Haussler’89].
– With queries: poly(n) [Kushilevitz-Mansour’91].
Reconstruction for sparse real polynomials in the l1 norm.
Agnostic Setting:
Polynomial time, uses queries. [G.-Kalai-Klivans’08]
The Fourier Transform Method
Powerful tool for uniform distribution learning.
Introduced by Linial-Mansour-Nisan.
– Small-depth circuits [Linial-Mansour-Nisan’89]
– DNFs [Jackson’95]
– Decision trees [Kushilevitz-Mansour’94, …]
Parity of α ⊆ [n]: χ_α(x) = (−1)^{Σ_{i∈α} x_i}
Write f(x) = Σ_α c(α) χ_α(x)
– Σ_α c(α)² = 1.
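As a concrete check of this notation, here is a small sketch of my own (using the ±1 convention f: {0,1}^n → {−1, +1}, so that Parseval gives Σ_α c(α)² = 1), computing Fourier coefficients by brute-force enumeration:

```python
from itertools import combinations, product

def chi(alpha, x):
    """Parity chi_alpha(x) = (-1)^(sum of x_i over i in alpha)."""
    return (-1) ** sum(x[i] for i in alpha)

def fourier_coefficient(f, alpha, n):
    """c(alpha) = E_x[f(x) * chi_alpha(x)], x uniform over {0,1}^n."""
    return sum(f(x) * chi(alpha, x) for x in product((0, 1), repeat=n)) / 2 ** n

n = 3
f = lambda x: chi((0, 1), x)  # f is itself the parity of x_0 and x_1

coeffs = {alpha: fourier_coefficient(f, alpha, n)
          for r in range(n + 1)
          for alpha in combinations(range(n), r)}

# All the Fourier weight sits on alpha = {0, 1}, and Parseval holds:
print(coeffs[(0, 1)])                       # 1.0
print(sum(c * c for c in coeffs.values()))  # 1.0
```

Brute-force enumeration takes 2^n time per coefficient; the point of the algorithms on the next slides is to avoid exactly this.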
Low-Degree Functions
Low-degree functions: most of the Fourier weight lies on small subsets.
Examples: halfspaces, small-depth circuits.
Low-degree algorithm [Linial-Mansour-Nisan]: finds the low-degree Fourier coefficients.
Least Squares Regression: Find a low-degree P minimizing E_x[|P(x) − f(x)|²].
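The low-degree idea can be sketched in a few lines (my own illustration, with made-up parameters): estimate each c(α) for |α| ≤ d by the empirical average of f(x)·χ_α(x) over random examples, then predict with the sign of the truncated polynomial.

```python
import random
from itertools import combinations

def chi(alpha, x):
    """Parity chi_alpha(x) = (-1)^(sum of x_i over i in alpha)."""
    return (-1) ** sum(x[i] for i in alpha)

def low_degree_estimate(sample, n, d):
    """Estimate c(alpha) = E[f(x) chi_alpha(x)] for every |alpha| <= d
    from labeled examples (x, f(x)) with f(x) in {-1, +1}."""
    m = len(sample)
    return {alpha: sum(y * chi(alpha, x) for x, y in sample) / m
            for r in range(d + 1)
            for alpha in combinations(range(n), r)}

def predict(coeffs, x):
    """Sign of the estimated low-degree polynomial."""
    p = sum(c * chi(alpha, x) for alpha, c in coeffs.items())
    return 1 if p >= 0 else -1

random.seed(0)
n, d, m = 5, 1, 2000
f = lambda x: (-1) ** x[2]  # degree-1 target: the parity of x_2
sample = []
for _ in range(m):
    x = tuple(random.randint(0, 1) for _ in range(n))
    sample.append((x, f(x)))
coeffs = low_degree_estimate(sample, n, d)
# The estimated weight concentrates on alpha = (2,).
```

Only C(n, ≤d) coefficients are estimated, so the running time is n^{O(d)} rather than 2^n, which is the source of the quasi-polynomial bound for decision trees without queries.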
Sparse Functions: most of the weight lies on a few subsets.
Sparse ℓ2 Regression
Projecting onto the L1 ball does not increase L1 distance.
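The projection step can be sketched as follows (my own implementation of the standard sort-and-soft-threshold method for Euclidean projection onto the ℓ1 ball, as in Duchi et al.; the slide only asserts the non-expansiveness property, not this particular routine):

```python
def project_l1(v, t):
    """Euclidean projection of v onto the l1 ball {w : sum |w_i| <= t},
    via sorting and soft-thresholding at a common level theta."""
    if sum(abs(x) for x in v) <= t:
        return list(v)  # already inside the ball
    u = sorted((abs(x) for x in v), reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui  # cumulative sum of the largest magnitudes
        if ui > (css - t) / i:
            theta = (css - t) / i
    # shrink every coordinate toward zero by theta, keeping its sign
    return [max(abs(x) - theta, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

p = project_l1([3.0, -1.0, 0.5], 2.0)
print(sum(abs(x) for x in p))  # 2.0 (lands on the boundary of the ball)
```

Note how the projection zeroes out small coordinates, which is exactly why it pushes iterates toward sparse polynomials.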
Sparse ℓ1 Regression
Variables: c(α)’s.
Constraint: Σ_α |c(α)| ≤ t.
Minimize: E_x |P(x) − f(x)|.
• L∞(P, P′) ≤ 2ε
• L1(P, P′) ≤ 2t
• L2(P, P′)² ≤ L∞ · L1 ≤ 4tε
Can take ε = 1/t².
Sparse ℓ1 Regression: Find a sparse polynomial P minimizing E_x[|P(x) − f(x)|].
[G.-Kalai-Klivans’08]: Can get within ε of the optimum in poly(t, 1/ε) iterations. Algorithm for Sparse ℓ1 Regression.
First polynomial-time algorithm for agnostically learning sparse polynomials.
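The iterative scheme can be illustrated by a much-simplified sketch (my own, not the actual [G.-Kalai-Klivans’08] procedure, which uses queries and Kushilevitz-Mansour-style coefficient recovery instead of enumerating all 2^n parities): projected subgradient descent on the coefficient vector, re-projecting onto the ℓ1 ball of radius t after every step. All names and parameters below are made up for the demo.

```python
from itertools import combinations, product

def chi(alpha, x):
    """Parity chi_alpha(x) = (-1)^(sum of x_i over i in alpha)."""
    return (-1) ** sum(x[i] for i in alpha)

def project_l1(v, t):
    """Euclidean projection onto {w : sum |w_i| <= t} (sort-and-threshold)."""
    if sum(abs(x) for x in v) <= t:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        if ui > (css - t) / i:
            theta = (css - t) / i
    return [max(abs(x) - theta, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def sparse_l1_regression(sample, alphas, t, steps=300, eta=0.05):
    """Projected subgradient descent for min E|P(x) - f(x)| s.t. sum|c(a)| <= t."""
    m = len(sample)
    c = [0.0] * len(alphas)
    for _ in range(steps):
        grad = [0.0] * len(alphas)
        for x, y in sample:
            feats = [chi(a, x) for a in alphas]
            err = sum(ci * fi for ci, fi in zip(c, feats)) - y
            s = 1.0 if err > 0 else (-1.0 if err < 0 else 0.0)  # subgradient of |err|
            for j, fj in enumerate(feats):
                grad[j] += s * fj / m
        c = project_l1([ci - eta * gi for ci, gi in zip(c, grad)], t)
    return c

n = 4
alphas = [a for r in range(n + 1) for a in combinations(range(n), r)]
f = lambda x: chi((0, 3), x)  # target: a single 1-sparse parity
sample = [(x, f(x)) for x in product((0, 1), repeat=n)]  # all 16 points
c = sparse_l1_regression(sample, alphas, t=1.0)
# c concentrates on the coefficient of alpha = (0, 3).
```

Enumerating the 2^n parities keeps the sketch short but is exponential in n; the query-based coefficient-recovery step is what makes the real algorithm polynomial time.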
Agnostically Learning Decision Trees
Function f: D → [−1, 1], orthonormal basis B.
Sparse ℓ2 Regression: Find a t-sparse polynomial P minimizing E_x[|P(x) − f(x)|²].
Sparse ℓ1 Regression: Find a t-sparse polynomial P minimizing E_x[|P(x) − f(x)|].
[G.-Kalai-Klivans’08]: Given a solution to ℓ2 Regression, can solve ℓ1 Regression.
ℓ1 Regression from ℓ2 Regression
Problem: Can we agnostically learn DNFs in polynomial time? (uniform dist. with queries)