Semiparametric and Nonparametric Additive Regression Models

Matúš Maciak, MFF UK ([email protected])
Department of Probability and Mathematical Statistics
March 30, 2007
Contents

1 Introduction (Motivation, Curse of Dimensionality, Additive Decomposition)
2 Additive Regression (Spline Estimates, Kernel Estimates)
3 Generalized Additive Regression
4 Rate of Convergence
5 Model Selection Criteria
6 Adaptive Methods (Well-known algorithms, Real data example)
Estimates of the functions p1, p2 — Kernel Density Estimation

  \hat{f}_h(x) = \frac{\sum_{i=1}^{N} \kappa_h(X_i - x)\, Y_i}{\sum_{i=1}^{N} \kappa_h(X_i - x)},

where \kappa_h is the multivariate, multiplicative kernel and h = (h_1, \ldots, h_J) is a vector of appropriate bandwidths.

Problems: the "curse of dimensionality" and a slow asymptotic rate of convergence.
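A minimal numerical sketch of this estimator, assuming a Gaussian product kernel and synthetic data (the name `nw_estimate` and all settings are illustrative):

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Multivariate Nadaraya-Watson estimate f_h(x) with a
    multiplicative Gaussian kernel and bandwidth vector h."""
    u = (X - x) / h                          # (N, J) scaled differences
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    w = np.prod(K / h, axis=1)               # multiplicative kernel weights
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))         # N = 500 observations, J = 2
Y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(500)
h = np.array([0.1, 0.1])

x0 = np.array([0.5, 0.5])
print(nw_estimate(x0, X, Y, h))              # close to sin(pi) + 0.25 = 0.25
```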
Additive approaches...
Let (X, Y) ∈ R^{J+1} be a pair of random variables, where X = (X_1, \ldots, X_J) and Y is real valued with mean EY = µ and a finite second moment, 0 < EY^2 ≤ K < ∞.

Consider the unknown regression function f : [0, 1]^J → R of Y on X, defined by f(x) = E[Y | X = x].

We impose one more condition, additivity:

  f(x_1, \ldots, x_J) = \mu + \sum_{j=1}^{J} f_j(x_j).

The functional components f_j are uniquely determined and satisfy E f_j(X_j) = 0.

A smoothness assumption remains in force (smoothness of the functional components).
Additive Estimates...
Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N) denote an independent random sample, where each pair (X_i, Y_i) has the same distribution as (X, Y).

Estimates of the true underlying regression function are obtained by different approaches (spline techniques, B-splines, and kernel estimates).

The semiparametric (nonparametric) estimate based on the random sample of size N can be written in additive form:

  \hat{f}_N(x_1, \ldots, x_J) = \bar{Y}_N + \sum_{j=1}^{J} \hat{f}_{Nj}(x_j).

To match the assumption on the functional components f_j, the estimates are centered so that \sum_{i=1}^{N} \hat{f}_{Nj}(X_{ij}) = 0 for all j ∈ {1, \ldots, J}.
Splines vs. Kernels
Spline estimates:
1 Semiparametric approaches
2 Handle high-dimensional data
3 Handle extra-large sample sizes
4 No asymptotic distribution
5 No uniform convergence over the whole interval
6 No measure of uniform accuracy (except the L2 norm)
7 The so-called "sledge-hammer" technique

Kernel estimates:
1 Nonparametric techniques
2 Too costly for large dimension
3 Too costly for large sample sizes N ∈ N
4 Asymptotic (normal) distribution (confidence intervals)
5 Uniform convergence over the whole interval
6 The so-called "sharp-knife" technique
The additive form of the regression function

Is the true underlying regression function genuinely additive?

1 YES: straightforward estimation of the functional components (only an occasional case)
2 NO: one has to find an additive approximation, which is subsequently estimated

How should one define a measure of accuracy between the underlying regression function and its approximation?
Additive Decomposition - approximation
Consider a regression function f which is not genuinely additive. In such a case f can still be successfully decomposed into main effects (an additive decomposition).

Condition 1
Let the distribution of X ∈ [0, 1]^J be absolutely continuous and let its density g be bounded away from zero and infinity: there exist b > 0 and B > b such that b ≤ g(x) ≤ B for all x ∈ C = [0, 1]^J.

The additive approximation to f can be obtained as a sum of J univariate functions f^*_j(x_j), where

  f^*_j(x_j) = E\big[ f(x) \,\big|\, X_j = x_j \big] - E\big[ f(x) \big], \qquad x = (x_1, \ldots, x_J) ∈ [0, 1]^J.

If interactions between some variables are required, they can be obtained in a similar way.
Additive Decomposition – definiteness...
Lemma 1
Let the random variable \sum_j h_j(X_j) have a finite second moment, where the h_j are functions on [0, 1]. Set δ = \sqrt{1 - b/B} and let SD(·) denote the standard deviation. Then each h_j(X_j) has a finite second moment, and the standard deviations SD(h_j(X_j)) are bounded in terms of δ and SD(\sum_j h_j(X_j)).
Regression splines
- The first method: polynomial estimates (over-fitting, etc.)
- Polynomial regression with penalties (not used anymore)
- To avoid the problems related to polynomial regression ⇒ spline approaches (piecewise polynomials)

Definition 1 (Spline function)
A spline is a piecewise polynomial function of degree n whose polynomial pieces join at the knot points, obeying continuity conditions for the function itself and its first n − 1 derivatives.

Problem: How to choose the number and the positions of the knots?
Regression splines - power basis
Spline estimation approaches are based on a set of basis functions:

1 The spline power basis takes the form \{1, x, x^2, \ldots, x^n, (x - \xi_1)^n_+, \ldots, (x - \xi_K)^n_+\}, where n is the spline degree and \xi_1, \ldots, \xi_K are the knots.
2 The estimate of each functional component f_j is defined as

  \hat{f}_j(x_j) = \sum_{l=0}^{n} \beta_{jl}\, x_j^l + \sum_{k=1}^{K} \beta_{j,n+k}\, (x_j - \xi_k)^n_+.

3 The estimate of the underlying additive regression function is defined via a minimization problem.
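A small sketch of a truncated power basis design matrix and a least-squares spline fit in one dimension (the degree, knots, and data are chosen arbitrarily for illustration):

```python
import numpy as np

def power_basis(x, knots, degree):
    """Design matrix for the truncated power basis
    {1, x, ..., x^n, (x - xi_1)_+^n, ..., (x - xi_K)_+^n}."""
    cols = [x**l for l in range(degree + 1)]
    cols += [np.clip(x - xi, 0, None)**degree for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

knots = np.linspace(0.1, 0.9, 9)          # K = 9 interior knots
B = power_basis(x, knots, degree=3)       # cubic spline, n = 3
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
fit = B @ beta
print("columns:", B.shape[1])             # n + 1 + K = 13
```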
Regression splines with penalties
If we redefine the former minimization problem so that we also minimize with respect to the knot positions ⇒ a problem of over-fitting (interpolation) arises.

Regression splines ⇒ regression splines with penalties:
- to ensure better flexibility of the final estimate;
- to gain the ability to control the amount of smoothness.

The estimate of the true underlying regression function f = (f_1, \ldots, f_J) is given by the minimization problem

  \text{minimize} \quad \sum_{i=1}^{N} \Big( Y_i - \bar{Y}_N - \sum_{j=1}^{J} f_j(x_{ij}) \Big)^2 + \lambda \int_0^1 \big( f''(x) \big)^2 \, dx,

where λ is the so-called smoothing parameter.
B-splines basis
Consider a B-spline basis of order n. Then:
1 Each B-spline function consists of n + 1 polynomial pieces
2 The single pieces join at n inner knots
3 At the knot points, continuity conditions hold up to order n − 1
4 Each B-spline basis function is positive over a domain spanned by n + 2 knots; everywhere else it is zero by definition
5 Each B-spline function overlaps with 2n other basis functions
6 At any point x ∈ [0, 1] there are n + 1 nonzero basis functions
[Figure: two examples of B-spline basis functions (y values against x values) over different knot sequences.]
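These properties can be checked numerically; below is a small Cox-de Boor evaluation of a B-spline basis (the implementation and knot choice are illustrative):

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree
    at a point x via the Cox-de Boor recursion."""
    t = np.asarray(knots, dtype=float)
    # degree 0: indicator functions of the knot intervals
    B = [(t[k] <= x < t[k + 1]) * 1.0 for k in range(len(t) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for k in range(len(t) - d - 1):
            left = 0.0 if t[k + d] == t[k] else \
                (x - t[k]) / (t[k + d] - t[k]) * B[k]
            right = 0.0 if t[k + d + 1] == t[k + 1] else \
                (t[k + d + 1] - x) / (t[k + d + 1] - t[k + 1]) * B[k + 1]
            nxt.append(left + right)
        B = nxt
    return np.array(B)

# cubic basis (n = 3) on equidistant knots, evaluated in the interior
knots = np.arange(-3, 8)
vals = bspline_basis(2.5, knots, degree=3)
print(np.count_nonzero(vals))            # n + 1 = 4 nonzero functions
print(vals.sum())                        # partition of unity: 1.0
```

The two printed values illustrate properties 6 (exactly n + 1 basis functions are nonzero at an interior point) and the partition-of-unity property of B-splines.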
B-splines - estimation
The estimate of each functional component is written as a linear combination of spline basis functions (piecewise polynomials of degree n ∈ N):

  \hat{f}_{j\Delta}(x_j) = \sum_{k=1}^{K+n+1} \vartheta_{jk}\, B_{kn}(x_j).

The estimate of the whole unknown regression function f is defined by the following minimization problem:

  \min_{\vartheta_{jk} \in R} \; \sum_{i=1}^{N} \Big[ Y_i - \bar{Y}_N - \sum_{j=1}^{J} \sum_{k=1}^{K+n+1} \vartheta_{jk}\, B_{kn}(x_{ij}) \Big]^2.
B-splines with penalties (P-splines)
To ensure better control over smoothness and better flexibility of the final estimate, B-splines with penalties were proposed.

The estimate is given by the minimization problem

  \text{minimize} \quad \sum_{i=1}^{N} \big( \hat{f}_N(X_i) - Y_i \big)^2 + \sum_{j=1}^{J} \lambda_j \int_{\xi_0}^{\xi_{K+1}} \big( \hat{f}''_{j\Delta}(x_j) \big)^2 \, dx_j

over the basis coefficients \vartheta_{jk}, for given smoothing parameters \lambda_j and the same B-spline basis as in the case of the simple B-spline estimates.

The optimal choice of the smoothing parameter λ ⇒ model selection criteria.
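A one-dimensional sketch of a penalized B-spline fit. Following common P-spline practice, a second-order difference penalty on the coefficients stands in here for the integrated squared second derivative; the basis comes from scipy and all settings are illustrative:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

# cubic B-spline basis on equidistant knots
deg, n_inner = 3, 15
t = np.r_[[0.0] * deg, np.linspace(0, 1, n_inner), [1.0] * deg]
n_basis = len(t) - deg - 1
B = BSpline.design_matrix(x, t, deg).toarray()       # (200, n_basis)

# second-order difference penalty D'D approximates the curvature penalty
D = np.diff(np.eye(n_basis), n=2, axis=0)
lam = 1.0                                            # smoothing parameter
theta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ theta
print(np.mean((fit - y)**2))
```

Larger values of `lam` shrink the coefficient differences toward zero and hence produce smoother fits, which is exactly the role of λ in the penalized criterion above.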
Power basis vs. B-spline basis
Power basis:
1 Direct relation between a knot and the corresponding basis function
2 Greater correlation between basis functions

B-spline basis:
1 Numerically much more stable set of basis functions
2 Smaller correlation between basis functions
Additive Kernel Estimates - progress
From a multidimensional estimate of the unknown regression function f we subsequently estimate the single components f_1, \ldots, f_J.

1 Motivated by additive linear regression
2 First iterative procedures (the backfitting algorithm)
3 Other iterative procedures (RPR, PPR, MARS)
4 The so-called Direct Integration Method, proposed in 1994

↪ The statistical properties of such an estimate are straightforward to derive (bias, variance, asymptotic properties, confidence intervals, etc.)
↪ Asymptotic normality of DIM estimates
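The backfitting idea from step 2 can be sketched as follows: each component is repeatedly re-estimated by smoothing the partial residuals. The smoother, bandwidth, and data below are illustrative choices:

```python
import numpy as np

def kernel_smooth(x_grid, x, r, h):
    """Nadaraya-Watson smooth of residuals r against x, at x_grid."""
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h)**2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, Y, h=0.1, n_iter=20):
    """Backfitting for the additive model Y = mu + sum_j f_j(X_j) + eps.
    Returns mu and the component fits at the sample points."""
    N, J = X.shape
    mu = Y.mean()
    F = np.zeros((N, J))                        # current component fits
    for _ in range(n_iter):
        for j in range(J):
            r = Y - mu - F.sum(axis=1) + F[:, j]    # partial residual
            F[:, j] = kernel_smooth(X[:, j], X[:, j], r, h)
            F[:, j] -= F[:, j].mean()               # enforce E f_j = 0
    return mu, F

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(400, 2))
Y = 1.0 + np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) \
    + 0.1 * rng.standard_normal(400)
mu, F = backfit(X, Y)
print(mu)                                       # the overall mean of Y
```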
Direct Integration Method
Consider a multivariate unknown regression function f(x) in additive form. Let X = (X_1, \tilde{X}) ∈ R × R^{J−1} and define the functional \varphi_1(x_1) as

  \varphi_1(x_1) = \int_{[0,1]^{J-1}} f(x_1, \tilde{x})\, p_2(\tilde{x}) \, d\tilde{x},

where p_2 denotes the density of \tilde{X}. Under the assumption of the additive form f = (f_1, \ldots, f_J), it holds that \varphi_1 = f_1 up to the additive constant µ.

→ multivariate Nadaraya-Watson kernel estimate
→ kernel estimates of the functions f and p_2
Direct Integration Method
The estimate of f_1(x_1) is given as the sample version of the functional \varphi_1(x_1):

  \hat{f}_1(x_1) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}(x_1, \tilde{X}_i).

The estimate \hat{f}_1(x_1) can be written in the form

  \hat{f}_1(x_1) = \sum_{l=1}^{N} \bar{w}_l(x_1)\, Y_l,

where \bar{w}_l(x_1) = N^{-1} \sum_{i=1}^{N} w_l(x_1, \tilde{X}_i) and the weights w_l(x_1, \tilde{X}_i) are given by the kernel representation \hat{f}(x_1, \tilde{X}) = \sum_{l=1}^{N} w_l(x_1, \tilde{X})\, Y_l.
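A sketch of the direct integration estimate of f_1: a full multivariate Nadaraya-Watson fit is averaged over the sample values of the remaining covariates. The synthetic setup and the names `nw` and `dim_f1` are illustrative:

```python
import numpy as np

def nw(x, X, Y, h):
    """Multivariate Nadaraya-Watson estimate at a single point x."""
    u = (X - x) / h
    w = np.prod(np.exp(-0.5 * u**2), axis=1)
    return np.sum(w * Y) / np.sum(w)

def dim_f1(x1, X, Y, h):
    """Direct integration estimate: average the full fit f(x1, X~_i)
    over the empirical distribution of the remaining covariates."""
    vals = [nw(np.r_[x1, Xi], X, Y, h) for Xi in X[:, 1:]]
    return np.mean(vals)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(300, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(300)
h = np.array([0.08, 0.08])

# phi_1 recovers f_1(x1) = sin(2 pi x1) up to an additive constant,
# so the difference below approaches sin(pi/2) - sin(3 pi/2) = 2 as N grows
est = dim_f1(0.25, X, Y, h) - dim_f1(0.75, X, Y, h)
print(est)
```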
Asymptotic normality of the kernel additive estimate

The functional components f_2, \ldots, f_J can be obtained by a similar process, considering the functional \varphi_k(x_k) and the partition (X_k, \tilde{X}) ∈ R × R^{J−1} with \tilde{X} = (X_1, \ldots, X_{k−1}, X_{k+1}, \ldots, X_J).

Theorem (Asymptotic normality)
Under suitable assumptions on N ∈ N, the smoothness, and the bandwidths h and g of the kernel estimates, it holds that

  N^{2/5} \big[ \hat{\varphi}_j(x_j) - \varphi_j(x_j) \big] \to N\big( b_j(x_j), v_j(x_j) \big).
Generalization into the GAM
For binary data (or survival time data) it is more convenient to use Generalized Additive Models (GAM).

1 Full model specification: the conditional distribution of Y given X belongs to an exponential family with a known link function G, and

  G\big[ f(x) \big] = \mu + \sum_{j=1}^{J} f_j(x_j).

2 Partial model specification: no restriction to an exponential family; the variance function remains unrestricted.

If one takes the link function G to be the identity ⇒ the classical additive regression model (other choices: logit, probit, logarithm).
GAM - estimation
The estimation procedure is similar to that for additive kernel estimates. Let X = (X_1, \tilde{X}) with \tilde{X} = (X_2, \ldots, X_J), and define \varphi_1(x_1):

  \varphi_1(x_1) = \int G\big[ f(x_1, \tilde{x}) \big] \, p_2(\tilde{x}) \, d\tilde{x}.

Multidimensional Nadaraya-Watson kernel estimator ⇒ nonparametric multivariate kernel estimates of p_2 and f.

- The estimate of f_1 is identified with the estimate of \varphi_1.
- The estimate of \varphi_1 is given by

  \hat{\varphi}_1(x_1) = \frac{1}{N} \sum_{i=1}^{N} G\big[ \hat{f}(x_1, \tilde{X}_i) \big], \qquad \tilde{X}_i = (X_{i2}, \ldots, X_{iJ}).
ADVANTAGES or DISADVANTAGES?
What is the main advantage of an additive approach?
The Optimal Global Rate of Convergence
The sequence {b_N} is the optimal rate of convergence if

  \lim_{c \to 0} \liminf_{N \to \infty} \sup_{f \in \kappa} P\big[ \| T_N - f \|_q > c \cdot b_N \big] = 1,

  \lim_{c \to \infty} \limsup_{N \to \infty} \sup_{f \in \kappa} P\big[ \| T_N - f \|_q > c \cdot b_N \big] = 0.

The optimal global rate of convergence given by Stone:

Theorem (Rate of convergence for nonparametric estimates)
Let β ∈ (0, 1] and set p = k + β. Let 0 < q ≤ ∞ and set r = (p − m)/(2p + J). Then the optimal global rate of convergence is

  \{N^{-r}\} \text{ for } q ∈ (0, ∞), \qquad \{(N^{-1} \ln N)^{r}\} \text{ for } q = ∞.
Additive Reduction Principle
- The additive reduction principle improves the simplicity and interpretability of the model.
- It prevents the "curse of dimensionality".
- It improves the optimal global rate of convergence: r = (p − m)/(2p + J) −→ r = (p − m)/(2p + 1).
[Figure: a regression surface of the response Y over two predictors X and Z, together with the estimated additive components s(income, 3.12) against income and s(education, 3.18) against education.]
Additive Expansion in L2 Norm
Consider an additive estimate \hat{f}_N of the regression function f. Set γ = 1/(2p + 1) and r = (p − m)/(2p + 1).

Theorem (Rate of convergence for additive estimates)
Suppose that all necessary conditions hold and let N_N ∼ N^γ. Then the additive estimate attains the one-dimensional optimal rate r = (p − m)/(2p + 1).

Figure: The optimal global rate of convergence for additive models in the case of a two-dimensional regression surface, for the supremum norm and the Euclidean norm.
Optimal model selection
1 Spline estimates: with a smoothing parameter λ one obtains a whole set of "good" admissible models ⇒ a requirement to select only one.
2 Penalized splines: the set of admissible models grows even further once we also minimize over the smoothing parameter λ and the knot positions Δ.
3 Kernel regression: the problem of the right selection of the smoothing parameter h, a measure of localness (or of a multiple bandwidth parameter h).
Model Selection Criteria
[Figure: simulated data (y values against x values on [0, 1]) with penalized spline fits for smoothing parameters λ = 0.000008, 0.001357, 0.037821, 0.199624, and 1.053625.]

- Cross-Validation
- Generalized Cross-Validation
- Akaike Information Criterion
- Bayesian Information Criterion
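A sketch of selecting λ by Generalized Cross-Validation for a penalized linear fit; the ridge-type penalty on a power basis and the λ grid are illustrative choices, not the setup of the slides:

```python
import numpy as np

def gcv_score(B, y, D, lam):
    """GCV(lambda) = N * RSS / (N - tr(H))^2 for the penalized fit
    with hat matrix H = B (B'B + lam D'D)^{-1} B'."""
    N = len(y)
    A = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)
    H = B @ A
    resid = y - H @ y
    edf = np.trace(H)                    # effective degrees of freedom
    return N * np.sum(resid**2) / (N - edf)**2

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

# cubic truncated power basis with 9 interior knots
knots = np.linspace(0.1, 0.9, 9)
B = np.column_stack([x**l for l in range(4)] +
                    [np.clip(x - xi, 0, None)**3 for xi in knots])
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # difference penalty

lams = 10.0 ** np.arange(-8, 3)
scores = [gcv_score(B, y, D, lam) for lam in lams]
best = lams[int(np.argmin(scores))]
print(best)
```

GCV replaces the expensive leave-one-out loop of ordinary cross-validation with the trace of the hat matrix; AIC and BIC use the same effective degrees of freedom with different complexity penalties.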
Example: Polynomial Regression
Polynomial regression estimate: a spline estimate of degree 3.
Life expectancy ~ S_{12}[\log(\text{People/TV}), \log(\text{People/physician})]

[Figure: fitted surface of average life expectancy over log(people per TV) and log(people per physician).]

Residual sum of squares: 10.56021
Example: Additive Regression Model
Additive regression estimate (generalization of PPR): an additive spline estimate of degree 3.
Life expectancy ~ S_1[\log(\text{People/TV})] + S_2[\log(\text{People/physician})]

[Figure: fitted additive surface of average life expectancy over log(people per TV) and log(people per physician).]

Residual sum of squares: 11.89261
Example: Recursive Partitioning Regression
Recursive partitioning regression estimate: a locally constant estimate (spline of degree 0).
Life expectancy ~ \sum_{v=1}^{V} \pi_v \, 1_{\{x \in B_v\}}

[Figure: piecewise-constant fitted surface of average life expectancy over log(people per TV) and log(people per physician).]

Residual sum of squares: NaN
Example: MARS Algorithm
Multivariate Adaptive Regression Splines (MARS): a modification of the RPR algorithm.
Life expectancy ~ s_0 + \sum_{v=1}^{V} s_v B_v(x)

[Figure: MARS fitted surface of average life expectancy over log(people per TV) and log(people per physician).]

Residual sum of squares: 11.32777
Example: Projection Pursuit Regression
Projection Pursuit Regression (PPR): projection onto lower dimensions.
Life expectancy ~ \sum_{v=1}^{V} g_v(b_v^{T} x)

[Figure: PPR fitted surface of average life expectancy over log(people per TV) and log(people per physician).]

Residual sum of squares: 6.129001
Additive Regression Models with Regression Splines