Transcript
Chapter 4: Introduction to Predictive Modeling: Regressions
4.1 Introduction
4.2 Selecting Regression Inputs
4.3 Optimizing Regression Complexity
4.4 Interpreting Regression Models
4.5 Transforming Inputs
4.6 Categorical Inputs
4.7 Polynomial Regressions (Self-Study)
Model Essentials – Regressions
Predict new cases.
Select useful inputs.
Optimize complexity.
...
Model Essentials – Regressions
Predict new cases.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.
...
Linear Regression Prediction Formula

ŷ = ŵ0 + ŵ1 · x1 + ŵ2 · x2

Here ŷ is the prediction estimate, ŵ0 the intercept estimate, ŵ1 and ŵ2 the parameter estimates, and x1 and x2 the input measurements.

Choose the intercept and parameter estimates to minimize the squared error function over the training data:

∑( yi − ŷi )²
...
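The criterion on this slide can be made concrete with a short NumPy sketch (illustrative only; the course itself fits these models with SAS Enterprise Miner). `np.linalg.lstsq` chooses the intercept and parameter estimates that minimize the squared error function above, here on made-up training data with known true weights:

```python
import numpy as np

# Toy training data: two inputs x1, x2 and a continuous target y.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.01, size=100)

# Prepend a column of ones so the intercept w0 is estimated with w1, w2.
X1 = np.column_stack([np.ones(len(X)), X])

# Least squares: choose w to minimize sum((y - X1 @ w) ** 2).
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ w  # predictions from w0 + w1*x1 + w2*x2
```

With almost no noise in the toy data, the recovered estimates land close to the true intercept 1 and parameters 2 and −3.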
Logistic Regression Prediction Formula

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1 · x1 + ŵ2 · x2   (logit scores)
...
Logit Link Function

logit( p̂ ) = log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1 · x1 + ŵ2 · x2

[Figure: plot of the logit link, with probabilities 0 to 1 on one axis and logit scores of roughly −5 to 5 on the other.]

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).
...
Logit Link Function

logit( p̂ ) = log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1 · x1 + ŵ2 · x2

To obtain prediction estimates, the logit equation is solved for p̂:

p̂ = 1 / ( 1 + e^(−logit( p̂ )) )
...
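A two-function sketch in plain Python (illustrative, not part of the course software) confirms that the inverse of the logit link recovers the probability:

```python
import math

def logit(p):
    """Log odds: maps a probability in (0, 1) to a score in (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def inv_logit(score):
    """Solves the logit equation for p: p = 1 / (1 + e**(-score))."""
    return 1.0 / (1.0 + math.exp(-score))

# Round trip: the link and its inverse undo each other.
p_hat = inv_logit(logit(0.56))
```

Note that a logit score of 0 corresponds to a probability of exactly 0.5, the midpoint of the probability scale.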
Simple Prediction Illustration – Regressions

Predict dot color for each x1 and x2. You need intercept and parameter estimates.

[Figure: training cases plotted on the unit square, x1 and x2 each running 0.0 to 1.0, with contour labels 0.40, 0.50, 0.60, and 0.70.]
...
Simple Prediction Illustration – Regressions

Find parameter estimates by maximizing the log-likelihood function.

[Figure: the same x1–x2 plot with fitted contours labeled 0.40, 0.50, 0.60, and 0.70.]
...
Simple Prediction Illustration – Regressions

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.

[Figure: the x1–x2 plot with prediction contours labeled 0.40, 0.50, 0.60, and 0.70.]
...
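A rough illustration of the maximum likelihood step (plain NumPy gradient ascent on the log-likelihood, applied to made-up toy data; the course itself uses the SAS Regression node for this):

```python
import numpy as np

# Toy binary-target data generated from a known logit.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
X1 = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
p_true = 1.0 / (1.0 + np.exp(-(X1 @ np.array([-1.0, 4.0, -2.0]))))
y = (rng.uniform(size=200) < p_true).astype(float)

# Gradient ascent on the log-likelihood sum(y*log(p) + (1-y)*log(1-p)).
w = np.zeros(3)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X1 @ w)))         # current probability estimates
    w += 0.1 * X1.T @ (y - p) / len(y)          # gradient of mean log-likelihood

logit_scores = X1 @ w                           # one logit score per (x1, x2)
```

With the likelihood maximized, positive logit scores mark cases predicted to be in the primary outcome class, matching the contour picture on the slide.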
4.01 Multiple Choice Poll
What is the logistic regression prediction for the indicated point?
a. 0.243
b. 0.56
c. yellow
d. It depends.

[Figure: the x1–x2 plot with the point in question indicated.]
4.01 Multiple Choice Poll – Correct Answer
What is the logistic regression prediction for the indicated point?
a. 0.243
b. 0.56
c. yellow
d. It depends. (correct)

A logistic regression can report a logit score (0.243), the corresponding probability estimate (1 / (1 + e^−0.243) ≈ 0.56), or a decision (yellow), so the prediction depends on which type is requested.

[Figure: the x1–x2 plot with the point in question indicated.]
Regressions: Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
...
Missing Values and Regression Modeling

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

[Figure: training data table with input and target columns; cases with missing inputs are dropped from the fit.]
...
Consequence: Missing values can significantly reduce your amount of training data for regression modeling!
...
Missing Values and the Prediction Formula
Predict: (x1, x2) = (0.3, ? )
Problem 2: Prediction formulas cannot score cases with missing values.
...
Missing Value Issues
Manage missing values.
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
...
Problem 2: Prediction formulas cannot score cases with missing values.
Missing Value Causes
Manage missing values.
Non-applicable measurement
No match on merge
Non-disclosed measurement
...
Missing Value Remedies

Manage missing values by imputation: replace a missing input with a synthetic value, for example one estimated from the other inputs, xi = f(x1, … , xp).

Causes addressed: non-applicable measurement, no match on merge, non-disclosed measurement.
...
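The remedy above — impute a replacement value and keep an indicator so the model remembers which values were synthetic — can be sketched in pandas (illustrative column names; the course performs this step with SAS Enterprise Miner's imputation tooling):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [50.0, np.nan, 62.0, 48.0],   # interval input
    "region": ["N", "S", None, "S"],        # categorical input
})

# Flag missingness first, so the model can still "see" that a value was absent.
df["income_missing"] = df["income"].isna().astype(int)
df["region_missing"] = df["region"].isna().astype(int)

# Impute: mean for the interval input, mode for the categorical input.
df["income"] = df["income"].fillna(df["income"].mean())
df["region"] = df["region"].fillna(df["region"].mode()[0])
```

After this step every case is complete, so no training cases are ignored and the prediction formula can score every case.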
Managing Missing Values
This demonstration illustrates how to impute synthetic data values and create missing value indicators.
Running the Regression Node
This demonstration illustrates using the Regression tool.
Chapter 4: Introduction to Predictive Modeling: Regressions
4.02 Poll
The three sequential selection methods for building regression models can never lead to the same model for the same set of data.
True
False
4.02 Poll – Correct Answer
The three sequential selection methods for building regression models can never lead to the same model for the same set of data.
True
False (correct: forward, backward, and stepwise selection can choose the same set of inputs, so they can produce the same model.)
Selecting Inputs
This demonstration illustrates using stepwise selection to choose inputs for the model.
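To show the idea behind sequential selection, here is a minimal forward-selection sketch in NumPy with an AIC stopping rule (an assumption for illustration; the SAS stepwise method additionally allows removals and uses significance levels):

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ w
    return float(r @ r)

def forward_select(X, y, names):
    """Greedily add the input that most lowers AIC; stop when none helps."""
    n = len(y)
    chosen, remaining = [], list(range(X.shape[1]))
    best = n * np.log(float(((y - y.mean()) ** 2).sum()) / n) + 2  # intercept-only
    while remaining:
        aic, j = min(
            (n * np.log(sse(X[:, chosen + [k]], y) / n) + 2 * (len(chosen) + 2), k)
            for k in remaining
        )
        if aic >= best:
            break
        best = aic
        chosen.append(j)
        remaining.remove(j)
    return [names[j] for j in chosen]

# Toy data: only the first two inputs actually drive the target.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
selected = forward_select(X, y, ["a", "b", "c", "d"])
```

On this toy data the strongest input is added first, the second true input follows, and the AIC penalty stops the search before purely noisy inputs pile up.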
Chapter 4: Introduction to Predictive Modeling: Regressions
This demonstration illustrates how to add polynomial regression terms autonomously.
Exercises
This exercise reinforces the concepts discussed previously.
Regression Tools Review

Replace missing values for interval inputs (means) and categorical inputs (modes). Create a unique replacement indicator.

Create linear and logistic regression models. Select inputs with a sequential selection method and an appropriate fit statistic. Interpret models with odds ratios.

Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.

continued...
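The log transformation mentioned above pulls in a long right tail. A quick NumPy check on made-up skewed data (illustrative; the course applies such transformations through its own tooling):

```python
import numpy as np

def skew(x):
    """Sample skewness: mean cubed z-score."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

# Right-skewed toy input (lognormal, income-like).
rng = np.random.default_rng(3)
raw = rng.lognormal(mean=10.0, sigma=1.0, size=1000)

transformed = np.log(raw)   # log transform regularizes the distribution
```

The raw input is strongly right-skewed, while its log is close to symmetric.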
Regression Tools Review
Consolidate levels of a nonnumeric input using the Replacement Editor window.
Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.
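The exhaustive expansion referred to above can be sketched as follows (a hypothetical helper for quadratic terms only, not the SAS node's implementation):

```python
import numpy as np

def add_quadratic_terms(X, names):
    """Expand inputs with all squares and pairwise interaction terms,
    e.g. (x1, x2) -> (x1, x2, x1*x1, x1*x2, x2*x2)."""
    cols, out = [X], list(names)
    for i in range(X.shape[1]):
        for j in range(i, X.shape[1]):
            cols.append((X[:, i] * X[:, j])[:, None])
            out.append(f"{names[i]}*{names[j]}")
    return np.hstack(cols), out

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Xq, names_out = add_quadratic_terms(X, ["x1", "x2"])
```

The expanded matrix then feeds the same regression fit as before; a sequential selection method can prune terms that do not improve the fit statistic.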