Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)
Predictive Modeling: The Essence of Data Mining
"Most of the big payoff [in data mining] has been in predictive modeling." - Herb Edelstein
Predictive Modeling Applications
- Database marketing
- Financial risk management
- Fraud detection
- Process monitoring
- Pattern detection
Predictive Modeling: Training Data

Training data case: categorical or numeric input and target measurements.
Predictive Model

Predictive model: a concise representation of the input and target association.
Predictive Model: Predictions

Predictions: output of the predictive model given a set of input measurements.
Modeling Essentials
- Predict new cases.
- Select useful inputs.
- Optimize complexity.
Three Prediction Types
- decisions
- rankings
- estimates
Decision Predictions

A predictive model uses input measurements to make the best decision (primary, secondary, or tertiary) for each case.
Ranking Predictions

A predictive model uses input measurements to optimally rank each case (for example, with scores such as 720, 630, 580, 520, and 470).
Estimate Predictions

A predictive model uses input measurements to optimally estimate the target value (for example, 0.28, 0.33, and 0.54).
Modeling Essentials - Predict Review

Predict new cases: decide, rank, and estimate.
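All three prediction types can be derived from one underlying model score. A minimal sketch, assuming a fitted model that outputs probabilities (the 0.5 threshold is illustrative, not from the course):

```python
import numpy as np

# Hypothetical model scores (estimates) for five cases.
estimates = np.array([0.33, 0.54, 0.28, 0.91, 0.12])

# Decisions: compare each estimate to a threshold (0.5 here, illustrative).
decisions = (estimates >= 0.5).astype(int)

# Rankings: order cases from highest to lowest score.
rankings = np.argsort(-estimates)  # case indices, best first

print(decisions)  # [0 1 0 1 0]
print(rankings)   # [3 1 0 2 4]
```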
1.01 Quiz

Match the predictive modeling application to the decision type.

Predictive Modeling Application    Decision Type
Loss Reserving                     A. Decision
Risk Profiling                     B. Ranking
Credit Scoring                     C. Estimate
Fraud Detection
Revenue Forecasting
Voice Recognition

1.01 Quiz - Correct Answer

Loss Reserving: C. Estimate
Risk Profiling: B. Ranking
Credit Scoring: B. Ranking
Fraud Detection: A. Decision
Revenue Forecasting: C. Estimate
Voice Recognition: A. Decision
Modeling Essentials

Select useful inputs.
§sas
:
The Curse of Dimensionality 1-D
• • • • • • • • 2-D
3-D
28
14
•
©
•
•
/
ยงSaS sba
1.02 Quiz

How many inputs are in your modeling data? Mark your selection in the polling area.
Input Reduction - Redundancy

Redundancy: Input x1 has the same information as input x2.

Input Reduction - Irrelevancy

Irrelevancy: Predictions change with input x4 but much less with input x3.
Modeling Essentials - Select Review

Select useful inputs: eradicate redundancies and irrelevancies.

Modeling Essentials

Optimize complexity.
Model Complexity

A model can be too complex or not complex enough.
Data Partitioning

Partition available data into training and validation sets.
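A minimal sketch of the same idea outside SAS Enterprise Miner, assuming a pandas DataFrame `df` with a binary `target` column (the 50/50 split fraction is illustrative):

```python
from sklearn.model_selection import train_test_split

# Stratify on the target so both partitions keep the same outcome proportions.
train, valid = train_test_split(
    df, test_size=0.5, stratify=df["target"], random_state=12345
)
print(len(train), len(valid))
```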
Predictive Model Sequence

Create a sequence of models with increasing complexity.
Model Performance Assessment

Rate model performance using validation data.
Model Selection

Select the simplest model with the highest validation assessment.
Modeling Essentials - Optimize Review

Optimize complexity: tune models with validation data.
Creating Training and Validation Data

This demonstration illustrates how to partition a data source into training and validation sets.
Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)
Predictive Modeling Tools

Primary tools: Decision Tree, Regression, Neural Network.
Specialty tools (as far as the slide is recoverable): Gradient Boosting, Dmine Regression, MBR, Partial Least Squares, LARS.
Combined-model tools: Ensemble, Two Stage.
Model Essentials - Decision Trees
- Predict new cases: prediction rules
- Select useful inputs: split search
- Optimize complexity: pruning
Simple Prediction Illustration

Predict dot color for each x1 and x2 in the training data (both inputs range over the unit interval).
Decision Tree Prediction Rules

A decision tree consists of a root node, interior nodes, and leaf nodes. Each case enters at the root and follows the partition rules (here, the root rule splits at < 0.63 versus >= 0.63) until it reaches a leaf, where the prediction is made.

In the highlighted leaf, the prediction can be read either as a decision (the predominant dot color) or as an estimate (0.70, the proportion of primary-outcome cases in the leaf).
Model Essentials - Decision Trees

Predict new cases: prediction rules.
Select useful inputs: split search.
Decision Tree Split Search

Calculate the logworth of every partition on input x1. (Logworth is derived from a chi-squared test on the 2x2 table of target outcomes on either side of a candidate split: logworth = -log10(p-value), so larger values indicate stronger splits.)

The best partition on x1 is at 0.52: the left region has target proportions 53% / 47% and the right region 42% / 58%, giving max logworth(x1) = 0.95.

Select the partition with the maximum logworth.
Decision Tree Split Search (continued)

Repeat for input x2 (bottom/top partitions). The best partition on x2 is at 0.63: the bottom region has target proportions 54% / 46% and the top region 35% / 65%, giving max logworth(x2) = 4.92.
Decision Tree Split Search (continued)

Compare partition logworth ratings across inputs: max logworth(x1) = 0.95 versus max logworth(x2) = 4.92.
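A sketch of one logworth calculation, assuming the 2x2 counts are known (scipy's chi-square test without continuity correction stands in for the course's chi-squared statistic; the counts below are illustrative, chosen to mimic the 53%/47% versus 42%/58% split above):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: left/right of the candidate split; columns: target = 0 / target = 1.
table = np.array([[53, 47],
                  [42, 58]])
chi2, p_value, dof, _ = chi2_contingency(table, correction=False)
logworth = -np.log10(p_value)
print(round(logworth, 2))  # roughly 0.9 for these counts
```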
Decision Tree Split Search (continued)

Create a partition rule from the best partition across all inputs: because max logworth(x2) = 4.92 exceeds max logworth(x1) = 0.95, the rule splits on x2 < 0.63 versus x2 >= 0.63.

Repeat the process in each subset.
Decision Tree Split Search (continued)

Repeating the search within a subset: the best partition on x1 gives left proportions 61% / 39% and right proportions 55% / 45%, with max logworth(x1) = 5.72; the best partition on x2 gives bottom proportions 38% / 62% and top proportions 55% / 45%, with max logworth(x2) = -2.01. (The adjusted logworth can be negative.)
Decision Tree Split Search (continued)

Create a second partition rule from the best partition in the subset: x1 < 0.52 versus x1 >= 0.52.

Repeat to form a maximal tree.
Model Essentials - Decision Trees

Select useful inputs: split search.

Constructing a Decision Tree Predictive Model

This demonstration illustrates constructing a decision tree model interactively.
Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)
Model Essentials - Decision Trees

Optimize complexity: pruning.
Predictive Model Sequence

Create a sequence of models with increasing complexity.

The Maximal Tree

A maximal tree is the most complex model in the sequence.
Pruning One Split

The next model in the sequence is formed by pruning one split from the maximal tree. Each subtree's predictive performance is rated on validation data, and the subtree with the highest validation assessment is selected.
Pruning Two Splits

Prune two splits from the maximal tree, rate each subtree using validation assessment, and select the subtree with the best assessment rating. Similarly, this is done for subsequent models.
Subsequent Pruning

Continue pruning until all subtrees are considered.
Selecting the Best Tree

Compare validation assessment between tree complexities. Which subtree should be selected as the best model?
Validation Assessment

Choose the simplest model with the highest validation assessment.
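A sketch of the prune-and-select sequence, using scikit-learn's cost-complexity pruning as a stand-in for the course's subtree sequence (assumes `X_train`, `y_train`, `X_valid`, and `y_valid` exist):

```python
from sklearn.tree import DecisionTreeClassifier

# Grow the maximal tree, then get the pruning sequence.
maximal = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = maximal.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

best_alpha, best_score = None, -1.0
for alpha in alphas:  # larger alpha => more pruning => simpler subtree
    subtree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    subtree.fit(X_train, y_train)
    score = subtree.score(X_valid, y_valid)
    # ">=" keeps the simplest (most pruned) subtree among ties.
    if score >= best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, best_score)
```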
Validation Assessment (continued)

What are appropriate validation assessment ratings?
Assessment Statistics

Ratings depend on the target measurement and the prediction type.
Binary Targets

For a binary target, target = 1 denotes the primary outcome and target = 0 denotes the secondary outcome.
Decision Optimization

For decision predictions, each case is assigned a decision: primary (1) or secondary (0).
Decision Optimization - Accuracy

True positives and true negatives are cases where the decision agrees with the outcome. Maximize accuracy: agreement between outcome and prediction.
Decision Optimization - Misclassification

False negatives and false positives are cases where the decision disagrees with the outcome. Minimize misclassification: disagreement between outcome and prediction.
Ranking Optimization

For ranking predictions, each case receives a score (for example, 520 or 720), and cases are ordered by score.
Ranking Optimization - Concordance

target = 0 should receive a low score; target = 1 should receive a high score. Maximize concordance: proper ordering of primary and secondary outcomes.
Ranking Optimization - Discordance

target = 0 with a high score, or target = 1 with a low score, is a discordant pair. Minimize discordance: improper ordering of primary and secondary outcomes.
Estimate Optimization - Squared Error

Minimize squared error: the squared difference between target and prediction, (target - estimate)^2.
Complexity Optimization - Summary

Prediction type    Assessment statistic
decisions          accuracy / misclassification
rankings           concordance / discordance
estimates          squared error
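A sketch computing one statistic per prediction type, assuming arrays `y_valid` of 0/1 targets and `p_valid` of predicted probabilities (concordance is computed here as the AUC, which equals the proportion of concordant pairs plus half the ties):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

decisions = (p_valid >= 0.5).astype(int)           # illustrative threshold

accuracy = accuracy_score(y_valid, decisions)      # decisions
misclassification = 1.0 - accuracy
concordance = roc_auc_score(y_valid, p_valid)      # rankings
ase = np.mean((y_valid - p_valid) ** 2)            # estimates (average squared error)

print(accuracy, misclassification, concordance, ase)
```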
Assessing a Decision Tree

This demonstration illustrates how to assess a tree model.
Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)
Understanding Additional Plots and Tables (Optional)

This demonstration presents several additional plots and tables from the SAS Enterprise Miner Tree Desktop Application.
Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)
Autonomous Decision Tree Defaults

Default settings:
Maximum Branches          2
Splitting Rule Criterion  Logworth
Subtree Method            Average Profit
Tree Size Options         Bonferroni Adjust, Split Adjust, Maximum Depth, Leaf Size
Tree Variations: Maximum Branches

- Trades height for width.
- Uses heuristic shortcuts.

In the node's property sheet, the Maximum Branch property sets the maximum branches in a split, and the Exhaustive property sets the size limit for an exhaustive split search (shown as 5000; beyond it, heuristic shortcuts are used).
Tree Variations: Splitting Rule Criterion

- Yields similar splits.
- Grows enormous trees.
- Favors many-level inputs.

The property sheet shows the split criteria (Interval Criterion: ProbF; Nominal Criterion: ProbChisq; Ordinal Criterion: Entropy), with callouts distinguishing the categorical criteria from the interval criteria (variance and Prob-F logworth). Logworth adjustments include the Bonferroni adjustment, the time of the Kass adjustment (before the split search), the number of inputs, and the split adjustment.
Tree Variations: Subtree Method

Pruning options and pruning metrics: Decision, Average Square Error, Misclassification, Lift.
Tree Variations: Tree Size Options

Avoids orphan nodes.
Key tree-size properties in the property sheet: Significance Level (the logworth threshold, shown as 0.2), Maximum Depth (the maximum tree depth), Leaf Size (the minimum leaf size, shown as 5), and Split Adjustment (the threshold depth adjustment).
Exercises

These exercises reinforce the concepts discussed previously.
Decision Tree Tools Review

- Partition raw data into training and validation sets.
- Interactively grow trees using the Tree Desktop application. You can control rules, selected inputs, and tree complexity.
- Autonomously grow decision trees based on property settings. Settings include branch count, split rule criterion, subtree method, and tree size options.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Model Essentials - Regressions
- Predict new cases: prediction formula
- Select useful inputs: sequential selection
- Optimize complexity: best model from sequence
Linear Regression Prediction Formula

ŷ = w0 + w1·x1 + w2·x2

where ŷ is the predicted target, w0 is the intercept estimate, w1 and w2 are parameter estimates, and x1 and x2 are input measurements.

Choose the intercept and parameter estimates to minimize the squared error function over the training data:

Σ (yi - ŷi)²
Logistic Regression Prediction Formula

log( p̂ / (1 - p̂) ) = w0 + w1·x1 + w2·x2

The left side is the logit of p̂; the right side assigns a logit score to each case.

Logit Link Function

logit( p̂ ) = log( p̂ / (1 - p̂) ) = w0 + w1·x1 + w2·x2

The logit link function transforms probabilities (between 0 and 1) to logit scores (between -∞ and +∞).

To obtain prediction estimates, the logit equation is solved for p̂:

p̂ = 1 / (1 + e^(-logit(p̂)))
Simple Prediction Illustration - Regressions

Predict dot color for each x1 and x2:

logit( p̂ ) = w0 + w1·x1 + w2·x2
p̂ = 1 / (1 + e^(-logit(p̂)))

You need intercept and parameter estimates. Find the parameter estimates by maximizing the log-likelihood function:

Σ log(p̂i) over primary-outcome training cases + Σ log(1 - p̂i) over secondary-outcome training cases

Using the maximum likelihood estimates,

logit( p̂ ) = -0.81 + 0.92·x1 + 1.11·x2

the prediction formula assigns a logit score to each (x1, x2).
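The same maximum-likelihood fit, sketched with scikit-learn (assumes an array `X` of shape (n, 2) and binary `y`; `C` is set large to approximate an unpenalized fit, and the printed coefficients would only match the slide's -0.81, 0.92, 1.11 on the course data):

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=1e6)  # near-zero penalty ~ plain maximum likelihood
model.fit(X, y)
print(model.intercept_, model.coef_)       # w0 and (w1, w2)
print(model.predict_proba([[0.3, 0.4]]))   # [P(target=0), P(target=1)]
```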
2.01 Multiple Choice Poll

What is the logistic regression prediction for the indicated point?

2.01 Multiple Choice Poll - Correct Answer

(The answer is read from the fitted logit scores on the slide plot.)
Regressions: Beyond the Prediction Formula
- Manage missing values.
- Interpret the model.
- Handle extreme or unusual values.
- Use nonnumeric inputs.
- Account for nonlinearities.
Missing Values and Regression Modeling

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!

Missing Values and the Prediction Formula

Problem 2: Prediction formulas cannot score cases with missing values. For example, to predict (x1, x2) = (0.3, ?):

logit( p̂ ) = -0.81 + 0.92·0.3 + 1.11·? = ?

Missing Value Issues
- Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
- Problem 2: Prediction formulas cannot score cases with missing values.
Missing Value Causes
- Non-applicable measurement
- No match on merge
- Non-disclosed measurement

Missing Value Remedies
- Synthetic distribution: replace a missing value with a fixed value from the input's distribution (for example, the mean for interval inputs or the mode for categorical inputs).
- Estimation: model the input from the other inputs, xj = f(x1, ..., xp).
Managing Missing Values

This demonstration illustrates how to impute synthetic data values and create missing value indicators.

Running the Regression Node

This demonstration illustrates using the Regression tool.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Model Essentials - Regressions

Select useful inputs: sequential selection.
Sequential Selection - Forward

Each candidate input has a p-value from a significance test. Forward selection starts with no inputs and, at each step, adds the input with the smallest p-value, provided that p-value falls below the entry cutoff. Selection stops when no remaining input meets the entry cutoff.
Sequential Selection - Backward

Backward selection starts with all inputs in the model and, at each step, removes the input with the largest p-value, provided that p-value exceeds the stay cutoff. Selection stops when every remaining input's p-value is below the stay cutoff.
Sequential Selection - Stepwise

Stepwise selection combines the forward and backward approaches: inputs are added as in forward selection (against the entry cutoff), but after each addition, any input in the model whose p-value has risen above the stay cutoff is removed. Selection stops when no input can be added or removed. A sketch appears below.
Selecting Inputs

This demonstration illustrates using stepwise selection to choose inputs for the model.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Model Essentials - Regressions

Optimize complexity: best model from sequence.
Select Model with Optimal Validation Fit

Track the model fit statistic on validation data across the selection sequence, and choose the simplest optimal model.
Optimizing Complexity

This demonstration illustrates tuning a regression model to give optimal performance on the validation data.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Beyond the Prediction Formula

Interpret the model.
Odds Ratios and Doubling Amounts

log( p̂ / (1 - p̂) ) = w0 + w1·x1 + w2·x2

Odds ratio: the amount the odds change with a unit change in an input.
Δxi = 1 ⇒ odds × exp(wi)

Doubling amount: how much does an input have to change to double the odds?
Δxi = 0.69 / wi ⇒ odds × 2
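Both quantities fall out of the fitted coefficients directly. A minimal sketch (0.92 is the x1 estimate from earlier in this chapter; 0.69 is ln 2):

```python
import numpy as np

w1 = 0.92                    # parameter estimate for x1
odds_ratio = np.exp(w1)      # odds multiplier per unit change in x1
doubling = np.log(2) / w1    # change in x1 that doubles the odds

print(round(odds_ratio, 2))  # 2.51
print(round(doubling, 2))    # 0.75
```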
Interpreting a Regression Model

This demonstration illustrates interpreting a regression model using odds ratios.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Beyond the Prediction Formula

Handle extreme or unusual values.
Extreme Distributions and Regressions

On the original input scale, a skewed input distribution produces high leverage points, and the standard regression is pulled away from the true association.

Regularizing Input Transformations

Transforming the input to a regularized scale (for example, via a log transformation) yields a more symmetric distribution. The regularized estimate fit on the transformed scale lies closer to the true association than the standard regression fit on the original scale.
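A sketch of the regularizing transformation itself, using the log transformation mentioned in this chapter's tools review (`log1p` is used so zeros are handled, an assumption about the data):

```python
import numpy as np

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed input

regularized = np.log1p(skewed)  # log(1 + x): more symmetric distribution

# The transformation pulls in the long right tail.
print(skewed.max() / np.median(skewed))            # large ratio
print(regularized.max() / np.median(regularized))  # much smaller ratio
```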
Transforming Inputs

This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Beyond the Prediction Formula

Use nonnumeric inputs.
Nonnumeric Input Coding

A categorical input with levels A through I is coded with one indicator (dummy) variable per level: D_A, D_B, ..., D_I. For a case with level A, D_A = 1 and all other indicators are 0, and likewise for each remaining level.
Coding Redundancy

The full set of indicators is redundant: every case has exactly one indicator equal to 1, so any one indicator is determined by the others (for example, D_I = 1 - D_A - D_B - ... - D_H).
Coding Consolidation

Levels with similar target behavior can be consolidated: a single indicator represents a combined group of levels (for example, one indicator for levels A through D and others for the remaining groups), reducing the number of coded inputs.
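A sketch of both coding steps in pandas (the two-group consolidation is hypothetical, standing in for groups found to behave similarly):

```python
import pandas as pd

df = pd.DataFrame({"level": list("ABCDEFGHI")})

# One indicator per level (full, redundant coding).
dummies = pd.get_dummies(df["level"], prefix="D")

# Consolidation: map similar levels to one combined group first.
groups = {lv: ("ABCD" if lv in "ABCD" else "EFGHI") for lv in "ABCDEFGHI"}
consolidated = pd.get_dummies(df["level"].map(groups), prefix="D")
print(consolidated.head())
```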
Recoding Categorical Inputs

This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.
Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)
Beyond the Prediction Formula
Account for nonlinearities.
120
_ â&#x20AC;&#x201D;
Beyond the Prediction Formula
Account for nonlinearities. 89
Standard Logistic Regression

logit( p̂ ) = w0 + w1·x1 + w2·x2

With only linear terms, the fitted boundary in (x1, x2) is a straight line.

Polynomial Logistic Regression

Adding polynomial terms (for example, x1², x2², and the interaction x1·x2) lets the fitted boundary curve to follow a nonlinear association.
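A sketch of the polynomial variant (assumes `X` with two columns and binary `y`; degree 2 is illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Degree-2 expansion adds x1^2, x2^2, and x1*x2 to the inputs.
poly_logistic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(C=1e6),
)
poly_logistic.fit(X, y)
print(poly_logistic.predict_proba(X[:3]))
```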
Adding Polynomial Regression Terms Selectively

This demonstration illustrates how to add polynomial regression terms selectively.

Adding Polynomial Regression Terms Autonomously (Self-Study)

This demonstration illustrates how to add polynomial regression terms autonomously.

Exercises

This exercise reinforces the concepts discussed previously.
123
Regression Tools Review h p Lite
Replace missing values for interval (means) and categorical data ( m o d e ) . Create a unique replacement indicator. Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios.
â&#x20AC;˘
~f< Transform \ Variables
Regularize distributions of inputs. Typical transformations control for input s k e w n e s s via a log transformation.
continued.
95
Regression Tools Review Consolidate levels of a nonnumeric input using the Replacement Editor window.
ftHynorrial Regression
A d d polynomial terms to a regression either by hand or by an a u t o n o m o u s exhaustive search.
96
124
Chapter 3: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools
3.1 Introduction

Model Essentials - Neural Networks
- Predict new cases: prediction formula
- Select useful inputs: none
- Optimize complexity: stopped training
Neural Network Prediction Formula

ŷ = w00 + w01·H1 + w02·H2 + w03·H3

where w00 is the bias estimate, w01, w02, and w03 are weight estimates, and the hidden units are

H1 = tanh(w10 + w11·x1 + w12·x2)
H2 = tanh(w20 + w21·x1 + w22·x2)
H3 = tanh(w30 + w31·x1 + w32·x2)

Neural Network Binary Prediction Formula

For a binary target, a logit link function is applied:

log( p̂ / (1 - p̂) ) = w00 + w01·H1 + w02·H2 + w03·H3

with the same hidden units H1, H2, and H3.

Neural Network Diagram

The model can be drawn as a network: an input layer (x1, x2), a hidden layer (H1, H2, H3), and a target layer.
Prediction Illustration - Neural Networks

The logit equation is

logit( p̂ ) = w00 + w01·H1 + w02·H2 + w03·H3
H1 = tanh(w10 + w11·x1 + w12·x2)
H2 = tanh(w20 + w21·x1 + w22·x2)
H3 = tanh(w30 + w31·x1 + w32·x2)

You need weight estimates. Weight estimates are found by maximizing the log-likelihood function:

Σ log(p̂i) over primary-outcome training cases + Σ log(1 - p̂i) over secondary-outcome training cases

With the estimated weights:

logit( p̂ ) = 1.5 - 2.6·H1 - 1.9·H2 + 0.63·H3
H1 = tanh(-1.8 + 0.25·x1 - 1.8·x2)
H2 = tanh( 2.7 + 2.7·x1 - 5.3·x2)
H3 = tanh(-5.0 + 8.1·x1 + 4.3·x2)

p̂ = 1 / (1 + e^(-logit(p̂)))

Probability estimates are obtained by solving the logit equation for p̂ for each (x1, x2).
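The scoring arithmetic, written out with the estimated weights above (a minimal numpy sketch; the probe point (0.3, 0.4) is arbitrary):

```python
import numpy as np

def score(x1, x2):
    """Forward pass of the three-hidden-unit network from the slide."""
    h1 = np.tanh(-1.8 + 0.25 * x1 - 1.8 * x2)
    h2 = np.tanh( 2.7 + 2.7  * x1 - 5.3 * x2)
    h3 = np.tanh(-5.0 + 8.1  * x1 + 4.3 * x2)
    logit = 1.5 - 2.6 * h1 - 1.9 * h2 + 0.63 * h3
    return 1 / (1 + np.exp(-logit))  # solve the logit equation for p

print(score(0.3, 0.4))
```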
Neural Nets: Beyond the Prediction Formula
- Manage missing values.
- Handle extreme or unusual values.
- Use non-numeric inputs.
- Account for nonlinearities.
- Interpret the model.
Training a Neural Network

This demonstration illustrates using the Neural Network tool.