SAS Enterprise Miner - Part 1


Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)


Predictive Modeling: The Essence of Data Mining

"Most of the big payoff [in data mining] has been in predictive modeling." - Herb Edelstein

Predictive Modeling Applications

database marketing, financial risk management, fraud detection, process monitoring, pattern detection


Predictive Modeling - Training Data

Training data case: categorical or numeric input and target measurements



Predictive Model

Predictive model: a concise representation of the input and target association



Predictive Model - Predictions

Predictions: output of the predictive model given a set of input measurements



Modeling Essentials

Predict new cases. Select useful inputs. Optimize complexity.


Modeling Essentials - Predict new cases.


Three Prediction Types

decisions, rankings, estimates


Decision Predictions (primary, secondary, tertiary)

A predictive model uses input measurements to make the best decision for each case.



Ranking Predictions (for example, scores 720, 630, 580, 520, 470)

A predictive model uses input measurements to optimally rank each case.



Estimate Predictions (for example, 0.33, 0.54, 0.28)

A predictive model uses input measurements to optimally estimate the target value.



Modeling Essentials - Predict Review

Predict new cases. Decide, rank, and estimate.


1.01 Quiz

Match the predictive modeling application to the decision type.

Predictive Modeling Application: Loss Reserving, Risk Profiling, Credit Scoring, Fraud Detection, Revenue Forecasting, Voice Recognition

Decision Type: A. Decision, B. Ranking, C. Estimate

1.01 Quiz - Correct Answer

Match the predictive modeling application to the decision type.

Loss Reserving: C (Estimate)
Risk Profiling: B (Ranking)
Credit Scoring: B (Ranking)
Fraud Detection: A (Decision)
Revenue Forecasting: C (Estimate)
Voice Recognition: A (Decision)


Modeling Essentials - Select useful inputs.

The Curse of Dimensionality

1-D, 2-D, 3-D


1.02 Quiz

How many inputs are in your modeling data? Mark your selection in the polling area.


Input Reduction

Redundancy and Irrelevancy

Input Reduction - Redundancy

Input x1 has the same information as input x2.


Input Reduction - Irrelevancy

Predictions change with input x3 but much less with input x4.

Modeling Essentials - Select Review

Select useful inputs. Eradicate redundancies and irrelevancies.


Modeling Essentials - Optimize complexity.


Model Complexity

Too complex / Not complex enough

Data Partitioning

Partition available data into training and validation sets.



Predictive Model Sequence

Create a sequence of models with increasing complexity.



Model Performance Assessment

Rate model performance using validation data.



Model Selection

Select the simplest model with the highest validation assessment.



Modeling Essentials - Optimize Review

Optimize complexity. Tune models with validation data.


Creating Training and Validation Data

This demonstration illustrates how to partition a data source into training and validation sets.
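The partitioning step above is performed in Enterprise Miner with the Data Partition node. The same idea can be sketched in plain Python; the 50/50 fraction and the seed below are illustrative choices, not values prescribed by the course.

```python
import random

def partition(cases, train_fraction=0.5, seed=12345):
    """Randomly partition cases into training and validation sets."""
    rng = random.Random(seed)
    shuffled = cases[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

cases = list(range(100))
train, valid = partition(cases, train_fraction=0.5)
print(len(train), len(valid))  # 50 50
```

Every case lands in exactly one of the two sets, so the validation set is untouched during model fitting.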

Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)


Predictive Modeling Tools - Primary

Regression, Decision Tree, Neural Network

Predictive Modeling Tools - Specialty

Gradient Boosting, Dmine Regression, MBR, Partial Least Squares, LARS


Predictive Modeling Tools - Multiple Models

Ensemble, Two Stage

Model Essentials - Decision Trees

Predict new cases: prediction rules
Select useful inputs: split search
Optimize complexity: pruning



Simple Prediction Illustration - Training Data

Predict dot color for each x1 and x2.




Decision Tree Prediction Rules

The root node splits cases (less than 0.63 versus greater than 0.63); interior nodes apply further rules, and each leaf node holds a prediction (for example, a decision and Estimate = 0.70).



Model Essentials - Decision Trees: Select useful inputs (split search).


Decision Tree Split Search

Calculate the logworth of every partition on input x1.



Decision Tree Split Search

left: 53% / 47%; right: 42% / 58%; max logworth(x1) = 0.95

Select the partition with the maximum logworth.
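The logworth of a candidate split is the negative base-10 logarithm of a chi-square p-value computed from the left/right versus target counts. The sketch below illustrates the idea only: it uses the unadjusted 1-degree-of-freedom p-value (via the identity p = erfc(sqrt(stat/2))), whereas Enterprise Miner additionally applies Kass and Bonferroni adjustments; the example tables are made up.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table.
    table[i][j]: count of target level j in branch i (left/right)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def logworth(table):
    """logworth = -log10(p-value); a 2x2 table has 1 degree of freedom,
    so the chi-square tail probability is erfc(sqrt(stat / 2))."""
    stat = chi_square_2x2(table)
    p_value = math.erfc(math.sqrt(stat / 2))
    return -math.log10(p_value)

# A strong split separates the target levels; a weak split barely does.
strong = [[80, 20], [25, 75]]
weak = [[53, 47], [48, 52]]
print(logworth(strong) > logworth(weak))  # True
```

The split search evaluates every partition of every input this way and keeps the one with the maximum logworth.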



Decision Tree Split Search

bottom: 54% / 46%; top: 35% / 65%; max logworth(x2) = 4.92 (split at x2 = 0.63)


Decision Tree Split Search

Compare partition logworth ratings:
left: 53% / 47%, right: 42% / 58%, max logworth(x1) = 0.95
bottom: 54% / 46%, top: 35% / 65%, max logworth(x2) = 4.92


Decision Tree Split Search

Create a partition rule from the best partition across all inputs (x2 < 0.63 versus x2 > 0.63).

Decision Tree Split Search

Repeat the process in each subset.


Decision Tree Split Search

left: 61% / 39%; right: 55% / 45%; max logworth(x1) = 5.72


Decision Tree Split Search

bottom: 38% / 62%; top: 55% / 45%; max logworth(x2) = -2.01


Decision Tree Split Search

Compare partition logworth ratings:
left: 61% / 39%, right: 55% / 45%, max logworth(x1) = 5.72
bottom: 38% / 62%, top: 55% / 45%, max logworth(x2) = -2.01


Decision Tree Split Search

Create a second partition rule (x1 < 0.52 versus x1 > 0.52).



Decision Tree Split Search

Repeat to form a maximal tree.



Constructing a Decision Tree Predictive Model

This demonstration illustrates constructing a decision tree model interactively.

Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)


Model Essentials - Decision Trees: Optimize complexity (pruning).

Predictive Model Sequence

Create a sequence of models with increasing complexity.


The Maximal Tree

Create a sequence of models with increasing complexity. A maximal tree is the most complex model in the sequence.



Pruning One Split

The next model in the sequence is formed by pruning one split from the maximal tree.



Pruning One Split

Each subtree's predictive performance is rated on validation data.


Pruning One Split

The subtree with the highest validation assessment is selected.



Pruning Two Splits

Similarly, this is done for subsequent models.

Pruning Two Splits

Prune two splits from the maximal tree, ...


...rate each subtree using validation assessment, and...


...select the subtree with the best assessment rating.


Subsequent Pruning

Continue pruning until all subtrees are considered.




Selecting the Best Tree

Compare validation assessment between tree complexities. Which subtree should be selected as the best model?


Validation Assessment

Choose the simplest model with the highest validation assessment.
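The selection rule above (best validation assessment, ties broken toward simplicity) can be sketched in a few lines of Python; the subtree sequence below is made-up illustration data, with each model described by its leaf count and a validation assessment where larger is better.

```python
def best_subtree(models):
    """models: list of (n_leaves, validation_assessment) pairs.
    Pick the highest assessment; break ties in favor of the
    simplest model (fewest leaves)."""
    return min(models, key=lambda m: (-m[1], m[0]))

# Hypothetical pruning sequence: assessment rises, plateaus, then falls.
sequence = [(1, 0.60), (2, 0.71), (3, 0.74), (4, 0.74), (5, 0.73)]
print(best_subtree(sequence))  # (3, 0.74)
```

The 4-leaf tree scores the same as the 3-leaf tree, so the simpler 3-leaf tree wins.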

Validation Assessment

What are appropriate validation assessment ratings?

Assessment Statistics

Ratings depend on the target measurement and the prediction type.

Binary Targets

target = 1: primary outcome
target = 0: secondary outcome




Decision Optimization

decisions: predict 1 (primary) or 0 (secondary) for each case

Decision Optimization - Accuracy

Maximize accuracy: agreement between outcome and prediction (true positives and true negatives).



Decision Optimization - Misclassification

Minimize misclassification: disagreement between outcome and prediction (false negatives and false positives).
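Accuracy and misclassification are complementary rates over the validation cases. A minimal sketch in Python, using made-up 0/1 targets and decisions:

```python
def accuracy(targets, decisions):
    """Fraction of cases where the decision agrees with the outcome
    (true positives + true negatives)."""
    hits = sum(t == d for t, d in zip(targets, decisions))
    return hits / len(targets)

def misclassification(targets, decisions):
    """Fraction of disagreements (false positives + false negatives)."""
    return 1.0 - accuracy(targets, decisions)

targets   = [1, 1, 0, 0, 1, 0, 0, 1]
decisions = [1, 0, 0, 0, 1, 1, 0, 1]
print(accuracy(targets, decisions))          # 0.75
print(misclassification(targets, decisions)) # 0.25
```

Here one false negative (case 2) and one false positive (case 6) out of eight cases give a misclassification rate of 0.25.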

Ranking Optimization

rankings: each case receives a score (for example, 720 high, 520 low)

Ranking Optimization - Concordance

target = 0 -> low score; target = 1 -> high score

Maximize concordance: proper ordering of primary and secondary outcomes.

Ranking Optimization - Discordance

target = 0 -> high score; target = 1 -> low score

Minimize discordance: improper ordering of primary and secondary outcomes.
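Concordance and discordance compare every (primary, secondary) pair of cases: a pair is concordant when the target=1 case has the higher score and discordant when the target=0 case does. A small sketch with made-up scores:

```python
def concordance(targets, scores):
    """Return (concordant fraction, discordant fraction) over all
    pairs of one target=1 case and one target=0 case; tied scores
    count toward neither fraction."""
    ones  = [s for t, s in zip(targets, scores) if t == 1]
    zeros = [s for t, s in zip(targets, scores) if t == 0]
    pairs = len(ones) * len(zeros)
    conc = sum(s1 > s0 for s1 in ones for s0 in zeros)
    disc = sum(s1 < s0 for s1 in ones for s0 in zeros)
    return conc / pairs, disc / pairs

targets = [0, 0, 1, 1]
scores  = [470, 580, 520, 720]
conc, disc = concordance(targets, scores)
print(conc, disc)  # 0.75 0.25
```

The one discordant pair is (520, 580): a primary-outcome case scored below a secondary-outcome case.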

Estimate Optimization - Squared Error

(target - estimate)^2

Minimize squared error: the squared difference between target and prediction.
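For estimates, the validation rating is the average of (target - estimate)^2. A minimal sketch with made-up binary targets and probability estimates:

```python
def average_squared_error(targets, estimates):
    """Mean of (target - estimate)^2 over the validation cases."""
    n = len(targets)
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / n

targets   = [1, 0, 1, 0]
estimates = [0.9, 0.2, 0.6, 0.1]
print(round(average_squared_error(targets, estimates), 3))  # 0.055
```

Smaller is better; a perfect set of estimates would score 0.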

Complexity Optimization - Summary

decisions (1 / 0): accuracy / misclassification
rankings: concordance / discordance
estimates: squared error


Assessing a Decision Tree

This demonstration illustrates how to assess a tree model.

Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)


Understanding Additional Plots and Tables (Optional)

This demonstration presents several additional plots and tables from the SAS Enterprise Miner Tree Desktop Application.

Chapter 1: Introduction to Predictive Modeling: Decision Trees
1.1 Introduction
1.2 Cultivating Decision Trees
1.3 Optimizing the Complexity of Decision Trees
1.4 Understanding Additional Diagnostic Tools (Self-Study)
1.5 Autonomous Tree Growth Options (Self-Study)


Autonomous Decision Tree Defaults

Maximum Branches: 2
Splitting Rule Criterion: Logworth
Subtree Method: Average Profit
Tree Size Options: Bonferroni Adjust, Split Adjust, Maximum Depth, Leaf Size

Tree Variations: Maximum Branches

Trades height for width; uses heuristic shortcuts.


Tree Variations: Maximum Branches (Decision Tree node properties)

Maximum Branch: maximum branches in a split
Exhaustive: exhaustive search size limit

Tree Variations: Splitting Rule Criterion

Yields similar splits; can grow enormous trees; can favor many-level inputs.


Tree Variations: Splitting Rule Criterion (Decision Tree node properties)

Interval Criteria: ProbF, Variance
Categorical Criteria: ProbChisq, Entropy
Logworth adjustments: Bonferroni Adjustment, Time of Kass Adjustment (Before), Split Adjustment


Tree Variations: Subtree Method

Pruning options and pruning metrics: Decision, Average Square Error, Misclassification, Lift

Tree Variations: Tree Size Options

Avoids orphan nodes.


Tree Variations: Tree Size Options (Decision Tree node properties)

Significance Level: logworth threshold (0.2)
Maximum Depth: maximum tree depth
Leaf Size: minimum leaf size (5)
Split Adjustment: threshold depth adjustment

Exercises

These exercises reinforce the concepts discussed previously.


Decision Tree Tools Review

Partition raw data into training and validation sets.

Interactively grow trees using the Tree Desktop application. You can control rules, selected inputs, and tree complexity.

Autonomously grow decision trees based on property settings. Settings include branch count, split rule criterion, subtree method, and tree size options.


Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)


Model Essentials - Regressions

Predict new cases: prediction formula
Select useful inputs: sequential selection
Optimize complexity: best model from sequence


Model Essentials - Regressions: Predict new cases (prediction formula).

Linear Regression Prediction Formula

y-hat = w0 + w1*x1 + w2*x2

(w0: intercept estimate; w1, w2: parameter estimates; x1, x2: input measurements)

Choose the intercept and parameter estimates to minimize the squared error function, summed over the training data:

sum over training data of (y_i - y-hat_i)^2

Logistic Regression Prediction Formula

log(p / (1 - p)) = w0 + w1*x1 + w2*x2   (logit scores)

Logit Link Function

The logit link function transforms probabilities (between 0 and 1) to logit scores (between minus infinity and plus infinity):

logit(p) = log(p / (1 - p)) = w0 + w1*x1 + w2*x2

To obtain prediction estimates, the logit equation is solved for p:

p = 1 / (1 + e^(-logit(p)))
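The logit link and its inverse can be written directly from the formulas above; a minimal Python sketch:

```python
import math

def logit(p):
    """Probability in (0, 1) -> logit score in (-inf, +inf)."""
    return math.log(p / (1 - p))

def inverse_logit(score):
    """Logit score -> probability, by solving the logit equation for p."""
    return 1 / (1 + math.exp(-score))

print(logit(0.5))                              # 0.0
print(round(inverse_logit(2.0), 3))            # 0.881
print(round(inverse_logit(logit(0.25)), 6))    # 0.25
```

A probability of 0.5 maps to a logit score of 0, and the two functions invert each other.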

Simple Prediction Illustration - Regressions

Predict dot color for each x1 and x2:

logit(p) = w0 + w1*x1 + w2*x2
p = 1 / (1 + e^(-logit(p)))

You need intercept and parameter estimates.


Simple Prediction Illustration - Regressions

Find parameter estimates by maximizing the log-likelihood function:

sum of log(p_i) over primary-outcome training cases
+ sum of log(1 - p_i) over secondary-outcome training cases

Simple Prediction Illustration - Regressions

logit(p) = -0.81 + 0.92*x1 + 1.11*x2
p = 1 / (1 + e^(-logit(p)))

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.
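Scoring with the fitted formula is just plugging values into the two equations above. A sketch using the illustration's maximum likelihood estimates (the query point (0.5, 0.5) is an arbitrary choice for illustration):

```python
import math

# Maximum likelihood estimates from the illustration.
w0, w1, w2 = -0.81, 0.92, 1.11

def score(x1, x2):
    """Assign a logit score, then convert it to a probability estimate."""
    logit_p = w0 + w1 * x1 + w2 * x2
    return 1 / (1 + math.exp(-logit_p))

print(round(score(0.5, 0.5), 3))  # 0.551
```

The contours of equal probability are straight lines in the (x1, x2) plane, which is what makes the regression decision boundary linear.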


2.01 Multiple Choice Poll

What is the logistic regression prediction for the indicated point?


2.01 Multiple Choice Poll - Correct Answer

What is the logistic regression prediction for the indicated point?

Regressions: Beyond the Prediction Formula

Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.


Regressions: Beyond the Prediction Formula - Manage missing values.

Missing Values and Regression Modeling

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.


Missing Values and Regression Modeling

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!


Missing Values and the Prediction Formula

logit(p) = -0.81 + 0.92*x1 + 1.11*x2
Predict: (x1, x2) = (0.3, ?)
logit(p) = -0.81 + 0.92*0.3 + 1.11*? = ?

Problem 2: Prediction formulas cannot score cases with missing values.


Missing Value Issues - Manage missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Problem 2: Prediction formulas cannot score cases with missing values.


Missing Value Causes - Manage missing values.

Non-applicable measurement; no match on merge; non-disclosed measurement.

Missing Value Remedies - Manage missing values.

Synthetic distribution; estimation: x_j = f(x_1, ..., x_p)


Managing Missing Values

This demonstration illustrates how to impute synthetic data values and create missing value indicators.

Running the Regression Node

This demonstration illustrates using the Regression tool.
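Enterprise Miner performs imputation with the Impute node; the idea of replacing missing values and recording missing-value indicator inputs can be sketched in plain Python (mean imputation here is the simplest "synthetic distribution" remedy; the income values are made up):

```python
def impute_with_indicator(values):
    """Replace missing values (None) with the mean of the observed
    values, and record a 0/1 missing-value indicator for each case."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

income = [50, None, 70, 60, None]
filled, flag = impute_with_indicator(income)
print(filled)  # [50, 60.0, 70, 60, 60.0]
print(flag)    # [0, 1, 0, 0, 1]
```

The indicator column lets the regression learn whether missingness itself is predictive, instead of silently treating imputed cases like fully observed ones.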


Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)

Model Essentials - Regressions: Select useful inputs (sequential selection).


Sequential Selection - Forward

Inputs enter the model one at a time. At each step, the candidate input with the smallest p-value is added, provided that p-value is below the entry cutoff. Selection stops when no remaining input clears the entry cutoff.

Sequential Selection - Backward

Selection starts with all inputs in the model. At each step, the input with the largest p-value is removed, provided that p-value is above the stay cutoff. Selection stops when every remaining input's p-value is below the stay cutoff.

Sequential Selection - Stepwise

Stepwise selection combines the forward and backward approaches: inputs are added as in forward selection (entry cutoff), but after each addition, any input in the model whose p-value has risen above the stay cutoff is removed.
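The forward step of these procedures can be sketched as a loop over candidate p-values. This is an illustration of the control flow only: the p-values below are fixed, made-up numbers, whereas in practice each candidate's p-value is recomputed from the current model at every step.

```python
def forward_select(p_values, entry_cutoff=0.05):
    """Forward selection sketch: repeatedly add the candidate input
    with the smallest p-value, as long as it clears the entry cutoff."""
    selected, candidates = [], dict(p_values)
    while candidates:
        best = min(candidates, key=candidates.get)
        if candidates[best] >= entry_cutoff:
            break  # nothing left clears the entry cutoff
        selected.append(best)
        del candidates[best]
    return selected

# Hypothetical p-values for four candidate inputs.
p_values = {"x1": 0.001, "x2": 0.030, "x3": 0.200, "x4": 0.450}
print(forward_select(p_values))  # ['x1', 'x2']
```

Stepwise selection would add a backward pass after each addition, dropping any selected input whose recomputed p-value exceeds the stay cutoff.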


Selecting Inputs

This demonstration illustrates using stepwise selection to choose inputs for the model.

Chapter 2: Introduction to Predictive Modeling: Regressions
2.1 Introduction
2.2 Selecting Regression Inputs
2.3 Optimizing Regression Complexity
2.4 Interpreting Regression Models
2.5 Transforming Inputs
2.6 Categorical Inputs
2.7 Polynomial Regressions (Self-Study)


Model Essentials - Regressions: Optimize complexity (best model from sequence).


Select Model with Optimal Validation Fit

Plot the model fit statistic for each model in the sequence and choose the simplest optimal model.

Optimizing Complexity

This demonstration illustrates tuning a regression model to give optimal performance on the validation data.


Chapter 2: Introduction to Predictive Modeling: Regressions 2.1 Introduction 2.2 Selecting Regression Inputs 2.3 Optimizing Regression Complexity

2.4 Interpreting Regression Models 2.5 Transforming Inputs 2.6 Categorical Inputs 2.7 Polynomial Regressions (Self-Study)

65


Beyond the Prediction Formula

Interpret the model.

66

109


Beyond the Prediction Formula

Interpret the model.

67


Logistic Regression Prediction Formula

68

110


Odds Ratios and Doubling Amounts

logit(p) = w0 + w1 x1 + w2 x2

Odds ratio: amount odds change with unit change in input. Consequence: change xi by 1 => odds x exp(wi).

Doubling amount: how much does an input have to change to double the odds? Consequence: change xi by 0.69/wi => odds x 2.

69


Interpreting a Regression Model This demonstration illustrates interpreting a regression model using odds ratios.

70
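The two interpretation quantities above follow directly from the fitted coefficients: a unit change in an input multiplies the odds by exp(w), and the doubling amount is log(2)/w (log 2 is the 0.69 on the slide). A minimal sketch, with made-up coefficient values:

```python
import math

# Hypothetical fitted logistic regression coefficients (not from the course data).
weights = {"x1": 0.69, "x2": -0.35}

for name, w in weights.items():
    odds_ratio = math.exp(w)        # unit change in input multiplies odds by exp(w)
    doubling = math.log(2) / w      # input change needed to double the odds
    print(f"{name}: odds ratio {odds_ratio:.2f}, doubling amount {doubling:.2f}")
```

Note that a negative coefficient gives an odds ratio below 1 and a negative doubling amount: the input must decrease for the odds to double.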

111


Chapter 2: Introduction to Predictive Modeling: Regressions 2.1 Introduction 2.2 Selecting Regression Inputs 2.3 Optimizing Regression Complexity 2.4 Interpreting Regression Models

2.5 Transforming Inputs 2.6 Categorical Inputs 2.7 Polynomial Regressions (Self-Study)

71


Beyond the Prediction Formula

Handle extreme or unusual values.

72

112


Extreme Distributions and Regressions Original Input Scale

true association

skewed input distribution

high leverage points

73

Extreme Distributions and Regressions Regularized Scale

Original Input Scale

true association

standard regression

skewed input distribution

high leverage points

74

113


Regularizing Input Transformations Regularized Scale

Original Input Scale

standard regression

standard regression

skewed input distribution

high leverage points

more symmetric distribution

75


Regularizing Input Transformations Original Input Scale

Regularized Scale

standard regression

regularized estimate

regularized estimate

76

114



Regularizing Input Transformations Regularized Scale

Original Input Scale

true association regularized estimate

regularized estimate true association

77

Transforming Inputs This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.

78
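The effect of the log transformation on a skewed input can be seen in a few lines. This is a synthetic illustration (lognormal data, not the course data); the skewness helper is a plain sample third moment:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # heavily right-skewed input

x_log = np.log(x + 1)  # log transform; the +1 offset guards against log(0)

def skewness(v):
    """Sample skewness: mean cubed z-score."""
    return float(np.mean(((v - v.mean()) / v.std()) ** 3))

print(f"skewness before: {skewness(x):.2f}, after: {skewness(x_log):.2f}")
```

The transformed input has a far more symmetric distribution, which tames the high-leverage points shown in the slides.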

115


Chapter 2: Introduction to Predictive Modeling: Regressions 2.1 Introduction 2.2 Selecting Regression Inputs 2.3 Optimizing Regression Complexity 2.4 Interpreting Regression Models 2.5 Transforming Inputs

2.6 Categorical Inputs 2.7 Polynomial Regressions (Self-Study)

79

Beyond the Prediction Formula

Use nonnumeric inputs.

80

116


Beyond the Prediction Formula

Use nonnumeric inputs.

81

Nonnumeric Input Coding

Level  A B C D E F G H I
A      1 0 0 0 0 0 0 0 0
B      0 1 0 0 0 0 0 0 0
C      0 0 1 0 0 0 0 0 0
D      0 0 0 1 0 0 0 0 0
E      0 0 0 0 1 0 0 0 0
F      0 0 0 0 0 1 0 0 0
G      0 0 0 0 0 0 1 0 0
H      0 0 0 0 0 0 0 1 0
I      0 0 0 0 0 0 0 0 1

82
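The indicator coding in the table above can be reproduced with pandas (the A-I labels are the slide's levels; everything else is illustrative):

```python
import pandas as pd

# Nine nonnumeric input levels, as in the slide's table.
levels = pd.Series(list("ABCDEFGHI"), name="level")

# One indicator column per level; each row has a single 1.
coded = pd.get_dummies(levels)
print(coded.astype(int))
```

Because each row contains exactly one 1, any one column is determined by the others — the redundancy the next slide points out.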

117



Coding Redundancy

Level  A B C D E F G H I
A      1 0 0 0 0 0 0 0 0
B      0 1 0 0 0 0 0 0 0
C      0 0 1 0 0 0 0 0 0
D      0 0 0 1 0 0 0 0 0
E      0 0 0 0 1 0 0 0 0
F      0 0 0 0 0 1 0 0 0
G      0 0 0 0 0 0 1 0 0
H      0 0 0 0 0 0 0 1 0
I      0 0 0 0 0 0 0 0 1

The indicator columns are redundant: each row sums to 1, so any one column is determined by the others and can be dropped.

83


Coding Consolidation

[Table: the Level A-I indicator coding from the previous slide, with groups of levels selected for consolidation into shared indicator columns.]

84

118



Coding Consolidation

[Table: the consolidated coding — several original levels (e.g. A through D) now share a single indicator column, reducing the number of coded inputs.]

85


Recoding Categorical Inputs This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.

86
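Level consolidation before coding can be sketched with a mapping plus the same dummy coding as before. The grouping below (A-D together, F with G, H with I) is hypothetical — in SAS Enterprise Miner the analyst chooses the groups interactively in the Replacement Editor:

```python
import pandas as pd

# Hypothetical consolidation map: nine levels collapse to four groups.
consolidate = {"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
               "E": "E", "F": "FG", "G": "FG", "H": "HI", "I": "HI"}

levels = pd.Series(list("ABCDEFGHI"), name="level")
coded = pd.get_dummies(levels.map(consolidate))

print(coded.astype(int))  # nine original levels now need only four indicator columns
```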

119


Chapter 2: Introduction to Predictive Modeling: Regressions 2.1 Introduction 2.2 Selecting Regression Inputs 2.3 Optimizing Regression Complexity 2.4 Interpreting Regression Models 2.5 Transforming Inputs 2.6 Categorical Inputs 2.7 Polynomial Regressions (Self-Study)

87


Beyond the Prediction Formula

Account for nonlinearities.

120



Beyond the Prediction Formula

Account for nonlinearities. 89

Standard Logistic Regression

logit(p) = w0 + w1 x1 + w2 x2

90 121


Polynomial Logistic Regression

91

Adding Polynomial Regression Terms Selectively This demonstration illustrates how to add polynomial regression terms selectively.

92
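Adding polynomial terms by hand amounts to augmenting the input matrix with squares and interactions before fitting. A minimal sketch on synthetic data (the two inputs and the choice of terms are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)

X_poly = np.column_stack([x1, x2,        # linear terms
                          x1**2, x2**2,  # quadratic terms
                          x1 * x2])      # interaction term

print(X_poly.shape)  # (100, 5)
```

The augmented matrix is then handed to the same logistic regression fit; the selective route is to add only the terms the analyst expects to matter, rather than an exhaustive search.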

122


Adding Polynomial Regression Terms Autonomously (Self-Study) This demonstration illustrates how to add polynomial regression terms autonomously.

93

This exercise reinforces the concepts discussed previously.

94

123


Regression Tools Review

Replace missing values for interval (means) and categorical data (mode). Create a unique replacement indicator. Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios.


Transform Variables

Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.

continued.

95

Regression Tools Review Consolidate levels of a nonnumeric input using the Replacement Editor window.

Polynomial Regression

Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.

96

124


Chapter 3: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools 3.1 Introduction

Model Essentials - Neural Networks Prediction formula

Predict new cases. Select useful inputs.

None

Optimize complexity.

Stopped training

2 125


Model Essentials - Neural Networks Prediction formula

Predict new cases.


Select useful inputs.

None

Optimize complexity.

Stopped training

3

Model Essentials - Neural Networks Prediction formula

Predict new cases.

4

126


Neural Network Prediction Formula

5


Neural Network Prediction Formula

y = w00 + w01 H1 + w02 H2 + w03 H3
(w00 is the bias estimate; w01, w02, w03 are weight estimates)

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)
(hidden unit estimates)

6

127
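The prediction formula — three tanh hidden units feeding a linear output — can be sketched as a forward pass. All weight values below are placeholders, not fitted estimates from the course:

```python
import numpy as np

# Hidden-layer weights: one row per unit, columns are [bias w_i0, w_i1, w_i2].
W_hidden = np.array([[-1.0,  0.5, -0.7],
                     [ 0.3,  1.2, -0.4],
                     [ 0.8, -0.9,  0.6]])
# Output weights: [bias w00, w01, w02, w03].
w_out = np.array([0.2, -0.5, 0.7, 0.3])

def predict(x1, x2):
    # Hidden unit estimates: H_i = tanh(w_i0 + w_i1*x1 + w_i2*x2).
    H = np.tanh(W_hidden @ np.array([1.0, x1, x2]))
    # Prediction estimate: y = w00 + w01*H1 + w02*H2 + w03*H3.
    return w_out[0] + w_out[1:] @ H

print(predict(0.5, -0.2))
```

Because tanh is bounded in (-1, 1), the prediction is confined to the interval the output weights allow, regardless of how extreme the inputs are.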



Neural Network Binary Prediction Formula

log(p / (1 - p)) = w00 + w01 H1 + w02 H2 + w03 H3
(logit link function)

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)


Neural Network Diagram

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)

8

128



Neural Network Diagram

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)

input layer -> hidden layer -> target layer


Prediction Illustration - Neural Networks

logit equation:
logit(p) = w00 + w01 H1 + w02 H2 + w03 H3

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)

10

129



Prediction Illustration - Neural Networks

logit equation:
logit(p) = w00 + w01 H1 + w02 H2 + w03 H3

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)

Need weight estimates.

11

Prediction Illustration - Neural Networks

logit equation:
logit(p) = w00 + w01 H1 + w02 H2 + w03 H3

H1 = tanh(w10 + w11 x1 + w12 x2)
H2 = tanh(w20 + w21 x1 + w22 x2)
H3 = tanh(w30 + w31 x1 + w32 x2)

Weight estimates found by maximizing:

Σ log(p_i) (primary outcome training cases) + Σ log(1 - p_i) (secondary outcome training cases)

12

130
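The objective above is the Bernoulli log likelihood: log(p) is summed over primary-outcome cases and log(1 - p) over secondary-outcome cases. A minimal sketch with made-up targets and predicted probabilities:

```python
import numpy as np

# Made-up data: 1 = primary outcome, 0 = secondary outcome.
y = np.array([1, 1, 0, 0, 1])
p_hat = np.array([0.9, 0.7, 0.2, 0.4, 0.6])

# Sum log(p) where y = 1, log(1 - p) where y = 0.
log_lik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(round(float(log_lik), 3))  # -> -1.707
```

Training searches over the network weights for the values that make this sum as large (least negative) as possible.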


Prediction Illustration - Neural Networks

logit equation:
logit(p) = 15 + -2.6 H1 + -1.9 H2 + 0.63 H3

H1 = tanh(-1.8 + 0.25 x1 + -1.8 x2)
H2 = tanh( 2.7 + 2.7 x1 + -5.3 x2)
H3 = tanh(-5.0 + 8.1 x1 + 4.3 x2)

p = 1 / (1 + e^(-logit(p)))

Probability estimates are obtained by solving the logit equation for p for each (x1, x2).

13
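Solving the logit equation for p is a direct computation. The weights below are transcribed from the slide; the scan is noisy (decimal points in particular may be garbled), so treat the numbers as approximate:

```python
import numpy as np

def p_hat(x1, x2):
    # Hidden unit estimates (weights transcribed from the slide; approximate).
    H1 = np.tanh(-1.8 + 0.25 * x1 + -1.8 * x2)
    H2 = np.tanh( 2.7 + 2.7  * x1 + -5.3 * x2)
    H3 = np.tanh(-5.0 + 8.1  * x1 + 4.3  * x2)
    logit = 15 + -2.6 * H1 + -1.9 * H2 + 0.63 * H3
    # Invert the logit link to recover the probability estimate.
    return 1.0 / (1.0 + np.exp(-logit))

print(p_hat(0.3, 0.5))  # a probability between 0 and 1
```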

Neural Nets: Beyond the Prediction Formula Manage missing values. Handle extreme or unusual values. Use non-numeric inputs. Account for nonlinearities. Interpret the model.

131


Neural Nets: Beyond the Prediction Formula

Manage missing values. Handle extreme or unusual values. Use non-numeric inputs. Account for nonlinearities.

15

Training a Neural Network This demonstration illustrates using the Neural Network tool.

16

132

