Validating Models of Cognition
Terrence C. Stewart
Centre for Theoretical Neuroscience, University of Waterloo
Cognitive Modelling
● Computer simulation of cognition
● How do we know when we're right?
  – Some sort of match between the output of the model and the empirical data
  – What sort of match?
Matching to Empirical Data
● Lots of different data
  – Many different tasks
  – Many different conditions in each task
● Example: ACT-R
  – Mental arithmetic, driving a car, learning word pairs, parsing English sentences, playing rock-paper-scissors, dialing a phone, air traffic control
  – Same components, same parameter values (different background knowledge, different sensory data)
Matching to Empirical Data
● Lots of different types of data
  – Accuracy, reaction time
  – fMRI, spike recording, neural connectivity, etc.
Wait a second...
● What do we mean by a match?
● How do we say how good it is?
● How do we handle these large numbers of different kinds of measures?
What not to do
● Correlation (see the sketch below)
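The figures that accompanied this slide are lost in this text version, but the standard illustration of the problem is easy to reproduce. In this sketch (hypothetical numbers, my own illustration, not from the talk), a model correlates perfectly with the data while being systematically wrong in both scale and offset, so a high correlation by itself says little about the quality of the match.

import numpy as np

data = np.array([0.2, 0.4, 0.5, 0.7, 0.9])  # hypothetical empirical means
model = 2.0 * data + 5.0                    # predictions that are systematically off

# Correlation only measures the linear relationship, not agreement
print(np.corrcoef(data, model)[0, 1])   # 1.0 -- "perfect" by correlation
print(np.abs(model - data).max())       # yet every prediction is far from the data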
What not to do
● Mean squared error
  – Need to account for confidence intervals!
  – What can we safely conclude about this model?
  – Given what we know, the model is unlikely to be wrong by more than this amount (sketched below)
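What that safe conclusion looks like in practice: a minimal sketch with hypothetical numbers (none of these values come from the talk). Plain squared error compares the model only to the empirical mean, whereas the defensible statement is a bound that also includes the width of the empirical confidence interval.

import numpy as np

data = np.array([0.52, 0.61, 0.55, 0.58, 0.49, 0.63])  # hypothetical observations
model_prediction = 0.60                                # hypothetical model output

mean = data.mean()
# 95% CI half-width (normal approximation; a t-based interval would be
# more appropriate for a sample this small)
ci_halfwidth = 1.96 * data.std(ddof=1) / np.sqrt(len(data))

squared_error = (model_prediction - mean) ** 2
# The most the model is likely to be wrong by, given what we know:
largest_likely_difference = abs(model_prediction - mean) + ci_halfwidth

print(squared_error, largest_likely_difference)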
Multiple measures
● How do we handle the many different conditions?
  – Data from my entry in the 2009 Technion Choice Prediction tournament
    ● Behavioural economics model
Multiple measures
● Worst-case scenario
  – Do not take the mean! Otherwise you could make the model look better simply by adding conditions that it's good at!
  – Take the worst over all conditions (see the sketch below)
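A small sketch of why the aggregation choice matters (the per-condition discrepancies here are hypothetical): averaging rewards padding the benchmark with easy conditions, while the maximum cannot be gamed that way.

import numpy as np

# Hypothetical scaled discrepancies, one per condition
# (<1 would mean the model is inside that condition's empirical CI)
discrepancies = np.array([0.4, 0.9, 2.3, 0.6])

print(discrepancies.mean())  # 1.05 -- looks mediocre on average
print(discrepancies.max())   # 2.3  -- the honest summary: the worst condition

# Padding with extra conditions the model already handles well drives
# the mean down, but leaves the maximum untouched
padded = np.concatenate([discrepancies, np.full(6, 0.2)])
print(padded.mean())  # 0.54 -- "improved" without changing the model
print(padded.max())   # 2.3  -- unchanged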
Multiple measures
● Instead, remove measures it is bad at
● Forces you to be explicit about what your model can and cannot account for
  – Extreme conditions
    ● Something missing from the model (strategy shift)
  – Anomalous data
    ● Might not be something missing from the model
    ● Might be incorrect empirical data!
    ● With 60 measures, ~3 will actually be outside the 95% confidence intervals! (see the check below)
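The ~3-in-60 figure is just the expected false-alarm count for 95% intervals. A quick check (using scipy here, an assumed dependency) shows that several out-of-CI measures are unsurprising even when the model is exactly right:

from scipy.stats import binom

n_measures, miss_rate = 60, 0.05  # a 95% CI misses the true value 5% of the time
print(n_measures * miss_rate)     # expected number of misses: 3.0

# Probability of at least k measures falling outside their CIs
# purely by chance, for a perfectly correct model
for k in (3, 5, 7):
    print(k, binom.sf(k - 1, n_measures, miss_rate))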
Implications for parameter fitting
● RMSE approach
  – Would not have won
● Equivalence approach (scaled so <1 is inside empirical CI range; sketched below)
  – Won the competition
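A toy sketch of the two fitting objectives (the model, data, and parameter here are all invented for illustration; the actual tournament entry was a behavioural economics model). The RMSE fit minimizes average squared error against the empirical means; the equivalence fit minimizes the worst CI-scaled discrepancy, so a fitted value below 1 means the model sits inside every empirical confidence interval.

import numpy as np

# Hypothetical empirical means and 95% CI half-widths, one per condition
emp_mean = np.array([0.30, 0.55, 0.70, 0.90])
emp_ci = np.array([0.05, 0.10, 0.04, 0.08])

def model(p):
    # Stand-in one-parameter model; in practice this is the simulation itself
    return p * np.array([0.35, 0.60, 0.80, 1.00])

def rmse(p):
    return np.sqrt(np.mean((model(p) - emp_mean) ** 2))

def equivalence(p):
    # Worst-case discrepancy, scaled so <1 means inside the empirical CI
    return np.max(np.abs(model(p) - emp_mean) / emp_ci)

ps = np.linspace(0.5, 1.5, 1001)
best_rmse = ps[np.argmin([rmse(p) for p in ps])]
best_equiv = ps[np.argmin([equivalence(p) for p in ps])]
print(best_rmse, equivalence(best_rmse))
print(best_equiv, equivalence(best_equiv))

The two criteria can prefer different parameter values: RMSE will trade a large miss in a tightly-measured condition for small gains elsewhere, while the equivalence criterion will not.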
Importance of Confidence Intervals
● Dynamic Stocks and Flows task
Measures other than the mean
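This slide's figures are not preserved in the text version. As a sketch of how the same logic extends beyond the mean (my own illustration, not from the talk): statistics like the median or the standard deviation also need confidence intervals before model and data can be compared, and a percentile bootstrap is one straightforward way to get them.

import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # hypothetical skewed data (e.g. RTs)

def bootstrap_ci(samples, statistic, n_boot=10000, alpha=0.05):
    # Percentile bootstrap CI for an arbitrary statistic
    stats = np.array([statistic(rng.choice(samples, size=len(samples)))
                      for _ in range(n_boot)])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

print(bootstrap_ci(data, np.median))  # 95% CI for the median
print(bootstrap_ci(data, np.std))     # 95% CI for the standard deviation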
Conclusion
● Use many conditions, tasks, and types of measure
● Largest likely difference between model and empirical data
  – Scaled by size of empirical CI
    ● So <1 means no statistical difference
  – Maximum across all measures
● Explicitly remove measures model is bad at
  – Guides future research
● Consider median, s.d., and others as well

Stewart, T. C., & West, R. L. (2011). Testing for Equivalence: A Methodology for Computational Cognitive Modelling. Journal of Artificial General Intelligence, 2(2).