How deep is deep learning? SOPHIA Summit on AI Université Côte d’Azur November 8, 2018 José Luis Bermúdez Texas A&M University
Overview 1 Deep learning – the questions from cognitive science 2 Critiques: Old and new 3 Functional equivalence: Causal models and intuitive theories 4 Mechanisms of deep learning: The significance of reinforcement learning
Undisputed achievements of deep learning • Extraordinary success on tasks hitherto thought to be beyond the scope of AI for the foreseeable future • Image recognition (e.g. the ImageNet competition) • Speech recognition • The Higgs boson machine learning challenge • Natural language processing • Translation • Predicting the next word in a sequence • Drug discovery (predicting how drugs will interact with target molecules in the body)
Overview 1 Deep learning – the questions from cognitive science 2 Critiques: Old and new 3 Functional equivalence: Causal models and intuitive theories 4 Mechanisms of deep learning: The significance of reinforcement learning
Deep learning and human cognition Basic question: Is deep learning genuinely intelligent? Sub-questions: (1) Functional equivalence question: Can deep learning algorithms replicate sophisticated human cognitive abilities? (2) Mechanism question: How closely do deep learning algorithms capture how the brain actually learns?
Strong vs. weak AI Project of weak AI = to build artificial agents that can pass the Total Turing Test (TTT) – weak AI aims at functional equivalence in output Project of strong AI = to build artificial agents that are genuinely intelligent – the only way to operationalize this so that it comes out distinct from functional equivalence is in terms of parity of mechanism
Definition of strong AI Strong AI = Functional equivalence + parity of mechanism *** Focus in this talk will be on the potential role for deep learning in the enterprise of strong AI
Overview 1 Deep learning – the questions from cognitive science 2 Critiques: Old and new 3 Functional equivalence: Causal models and intuitive theories 4 Mechanisms of deep learning: The significance of reinforcement learning
Looking backwards • Modern DL algorithms are reincarnations of the artificial neural networks that came (briefly) to prominence in the 1980s • Many DL algorithms share basic features of ANNs • Feedforward • Hidden layers • Learn through backpropagation
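These shared features can be made concrete in a few lines. What follows is a minimal sketch, not any model discussed in the talk: a feedforward network with one hidden layer learning XOR through backpropagation. The task, layer sizes, and learning rate are all illustrative assumptions.

```python
# A minimal feedforward network with one hidden layer, trained by
# backpropagation on XOR. Purely illustrative: the task, sizes, and
# hyperparameters are assumptions, not a model from the talk.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass through the hidden layer to the output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error signal back through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates (learning rate 1.0).
    W2 -= h.T @ d_out
    b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(axis=0)

print(out.round(2))   # typically converges towards [[0], [1], [1], [0]]
```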
Looking forwards • Many breakthroughs in analyzing information that can be presented in a grid format come from convolutional neural networks (ConvNets) • ConvNets have distinctive features, including sparse connectivity and shared weights • Some of the most powerful DL methods are capable of reinforcement learning (neither supervised nor unsupervised)
Historical objections to ANNs Functional equivalence
• Can do pattern recognition, but not symbolic, rule-based processing • Not a genuine alternative to symbolic AI, but at best an implementation • (Argument associated with Fodor & Pylyshyn)
Parity of mechanism
• Little similarity to genuine neural networks • too homogeneous • real neurons are not connected the way ANN units are • No evidence of backpropagation in the brain • Required training sets are implausibly large
The new landscape: Functional equivalence • The critical focus is more on model-based reasoning than on symbolic cognition • Reflects a broader shift in cognitive science towards Bayesian modeling and domain-specific core knowledge • Issues about compositionality remain in the background, however
The new landscape: Parity of mechanism • Some concerns about biological plausibility have been headed off by large-scale analogies in type of processing • Important parallels between the multi-layered architecture of representation learning in DL and the general structure of information processing in the visual cortex • Reinforcement learning takes DL much closer to human learning
Overview 1 Deep learning – the questions from cognitive science 2 Critiques: Old and new 3 Functional equivalence: Causal models and intuitive theories 4 Mechanisms of deep learning: The significance of reinforcement learning
New version of old objection (Lake et al. 2017) • Deep learning is not capable of modeling the distinctive capacities of human intelligence • Key parts of human intelligence depend upon model-based learning • Deep learning algorithms, which are basically algorithms for pattern recognition, are not capable of model-based learning
Model-based learning: The background
Much (most? all?) human cognition is domain-specific. Core domains include: number, biology, physics, the spatial layout of the environment, and social interaction.
From an early age, humans develop intuitive theories for each of these core domains. Intuitive theories incorporate a basic ontology and principles governing how objects in the domain interact. These intuitive theories are refined through proto-scientific forms of hypothesis-testing.
The bigger picture: Causal modeling • Most intuitive theories are causal • These theories are supposed to contain causal principles that help subjects navigate the causal structure of physical and social environments • This needs to be viewed against the background of a much broader movement in statistics and how statistics are used in the social sciences
Causal inference in statistics • Critique of classical statistics as able only to summarize data, rather than interpret it • Claim that causal models are needed to interpret data • Pearl proposes a calculus of causation: causal diagrams plus an algebra for representing queries and interventions (a toy illustration follows below)
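The observation/intervention distinction that Pearl's calculus captures can be simulated directly. The structural model below (a confounder Z influencing both X and Y, plus a direct effect of X on Y) is invented for this sketch; note how conditioning on X = 1 and intervening with do(X = 1) give different answers about Y.

```python
# Toy structural causal model invented for this sketch:
#   Z -> X, Z -> Y, X -> Y  (Z is a confounder).
# Conditioning on X (rung one) differs from intervening on X (rung two).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(do_x=None):
    z = rng.random(n) < 0.5                        # confounder
    if do_x is None:
        x = rng.random(n) < np.where(z, 0.9, 0.1)  # Z strongly drives X
    else:
        x = np.full(n, do_x)                       # do(X): sever the Z -> X link
    y = rng.random(n) < 0.2 + 0.3 * x + 0.4 * z    # Y depends on X and Z
    return x, y

x, y = simulate()
print("P(Y=1 | X=1)     =", round(y[x].mean(), 3))    # ~0.86 (confounded)
_, y_do = simulate(do_x=True)
print("P(Y=1 | do(X=1)) =", round(y_do.mean(), 3))    # ~0.70 (causal effect)
```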
“One aspect of deep learning does interest me: the theoretical limitations of these systems, primarily limitations that stem from their inability to go beyond rung one of the Ladder of Causation... Like the prisoners in Plato’s famous cave, deep learning systems explore the shadows on the cave wall and learn to accurately predict their movements. They lack the understanding that the observed shadows are mere projections of three-dimensional objects moving in three-dimensional space. Strong AI requires this understanding.” (Pearl, The Book of Why, p. 362)
Exhibit 1: Intuitive physics (Lake et al. 2017) • Intuitive physics is analogous to inference over a physics software engine. • Represents the objects in the environment in terms of physically relevant properties (e.g. mass, elasticity, and surface friction), and forces acting on them (e.g. gravity, friction, or collision impulses). • These representations support simulations that predict how objects will behave and interact
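A hedged sketch of what "inference over a physics engine" might look like at its simplest: forward-simulating an object's trajectory from physically relevant properties. The scenario and all parameter values are illustrative assumptions, not Lake et al.'s implementation.

```python
# A toy "physics engine": predict a ball's trajectory by forward
# simulation from physically relevant properties (mass is omitted
# since it cancels here). All values are illustrative assumptions.
from dataclasses import dataclass

GRAVITY, DT = -9.8, 0.01    # m/s^2, seconds per simulation step

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    elasticity: float = 0.7  # fraction of speed kept on each bounce

def simulate(ball, steps):
    for _ in range(steps):
        ball.vy += GRAVITY * DT       # gravity acts on vertical velocity
        ball.x += ball.vx * DT
        ball.y += ball.vy * DT
        if ball.y < 0:                # collision with the ground
            ball.y = 0.0
            ball.vy = -ball.vy * ball.elasticity
    return ball

# "Prediction": where will a gently tossed ball be after two seconds?
print(simulate(Ball(x=0.0, y=1.0, vx=1.0, vy=3.0), steps=200))
```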
Exhibit 2: Intuitive psychology (Lake et al. 2017) • Intuitive psychology employs explicitly mentalistic concepts (e.g. ‘goal,’ ‘agent,’ ‘planning,’ ‘cost,’ ‘efficiency,’ and ‘belief’) and assumes that agents are approximately rational planners who choose the most efficient means to their goals. “Planning computations may be formalized as solutions to Markov Decision Processes (or POMDPs), taking as input utility and belief functions defined over an agent's state-space and the agent's state-action transition functions, and returning a series of actions the agent should perform to most efficiently fulfill their goals (or maximize their utility). By simulating these planning processes, people can predict what agents might do next, or use inverse reasoning from observing a series of actions to infer the utilities and beliefs of agents in a scene.”
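As a minimal illustration of the planning computations the quotation describes, here is value iteration on a tiny chain-world MDP invented for this sketch: the "agent" recovers the most efficient route to its goal. This is the general idea only, not Lake et al.'s formalism.

```python
# Value iteration on a tiny invented chain MDP: states 0..4 in a line,
# state 4 is the goal, each step costs -1. A rational planner picks
# the most efficient (shortest) route. All values are illustrative.
import numpy as np

n_states = 5
actions = [-1, +1]           # move left / move right
gamma = 0.95

V = np.zeros(n_states)
for _ in range(100):                              # value iteration sweeps
    for s in range(n_states - 1):                 # the goal state is absorbing
        V[s] = max(-1 + gamma * V[min(max(s + a, 0), n_states - 1)]
                   for a in actions)              # each step costs -1

policy = [max(actions, key=lambda a, s=s: V[min(max(s + a, 0), n_states - 1)])
          for s in range(n_states - 1)]
print("values:", V.round(2))   # values rise as the goal gets closer
print("policy:", policy)       # all +1: head straight for the goal
```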
Three different claims here:
(1) Successful scientific theories of the world are causal theories. Verdict: probably true.
(2) Everyday interactions with the world constantly exploit causal theories and causal hypotheses. Verdict: probably false.
(3) Causal reasoning cannot be understood as pattern recognition. Verdict: quite possibly false.
An alternative perspective on intuitive psychology • Many types of social interaction and social coordination do not exploit any kind of explicit prediction of what others will do • Social interaction and coordination that does involve prediction need not exploit psychological concepts such as belief and desire • Prediction and explanation that do involve concepts such as belief and desire need not exploit a causal model
Theory-driven intuitive psychology incorporating a causal model, versus:
• Interaction/coordination without prediction or explanation: emotional regulation/affect attunement; reciprocity rules for repeated interactions
• Prediction without psychological concepts: scripts and routine interactions; simple behavior-reading
• Psychological concepts without causal theory: running decision-making processes off-line
Three hypotheses 1) Explicit causal theorizing plays a relatively minor role in social interactions. 2) The overwhelming majority of human social interactions are governed by lower-level mechanisms. 3) These lower-level mechanisms basically involve recognition of social patterns.
A quick thought about causality • How can Pearl et al. be so confident that there is such a sharp distinction between causal reasoning and pattern recognition? • David Hume: “A cause is an object, followed by another, where all the objects similar to the first are followed by objects similar to the second” • Hume’s approach to causality (subsequently developed by Michotte and others) identifies causation with patterns of spatio-temporal contiguity and repeated association
Overview 1 Deep learning – the questions from cognitive science 2 Critiques: Old and new 3 Functional equivalence: Causal models and intuitive theories 4 Mechanisms of deep learning: The significance of reinforcement learning
Neural parallels for representation learning Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. (LeCun, Bengio, and Hinton 2015)
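The composition LeCun, Bengio, and Hinton describe can be caricatured in a few lines: simple non-linear modules stacked so that each maps one level of representation into a slightly more abstract one. The weights below are random placeholders, purely for illustration.

```python
# Representation learning as composition of simple non-linear modules.
# Untrained random weights stand in for learned ones, purely to show
# the layered structure the quotation describes.
import numpy as np

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    W = rng.normal(scale=0.5, size=(in_dim, out_dim))
    return lambda h: np.maximum(0.0, h @ W)    # one simple non-linear module

raw_input = rng.random(64)                     # e.g. flattened pixel values
modules = [layer(64, 32), layer(32, 16), layer(16, 8)]

h = raw_input
for f in modules:      # compose the modules: each level of representation
    h = f(h)           # is slightly more abstract than the one before
print(h.shape)         # (8,): the final, most abstract representation
```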
Reinforcement learning • Reinforcement learning differs from supervised learning – no task-specific error signal and no labeled training set. • It differs from unsupervised learning (e.g. in convolutional neural networks) because it does more than detect patterns • The feedback signal in reinforcement learning is a reward signal. • The job of a network incorporating a reinforcement learning algorithm is to maximize the reward. But it is not told how to do that. It has to work out for itself which outputs are most profitable, so that it can repeat and/or adapt them.
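A minimal sketch of this trial-and-error structure, assuming a toy chain environment invented for the example: tabular Q-learning, in which the agent is never told the correct output and must discover profitable actions from the reward signal alone.

```python
# Tabular Q-learning on a tiny invented chain environment: states 0..5,
# with a reward only on reaching state 5. No labeled examples and no
# task-specific error signal; only the reward guides learning.
import numpy as np

rng = np.random.default_rng(0)
n_states, goal = 6, 5
Q = np.zeros((n_states, 2))               # action 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.95, 0.1

for episode in range(500):
    s = 0
    while s != goal:
        # Epsilon-greedy action choice, breaking ties randomly.
        if rng.random() < eps or Q[s, 0] == Q[s, 1]:
            a = int(rng.integers(2))
        else:
            a = int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, goal)
        r = 1.0 if s_next == goal else 0.0        # the reward signal
        # Update toward reward plus discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q[:goal].argmax(axis=1))   # learned policy: all 1s (head right)
```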
Why is this significant? • Reinforcement learning is the most widespread and powerful form of learning in the animal kingdom. • Behaviorists may not be correct that all learning is reinforcement learning, but many, if not most, intelligent behaviors in humans and animals are the result of an exquisite sensitivity to action-reward contingencies. • So, incorporating reinforcement learning is a big step towards parity of mechanism
The reward prediction error hypothesis (Montague, Dayan, and Sejnowski 1996)
• Behavior of dopaminergic neurons in the VTA signals how the actual reward differs from the predicted reward • Increase in dopamine indicates that reward is better than expected, while decrease indicates worse than expected • So, learning mechanisms in the brain incorporate mechanisms found in reinforcement learning algorithms (e.g. temporal difference learning)
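The parallel can be stated in one line of arithmetic. In temporal difference learning the prediction error is delta = r + gamma * V(s') - V(s); the hypothesis is that phasic dopamine plays the role of delta. The numbers below are illustrative only.

```python
# The temporal difference (reward prediction) error: positive when the
# reward is better than predicted, negative when worse. Values are
# illustrative numbers, not data from the cited study.
gamma = 0.9

def td_error(reward, v_next, v_current):
    """delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current

print(td_error(reward=1.0, v_next=0.0, v_current=0.4))   # +0.6: dopamine up
print(td_error(reward=0.0, v_next=0.0, v_current=0.4))   # -0.4: dopamine dip
print(td_error(reward=0.4, v_next=0.0, v_current=0.4))   #  0.0: as expected
```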
Concluding thoughts • Strong AI = functional equivalence + parity of mechanism • Objections to functional equivalence for deep learning from model-based reasoning seem greatly exaggerated • Reinforcement learning algorithms point in interesting directions towards parity of mechanism
So – how deep is deep learning? Quite possibly very deep indeed. Thank you!