ERCIM News 125 - Brain-Inspired Computing

Back-propagation Now Works in Spiking Neural Networks!

by Timothée Masquelier (UMR5549 CNRS – Université Toulouse 3)

Back-propagation is THE learning algorithm behind the deep learning revolution. Until recently, it could not be used in spiking neural networks (SNNs), due to non-differentiability issues. But these issues can now be circumvented, signalling a new era for SNNs.

Biological neurons use short electrical impulses called “spikes” to transmit information. Spike times, in addition to spike rates, are known to play an important role in how neurons process information. SNNs are thus more biologically realistic than the artificial neural networks (ANNs) used in deep learning, and are arguably the most viable option if one wants to understand how the brain computes at the neuronal level. But SNNs are also appealing for AI, especially for edge computing, since they are far less energy-hungry than ANNs. Yet until recently, training SNNs with back-propagation (BP) was not possible, and this has been a major impediment to their adoption.

BP is the main supervised learning algorithm in ANNs. Supervised learning works with examples for which the ground truth, or “label”, is known; the label defines the desired output of the network. The error, i.e., the distance between the actual and desired outputs, can be computed on these labelled examples, and gradient descent is used to find the network parameters (e.g., the synaptic weights) that minimise this error. The strength of BP is its ability to compute the gradient of the error with respect to the parameters of the intermediate “hidden” layers, even though the error is only measured at the output layer. This is done using a recurrent equation that expresses the gradients in layer l-1 as a function of the gradients in layer l. The gradients in the output layer are straightforward to compute (since the error is measured there), and the computation then proceeds backward until all gradients are known. BP thus solves the “credit assignment problem”: it determines what the hidden layers should do to reduce the output error. Since the number of layers is arbitrary, BP works in very deep networks, which has led to the widely talked-about deep learning revolution.

This has motivated us and others to train SNNs with BP. Unfortunately, it is not straightforward: to compute the gradients, BP requires differentiable activation functions, whereas spikes are “all-or-none” events, which cause discontinuities. Here we present two recent methods that circumvent this problem.

S4NN: a latency-based back-propagation for static stimuli

The first method, S4NN, deals with static stimuli and rank-order coding [1]. With this sort of coding, neurons fire at most one spike: the most activated neurons fire first, while less activated neurons fire later, or not at all. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. Each neuron thus has a single latency, and we demonstrated that the gradient of the loss with respect to this latency can be approximated. This allows the gradients of the loss with respect to all the weights to be estimated in a backward manner, akin to traditional BP. The approach reaches good accuracy, although below the state of the art: e.g., a test accuracy of 97.4% on the MNIST dataset. However, the neuron model we use, the non-leaky integrate-and-fire neuron, is simpler and more hardware-friendly than those used in all previous similar proposals.

Surrogate Gradient Learning: a general approach

One of the main limitations of S4NN is its at-most-one-spike-per-neuron constraint. This constraint is acceptable for static stimuli (e.g., images), but not for dynamic ones (e.g., videos, sounds), whose changes need to be encoded by additional spikes. Can BP still be used in this context? Yes, if the “surrogate gradient learning” (SGL) approach is used [2].

Figure 1. (Top) Example of a spectrogram (Mel filters) extracted for the word “off”. (Bottom) Corresponding spike trains for one channel of the first layer.

ERCIM News 125, April 2021
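The core trick of SGL can be illustrated with a minimal NumPy sketch (a hypothetical toy example, not the implementation from [2]): the forward pass keeps the non-differentiable Heaviside step that emits spikes, while the backward pass substitutes a smooth surrogate derivative (here, that of a fast sigmoid) so that gradients can still flow to the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def heaviside(v):
    """Forward pass: all-or-none spike (non-differentiable)."""
    return (v >= 0.0).astype(float)

def surrogate_grad(v, beta=5.0):
    """Backward pass: derivative of a fast sigmoid, used in place of
    the Heaviside derivative (which is zero almost everywhere)."""
    return beta / (2.0 * (1.0 + beta * np.abs(v)) ** 2)

# Toy problem (hypothetical): learn weights so that two output neurons
# reproduce a target spike pattern for one fixed input.
x = rng.normal(size=5)              # input activations
w = rng.normal(size=(2, 5)) * 0.1   # synaptic weights
theta = 1.0                         # firing threshold
target = np.array([1.0, 0.0])       # desired spikes

lr = 0.5
for step in range(200):
    v = w @ x - theta               # membrane potential minus threshold
    s = heaviside(v)                # spikes: forward uses the true step
    err = s - target                # output error
    # Chain rule, with the surrogate standing in for dH/dv:
    grad_w = (err * surrogate_grad(v))[:, None] * x[None, :]
    w -= lr * grad_w

print(heaviside(w @ x - theta))     # → [1. 0.], matching the target
```

The same substitution is what makes multi-spike, multi-layer SNN training possible: every spike non-linearity in the unrolled network gets a surrogate derivative, and standard BP does the rest.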

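In S4NN's readout layer, described above, the first output neuron to cross its threshold determines the class. A minimal sketch of this rank-order readout with non-leaky integrate-and-fire neurons (a hypothetical illustration, not the code from [1]):

```python
import numpy as np

def first_spike_readout(spike_times, w, theta=1.0, t_max=20):
    """Non-leaky integrate-and-fire readout, as in rank-order coding:
    each input neuron fires at most once, at spike_times[i] (t_max
    means "never fires"); output potentials integrate the weighted
    input spikes without decay, and the first output neuron to cross
    the threshold `theta` gives the predicted class."""
    n_out = w.shape[0]
    v = np.zeros(n_out)                 # membrane potentials (no leak)
    latency = np.full(n_out, t_max)     # first-spike time per output
    for t in range(t_max):
        arrived = spike_times == t      # input spikes arriving now
        v += w @ arrived.astype(float)  # integrate, never decay
        fired = (v >= theta) & (latency == t_max)
        latency[fired] = t              # record first threshold crossing
    return int(np.argmin(latency)), latency

# Hypothetical example: 4 input neurons, 2 classes. The class-0 neuron
# is driven by the early spikes, so it crosses the threshold first.
spike_times = np.array([0, 1, 5, 20])   # input neuron 3 never fires
w = np.array([[0.6, 0.6, 0.2, 0.1],     # class 0: weights on early inputs
              [0.1, 0.2, 0.6, 0.6]])    # class 1: weights on late inputs
pred, latency = first_spike_readout(spike_times, w)
print(pred, latency)                    # class 0 fires first, at t = 1
```

S4NN's contribution is to back-propagate an approximate gradient of the loss through these first-spike latencies; the readout itself stays this simple.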

