
Back-propagation Now Works in Spiking Neural Networks!


by Timothée Masquelier (UMR5549 CNRS – Université Toulouse 3)


Back-propagation is THE learning algorithm behind the deep learning revolution. Until recently, it was not possible to use it in spiking neural networks (SNN), due to non-differentiability issues. But these issues can now be circumvented, signalling a new era for SNNs.

Biological neurons use short electrical impulses called “spikes” to transmit information. The spike times, in addition to the spike rates, are known to play an important role in how neurons process information. Spiking neural networks (SNNs) are thus more biologically realistic than the artificial neural networks (ANNs) used in deep learning, and are arguably the most viable option if one wants to understand how the brain computes at the neuronal description level. But SNNs are also appealing for AI, especially for edge computing, since they are far less energy-hungry than ANNs. Yet until recently, training SNNs with back-propagation (BP) was not possible, and this has been a major impediment to the use of SNNs.

Back-propagation (BP) is the main supervised learning algorithm in ANNs. Supervised learning works with examples for which the ground truth, or “label”, is known, which defines the desired output of the network. The error, i.e., the distance between the actual and desired outputs, can be computed on these labelled examples. Gradient descent is used to find the parameters of the network (e.g., the synaptic weights) that minimise this error. The strength of BP is its ability to compute the gradient of the error with respect to all the parameters in the intermediate “hidden” layers of the network, whereas the error is only measured in the output layer. This is done using a recurrent equation, which allows computation of the gradients in layer l-1 as a function of the gradients in layer l. The gradients in the output layer are straightforward to compute (since the error is measured there), and then the computation goes backward, until all gradients are known. BP thus solves the “credit assignment problem”, i.e., it determines what the hidden layers should do to reduce the error. Since the number of layers is arbitrary, BP can work in very deep networks, which has led to the widely talked-about deep learning revolution.
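To make the recursion concrete, here is a minimal NumPy sketch (not code from the work described here) of the forward and backward passes for a small fully connected network with sigmoid activations and a squared error; the layer sizes, learning rate and initialisation are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights):
    """Forward pass: keep the activation of every layer for the backward pass."""
    activations = [x]
    for W in weights:
        x = sigmoid(W @ x)
        activations.append(x)
    return activations

def backward(activations, weights, target):
    """Back-propagation: the gradients at layer l-1 are computed from those at layer l."""
    grads = [None] * len(weights)
    a = activations[-1]
    # Output layer: the error is measured here, so its gradient is direct.
    delta = (a - target) * a * (1 - a)              # dLoss/dz for squared error + sigmoid
    for l in reversed(range(len(weights))):
        grads[l] = np.outer(delta, activations[l])  # dLoss/dW_l
        if l > 0:                                   # recurrent equation: propagate backward
            a_prev = activations[l]
            delta = (weights[l].T @ delta) * a_prev * (1 - a_prev)
    return grads

# Toy usage: a 3-2-1 network and one gradient-descent step on the weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(1, 2))]
x, y = np.array([0.5, -0.2, 0.1]), np.array([1.0])
acts = forward(x, weights)
for W, g in zip(weights, backward(acts, weights, y)):
    W -= 0.1 * g
```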

This has motivated us and others to train SNNs with BP. But unfortunately, it is not straightforward. To compute the gradients, BP requires differentiable activation functions, whereas spikes are “all-or-none” events, which cause discontinuities. Here we present two recent methods to circumvent this problem.
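A toy illustration of the issue, assuming a simple thresholded membrane potential: the spiking non-linearity is a step function, so plain gradient descent receives no learning signal through it.

```python
import numpy as np

threshold = 1.0  # arbitrary firing threshold

def spike(membrane_potential):
    # All-or-none: a spike is emitted only if the potential reaches the threshold.
    return np.where(membrane_potential >= threshold, 1.0, 0.0)

u = np.linspace(0.0, 2.0, 5)
print(spike(u))   # [0. 0. 1. 1. 1.]: a step, not a smooth curve
# Its exact derivative is 0 everywhere except at the threshold, where it is
# undefined, so the chain rule used by BP propagates no useful gradient.
```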

S4NN: a latency-based backpropagation for static stimuli

The first method, S4NN, deals with static stimuli and rank-order coding [1]. With this sort of coding, neurons fire at most one spike: the most activated neurons fire first, while less activated neurons fire later, or not at all. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. Each neuron thus has a single latency, and we demonstrated that the gradient of the loss with respect to this latency can be approximated, which allows estimation of the gradients of the loss with respect to all the weights, in a backward manner, akin to traditional BP. This approach reaches a good accuracy, although below the state of the art: e.g., a test accuracy of 97.4% on the MNIST dataset. However, the neuron model we use, the non-leaky integrate-and-fire, is simpler and more hardware-friendly than those used in previous similar proposals.
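The following NumPy sketch illustrates the coding scheme (it is not the published S4NN implementation): non-leaky integrate-and-fire neurons simply accumulate the weights of incoming spikes, each output neuron fires at most once when it crosses its threshold, and the earliest-firing readout neuron gives the predicted class. The threshold, time window and weight range are arbitrary illustrative values.

```python
import numpy as np

def first_spike_times(input_times, weights, threshold=1.0, t_max=256):
    """Time-to-first-spike of non-leaky integrate-and-fire neurons.

    input_times : spike time of each input neuron
    weights     : (n_outputs, n_inputs) synaptic weights
    Returns the latency of each output neuron (t_max if it never reaches threshold).
    """
    n_out = weights.shape[0]
    out_times = np.full(n_out, t_max)
    potential = np.zeros(n_out)
    for t in range(t_max):
        # Non-leaky integration: each presynaptic spike adds its weight once.
        potential += weights[:, input_times == t].sum(axis=1)
        fired = (potential >= threshold) & (out_times == t_max)
        out_times[fired] = t          # each neuron fires at most one spike
    return out_times

# Rank-order readout: the earliest-firing output neuron gives the class.
rng = np.random.default_rng(0)
input_times = rng.integers(0, 20, size=50)     # more activated inputs spike earlier
weights = rng.uniform(0, 0.2, size=(10, 50))   # 10 classes, 50 inputs
latencies = first_spike_times(input_times, weights)
predicted_class = int(np.argmin(latencies))
```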

Surrogate Gradient Learning: a general approach

One of the main limitations of S4NN is the at-most-one-spike-per-neuron constraint. This constraint is acceptable for static stimuli (e.g., images), but not for dynamic ones (e.g., videos, sounds): changes need to be encoded by additional spikes. Can BP still be used in this context? Yes, if the “surrogate gradient learning” (SGL) approach is used [2].
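A minimal PyTorch sketch of the surrogate-gradient idea (not the exact formulation of [2]): the forward pass keeps the all-or-none spike, while the backward pass replaces the true, almost-everywhere-zero derivative with a smooth surrogate (here the derivative of a fast sigmoid, with an arbitrary steepness).

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside step in the forward pass, smooth surrogate in the backward pass."""
    scale = 10.0  # steepness of the surrogate (a hyper-parameter)

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()        # all-or-none spike

    @staticmethod
    def backward(ctx, grad_output):
        u, = ctx.saved_tensors
        # Derivative of a fast sigmoid, used in place of the true (zero) derivative.
        surrogate = 1.0 / (SurrogateSpike.scale * u.abs() + 1.0) ** 2
        return grad_output * surrogate

spike = SurrogateSpike.apply

# Toy usage: gradients now flow through the spiking non-linearity.
u = torch.randn(5, requires_grad=True)   # membrane potentials minus threshold
loss = spike(u).sum()
loss.backward()
print(u.grad)                            # non-zero, thanks to the surrogate
```

Because gradients now flow through the spiking non-linearity, the rest of the network can be trained with standard BP (through time, for temporal data), with additional spikes free to encode the dynamics of the stimulus.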

Figure 1. (Top) Example of a spectrogram (Mel filters) extracted for the word “off”. (Bottom) Corresponding spike trains for one channel of the first layer.
