SARLEM



Making AI more transparent

The Sarlem project aims to create a new form of artificial intelligence that brings together reinforcement learning and model predictive control, combining the ability to learn from experience with a greater degree of transparency over how decisions are reached, as Dr Sebastien Gros explains.

The challenge of how to maximise rewards while also respecting the safety boundaries of a system is common across many areas of modern society, from the management of logistics chains to individuals playing computer games. Reinforcement learning (RL), a form of artificial intelligence (AI) that originated in academic computer science research, can help guide people and businesses towards the optimal decision in a variety of different circumstances. “The goal with reinforcement learning (RL) is to discover how you should operate a given system in order to maximise rewards,” outlines Dr Sebastien Gros, Head of the Engineering Cybernetics Department at the Norwegian University of Science and Technology (NTNU). If this learning process is successful then RL-based AI can provide recommendations on the best course of action in certain situations, but the underlying reasons may not be entirely clear, which Dr Gros says is a significant problem.
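As a rough illustration of that goal (not drawn from the project itself), the sketch below shows tabular Q-learning, one of the simplest RL recipes, applied to a hypothetical toy environment: act, observe a reward, and nudge the value estimates so that future decisions collect more reward.

```python
import random

# Hypothetical toy set-up: a handful of states and two actions, with step()
# standing in for "the system" being operated (a simulator or a real plant).
STATES, ACTIONS = range(5), range(2)
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # learned value estimates
alpha, gamma, epsilon = 0.1, 0.95, 0.1               # learning rate, discount, exploration

def step(state, action):
    """Placeholder environment: returns (next_state, reward)."""
    next_state = (state + action) % len(STATES)
    reward = 1.0 if next_state == len(STATES) - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(10_000):
    # Mostly take the action that currently looks best, sometimes explore.
    if random.random() < epsilon:
        action = random.choice(list(ACTIONS))
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    # Q-learning update: move the estimate towards reward + discounted future value.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state
```

The learned table tells you which action looks best in each state, but it comes with no account of why, which is exactly the gap Dr Gros describes.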

“You have no idea if it’s even safe to follow the recommendation. The technology may tell you that you need to turn left; ok, why should I turn left? What’s going to happen? What will be the consequences? You simply don’t know,” he stresses.

Sarlem project

This is an issue Dr Gros is investigating in the Sarlem research project, an initiative backed by the Research Council of Norway. One of the key aims in the project is to add the element of explainability to RL methods, so enhancing transparency and safety. “When a system returns a decision that you should do x, we try to attach the reasons why it wants you to do that,” explains Dr Gros. The absence of explainability is one of the main reasons why RL is not yet widely used in practical applications, believes Dr Gros. “Industry in general has very cold feet about using these RL methods. They like to know why they are being advised to take a certain course of action. Why is this decision being recommended? What does the future look like?” he says. “This is missing from pure RL currently, which I would argue is one reason why RL is not actually much used in practice at the moment – it’s mainly limited to lab experiments and academic demonstrations. Industry will not take up methods that they don’t understand.”

As part of his work in Sarlem, Dr Gros is now looking to combine RL with model predictive control (MPC) methods, which he describes as a very traditional decision-making tool. An MPC model can look at a system as it stands, and it will imagine a sequence of actions to take on it over a given time horizon. “The MPC will then predict how the system will respond to that. It uses a model, and it will try to optimise this sequence of actions against a reward that you want to maximise,” says Dr Gros.

The project team is now looking to bring together MPC methods with RL to create a new form of AI, where MPC effectively takes the place of the more commonly used artificial neural networks (ANN). “An ANN would receive data from your system and then make a recommendation, but you don’t know what happened in that machinery. The MPC model would perform the same operation as an ANN but, alongside providing a recommendation, it would give you a complete explanation of what’s going on,” explains Dr Gros.

This research was initially largely theoretical in nature, but now Dr Gros and his colleagues in the project are starting to look towards the potential applications of this technology. The main areas of interest in these terms are those where what we need to optimise is very clear, says Dr Gros. “It could be money, it could be reducing CO2 emissions, it could be energy minimisation,” he outlines. Energy management is one area in which this technology could be applied. “We are typically very interested in applications where you have a lot of uncertainty, so you cannot be sure what’s going to happen in the system, in the environment. The energy system is full of uncertainties, as there are lots of things that cannot really be predicted,” continues Dr Gros. “We have worked on energy building management, smart homes, and energy communities – a further interesting application is the offshore wind industry. These are some of the applications that we’ve looked at in the project.”
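To make the MPC idea concrete, here is a minimal receding-horizon sketch for a hypothetical battery-storage example of our own devising (the prices, model and reward are illustrative assumptions, not the project's code): at each step the controller uses its model to simulate candidate action sequences over a horizon, picks the one with the best predicted reward, applies only the first action, and then re-plans. Because the predicted trajectory and reward are available, every recommendation comes with an account of what the controller expects to happen.

```python
import itertools

# Illustrative assumptions: a battery with state-of-charge x in [0, 1],
# a charge/discharge action u, a known price forecast and a simple reward.
PRICES = [3.0, 1.0, 1.0, 4.0, 5.0, 2.0]   # hypothetical electricity prices per step
ACTIONS = [-1.0, 0.0, 1.0]                # discharge, idle, charge
HORIZON = 4

def model(x, u):
    """System model used by the MPC to predict the next state of charge."""
    return min(max(x + 0.2 * u, 0.0), 1.0)

def stage_reward(x, u, price):
    """Reward to maximise: earn by discharging when prices are high,
    pay to charge, and penalise letting the battery run empty."""
    return -price * u - 5.0 * (x < 0.1)

def mpc_decision(x0, t):
    """Enumerate action sequences over the horizon and keep the best one."""
    best_total, best_seq = None, None
    for seq in itertools.product(ACTIONS, repeat=HORIZON):
        x, total = x0, 0.0
        for k, u in enumerate(seq):
            total += stage_reward(x, u, PRICES[(t + k) % len(PRICES)])
            x = model(x, u)
        if best_total is None or total > best_total:
            best_total, best_seq = total, seq
    # The planned sequence and its predicted reward are the "explanation".
    return best_total, best_seq

# Receding horizon: apply only the first planned action, then re-plan.
x = 0.5
for t in range(6):
    predicted, plan = mpc_decision(x, t)
    print(f"t={t}  soc={x:.2f}  action={plan[0]:+.1f}  predicted reward={predicted:.1f}")
    x = model(x, plan[0])
```

In a combined scheme of the kind the article describes, RL would be used to improve the MPC's model and reward parameters from data, while the MPC's predictions keep each recommendation explainable.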

FATE in AI

The backdrop to this research is the increasing pervasiveness of AI in everyday life, with a huge number of new tools entering the market. These are mainly generative AI tools however, for example those which produce text, sound and images, whereas AI-based decision-making tools have not yet taken off to the same extent. “This is largely because of these outstanding questions around safety and explainability,” says Dr Gros. As part of his work Dr Gros is promoting the acronym FATE (Fairness, Accountability, Transparency and Explainability), four topics which he believes need to be addressed before AI can be used more intensively across society. “We cannot work with mysterious, alchemic tools that simply produce results that we don’t understand,” he says. “This doesn’t mean that neural networks cannot be used in AI, but there is a lot of work to do to explain and prove how FATE can be achieved with neural networks.” He plans to conduct further research around the FATE question in the future.

This work holds particular relevance to the energy transition, and the possibility of using AI to manage the limited supply generated by renewable sources. Achieving FATE is essential if AI tools are to be used in allocating critical resources like energy, believes Dr Gros. “We need to achieve FATE, otherwise we won’t know if decisions made by AI make any sense, and if they are fair to us or not,” he points out. A greater degree of transparency is also essential to the operation of digital twins (in silico models designed to mimic real systems), another topic that Dr Gros plans to address.

“Digital twins are typically developed to help people take good and safe decisions in the real world; they allow you to assess the impact of decisions on a computer before you implement them in reality,” he explains. “We want to explain how digital twins should be built for decision-making, which is not well understood at the moment.”




SARLEM

Safe Reinforcement Learning using Model Predictive Control

Project Objectives

Reinforcement learning is a potentially powerful tool in the artificial intelligence field, helping people and businesses operate systems in line with their own priorities, yet industry is wary of applying it, as the reasoning behind RL-based recommendations is not clear. The aim in the SARLEM project is to add the element of explainability to RL methods, developing a new form of AI which is both more transparent and safer. This could lead to AI being used more widely to address major contemporary issues, such as managing the energy transition.

Project Funding

This project is funded by the Research Council of Norway (RCN-NFR) under grant number 300172, with a budget of 1 million euros.

Project Partners

The company DNV is a partner in the project; trust and safety in AI are an important part of its business. https://www.dnv.no

Contact Details

Prof. Sebastien Gros

Head of Department
Department of Engineering Cybernetics
Faculty of Information Technology
NTNU, Gløshaugen
NO-7491 Trondheim, Norway

T: +47 459 17 969

E: sebastien.gros@ntnu.no

W: https://www.ntnu.edu/employees/sebastien.gros

Sebastien Gros obtained his PhD from EPFL in 2008. After a bike trip from Switzerland to Everest base camp and a brief spell in industry, he joined KU Leuven as a postdoc. He became an assistant professor at Chalmers in 2013. He is currently a full professor and Head of the Department of Engineering Cybernetics at NTNU.
Sebastien Gros
Illustration of a condition for a digital twin to enable optimal decisions. This figure is frequently used by S. Gros in lectures and seminars on AI for decision-making.
