

WISDOM AND TEAMWORK OPEN-SOURCED

Mava, the Made-in-Africa Multi-Agent Reinforcement Learning Framework

Many of humanity’s greatest achievements arose from our ability to work together. The complex and distributed problems the world collectively faces now call for a new wave of sophisticated AI cooperation strategies. Responding to that call, an all-Africa-based team at InstaDeep created Mava.

“If you want to go quickly, go alone,” an African proverb counsels. “If you want to go far, go together.”

You can hear the wisdom of generations about the value of teamwork reverberating through these words. As we face challenges such as managing scarce resources under pressure from climate change, keeping critical supply routes flowing, or enlisting robots for remote rescue and exploration missions, weaving teamwork strategies into AI tools is crucial.

That’s why InstaDeep created Mava: a research framework specifically designed for building scalable, high-quality Multi-Agent Reinforcement Learning (MARL) systems. Mava provides components, abstractions, utilities, and tools for MARL. It can easily scale with multi-process system training and execution while providing a high level of flexibility and new creative possibilities.

“At InstaDeep, we have a real passion for innovation, consistent with our mission to build an AI-first world that benefits everyone,” said Karim Beguir, InstaDeep’s CEO and Co-Founder.

Several frameworks have emerged in the field of single-agent reinforcement learning (RL), including Dopamine, RLlib, and Acme, to name just a few. These aim to help the AI community build effective and scalable agents. However, very few existing frameworks focus on MARL – an increasingly active research field with its own set of challenges and opportunities. InstaDeep aims to fill this gap with Mava. By focusing on MARL, Mava leverages the natural structure of Multi-Agent problems. This ensures Mava remains lightweight and flexible while at the same time providing tailored support for MARL.

InstaDeep’s decision to open-source Mava stems from its passion for contributing to the development of MARL, its support for open collaboration, and its commitment to helping develop the wider community, especially across Africa. InstaDeep itself has benefited from open-source software and wants to give back.

“We’re proud to open-source Mava, a world-class framework entirely designed and built by an all-African, all-star team of InstaDeepers,” Beguir said.

Mava is the latest in a flurry of 2021 open-source releases by InstaDeep, including three massive bio data repositories as part of DeepChain Apps in May and a natural language processing (NLP) model for the Tunisian dialect, an under-resourced African language, in June.

“Working on Mava has been a wonderful experience and a true team effort in collaboration with our African offices in South Africa, Nigeria and Tunisia,” said Arnu Pretorius, the InstaDeep AI Research Scientist who leads the team in Cape Town.

“It really showcases the talent we have on the continent. Not only have we begun to enter the conversation of AI,” Pretorius said, adding, “but we are now starting to take ownership of key technologies, helping to shape the future and contributing to making the world a better place using AI.”

Why MARL?

In Xhosa, one of South Africa’s eleven official languages, “Mava” means experience or wisdom. Only by working together has humanity been able to accomplish some of its greatest achievements, and this has never been more true. The problems we face are distributed, complex, and difficult to solve, and they often require sophisticated strategies of cooperation for us to make any progress. From the standpoint of using AI for problem-solving, this drives us to harness and develop useful computational frameworks for decision-making and cooperation. One such framework is MARL.

MARL extends the decision-making capabilities of single-agent RL to the setting of distributed decision-making problems. In MARL, multiple agents are trained to act as individual decision-makers in some larger system, while learning to work as a team. The key difference between MARL and single-agent RL is that MARL can be applied in situations where the problem becomes exponentially more difficult to solve as it scales.

(Figure: Distributed training for multi-agent reinforcement learning in Mava)

Consider a problem such as managing a fleet of autonomous vehicles for a growing population: the number of navigation decisions that must be made at any given moment grows exponentially with the number of cars on the road. This quickly becomes intractable for single-agent approaches. For MARL, it’s an opportunity to shine.
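
To make the scaling concrete, here is a minimal sketch; the maneuver count is an illustrative assumption, not a figure from the article:

```python
# Minimal sketch: with n vehicles each choosing one of k maneuvers,
# the joint action space has k**n elements -- exponential in n.
k = 5  # hypothetical maneuvers per vehicle (accelerate, brake, turn, ...)
for n in (2, 5, 10, 20):
    print(f"{n:2d} vehicles -> {k**n:,} joint actions")
# 20 vehicles already yield 95,367,431,640,625 joint actions.
```

A single centralised controller must reason over this joint space directly; MARL instead factorises the decision across agents, each with its own policy.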

Many of humanity’s most pressing practical problems are similar to this one. MARL has enormous potential to be applied across various sectors, from health to transportation, from logistics to agriculture. MARL can make problems of this kind manageable; however, it introduces other difficulties, such as the need for decentralised coordination. To be fully effective at scale in new situations, we’ll need researchers to develop new strategies and techniques.

A research framework for MARL

Mava offers several useful and extendable components that make it easier and faster to build Multi-Agent systems. These include custom MARL-specific networks, loss functions, communication, and mixing modules. Perhaps the most fundamental component is the system architecture, which defines how information flows between agents in the system. Mava provides several architectural options for designing systems, from independent agents to centralised training schemes and networked systems.
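
As a rough illustration of the architecture concept, the sketch below shows how an architecture controls what each agent's critic is allowed to see. The class and method names are hypothetical placeholders, not Mava's actual API; see the repository for the real interfaces:

```python
# Hypothetical sketch of the "architecture" idea: it defines how
# information flows between agents. Names are illustrative, not Mava's API.
from typing import Dict, List

Obs = Dict[str, List[float]]  # per-agent observations keyed by agent id

class DecentralisedArchitecture:
    """Each agent's critic sees only that agent's own observation."""
    def critic_inputs(self, obs: Obs) -> Obs:
        return dict(obs)

class CentralisedArchitecture:
    """Every critic sees the concatenation of all agents' observations:
    the classic centralised-training, decentralised-execution setup."""
    def critic_inputs(self, obs: Obs) -> Obs:
        joint = [x for o in obs.values() for x in o]
        return {agent: list(joint) for agent in obs}

obs = {"agent_0": [0.1, 0.2], "agent_1": [0.3, 0.4]}
print(CentralisedArchitecture().critic_inputs(obs))
# {'agent_0': [0.1, 0.2, 0.3, 0.4], 'agent_1': [0.1, 0.2, 0.3, 0.4]}
```

Swapping one architecture for another rewires the whole system without touching the agents themselves.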

Furthermore, several MARL baseline systems have already been integrated into Mava. These serve as examples that showcase Mava’s reusable features and let developers easily reproduce and extend existing MARL algorithms.

MARL at scale

So how does it all work? At the core of Mava is the concept of a system. By system, the Mava team means a full MARL algorithm specification comprising the following components: an executor, a trainer, and a dataset. The executor is a collection of single-agent actors and is the part of the system that interacts with the environment; that is, it performs an action for each agent and observes each agent’s reward and next observation. The dataset stores all of the information generated by the executor, with all data transfer and storage handled by Reverb. The trainer is a collection of single-agent learners, responsible for sampling data from the dataset and updating the parameters of every agent in the system (see illustrations).
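
The following is a simplified, hypothetical sketch of that division of labour in plain Python. The classes are illustrative stand-ins, not Mava's actual components, and in Mava the dataset is a Reverb table rather than a Python deque:

```python
# Simplified sketch of a Mava-style "system": executor, dataset, trainer.
import random
from collections import deque

class Executor:
    """Steps every agent in the environment and logs transitions."""
    def __init__(self, agents, env):
        self.agents, self.env = agents, env

    def run_episode(self, dataset):
        obs, done = self.env.reset(), False
        while not done:
            # One action per agent, selected by that agent's policy.
            actions = {aid: agent.select_action(obs[aid])
                       for aid, agent in self.agents.items()}
            next_obs, rewards, done = self.env.step(actions)
            dataset.append((obs, actions, rewards, next_obs))
            obs = next_obs

class Trainer:
    """Samples from the dataset and updates every agent's parameters."""
    def __init__(self, agents):
        self.agents = agents

    def step(self, dataset, batch_size=32):
        batch = random.sample(list(dataset), min(batch_size, len(dataset)))
        for agent in self.agents.values():
            agent.update(batch)

# A bounded deque stands in here for Mava's Reverb replay table.
dataset = deque(maxlen=100_000)
```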

The system executor may be distributed across multiple processes, each with a copy of the environment. Each process collects and stores data that the trainer uses to update the parameters of the actor networks used within each executor. How processes are distributed is defined by constructing a multi-node graph program using Launchpad. Consequently, Mava can run systems at various levels of scale without changing the underlying system code.
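
The pattern looks roughly like the sketch below. The Launchpad calls (lp.Program, lp.ReverbNode, lp.CourierNode, lp.launch) are real Launchpad primitives, but the make_* functions are hypothetical constructors standing in for a system's actual node builders:

```python
# Sketch of a multi-node Launchpad program: one replay node, one trainer
# node, and several executor nodes, each running in its own process.
import launchpad as lp

# Hypothetical node constructors; a real system would build the Reverb
# tables, the trainer, and the executors here.
def make_replay_tables(): ...
def make_trainer(replay): ...
def make_executor(replay, trainer): ...

def make_program(num_executors: int = 4) -> lp.Program:
    program = lp.Program("mava_system")
    with program.group("replay"):
        replay = program.add_node(lp.ReverbNode(make_replay_tables))
    with program.group("trainer"):
        trainer = program.add_node(lp.CourierNode(make_trainer, replay))
    with program.group("executor"):
        for _ in range(num_executors):
            program.add_node(lp.CourierNode(make_executor, replay, trainer))
    return program

lp.launch(make_program())
```

Scaling up then means changing the number of executor nodes or the launch type, not the system logic itself.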

On the shoulders of giants

InstaDeep acknowledges that Mava is indebted to several open-source libraries. In particular, Mava is built on top of DeepMind’s Acme framework and was heavily inspired by its design. It integrates with, and greatly benefits from, a wide range of already existing single-agent RL components made available in Acme. Furthermore, Mava inherits the RL ecosystem built around Acme; most notably, it uses Reverb for data flow management and supports simple scaling using Launchpad. Mava has also been influenced by, and made use of, other libraries including PyMARL and OpenSpiel, as well as environment-specific libraries such as PettingZoo, Flatland, RoboCup, and the StarCraft Multi-Agent Challenge (SMAC).

From research to development

InstaDeep’s engineers tackle some of the toughest real-world problems, not only at a macro level, such as scheduling thousands of trains across a vast network, but also at a micro level, such as routing electronic circuit boards in hours instead of days or months. The collaboration between InstaDeep’s research teams and engineers has proven to be a key ingredient in these successes. The Mava framework goes one step further by offering a frictionless transition from InstaDeep’s in-house research to product development, creating synergies between its teams. The flexibility of the framework and its capacity to scale seamlessly are critical ingredients for its research and engineering teams to deliver new products, services, and research breakthroughs that were previously out of reach.

InstaDeep is excited about the future of Mava, its growth, and its ongoing development. This release is only the beginning, not only for making InstaDeep’s MARL research more efficient and scalable and for sharing those efforts with the community, but also for using Mava directly in applied InstaDeep projects.

“To me personally, Mava represents a step in an exciting direction in unison with many others on the continent who are seeking to shape AI’s future,” InstaDeep’s Pretorius said. “This is only the beginning and I’m excited about the possibilities ahead.”

You can find more information, source code, and examples in Mava’s GitHub repository.
