NEWS
INSTADEEP, iCompass partner on TunBERT - First AI-based Tunisian Dialect System

InstaDeep, an AI startup founded by Tunisian co-founders Karim Beguir and Zohra Slim, and Tunis-based startup iCompass in March jointly revealed a collaborative Natural Language Processing (NLP) project that will lead to the development of a language model for the Tunisian dialect, TunBERT. The project will evaluate TunBERT on several tasks such as sentiment analysis, dialect classification, reading comprehension, and question answering. The partnership aims to apply the latest advances in AI and Machine Learning (ML) to explore and strengthen research in the fast-emerging Tunisian AI tech ecosystem.

“We’re excited to reveal TunBERT, a joint research project between iCompass and InstaDeep that redefines state-of-the-art for the Tunisian dialect. This work also highlights the positive results that are achieved when leading AI startups collaborate, benefiting the Tunisian tech ecosystem as a whole,” said InstaDeep CEO Karim Beguir.

Bidirectional Encoder Representations from Transformers (BERT) has become a state-of-the-art model for language understanding. Following its success, available models have been trained on Indo-European languages such as English, French and German, but similar research for underrepresented languages remains sparse and at an early stage. Along with jointly writing and debugging the code, iCompass and InstaDeep’s research engineers have run multiple successful experiments.

iCompass CTO and co-founder Dr Hatem Haddad explained that the collaboration aims to push forward and advance the development of AI research in the emerging and prominent field of NLP and language models. “Our ultimate goal is to empower Tunisian talent and foster an environment where AI innovation can grow, and together our teams are pushing boundaries,” said Dr Haddad.
TunBERT was developed with NVIDIA’s NeMo toolkit, which the research team used to pre-train the language model on a large-scale Tunisian corpus and then adapt and fine-tune the network on task-relevant data, building on the BERT implementation optimised by NVIDIA. TunBERT’s pre-training and fine-tuning steps converged faster and ran in a distributed, optimised way thanks to the use of multiple NVIDIA V100 GPUs, with Tensor Core mixed-precision capabilities and the NeMo toolkit making training more efficient. Through this approach, the contextualised text representation models learned an effective embedding of natural language, making it machine-understandable and achieving strong performance. Comparing the NVIDIA-optimised BERT model to the original BERT implementation shows that the NVIDIA-optimised model performs better on the different downstream tasks while using the same compute power.
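As an illustration of the mixed-precision fine-tuning workflow described above, the sketch below fine-tunes a BERT-style classifier for sentiment analysis. It is a minimal example assuming the Hugging Face Transformers and PyTorch APIs rather than the team’s actual NeMo configuration; the checkpoint name, data and hyperparameters are placeholders, not TunBERT’s.

```python
# Minimal sketch: mixed-precision fine-tuning of a BERT-style model for
# sentiment analysis on a GPU. This is NOT the TunBERT/NeMo pipeline;
# the checkpoint name and data below are placeholders.
import torch
from torch.cuda.amp import autocast, GradScaler
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2
).cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = GradScaler()  # enables Tensor Core mixed-precision training

texts = ["example positive review", "example negative review"]  # placeholder data
labels = torch.tensor([1, 0]).cuda()
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to("cuda")

model.train()
for _ in range(3):  # a few illustrative optimisation steps
    optimizer.zero_grad()
    with autocast():  # forward pass runs in mixed precision
        loss = model(**batch, labels=labels).loss
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```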
INNOVATION
FIRST FON TO FRENCH Neural Machine Translation Engine launched

AI researchers Chris Emezue and Bonaventure Dossou have launched FFRTranslate, the first Neural Machine Translation engine from Fon, a very low-resource and tonal language, to French and vice versa. Fon shares tonal and analytical similarities with Niger-Congo languages such as Igbo, Hausa, Yoruba, and Swahili. The engine will promote better communication in Fon and could enable companies to translate texts and messages from Fon to French and vice versa. Dossou described working on the engine as “an awesome ride”. “We’re thankful to everybody that supported us, especially our beloved Masakhane NLP family, and the broader
NLP community. We believe that this is a huge step toward empowering African endangered languages,” he said in a LinkedIn post announcing the launch. Other contributors who worked on the project include Fabroni Yoclounon and Ricardo Ahounvlame. Read more about the translation engine in this Synapse Issue 8 article.
Left to right: Chris Emezue, Bonaventure Dossou