AI FOR SCIENCE IN AFRICA
/ By Gregg Barrett /
Introduction
There is much fundamental and applied scientific research being carried out at academic and other research institutions in Africa, but little as yet utilises AI. African science needs to take up AI methods, because absent such methods an increasing number of scientific disciplines at African institutions will border on irrelevance. A greater use of AI in scientific research in Africa will bring numerous benefits: deepening African science, broadening global research agendas, and incentivising the location of corporate R&D labs. Ultimately, the use of AI in science will have spillover effects, helping upgrade the capabilities of civil society more broadly.
Cirrus and the AI Africa Consortium are a major response to the AI deficit in African science. The aim is to broaden researcher access to compute, data, and engineering, to train students, to ensure that AI for science is feasible in numerous academic institutions across the African continent - not just elite academic institutions and large technology firms - and to facilitate the commercialisation of research findings. With human capital central to AI, online learning and related pedagogies can play an important role in knowledge transfer to Africa.
Prioritising AI for science in Africa
AI-enabled scientific research is not yet happening in Africa. Most of the leading corporate research operations are not located in Africa, but operate in Asia, Europe and North America. This is an important barrier to collaborative research and commercialisation efforts at African institutions. The five major technology platform companies spent US $127 billion on R&D in 2021 [Bajpai, 2021]. Data from the QS World University Rankings, since 2012, show that Fortune 500 companies collaborated six times more with the top 50 universities than with those ranked between places 301 and 500, where most African universities are situated [Ahmed, et al. 2020]. This imbalance in collaboration exacerbates the disparities between academic institutions in Africa and top tier academic institutions in the rest of the world. Furthermore, Fortune 500 technology companies and the top 50 universities publish five times as many papers annually per AI conference as universities ranked between the 200th and 500th places. Africa’s highest ranked university in 2021 was the University of Cape Town, in the 220th place.i And while the research budget of a premier academic research institution like Carnegie Mellon University’s Robotics Institute - at US $90 million in 2019 [Carnegie Mellon University, 2019] - is a fraction of that of the major industrial companies, it is orders of magnitude greater than that of any academic institution in Africa.
While world class research does take place at African institutions, African researchers lack the data, compute infrastructure, and engineering resources to develop and apply the more powerful and critical AI methods. Even for the world’s elite academic institutions and researchers, it is increasingly difficult to work at the frontier of AI research [Sample, 2017]. For example, OpenAI analysed the relationship between the availability of computational resources and 15 relatively well-known breakthroughs in AI between 2012 and 2018 [Amodei, et al. 2019]. Of the 15 developments examined, 11 were achieved by private companies while only four came from academic institutions. When that paper was published, the most recent of the major compute-intensive breakthroughs originating from academia was Oxford’s 2014 release of its VGG image recognition system. According to OpenAI, of the eight breakthroughs that occurred between 2015 and 2018, all came out of private companies.
An important step in developing AI in Africa is to prioritise the capabilities needed by scientists and researchers in academic and research institutions and industry. Governments in Africa already struggle to provide a range of basic infrastructure and other public services. Any discussion of AI must take place in the context of these broader priorities. In this broader context, there is also an urgent need to recognise the AI needs of civil society - given its role as promoter of good governance [Mlambo, et al., 2019] - and related efforts on social justice and human rights. Africa lacks the calibre of governance institutions found in more developed regions of the world, making the role of civil society even more critical [World Bank, 2020]. Strengthening the African AI workforce and infrastructure requires integrating the needs of civil society organisations with academia.
In terms of training and human capital development, universities are fortunate in that the field of AI involves many feasible options to rapidly upskill researchers. This is resulting in a paradigm shift for many in academia who are accustomed to building courseware (an example of what it takes to build and maintain high quality courseware for machine learning is given in Reddi (2021)). Forward thinking universities have steadily been moving towards what is called a ‘flipped classroom’. A flipped classroom format means that learners watch videos and complete in-depth assignments and online quizzes at home, then come to class for discussion sessions. The classes generally culminate in an open-ended final project, which the teaching team usually assists with. The university often uses previously developed high-quality Massive Open Online Courses as the core course material, and then focuses on supplementary domain-specific materials, projects and assignments. With this approach, students in developing countries can access courseware used at elite universities, at a fraction of the cost to the student and the university of the previous alternatives.
To deploy AI for science requires a range of new capabilities and leadership
New capabilities and leadership are needed if African research institutions are to harness new AI methods. Such capabilities do not exist in the absence of engineering personnel to prepare data, configure hardware, software and machine learning algorithms, as is the case in most of Africa. In addition, the ad hoc mix of campus computers and commercial clouds that Africa’s educators and researchers rely on today is inadequate.ii
Simply providing underserved academic and research organisations with the data, hardware, software and engineering resources is insufficient. To truly reduce the barriers to AI-enhanced research, underserved institutions need access to experts able to implement best practices in such areas as approaches to problems, methods of teaming, selection of tools for tasks, and optimisation of workflows.
An example is the development of AI-ready datasets. Although data in science is abundant, in many scientific areas sufficiently large datasets either do not exist and/or are not accessible in forms that permit the application of AI methods. Substantial effort is required to create new datasets, for instance in locating the data, cleaning it, aligning the schemas of disparate data, ensuring machine readability, and providing relevant metadata pertaining to issues such as data provenance, quality, and completeness. This expensive and error-prone process, which must be repeated for each analysis, not only becomes a barrier to using data, it also leads to problems of research reproducibility. Furthermore, privacy and security issues need to be addressed from the beginning, rather than after the fact, with integrated assurances and audit capabilities to advance research in the public interest.
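As a rough illustration of the schema-alignment and metadata steps mentioned above, the following Python sketch renames source-specific fields to a shared schema and records provenance and completeness for each record. The source names, field names and the `FIELD_MAPS` table are hypothetical and chosen only for illustration; they do not describe any actual Cirrus dataset or tool.

```python
# Hypothetical mappings: each source names the same quantities differently.
FIELD_MAPS = {
    "station_a": {"temp_c": "temperature_c", "ts": "observed_on"},
    "station_b": {"Temperature": "temperature_c", "Date": "observed_on"},
}


def align_record(source, raw):
    """Rename source-specific fields to the shared schema and attach provenance."""
    mapping = FIELD_MAPS[source]
    aligned = {target: raw[field] for field, target in mapping.items() if field in raw}
    # Note which shared-schema fields are absent so completeness can be audited.
    missing = sorted(set(mapping.values()) - set(aligned))
    aligned["_provenance"] = {"source": source, "missing_fields": missing}
    return aligned


def align_all(batches):
    """Align every record from every source into one list with a common schema."""
    return [align_record(src, rec) for src, recs in batches.items() for rec in recs]


if __name__ == "__main__":
    demo = {
        "station_a": [{"temp_c": 21.5, "ts": "2021-03-01"}],
        "station_b": [{"Temperature": 19.0}],  # date deliberately missing
    }
    for row in align_all(demo):
        print(row)
```

Keeping the mapping in one table, rather than scattered through analysis scripts, is one way to avoid repeating the alignment work for each analysis.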
Most tool development for datasets happens without public or inter-experiment collaboration in mind, which can lead to duplication of effort. Providing a data management platform to enable efficient AI development and sharing is a priority for Cirrus. Such a platform will enable users to store, manage, share and find data used to develop AI systems. This includes tracking of the data, versioning support for various data formats, and complete metadata, to allow for retraining and understanding models built from the data. Such a platform will drive advances in AI by enabling AI researchers to experiment with existing and new methods in new contexts, and benefit the disciplines in which the datasets are created.
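To make the ideas of tracking, versioning and findability concrete, here is a minimal Python sketch of a dataset catalogue that derives a version identifier from the content of the data itself. It is an assumption-laden toy, not a description of the platform Cirrus intends to build: the `register` and `find` functions and the metadata keys are hypothetical.

```python
import hashlib


def content_version(data: bytes) -> str:
    """Derive a short version identifier from the bytes of the dataset."""
    return hashlib.sha256(data).hexdigest()[:12]


def register(catalogue, name, data, metadata):
    """Record a dataset version; identical content is never stored twice."""
    version = content_version(data)
    entry = {"name": name, "version": version,
             "size_bytes": len(data), "metadata": metadata}
    versions = catalogue.setdefault(name, [])
    if all(existing["version"] != version for existing in versions):
        versions.append(entry)
    return version


def find(catalogue, **criteria):
    """Return names of datasets whose latest version matches all criteria."""
    return [name for name, versions in catalogue.items()
            if all(versions[-1]["metadata"].get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    catalogue = {}
    register(catalogue, "rainfall", b"day,mm\n1,2\n", {"domain": "earth observation"})
    register(catalogue, "rainfall", b"day,mm\n1,3\n", {"domain": "earth observation"})
    print(find(catalogue, domain="earth observation"))
    print([entry["version"] for entry in catalogue["rainfall"]])
```

Because the version is computed from the content, a model can be traced back to exactly the data it was trained on, which supports the retraining and reproducibility goals described above.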
For African academic and research institutions, moving forward on AI also requires efforts to significantly increase the scientific throughput that feeds AI systems. Governments, academic and research institutions in the region must redouble their efforts to generate more and better-quality data, and to make data accessible. The implementation of Findable, Accessible, Interoperable, and Reusable data principles, and the participation in a centralised set of standards for benchmark datasets in scientific domains are needed to govern data storage formats, access and metadata to reduce engineering overhead and lower the barriers to training and comparing model performance.iii A high priority must be to identify and leverage existing and potential scientific data-generating programmes to produce AI-ready data repositories. The liberating of data in a privacy-preserving manner must extend across science, from earth observation to healthcare. Doing so will not only support science but will also aid in the use of AI for a more diverse set of pressing social problems.
Also needed are research infrastructures and resources that can provide continuous data collection, into which AI is eventually integrated to allow for active learning, in which the AI system itself decides what data to collect next and supports dynamic control and decision making (an example is the case of automated experiments).
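The active learning loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: real systems estimate uncertainty from a trained model, whereas the sketch below swaps in a plain distance heuristic (the candidate furthest from anything already measured is treated as the most uncertain). The function names and the one-dimensional setting are assumptions made for the example.

```python
def uncertainty(x, measured):
    """Stand-in for model uncertainty: distance to the nearest measured point."""
    return min(abs(x - m) for m in measured)


def next_measurement(candidates, measured):
    """Choose the unmeasured candidate the system is least certain about."""
    pool = [c for c in candidates if c not in measured]
    if not pool:
        return None
    return max(pool, key=lambda c: uncertainty(c, measured))


def run_campaign(candidates, experiment, seed_points, budget):
    """Run seed experiments, then let the loop decide what to measure next."""
    results = {x: experiment(x) for x in seed_points}
    for _ in range(budget):
        x = next_measurement(candidates, results)
        if x is None:  # nothing left to measure
            break
        results[x] = experiment(x)
    return results


if __name__ == "__main__":
    # Hypothetical instrument: the "experiment" is just a function call here.
    outcome = run_campaign(list(range(11)), lambda x: x * x, [0, 10], budget=3)
    print(sorted(outcome.items()))
```

In an automated experiment, `experiment` would trigger an instrument rather than evaluate a function, and the uncertainty estimate would come from the model being trained.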
Cirrus and the AI Africa Consortium
Cirrus was initiated in 2017 out of a need to use AI in scientific research taking place between the synchrotrons and Wits University. The Wits University leadership then decided that Cirrus should benefit not only Wits, but all academic and research institutions in Africa.
Over the course of five years the legal groundwork has been laid to operationalise Cirrus and the AI Africa Consortium. Cirrus and the AI Africa Consortium are ambitious by African standards. The resources involved, plans and modus operandi of Cirrus and the AI Africa Consortium are described in the rest of this section. Some activities have already begun, including the rollout of machine learning for embedded devices. Full implementation of all the components of Cirrus and the AI Africa Consortium will commence following the placement of the Strategic Founding Partners (SFPs).
Cirrus
Cirrus is designed to provide data, dedicated compute infrastructure, and engineering resources at no cost to academic and research institutions, through the AI Africa Consortium.
Providing dedicated compute infrastructure will be an enormous contribution. Based solely on hardware costs, it is more cost-effective to own infrastructure when computing demand is close to continuous. Estimates comparing commercial cloud services to a dedicated high-performance computing cluster show that commercial cloud services are more expensive per compute cycle [Villa et al. 2020]. While the initial costs of subsidising cloud credits might be less than building public infrastructure, studies show that relying on commercial cloud services would likely be much more expensive in the long term [Wang et al. 2021].
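The reasoning behind the claim that ownership pays off when demand is close to continuous can be shown with simple arithmetic. The Python sketch below computes the utilisation at which renting and owning cost the same. Every figure in the example is hypothetical and is not taken from the cited studies, and the model ignores factors such as staffing, hardware refresh and cloud discounts.

```python
HOURS_PER_YEAR = 365 * 24


def cloud_cost(hours_used, price_per_hour):
    """Renting: pay only for the hours actually used."""
    return hours_used * price_per_hour


def owned_cost(purchase_price, annual_running_cost, years):
    """Owning: pay up front plus running costs, regardless of utilisation."""
    return purchase_price + annual_running_cost * years


def break_even_utilisation(purchase_price, annual_running_cost, years, price_per_hour):
    """Fraction of available hours at which owning and renting cost the same."""
    hours_available = years * HOURS_PER_YEAR
    total_owned = owned_cost(purchase_price, annual_running_cost, years)
    return total_owned / (price_per_hour * hours_available)


if __name__ == "__main__":
    # Hypothetical figures for a single accelerator over four years.
    u = break_even_utilisation(purchase_price=10_000, annual_running_cost=1_000,
                               years=4, price_per_hour=2.0)
    print(f"Owning is cheaper once utilisation exceeds {u:.1%}")
```

Above the break-even utilisation, each additional hour of use is effectively free on owned hardware but still billed in the cloud, which is why near-continuous demand favours dedicated infrastructure.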
Cirrus will help to attract corporate research (and associated venture capital activity) to Africa. For the private sector, participation will provide:
Returns as Limited Partners in the Cirrus FOUNDRY Fund.
Returns from equity ownership in startups within the Cirrus FOUNDRY.
Admission to the Cirrus Partner Programme, which is geared to support collaborative R&D as well as technology transfer and sharing.
The goal is that Cirrus will ultimately be owned, through equity, by around 15 to 25 multinational corporations, termed Strategic Founding Partners. The financial contribution from these corporations will be in the region of 7 to 20 million dollars each. The diversity in ownership should bring with it a diversity of research interests and help avoid AI research focused only on a narrow set of ideas and methods biased to the interests of a particular private sector participant. The research mission of Cirrus is also isolated from political influence, from changes in political administrations, and from politically appointed administrators. Allocation of Cirrus resources to the AI Africa Consortium will occur through a mix of peer review, lottery and equitable distribution criteria. As a private-sector entity, Cirrus is also not encumbered by the intellectual property constraints that ensnare research commercialisation efforts at publicly funded universities.iv This provides Cirrus with the flexibility to support a range of commercialisation options.
Cirrus has three components. First, Cirrus will house the cooperation programmes, the state-of-the-art computing and data infrastructure, engineering personnel and the open learning programmes. Second, the Cirrus FOUNDRY is equipped with everything needed to address the challenge of turning insights from scientific research into start-ups and eventually larger commercial applications. Third, the Cirrus FOUNDRY Fund is the in-house fund to support start-ups in the Cirrus FOUNDRY to reduce dependence on outside capital. The Cirrus FOUNDRY Fund has a target capitalisation of 35 million dollars and will undertake pre-seed and seed-stage investments.
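One way to picture an allocation that mixes peer review, lottery and equitable distribution is the Python sketch below. It is purely illustrative: the text does not specify how the three criteria are weighted or applied, so the 50/25/25 split, the rule that the top-scoring half of proposals share the merit pool, and the single lottery draw are all assumptions invented for this example.

```python
import random


def allocate(total_hours, proposals, shares=(0.5, 0.25, 0.25), seed=0):
    """Split compute hours across institutions using three hypothetical pools.

    proposals: list of dicts with 'institution' and 'review_score' keys,
    assumed here to contain one proposal per institution.
    """
    merit_share, lottery_share, equity_share = shares
    grants = {p["institution"]: 0.0 for p in proposals}

    # Peer review pool: the top-scoring half of proposals share it equally.
    ranked = sorted(proposals, key=lambda p: p["review_score"], reverse=True)
    winners = ranked[: max(1, len(ranked) // 2)]
    for p in winners:
        grants[p["institution"]] += total_hours * merit_share / len(winners)

    # Lottery pool: one randomly drawn proposal receives it (seeded for repeatability).
    lucky = random.Random(seed).choice(proposals)
    grants[lucky["institution"]] += total_hours * lottery_share

    # Equitable pool: every institution receives the same base share.
    for institution in grants:
        grants[institution] += total_hours * equity_share / len(grants)
    return grants


if __name__ == "__main__":
    demo = [{"institution": name, "review_score": score}
            for name, score in [("A", 9), ("B", 7), ("C", 5), ("D", 3)]]
    for institution, hours in allocate(1000, demo).items():
        print(institution, round(hours, 1))
```

The lottery and equitable pools guarantee that institutions with lower review scores still receive some resources, which is the purpose a mixed scheme of this kind would serve.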
The physical infrastructure and operations for Cirrus are to be housed at Wits University in Johannesburg, South Africa. Wits University was selected as the host institution for four primary reasons:
1. South Africa is the most scientifically advanced country on the African continent [Mouton, et al. 2019].
2. Wits is one of Africa’s leading academic research institutions.v
3. Wits is situated geographically in the highest concentration of economic, academic and research activity in Africa.
4. Wits has the land available to house the necessary infrastructure, including for energy generation and storage (construction of the physical infrastructure, such as the energy generation facilities and the high-performance computing centre, is yet to start).
The AI Africa Consortium
The AI Africa Consortium is currently establishing collaboration agreements with numerous parties in the African R&D ecosystem to help identify research priorities, to spread AI research resources, and to engage African research talent (for an overview of why consortia can catalyse open science see Cutcher-Gershenfeld (2017)). The Consortium aims to create a leading AI research capability by developing skills, recruiting researchers and other skilled personnel across Africa, and pairing these with the capabilities provided through Cirrus.
The Consortium will:
Help and encourage researchers to interact and collaborate beyond disciplinary or institutional silos.
Reduce redundancy of effort and cost as new research projects will not have to build capabilities or collect new data from scratch each time.
Accelerate discovery and improve reproducibility through sharing of data sets, metadata, models, software, hardware and other resources.
Reduce the cost for individual research programmes involved in integrating capabilities and/or comparing their work with others’.
Foster a co-design culture where teams of scientific users, engineers, and instrument providers can help to develop new and broadly applicable capabilities and tools.
Support a research ecosystem that understands the full context for AI solutions.
The organisational structure of the Consortium is set out in figure 1.vi
At the time of writing, the outstanding component of the schema is the appointment of the lead investment bank for the solicitation of the SFPs. Following the placement of the SFPs, the Partner, Affiliate and Co-development programmes will be rolled out.
Efforts already underway within the AI Africa Consortium include:
TinyML4D: The rollout of machine learning on embedded devices targeted at developing countries. It includes the provision of free hardware kits, workshops, courseware and a network of research and collaboration opportunities.vii
MLCommons: Fostering African participation in the development of science benchmarks, particularly those of relevance to African researchers.viii
Remote Excellence Fellowships: A remote internship system that allows talented Master’s students to connect with leading researchers in Europe.
Conclusion
Fundamental and applied research and development at academic and research institutions in Africa are at risk of marginalisation because resources essential to AI - compute, hardware, software, accessible data, and machine learning engineering - are out of reach. The growing imbalance in AI resources and innovation between Africa and the rest of the world requires an unprecedented response. The establishment of Cirrus and the AI Africa Consortium is one of Africa’s responses, aiming to help spread opportunity more widely, supporting students and researchers at universities and research institutions across Africa, activating the talent of researchers once they have access to AI infrastructure and other resources, and creating fertile ground for commercialisation through entrepreneurship.
For science in Africa, Cirrus and the AI Africa Consortium afford a major opportunity to develop and exploit AI techniques and methods. This will improve not only the efficacy and efficiency of science but also the operation and optimisation of scientific infrastructure (because system scale and complexity demand AI-assisted design, operation and optimisation).
The strengthening of science in Africa by AI methods will broaden global research agendas and elevate African research. To accomplish this Africa must also act collectively and collaborate to grow the scientific output needed to exploit the opportunities presented by AI. The goals described in this essay are challenging and the proposed solutions will require significant investment. However, the potential return on that investment is enormous: new types of data analysis, improved and even autonomous operations and performance of scientific instruments, innovative commercial products emerging from science, with even the potential for new industries, and an opportunity for Africa to become a producer of AI for science and not merely a consumer of the resulting breakthroughs.
References
Ahmed, N. and M. Wahed (2020), “The De-Democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research”. Retrieved from https://arxiv.org/pdf/2010.15581.pdf
Amodei, D. and D. Hernandez (2019), “AI and compute”, OpenAI Blog, May 16. Retrieved from https://openai.com/blog/ai-and-compute
Bajpai, P. (2021). Which Companies Spend the Most in Research and Development (R&D)?, Nasdaq Website, June 21. Retrieved from https://www.nasdaq.com/articles/which-companies-spend-the-most-in-research-and-development-rd-2021-06-21
Carnegie Mellon University. (2019). Hebert Named Dean of Carnegie Mellon's Top-Ranked School of Computer Science, Carnegie Mellon University Website, August 8. Retrieved from https://csd.cmu.edu/news/hebert-named-dean-carnegie-mellons-top-ranked-school-computer-science
Cirrus AI. (2022). Schematic view of the organisational layout of Cirrus and the AI Africa Consortium.
Cutcher-Gershenfeld, J. et al (2017), “Five ways consortia can catalyse open science”, Nature 543, 615–617. Retrieved from https://doi.org/10.1038/543615a
Mlambo, V. et al. (2019). Promoting good governance in Africa: The role of the civil society as a watchdog, Journal of Public Affairs, March 16. Retrieved from https://doi.org/10.1002/pa.1989
Mouton, J. et al. (2019). The state of the South African research enterprise. Retrieved from http://www0.sun.ac.za/crest/wp-content/uploads/2019/08/state-of-the-South-African-research-enterprise.pdf
OECD (2021), “Recommendation of the OECD Council concerning Access to Research Data from Public Funding”, OECD Website. Retrieved from https://www.oecd.org/sti/recommendation-access-to-research-data-from-public-funding.ht
QS World University Rankings. (2021). QS World University Rankings Website. Retrieved from https://www.topuniversities.com/university-rankings/world-university-rankings/2021
Reddi, J.V. et al. (2021), “Widening Access to Applied Machine Learning with TinyML”. Retrieved from https://arxiv.org/pdf/2106.04008.pdf
Rubiera, C. (2021). AlphaFold 2 is here: what’s behind the structure prediction miracle, Oxford Protein Informatics Group Blog, July 19. Retrieved from https://www.blopig.com/blog/2021/07/alphafold-2-is-here-whats-behind-the-structure-prediction-miracle/
Sample. (2017). The Guardian, November 1. Retrieved from https://www.theguardian.com/science/2017/nov/01/cant-compete-universities-losing-best-ai-scientists
South African Government. (2010). Intellectual Property Rights from Publicly Financed Research and Development Act: Regulations. Retrieved from https://www.gov.za/documents/intellectual-property-rights-publicly-financed-research-and-development-act-regulations-1
The Times. (2022). The Times Higher Education Emerging Economies University Rankings Website. Retrieved from https://www.timeshighereducation.com/world-university-rankings/2022/emerging-economies-university-rankings
World Bank. (2020). Worldwide Governance Indicators Website. Retrieved from https://info.worldbank.org/governance/wgi/
Villa, J. and D. Troiano (2020), “Choosing Your Deep Learning Infrastructure; The Cloud vs. On-Prem Debate”, Determined AI Blog, July 30. Retrieved from https://determined.ai/blog/cloud-v-onprem/
Wang, S. and M. Casado (2021), “The Cost of Cloud, a Trillion Dollar Paradox”, Andreessen Horowitz Website, May 27. Retrieved from https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap-cloud-lifecycle-scale-growth-repatriation-optimization/
Endnotes
i For the full list of rankings, see: QS World University Rankings. (2021).
ii For commentary on the engineering skills that went into developing AlphaFold 2, see: Rubiera. (2021).
iii For recommendations concerning access to research data from public funding, see: OECD. (2021).
iv For the regulations governing intellectual property rights from publicly financed research in South Africa, see: South African Government. (2010).
v See The Times Higher Education Emerging Economies University Rankings. (2022).
vi For a schematic view of the organisational layout of Cirrus and the AI Africa Consortium, see https://aiafrica.ac.za/
vii For information on TinyML4D, see http://tinyml.seas.harvard.edu/4D/
viii For information on the MLCommons Science Working Group, see https://mlcommons.org/en/groups/research-science/