
SADiLaR - ENSURING A TRANSFORMED PARTICIPATIVE DIGITAL FUTURE FOR OUR OFFICIAL LANGUAGES

Prof Langa Khumalo, Chief Director, SADiLaR

The South African Centre for Digital Language Resources (SADiLaR), led by Chief Director Professor Langa Khumalo, is a national research infrastructure (RI) established in 2016 by the Department of Science, Technology, and Innovation. Fully operational since 2019 and hosted by North-West University, it serves as a hub for a network of research nodes across institutions. SADiLaR plays a pivotal role in supporting research and development in language technologies and humanities studies, focusing on all official South African languages.

SADiLaR facilitates the creation, management, and distribution of digital language resources, including freely available research software. Its stakeholders range from academic scholars and language professionals to businesses and industries benefiting from language technology advancements. The centre aligns with the constitutional imperatives and the Use of Official Languages Act to promote equal language development.

As a human-centred research infrastructure, SADiLaR contributes to a digitally transformed future by unlocking knowledge in all official South African languages. It also supports the implementation of the new Language Policy Framework for Public Higher Education Institutions. Through its work, SADiLaR ensures that South African languages gain relevance and accessibility in digital spaces, fostering inclusivity in academic and technological advancements.


SADiLaR’s Three (3) Core Programmes:

1. Digitization Programme – Develops digital text, speech, and multimodal resources for all official languages, alongside natural language processing tools for research and development. This ensures that indigenous languages are not only preserved but also modernized for technological integration, creating opportunities for linguistic diversity in digital platforms.

2. Digital Humanities Programme – Enhances research capacity in humanities and social sciences through digital data and innovative methodologies. Researchers are encouraged to adopt new tools and frameworks that make language analysis and preservation more efficient and adaptable.

3. Higher Education Sector Support Programme – Provides targeted support to universities by facilitating resources that advance multilingualism and digital access to language content. By fostering collaboration between academia and technology, SADiLaR helps bridge the gap between traditional language studies and modern digital tools.

Language Resources and Technologies

SADiLaR curates, develops, and distributes language resources such as:

• Electronic text and speech data (word lists, dictionaries, translation memories, multilingual corpora)

• Multimodal resources to support language diversity in the digital realm

• Tools and platforms for processing and developing new language technologies, allowing researchers and educators to integrate linguistic advancements into various applications

SADiLaR also hosts a repository of freely accessible tools, including text analysis platforms, corpus portals, machine translation services, computational morphology demonstrators, and spelling checkers for South African languages. These technologies support researchers, language practitioners, and institutions in their efforts to digitize and develop indigenous languages for contemporary digital environments.

PRESERVING LANGUAGES

Open, Free, and Accessible Knowledge for All

The SADiLaR-Wikipedia-PanSALB (SWiP) project is a collaboration between SADiLaR, Wikimedia South Africa (Wikimedia ZA), and the Pan South African Language Board (PanSALB). This initiative enhances the presence of South African languages on Wikipedia, equipping language communities with skills to create, edit, and manage digital content. Practical workshops across universities have significantly increased digital authorship in underrepresented languages.

Since its September 2023 launch, SWiP has notably advanced isiNdebele’s presence on Wikipedia. Previously confined to the Wikipedia Incubator due to a lack of contributors, isiNdebele is now officially on Wikipedia’s main platform. Within a year, the number of isiNdebele editors grew from five to 30, leading to over 140 articles in the language. This initiative is part of a broader strategy to expand language access across digital spaces, ensuring that all South African languages are well-represented in global online knowledge repositories.

By strengthening online representation, SWiP enables linguistic communities to take ownership of their digital heritage. It also provides an opportunity for students and researchers to develop new linguistic data, contributing to academic scholarship while enhancing open-access knowledge.

Phases of the SWiP Project

Phase One (Sep 2023 - Oct 2024): Conducted workshops across six regions and eleven universities, training participants in Wikipedia authorship. A writing competition from July to August 2024 further incentivized contributions. This phase helped create momentum for the long-term sustainability of South African language representation on Wikipedia.

Phase Two (Nov 2024 - Oct 2027): Expands on previous efforts by introducing Train-The-Trainer sessions, targeted edit-a-thons, and tailored language support. The goal is to create a sustainable network of editors dedicated to developing official South African languages on Wikipedia.

Train-The-Trainer sessions will empower participants to conduct Wikipedia authorship workshops independently, ensuring long-term sustainability. Regular edit-a-thons will further engage contributors in refining and expanding content. This phase aims to develop a culture of digital language advocacy where indigenous languages are maintained and developed by their respective linguistic communities.

Additionally, SWiP aims to introduce collaborative partnerships with educational institutions, encouraging digital authorship in curriculum frameworks. This strategic approach not only supports language preservation but also promotes digital literacy and research.

Significant Impact on Official Languages Representation on Wikipedia

The SWiP project has trained 318 participants, resulting in 638 new articles and 2,730 edited articles across various languages. Participants have added over 291,000 words, 1,830 references, and 122 images (hosted on Wikimedia Commons), leading to approximately 22.9 million article views within a year.

These figures demonstrate the project's far-reaching impact in promoting and preserving South African languages in digital spaces. By developing content in languages historically underrepresented in digital environments, SWiP ensures that these languages are not only documented but are also actively used in knowledge-sharing platforms worldwide.

A Collaborative Effort in Digital Language Preservation

Wikimedia ZA President Bobby Shabangu highlighted how SWiP enhances the digital presence and credibility of South African languages.

The PanSALB CEO praised the initiative, emphasizing the importance of language development through active usage. PanSALB remains committed to supporting the growth of indigenous languages, aligning with national goals for multilingualism and digital accessibility.

Language digitization efforts like SWiP demonstrate how collaborative initiatives can bridge the digital divide and bring historically marginalized languages into mainstream digital spaces. Digital transformation in language preservation is not just about archiving languages but actively making them functional and widely available for everyday use.

SWiP represents a landmark collaboration in digital language preservation, fostering language nativism and preventing digital language extinction. By securing the digital future of isiNdebele and other official languages, it ensures continued visibility and accessibility on global digital platforms.

In addition to Wikipedia representation, SADiLaR is continuously working on expanding digital language tools that enhance the accessibility and usability of South African languages in various contexts. By integrating digital resources, artificial intelligence applications, and linguistic research, SADiLaR is paving the way for a multilingual digital future that supports inclusivity and accessibility.


Looking Ahead: The Future of Digital Language Research

SADiLaR remains committed to fostering digital participation in all South African languages. Through continued partnerships, technological innovation, and community-driven engagement, the organisation envisions a future where South African languages are fully integrated into the digital landscape. This commitment ensures that all official languages contribute meaningfully to education, business, and global knowledge-sharing platforms.

Language is an essential part of cultural identity, and through SADiLaR's work, these identities are preserved, adapted, and evolved within modern digital spaces. The success of the SWiP project is a testament to the power of collaboration in language preservation and the growing importance of digital literacy in contemporary society.

For more details, visit SADiLaR or contact info@sadilar.org.

Learn more about SWiP here: www.sadilar.org/en/swip/

Contact details for more information:

South African Centre for Digital Language Resources (SADiLaR), North-West University, South Africa

Juan Steyn

Tel: +27 18 285 2750

Email: info@sadilar.org

Physical address:

Building A7, North-West University, Potchefstroom Campus, Potchefstroom, South Africa

Postal address:

SADiLaR, Internal Box 340, Private Bag X6001, Potchefstroom, South Africa, 2520
