While our long-term vision is synthetic intelligence, we are currently working on critical infrastructure for African languages. We believe these technologies must be built with communities, not extracted from them.
We are collaborating with partners across Rwanda, South Africa, and Eswatini to build a large-scale, open speech dataset for siSwati, a language spoken by millions but dramatically underrepresented in AI systems.
siSwati (also known as Swazi) is a Bantu language of the Nguni group, spoken primarily in Eswatini and South Africa. Despite being a national language with millions of speakers, it remains severely underrepresented in speech recognition technology.
"We don't just collect data; we build capacity. Local researchers and communities are partners in every stage of our work."
All speech datasets released under permissive licenses. Free to use for research, education, and commercial applications.
Training scripts, evaluation frameworks, and preprocessing tools available on GitHub. Fully documented and reproducible.
Standardized evaluation metrics for African language ASR. Enabling fair comparison and progress tracking.
"All datasets, code, and benchmarks we create will be released openly under permissive licenses. We believe African language technologies must be built with communities, not extracted from them."
Standardized automatic speech recognition benchmarks for African languages. Enabling researchers worldwide to measure progress and compare models.
Coming 2026: Comprehensive evaluation tools that account for linguistic diversity, dialectal variation, and real-world usage patterns.
Coming 2026: Fine-tuned ASR models for under-resourced African languages, built on open foundations and freely available.
Coming 2027: Expanding beyond siSwati to other under-resourced African languages, prioritized by community needs and partnership opportunities.
Ongoing: Academic Partner
Rwanda NLP Initiative
South Africa Partners
Interested in partnering with us? Get in touch.
Follow our journey as we build open infrastructure for African languages. We share updates on datasets, benchmarks, and research findings.