“Sorry, I didn’t catch that.”
For millions of Africans, this robotic apology from Siri or Alexa isn’t just a minor annoyance.
When a common phrase like “No worry, e go better” gets transcribed as “No war eagle butter,” or the name “Chukwuebuka” becomes “Check wheelchair baker,” the promise of voice technology, the hands-free shortcut that makes life easier in the rest of the world, remains a frustrating mirage on the continent.
This week, Nigerian AI startup Intron fired its latest salvo aimed at closing that gap. With the launch of Sahara v2, the company claims it isn’t just catching up to Silicon Valley, but leapfrogging it, at least when it comes to understanding how Africa actually speaks.
But in a market suddenly crowded with similar offerings from the likes of Google and Toronto-based Cohere, the question is no longer just who has the best algorithm, but who will win the race to build the underlying infrastructure for Africa’s eventual billion voice users.
Intron’s new model is a significant technical feat. Trained on over 50,000 hours of audio from 40,000 speakers across 30 countries, Sahara v2 now supports 57 languages, including 24 new additions like Hausa, Swahili, Yoruba, and Zulu.

Unlike global models trained on pristine studio audio, Intron built its dataset in the wild, capturing the chaos of busy Nigerian clinics, Kenyan call centres, and South African courtrooms where background noise and overlapping speech are the norm.
The results, per the company’s benchmarks, are striking. Intron claims Sahara v2 performs 68.6% better than leading models like GPT-4 and Gemini on transcribing African names, organisations, and locations. In noisy environments, it boasts a 36.5% improvement in “hallucination robustness,” tech speak for how well a model resists making things up when it can’t hear clearly.
***
Yet the most telling feature is the debut of the world’s first bilingual Swahili-English automatic speech recognition (ASR) model, developed with Kenyan healthcare provider Penda Health. This model handles “code-switching,” the instinctive habit of bouncing between languages mid-sentence that defines everyday conversation across Africa’s urban centres. Global AI typically chokes on this; Intron is banking on it being its competitive moat.
“We built for the hardest environment first,” Tobi Olatunji, Intron’s CEO and a former physician, said during the launch, referencing the startup’s origin story in overstretched Nigerian hospitals.
But Intron’s timing is precarious. The window for being the only player focused on African linguistics is closing fast. Just weeks before Intron’s announcement, Toronto-based Cohere launched “Tiny Aya,” a suite of multilingual models supporting over 70 languages, specifically designed to run on local devices in regions with spotty infrastructure.
Similarly, Microsoft Research introduced Paza, an initiative that includes a benchmark for low-resource African languages, while Google dropped WAXAL, an open speech dataset covering 21 Sub-Saharan languages.
This flurry of activity validates Intron’s thesis, but it also threatens to commoditise it. If Google and Microsoft are releasing open data and benchmarks, the barrier to entry for other startups lowers, and the pricing power for incumbents erodes.
Intron is trying to stay ahead by going deeper into the “plumbing.” Sahara v2 is being deployed to cut transcription times in Ogun State courts in Nigeria and reduce patient documentation errors at C-Care hospitals in Uganda. For enterprises like ARM Investments, the draw is the ability to accurately transcribe complex financial jargon and Nigerian currency amounts that foreign models mangle.
***
Perhaps most critically for a continent concerned about data privacy, Sahara v2 now offers offline deployment via a partnership with Nvidia, allowing governments and sensitive industries to run the AI behind their own firewalls.
“We’ve seen significant improvement in transcription and summaries,” said Ayo Oluleye, Head of Data at ARM Investments, in a statement. Meanwhile, Audere’s CPO Sarah Morris noted the APIs achieved “99%+ success rates” on Southern African accents during testing.
Voice is widely seen as the next great interface for the internet, particularly in regions where literacy rates vary or typing in local languages is cumbersome. If AI cannot understand the user, the user remains locked out of the digital economy.
Intron is proving it can build a model that outperforms the giants on its home turf. But as the infrastructure for African language AI shifts from “if” to “how,” the real challenge will be whether a startup with a team of under 20 can outrun the data centres of Big Tech and the open-source armies of academia.