India launches AI app to save its many tribal languages before they vanish
Sarna tribespeople with traditional weapons at a rally in Jharkhand.
PHOTO: JANAJATI SURAKSHA MANCH
- India launched Adi Vaani, an AI translation platform for tribal languages like Gondi and Santali, to bridge the digital gap.
- The initiative aims to preserve endangered languages by improving access to online information and government services.
- Experts believe that while promising, Adi Vaani needs enhanced accuracy and more focus on local language education.
AI generated
NEW DELHI – For more than a decade, Mr Sugdu Potai taught Gondi at a government primary school in central India. But over the past year, the 50-year-old’s command of the Dravidian language, spoken by around three million people across various parts of India, has been put to the test.
Mr Potai has had to translate complex Hindi sentences and find Gondi equivalents for words such as “discipline” and “set-up”, which are now more commonly expressed by Gondi speakers in Hindi.
With no exhaustive dictionary on hand, he had to dig deep into his memory and even ask community elders for the Gondi equivalents of such words.
“It was challenging and very different from teaching children,” said Mr Potai.
His efforts are part of a ground-breaking government initiative to use artificial intelligence (AI) to protect and promote India’s tribal languages – many of which risk being wiped out by dominant languages – while also helping tribal language speakers better connect to the wider online world.
India is one of the world’s most linguistically diverse countries, with over 2,800 languages and dialects. Yet it has lost about 250 languages since 1961.
Experts fear that as many as 400 – most of them spoken by marginalised and tribal groups – could vanish within several decades.
On Sept 1, the Ministry of Tribal Affairs launched the beta version of Adi Vaani, India’s first AI-powered translation platform for tribal languages, to secure a more diverse linguistic future for the country. It currently offers real-time text and speech translation between Hindi, English and four tribal languages: Santali, Bhili, Mundari and Gondi.
Even though these languages are spoken by millions across different parts of the country, including in central and eastern India, they remain grossly underrepresented online. A similar service for the Kui and Garo languages is also in the works.
Adi Vaani’s services are currently available on its website and through an app that can be downloaded from Google Play.
This real-time translation tool is expected to help tribal communities better access online knowledge in their own languages and become more equal participants in the country’s growing digital economy.
While Adi Vaani’s translations still need to be vastly improved, the platform has been welcomed as an important first step in revitalising the country’s marginalised languages for an increasingly digital world.
Adi Vaani’s creators hope to make it easier for tribal communities to access government services, including healthcare, in their own languages, and to support mother tongue-based learning in schools by helping to create bilingual primers and digital resources.
“Instead of forcing them to learn Hindi or English, we want to communicate with them in their own language,” said Professor Radhika Mamidi, a computational linguist from the International Institute of Information Technology (IIIT) Hyderabad, one of several partners that created Adi Vaani.
Another key goal is cultural preservation. An AI tool such as Adi Vaani can prove useful in transcribing as well as translating the rich repertoire of tribal folklore, much of which exists in oral form, ensuring it is documented for posterity.
Stemming language loss with AI
Language loss has been a growing concern in India. It has been caused by various factors, including a lack of government patronage, inadequate primary education in local languages, and the shift of speakers towards dominant tongues like Hindi for economic and other purposes.
With Adi Vaani, the hope is to help reverse this trend. For more than a year, four leading Indian technical education institutions, including IIIT Hyderabad, have worked with tribal research institutions and local language experts such as Mr Potai to create datasets to train the AI language models that underpin Adi Vaani.
IIIT Hyderabad, which developed the Santali feature of Adi Vaani, collaborated with Santali translators in the eastern Odisha state who translated an estimated 100,000 Hindi sentences into their language. Another 100,000 sentences in Santali were taken from freely available sources online.
The institute, which has an extensive background in developing machine translation systems for Indian languages such as Hindi and Telugu, used this Santali corpus of around two million words to develop the initial language model of Adi Vaani’s Santali feature.
This was further refined through feedback from translators.
“It is like a baby learning a language,” Prof Mamidi told The Straits Times. “You have to build and then give more and more data to fine-tune it.”
Similar efforts have been undertaken overseas. In New Zealand, for example, charitable media organisation Te Hiku Media has developed AI tools that transcribe Te Reo Maori, the country’s indigenous language, give pronunciation feedback, and turn text into speech.
Unlike major Indian languages such as Hindi or Bengali that have been mastered relatively well by machine learning models, the tribal ones that Adi Vaani targets are “low-resource languages”.
These are languages with insufficient digital data, linguistic resources and computational tools – all essential for developing natural language processing applications and machine learning models.
As part of its effort to build a diverse and inclusive tech ecosystem that addresses India’s linguistic diversity, the government launched the Bhashini project in 2022 to develop datasets to train AI models for Indian languages, including low-resource ones.
Bhashini has so far built AI models and translation services in more than 22 languages.
Services such as ChatGPT claim to offer limited Santali and Mundari translation, but their results are riddled with inaccuracies.
Foundational models
Indian Institute of Technology Bombay’s Professor Pushpak Bhattacharyya, who contributed to the Bhashini project until his death on Oct 5, told ST in July that India’s cultural and linguistic diversity is not captured by large language models.
“So we have to build our own foundation models or take the large language models and make them Indian culture-sensitive,” he said.
However, this is no easy task, especially for low-resource languages that often require translators and tech experts to create large usable datasets from scratch and keep fine-tuning them. Unsurprisingly, Adi Vaani remains a work in progress, with versions that deliver more accurate translations expected in the coming months.
Ms Gayatri Netam, joint director of Chhattisgarh’s Tribal Research and Training Institute, which helped develop the Gondi feature of Adi Vaani, said its accuracy in the language remains low. Wrong translations are produced as often as seven times out of 10. “A lot of work has yet to be done,” she added.
Adi Vaani will also have to incorporate the linguistic diversity of the four tribal languages it currently covers.
For instance, Gondi and Santali are spoken in various states, where they have morphed into distinctly different speech forms under the influence of dominant local languages. But the platform is currently based on only one chosen variant of these languages.
Professor Udaya Narayana Singh, a linguist and former director of the government-run Central Institute of Indian Languages, noted that significant expert human intervention is needed to create usable datasets for smaller languages in India before AI can meaningfully help protect India’s marginalised languages.
The 2011 census records as many as 2,843 “rationalised mother tongues”, including hundreds of languages that are each spoken by just a few thousand people.
“A substantial number of India’s mother tongues are spoken by fewer than 5,000 individuals, and they lack prior linguistic documentation, recorded data or translated government materials,” noted Prof Singh.
He emphasised that this absence of resources poses an “almost insurmountable challenge” for developing AI-driven language technologies.
Advancing computational linguistics must also go hand in hand with strengthening linguistic anthropology, he said, so that India can train specialists to document and analyse its marginalised languages through a fresh linguistic survey.
And it is not just AI that needs to be leveraged. Mr Hercules Singh Munda, who in 2021 launched TriLingo, a language learning platform for tribal languages, told ST that even simpler technologies have yet to be fully utilised in efforts to protect tribal languages and cultures.
For instance, the multi-volume Encyclopaedia Mundarica – an exhaustive work that documents the Munda people’s language and culture – is accessible online only as a scanned version, not in a searchable, user-friendly digital format.
“I cannot just do Ctrl + F and type ‘mango’ to find it, and that makes it unusable. There’s zero utility value,” said Mr Munda, a native Mundari speaker.
Mundari is an Austroasiatic language spoken by over 1.5 million Munda tribespeople across the eastern Indian states of Jharkhand, Odisha and West Bengal, as well as in parts of Bangladesh.
More importantly, protecting marginalised languages also requires the central as well as state governments to contribute greater resources to ensure these languages are taught in schools – something that has so far failed to gain countrywide traction.
Dr Ajit Munda, an assistant professor at the Ram Lakhan Singh Yadav College in Ranchi in Jharkhand state, welcomes efforts such as Adi Vaani, to which he contributed as a Mundari language expert.
But he said the government must also ensure Jharkhand’s tribal languages are taught in the state’s schools.
“If we are able to inculcate an interest in the importance of one’s mother tongue right from childhood, then they will surely not abandon it later,” added Dr Munda.

