$70m S’pore AI initiative to develop first large language model with South-east Asian context


SINGAPORE – Singapore is pioneering the development of an artificial intelligence (AI) model trained to understand and generate human language that incorporates the diverse cultures and languages of South-east Asia.

This large language model (LLM) could eventually be used as the basis of various text-to-speech or text-to-image generative programmes.

The model is meant to fill a gap, as most LLMs originate in Western countries, whose cultures, values and norms differ from those of Singapore and the rest of the region.

Well-known programmes based on LLMs include OpenAI’s ChatGPT, Microsoft’s Bing Chat and Google’s Bard.

The Infocomm Media Development Authority (IMDA) is partnering AI Singapore and the Agency for Science, Technology and Research (A*Star) to run the National Multimodal LLM Programme over the next two years.

This $70 million initiative, funded by the National Research Foundation, will contribute to Singapore’s capabilities in AI research and innovation, the three agencies said in a joint statement on Dec 4.

The programme is also in line with Singapore’s National AI Strategy 2.0, which was launched by Deputy Prime Minister Lawrence Wong at the Singapore Conference on AI held at The Ritz-Carlton, Millenia Singapore hotel on Dec 4.

The LLM project aims to build up skilled AI talent in Singapore by giving local researchers and engineers funding and access to high-end computing. It also seeks to foster a thriving AI industry that develops LLM-enabled solutions to raise productivity and create new opportunities for businesses.

It will also help Singapore build a trusted environment for the use of AI by deepening understanding of how LLMs work and furthering research in AI governance, said the agencies.

Dr Ong Chen Hui, assistant chief executive of IMDA’s Biztech group, said: “This national effort underscores Singapore’s commitment to become a global AI hub. Language is an essential enabler for collaboration. 

“By investing in talent and investing in large language AI models for regional languages, we want to foster industry collaboration across borders and drive the next wave of AI innovation in South-east Asia.”

The initiative will build on the early outcomes of AI Singapore’s South-east Asian Languages in One Network (Sea-Lion) model, an open-source LLM that better represents the region’s cultural contexts and linguistic nuances than Western-developed models.

Sea-Lion was designed to be smaller, faster and more flexible than the LLMs commonly used today, making it a relatively inexpensive and efficient alternative.

The new national initiative will scale Sea-Lion up to 30 billion to 50 billion parameters and extend it into a multimodal speech-text model.

Better-known LLMs are much larger. GPT-4, created by OpenAI, reportedly has about 1,700 billion parameters, although OpenAI has not disclosed the figure, while Llama-2, created by Meta, has up to 70 billion parameters.

The Sea-Lion model was trained on 11 languages used in the region, including English, Chinese, Indonesian, Malay, Thai and Vietnamese.

The three agencies said the new LLM will enable users “to understand the context and values related to the diverse cultures and languages of South-east Asia, for example, managing context-switching between languages in multilingual Singapore”.
