Damo Academy, the research arm of Alibaba Group Holding, has introduced Southeast Asia LLM (SeaLLM), a specialized large language model (LLM) for Southeast Asian languages.
According to the company, SeaLLM was pre-trained on datasets comprising Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog, and Burmese, and it surpasses other open-source models on linguistic and safety tasks.
The introduction of Damo’s new LLM underscores the ongoing commitment of Chinese companies to the generative AI trend initiated by OpenAI’s ChatGPT last year.
Damo claims that SeaLLM outperforms other LLMs, including ChatGPT, in tasks involving languages written in non-Latin scripts, and that it can interpret and process non-Latin text up to nine times longer.
SeaLLM also exhibited superior results in translating between English and low-resource languages, such as Lao and Khmer, which have limited data for training conversational AI systems.
Bing Lidong, the director of the language technology lab at Damo, emphasized that SeaLLM can embrace the cultural richness of Southeast Asia, suggesting that the innovation is poised to empower communities historically under-represented in the digital realm.