Alibaba Cloud has released open-source AI models, including a chatbot, with capabilities similar to those of OpenAI’s ‘GPT-4’ and Google’s ‘Bard’.
The release marks the first Chinese model that can give relevant answers to an image input. Baidu’s ‘Ernie Bot’ is also known to have multimodal capabilities, but its official release has been delayed by government approval issues.
The two AI models, ‘Qwen-VL’ and ‘Qwen-VL-Chat,’ can understand images and engage in complex conversations. Both are fine-tuned from ‘Tongyi Qianwen,’ the 7-billion-parameter large language model that Alibaba released in April.
In particular, Qwen-VL-Chat can compare multiple input images and perform tasks based on them, such as writing stories, generating images, and solving mathematical equations shown in photos.
For instance, imagine a foreign tourist who doesn’t speak Chinese visiting a hospital for treatment and taking a photo of the floor directory. They can then ask Qwen-VL-Chat something like, “Which floor is the orthopedics department on?” Based on the image information, Qwen-VL-Chat can provide a textual response.
Alibaba Cloud’s two AI models are currently available for free through the company’s ‘ModelScope’ repository, and they can be used for commercial purposes. The move is speculated to be aimed at expanding the company’s user base in light of the cloud division’s initial public offering. It comes as the Chinese tech giant gears up for a major reshuffle slated for September.