Alibaba's Qwen team releases AI models that can control PCs and phones
techcrunch.com
Chinese AI lab DeepSeek might be getting the bulk of the tech industry's attention this week. But one of its top domestic rivals, Alibaba, isn't sitting idly by.
Alibaba's Qwen team on Monday released a new family of AI models, Qwen2.5-VL, that can perform a range of text and image analysis tasks. The models can parse files, understand videos, and count objects in images. They can also control a PC, similar to the model powering OpenAI's recently launched Operator.
According to the Qwen team's benchmarking, the best Qwen2.5-VL model beats OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash on a range of video understanding, math, document analysis, and question-answering evaluations.
Qwen2.5-VL, which is available to test in Alibaba's Qwen Chat app and to download from AI dev platform Hugging Face, can analyze charts and graphics, extract data from scans of invoices and forms, and comprehend videos that are several hours long, the Qwen team says. Per the team, Qwen2.5-VL can also recognize IPs (intellectual properties) from film and TV series, as well as a wide variety of products, suggesting that the models might have been trained in part on copyrighted works.
Because it was developed by a Chinese company, Qwen2.5-VL has certain restrictions on the topics it will discuss, at least in Qwen Chat. When I asked the largest and most capable Qwen2.5-VL model, Qwen2.5-VL-72B, to talk about Xi Jinping's mistakes, Qwen Chat threw an error message.
China's internet regulator benchmarks many models developed in the country to ensure their responses embody "core socialist values." Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as Taiwan's autonomy.
One of Qwen2.5-VL's more interesting features is its ability to interact with software on both PCs and mobile devices.
A video posted on X by Philipp Schmid, a technical lead at Hugging Face, shows Qwen2.5-VL launching the Booking.com app for Android and booking a flight from Chongqing to Beijing.
In another video, a Qwen2.5-VL model controls apps on a Linux desktop but doesn't seem to accomplish much beyond switching tabs. Perhaps tellingly, Qwen's own benchmarking shows Qwen2.5-VL scoring poorly on OSWorld, a benchmark that tries to mimic a real computer environment.
The two smaller, less sophisticated models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. The flagship Qwen2.5-VL-72B, however, is released under Alibaba's custom license, which requires companies and developers with more than 100 million monthly active users to request permission from Qwen/Alibaba before deploying the model commercially.