Insider Brief
- Alibaba has introduced the Qwen-Robot Suite, a set of robotics foundation models designed to help robots navigate environments, manipulate objects and predict how the physical world will respond to their actions.
- The suite includes Qwen-RobotNav for navigation and mobility, Qwen-RobotManip for robotic interaction and object handling, and Qwen-RobotWorld, a world model designed to anticipate the outcomes of actions across tasks such as manipulation, navigation and driving.
- Alibaba said the models can operate independently or together as a robotics software stack, with demonstrations including autonomous navigation, object manipulation, long-horizon task execution and a Chat2Robot interface that allows users to issue natural-language commands to robots.
Alibaba has introduced a new suite of robotics foundation models designed to help robots navigate, manipulate objects and predict how the physical world will respond to their actions.
“The Qwen family of foundation models already gives strong perception and reasoning about the physical world,” the company wrote in a blog post. “But seeing is not acting: the gap between vision and language understanding and physical control remains the central bottleneck for embodied intelligence.”
The release, called the Qwen-Robot Suite, consists of three models:
Qwen-RobotNav: A navigation model designed to help robots move through the physical world. Alibaba said a single model can handle tasks including following natural-language instructions, locating objects, tracking targets and driving autonomously.
Qwen-RobotManip: A manipulation model focused on physical interaction. Trained on more than 38,000 hours of robotics and human demonstration data, it is designed to help robots grasp, move and manipulate objects while transferring skills across different robot platforms.
Qwen-RobotWorld: A world model that predicts how environments will change in response to actions. Alibaba said the system allows robots to anticipate the consequences of their actions before they occur, supporting tasks such as manipulation, navigation and driving.
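To illustrate the general idea behind a world model, the toy sketch below predicts the outcome of each candidate action and picks the one whose predicted result lands closest to a goal. It is not Alibaba's code or API: `toy_world_model` and `pick_action` are hypothetical names, and the one-dimensional "state plus action" dynamics are an assumption made purely for illustration.

```python
from typing import Callable, Sequence

State = float
Action = float


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned world model.

    Assumption: the world is a single number and an action simply shifts it.
    A real world model would predict images or sensor readings instead.
    """
    return state + action


def pick_action(
    state: State,
    goal: State,
    actions: Sequence[Action],
    predict: Callable[[State, Action], State],
) -> Action:
    """Return the candidate action whose predicted outcome is closest to the goal."""
    return min(actions, key=lambda a: abs(predict(state, a) - goal))


if __name__ == "__main__":
    # The robot "imagines" each action before committing to one.
    best = pick_action(0.0, 0.9, [-1.0, 0.0, 0.5, 1.0], toy_world_model)
    print(best)
```

In this sketch the robot never executes a bad action to find out it was bad; it compares predicted outcomes first, which is the behavior the article attributes to world models in general.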
According to Alibaba, the three models are intended to address a central challenge in robotics: translating the perception and reasoning capabilities of large AI models into physical actions. While modern multimodal AI systems can understand images, language and spatial relationships, turning that understanding into reliable movement and object manipulation remains difficult.
Alibaba said the models can operate independently or be combined into larger robotic systems. The company also described the suite as a foundation for agentic robotics applications in which AI systems can plan tasks, navigate environments and carry out physical actions with limited human intervention.
As part of the release, Alibaba demonstrated the models performing tasks such as robotic navigation, object manipulation, autonomous exploration and long-horizon task execution. The company also introduced an experimental “Chat2Robot” interface that allows users to issue natural-language commands to a robot through a web browser.
Alibaba said the Qwen-Robot Suite is its first full robotics software stack, combining navigation, manipulation and world modeling under a common framework.
Feature image credit: Alibaba