Alibaba says its new AI model rivals DeepSeek’s R1, OpenAI’s o1

Friday, March 7, 2025, 02:43 AM, from ComputerWorld

Alibaba Cloud on Thursday launched QwQ-32B, a compact reasoning model built on its latest large language model (LLM), Qwen2.5-32B. The company says that, with only 32 billion parameters, the model delivers performance comparable to other large cutting-edge models, including Chinese rival DeepSeek’s R1 and OpenAI’s o1.

According to a release from Alibaba, “the performance of QwQ-32B highlights the power of reinforcement learning (RL), the core technique behind the model, when applied to a robust foundation model like Qwen2.5-32B, which is pre-trained on extensive world knowledge. By leveraging continuous RL scaling, QwQ-32B demonstrates significant improvements in mathematical reasoning and coding proficiency.”

AWS defines RL as “a machine learning technique that trains software to make decisions to achieve the most optimal results and mimics the trial-and-error learning process that humans use to achieve their goals. Software actions that work towards your goal are reinforced, while actions that detract from the goal are ignored.”
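
To make the trial-and-error idea in that definition concrete, here is a minimal sketch in Python of a toy "multi-armed bandit" learner. It is an illustration only and is not how Alibaba trained QwQ-32B. The function name `run_bandit`, the reward probabilities, and the epsilon-greedy exploration rule are all assumptions chosen for the example. The program tries actions, observes a reward of 1 or 0, and gradually favors the actions that paid off more often.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy learner (hypothetical example, not from the article).

    reward_probs[i] is the assumed chance that action i yields a reward of 1.
    Returns the learned value estimate and the pull count for each action.
    """
    rng = random.Random(seed)          # seeded so repeated runs behave the same
    values = [0.0] * len(reward_probs)  # running average reward per action
    counts = [0] * len(reward_probs)    # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action (the "trial" part).
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(reward_probs)), key=lambda a: values[a])

        # The environment returns a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental average: rewarded actions see their estimate rise,
        # unrewarded ones see it fall (the "reinforcement" part).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


# Three hypothetical actions; the third pays off most often.
values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("preferred action:", best)
print("times each action was tried:", counts)
```

In this sketch, the learner is never told which action is best. It works that out from rewards alone, which is the same basic principle, at a vastly smaller scale, that the definition above describes.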