China's Moonshot Launches Free AI Model Kimi K2 That Outperforms GPT-4 In Key Benchmarks
Tuesday, July 15, 2025, 12:50 AM, from Slashdot
The model's standout feature is its optimization for 'agentic' capabilities -- the ability to autonomously use tools, write and execute code, and complete complex multi-step tasks without human intervention. In benchmark tests, Kimi K2 achieved 65.8% accuracy on SWE-bench Verified, a challenging software engineering benchmark, outperforming most open-source alternatives and matching some proprietary models.

On LiveCodeBench, arguably the most realistic coding benchmark available, Kimi K2 achieved 53.7% accuracy, decisively beating DeepSeek-V3's 46.9% and GPT-4.1's 44.7%. More striking still: it scored 97.4% on MATH-500 compared to GPT-4.1's 92.4%, suggesting Moonshot has cracked something fundamental about mathematical reasoning that has eluded larger, better-funded competitors.

But here's what the benchmarks don't capture: Moonshot is achieving these results with a model that costs a fraction of what incumbents spend on training and inference. While OpenAI burns through hundreds of millions on compute for incremental improvements, Moonshot appears to have found a more efficient path to the same destination. It's a classic innovator's dilemma playing out in real time -- the scrappy outsider isn't just matching the incumbent's performance, they're doing it better, faster, and cheaper.

Read more of this story at Slashdot.
https://developers.slashdot.org/story/25/07/14/1942209/chinas-moonshot-launches-free-ai-model-kimi-k...