OpenAI GPT-4.1 models promise improved coding and instruction following

Tuesday April 15, 2025. 06:24 PM , from InfoWorld
OpenAI has announced a new family of models, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, which it says outperforms GPT-4o and GPT-4o mini “across the board.”

In conjunction with the launch of the GPT-4.1 family, OpenAI also announced that it is deprecating GPT-4.5 Preview in the API. GPT-4.5 Preview will be turned off completely on July 14, 2025, because GPT-4.1 offers similar or better performance for many functions at lower cost and latency, the company said.

OpenAI said that the new models have significantly larger context windows than their predecessors—one million tokens, compared to GPT-4o’s 128,000—and offer improved long-context comprehension. Output token limits have also been increased, from 16,384 in GPT-4o to 32,768 in GPT-4.1.
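To put those limits in perspective, here is a back-of-envelope check of whether a document fits the new window. This is a minimal sketch: the 4-characters-per-token heuristic and the sample document are assumptions for illustration, not OpenAI's figures; accurate counts require a real tokenizer.

```python
CONTEXT_WINDOW = 1_000_000   # GPT-4.1 context window (tokens)
MAX_OUTPUT = 32_768          # GPT-4.1 output-token limit
GPT4O_CONTEXT = 128_000      # GPT-4o context window, for comparison

def approx_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_gpt41(document: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the prompt plus reserved output stays within the window."""
    return approx_tokens(document) + reserved_output <= CONTEXT_WINDOW

# A ~1,000,000-character document: roughly 250,000 tokens.
doc = "word " * 200_000
print(fits_gpt41(doc))                       # fits GPT-4.1's 1M window
print(approx_tokens(doc) <= GPT4O_CONTEXT)   # but would overflow GPT-4o
```

The same document that overflows GPT-4o's 128K window leaves most of GPT-4.1's window unused, which is the headroom OpenAI is pointing to for long-document tasks.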

However, GPT-4.1 will be available only via the API, not in ChatGPT. OpenAI explained that many of the improvements have already been incorporated into the latest version of GPT-4o, and more will be added in future releases.

OpenAI says it worked in close partnership with the developer community to optimize the models to meet developers’ priorities. For example, it improved the coding score on SWE-bench Verified by 21.4 percentage points over that of GPT-4o.

Better at coding and complex tasks

The company specifically touts the performance of the GPT-4.1 mini and GPT-4.1 nano models.

“GPT‑4.1 mini is a significant leap in small model performance, even beating GPT‑4o in many benchmarks. It matches or exceeds GPT‑4o in intelligence evals while reducing latency by nearly half and reducing cost by 83%,” the announcement said. “For tasks that demand low latency, GPT‑4.1 nano is our fastest and cheapest model available. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding—even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.”

These improvements, OpenAI said, combined with primitives such as the Responses API, will allow developers to build more useful and reliable agents that will perform complex tasks such as extracting insights from large documents and resolving customer requests “with minimal hand-holding.”
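For readers unfamiliar with the Responses API mentioned above, the sketch below shows roughly what an agentic request might look like. The request shape follows OpenAI's published format, but the tool definition and task are hypothetical placeholders, and the payload is only constructed here, not sent.

```python
import json

# Hypothetical Responses API request for a document-analysis agent.
payload = {
    "model": "gpt-4.1",
    "input": [
        {"role": "user",
         "content": "Summarize the key obligations in the attached contract."}
    ],
    # Tool definitions let the model call back into application code
    # "with minimal hand-holding," in OpenAI's phrasing.
    "tools": [
        {
            "type": "function",
            "name": "lookup_clause",  # hypothetical helper, not a real API
            "description": "Fetch the full text of a numbered contract clause.",
            "parameters": {
                "type": "object",
                "properties": {"clause_id": {"type": "string"}},
                "required": ["clause_id"],
            },
        }
    ],
}

# In a real program this would be sent via the official SDK, e.g.:
#   client = openai.OpenAI(); client.responses.create(**payload)
print(json.dumps(payload, indent=2))
```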

OpenAI also said that GPT-4.1 is significantly better than GPT-4o at solving coding tasks agentically and at front-end coding, and that it makes fewer extraneous edits, follows diff formats more reliably, and uses tools more consistently, among other improvements.

It is also less expensive. The company said it costs 26% less than GPT-4o for median queries, and the prompt caching discount is increasing from 50% to 75%. Additionally, long context requests are billed at the standard per-token price. The models may also be used in OpenAI’s Batch API at an additional 50% discount.
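A back-of-envelope calculation shows how the 75% prompt-caching discount plays out, using the $2-per-million-input and $8-per-million-output rates cited later in the piece. This is a rough sketch with made-up token counts, not an official pricing calculator.

```python
INPUT_PER_M = 2.00     # USD per 1M input tokens (rate cited in the article)
OUTPUT_PER_M = 8.00    # USD per 1M output tokens
CACHE_DISCOUNT = 0.75  # cached input tokens billed at 25% of the input rate

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request, given the rates above."""
    fresh = input_tokens - cached_tokens
    cost = fresh * INPUT_PER_M / 1_000_000
    cost += cached_tokens * INPUT_PER_M * (1 - CACHE_DISCOUNT) / 1_000_000
    cost += output_tokens * OUTPUT_PER_M / 1_000_000
    return round(cost, 6)

# A 100K-token prompt, 80K of it served from cache, with a 4K-token reply:
print(request_cost(100_000, 4_000, cached_tokens=80_000))
```

With 80% of the prompt cached, the input side of the bill drops by more than half versus paying full price for all 100K tokens, which is the kind of saving OpenAI is highlighting for repeated long-context prompts.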

Analysts raise questions

However, Justin St-Maurice, technical counselor at Info-Tech Research Group, is looking askance at some of the claims.

“This announcement definitely brings up some questions, especially when it comes to efficiency, pricing, and scale,” he said. “If the 83% cost reduction is true, it could be a big deal, especially with major enterprises and cloud providers looking closely at value per watt. That said, it doesn’t mention what baseline or model this is being compared to.”

But St-Maurice still thinks that, despite the price reduction, the models are premium offerings.

“OpenAI’s focus on long-context performance and more efficient variants like mini or nano aligns with current conversations around MCP [Model Context Protocol] servers and agentic systems,” he said. “Being able to process up to a million tokens opens the door for more complex workflows and real-time reasoning, but the $2 per million input tokens and $8 per million output make it more of a premium offering, especially when compared to other options like Llama, which are increasingly being deployed for cost-sensitive inference at scale.”

That being the case, St-Maurice said, “if OpenAI can prove these cost and performance gains, then it will strengthen its position for efficient, scalable intelligence. However, for stronger enterprise adoption, they’ll need to be more transparent with practical benchmarks and pricing baselines.”
https://www.infoworld.com/article/3962966/openai-gpt-4-1-models-promise-improved-coding-and-instruct...

