Evolving the Windows AI platform

Thursday, May 22, 2025, 11:00 AM, from InfoWorld
In May 2024, Microsoft announced the Windows Copilot Runtime. Targeted at its then-new Copilot+ PCs, it provided a mix of innovations: on-device small language models, improvements to Windows’ support for NPU (neural processing unit) hardware, new artificial intelligence APIs for Windows applications, and an Arm port of a popular AI development and tuning tool. Delivering all of that took much of the past year, with the last of the key features, the AI APIs, finally reaching a stable release.

The Windows Copilot Runtime is a loose collection of tools for building Windows AI applications that run on your PC, using Microsoft’s Phi Silica small language model and a collection of other models that offer computer vision and audio services. There’s no need to use expensive cloud-based models; Windows Copilot Runtime applications are intended to use only local resources. Hence the requirement for dedicated AI accelerator hardware capable of at least 40 trillion operations per second (TOPS).

With that first release, those capabilities were limited to a small subset of PCs. Now, they’re pretty much standard. Intel’s and AMD’s latest generations of silicon provide the same NPU capabilities as the Arm chips used in the first Copilot+ PCs, and the devices are now a lot more affordable: The recently launched Surface devices use a cheaper NPU-equipped Qualcomm chipset and start at $799.

If there’s one thing we’ve learned in the past few years of the AI transition, it’s that a year is a very long time. New technologies have arrived that make it easier to ground AI in real-world data, and the number of open source models that are endpoint-ready has grown dramatically. Microsoft is bundling new small language models in new releases of Windows, though not always with developer access.

With low-cost PCs now capable of developing, tuning, and running complex AI models, Microsoft needs to bring the various components of Windows AI development into the same type of platform as Azure’s unified AI Foundry and make them easy to build into your Windows development toolchain.

Introducing Windows AI Foundry

One year later, Microsoft is evolving Windows Copilot Runtime into a new set of tools that delivers an AI development platform (and rebranding it as Windows AI Foundry). At the same time, Microsoft is adding new abstractions to DirectML to make it easier to run models on a wider range of PCs (and rebranding it as Windows ML). A key component is Foundry Local, an AI runtime that downloads and runs the right AI models for your hardware. Foundry Local is now in preview on Windows and macOS.

Unlike the Windows Copilot Runtime, Windows AI Foundry is no longer focused on Copilot+ PCs. Instead, it builds on top of the new Windows ML platform to support AI inferencing on CPU, GPU, and NPU. For now, Microsoft will still provide some Copilot+ PC-specific APIs through the Windows App SDK. Windows AI Foundry adds the new Foundry Local application, a CLI tool for managing available models, with support for model catalogs beyond Microsoft’s own.

The aim is to make it easier for developers to build AI applications in Windows and for users to install and use them on as wide a range of PCs as possible. Microsoft will be rolling out additional Windows AI APIs in the Windows App SDK for devices with NPUs that ship with its own AI inbox models. As Jatinder Mann, partner director of product, Windows platform and AI runtime, told me, “We’re bringing the full power of AI cloud to client. You can think of it as a system for running, customizing, and building AI models directly on Windows at scale.”

One key new feature in Windows AI Foundry is support for the LoRA (low-rank adaptation) tuning algorithm for the built-in Phi Silica model. You will be able to use the AI Toolkit in Visual Studio Code for tuning. There is also a way to quickly add and evaluate existing LoRA adapters for Phi Silica in the latest version of the growing AI Dev Gallery application from the Windows Store. Tools like this are an important part of giving developers the skills needed to tune AI models, ensuring that results are closer to what applications need to deliver. Tuning is a useful way to add further grounding to applications, reducing the risk of hallucinations by using your own data to direct outputs.
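Microsoft’s AI Toolkit drives that tuning workflow for Phi Silica, but the technique itself is worth seeing in miniature. The sketch below uses the open source Hugging Face peft library against a publicly downloadable Phi model as a stand-in (Phi Silica itself isn’t distributed this way), so the model name and hyperparameters are illustrative assumptions, not Microsoft’s recommended settings.

```python
# Illustrative only: generic LoRA setup with the open source peft library,
# not Microsoft's AI Toolkit workflow. The base model is a public stand-in
# for Phi Silica; hyperparameters are placeholder assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct")

# LoRA freezes the base weights and trains small low-rank adapter matrices
# alongside them, so only a fraction of a percent of parameters are tuned.
config = LoraConfig(
    r=16,                         # rank of the adapter matrices
    lora_alpha=32,                # scaling applied to the adapter output
    target_modules="all-linear",  # adapt every linear layer; a simple safe default
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # confirms how little is actually trained
# ...train on your own data, then save just the small adapter:
model.save_pretrained("./phi-lora-adapter")
```

The adapter saved on the last line is tiny compared to the base model, which is why tools like AI Dev Gallery can add and swap adapters without touching Phi Silica itself.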

In line with Azure’s adoption of the Model Context Protocol (MCP), Windows will be adopting MCP to integrate AI actions with applications. This will give your AI applications the ability to quickly use on-PC applications as agents and build them into intelligent workflows. Applications will be able to expose specific features as App Actions, which can quickly become MCP servers, providing new building blocks for custom workflows.

Using Foundry Local

I tested Foundry Local on both an x64 PC with an Nvidia GPU and an Arm PC with a Qualcomm NPU. First, I downloaded Phi 4 Mini on both devices. This ran using the GPU on the Nvidia device and the CPU on the Arm machine. It’s important to note that Foundry Local profiles your PC hardware and automatically downloads the best version of a model. If you want to see the current library of models, use the foundry model list command, which shows which ONNX runtimes are supported for each model. For this first test, it turned out that the model didn’t support NPU inferencing.

The list of models shows what is supported on the device you’re using. So, to see what has Qualcomm NPU support, you need to run the list on an Arm device. This showed that Phi-4-mini-reasoning had NPU support, and I was able to use the NPU performance monitor in Task Manager to confirm it was using the accelerator when I gave it a prompt through Foundry Local. Not all models have been optimized for all NPUs, and what is available may differ from architecture to architecture. Hopefully more models will be available for NPU inferencing soon.

The local prompt is an effective way to get started, but Foundry Local is more useful through its own SDK or the OpenAI SDK. It exposes a REST endpoint for your model that you can write code against, making the same REST calls to the local endpoint as you would to a cloud endpoint.
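As a concrete example, here’s a minimal sketch in Python, assuming Microsoft’s foundry-local-sdk package and the OpenAI client library; the model alias is whatever you’ve pulled from the catalog, and the exact SDK surface may shift while Foundry Local is in preview.

```python
# Minimal sketch: chat with a Foundry Local model through the OpenAI SDK.
# Assumes `pip install foundry-local-sdk openai`; APIs may change in preview.
import openai
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"  # Foundry Local resolves this to the best build for your hardware
manager = FoundryLocalManager(alias)  # starts the local service and loads the model

# The manager hands back a local OpenAI-compatible endpoint; no cloud calls.
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "In one sentence, what does an NPU do?"}],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire protocol, moving this code between a cloud model and a local one is largely a matter of changing the base URL and model name.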

Microsoft has finally delivered the tools we need to build local AI applications. Foundry Local manages the models on our PCs (both for development and for users), providing the necessary APIs to move code from the cloud onto local hardware, taking advantage of dedicated inferencing accelerators and removing load from cloud data centers.

A foundation in Windows ML

Having Windows ML as a common framework for inferencing is another key part of this new Windows AI development platform. It provides a standard way to host and call ONNX models, using the most appropriate runtime environment for the PC that’s handling inferencing. Your code can use the Foundry Local SDK and CLI to check the available model type for your application, download it, and keep it up to date while Windows ML ensures that it runs. You can think of it as the foundation for Windows AI Foundry: Everything sits on top of it, and without it, nothing works.
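Windows ML’s own application APIs ship through the Windows App SDK, but the idea underneath, ONNX Runtime choosing an execution provider to match the silicon, is easy to see with the standard onnxruntime Python package. A rough sketch, with the model path as a placeholder:

```python
# Rough sketch of execution-provider selection with the standard onnxruntime
# package; Windows ML layers device selection and model management on top of
# this idea. "model.onnx" is a placeholder path.
import onnxruntime as ort

# Prefer an NPU, then a GPU, then fall back to the CPU, keeping only the
# providers that this machine's onnxruntime build actually offers.
preferred = [
    "QNNExecutionProvider",  # Qualcomm NPUs
    "DmlExecutionProvider",  # DirectML on GPUs
    "CPUExecutionProvider",  # always available
]
available = ort.get_available_providers()
session = ort.InferenceSession(
    "model.onnx",
    providers=[p for p in preferred if p in available],
)
print(session.get_providers())  # shows which providers were actually loaded
```

Windows ML’s promise is that this fallback logic, along with fetching the right execution provider in the first place, stops being your application’s problem.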

Foundry Local isn’t the only model library supported by Windows AI Foundry; it works with public catalogs such as Ollama and Hugging Face as well. Keep in mind that Microsoft has done the work to optimize its own library of Foundry Local models for you, and you may need to do additional work to get the right ONNX implementations of other models in place.
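As an example of that extra work, the open source optimum package is one way (among several) to export a Hugging Face model to ONNX; the model id below is just a small, export-friendly example, and NPU-specific quantization would be a further step.

```python
# One way to produce an ONNX build of a Hugging Face model: the open source
# optimum package exports it and runs it on ONNX Runtime. The model id is a
# small example; NPU-specific optimization is a further step not shown here.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # example; substitute any export-supported causal LM
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("ONNX Runtime says:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```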

The current preview of Foundry Local installs as a standalone application. However, the plan is for it to become part of Windows, ensuring that AI applications written to use its SDKs and APIs can run without requiring users to learn how to use WinGet. This will make Windows ML part of the platform and simplify running models from the Foundry Local cache. Once Foundry Local is a Windows inbox application, there will be no need to manage execution providers; they will be selected and downloaded as needed, along with the latest versions of models.

Targeting Windows ML should help future-proof your code. As new CPUs, GPUs, and NPUs roll out, it will provide a common hardware-independent environment for endpoint AI code. The result is a cross-platform successor to the original Copilot Runtime that will support all Windows PCs, not only those with GPUs or NPUs.

Connecting with Model Context Protocol

At a pre-Build 2025 event, Microsoft CTO Kevin Scott talked about the development of what he called “the agentic web,” a theme CEO Satya Nadella returned to in his Build keynote. At the heart of this idea is creating new tools based on the Model Context Protocol, which makes it easy to connect applications and data sources to AI applications. Scott describes MCP as the HTTP of a distributed AI platform.

Although I don’t quite agree with him (I consider it closer to technologies like CORBA), he makes an important point: We need ways to ground AI applications in real-world data without having to build complex and expensive embedding-based vector indexes. We also need to give applications that are part of an AI-powered workflow the tools to control what data they share with agent queries. MCP wraps all of this in a standard set of services that can be plugged into AI workflows.

It’s not surprising to see the Windows AI platform delivering its own MCP framework to connect agents to Windows applications and form an agentic Windows. This will allow code to expose specific features to agents, avoiding the complexity of a generic operator model that would require UI access. Divya Venkataramu, director of product marketing for Windows Developer, described it as offering “a standardized framework for agents to interact with Windows-native apps via their MCP servers.”

Building MCP into the OS requires more than tools to add servers to APIs; it needs a way for agents to discover the available servers, ready for inclusion in agentic workflows. This comes in the shape of an MCP registry, tied to enhanced security models that wrap the servers in least-privilege controls and auditable operations. Windows will now expose its own MCP servers for specific features. Initially it will only offer a subset of its API surface, but over time more features will be exposed via MCP.

One interesting scenario from my conversation with Venkataramu was using an agent to build out a development environment on a PC, ready to start coding: “We start by going to GitHub Copilot in VS Code and asking the agent to set up the environment for you.” This connects securely to WSL’s MCP server: “It will be able to install the right Linux distribution for you. If I want to go install my WSL environment, it takes a long time for me to even figure out the latest version of the distro that I want, then locating that distro and installing all the packages. Now, with enabling the agents to do that, it lets developers focus on getting ready to code by getting that environment itself set up.”

Add MCP servers to applications with App Actions

You can use available MCP SDKs, which work with existing APIs, to expose specific application features to agents. Alternatively, Microsoft is introducing App Actions, a new feature for application developers that wraps specific application features as MCP servers, much like adding a webhook to a service and exposing the behaviors you want agents to use. Unlike webhooks, App Action definitions include the semantic descriptions needed to build AI-powered agents.
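App Actions are the Windows-native wrapper, but the underlying shape is just an MCP server exposing a described tool. As a generic illustration (not the App Actions API), here’s a minimal server built with the official MCP Python SDK; the application and tool are hypothetical.

```python
# Generic shape of an MCP server exposing one application feature as a tool,
# using the official MCP Python SDK (pip install mcp). This is NOT the App
# Actions API; the app name and tool here are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-app")  # hypothetical application

@mcp.tool()
def search_notes(query: str) -> list[str]:
    """Return note titles matching the query."""  # the description agents read
    fake_index = ["Build 2025 notes", "Foundry Local setup", "Phi Silica tuning"]
    return [title for title in fake_index if query.lower() in title.lower()]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, ready for an agent host to attach
```

The docstring and typed signature play the same role as an App Action’s semantic description: They are what lets an agent decide when and how to call the feature.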

At the heart of an App Action is the concept of an entity. These are the objects passed to Actions and returned from them, things like simple variables, more complex sets of results, or documents, photos, or text. Entities are JSON objects, so you can use familiar techniques to build and deliver App Actions, and Windows will include new APIs in WinRT ready to help you quickly add entity support to your code. It’s currently supported by a preview release of the Windows SDK, which you need to declare as part of your project.

Building an App Action starts with the entity JSON, which defines how the action is described in an agent builder, followed by inputs and outputs, and then how it’s invoked, for example via a COM GUID. You do need to write the right handlers in your code to work with the action, using an Action Provider class, which implements an asynchronous interface. Your provider uses the name defined in the App Action JSON to route inputs to the right code, sending a response back when the action has completed.
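Microsoft’s documentation defines the exact schema, but the pieces the definition ties together look roughly like this; every field name in the sketch is invented for illustration and won’t match the real App Action JSON.

```python
# Loose, hypothetical illustration of what an App Action definition ties
# together. Every field name here is invented and does NOT match Microsoft's
# actual schema; see the Windows SDK documentation for the real format.
action_definition = {
    "id": "Contoso.Notes.SearchNotes",         # how the agent builder names the action
    "description": "Search the user's notes",  # semantic text shown to agents
    "inputs": [{"name": "query", "kind": "Text"}],     # entities passed in
    "outputs": [{"name": "matches", "kind": "Text"}],  # entities returned
    "invocation": {"clsid": "..."},            # e.g., the COM GUID of the provider
}
# At runtime, an Action Provider class receives the action name plus input
# entities, routes them to the right handler, and returns output entities
# asynchronously once the work completes.
```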

Microsoft has an App Actions Playground to test your actions. Once they’re registered with Windows, you’re able to see them in the playground. You can then send the input entities, using the playground to see the application response. Applications can then set the availability of their actions, for example, toggling them on and off when you launch and shut down your code. This ensures that applications and agents only run under user control.

With MCP support in tools like Visual Studio Code, via GitHub Copilot, there’s the prospect of an AI-driven toolchain that links code tools to design tools, or code tools to cloud services, at a feature level. What’s needed is for tools to start adding their own servers so we can build them into intelligent workflows.

A platform for AI on Windows

It’s clear that Microsoft hasn’t forgotten what made Windows successful: It is a platform. At the heart of this new platform, Windows AI Foundry and its associated tools are designed to build AI into Windows applications, using any and all kinds of models (especially open source ones), and to link the code on your PC to a local instance of the wider agentic web. At the same time, Microsoft is protecting users by providing the necessary tools to manage and secure MCP servers.

It is nice to see that much of what was announced at Build 2025 is ready to download and run, though there’s clearly still a lot to come as the tools are refined and folded into Windows. For now, start by finding the bits you need to build and test Windows’ agentic AI future before they’re bundled onto every PC, whether those PCs run AI on CPU, GPU, or NPU. The tools are here and ready for you to get started, so you can ship code as soon as the platform reaches users as part of a future Windows update.
https://www.infoworld.com/article/3992509/evolving-the-windows-ai-platform.html
