
Adding smarts to Azure data lakes with Fabric Data Agents

Thursday April 10, 2025, 11:00 AM, from InfoWorld
Enterprise AI needs one thing if it's to get around the limitations of large language models and deliver the results businesses need from their agents. Whether you're building retrieval-augmented generation (RAG) applications, fine-tuning models, or using the Model Context Protocol, what you need is data, and lots of it.

Thus Microsoft has been evolving its large-scale data lake platform Fabric to work with its Azure AI Foundry development platform. At its recent FabCon 2025 event, Microsoft announced further integrations between the two, using Fabric to develop agents that work with data and can then be orchestrated and built into applications inside Azure AI Foundry. By mixing familiar data analytics techniques with AI tools, Microsoft is making it easier to access enterprise data and insights and to use them to ground AI agent outputs.

Working with Fabric and AI

Fabric’s data agents are designed to be built and tested outside the Azure AI Foundry. You can use them to explore your data conversationally, as an alternative approach to traditional analytics tools. Here you ask questions about your data and use those answers to refine prompts and queries, ensuring that the prompt returns sensible data that can help guide effective business decisions when built into an application. With data scientists and business analysts using iterative techniques to deliver grounded queries, any risk associated with using an LLM in a business application is significantly reduced.

Fabric data agents work with existing OneLake implementations, giving them a base set of data to use as context for your queries. Along with your data, they can be fine-tuned using examples or be given specific instructions to help build queries.

There are some prerequisites before you can build a data agent. The key requirement is an F64 or larger Fabric capacity, along with a suitable data source. This can be a lakehouse, a data warehouse, a set of Power BI semantic models, or a KQL database. Limiting the sources makes sense, as it reduces the risk of losing the context associated with a query and keeps the AI grounded. This helps ensure the agent uses a limited set of known query types, allowing it to turn your questions into the appropriate query.

Building AI-powered queries

The agent uses user credentials when making queries, so it only works with data the user can view. Role-based access controls are the default, keeping your data as secure as possible. Agents’ operations need to avoid leaking confidential information, especially if they’re to be embedded within more complex Azure AI Foundry agentic workflows and processes.

Fabric data agents are based on the Azure OpenAI Assistants API. This ensures that requests are gated through Azure's AI security tools, including enforcing responsible AI policies and using its regularly updated prompt filters to reduce the risks associated with prompt injection and other AI attacks. Fabric's agents are read-only, so the main remaining concern is making sure they can't be used to leak confidential data.

The queries generated by the agent are built using one of three tools that translate natural language into Fabric's query languages: SQL for relational stores, DAX for Power BI, and KQL for non-relational queries using Kusto. Because the generated queries are correctly formed, you can inspect and validate them if necessary. In practice, though, Fabric data agents are intended to let business users build complex queries without writing any code.
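As an illustration (the question, table, and column names here are invented), a question such as "Which five products had the highest revenue last quarter?" put to a warehouse source might be translated into T-SQL along these lines:

-- Hypothetical translation of the natural-language question above
SELECT TOP 5 ProductName, SUM(Revenue) AS TotalRevenue
FROM dbo.Sales
WHERE OrderDate >= DATEADD(quarter, -1, GETDATE())
GROUP BY ProductName
ORDER BY TotalRevenue DESC;

An analyst can read and verify a query like this before trusting the agent's answer, which is the point of keeping the generated queries well-formed.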

Tuning an agent with instructions and examples

Microsoft itself suggests that building a Fabric data agent should require much the same level of knowledge as creating a Power BI report. Building an agent takes more than choosing data sources and tables; there’s a key element of the process that takes advantage of an LLM’s use of context.

By adding instructions and sample queries to an agent definition, you can start to improve the context it uses to respond to user queries. Instructions can refine which data sources are used for what type of question, as well as provide added specialist knowledge that might not be included in your data. For example, instructions can define specialized terms used in your organization. It’s not quite fine-tuning, as the instructions don’t affect the model, but it does provide context to improve output and reduce the risk of hallucination.
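For example (the terms, sources, and rules here are invented for illustration), an agent's instructions might include lines like these:

- "ARR" means annual recurring revenue; calculate it from the subscriptions table in the finance lakehouse.
- For questions about application errors or telemetry, use the KQL database, not the warehouse.
- Report all revenue figures in euros unless the user asks otherwise.

None of this changes the underlying model; it simply travels with each request as additional context.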

Having tuning tools built into the agent creation process is important. Microsoft is aiming to make Fabric a single source of truth for organizational data, so keeping the risk of errors to a minimum must be a key requirement for any AI built on that data.

Unlike with other agent frameworks, you have to put in the necessary work up front. First, ensure that you choose the right sources for your agent. Then make sure it's given enough context to route queries to the appropriate source (for example, if you're using Fabric to store observability information for applications, your agent should use KQL). Finally, assemble a set of question-and-answer pairs that teach the agent the types of queries it will work with and how it should respond.

If an agent gives incorrect answers to queries, the most effective way to improve its grounding is to use more examples to improve its context. The more curated examples you use when building a data agent, the better.

You don’t need much coding skill to build a Fabric data agent. The process is very much designed for data specialists—an approach in line with Microsoft’s policy of making AI tools available to subject matter experts. If you prefer to use code to build a data agent, Microsoft provides a Python SDK to create, manage, and use Fabric data agents.

Building your first Azure data agent

Getting started is simple enough. Fabric data agents are now a standard item in the Fabric workspace, so all you need to do is create a new agent. Start by giving it a name. This can be anything, though it’s best to use a name related to the agent’s purpose—especially if you intend to publish it as an endpoint for Azure AI Foundry applications.

Once you give your agent a name, you can add up to five data sources from your Fabric environment. The tool provides access to the Fabric OneLake data catalog, and once you’ve selected a source you can expose tables to the agent by simply using checkboxes to select the data you want. If you need to add other sources later or change your table choice, you can do so from Fabric’s Explorer interface. One useful tip is to ensure that tables have descriptive names. The more descriptive they are, the more accurate the queries generated by the agent.
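Continuing the hypothetical SDK sketch from above (the method names and table names are assumptions), the same source-and-table selection might look like this in code:

# Attach a lakehouse as one of the agent's (at most five) data sources.
data_agent.add_datasource("SalesLakehouse", type="lakehouse")

# Expose only the tables the agent should query; descriptive table names
# make the generated queries more accurate.
datasource = data_agent.get_datasources()[0]
datasource.select("dbo", "CustomerOrders")
datasource.select("dbo", "ProductReturns")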

You can now test the agent by asking questions. Microsoft notes that you can only ask questions about the data; there's no reasoning capability in the model, so it can't infer results where there's no data or answer questions that require information not stored in Fabric. That's no surprise, as what we're building here is a traditional RAG application with a deep connection to your data.

Tuning and sharing a data agent

The agent you have at this point has no instructions or tuning; you're simply testing that it can construct and parse queries against your sources. Once you've got a basic agent in place, you can apply instructions and tuning. A separate pane in the design surface lets you add up to 15,000 characters of instructions. These are written in English and describe how the agent should work with your data sources. You can use the agent's prompting tools to test instructions alongside your proven queries.
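In the hypothetical SDK sketch from earlier, setting instructions might look like this (the method name is an assumption; the design surface is the documented route):

# Hypothetical method; mirrors the instructions pane in the design surface.
data_agent.update_configuration(
    instructions=(
        '"ARR" means annual recurring revenue; calculate it from the '
        "subscriptions table in the finance lakehouse. For questions about "
        "application errors or telemetry, use the KQL database."
    )
)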

Now you can use example queries to tune the model, using the few-shot learning technique. By providing pairs of queries and their expected responses, you give the underlying model in-context examples that guide it toward the answers you expect. Few-shot learning is a useful technique for data-based agents because the examples live in the prompt context rather than in the model's weights, so you can get reliable results with very few query/answer pairs. You can provide examples for all supported data sources, apart from Power BI semantic models.
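Continuing the sketch, a set of example pairs might be registered like this (every name here, including the registration method, is an invented assumption; the shape of the pairs is what matters):

# Question/query pairs used as few-shot examples; all names are invented.
few_shots = {
    "How many orders shipped last week?":
        "SELECT COUNT(*) FROM dbo.CustomerOrders "
        "WHERE ShipDate >= DATEADD(week, -1, GETDATE());",
    "Which product category had the most returns in March?":
        "SELECT TOP 1 Category, COUNT(*) AS Returns FROM dbo.ProductReturns "
        "WHERE MONTH(ReturnDate) = 3 GROUP BY Category ORDER BY Returns DESC;",
}

# Hypothetical method for registering the examples with a data source.
for question, query in few_shots.items():
    datasource.add_few_shots(question, query)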

Once tuned and tested, a Fabric data agent can be published and shared with colleagues or applications. Publishing creates two versions of your agent: one you can continue to change, and one that’s frozen and ready to share. Fabric data agents can be used with Azure AI Foundry as components in the Azure AI Agent Service. Here they are used as knowledge sources, with one Fabric data agent per Azure AI agent. Endpoints can be accessed via a REST API, using your workspace details as part of the calling URL.
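A rough sketch of what calling a published endpoint could look like follows; the URL shape, path segments, and payload are assumptions, with only the workspace-details-in-the-URL pattern taken from Microsoft's description:

import requests

# Placeholders; real values come from your Fabric workspace and agent.
WORKSPACE_ID = "<your-workspace-id>"
AGENT_ID = "<your-agent-id>"

# Hypothetical URL shape; per Microsoft, workspace details form part of it.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/dataAgents/{AGENT_ID}/query"
)

response = requests.post(
    url,
    headers={"Authorization": "Bearer <entra-token>"},  # Microsoft Entra token
    json={"question": "Which five products had the highest revenue last quarter?"},
)
print(response.json())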

Microsoft’s agent tool takes the interesting approach of putting development in the hands of subject matter experts. Here Fabric data agents are a low-code tool for building grounded, data-centric, analytical AI services that use Fabric’s OneLake as a RAG data source. Business analysts and data scientists can build, test, and tune agents before opening them up to the rest of the business via Azure AI Foundry, providing deep access to key parts of business data and allowing it to become part of your next generation of AI-powered business workflows.
https://www.infoworld.com/article/3958847/adding-smarts-to-azure-data-lakes-with-fabric-data-agents....
