How to evaluate AI agent development tools and platforms
Tuesday, September 9, 2025, 11:00 AM, from InfoWorld
Do a quick search for AI agent development tools, and it won’t take long to build an extensive list of options. How will your organization decide which tools and platforms to integrate into your development workflow? How well does a platform support the full AI agent development lifecycle, and which ones have the most mature integration, testing, security, and operational capabilities?
Your initial list may include open source frameworks, tools hosted on SaaS platforms, and low-code platforms that incorporate AI agent development capabilities. Hyperscalers, large language model providers, and startups are also marketing AI agent development platforms. In addition to having a structured process for reviewing AI agent tools and platforms, organizations should consider architecture rules, data management, developer readiness around AI, and LLM testing strategies when reviewing opportunities to develop AI agents. I reached out to experts to identify evaluation criteria for tools and platforms used in the AI agent development lifecycle.

1. AI agent development and deployment capabilities

Soham Mazumdar, co-founder and CEO of WisdomAI, says platforms require three core capabilities for developing AI agents:

- Configurability that allows teams to customize behavior through prompts, tools, and domain-specific knowledge bases without requiring code changes.
- Evaluation frameworks that enable rigorous testing, benchmarking against industry standards, and continuous performance validation across diverse scenarios.
- Monitoring and reporting that provide comprehensive operational visibility through detailed logging, real-time analytics, and actionable feedback loops that capture user interactions.

Mazumdar says, “These work together to ensure agents are not only readily deployable but also consistently reliable, contextually adaptable, and positioned for continuous improvement as requirements evolve and user needs change.”

My take: To elevate more experiments into production, organizations will need to consider the testing and operational capabilities of AI agent platforms. Using third-party testing and operational tools may not be ideal if they treat an AI agent as a “black box.” Platforms that possess all three capabilities should facilitate the delivery of more reliable AI agents.
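To make the evaluation-framework capability concrete, here is a minimal sketch of the kind of benchmark harness such a platform might run against an agent before each release. Everything here is illustrative, not any vendor’s API: `run_agent` is a stand-in for whatever invokes your agent, and the benchmark cases are made up.

```python
# Minimal evaluation-harness sketch: score an agent against a fixed
# benchmark suite and surface failures. All names are illustrative;
# run_agent() stands in for your platform's real invocation call.

def run_agent(prompt: str) -> str:
    # Placeholder for a real agent call (API request, SDK call, etc.).
    return "42" if "meaning of life" in prompt else "unknown"

BENCHMARK = [
    {"prompt": "What is the meaning of life?", "expected": "42"},
    {"prompt": "What is the capital of France?", "expected": "Paris"},
]

def evaluate() -> tuple[float, list[str]]:
    """Return (pass_rate, list of failed prompts) over the suite."""
    failures = []
    for case in BENCHMARK:
        answer = run_agent(case["prompt"])
        if case["expected"].lower() not in answer.lower():
            failures.append(case["prompt"])
    pass_rate = 1 - len(failures) / len(BENCHMARK)
    return pass_rate, failures

rate, failed = evaluate()
print(f"pass rate: {rate:.0%}, failures: {failed}")
```

A real platform would add scoring beyond substring matching (semantic similarity, rubric-based judging), but the shape is the same: a versioned suite, a pass-rate threshold, and a report of which cases regressed.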
2. Data integration, orchestration, and controls for agentic AI

For organizations looking to extend AI agents into more autonomous agentic capabilities, evaluate platforms for greater data integration and controls. Christian Buckner, SVP of data & AI platform at Altair, says that enterprises need cross-domain context, core agent mechanics, and enterprise-grade controls to establish a real foundation for agentic automation:

- Context means more than just data and includes policies, tools, past actions, and regulatory constraints.
- Mechanics means building blocks like prompt engineering, data pipelines, multi-agent orchestration, tool registries, LLM connectivity, and access to enterprise systems.
- Controls means treating agents like actors in a production environment, with access governance, observability, outcome evaluation, and escalation baked in.

“Effective agentic AI requires a broad foundation of infrastructure and software, much of which is entirely new to enterprises,” says Buckner.

My take: Whereas AI agent tools focus on build, test, and deploy, platforms supporting agentic AI should make it possible to connect to an ecosystem of data platforms, AI agents, governance capabilities, and operational tools.

3. Developer experience

Once organizations have a broad understanding of AI agent development tools and agentic AI capabilities, they can dive deeper into features, form factors, and experiences. I always begin with end-user experiences because tools that are too difficult to learn or use slow down progress and lead to workarounds.

“When evaluating agent-building platforms, look for those that combine intuitive development experiences, deep enterprise integration, and built-in governance,” says Dhiraj Pathak, managing director of data and AI at Brillio.
“The ability to orchestrate across systems, ensure explainability and compliance, and continuously learn from feedback is what will separate scalable transformation from short-lived pilots.”

My take: Have multiple developers of different skill levels evaluate several AI agent building tools. Time-box their effort to learn and conduct some experiments, then capture feedback about their experiences.

4. Integration capabilities and interoperability

AI agent building tools require connectivity to multiple data sources and the ability to test different models and configurations. So, after validating developer experience, review how easily AI agent development tools connect with the targeted enterprise data sources.

“When selecting a tool for building AI agents, teams must ensure interoperability and governance are in place to enable full transparency, including inputs/outputs, decision-making paths, and external dependencies,” says Steve Lucas, CEO of Boomi. “In addition, teams should evaluate whether a platform integrates seamlessly across their broader digital ecosystem, as well as offering no-code functionality to support accessibility and ease of use.”

Kurt Muehmel, head of AI strategy at Dataiku, adds, “When evaluating AI agent building tools, look for platforms that support persistent integrations with enterprise systems and models, allow flexible LLM swapping, and enable continuous optimization through robust monitoring, debugging, and performance analytics. True agent maturity comes from this full-cycle adaptability.”

My take: One reason technology organizations end up with too many development tools is that selections focus on speed-to-delivery for a few near-term use cases. Platforms that demonstrate integration and interoperability are more extensible and more likely to be used for ongoing development.
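The “flexible LLM swapping” Muehmel describes usually comes down to routing every model call through a thin provider interface, so the underlying model can change without touching agent logic. A minimal sketch, with stub providers standing in for real vendor SDKs (none of these class names correspond to an actual library):

```python
# Sketch of flexible LLM swapping: agents depend on a provider
# interface, not a concrete model, so models are interchangeable.
# The provider classes are stubs, not real SDK clients.

from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class StubProviderA:
    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"

class StubProviderB:
    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"

class Agent:
    def __init__(self, provider: LLMProvider):
        # The model is injected, so swapping it requires no agent changes.
        self.provider = provider

    def answer(self, question: str) -> str:
        return self.provider.complete(question)

agent = Agent(StubProviderA())
print(agent.answer("hello"))      # routed through model A
agent.provider = StubProviderB()  # hot-swap the model
print(agent.answer("hello"))      # same agent, different model
```

The design choice to evaluate is whether a platform offers this seam natively; if agent logic is welded to one vendor’s SDK, every model change becomes a rewrite.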
5. Ability to learn from operational experience

AI agents need a lot of data, but it’s their ability to interpret information and recommend actions that requires evaluation. In an enterprise context, this means not only connecting to data sources but also developing usage feedback loops and building intelligence around the business language, roles, and workflows.

“When evaluating AI agent building tools, it’s essential to review their ability to handle customer interactions and support tasks autonomously,” says Rob Scudiere, CTO at Verint. “Brands must determine their learning capability to improve performance over time through interactions and consuming more data. Additionally, consider the solution’s proactive nature in identifying opportunities and potential issues before they arise.”

“Agents need to learn from operational experience, but that learning has to be auditable and consistent,” adds Kenneth Stott, enterprise field CTO of Hasura. “In enterprise environments, you can’t have agents evolving in unpredictable directions based on feedback loops you can’t trace or control.”

Nikola Mrksic, CEO and co-founder of PolyAI, says what really makes an AI agent feel helpful is its ability to speak the customer’s language. “You must be able to include lexicon customization in your AI agent build, which means teaching the AI specific words, trademarked phrases, or jargon that are relevant to your business and customer base,” says Mrksic.

My take: Organizations should develop test plans by building AI agents using a subset of data, allowing beta-test users to validate results, and iteratively improving the model with usage and other data sources. An iterative approach helps grow end-user adoption and trust while also evaluating the AI agent’s capability to improve its results.
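One common way to implement the lexicon customization Mrksic describes is to inject a maintained business glossary into the agent’s system prompt. A minimal sketch, where the glossary entries and prompt wording are invented for illustration rather than taken from any product:

```python
# Sketch of lexicon customization: append a company glossary to the
# system prompt so the agent uses the organization's own terminology.
# Glossary entries and prompt shape are illustrative assumptions.

GLOSSARY = {
    "ARR": "annual recurring revenue",
    "QBR": "quarterly business review",
    "FlexPlan": "our trademarked subscription tier (always capitalized)",
}

def build_system_prompt(base: str, glossary: dict[str, str]) -> str:
    """Return the base prompt extended with a company lexicon section."""
    lines = [f"- {term}: {meaning}" for term, meaning in glossary.items()]
    return base + "\n\nUse this company lexicon:\n" + "\n".join(lines)

prompt = build_system_prompt("You are a customer support agent.", GLOSSARY)
print(prompt)
```

Keeping the glossary as data rather than hard-coded prose means domain experts can update terminology without a code change, which is exactly the configurability criterion from the first section.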
6. Adoption of zero-trust security principles

Stephen Manley, CTO of Druva, says trustworthy AI agents will follow zero-trust principles to secure data access, integrate into a secure ecosystem, and protect data. Manley provides these requirements:

- Agents must fit into a role-based, least-privileged access architecture, so you can continue to keep your master data safe.
- Tools must integrate into your observability ecosystem, so you can monitor what is happening.
- Data must be protected across the entire AI lifecycle of data collection, preparation, model training, fine-tuning, and deployment.

Jimmy Mesta, co-founder and CTO of RAD Security, also weighed in on security capabilities. “Effective agents must access real-time telemetry, retain short- and long-term memory across interactions, and trigger actions in external systems,” he said.

Devidas Desai, SVP of product management at ASAPP, added, “Your generative agent must behave as an extension of your brand, so insist on deep observability, including live traces, redaction logs, token spend, and CX-level KPIs plus policy-driven levers that let you fine-tune prompts, guardrails, and fallbacks without opening a ticket.”

Stanislas Polu, software engineer and co-founder of Dust, added a final note: “AI agents won’t be tied to specific individuals but will act as full team members, so you will have to provision new permissions for them and enable agents to operate seamlessly across people and products.”

My take: AI agents are a security hotspot because of the enterprise data they access, the automations they enable, and the user roles they assume. Security capabilities will be a key differentiator of enterprise-ready AI agent development platforms.

7. Native and integrated devops capabilities

There are over 40 devops best practices covering the application lifecycle from planning and building through deployment and monitoring.
Look for AI agent development tools that have some native capabilities and others that integrate with commercial devops platforms.

“Teams need capabilities like version traceability, testing for response accuracy, and built-in guardrails to ensure the ethical use of AI,” says Gloria Ramchandani, SVP of product at Copado. “Just as with any modern devops workflow, agents must also be deployable across environments with auditability and control. The most effective tools and platforms are those that empower cross-functional teams to ideate, build, test, and deploy agents with security at scale.”

My take: AI agents will require ongoing development, especially as models improve, the underlying data changes, and new workflows are integrated. Define a high baseline of devops capabilities, especially around continuous testing, advanced CI/CD, and observability.

8. Non-negotiable operational reporting

A production AI agent may fail in subtle ways, such as responding with obvious hallucinations, or worse, it may take autonomous actions that are detrimental to the business and brand. Monitoring AI agents isn’t a pass-fail, SLA-driven operation, and there’s a gray area around response accuracy. Advanced AI agent building tools and platforms will include monitoring and reporting to help users discern model drift and other potential errors.

“Once agents are successfully deployed for real work, it becomes critical to understand better when and why they fail,” says Michael Berthold, CEO of KNIME. “However, tools that allow users to actually trace what and why an agent came to a certain conclusion, or performs a specific action, are still rare.”

“Comprehensive monitoring and analytics are essential for developers to understand how their AI agents perform in production and identify what needs improvement,” adds Saurabh Sodani, CTO of Salesloft.
“Without detailed dashboards showing success rates, user satisfaction, and failure points, developers can’t effectively iterate and refine their agents based on real usage.”

A new discipline of agent operations, or agentops, is emerging that combines capabilities from devops and modelops, including observability and monitoring, to help track the accuracy, reliability, and performance of AI agents. Maryam Ashoori, head of product at IBM’s watsonx.ai, says, “Agentops is a fast-growing discipline, and to excel at it, AI builders need the right tools to optimize, deploy, and monitor agent behaviors at scale. Agentops tools should act as a central nervous system for orchestrating agents once they’re deployed, governing and securing them, so their autonomy doesn’t turn from asset to liability.”

My take: Agentops is an emerging capability that will likely incorporate aspects of traditional monitoring tools, AIops platforms, and AI agent development tools.

Conclusion

Many organizations are still in the early stages of developing AI agents, while tech companies continue to introduce new development capabilities such as agent integration protocols like Agent2Agent (A2A) and Model Context Protocol (MCP). Companies considering developing AI agents should have both a short-term and long-term view of the tools they’ll use to support the full AI agent development lifecycle.
https://www.infoworld.com/article/4052402/how-to-choose-the-right-ai-agent-development-tools.html