A distributed state of mind: Event-driven multi-agent systems

Tuesday, January 28, 2025, 10:00 AM, from InfoWorld

While large language models are useful for chatbots, Q&A systems, translation, and other language tasks, their real power emerges when they can act on insights and automate solutions to a broader range of problems. In other words, we unlock their greatest potential when we tap their reasoning capabilities.

Reasoning agents have a long history in artificial intelligence research. A reasoning agent is a piece of software that can generalize from what it has previously seen and apply that knowledge in situations it hasn’t seen before. It’s like having a decision-making robot that can adapt based on what’s happening around it.

But the real excitement comes when reasoning agents work together in multi-agent systems.

The power of multi-agent systems

Imagine assembling a dream team, where each member has a unique skill set but collaborates toward a shared goal. Multi-agent systems enable this kind of teamwork, relying on networks of agents that communicate, share context, and coordinate actions. These systems excel at solving complex challenges too big for any single agent—or person—to handle.

Of course, with great power comes great complexity.

Coordinating multiple agents presents challenges familiar to anyone who has ever worked on a group project. There’s miscommunication, overlapping responsibilities, and difficulty aligning toward a common objective. Now, scale that to dozens—or hundreds—of autonomous agents, each acting independently but needing to stay in sync.

This article explores how event-driven design—a proven approach in microservices—can address the chaos, creating scalable, efficient multi-agent systems. If you’re leading teams toward the future of AI, understanding these patterns is critical.

Let’s dive in.

The challenges of multi-agent collaboration

Managing multi-agent systems introduces unique difficulties:

Context and data sharing: Agents must exchange information accurately and efficiently, avoiding duplication, loss, or misinterpretation.

Scalability and fault tolerance: As the number of agents grows, the system must handle complex interactions while recovering gracefully from failures.

Integration complexity: Agents often work with diverse systems and tools, requiring seamless interoperability.

Timely and accurate decisions: Agents need to make real-time decisions based on fresh, up-to-date data to ensure responsiveness and avoid poor outcomes.

Safety and validation: Guardrails are essential to prevent unintended actions, and stochastic outputs demand rigorous quality assurance.

Overcoming these challenges requires more than just thoughtful coordination—it calls for proven design patterns tailored for multi-agent systems. In the next section, we’ll dive into these patterns and demonstrate how they can be implemented using event-driven design to unlock scalable, reliable, and efficient multi-agent architectures.

Multi-agent design patterns

Multi-agent design patterns define the interaction structures that enable agents to communicate, collaborate, or compete to solve problems. By focusing on the problem domain and the nature of agent interactions, these patterns offer solutions for coordinating autonomous entities in a range of scenarios.

Below, we explore four key patterns: orchestrator-worker, hierarchical agent, blackboard, and market-based. We show how each of these common multi-agent patterns is transformed into an event-driven distributed system, gaining the operational advantages of data streaming applications and removing the need for specialized communication paths for agent orchestration. We’ll describe the event-driven version of these patterns using conceptual models from Apache Kafka. For anyone unfamiliar with Kafka, an accessible tour of its foundations can be found here.

Orchestrator-worker pattern

In this pattern, a central orchestrator assigns tasks to worker agents and manages their execution. This pattern, similar to the Master-Worker Pattern in distributed computing, ensures efficient task delegation and centralized coordination while allowing workers to focus on specific, independent tasks.

[Figure: Orchestrator-worker pattern. Image credit: Confluent]

Using data streaming, we can adapt this pattern to make the agents event-driven. Data streaming technologies like Apache Kafka offer key-based partitioning strategies, so the orchestrator can use keys to distribute command messages across partitions in a single topic. Worker agents can then act as a consumer group, pulling events from one or more assigned partitions to complete the work. Each worker agent then sends output messages into a second topic, where they can be consumed by downstream systems.

The pattern now looks like this:

[Figure: Event-driven orchestrator-worker pattern. Image credit: Confluent]

While this diagram looks more complex, it dramatically simplifies the operations of the system.

The orchestrator no longer has to manage its connections to worker agents, including what happens if one dies or how to handle more or fewer worker agents. Instead, it uses a keying strategy that distributes work across partitions. For events that should be processed by the same stateful worker agent that handled a previous message, the same key can be used for each event in the sequence. The worker agents gain the benefits of any consumer group.

The worker agents pull from one or more partitions, and the Kafka rebalance protocol redistributes partitions across the group as worker agents are added or removed, so that each worker owns a similar share of the partitions. In the event of a worker failure, another worker can take over the affected partition and replay the log from the last saved offset. The orchestrator no longer needs bespoke logic for managing workers. Instead, it simply specifies work and distributes it with a sensible keying strategy. Similarly, the worker agents inherit the functionality of a Kafka consumer group, so they can use common machinery for coordination, scaling, and fault recovery.
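
The keying idea can be illustrated without a broker. The sketch below is a minimal in-memory simulation, not Kafka client code: the partition count, the worker names, and the helper functions are all hypothetical, and MD5 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records. It shows that a key always maps to the same partition, and that adding a worker changes only which worker owns a partition.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 6  # assumed partition count for the command topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (MD5 is an illustrative stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def assign_partitions(workers: list[str], num_partitions: int = NUM_PARTITIONS) -> dict[str, list[int]]:
    """Spread partitions round-robin across the workers of one consumer group."""
    assignment: dict[str, list[int]] = {w: [] for w in workers}
    for p in range(num_partitions):
        assignment[workers[p % len(workers)]].append(p)
    return assignment


def route(commands: list[tuple[str, str]], workers: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Return which worker would receive each (key, payload) command."""
    owner = {p: w for w, parts in assign_partitions(workers).items() for p in parts}
    delivered = defaultdict(list)
    for key, payload in commands:
        delivered[owner[partition_for(key)]].append((key, payload))
    return dict(delivered)


commands = [
    ("session-1", "summarize document A"),
    ("session-2", "translate document B"),
    ("session-1", "refine the summary"),  # same key, so same partition
]

before = route(commands, ["worker-a", "worker-b"])
after = route(commands, ["worker-a", "worker-b", "worker-c"])  # a worker joins
```

In both `before` and `after`, the two `session-1` commands land on a single worker, which is what lets a stateful worker see a whole sequence. The orchestrator code does not change when the group grows.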

Hierarchical agent pattern

In this pattern, agents are organized into layers, where higher-level agents oversee or delegate tasks to lower-level agents. It’s particularly effective for managing large, complex problems by breaking them into smaller, more manageable parts.

[Figure: Hierarchical multi-agent pattern. Image credit: Confluent]

To make the hierarchical pattern event-driven, we apply the same techniques for decomposing work in the orchestrator-worker pattern recursively in the agent hierarchy such that each non-leaf node is the orchestrator for its respective subtree.

In the example above, Mid Level Agent #1 is itself an orchestrator for its leaf agents. Its entire workflow is functionally encapsulated into its role as a worker orchestrated by the Top Level Agent.

The hierarchical topology depicted in the previous diagram now looks like the image below:

[Figure: Event-driven hierarchical multi-agent pattern. Image credit: Confluent]

For the event model, note that topics are logical swimlanes for agent-specific functional workloads, so siblings in the tree structure will form consumer groups processing the same topics as depicted above. By making the hierarchical organization event-driven, we make the system asynchronous, greatly simplifying the conceptual model for data flow. Our operations are more resilient as the topology is no longer hard-coded. Agents can be added or removed from sibling groups without the individual agents having to manage this change or faults in the communication paths.
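
The recursive decomposition can be sketched with in-memory queues standing in for topics. Everything here is an assumption made for illustration: the topic names, the way a goal is split into sections and steps, and the use of a `deque` whose `popleft` discards events (a real log would retain them). The point is only that the mid-level agent is a worker of the level above and an orchestrator of the level below.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for a set of topics; not a Kafka client.
topics: dict[str, deque] = defaultdict(deque)


def publish(topic: str, event: dict) -> None:
    topics[topic].append(event)


def top_level_agent(goal: str) -> None:
    """Orchestrates mid-level agents by splitting the goal into sections."""
    for section in ("research", "drafting"):
        publish("mid-level-commands", {"goal": goal, "section": section})


def mid_level_agent() -> None:
    """A worker for the top level and an orchestrator for its own leaves."""
    while topics["mid-level-commands"]:
        cmd = topics["mid-level-commands"].popleft()
        for step in (1, 2):
            publish("leaf-commands", {**cmd, "step": step})


def leaf_agent() -> None:
    """Does the actual work and reports results on an output topic."""
    while topics["leaf-commands"]:
        cmd = topics["leaf-commands"].popleft()
        publish("leaf-results", {**cmd, "status": "done"})


top_level_agent("write report")
mid_level_agent()
leaf_agent()
results = list(topics["leaf-results"])  # four events: two sections x two steps
```

No agent holds a reference to another agent. Each one knows only the topic it reads from and the topic it writes to, which is why siblings can be added or removed without touching their neighbors.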

Blackboard pattern

The blackboard pattern provides a shared knowledge base—a “blackboard”—that agents use to post and retrieve information. This pattern enables agents to collaborate asynchronously without direct communication. It is especially useful for solving complex problems requiring incremental, multi-agent contributions.

[Figure: Blackboard pattern. Image credit: Confluent]

We can adapt this pattern to be event-driven in a straightforward way.

The blackboard becomes a data streaming topic consisting of messages produced from and consumed by the worker agents. If needed, a keying strategy or payload fields can be used to annotate which agent originated the event.

The event-driven version looks like this:

[Figure: Event-driven blackboard pattern. Image credit: Confluent]

Again, this creates a significant operational simplification and reduces the amount of bespoke logic that must be created outside of the infrastructure. Each worker agent simply produces and consumes events in order to collaborate with the rest of the group.
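
As a rough illustration, the blackboard can be modeled as an append-only list that every agent reads at its own pace. The class names, the agent names, and the rule each agent uses to contribute are hypothetical; a real deployment would use a topic and consumer offsets managed by the streaming platform.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Append-only list standing in for a single shared topic."""
    events: list[dict] = field(default_factory=list)

    def post(self, agent: str, content: str) -> None:
        # The originating agent is recorded in the payload.
        self.events.append({"agent": agent, "content": content})

    def read_from(self, offset: int) -> list[dict]:
        return self.events[offset:]


@dataclass
class Agent:
    name: str
    offset: int = 0  # position of the next unread event

    def step(self, board: Blackboard) -> int:
        """Read unseen events, then post one contribution if others have written."""
        unseen = board.read_from(self.offset)
        self.offset += len(unseen)
        others = [e for e in unseen if e["agent"] != self.name]
        if others:
            board.post(self.name, f"{self.name} builds on {len(others)} event(s)")
        return len(others)


board = Blackboard()
board.post("planner", "problem statement")
agents = [Agent("analyst"), Agent("critic")]
for agent in agents:
    agent.step(board)
```

After one pass the board holds three events, from the planner, the analyst, and the critic in that order. The agents never call each other; the only shared surface is the board.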

Market-based pattern

This pattern models a decentralized marketplace where agents negotiate and compete to allocate tasks or resources.

For example, solver or bidding agents can exchange responses with each other and refine them. This process is repeated for a fixed number of rounds, after which an aggregator agent compiles a final answer from the final responses of all agents.

[Figure: Market-based pattern. Image credit: Confluent]

Financial services have long used data streaming platforms as systems of record for the world’s largest stock exchanges. Data streaming systems like Kafka and Confluent even run many high-throughput over-the-counter securities markets. The event-driven version of this pattern is commonly implemented with a topic for bids and another for asks, to which each solver agent publishes events. A simple market maker service creates transactions where bids and asks are matched and publishes notifications of these events to a third topic that the solver agents consume.

This is an important simplification as it eliminates the quadratic connections that otherwise occur between the solver agents, which are difficult to manage in the presence of many agents or as agents are added or lost.

The pattern now looks like this:

[Figure: Event-driven market-based pattern. Image credit: Confluent]
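
A market maker of this kind can be sketched in a few lines. The matching rule below (highest bid paired with lowest ask per task, settled at the ask price) is an assumption chosen for simplicity, not a rule taken from any exchange, and the `Order` type, agent names, and topic lists are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    agent: str
    task: str
    price: int


def match_orders(bids: list[Order], asks: list[Order]) -> list[dict]:
    """Pair the highest bids with the lowest asks for each task.

    A bid and an ask match when the bid price is at least the ask price.
    """
    transactions = []
    tasks = sorted({o.task for o in bids} & {o.task for o in asks})
    for task in tasks:
        task_bids = sorted((b for b in bids if b.task == task), key=lambda o: -o.price)
        task_asks = sorted((a for a in asks if a.task == task), key=lambda o: o.price)
        for bid, ask in zip(task_bids, task_asks):
            if bid.price < ask.price:
                break  # no further pairs can match for this task
            transactions.append(
                {"task": task, "buyer": bid.agent, "seller": ask.agent, "price": ask.price}
            )
    return transactions


# Each list stands in for a topic that solver agents publish to.
bids_topic = [Order("agent-a", "ocr", 10), Order("agent-b", "ocr", 6)]
asks_topic = [Order("agent-c", "ocr", 7), Order("agent-d", "ocr", 9)]

# The market maker publishes matches to a third topic that agents consume.
transactions_topic = match_orders(bids_topic, asks_topic)
```

Here only one pair matches: agent-a's bid of 10 meets agent-c's ask of 7. Each agent talks to the topics only, so ten agents need ten producer connections rather than the 45 pairwise links a fully connected group would require.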

In making each of these patterns event-driven, we’ve operated under the premise that agents are driven by events. Let’s dig into that a bit further next.

A common operating model for coordination and communication

The outlined design patterns depend on a shared operating model for seamless agent coordination, similar to microservices.

At the core of this model is a shared language—a way for agents to exchange information, maintain alignment, and collaborate efficiently. Events serve as this language, acting as structured updates that enable agents to interpret instructions, share context, and coordinate tasks. Think of it as the system’s group chat: keeping agents synchronized and integrating new ones smoothly.

Here’s what this shared language enables agents to do:

Interpret commands: Agents receive clear, standardized instructions, like JSON payloads, guiding their actions.

Share context: Agents broadcast updates consistently, avoiding duplication and ensuring mutual understanding.

Coordinate tasks: Agents perform independent actions aligned toward shared objectives, even in dynamic or unpredictable environments.

This is where interfaces play a critical role. Agents must be designed to react to events and commands rather than act in isolation, ensuring they integrate seamlessly into a larger, event-driven ecosystem.

Specifying the interface for agents

A critical and liberating simplifying assumption is that these agents don’t divine action on their own; rather, they react to upstream events or commands. Operating within dynamic, interconnected environments, agents can be modeled with three components:

Input: Consuming events or commands.

Processing: Applying reasoning or gathering additional data.

Output: Emitting actions for downstream consumers.

This reactive design mirrors microservices, enabling the use of proven design patterns for scalable, efficient system development.
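
One way to picture the three components is to treat an agent as a function from an input event to a list of output events. The sketch below assumes that framing; the `Event` type, the event names, and the keyword rule standing in for the reasoning step are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str
    payload: dict


# An agent is modeled as a function from one input event to output events.
Agent = Callable[[Event], list[Event]]


def triage_agent(event: Event) -> list[Event]:
    """Input: a ticket event. Processing: a keyword rule standing in for
    reasoning. Output: a routing action for downstream consumers."""
    if event.type != "ticket.created":
        return []  # ignore events this agent does not handle
    urgent = "outage" in event.payload.get("text", "").lower()
    queue = "on-call" if urgent else "backlog"
    return [Event("ticket.routed", {"id": event.payload["id"], "queue": queue})]


def run(agent: Agent, inbox: list[Event]) -> list[Event]:
    """Feed each consumed event to the agent and collect what it emits."""
    outbox: list[Event] = []
    for event in inbox:
        outbox.extend(agent(event))
    return outbox


outbox = run(triage_agent, [
    Event("ticket.created", {"id": 1, "text": "Outage in region east"}),
    Event("user.login", {"id": 7}),
    Event("ticket.created", {"id": 2, "text": "Typo on pricing page"}),
])
```

Because the agent has no side channel, swapping the keyword rule for a model call would change the processing step and nothing else.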

The shift from request/response to event-driven

The parallel with event-driven microservices applies here too. Traditionally, parts of a system interact through a request/response model. While straightforward, this approach struggles with scalability and real-time responsiveness, introducing delays and bottlenecks as systems grow. It’s akin to needing permission for every action, which slows down operations.

The evolution towards an event-driven architecture marks a pivotal shift.

In this model, agents are designed to emit and listen for events autonomously. Events act as signals that something has happened, allowing agents to respond without requiring direct, orchestrated requests. This approach ensures agility, scalability, and a more dynamic system.

Agent interfaces in event-driven systems are defined by the events they emit and consume, encapsulated in simple, standardized messages like JSON payloads. This structured design:

Simplifies how agents understand and react to events.

Promotes reusability of agents across different workflows and systems.

Enables seamless integration into dynamic, evolving environments.

For example, a health monitoring agent could emit alerts when thresholds are breached, effortlessly integrating into workflows without custom dependencies.
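
The monitoring example might look like the following sketch. The field names, the event type string, and the metric are assumptions made for illustration; the source does not define a schema. The consumer depends only on the fields of the JSON payload, not on the agent that produced it.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def check_reading(subject: str, metric: str, value: float, threshold: float) -> Optional[str]:
    """Return a JSON alert event if the threshold is breached, else None."""
    if value <= threshold:
        return None
    event = {
        "type": "health.alert",  # hypothetical event type name
        "source": "health-monitoring-agent",
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "data": {
            "subject": subject,
            "metric": metric,
            "value": value,
            "threshold": threshold,
        },
    }
    return json.dumps(event)


def handle(raw: str) -> str:
    """A downstream consumer that relies only on the event's fields."""
    event = json.loads(raw)
    if event["type"] != "health.alert":
        return "ignored"
    data = event["data"]
    return f"escalate {data['subject']}: {data['metric']} above {data['threshold']}"


quiet = check_reading("service-a", "cpu_percent", 40.0, 90.0)  # below threshold
alert = check_reading("service-a", "cpu_percent", 97.0, 90.0)  # breach
action = handle(alert)
```

A new workflow that cares about these alerts only needs to subscribe and parse the same payload; the monitoring agent itself stays unchanged.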

Ensuring consistency and coordination

For a distributed system to function harmoniously, maintaining a consistent state across all agents is critical. This is where the concept of an immutable log comes into play. Every event or command an agent processes is recorded in a log that is permanent and unchangeable. Acting as a single source of truth, the log ensures all agents operate with the same context, enabling:

Reliable coordination and synchronization.

Resilience through replayable events, allowing recovery from failures.

Sophisticated consumer models, where multiple agents can respond to the same event without confusion or overlap.

This approach dramatically improves system reliability, ensuring that agents work cohesively to achieve shared goals, even in complex or unpredictable environments.
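
The replay property can be shown with a small in-memory model. This is a sketch under stated assumptions: `EventLog`, the task-status events, and the checkpoint are hypothetical, and a real log would also handle persistence and retention. Two consumers that fold the same events arrive at the same state, and a consumer that resumes from a saved offset catches up to that state too.

```python
from typing import Optional


class EventLog:
    """Append-only log: events can be added and read, never changed."""

    def __init__(self) -> None:
        self._events: list[tuple[str, str]] = []

    def append(self, event: tuple[str, str]) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, offset: int = 0) -> tuple:
        # Return an immutable snapshot so readers cannot alter the log.
        return tuple(self._events[offset:])


def rebuild_state(log: EventLog, from_offset: int = 0, state: Optional[dict] = None) -> dict:
    """Fold (task_id, status) events into a view of the latest status per task."""
    view = dict(state or {})
    for task_id, status in log.read(from_offset):
        view[task_id] = status
    return view


log = EventLog()
for event in [("task-1", "assigned"), ("task-2", "assigned"), ("task-1", "done")]:
    log.append(event)

# Two independent consumers read the same events without interfering.
agent_one = rebuild_state(log)
agent_two = rebuild_state(log)

# An agent that checkpointed after offsets 0 and 1 resumes from offset 2.
checkpoint = {"task-1": "assigned", "task-2": "assigned"}
resumed = rebuild_state(log, from_offset=2, state=checkpoint)
```

All three views agree that task-1 is done and task-2 is still assigned, which is the sense in which the log acts as a single source of truth.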

Key takeaways

Multi-agent systems are redefining what’s possible in AI. But to realize their full potential, we must overcome challenges like scalability, fault tolerance, and real-time decision-making. Event-driven design offers a clear path forward.

As AI applications grow more sophisticated, event-driven multi-agent systems will be crucial for tackling real-world complexity. By adopting this model and standardizing communication between agents, we create a foundation that is resilient, efficient, and adaptable to changing demands, unlocking the full potential of these architectures.

Sean Falconer is AI entrepreneur in residence at Confluent and Andrew Sellers is head of technology strategy at Confluent.
