Chapter 17: Multi-Agent Systems in Node.js
Theoretical Foundations
Imagine you are the CEO of a complex organization, and you need to solve a multifaceted problem, such as launching a new software product. You would not attempt to write the code, design the database, negotiate with vendors, and market the product all by yourself. Instead, you would delegate these tasks to specialized departments: Engineering, Design, Procurement, and Marketing. Each department has its own expertise, tools, and internal processes. Your role is not to perform the work but to orchestrate the flow of information and tasks between these specialized units, ensuring they collaborate effectively towards the common goal.
This is the fundamental principle of a Multi-Agent System. In the context of building intelligent applications, a single large language model (LLM) is like a brilliant but generalist consultant. While incredibly capable, it can become overwhelmed by complex, multi-step workflows that require distinct skills, access to different data sources, and the ability to perform actions in the real world. A multi-agent system decomposes this complexity by creating a team of specialized AI agents, each with a defined role, a set of tools, and a specific purpose.
This architectural pattern is often implemented as a Hierarchical Agentic Workflow. This hierarchy is not about superiority but about delegation and structure. At the top, you might have a "Supervisor" or "Orchestrator" agent whose primary job is to understand a high-level user request, break it down into manageable sub-tasks, and assign them to the appropriate specialized "Executor" agents. These executor agents are the specialists—the coder, the researcher, the analyst—and they perform the actual work. They can, in turn, use their own tools (like accessing a database, calling an API, or running a piece of code) to gather information or execute actions. The results are then passed back up the chain to the supervisor, which synthesizes them into a final, coherent response for the user.
The "Why": Overcoming the Limitations of Monolithic AI
The drive towards multi-agent systems is born from the practical limitations of trying to solve every problem with a single, monolithic AI call.
- Context Window Saturation: LLMs have a finite "context window"—the amount of text they can consider at once. A complex research task might involve reading dozens of documents, running calculations, and synthesizing a report. A single agent attempting this would quickly exceed its context limit, leading to loss of information and degraded performance. By distributing the work, each agent only needs to focus on a small, manageable piece of the puzzle.
- Lack of Specialization and Focus: A generalist model can be prone to "prompt drift," where it might lose sight of its original goal or provide generic answers. A specialized agent, however, is given a very narrow and focused persona and instructions. A "Data Analyst Agent" is prompted to be methodical, skeptical of its data sources, and proficient in statistical reasoning. A "Creative Writer Agent" is prompted to be imaginative and stylistically consistent. This specialization leads to higher-quality, more reliable outputs for each specific task.
- Integration with External Systems (Tool Calling): As we explored in Chapter 12: Integrating Tools and APIs with Function Calling, the true power of an AI agent is unlocked when it can interact with the world beyond its training data. In a multi-agent system, this concept is elevated. One agent might be equipped with a tool to query a live database, another with a tool to execute Python code for complex calculations, and a third with a tool to send an email. The supervisor agent doesn't need to know how to query the database; it just needs to know which agent to ask for that information. This is a clean separation of concerns, a principle well understood in software engineering.
- Robustness and Parallelism: A well-designed multi-agent system can be more resilient. If one agent fails or provides a low-confidence response, the supervisor can re-route the task or ask for clarification. Furthermore, tasks that are independent of each other can be executed in parallel by different agents, dramatically speeding up the overall workflow.
Analogy: Multi-Agent Systems as Microservices Architecture
A powerful analogy for understanding multi-agent systems is the architectural shift from monolithic applications to microservices in web development.
- Monolithic LLM Call (The Old Way): Imagine a massive, single-block application that handles user authentication, database operations, business logic, and UI rendering all in one giant codebase. This is like asking a single LLM to do everything. It's simple to start but becomes impossible to maintain, scale, or update. A small change in one part can break the entire system.
- Multi-Agent System (The Microservices Way): Now, think about a modern web application built with microservices. You have a dedicated service for user authentication (e.g., using OAuth), another for handling product inventory, a separate one for processing payments, and another for sending notifications. Each service:
- Has a single responsibility: It does one thing and does it well.
- Has its own data and logic: The payment service doesn't need to know the internal details of the inventory service.
- Communicates via a well-defined API: Services interact through clear, standardized protocols (like REST or gRPC), not by directly accessing each other's internal code.
- Can be developed, deployed, and scaled independently: You can update the notification service without touching the payment service.
This maps perfectly to our agents:
- Supervisor Agent: Acts like an API Gateway or an Orchestrator (e.g., Kubernetes). It receives the initial request (the user's prompt), routes it to the correct microservice (the specialized agent), and aggregates the responses.
- Specialized Agents: These are your microservices. A "SQL Agent" is a service that only knows how to query a database. A "Browser Agent" is a service that only knows how to navigate the web and scrape information. A "Code Interpreter Agent" is a service that only knows how to execute code in a sandboxed environment.
- Communication Protocol: Instead of HTTP requests, agents use structured messages (often formatted as JSON) to pass tasks and results between each other. LangChain.js provides the framework to manage these communication protocols, ensuring that messages are correctly routed and interpreted.
Just as microservices revolutionized web development by making applications more scalable, resilient, and easier to manage, multi-agent systems provide a similar paradigm for building sophisticated, complex, and reliable AI-powered applications.
The "How": Architectural Components and Communication Flow
Let's break down the mechanics of a hierarchical multi-agent system. The flow is a structured conversation between agents, governed by a clear protocol.
- The Supervisor (Orchestrator): This is the brain of the operation. Its core logic is a routing function. It analyzes the user's initial prompt and determines the sequence of sub-tasks required. It maintains a "map" of available agents and their capabilities. For example:
  - If the prompt is "Analyze our Q3 sales data and suggest three marketing strategies," the supervisor identifies two tasks: `analyze_data` and `generate_marketing_ideas`.
  - It knows that the `analyze_data` task requires the "Data Analyst Agent" and the `generate_marketing_ideas` task requires the "Marketing Strategist Agent."
- The Executor Agents: These are the workers. Each executor has:
  - A specific role and persona: e.g., "You are a meticulous data analyst. Your only job is to query databases and provide accurate statistical summaries."
  - A set of tools: These are the functions the agent can call. For the Data Analyst, this might be a `runSQLQuery` tool. For a "Researcher Agent," it might be a `browseWebPage` tool. (This directly builds on the function calling concepts from Chapter 12.)
  - A defined output schema: The agent is instructed to return its findings in a structured format (e.g., a JSON object), which makes it easy for the supervisor to parse and use.
- The Communication Protocol: This is the lifeblood of the system. In LangChain.js, this is often managed through "Message" objects. The flow looks like this:
  - User Prompt -> Supervisor
  - Supervisor decides to delegate to Executor A. It sends a structured message: `{"task": "runSQLQuery", "params": {"query": "SELECT ..."}}`
  - Executor A receives the message, uses its tool, and gets a result. It sends a message back: `{"result": [...], "status": "complete"}`
  - Supervisor receives the result, updates its internal state, and may delegate the next task to Executor B.
  - This continues until all sub-tasks are complete. The supervisor then synthesizes all the results into a final, user-friendly response.
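The message hand-off above can be sketched in plain TypeScript. The `task`/`params`/`result`/`status` field names mirror the example messages in this section; the `dispatch` function and executor registry are illustrative sketches, not a LangChain.js API:

```typescript
// Illustrative message shapes for supervisor <-> executor traffic.
// These field names are assumptions for this sketch, not a LangChain schema.
interface TaskMessage {
  task: string;                     // e.g. "runSQLQuery"
  params: Record<string, unknown>;  // tool-specific arguments
}

interface ResultMessage {
  result: unknown;
  status: "complete" | "failed";
}

// A minimal executor: receives a TaskMessage and replies with a ResultMessage.
type Executor = (msg: TaskMessage) => ResultMessage;

const executors: Record<string, Executor> = {
  runSQLQuery: () => ({
    result: [{ region: "EMEA", total: 42 }], // stand-in for real query rows
    status: "complete",
  }),
};

// The supervisor's routing step: pick the executor named by the task.
function dispatch(msg: TaskMessage): ResultMessage {
  const executor = executors[msg.task];
  if (!executor) return { result: null, status: "failed" };
  return executor(msg);
}

const reply = dispatch({ task: "runSQLQuery", params: { query: "SELECT ..." } });
console.log(reply.status); // "complete"
```

A real supervisor would build the `TaskMessage` from LLM output rather than hard-coding it, but the routing shape stays the same.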
Visualizing the Hierarchical Workflow
We can visualize this flow using a diagram. The supervisor sits at the top, delegating tasks to the specialized executors below.
Under the Hood: State Management and Tool Calling
The sophistication of a multi-agent system lies in how it manages state and executes tools.
State Management: The supervisor must maintain the "conversation state." This isn't just a history of messages; it's a structured object that tracks the overall goal, the status of each sub-task, and the data collected so far. For example, the state might look like this:
// A conceptual representation of the supervisor's state
interface AgentState {
  goal: string; // "Analyze Q3 sales and suggest marketing strategies"
  tasks: {
    taskId: string;
    description: string;
    assignedTo: 'analyst' | 'researcher' | 'synthesizer';
    status: 'pending' | 'in_progress' | 'complete' | 'failed';
    result?: any; // The data returned by the agent
  }[];
  finalResponse?: string;
}
This state is updated after each interaction. The supervisor's decision-making process (which agent to call next) is based on the current state of this object. If the analyst task is complete, the supervisor checks the state, sees the researcher task is also complete, and then decides it's time to delegate to the synthesizer.
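Building on the `AgentState` shape above, a minimal routing function might look like the following sketch. The selection rules here — first pending task wins, any failure short-circuits — are assumptions for illustration, not a prescribed algorithm:

```typescript
// Same shape as the conceptual AgentState above.
interface Task {
  taskId: string;
  description: string;
  assignedTo: "analyst" | "researcher" | "synthesizer";
  status: "pending" | "in_progress" | "complete" | "failed";
  result?: unknown;
}

interface AgentState {
  goal: string;
  tasks: Task[];
  finalResponse?: string;
}

// Pick the next pending task's agent; if nothing is pending, signal the end.
function decideNext(state: AgentState): string {
  const pending = state.tasks.find((t) => t.status === "pending");
  if (pending) return pending.assignedTo;
  const failed = state.tasks.some((t) => t.status === "failed");
  return failed ? "FINISH_WITH_ERROR" : "FINISH";
}

const state: AgentState = {
  goal: "Analyze Q3 sales and suggest marketing strategies",
  tasks: [
    { taskId: "t1", description: "analyze_data", assignedTo: "analyst", status: "complete" },
    { taskId: "t2", description: "generate_marketing_ideas", assignedTo: "researcher", status: "pending" },
  ],
};

console.log(decideNext(state)); // "researcher"
```

In practice the supervisor's LLM makes this decision, but grounding it in an explicit state object keeps the loop auditable and testable.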
Tool Calling in a Multi-Agent Context: While Chapter 12 focused on a single agent calling a tool, in a multi-agent system, this is layered.
- Agent-to-Tool: An executor agent (like the Data Analyst) decides it needs to run a query. It invokes its own internal tool-calling mechanism. The LLM inside the agent generates a function call, the code executes the function (e.g., `runSQLQuery(...)`), and the result is fed back to the LLM to formulate a final answer.
- Supervisor-to-Agent (Delegation as a Tool): The supervisor's "tools" are, in fact, the other agents. When the supervisor decides to delegate, it is effectively "calling a function" where the function is the entire agent. The parameters are the task description and any required data. This is a higher level of abstraction. The supervisor doesn't care how the analyst runs its query; it only cares about the input (the query parameters) and the output (the resulting data).
This layered approach creates a powerful and modular system. You can swap out the analyst agent for a more advanced one, or add a new translator agent, without having to fundamentally re-architect the supervisor's logic, as long as the communication protocol remains consistent. This is the essence of building truly intelligent, scalable, and maintainable AI applications in Node.js.
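A minimal sketch of "delegation as a tool": every specialist hides behind the same function signature, so the supervisor's delegation logic never changes when an agent is swapped. The agent names and stub responses below are invented for illustration:

```typescript
// Every specialist is exposed to the supervisor behind one uniform
// signature, exactly like a tool: input in, structured output back.
type AgentFn = (input: string) => Promise<string>;

// Stand-ins for real LLM-backed agents.
const analystAgent: AgentFn = async (input) =>
  JSON.stringify({ summary: `Analyzed: ${input}` });

const translatorAgent: AgentFn = async (input) =>
  JSON.stringify({ translation: `FR: ${input}` });

// The supervisor's "toolbox" is just a registry of agents.
// Swapping or adding an agent means changing a map entry;
// the delegation code below never changes.
const agentTools = new Map<string, AgentFn>([
  ["analyst", analystAgent],
  ["translator", translatorAgent],
]);

async function delegate(agentName: string, input: string): Promise<string> {
  const agent = agentTools.get(agentName);
  if (!agent) throw new Error(`Unknown agent: ${agentName}`);
  return agent(input);
}

delegate("analyst", "Q3 sales").then(console.log);
```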
Basic Code Example
In a SaaS web application context, multi-agent systems are invaluable for handling complex, asynchronous user requests that require multiple steps of reasoning, data fetching, and external tool usage. For example, a user might request: "Analyze the sentiment of our latest customer feedback tickets and draft a summary email for the product team."
To solve this, we need specialized agents:
1. Orchestrator Agent: Receives the initial request and delegates tasks.
2. Research Agent: Fetches data (simulated here via a tool).
3. Analyst Agent: Processes the data and generates a summary.
The following TypeScript code demonstrates a minimal, self-contained multi-agent system using the @langchain/core primitives. This example uses ESM (ECMAScript Modules), the modern standard for Node.js, and simulates the OpenAI API calls to ensure the code runs locally without requiring API keys.
// multi-agent-demo.ts
// To run this file, ensure you have the following dependencies installed:
// npm install @langchain/core zod
// Run with: npx tsx multi-agent-demo.ts
import { z } from "zod";
import { BaseTool } from "@langchain/core/tools";
import { BaseLLM } from "@langchain/core/language_models/base";
import { CallbackManagerForLLMRun } from "@langchain/core/callbacks/manager";
import { Generation } from "@langchain/core/outputs";
// ==========================================
// 1. MOCK INFRASTRUCTURE (Simulating OpenAI)
// ==========================================
/**
* A mock implementation of a Large Language Model (LLM).
* In a real app, this would be replaced by `new ChatOpenAI({ model: "gpt-4o" })`.
* We extend BaseLLM to satisfy LangChain's type requirements.
*/
class MockLLM extends BaseLLM {
constructor() {
// BaseLLM's constructor takes a params object; an empty one suffices here.
super({});
}
_llmType(): string {
return "mock_llm";
}
async _generate(
prompts: string[],
runManager?: CallbackManagerForLLMRun
): Promise<{ generations: Generation[][] }> {
// Simulate network latency
await new Promise((resolve) => setTimeout(resolve, 500));
const responseText = this.simulateReasoning(prompts[0]);
return {
generations: [
[
{
text: responseText,
generationInfo: { finish_reason: "stop" },
},
],
],
};
}
/**
* Hardcoded logic to simulate an LLM's reasoning process.
* This prevents hallucinations in our demo and ensures predictable output.
*/
private simulateReasoning(prompt: string): string {
// The Orchestrator's system prompt contains the word "Orchestrator";
// in this demo it always delegates to the Researcher first.
if (prompt.includes("Orchestrator")) {
return JSON.stringify({
next_agent: "Researcher",
context: "last_7_days",
});
}
if (prompt.includes("Researcher") && prompt.includes("fetch_tickets")) {
return JSON.stringify({
action: "fetch_tickets",
action_input: { date_range: "last_7_days" },
});
}
// Matches the Analyst prompt, which asks for a "summary email".
if (prompt.includes("Analyst") && prompt.includes("summary")) {
return JSON.stringify({
action: "final_report",
action_input: {
summary: "Overall sentiment is positive (70%), but users are confused about the new dashboard layout.",
email_draft: "Subject: Weekly Feedback Summary\n\nHi Team, here is the analysis..."
},
});
}
// Default fallback
return "I am thinking...";
}
}
// ==========================================
// 2. DEFINING TOOLS (Agent Capabilities)
// ==========================================
/**
* Tool: Fetch Support Tickets
* Simulates fetching data from a SaaS database.
*/
class FetchTicketsTool extends BaseTool {
name = "fetch_tickets";
description = "Fetches recent customer support tickets from the database.";
// Zod schema ensures the LLM provides structured input
schema = z.object({
date_range: z.string().describe("e.g., 'last_7_days' or 'today'"),
});
async _call(input: z.infer<typeof this.schema>) {
console.log(` [Tool: FetchTickets] Fetching data for range: ${input.date_range}...`);
// Simulated database response
return JSON.stringify([
{ id: 1, text: "Loving the new update!", sentiment: "positive" },
{ id: 2, text: "Dashboard is confusing.", sentiment: "negative" },
{ id: 3, text: "Great support response time.", sentiment: "positive" },
]);
}
}
/**
* Tool: Send Email
* Simulates sending a notification via an API (e.g., SendGrid, Resend).
*/
class SendEmailTool extends BaseTool {
name = "send_email";
description = "Sends an email to the internal product team.";
schema = z.object({
recipient: z.string().email(),
subject: z.string(),
body: z.string(),
});
async _call(input: z.infer<typeof this.schema>) {
console.log(` [Tool: SendEmail] Sending email to ${input.recipient}...`);
return "Email sent successfully.";
}
}
// ==========================================
// 3. AGENT CLASS DEFINITIONS
// ==========================================
/**
* Base Agent Class
* Encapsulates the logic of taking a prompt, invoking the LLM,
* and parsing the output (usually JSON for tool calls).
*/
abstract class BaseAgent {
protected llm: BaseLLM;
protected systemPrompt: string;
constructor(llm: BaseLLM, systemPrompt: string) {
this.llm = llm;
this.systemPrompt = systemPrompt;
}
/**
* Core execution loop for the agent.
* 1. Construct full prompt (System + User input)
* 2. Invoke LLM
* 3. Parse JSON response
*/
async execute(input: string): Promise<any> {
const fullPrompt = `${this.systemPrompt}\n\nUser Request: ${input}`;
console.log(`\n[Agent: ${this.constructor.name}] Thinking...`);
// In a real LangChain agent, this would be `llm.invoke` with output parsing
const response = await this.llm.invoke(fullPrompt);
try {
// Attempt to parse the simulated LLM response as JSON
const parsed = JSON.parse(response);
return parsed;
} catch (e) {
console.error("Failed to parse LLM response as JSON.");
return null;
}
}
}
/**
* The Orchestrator Agent (Manager)
* Decides which sub-agent or tool to delegate tasks to.
*/
class OrchestratorAgent extends BaseAgent {
constructor(llm: BaseLLM) {
const prompt = `
You are an Orchestrator Agent for a SaaS support dashboard.
Your goal is to route tasks to the correct specialist.
Available Specialists:
1. "Researcher": Use this to fetch data (tickets).
2. "Analyst": Use this to summarize data and draft emails.
Output ONLY a JSON object with the following structure:
{
"next_agent": "Researcher" | "Analyst" | "FINISH",
"context": "Any data needed for the next agent"
}
`;
super(llm, prompt);
}
}
/**
* The Researcher Agent (Worker)
* Specialized in data retrieval.
*/
class ResearcherAgent extends BaseAgent {
private tools: BaseTool[];
constructor(llm: BaseLLM, tools: BaseTool[]) {
const prompt = `
You are a Researcher Agent. You have access to tools.
If the user asks for data, use the "fetch_tickets" tool.
Output ONLY a JSON object representing the tool call:
{
"action": "fetch_tickets",
"action_input": { "date_range": "last_7_days" }
}
`;
super(llm, prompt);
this.tools = tools;
}
async execute(input: string): Promise<any> {
const result = await super.execute(input);
// Check if the LLM wants to use a tool
if (result && result.action) {
const tool = this.tools.find((t) => t.name === result.action);
if (tool) {
return await tool.invoke(result.action_input);
}
}
return result;
}
}
/**
* The Analyst Agent (Worker)
* Specialized in processing data and generating text.
*/
class AnalystAgent extends BaseAgent {
constructor(llm: BaseLLM) {
const prompt = `
You are an Analyst Agent. You receive raw ticket data.
Analyze the sentiment and draft a summary email.
Output ONLY a JSON object:
{
"action": "final_report",
"action_input": {
"summary": "Brief analysis...",
"email_draft": "Subject: ...\n\nBody: ..."
}
}
`;
super(llm, prompt);
}
}
// ==========================================
// 4. MAIN EXECUTION FLOW (The System)
// ==========================================
/**
* The Multi-Agent System Orchestrator
* Manages the state and flow of messages between agents.
*/
async function runMultiAgentSystem() {
// 1. Initialize Infrastructure
const llm = new MockLLM();
const fetchTool = new FetchTicketsTool();
// 2. Initialize Agents
const researcher = new ResearcherAgent(llm, [fetchTool]);
const analyst = new AnalystAgent(llm);
const orchestrator = new OrchestratorAgent(llm);
// 3. Define the User Request (SaaS Context)
const userRequest = "Analyze sentiment of recent tickets and draft an email.";
// 4. Execution Loop
console.log("🚀 Starting Multi-Agent System...");
console.log(`📝 User Request: "${userRequest}"`);
// --- Step A: Orchestrator decides the first step ---
const orchestratorDecision = await orchestrator.execute(userRequest);
console.log(`[Orchestrator] Decided: ${orchestratorDecision.next_agent}`);
// --- Step B: Delegate to Researcher ---
if (orchestratorDecision.next_agent === "Researcher") {
const rawData = await researcher.execute(
`Fetch tickets for ${orchestratorDecision.context || "last_7_days"}`
);
// --- Step C: Delegate to Analyst ---
// Pass the raw data from Researcher to Analyst
const analysisResult = await analyst.execute(
`Analyze this data: ${rawData}`
);
// --- Step D: Final Action (Send Email) ---
if (analysisResult.action === "final_report") {
const emailTool = new SendEmailTool();
const emailInput = analysisResult.action_input;
await emailTool.invoke({
recipient: "product-team@saas-company.com",
subject: emailInput.email_draft.split('\n')[0].replace('Subject: ', ''),
body: emailInput.email_draft,
});
}
}
console.log("\n✅ Workflow Completed Successfully.");
}
// Execute the system
runMultiAgentSystem().catch(console.error);
Detailed Line-by-Line Explanation
1. Mock Infrastructure (MockLLM)
In a production environment, you would use new ChatOpenAI({ apiKey: process.env.OPENAI_API_KEY }). However, for a "Hello World" example that runs immediately without configuration, we create a MockLLM.
* extends BaseLLM: LangChain.js relies on abstract base classes. By extending BaseLLM, we ensure our mock object has the same interface as a real LLM.
* _generate: This is the core method called internally by LangChain. We simulate a network delay (setTimeout) to represent the latency of an API call.
* simulateReasoning: This method acts as a deterministic router. It looks at the text prompt and returns a specific JSON string. This ensures the demo code produces the exact expected output every time, avoiding random hallucinations that might break the subsequent parsing logic.
2. Defining Tools (FetchTicketsTool, SendEmailTool)
Tools are the hands and feet of an agent—they allow the agent to interact with the outside world.
* BaseTool: We extend LangChain's BaseTool.
* schema (Zod): We use Zod to define the input structure. This is critical for Type Safety. If the LLM tries to pass a number where a string is expected, Zod will catch it.
* _call: This method contains the actual implementation logic. In a real app, this would contain fetch() calls to your database or external APIs.
3. Agent Classes
We define a hierarchy to demonstrate specialization.
* BaseAgent: Contains shared logic. The execute method constructs the prompt, sends it to the LLM, and parses the JSON response. Note the try/catch block around JSON.parse—LLMs are text models and do not natively output JSON; parsing must be guarded against malformed responses.
* OrchestratorAgent: This is the "Manager." Its system prompt instructs it to output a specific JSON format (next_agent, context). It doesn't perform work itself; it decides who should work.
* ResearcherAgent: This agent is specialized in data retrieval. Its prompt forces it to output a tool call structure (action, action_input). Crucially, its execute method checks for a tool match and invokes it immediately.
* AnalystAgent: This agent receives data (from the Researcher) and performs synthesis. It outputs a final report structure.
4. Main Execution Flow (runMultiAgentSystem)
This function simulates the runtime environment of a Node.js web server handling a request.
1. Initialization: We instantiate the LLM and all agents. In a real SaaS app, these might be singletons or scoped to a user session.
2. Orchestration:
* The Orchestrator receives the user's natural language request.
* It returns a decision (e.g., { next_agent: "Researcher" }).
3. Delegation:
* The system passes control to the Researcher. The Researcher invokes its tool (fetch_tickets), simulating a database query.
* The result (raw JSON data) is captured.
4. Chaining:
* The raw data is injected into a new prompt for the Analyst.
* The Analyst generates a summary and an email draft.
5. Final Tool Execution: The system detects the final report and invokes the SendEmailTool to complete the workflow.
Common Pitfalls
When implementing multi-agent systems in Node.js/TypeScript, watch out for these specific issues:
- JSON Parsing Hallucinations
  - The Issue: LLMs are text generators, not JSON generators. They often wrap output in Markdown code fences (```json ... ```) or emit trailing commas, causing `JSON.parse()` to throw a syntax error.
  - The Fix: Always use a robust output parser. In LangChain, use `JsonOutputParser`. In manual implementations, use a regex to extract the JSON content from code fences before parsing.
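A hand-rolled version of that fix might look like this sketch; `safeParseJson` is a hypothetical helper, not a LangChain export:

```typescript
// Strip an optional Markdown code fence before parsing LLM output.
// Returns null instead of throwing on malformed JSON.
function safeParseJson(raw: string): unknown | null {
  // Matches ```json ... ``` or bare ``` ... ``` fences.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)\s*```/);
  const candidate = fenced ? fenced[1] : raw.trim();
  try {
    return JSON.parse(candidate);
  } catch {
    return null;
  }
}

console.log(safeParseJson('```json\n{"next_agent": "Researcher"}\n```'));
console.log(safeParseJson("not json at all")); // null
```

The `null` return forces the caller to handle the failure path explicitly, e.g. by re-prompting the model.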
- Infinite Loops (Agent Cycles)
  - The Issue: Agent A calls Agent B, Agent B calls Agent A. Without a termination condition, the system will hang or consume your entire budget.
  - The Fix: Implement a `max_iterations` counter in your execution loop. If the counter is reached, force the `next_agent` to be "FINISH" or "Human_Takeover".
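A minimal sketch of the counter-based guard; the two ping-ponging stub agents are invented to force the worst case:

```typescript
// Cap on agent hand-offs before the loop is forcibly terminated.
const MAX_ITERATIONS = 5;

// Stand-in router that always bounces between two agents (a cycle).
function pickNextAgent(current: string): string {
  return current === "AgentA" ? "AgentB" : "AgentA";
}

function runWithGuard(start: string): string[] {
  const trace: string[] = [];
  let agent = start;
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    trace.push(agent);
    agent = pickNextAgent(agent);
  }
  trace.push("FINISH"); // forced termination instead of an infinite loop
  return trace;
}

console.log(runWithGuard("AgentA"));
// [ 'AgentA', 'AgentB', 'AgentA', 'AgentB', 'AgentA', 'FINISH' ]
```

In a real system the forced "FINISH" would also surface a message to the user (or a human operator) explaining that the task was cut short.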
- Async/Await Waterfalls
  - The Issue: In a naive implementation, you might await Agent A, then await Agent B, then await Agent C. If Agents B and C don't depend on each other, this wastes time.
  - The Fix: Use `Promise.all()` for independent agents. However, in a sequential workflow (like the one above), `await` is correct. Be careful not to block the Node.js event loop with synchronous heavy computation inside an agent's tool.
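A quick sketch of the difference: two independent stub agents run concurrently under `Promise.all()`, so total wall time tracks the slowest agent rather than the sum of both:

```typescript
// Two independent "agents" with simulated LLM latency.
const wait = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function researchAgent(): Promise<string> {
  await wait(100);
  return "research done";
}

async function sentimentAgent(): Promise<string> {
  await wait(100);
  return "sentiment done";
}

async function main() {
  const t0 = Date.now();
  // Both run concurrently: total wall time is ~100ms, not ~200ms.
  const [research, sentiment] = await Promise.all([
    researchAgent(),
    sentimentAgent(),
  ]);
  console.log(research, sentiment, `took ~${Date.now() - t0}ms`);
}

main();
```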
- State Management in Stateless Environments
  - The Issue: HTTP requests in web apps are stateless. If your multi-agent system spans multiple requests (e.g., waiting for human input), storing conversation history in memory (variables) will fail when the server restarts or scales horizontally.
  - The Fix: Persist agent state (message history) to a database (Redis, PostgreSQL) using a unique `sessionId` or `conversationId`.
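A minimal sketch of the load-mutate-save pattern; the in-memory `Map` stands in for Redis or PostgreSQL, and `handleTurn` is a hypothetical request handler:

```typescript
// A stand-in store keyed by sessionId. In production this Map would be
// Redis or PostgreSQL; the load/save interface is the point, not the backend.
interface SessionState {
  history: string[];
}

const store = new Map<string, SessionState>();

function loadState(sessionId: string): SessionState {
  return store.get(sessionId) ?? { history: [] };
}

function saveState(sessionId: string, state: SessionState): void {
  store.set(sessionId, state);
}

// Each request rehydrates state, mutates it, and writes it back —
// no reliance on process memory surviving between requests.
function handleTurn(sessionId: string, message: string): SessionState {
  const state = loadState(sessionId);
  state.history.push(message);
  saveState(sessionId, state);
  return state;
}

handleTurn("conv-123", "Analyze Q3 sales");
console.log(handleTurn("conv-123", "Now draft the email").history.length); // 2
```

With a real database backend, `loadState` and `saveState` become async and the handler awaits them, but the request-scoped shape is identical.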
- Vercel/Serverless Timeouts
  - The Issue: Multi-agent chains can take a long time (LLM latency + tool execution). Serverless functions (like Vercel's) often have strict timeouts (e.g., 10s or 30s).
  - The Fix: For long-running workflows, do not await the full chain in the HTTP response. Instead:
    1. Accept the request.
    2. Return a `202 Accepted` immediately.
    3. Process the agent chain in a background job (e.g., using Inngest, AWS SQS, or a background worker like BullMQ).
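A minimal sketch of the accept-then-process shape, with an in-process queue standing in for Inngest, SQS, or BullMQ; `handleRequest` and the `/jobs/:id` polling route are hypothetical names:

```typescript
// A tiny in-process job queue standing in for a real background worker.
type Job = { id: string; run: () => Promise<void> };

const queue: Job[] = [];
const statuses = new Map<string, "queued" | "done">();

function enqueue(job: Job): void {
  statuses.set(job.id, "queued");
  queue.push(job);
  // Drain asynchronously so the HTTP caller returns immediately.
  setImmediate(async () => {
    const next = queue.shift();
    if (next) {
      await next.run();
      statuses.set(next.id, "done");
    }
  });
}

// The HTTP handler's shape: enqueue the work, answer 202 right away.
// A separate /jobs/:id route would read `statuses` for polling.
function handleRequest(requestId: string): { status: number; body: string } {
  enqueue({
    id: requestId,
    run: async () => {
      /* the long multi-agent chain would run here */
    },
  });
  return { status: 202, body: `Accepted: poll /jobs/${requestId}` };
}

console.log(handleRequest("job-1").status); // 202
```

Real queues add retries, persistence, and multi-process workers, but the contract — respond first, process later, expose status for polling — is the same.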
Visualizing the Workflow
The following diagram illustrates the flow of control and data between the specialized agents.
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.