Chapter 13: Collaborative Agents - Researcher & Writer Example
Theoretical Foundations
In the landscape of autonomous agents, individual agents are powerful, but their true potential is unlocked when they collaborate. The core concept of this chapter is the orchestration of specialized agents into a cohesive workflow, where each agent's output becomes the input for another, creating a system that is greater than the sum of its parts. This is not merely about chaining function calls; it's about designing a resilient, stateful, and iterative process that mirrors complex human workflows, such as research and writing.
To understand this, let's first establish a foundational concept from a previous chapter: the StateGraph. In Chapter 12, we introduced the StateGraph as a blueprint for defining how an agent's state evolves over time through a series of nodes (actions) and edges (transitions). Here, we elevate that concept. We are no longer building a single agent that performs a sequence of tasks. Instead, we are building a multi-agent system where the StateGraph orchestrates the communication and control flow between distinct, autonomous agents, each with its own specialized purpose.
The Analogy: A Modern Software Development Team
Imagine a high-functioning software development team. You don't have a single "full-stack" developer who writes the frontend, designs the database schema, writes the backend API, and handles deployment. That would be inefficient and lead to mediocre results across the board. Instead, you have specialists: a UI/UX Designer, a Backend Engineer, a Frontend Developer, and a DevOps Engineer.
- The UI/UX Designer (our Researcher) gathers requirements, sketches wireframes, and defines the user experience. They don't write code; they produce a blueprint.
- The Backend Engineer (our Writer) takes the blueprint, designs the database, and builds the API logic. They transform abstract requirements into concrete, functional code.
- The Frontend Developer (a potential third agent, e.g., a Reviewer) takes the API specifications and the wireframes to build the user interface. They might find that the API doesn't provide all the necessary data, sending feedback to the Backend Engineer for refinement.
This is a collaborative workflow. The output of one specialist is the input for another. The process is often cyclical, not linear. A bug found by the frontend developer might send the backend engineer back to the drawing board. This is precisely the model we are building with LangGraph. The Researcher is our specialist for information gathering and synthesis. The Writer is our specialist for content creation and formatting. The Supervisor (or the graph's control flow itself) is the project manager who directs the work and manages the feedback loops.
Why Orchestration Over a Single Monolithic Agent?
A single, monolithic agent tasked with "research and write a report" would face significant challenges:
- Cognitive Load and Focus: The agent would have to juggle multiple, distinct cognitive tasks simultaneously: searching for information, synthesizing it, maintaining a coherent narrative, and checking for factual accuracy. This dilutes its focus, often leading to shallow research and generic writing.
- State Management Complexity: The internal state of such an agent would become a tangled mess. How do you differentiate between raw research notes, synthesized key points, and draft paragraphs? How do you track the progress of each sub-task?
- Inflexibility and Lack of Iteration: A linear, monolithic process struggles with iteration. If the writing phase reveals a gap in the research, there's no clear mechanism to go back and refine the research without restarting the entire process. This is inefficient and brittle.
By decomposing the problem into specialized agents, we gain several advantages:
- Single Responsibility Principle: Each agent has one clear job. The Researcher is optimized for information retrieval and synthesis. The Writer is optimized for language generation and structuring. This leads to higher quality outputs from each specialist.
- Shared, Persistent State: Using a shared state, managed by LangGraph, gives the system a single source of truth. The Researcher populates a `researchNotes` field in the state; the Writer reads from this field. The state acts as a collaborative workspace, like a shared document or a project board, that all agents can access and modify.
- Cyclical Control Flows for Iterative Refinement: This is the most critical advantage. We can design conditional edges in our graph that create feedback loops. For example, after the Writer produces a draft, a conditional edge can check its quality. If the quality is low, the graph can route the state back to the Researcher to gather more information or to the Writer to try again with more context. This mimics the human process of drafting, reviewing, and refining.
The Web Development Analogy: Microservices Architecture
Let's map this directly to a web development architecture. A monolithic application is like our single, monolithic agent. It's simple to start but becomes hard to maintain and scale.
A microservices architecture is the perfect analogy for our multi-agent system.
- Microservice (e.g., `UserService`): This is analogous to a specialized agent like our Researcher. It has a single, well-defined responsibility: managing user data. It exposes a clear API (e.g., `GET /users/{id}`).
- API Gateway / Orchestrator: This is analogous to our LangGraph. It doesn't contain the business logic itself but knows how to route requests to the appropriate microservices and combine their responses. It manages the flow of data between services.
- Message Queue / Shared Database: This is analogous to our Shared State. When the `OrderService` needs user information, it doesn't call the `UserService` directly and wait. It might query a shared cache or a message bus where the `UserService` has published user data. This decouples the services. Similarly, our Researcher publishes its findings to the shared state, and the Writer consumes them without needing to know how the research was done.
- API Contract (e.g., TypeScript Interface): This is analogous to our State Schema. The `UserService` guarantees that its response will conform to a specific `User` interface. This is a contract that other services can rely on. In our agent system, the shared state is governed by a strict TypeScript interface, ensuring that the Researcher's output is in a format the Writer can understand.
// This is a conceptual interface, not code for this chapter.
// It defines the "API Contract" for our shared state.
interface ResearcherState {
topic: string;
researchNotes: string[]; // Raw data points gathered
synthesizedSummary: string; // Key findings
sources: string[]; // Citations
iteration: number;
}
interface WriterState extends ResearcherState {
draft: string;
reviewFeedback: string;
qualityScore: number;
}
This architectural pattern provides resilience. If the Writer fails to produce a good draft, we can isolate the failure and route the problem back to the Researcher or trigger a retry, just as a microservices orchestrator can handle a failing service.
Deep Dive: The Mechanics of Collaboration
To build this system, we need to understand the underlying mechanics that enable this collaboration. These are the building blocks of our multi-agent workflow.
1. Shared State Management
The shared state is the lifeblood of the multi-agent system. It's not just a simple data object; it's a carefully designed data structure that acts as the central nervous system for the entire workflow. In LangGraph, this is typically defined as a State object with specific keys. The power lies in how this state is updated.
Instead of agents overwriting the entire state, they typically modify specific keys. This is where utility types from TypeScript become indispensable. For example, we might define our state with Readonly properties to prevent accidental overwrites, or use Partial<T> when we want to allow nodes to update only a subset of the state.
// Conceptual State Definition using TypeScript Utility Types
// Base type for all our collaborative agent states
interface BaseAgentState {
topic: string;
iteration: number;
}
// The Researcher's state adds specific fields for its output
interface ResearcherState extends BaseAgentState {
// Readonly ensures these properties can't be replaced, only modified
readonly researchNotes: string[];
readonly sources: string[];
synthesizedSummary: string; // This can be replaced
}
// The Writer's state extends the researcher's state
interface WriterState extends ResearcherState {
draft: string;
reviewFeedback: string | null;
}
Why is this important? By using a shared state, we decouple the agents. The Researcher doesn't need to know that the Writer exists. It just needs to know how to populate its designated section of the shared workspace. This makes the system modular. We could swap out the Writer for a more advanced model without changing a single line of the Researcher's code.
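As a concrete, dependency-free sketch of this idea, the helper below merges a node's partial update into a brand-new state object. The `applyUpdate` helper is our own illustration of the merge step, not a LangGraph API:

```typescript
// Shared state shape (mirrors the interfaces above, simplified)
interface SharedState {
  topic: string;
  researchNotes: string[];
  draft: string | null;
}

// A node returns only the keys it changed.
type StateUpdate = Partial<SharedState>;

// Hypothetical merge helper: produce a NEW state object instead of
// mutating the old one, so history/checkpointing stays intact.
function applyUpdate(state: SharedState, update: StateUpdate): SharedState {
  return { ...state, ...update };
}

const initial: SharedState = { topic: "LangGraph", researchNotes: [], draft: null };

// The "Researcher" contributes only its own keys...
const afterResearch = applyUpdate(initial, {
  researchNotes: ["cyclic workflows", "shared state"],
});

// ...and the "Writer" only its own.
const afterWrite = applyUpdate(afterResearch, { draft: "LangGraph enables cycles." });

console.log(afterWrite.draft);             // the Writer's contribution
console.log(initial.researchNotes.length); // 0 — the original state was not mutated
```

Because each agent touches only its designated keys and every update produces a fresh object, neither agent needs to know the other exists.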
2. Cyclical Control Flows and Conditional Edges
Linear workflows are simple but rigid. The real power of LangGraph comes from its ability to model complex, non-linear flows, including cycles. This is achieved through conditional edges.
A conditional edge is a decision point in the graph. Based on the current state, the graph decides which node to visit next. This is the mechanism that enables feedback loops.
Let's consider the workflow:
- Researcher Node: Gathers information and updates the state.
- Writer Node: Consumes the research and generates a draft.
- Decision Node (Conditional Edge): This is not a traditional agent but a logic gate. It inspects the state, specifically the `qualityScore` or the presence of `reviewFeedback`.
  - Condition A: If `qualityScore` is high, the flow proceeds to the final output node.
  - Condition B: If `qualityScore` is low, the flow is routed back to the `Researcher` node to gather more context or back to the `Writer` node with the feedback.
This creates a cycle: Researcher -> Writer -> Review -> (if needed) Researcher -> .... This cyclical pattern is fundamental for tasks that require iteration and refinement, moving beyond simple request-response paradigms.
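The decision point itself can be expressed as a plain function of the state. The sketch below is dependency-free, and the names (`routeAfterReview`, the score threshold) are our own assumptions; in LangGraph.js this function would be attached to the graph as a conditional edge rather than called directly:

```typescript
interface ReviewState {
  qualityScore: number;          // 0-100, set by the review step
  reviewFeedback: string | null; // notes from the reviewer, if any
  iteration: number;
}

// Possible destinations after the review step.
type Route = "researcher" | "writer" | "end";

// Hypothetical routing function: this is the logic a conditional
// edge would evaluate against the current state.
function routeAfterReview(state: ReviewState): Route {
  if (state.qualityScore >= 80) return "end";           // good enough: finish
  if (state.reviewFeedback?.includes("missing facts"))  // gaps in content:
    return "researcher";                                // gather more context
  return "writer";                                      // otherwise just redraft
}

console.log(routeAfterReview({ qualityScore: 90, reviewFeedback: null, iteration: 1 }));                 // "end"
console.log(routeAfterReview({ qualityScore: 40, reviewFeedback: "missing facts on X", iteration: 1 })); // "researcher"
console.log(routeAfterReview({ qualityScore: 40, reviewFeedback: "awkward tone", iteration: 2 }));       // "writer"
```

Keeping the router a pure function of the state makes the feedback loop trivially testable in isolation.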
3. The Role of the Supervisor and Consensus
In some architectures, a dedicated Supervisor Node acts as a central router. It doesn't perform the task itself but directs the flow of work to the appropriate worker agent. The Supervisor's decision is based on the current state and a predefined set of rules.
This is where the Consensus Mechanism becomes relevant. Imagine a scenario where we have multiple "Writer" agents, each with a slightly different style or expertise. The Supervisor could dispatch the task to all of them simultaneously. Their outputs would be collected in the shared state. A final "Reviewer" node (or the Supervisor itself) would then be responsible for synthesizing these multiple outputs into a single, robust final answer. This pattern is inspired by techniques like Mixture of Experts (MoE) and is crucial for generating high-quality, diverse, and reliable results.
Analogy: Think of a jury. The judge (Supervisor) gives instructions (the task). Each juror (Worker Agent) deliberates independently. The foreperson (Reviewer Node) then compiles the individual opinions into a final verdict (Consensus).
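A minimal sketch of that fan-out/fan-in pattern, with a toy length heuristic standing in for an LLM-as-judge call (all names here are illustrative, not a library API):

```typescript
// Each "writer" produces a candidate draft for the same brief.
type Writer = (brief: string) => string;

const writers: Writer[] = [
  (brief) => `Short take: ${brief}.`,
  (brief) => `Detailed take: ${brief}, with examples and caveats.`,
  (brief) => `Narrative take: ${brief}, told as a story.`,
];

// The "reviewer" scores candidates; a toy heuristic (length) stands in
// for a real quality-judging LLM call.
function score(draft: string): number {
  return draft.length;
}

// Fan-out: dispatch the brief to every writer.
// Fan-in: the reviewer picks the consensus winner.
function consensus(brief: string): string {
  const candidates = writers.map((w) => w(brief));
  return candidates.reduce((best, c) => (score(c) > score(best) ? c : best));
}

const winner = consensus("why shared state matters");
console.log(winner); // the "Detailed take" wins under the length heuristic
```

In a production system the writers would run concurrently (e.g., via `Promise.all`) and the reviewer would be another model call, but the shape of the pattern is the same.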
Visualizing the Collaborative Workflow
The following diagram illustrates the cyclical control flow between the Researcher and Writer agents, managed by a shared state and conditional edges.
This diagram shows that the workflow is not a straight line. The ReviewNode acts as a router, directing the flow based on the quality of the work produced. This creates a dynamic system that can adapt and refine its output until it meets a certain standard, a capability that is impossible in a simple linear chain.
By mastering these concepts—shared state, cyclical flows, and conditional routing—we move from building simple agents to engineering sophisticated, collaborative multi-agent systems that can tackle complex, real-world problems.
Basic Code Example
In a SaaS (Software as a Service) application, you often need to generate complex, high-quality content dynamically—such as personalized blog posts, marketing copy, or technical documentation. A single AI model might struggle with depth or accuracy. Instead, we can orchestrate a multi-agent workflow where a Researcher agent gathers and synthesizes information, and a Writer agent transforms that raw data into polished prose.
This example simulates a "Content Generation Service" where: 1. The Researcher acts as an information gatherer (simulating an API call or database query). 2. The Writer acts as a content creator, strictly adhering to the Researcher's findings. 3. The Supervisor (or Orchestrator) manages the state and flow using LangGraph.js, ensuring the workflow is cyclical and allows for iterative refinement.
We will use LangGraph.js to define the state, nodes, and conditional edges that control this collaboration.
Visualizing the Workflow
The graph below illustrates the flow of data and logic. Note the conditional edge (review_writing) that allows the Supervisor to send the draft back to the Writer for refinement based on a quality check.
Complete TypeScript Code Example
This code is fully self-contained. It simulates the LLM calls (to avoid requiring API keys) but demonstrates the exact structure of a LangGraph implementation.
/**
* Collaborative Agents: Researcher & Writer Example
*
* Context: SaaS Content Generation Service
* Framework: LangGraph.js (simulated structure for clarity)
* Language: TypeScript
*/
// 1. Define Shared State
// The state is the "source of truth" passed between nodes.
interface AgentState {
topic: string;
researchData: string | null;
draft: string | null;
feedback: string | null;
iterationCount: number;
}
// 2. Mock LLM Service
// In a real app, this would be an API call to OpenAI/Anthropic.
// We simulate deterministic behavior for this example.
const mockLLM = async (prompt: string, role: "researcher" | "writer"): Promise<string> => {
console.log(`\n[LLM Call - ${role.toUpperCase()}]: Processing...`);
// Simulate network delay
await new Promise(resolve => setTimeout(resolve, 500));
if (role === "researcher") {
return `Research Summary for "${prompt}":
- Key Point 1: LangGraph enables cyclic workflows.
- Key Point 2: State management is crucial for agent memory.
- Key Point 3: Conditional edges handle logic branching.`;
} else {
// Writer logic depends on context: a revision prompt embeds the previous
// feedback, which we detect via the "Previous Feedback" marker.
if (prompt.includes("Previous Feedback")) {
return `Refined Article: A deep dive into LangGraph cyclic workflows.
State management ensures continuity. Conditional edges allow dynamic routing.`;
}
return `Draft Article: LangGraph is a tool. It enables cyclic workflows.`;
}
};
// 3. Node Functions
// Each node performs a specific transformation of the state.
/**
* Researcher Node: Gathers raw data based on the topic.
*/
async function researchNode(state: AgentState): Promise<Partial<AgentState>> {
const researchData = await mockLLM(state.topic, "researcher");
return { researchData };
}
/**
* Writer Node: Creates content based on research data and optional feedback.
*/
async function writerNode(state: AgentState): Promise<Partial<AgentState>> {
let prompt = "";
if (state.feedback) {
// Incorporate feedback into the prompt for iterative refinement
prompt = `Previous Feedback: "${state.feedback}"\nResearch Data: ${state.researchData}`;
} else {
// First pass: simple writing based on research
prompt = `Research Data: ${state.researchData}`;
}
const draft = await mockLLM(prompt, "writer");
return { draft, iterationCount: state.iterationCount + 1 };
}
/**
* Supervisor Node: Reviews the draft and decides the next step.
* This implements the "Consensus Mechanism" (in a simplified form) by
* evaluating the output against a standard.
*/
async function supervisorNode(state: AgentState): Promise<Partial<AgentState>> {
// Simple heuristic: If the draft is short, request a revision.
// In a real app, this would be an LLM call judging quality.
const isQualityMet = (state.draft?.length || 0) > 60;
if (!isQualityMet) {
console.log("[Supervisor]: Draft too short. Requesting revision.");
return { feedback: "Please expand on the concepts and ensure the tone is professional." };
}
console.log("[Supervisor]: Content approved.");
return { feedback: null }; // Clear feedback to signal completion
}
// 4. Conditional Edge Logic
// Determines the path based on the state after the Supervisor node.
function shouldContinue(state: AgentState): "writer" | "end" {
// If feedback exists, loop back to the writer.
// If feedback is null, the workflow is complete.
return state.feedback ? "writer" : "end";
}
// 5. Graph Definition (Simulated)
// In a real LangGraph implementation, you would use `new StateGraph(...)`.
// Here, we simulate the graph execution logic to make it runnable without dependencies.
async function runWorkflow(topic: string) {
console.log(`--- Starting Workflow for Topic: "${topic}" ---`);
// Initialize State
let currentState: AgentState = {
topic,
researchData: null,
draft: null,
feedback: null,
iterationCount: 0,
};
// Execution Loop (Simulating the Graph Runtime)
// 1. Start -> Researcher
console.log("\n[Step 1] Executing Researcher Node...");
const researchUpdate = await researchNode(currentState);
currentState = { ...currentState, ...researchUpdate };
// 2. Researcher -> Writer
console.log("\n[Step 2] Executing Writer Node...");
const writeUpdate = await writerNode(currentState);
currentState = { ...currentState, ...writeUpdate };
// 3. Writer -> Supervisor (and Conditional Loop)
let loop = true;
while (loop) {
console.log("\n[Step 3] Executing Supervisor Node...");
const supervisorUpdate = await supervisorNode(currentState);
currentState = { ...currentState, ...supervisorUpdate };
// Check Conditional Edge
const decision = shouldContinue(currentState);
if (decision === "writer") {
console.log("\n[Loop] Routing back to Writer Node...");
const revisionUpdate = await writerNode(currentState);
currentState = { ...currentState, ...revisionUpdate };
// Loop continues to Supervisor again
} else {
console.log("\n[End] Workflow Complete.");
loop = false;
}
}
// Final Output
console.log("\n=== FINAL DRAFT ===");
console.log(currentState.draft);
console.log("===================");
console.log(`Total Iterations: ${currentState.iterationCount}`);
}
// 6. Execution Entry Point
// Run the application
(async () => {
await runWorkflow("The benefits of LangGraph.js");
})();
Detailed Line-by-Line Explanation
1. State Definition (interface AgentState)
interface AgentState {
topic: string;
researchData: string | null;
draft: string | null;
feedback: string | null;
iterationCount: number;
}
`feedback` is the critical field that enables the cyclical control flow. If `feedback` is present, the graph knows to loop back to the Writer. `iterationCount` can be used to prevent infinite loops in production systems.
2. Mock LLM Service (mockLLM)
* Why: To make this example runnable without external API keys, we simulate the LLM.
* Under the Hood: It returns deterministic strings based on the role. In a real SaaS app, this function would wrap the @langchain/openai SDK, passing the prompt and handling the response parsing.
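One way to keep that swap painless is to code against a tiny interface rather than the mock directly. The `LLMClient` abstraction below is our own illustration, not part of any SDK:

```typescript
// A minimal abstraction over "something that completes prompts".
interface LLMClient {
  complete(prompt: string, role: "researcher" | "writer"): Promise<string>;
}

// The mock satisfies the interface...
const mockClient: LLMClient = {
  async complete(prompt, role) {
    return role === "researcher"
      ? `Research Summary for "${prompt}"`
      : `Draft based on: ${prompt}`;
  },
};

// ...and a real client (e.g., one wrapping the @langchain/openai SDK)
// would too, so nodes depend only on LLMClient, never on a concrete SDK.
async function demo(client: LLMClient): Promise<string> {
  const notes = await client.complete("LangGraph", "researcher");
  return client.complete(notes, "writer");
}

demo(mockClient).then((draft) => console.log(draft));
// → Draft based on: Research Summary for "LangGraph"
```

With this seam in place, switching from the mock to a hosted model is a one-line change at the call site.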
3. The Nodes (Worker Agents)
A. Researcher Node
async function researchNode(state: AgentState): Promise<Partial<AgentState>> {
const researchData = await mockLLM(state.topic, "researcher");
return { researchData };
}
* Input: The node takes the state's `topic` and simulates gathering data.
* Output: It returns a partial state update. LangGraph automatically merges this into the main state.
B. Writer Node
async function writerNode(state: AgentState): Promise<Partial<AgentState>> {
let prompt = "";
if (state.feedback) {
prompt = `Previous Feedback: "${state.feedback}"\nResearch Data: ${state.researchData}`;
} else {
prompt = `Research Data: ${state.researchData}`;
}
const draft = await mockLLM(prompt, "writer");
return { draft, iterationCount: state.iterationCount + 1 };
}
* Context: The Writer's behavior branches on `state.feedback`.
* If feedback exists (from a previous loop), it constructs a prompt that instructs the LLM to revise the content based on that feedback.
* If feedback is null, it performs the initial draft.
* Iteration Count: We increment iterationCount here to track how many times the writer has worked on the document.
C. Supervisor Node
async function supervisorNode(state: AgentState): Promise<Partial<AgentState>> {
const isQualityMet = (state.draft?.length || 0) > 60;
if (!isQualityMet) {
return { feedback: "Please expand on the concepts..." };
}
return { feedback: null };
}
* Evaluation: The Supervisor inspects the `draft` (here via a simple length heuristic).
* Decision Making: It sets the feedback field. This field is the signal for the conditional edge.
* If feedback is a string (truthy), the workflow isn't done.
* If feedback is null (falsy), the workflow is approved.
4. Conditional Edge Logic (shouldContinue)
function shouldContinue(state: AgentState): "writer" | "end" {
return state.feedback ? "writer" : "end";
}
* Routing: If this function returns `"writer"`, the graph routes to the Writer node. If it returns `"end"`, it routes to the termination node.
5. Workflow Execution (runWorkflow)
async function runWorkflow(topic: string) {
// ... initialization ...
while (loop) {
// ... node execution ...
const decision = shouldContinue(currentState);
if (decision === "writer") {
// ... execute writer again ...
} else {
loop = false;
}
}
}
* The Loop: Execution is managed by a `while` loop. Inside, we check the conditional edge. If the Supervisor rejected the draft, we route back to the Writer node without calling the Researcher again (an optimization).
* Termination: When `shouldContinue` returns `"end"`, we break the loop and display the final result.
Common Pitfalls
When implementing this pattern in a production SaaS environment using TypeScript and LangGraph, watch out for these specific issues:
- State Mutation (The "Reference" Trap)
  - Issue: In JavaScript/TypeScript, objects are passed by reference. If you directly mutate the state object inside a node (e.g., `state.draft = "new"`), LangGraph's history tracking may break, or concurrent runs might interfere with each other.
  - Fix: Always return a new object or a shallow copy of the properties you are updating. Use the spread operator: `return { ...state, draft: newDraft }`.
- Async/Await Loop Starvation
  - Issue: In the `while` loop of the execution phase, if an API call hangs or fails, the entire workflow freezes.
  - Fix: Wrap LLM calls in `try/catch` blocks. Implement timeouts for the `mockLLM` (or real API) calls. In Vercel/serverless environments, ensure the total execution time of the loop stays within the timeout limit (usually 10s for Hobby plans).
- Hallucinated JSON / Schema Mismatch
  - Issue: When LLMs return data that is supposed to fit into the `AgentState`, they often return unstructured text instead of the expected string or object.
  - Fix: Use Zod or LangChain's `withStructuredOutput` to enforce the schema at the node level. Do not trust the LLM to return a perfectly formatted string without validation.
- Infinite Loops
  - Issue: If the Supervisor's logic is flawed (e.g., it never sets `feedback` to `null`), the `while` loop will run indefinitely, consuming tokens and compute time.
  - Fix: Always include a hard limit in the state (e.g., `maxIterations: 5`) and check `state.iterationCount < state.maxIterations` in your conditional edge or loop break condition.
- Vercel/AWS Lambda Timeouts
  - Issue: Serverless functions have strict timeouts (e.g., 10 seconds on Vercel Hobby). A multi-step agent workflow involving multiple LLM calls can easily exceed this.
  - Fix:
    - For simple flows, keep the graph execution within a single function invocation.
    - For complex flows, use LangGraph Cloud or a persistent backend (like Redis/SQS) to handle the state, allowing the workflow to resume after a timeout or wait for human input.
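To guard against the loop-related pitfalls above, here is a small dependency-free sketch (the helper names `underIterationCap` and `withTimeout` are our own) combining an iteration cap with a `Promise.race`-based timeout around each model call:

```typescript
// Iteration cap: a pure check usable inside a conditional edge or loop guard.
function underIterationCap(iteration: number, maxIterations = 5): boolean {
  return iteration < maxIterations;
}

// Timeout wrapper: reject if the (real or mock) LLM call hangs too long.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms),
    ),
  ]);
}

// Usage sketch: a fast call passes through, a slow one trips the timeout.
const slow = new Promise<string>((resolve) => setTimeout(() => resolve("late"), 1000));

withTimeout(Promise.resolve("ok"), 200).then((v) => console.log(v));           // "ok"
withTimeout(slow, 50).catch((e) => console.log((e as Error).message));         // "Timed out after 50ms"

console.log(underIterationCap(3)); // true  — keep looping
console.log(underIterationCap(5)); // false — hard stop
```

In the workflow above, `shouldContinue` could combine both signals, e.g. `state.feedback && underIterationCap(state.iterationCount) ? "writer" : "end"`.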
The chapter continues with advanced code examples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.