
Chapter 16: Building Agents with LangGraph.js

Theoretical Foundations

In the previous chapters, we established the building blocks of intelligent applications. We learned how to structure data validation with Zod, how to call the OpenAI API for generation, and how to orchestrate linear workflows using LangChain.js chains. A standard chain is like a conveyor belt in a factory: raw materials (input) enter at one end, pass through a series of fixed stations (functions or LLM calls), and emerge as a finished product (output) at the other end. Once the belt stops, the process is over. There is no memory of the journey, and if a defect is detected at the final station, the entire belt must be restarted.

LangGraph.js introduces a fundamental shift in this paradigm. It moves us from the conveyor belt to a state machine or a decision loop. Instead of a straight line, we build a graph where nodes represent actions (tools or reasoning steps) and edges represent the flow of control. This allows the agent to make decisions, act, observe results, and decide what to do next—potentially repeating steps, branching based on conditions, or even self-correcting.

To understand this deeply, let's use a web development analogy. Imagine building a complex Single Page Application (SPA) like a dashboard. A simple chain is like a static HTML page with a form that submits and reloads. A LangGraph agent, however, is like a modern React or Vue application with a global state manager (like Redux or Pinia). The application state (the "Graph State") is the single source of truth. User interactions (events) trigger reducers or actions (nodes) that update the state. The UI (the agent's output) reacts to these state changes. The application can persist this state to localStorage (the Checkpointer) and rehydrate it later, allowing the user to resume exactly where they left off.

The Anatomy of the Agentic Graph: Nodes and Edges

At the heart of LangGraph are two primitive concepts: Nodes and Edges.

  1. Nodes: These are the units of work. In our analogy, they are the API endpoints or the service functions in a microservices architecture. A node can be:

    • An LLM call (the "brain" that reasons or decides).
    • A Tool (an external function, like a calculator, a database query, or a web search).
    • A Conditional Branch (a function that returns a string to determine the next edge).
  2. Edges: These define the control flow. They are the routing logic in an API gateway or the middleware pipeline. Edges can be:

    • Constant Edges: Unconditionally connect Node A to Node B.
    • Conditional Edges: Connect Node A to Node B only if a specific condition is met (e.g., if (toolResult === "error") then go to "correction_node").

This structure creates a cyclical graph. Unlike a chain, the graph can loop back on itself. This is the essence of the "act-observe-think" loop that defines agentic behavior.
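The conditional-edge rule above can be sketched as a plain routing function. A minimal sketch; `ToolState`, `correction_node`, and `finish` are illustrative names, not part of the LangGraph.js API:

```typescript
// A conditional edge is just a function from the current state
// to the name of the next node.
type ToolState = { toolResult: string };

function routeAfterTool(state: ToolState): "correction_node" | "finish" {
  // Mirrors the rule above: if the tool errored, go to the correction node.
  return state.toolResult === "error" ? "correction_node" : "finish";
}
```

LangGraph evaluates exactly this kind of function at a conditional edge and uses its return value to pick the next node.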

The Stateful Brain: Graph State and Zod Schemas

For the agent to be stateful, it needs a memory. In LangGraph, this is the Graph State. This is a shared data structure that is passed between every node and edge in the graph. Every action the agent takes can read from this state and write to it.

This is where our previous work with Zod becomes critical. We don't just use a plain JavaScript object for the state; we define a strict Zod schema for it. This schema acts as a type-safe contract for the entire graph.

  • Why is this important? In a complex, multi-step agent, you might have dozens of nodes. If one node accidentally corrupts the state (e.g., changes a string to a number), the entire graph could fail silently or behave unpredictably. A Zod schema ensures that the state remains valid at every step of the graph's execution. It's the equivalent of using TypeScript interfaces for your global Redux store—preventing runtime errors by enforcing structure at development time.
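The schema-as-contract idea can be sketched in a few lines. This is a dependency-free stand-in: a hand-rolled type guard plays the role a Zod schema's `parse` would play in the real graph, and all names are illustrative:

```typescript
// State contract for the graph. In practice this would be a Zod schema;
// here a hand-rolled guard stands in so the sketch has no dependencies.
type GraphState = { query: string; attempts: number };

function assertGraphState(value: unknown): asserts value is GraphState {
  const v = value as GraphState;
  if (typeof v?.query !== "string" || typeof v?.attempts !== "number") {
    throw new Error("State violated the schema contract");
  }
}

// Merge a node's partial update, then re-validate the whole state.
function mergeAndValidate(state: GraphState, update: Partial<GraphState>): GraphState {
  const next = { ...state, ...update };
  assertGraphState(next); // catches e.g. a node writing a number into `query`
  return next;
}
```

Validating after every merge is what turns a silent state corruption into a loud, debuggable failure at the exact step that caused it.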

Let's visualize a simple agentic graph. This agent will use a tool, and based on the tool's output, it will either finish or ask the LLM to reflect and try again.

The diagram depicts an agent that first uses a tool, then either terminates the process or loops back for the LLM to reflect and retry based on the tool's output.

In this diagram, the loop llm_decision -> tool_execute -> check_result -> llm_reflect -> llm_decision is the key. The agent doesn't just execute once; it can iterate, refining its approach based on the results of its actions.

The Execution Loop: How LangGraph Runs

When you "compile" a LangGraph, you are essentially creating a state machine. The execution loop works as follows:

  1. Initialization: The graph is provided with an initial state. This could be a fresh state object or a hydrated state retrieved from a checkpointer.
  2. Node Execution: The graph identifies the starting node(s) and executes them. A node receives the current state as input and returns a partial update to the state.
  3. State Update: LangGraph merges the node's output back into the main state object.
  4. Edge Evaluation: The graph examines the outgoing edges from the executed node(s). For conditional edges, it evaluates the provided function (which has access to the updated state) to determine the next path.
  5. Loop Until Terminal: This process repeats—moving from node to node, updating the state—until it reaches a node with no outgoing edges (a terminal node like "Finish").

This loop is what gives the agent its "reasoning" capability. The state acts as the agent's working memory, holding the conversation history, intermediate results, and any flags or metadata generated during execution.
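The five steps above can be condensed into a toy runner. This is a conceptual model only, not how LangGraph.js is implemented internally; the nodes, edges, and state shape are invented for illustration:

```typescript
// Toy state-machine runner mirroring the execution loop:
// run a node, merge its partial update, evaluate the edge, repeat until END.
type State = { value: number; log: string[] };
type Node = (s: State) => Partial<State>;

const nodes: Record<string, Node> = {
  increment: (s) => ({ value: s.value + 1, log: [...s.log, "increment"] }),
  check: (s) => ({ log: [...s.log, "check"] }),
};

// Edges: a constant edge after `increment`, a conditional edge after `check`.
const edges: Record<string, (s: State) => string> = {
  increment: () => "check",
  check: (s) => (s.value >= 3 ? "END" : "increment"),
};

function run(initial: State, entry: string): State {
  let state = initial;   // 1. initialization
  let current = entry;
  while (current !== "END") {
    const update = nodes[current](state); // 2. node execution
    state = { ...state, ...update };      // 3. state update (merge)
    current = edges[current](state);      // 4. edge evaluation
  }                                       // 5. loop until terminal
  return state;
}
```

Starting from `{ value: 0, log: [] }` at `increment`, the runner cycles increment → check three times before the conditional edge routes to `END`, leaving the full trace of the journey in `log`.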

Persistent Graph State Hydration: The "Save Game" Mechanism

One of the most powerful features for production agents is the ability to pause and resume. This is where Persistent Graph State Hydration comes into play.

Imagine you are playing a complex video game. You wouldn't want to start from the beginning every time you turn off the console. Instead, you save your progress to a file. Later, you load that file, and you appear exactly where you left off, with all your inventory, health, and quest progress intact.

In LangGraph, the Checkpointer is your save file. It's a storage backend (like SQLite, Postgres, or an in-memory store) that saves the state of the graph after every step.

  • Hydration is the process of loading that saved state back into the graph. When you invoke a compiled graph with the same thread ID as a previous run, the graph doesn't start over. It reads the last saved state, reconstructs its internal execution context, and allows you to resume from the exact point of interruption.

Why is this critical?

  1. Human-in-the-Loop: An agent might need approval before performing a critical action (e.g., making a purchase). The graph can pause, wait for human input, and then hydrate the state to continue with the approved action.
  2. Long-Running Tasks: For tasks that take hours or days (e.g., complex research), you can pause the graph to free up resources and resume it later.
  3. Error Recovery: If a node fails due to a transient error (e.g., an API rate limit), you can hydrate the state and retry just that node without re-running the entire workflow.

This is conceptually similar to server-side sessions in web development. The server maintains a session object (the state) for a user, persists it to a database, and rehydrates it on each request. The user's experience is seamless, even if the underlying server processes are stateless.

Under the Hood: The State Manager and the Checkpointer

Let's peek under the hood at how this is implemented. When a node runs, it doesn't directly mutate the state. Instead, it returns a Partial<State> object, and LangGraph's internal reducer logic merges this partial update into the main state, field by field, under the contract defined by the state schema.

The checkpointer is an interface. You can implement it for any storage system. The key operations are:

  • put: Save a checkpoint (state + metadata such as the next nodes to run).
  • get: Retrieve the latest checkpoint for a given thread ID.
  • list: List all checkpoints for a thread.

When hydrating, the graph loads the state and the next nodes from the checkpoint. It then resumes execution from those nodes, ensuring no steps are lost or repeated unnecessarily.
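A minimal sketch of that checkpointer contract, as an in-memory stand-in. This is not LangGraph's actual checkpoint saver classes; the `Checkpoint` shape and class name are illustrative:

```typescript
// A checkpoint pairs the serialized state with which node runs next.
type Checkpoint<S> = { state: S; nextNode: string };

// In-memory checkpointer keyed by thread ID, with put/get/list.
class MemoryCheckpointer<S> {
  private store = new Map<string, Checkpoint<S>[]>();

  put(threadId: string, cp: Checkpoint<S>): void {
    const history = this.store.get(threadId) ?? [];
    history.push(cp);
    this.store.set(threadId, history);
  }

  // Latest checkpoint for a thread — what hydration reads.
  get(threadId: string): Checkpoint<S> | undefined {
    const history = this.store.get(threadId);
    return history?.[history.length - 1];
  }

  list(threadId: string): Checkpoint<S>[] {
    return this.store.get(threadId) ?? [];
  }
}
```

Hydration then amounts to calling `get(threadId)` and resuming execution at `nextNode` with the saved `state`.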

Connecting to the Broader Ecosystem

This agentic graph doesn't exist in a vacuum. It integrates with the other components we've built:

  • RAG Pipeline: A node in the graph can be a full RAG pipeline. The agent can decide to retrieve information, synthesize it, and then use that synthesis to plan its next action.
  • Pinecone Client: A tool node can use the Pinecone client to perform a vector search, updating the state with retrieved documents before passing it to an LLM node for synthesis.
  • OpenAI API: The LLM nodes are powered by the OpenAI API, but now they are part of a larger, stateful loop rather than a single call.

In essence, LangGraph.js provides the orchestration layer that turns isolated AI capabilities (like RAG or tool use) into a cohesive, intelligent system that can reason over time and adapt to its environment. It's the transition from writing scripts to engineering autonomous systems.

Basic Code Example

In this "Hello World" example, we will build a minimal agentic workflow using LangGraph.js. We will simulate a Worker Agent Pool within a SaaS context: a Code Reviewer Agent that reviews code snippets. The agent will operate in a loop, allowing it to self-correct if its initial review is insufficient.

The architecture relies on a State Graph where:

  1. State: Holds the conversation history and the current "review" status.
  2. Nodes: Functions that perform actions (e.g., generating a review or verifying it).
  3. Edges: Decision points that determine the next step based on the current state.

In a real deployment, the LLM interaction would use the Vercel AI SDK (the `ai` package with the @ai-sdk/openai provider); here the LLM call is mocked for stability. Zod handles state validation, and the example is framed within a server-side Node.js environment (common for Next.js Server Actions or API routes).

Visualizing the Workflow

The agent follows a cyclical path. It generates a review, checks it, and loops back if necessary.

This diagram illustrates the cyclical workflow where an AI agent iteratively generates a review, evaluates its quality, and loops back to refine the output until it meets the required standards.

The Code Example

This code is fully self-contained. It mocks the LLM response for stability but demonstrates the LangGraph structure, Zod validation, and the cyclical logic essential for agents.

// ==========================================
// IMPORTS
// ==========================================
import { z } from "zod";
import { StateGraph, END, START } from "@langchain/langgraph";

// ==========================================
// 1. STATE DEFINITION (Zod Schema)
// ==========================================
/**
 * Defines the state of our agent.
 * In a SaaS context, this represents the data passed between
 * server actions and the agent's internal memory.
 */
const AgentStateSchema = z.object({
  /**
   * The code snippet provided by the user.
   */
  codeSnippet: z.string(),

  /**
   * The generated review from the LLM.
   */
  review: z.string(),

  /**
   * A flag indicating if the review meets quality standards.
   * This drives the cyclical behavior (the "loop").
   */
  isReviewValid: z.boolean(),

  /**
   * Count of iterations to prevent infinite loops.
   */
  iterations: z.number().default(0),
});

// TypeScript type inference from Zod schema
type AgentState = z.infer<typeof AgentStateSchema>;

// ==========================================
// 2. NODE DEFINITIONS (Worker Agents)
// ==========================================

/**
 * Node 1: Generate Review
 * Simulates calling an LLM (like GPT-4) to generate a code review.
 * In a real app, this would use `generateText` from the `ai` package with an @ai-sdk/openai model.
 */
async function generateReviewNode(state: AgentState): Promise<Partial<AgentState>> {
  console.log("🤖 [Node] Generating Review...");

  // Simulated LLM response (mocking OpenAI API)
  // In reality: const { text } = await generateText({ model: openai('gpt-4'), prompt: ... });
  const mockReview = "The code looks good, but consider adding error handling.";

  return {
    review: mockReview,
    iterations: state.iterations + 1,
  };
}

/**
 * Node 2: Check Review Quality
 * Simulates a validation step. 
 * In a real app, this might use a smaller LLM or a heuristic check.
 * For this demo, we force a "fail" on the first pass to demonstrate the loop.
 */
async function checkReviewNode(state: AgentState): Promise<Partial<AgentState>> {
  console.log(`🔍 [Node] Checking Review (Iteration ${state.iterations})...`);

  // LOGIC: If it's the first iteration, mark it invalid to force a loop.
  // Otherwise, mark it valid to exit.
  const isValid = state.iterations > 1;

  if (isValid) {
    console.log("✅ Review passed validation.");
  } else {
    console.log("❌ Review failed validation. Triggering retry...");
  }

  return {
    isReviewValid: isValid,
  };
}

// ==========================================
// 3. CONTROL FLOW (Edges)
// ==========================================

/**
 * Determines the next step based on the current state.
 * This is the "Brain" of the agent.
 */
function decideNextStep(state: AgentState): string {
  if (state.isReviewValid) {
    return "end"; // Go to END node
  }
  return "generate_review"; // Loop back to generate_review node
}

// ==========================================
// 4. GRAPH COMPILATION
// ==========================================

/**
 * Initializes the LangGraph.
 * This creates the stateful workflow.
 * Note: the exact constructor signature varies across @langchain/langgraph
 * versions. Here we use the `channels` form; the Zod schema remains our
 * contract for validating state at the graph's boundaries.
 */
function createAgentGraph() {
  // Initialize the graph. A `null` channel keeps the last value written to it.
  const workflow = new StateGraph<AgentState>({
    channels: {
      codeSnippet: null,
      review: null,
      isReviewValid: null,
      iterations: null,
    },
  });

  // Add nodes (Worker Agents)
  workflow.addNode("generate_review", generateReviewNode);
  workflow.addNode("check_review", checkReviewNode);

  // Define Edges (Control Flow)

  // 1. Start -> Generate Review
  workflow.addEdge(START, "generate_review");

  // 2. Generate Review -> Check Review (Always happens after generation)
  workflow.addEdge("generate_review", "check_review");

  // 3. Check Review -> Conditional Edge (Loop or End)
  // We use `addConditionalEdges` to create the cyclical logic.
  workflow.addConditionalEdges(
    "check_review",
    decideNextStep,
    {
      "generate_review": "generate_review", // If logic returns "generate_review"
      "end": END, // If logic returns "end", route to the built-in END marker
    }
  );

  return workflow.compile();
}

// ==========================================
// 5. EXECUTION (Main Entry Point)
// ==========================================

/**
 * Runs the agent with a user's request.
 * In a SaaS app, this function would be a Server Action.
 */
async function runAgent() {
  console.log("--- Starting Agent Workflow ---");

  const graph = createAgentGraph();

  // Initial State: User provides code, review is empty, invalid by default.
  const initialState: AgentState = {
    codeSnippet: "function add(a, b) { return a + b; }",
    review: "",
    isReviewValid: false,
    iterations: 0,
  };

  // Stream execution (Simulates a real-time WebSocket or Server-Sent Events)
  const stream = await graph.stream(initialState);

  // Process each step in the graph
  for await (const step of stream) {
    // 'step' contains the state updates for the specific node
    const nodeName = Object.keys(step)[0];
    const stateUpdate = step[nodeName];

    console.log(`--- Step Update: ${nodeName} ---`);
    console.log(JSON.stringify(stateUpdate, null, 2));
  }

  console.log("--- Final State ---");
  // To get the complete final state as a single object, call
  // `graph.invoke(initialState)` instead of streaming, or accumulate the
  // partial updates emitted by the stream above.
}

// Execute the example
runAgent().catch(console.error);

Detailed Line-by-Line Explanation

1. Imports and State Definition

  • import { z } from "zod";: Imports Zod for schema validation. In an agent context, state integrity is critical. If an LLM hallucinates a malformed object, Zod catches it before it propagates.
  • import { StateGraph, END, START } from "@langchain/langgraph";: Imports the core LangGraph primitives.
    • StateGraph: The container for our nodes and edges.
    • START / END: Special symbols defining entry and exit points.
  • const AgentStateSchema = z.object({ ... }): Defines the "Shape" of our data.
    • codeSnippet: The input from the user.
    • review: The output from the LLM.
    • isReviewValid: A boolean flag used for control flow. This is the key to cyclical behavior.
    • iterations: A safety counter to prevent infinite loops (a common risk in agents).

2. Node Definitions (Worker Agents)

  • async function generateReviewNode(...):
    • This represents a Worker Agent. Its sole job is to generate text.
    • It accepts the current state and returns a Partial<AgentState>. LangGraph automatically merges this partial update into the global state.
    • Under the hood: In a production app, this would wrap await generateText({ model: openai('gpt-4'), ... }).
  • async function checkReviewNode(...):
    • This represents a Validation Agent. It acts as a gatekeeper.
    • Logic: It checks state.iterations. If it's the first pass (1), it returns isReviewValid: false. This forces the graph to loop back.
    • Why?: This simulates "Self-Correction." The agent realizes the review isn't good enough and tries again.

3. Control Flow (Edges)

  • function decideNextStep(state: AgentState): string:
    • This is a Router. It looks at the state and decides where to go next.
    • It returns a string key ("end" or "generate_review") that is looked up in the mapping supplied to addConditionalEdges to select the next node.
  • workflow.addEdge(START, "generate_review");:
    • Connects the entry point directly to the first worker node.
  • workflow.addConditionalEdges("check_review", decideNextStep, ...):
    • This is the most critical line. It tells the graph: "After 'check_review' finishes, run the 'decideNextStep' function. Based on its return value, route to the specific node."

4. Graph Compilation & Execution

  • workflow.compile():
    • Freezes the graph definition into an executable object. It performs validation to ensure all nodes and edges are connected correctly.
  • graph.stream(initialState):
    • Streaming: Agents often take time to run. stream yields results as each node completes, rather than waiting for the entire process to finish. This is essential for UI responsiveness (e.g., showing "Generating..." or "Checking..." in a chat bubble).
  • The Loop in Action:
    1. Start: generate_review runs. Review is created. Iteration becomes 1.
    2. Check: check_review runs. Sees iteration is 1. Sets isReviewValid: false.
    3. Decide: decideNextStep sees false. Returns "generate_review".
    4. Loop: Graph routes back to generate_review.
    5. Retry: generate_review runs again. Iteration becomes 2.
    6. Check: check_review runs. Sees iteration is 2. Sets isReviewValid: true.
    7. Decide: decideNextStep sees true. Returns "end".
    8. Finish: Graph routes to END.

Common Pitfalls

When building agents with LangGraph.js in a TypeScript/Web App environment, watch out for these specific issues:

  1. State Mutation & Async/Await Loops

    • The Issue: LangGraph relies on immutability. If you mutate the state object directly (e.g., state.review = "new" instead of returning { review: "new" }), the graph's history tracking breaks.
    • The Fix: Always return a new object (or partial object) from node functions. Ensure your node functions are async and you await the graph execution to prevent race conditions in the event loop.
  2. Infinite Loops (The "Runaway Agent")

    • The Issue: If your conditional edge logic (decideNextStep) never returns "end", the agent will loop forever. This consumes API tokens and can freeze your serverless function (hitting Vercel timeouts).
    • The Fix: Always implement a max_iterations check in your state (as shown with iterations). If iterations > X, force the edge to return END.
  3. Zod Parsing Errors (LLM Hallucinations)

    • The Issue: LLMs are non-deterministic. If generateReviewNode returns { review: 123 } (a number) instead of a string, the Zod schema validation will throw an error when LangGraph tries to merge the state.
    • The Fix: Validate LLM output with safeParse before merging it into the state (note that .strict() only rejects unknown keys, not wrong types), or add a "Sanitization Node" that coerces or retries until the output conforms to the expected schema.
  4. Vercel/AWS Lambda Timeouts

    • The Issue: Serverless functions have strict timeouts (e.g., 10s on Vercel Hobby). A multi-step agent with network calls (LLM + Database) can easily exceed this.
    • The Fix:
      • Use stream to return partial results immediately to the client.
      • For long-running agents, use Vercel Background Functions or AWS Step Functions to decouple the execution from the initial HTTP request.
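The iteration-cap fix from pitfall 2 can be sketched as a guarded router, a variant of this chapter's `decideNextStep`; `MAX_ITERATIONS` is an illustrative constant:

```typescript
const MAX_ITERATIONS = 5; // illustrative cap; tune per workload

type LoopState = { isReviewValid: boolean; iterations: number };

// Router that can never loop forever: the cap wins over the quality check.
function decideNextStepSafe(state: LoopState): "end" | "generate_review" {
  if (state.iterations >= MAX_ITERATIONS) {
    console.warn("Max iterations reached; forcing termination.");
    return "end";
  }
  return state.isReviewValid ? "end" : "generate_review";
}
```

Because the cap is checked first, a validation step that never passes still terminates the graph after at most `MAX_ITERATIONS` cycles.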

The chapter continues with advanced code examples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.