
Chapter 17: Reflection Agents - Self-Correction Loops

Theoretical Foundations

In the landscape of autonomous agents, the ability to generate a response is only half the battle; the other half is ensuring that response is correct, coherent, and contextually appropriate. The Reflection Pattern addresses this by introducing a self-correction loop. Unlike a linear agent that executes a task and immediately returns the result, a reflection agent incorporates a critical evaluation step. It treats its own output not as a final product, but as a draft subject to scrutiny.

This pattern fundamentally shifts the agent's architecture from a simple "generate-and-return" pipeline to a "generate-critique-refine" cycle. The core mechanism involves an agent (often the same one that generated the initial output, or a specialized "critic" agent) analyzing the draft for flaws. If flaws are detected, the agent re-enters the generation phase with the added context of the critique, aiming to produce an improved version. This loop continues until a quality gate—a predefined set of criteria—is satisfied or a maximum iteration limit is reached.

The "Why": Mitigating Hallucination and Enhancing Reliability

The primary motivation for the Reflection Pattern is the inherent non-determinism and potential fallibility of Large Language Models (LLMs). LLMs can produce "hallucinations" (factually incorrect information), logical inconsistencies, or outputs that simply don't align with the user's intent. In a production environment, deploying an agent that occasionally provides incorrect answers is often unacceptable.

Consider a web development analogy: Unit Testing and Continuous Integration (CI). When a developer writes code, they don't just write it and push it to production. They write unit tests to verify individual functions. They run a CI pipeline that lints the code, runs tests, and checks for build errors. The Reflection Pattern is the agent's equivalent of this CI pipeline. The initial generation is the "code commit," and the reflection step is the "automated test suite." If the tests fail (the critique finds errors), the code must be fixed (regenerated) before it can be deployed (returned to the user).

This is crucial for reliability. In a multi-agent system, if one agent produces faulty output, that error propagates downstream, potentially causing a cascade of failures. By implementing a reflection loop, we create a robustness layer that catches and corrects errors before they contaminate the broader workflow.

The "How": The Cyclical Graph Structure

The Reflection Pattern is implemented using a Cyclical Graph Structure. This is a LangGraph design where an edge points back to a previously executed node. This creates a loop that can iterate multiple times until a termination condition is met.

Let's break down the nodes and edges in a typical reflection graph:

  1. Generator Node (Drafting): This is the initial agent responsible for producing the first draft of the response. It takes the user's prompt and any relevant context as input.
  2. Evaluator/Critic Node (Reviewing): This node receives the draft from the Generator. It evaluates the draft based on specific criteria (e.g., factual accuracy, adherence to format, logical consistency). It outputs a critique and a binary flag indicating whether the draft is "acceptable" or "needs revision."
  3. Router Node (Decision Making): This node acts as the decision point. It checks the output of the Evaluator. If the draft is acceptable, the graph proceeds to the final output node. If it needs revision, the graph loops back to the Generator Node.
  4. Final Output Node (Deploying): Once the draft passes the quality gate, this node formats and returns the final response to the user.

The cyclical nature is what makes this pattern powerful. It's not a one-shot attempt; it's an iterative refinement process. The agent learns from its own mistakes (via the critique) and attempts to correct them, much like a writer revising a manuscript based on editorial feedback.
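As a minimal sketch, the generate-critique-refine cycle can be expressed as a bounded loop in plain TypeScript. The `generate` and `critique` functions below are illustrative stand-ins for LLM calls, not part of any library:

```typescript
interface CritiqueResult {
    acceptable: boolean;  // the quality-gate flag
    feedback: string;     // what the next generation should fix
}

// Stand-in Generator: appends the critique's feedback when revising.
const generate = (prompt: string, feedback?: string): string =>
    feedback ? `${prompt} (revised: ${feedback})` : prompt;

// Stand-in Evaluator: approves only drafts that have been revised.
const critique = (draft: string): CritiqueResult =>
    draft.includes("revised")
        ? { acceptable: true, feedback: "" }
        : { acceptable: false, feedback: "add detail" };

// The cycle: generate, critique, loop back with the critique as context,
// terminating on approval or when the iteration limit is reached.
function reflectLoop(prompt: string, maxIterations = 3): string {
    let draft = generate(prompt);                      // Generator node
    for (let i = 0; i < maxIterations; i++) {
        const result = critique(draft);                // Evaluator node
        if (result.acceptable) break;                  // Router: gate passed
        draft = generate(prompt, result.feedback);     // Loop back to Generator
    }
    return draft;                                      // Final output node
}
```

In a real agent, both functions would be LLM calls (or deterministic checks), and the iteration cap prevents runaway loops.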

Visualizing the Cyclical Graph

The following Graphviz DOT diagram illustrates the cyclical structure of a Reflection Agent. Notice the loop from the "Evaluator/Critic" node back to the "Generator" node.

A Graphviz DOT diagram visualizes the cyclical structure of a Reflection Agent by depicting a loop that connects the Evaluator/Critic node back to the Generator node.
Hold "Ctrl" to enable pan & zoom

A Graphviz DOT diagram visualizes the cyclical structure of a Reflection Agent by depicting a loop that connects the Evaluator/Critic node back to the Generator node.

Analogy: The Microservices Architecture

To understand the modularity and separation of concerns in the Reflection Pattern, let's use a Microservices Architecture analogy.

In a modern web application, you don't have one giant monolithic server handling every request. Instead, you have specialized microservices:

  • User Service: Manages user authentication and profiles.
  • Product Service: Manages product listings and inventory.
  • Order Service: Handles checkout and payments.

Each service is single-purpose and highly specialized. The Reflection Pattern operates similarly:

  • Generator Agent (Microservice A): Its sole job is to generate text. It doesn't worry about whether the text is correct; it just produces content based on its training and the prompt.
  • Evaluator Agent (Microservice B): Its sole job is to critique text. It is trained or prompted specifically to identify errors, inconsistencies, or deviations from guidelines. It doesn't generate content; it only analyzes it.
  • Orchestrator (API Gateway): The LangGraph acts as the orchestrator, routing requests between these microservices in a specific sequence (Generator -> Evaluator -> Router -> Generator or Final).

This separation of concerns is key. The Generator doesn't need to be burdened with the cognitive load of self-evaluation, which can dilute its creative or generative capabilities. The Evaluator can be a smaller, more efficient model fine-tuned for critique, reducing overall computational cost. This is analogous to using a lightweight, specialized microservice for a specific task rather than a heavy, general-purpose one.

Explicit Reference to Previous Concepts: The Supervisor Node and Worker Agent Pool

In Book 3, Chapter 15: Supervisor Nodes and Hierarchical Control, we introduced the concept of a Supervisor Node managing a Worker Agent Pool. The Supervisor acts as a router, delegating tasks to specialized worker agents (e.g., a Code Generator, a Database Query Agent, a Search Agent) based on the user's request.

The Reflection Pattern can be seen as a specialized, recursive instance of this hierarchy. In a standard Supervisor-Worker setup, the Supervisor routes a task to a worker, and the worker returns a result. In a Reflection setup, the "Supervisor" (or the Router Node) routes a task to a "Generator Worker." However, instead of accepting the result immediately, it routes that result to an "Evaluator Worker" for review. The Evaluator's output (the critique) is then fed back to the Supervisor, which makes a decision: either route the task back to the Generator Worker for another attempt or pass it to the Final Output node.

This demonstrates how the Supervisor pattern can be extended to create more complex, iterative workflows. The Supervisor isn't just a one-way dispatcher; it can be part of a feedback loop that enables self-correction and iterative refinement, adding a layer of quality control on top of the basic delegation mechanism.

Under the Hood: The Mechanics of Iterative Refinement

The power of the Reflection Pattern lies in its iterative nature. Each loop provides an opportunity for improvement. Let's dissect what happens during each iteration:

  1. Initial Generation: The Generator Node produces a draft. This draft is based on the initial prompt and the model's internal knowledge. It may contain subtle errors or omissions.
  2. Critique Generation: The Evaluator Node analyzes the draft. The critique is typically a structured output (e.g., a JSON object) containing:
    • Error Type: (e.g., "Factual Inaccuracy," "Logical Fallacy," "Formatting Error")
    • Location: (e.g., "Sentence 3," "Code Block 1")
    • Suggested Correction: (e.g., "Change 'Paris is the capital of Germany' to 'Paris is the capital of France'")
  3. Routing Decision: The Router Node evaluates the critique's severity. A simple binary flag ("Pass/Fail") is often sufficient, but more sophisticated systems might use a scoring system (e.g., "If score < 0.8, loop back").
  4. Regeneration with Context: If the draft fails, the Generator Node is invoked again. Crucially, it receives not just the original prompt, but also the critique from the Evaluator. This is the key to improvement. The prompt might be augmented with instructions like: "Your previous draft contained the following errors: [Critique]. Please regenerate the response, ensuring you address these issues."
  5. Termination: The loop continues until:
    • The draft passes the quality gate.
    • A maximum number of iterations is reached (to prevent infinite loops).
    • The critique indicates no further improvement is possible.

This process is analogous to gradient descent in machine learning. Each iteration is a step towards minimizing the "error" (the distance between the current output and the ideal output). The critique acts as the gradient, indicating the direction and magnitude of the error, guiding the next generation step.
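The structured critique described in step 2 can be modeled as a typed object. The field names below are illustrative, not a standard schema:

```typescript
// Illustrative shape for a structured critique emitted by the Evaluator.
interface CritiqueItem {
    errorType: "Factual Inaccuracy" | "Logical Fallacy" | "Formatting Error";
    location: string;              // e.g., "Sentence 3" or "Code Block 1"
    suggestedCorrection: string;
}

interface EvaluatorOutput {
    pass: boolean;                 // binary quality-gate flag
    score?: number;                // optional 0-1 score for threshold routing
    items: CritiqueItem[];
}

// Router check combining the binary flag with the optional score threshold
// (0.8 is an arbitrary cutoff, as in the example above).
function shouldLoopBack(critique: EvaluatorOutput): boolean {
    return !critique.pass || (critique.score !== undefined && critique.score < 0.8);
}
```

A router node would call `shouldLoopBack` to decide between re-entering the Generator and proceeding to the final output.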

Quality Gates: Defining "Good Enough"

A critical component of the Reflection Pattern is the Quality Gate. This is a set of criteria that a draft must meet to be considered acceptable. The definition of "good enough" is highly context-dependent and must be carefully designed.

Examples of Quality Gates:

  • Factual Accuracy: For a research assistant agent, the gate might require that all factual claims be verified against a trusted knowledge base. The Evaluator might cross-reference statements with external sources and flag any unverified claims.
  • Code Correctness: For a code generation agent, the gate might involve running the generated code through a linter or a unit test suite. If the code fails to compile or the tests fail, the gate is not passed.
  • Adherence to Format: For an agent generating structured data (e.g., JSON, XML), the gate might check for valid syntax and adherence to a predefined schema.
  • Coherence and Tone: For a customer service agent, the gate might evaluate whether the response is polite, clear, and maintains the brand's tone of voice.

The Evaluator Node is responsible for implementing these checks. It can use a combination of LLM-based evaluation (e.g., "Is this response helpful?") and deterministic checks (e.g., "Does this JSON parse correctly?").
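As a sketch of the deterministic side, the "does this JSON parse correctly?" check mentioned above could look like this. The required-keys check is a minimal stand-in for full schema validation:

```typescript
// Deterministic quality gate for structured (JSON) output.
// Checks syntax first, then a minimal form of schema adherence.
function passesJsonGate(draft: string, requiredKeys: string[]): boolean {
    let parsed: unknown;
    try {
        parsed = JSON.parse(draft);          // syntax check
    } catch {
        return false;                        // invalid JSON fails the gate
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    // Adherence: every required key must be present in the object.
    return requiredKeys.every(key => key in (parsed as Record<string, unknown>));
}
```

An Evaluator node would typically run gates like this before (or instead of) spending an LLM call on a subjective review.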

Conclusion: Building Trustworthy Autonomous Systems

The Reflection Pattern is a cornerstone of building reliable and trustworthy autonomous agents. By moving beyond a single-shot generation model and embracing an iterative, self-critical approach, we can significantly enhance the quality and accuracy of agent outputs. The cyclical graph structure, inspired by proven software engineering principles like CI/CD and microservices, provides a robust framework for implementing this pattern. As we continue to build more complex multi-agent systems, the ability to self-correct and refine will be essential for deploying agents that can operate effectively in real-world, high-stakes environments.

Basic Code Example

In a SaaS application, such as a Customer Support Chatbot or an AI-Powered Content Generator, providing accurate and high-quality responses is paramount. The Reflection Pattern acts as a quality gate. Instead of generating a response and immediately sending it to the user, the agent first drafts a response, then internally critiques it against specific criteria (e.g., accuracy, tone, completeness). If the draft fails the critique, the agent refines it iteratively. This mimics a human writer's drafting and editing process, significantly reducing errors and hallucinations in the final output.

We will build a simple "AI Assistant" that generates a summary of a user's request. It will use a Reflection node to check if the summary is "too vague" and regenerate it if necessary.

LangGraph.js Visualizer

The following graph illustrates the flow of our Reflection Agent. The generate node creates the initial draft. The reflect node analyzes the draft. The should_refine edge acts as the conditional router based on the reflection.

The reflect node analyzes the draft, and the should_refine edge acts as a conditional router, directing the flow based on the reflection.
Hold "Ctrl" to enable pan & zoom

The `reflect` node analyzes the draft, and the `should_refine` edge acts as a conditional router, directing the flow based on the reflection.

The Code Example

This TypeScript code is fully self-contained. It simulates the LLM calls using mock functions to ensure it runs without external API keys. It demonstrates the state management, conditional routing, and iterative loops required for the Reflection Pattern.

/**
 * Reflection Agent: Self-Correction Loop
 * 
 * Context: SaaS AI Assistant for generating task summaries.
 * Objective: Ensure the generated summary is not "too vague" before finalizing.
 */

// 1. Define the State Interface
// This represents the data flowing through the graph.
interface AgentState {
    request: string;          // The user's original request
    draftSummary: string;     // The current draft of the summary
    critique: string;         // The feedback from the reflection node
    shouldRefine: boolean;    // Boolean flag for the conditional edge
    finalOutput: string;      // The final polished response
}

// 2. Mock LLM Function (Simulating an API Call)
// In a real app, this would be an OpenAI or Anthropic SDK call.
const mockLLMCall = async (prompt: string): Promise<string> => {
    // Simulate network latency
    await new Promise(resolve => setTimeout(resolve, 100));

    // Logic to simulate different outputs based on the prompt context
    if (prompt.includes("Critique the following summary")) {
        // The "Reflection" LLM call: flag the vague draft, approve anything else
        if (prompt.includes("Buy milk and eggs")) {
            return "The summary is too vague. It lacks context on why the items are needed or the deadline.";
        }
        return "The summary is clear and actionable.";
    }

    if (prompt.includes("Please rewrite the summary")) {
        // The "Refinement" LLM call: the critique is in context, so return a specific summary
        return "Task: Purchase groceries. Items: Milk, Eggs. Deadline: Tonight.";
    }

    // The initial "Generation" LLM call
    if (prompt.includes("reset my password")) {
        return "Task: Reset the user's password via an email link.";
    }
    // Simulating a vague initial generation for the grocery request
    return "Buy milk and eggs";
};

/**
 * Node A: Generate Initial Draft
 * Takes the user request and creates a first pass at the response.
 */
async function generateDraft(state: AgentState): Promise<AgentState> {
    console.log("🤖 [Node] Generating initial draft...");
    const prompt = `Summarize this request concisely: "${state.request}"`;
    const draft = await mockLLMCall(prompt);

    return {
        ...state,
        draftSummary: draft,
    };
}

/**
 * Node B: Reflect (Critique)
 * Analyzes the draft against quality criteria (e.g., vagueness).
 */
async function reflectOnDraft(state: AgentState): Promise<AgentState> {
    console.log("🔍 [Node] Reflecting on draft quality...");
    const prompt = `Critique the following summary for clarity and specificity: "${state.draftSummary}"`;
    const critique = await mockLLMCall(prompt);

    // Determine if refinement is needed based on the critique text
    const needsRefinement = critique.toLowerCase().includes("vague");

    return {
        ...state,
        critique: critique,
        shouldRefine: needsRefinement,
    };
}

/**
 * Node C: Refine (Regenerate)
 * Uses the critique to generate a better version of the draft.
 */
async function refineDraft(state: AgentState): Promise<AgentState> {
    console.log("✨ [Node] Refining draft based on critique...");
    const prompt = `Original Request: "${state.request}". Previous Summary: "${state.draftSummary}". Critique: "${state.critique}". Please rewrite the summary to address the critique.`;
    const refinedDraft = await mockLLMCall(prompt);

    return {
        ...state,
        draftSummary: refinedDraft,
        // Reset critique for the next iteration (optional, but good practice)
        critique: "", 
    };
}

/**
 * Node D: Finalize Output
 * Prepares the final response for the user.
 */
async function finalizeOutput(state: AgentState): Promise<AgentState> {
    console.log("✅ [Node] Finalizing output...");
    return {
        ...state,
        finalOutput: `Final Summary: ${state.draftSummary}`
    };
}

/**
 * Conditional Edge: Router
 * Determines the next step based on the reflection result.
 */
function router(state: AgentState): "refine" | "finalize" {
    if (state.shouldRefine) {
        return "refine";
    }
    return "finalize";
}

/**
 * Main Execution Loop
 * Simulates the LangGraph execution flow without the library dependencies.
 */
async function runReflectionAgent(userRequest: string) {
    // Initialize State
    let state: AgentState = {
        request: userRequest,
        draftSummary: "",
        critique: "",
        shouldRefine: false,
        finalOutput: "",
    };

    console.log(`\n--- Starting Session: "${userRequest}" ---\n`);

    // 1. Generate Initial Draft
    state = await generateDraft(state);
    console.log(`   Draft: "${state.draftSummary}"`);

    // 2. Reflect on the Draft
    state = await reflectOnDraft(state);
    console.log(`   Critique: "${state.critique}"`);
    console.log(`   Decision: ${state.shouldRefine ? "Refine" : "Finalize"}`);

    // 3. Conditional Loop (Simulating Graph Edges)
    const decision = router(state);

    if (decision === "refine") {
        // Loop back: Refine -> Reflect
        state = await refineDraft(state);
        console.log(`   Refined Draft: "${state.draftSummary}"`);

        // Run reflection again to verify the fix
        state = await reflectOnDraft(state);
        console.log(`   Second Critique: "${state.critique}"`);
    }

    // 4. Finalize
    state = await finalizeOutput(state);

    console.log(`\n--- Final Result ---`);
    console.log(state.finalOutput);
    console.log(`---------------------\n`);
}

// --- Execution ---

// Scenario 1: Initial draft is vague, triggers refinement
runReflectionAgent("Buy milk and eggs for dinner tonight");

// Scenario 2: Initial draft is sufficient (Run this separately to test)
// runReflectionAgent("Please reset my password via email");

Detailed Line-by-Line Explanation

1. State Definition (interface AgentState)

  • The AgentState interface enforces a consistent structure. In a LangGraph application, the State is the single source of truth.
  • shouldRefine: This is the critical boolean flag used by the conditional edge (router) to decide whether to loop back to the refinement node or proceed to the final output.

2. Mock LLM (mockLLMCall)

  • This function simulates the asynchronous nature of calling an LLM (like GPT-4).
  • Context Awareness: It checks the prompt string to return different "mocked" responses. This allows us to demonstrate the logic of the reflection loop without needing a real API key.
    • Why: It simulates the "Vague" response for the specific input "Buy milk and eggs", triggering the self-correction loop.

3. The Nodes (Graph Operations)

A. generateDraft

  • Logic: This is the entry point. It takes the raw user input and formats it into a prompt for the LLM.
  • Under the Hood: It awaits the mockLLMCall, updates the draftSummary field in the state, and returns a new state object (immutability pattern).

B. reflectOnDraft

  • Logic: This is the "Reflection" node. It sends the previous draftSummary to the LLM with a specific instruction: "Critique this for clarity."
  • Quality Gate: It parses the LLM's response. If the critique contains the word "vague", it sets shouldRefine to true. This is a basic form of Tool Use Reflection, where the agent analyzes the observation (the critique) to adjust its internal state.

C. refineDraft

  • Logic: This node is only triggered if the router decides to refine. It provides the LLM with the original request, the failed draft, and the critique.
  • Why: Giving the LLM the specific feedback ("The summary is too vague") drastically improves the quality of the second generation compared to just asking it to "try again."

D. finalizeOutput

  • Logic: A simple wrapper to format the final string for the user interface.

4. The Router (router)

  • This function implements the critique-based routing.
  • Logic: It looks at the shouldRefine flag set by the reflect node.
    • Returns "refine": The graph edge loops back to the refineDraft node.
    • Returns "finalize": The graph edge moves forward to the finalizeOutput node.

5. Execution Flow (runReflectionAgent)

  • This function simulates the runtime of the LangGraph.
  • Imperative Simulation: While LangGraph is declarative (defined via edges and nodes), here we simulate the execution step-by-step to make the logic explicit for learning purposes.
    • Step 1: Generate.
    • Step 2: Reflect.
    • Step 3: Check Router.
    • Step 4 (Conditional): If refinement is needed, execute refineDraft and reflectOnDraft again. This demonstrates the iterative refinement loop.

Common Pitfalls in TypeScript & LangGraph.js

When implementing this pattern in a production SaaS environment (e.g., Vercel, Node.js), watch out for these specific issues:

1. Async/Await Loops and Timeouts

  • The Issue: Reflection agents often run multiple LLM calls in a single user request (Generate -> Reflect -> Refine -> Generate). If you are hosting on serverless platforms like Vercel, the default timeout is often 10 seconds.
  • The Risk: If the reflection loop iterates 3-4 times or the LLM is slow, the function times out, leaving the user with an error.
  • The Fix: Cap the loop at a maximum of 2-3 iterations. Use Promise.race to set custom timeouts for each LLM call within the graph.

2. Infinite Loops

  • The Issue: If your reflect node has a bug where it always returns shouldRefine: true, the graph will loop infinitely.
  • The Fix: Implement a max_iterations counter in your state. In the router, if the counter reaches the limit (e.g., state.iteration >= 3), force the finalize path.

3. Hallucinated JSON / Structured Output

  • The Issue: When passing the critique back into the prompt for refinement, the LLM might occasionally output non-text characters or attempt to format the output as JSON if not instructed otherwise, breaking the string concatenation.
  • The Fix: Use strict output schema definitions (e.g., validated with a library like Zod) or explicitly prompt the LLM: "Respond only with plain text critique."

4. State Mutability

  • The Issue: TypeScript objects are mutable by reference. If you modify state directly in one node (e.g., state.draftSummary = "new"), it might cause race conditions in concurrent graph executions.
  • The Fix: Always return a new object spread from the previous state (as shown in the code: return { ...state, draftSummary: draft }). This ensures functional purity and makes time-travel debugging easier.
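The fixes for pitfalls 1 and 2 can be sketched as follows. The iteration field is an assumed extension of the chapter's AgentState, and MAX_ITERATIONS is an arbitrary budget:

```typescript
interface GuardedState {
    shouldRefine: boolean;
    iteration: number;   // incremented each time the graph loops back
}

const MAX_ITERATIONS = 3;

// Pitfall 2: force the finalize path once the iteration budget is spent,
// even if the Evaluator keeps requesting refinement.
function guardedRouter(state: GuardedState): "refine" | "finalize" {
    if (state.shouldRefine && state.iteration < MAX_ITERATIONS) {
        return "refine";
    }
    return "finalize";
}

// Pitfall 1: wrap each LLM call in a per-call timeout via Promise.race,
// so a slow call rejects instead of exhausting the serverless limit.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
    return Promise.race([
        promise,
        new Promise<T>((_, reject) =>
            setTimeout(() => reject(new Error(`LLM call timed out after ${ms}ms`)), ms)
        ),
    ]);
}
```

In the chapter's example, each mockLLMCall invocation could be wrapped as `withTimeout(mockLLMCall(prompt), 5000)`, and each loop back through refineDraft would increment `iteration`.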

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.