Chapter 15: Handling Hallucinations and Errors in Plans
Introduction
The fundamental challenge in building reliable AI agents is that large language models are probabilistic engines, not deterministic calculators. When an agent plans a sequence of steps to achieve a goal, it is essentially predicting the most likely sequence of tokens that represent a solution. This process is susceptible to two critical failure modes: hallucination (generating factually incorrect information or non-existent tools) and execution error (failing during the runtime of a plan due to invalid parameters, network issues, or logic gaps).
In the context of Microsoft Semantic Kernel, a "Plan" is a structured sequence of functions (native code or semantic functions) that an agent executes to fulfill a user's request. A hallucination in this context might manifest as an agent attempting to call a plugin function that does not exist, or generating arguments that violate the schema of a required function. An execution error occurs when the plan is syntactically valid but fails at runtime—for example, passing a string "twenty" to a parameter expecting an integer.
The Core Problem: The Illusion of Determinism
To understand the necessity of error handling in AI engineering, we must first understand the architecture of an LLM-powered agent. Unlike traditional software where if (x > 5) always evaluates to the same result given the same x, an LLM's output varies based on temperature, context window saturation, and subtle prompt nuances.
Consider the analogy of a Concierge in a Hotel. If you ask a human concierge for a dinner reservation, they might hallucinate a restaurant name that sounds plausible but doesn't exist, or they might try to book a table for 50 people at a cafe that only seats 10. In a traditional software system, a database query would return a hard "no" or an exception. In an agentic system, the LLM (the concierge) often lacks the immediate feedback loop of reality until it attempts to execute the action.
In Semantic Kernel, this manifests when an agent generates a Plan object. The agent might decide to call SearchEnginePlugin.Search with the query "best restaurants in Paris," but due to context limitations, it might hallucinate parameters (e.g., maxResults: "ten" instead of 10) or select a plugin that is not registered in the Kernel.
Theoretical Foundations
The theoretical approach to handling these errors relies on shifting from a "fire-and-forget" execution model to a Validation-First or Self-Correcting model. This requires introducing layers of indirection and verification into the planning process.
1. Structured Output Enforcement (The Schema Contract)
One of the primary sources of hallucination is unstructured text generation. When an LLM generates a plan, it might output a string like: "I will search for weather and then book a flight." This is ambiguous. To mitigate this, we enforce structured output using JSON schemas or specific object types.
Why this matters: By defining a class (e.g., FunctionCallRequest) with required properties and data types, we force the LLM to adhere to a contract. If the LLM attempts to hallucinate a property that doesn't exist or uses the wrong type, the deserialization process fails before execution, allowing for a recovery path.
* C# Feature: Records and Init-only properties are ideal here. They represent immutable data contracts that map directly to the JSON structures the LLM must generate.
// Conceptual definition of a structured plan step
public record FunctionCallRequest
{
public required string FunctionName { get; init; }
public required Dictionary<string, object> Parameters { get; init; }
public string? Reasoning { get; init; }
}
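One way to make this contract bite at parse time is to configure System.Text.Json to reject any property the model invented. The following is a minimal sketch, assuming .NET 8's UnmappedMemberHandling option and the FunctionCallRequest record above:

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public static class PlanParser
{
    private static readonly JsonSerializerOptions Strict = new()
    {
        // Fail deserialization if the model invents a property not in the contract.
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
        PropertyNameCaseInsensitive = true
    };

    // FunctionCallRequest is the record defined above.
    public static FunctionCallRequest? TryParse(string llmOutput)
    {
        try
        {
            // 'required' members that are missing also cause a JsonException here.
            return JsonSerializer.Deserialize<FunctionCallRequest>(llmOutput, Strict);
        }
        catch (JsonException)
        {
            return null; // Signal the caller to re-prompt instead of executing.
        }
    }
}
```

A `null` return here is the recovery hook: the caller can feed the raw output and a format reminder back to the model rather than attempting execution.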
2. The Retry Pattern and Idempotency
In Chapter 13, we discussed Plugin Invocation and the immediate execution of functions. However, in a hostile environment where LLMs are unpredictable, we must apply the Retry Pattern. This is a standard software engineering concept (often used for network calls) adapted for AI logic errors.
The core idea is that an execution failure is not necessarily a terminal state. If a function call fails due to a parameter error (a hallucination of data types), the system should not simply abort. Instead, it should capture the exception, feed the error message back into the LLM (along with the original plan and goal), and ask the LLM to generate a corrected plan.
The Analogy: The Chef and the Sous-Chef
Imagine a Head Chef (LLM) writing a recipe (Plan). The Sous-Chef (Semantic Kernel Runtime) actually cooks the meal. If the Sous-Chef tries to add "three cups of salt" (a hallucination of quantity) and the pot overflows, the Sous-Chef doesn't throw the pot away immediately. Instead, they shout, "Chef! The salt is overflowing!" The Chef looks at the situation, realizes the error, and writes a new instruction: "Add three teaspoons of salt."
In code, this is implemented using try-catch blocks around the plan execution, specifically catching KernelException (or its more specific derived types).
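The loop described above can be sketched as follows. Here, generatePlanAsync and executePlanAsync are hypothetical stand-ins for your own plan-generation and plan-execution calls, not Semantic Kernel APIs:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel; // for KernelException

public static class SelfCorrectingExecutor
{
    public static async Task<string> ExecuteWithCorrectionAsync(
        Func<string, Task<string>> generatePlanAsync, // prompt -> plan
        Func<string, Task<string>> executePlanAsync,  // plan -> result (may throw)
        string goal,
        int maxAttempts = 3)
    {
        string prompt = goal;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string plan = await generatePlanAsync(prompt);
            try
            {
                return await executePlanAsync(plan); // may throw on bad parameters
            }
            catch (KernelException ex) when (attempt < maxAttempts)
            {
                // Feed the runtime error back to the model -- the Sous-Chef
                // shouting "Chef! The salt is overflowing!"
                prompt = $"{goal}\n\nYour previous plan:\n{plan}\n\n" +
                         $"It failed with: {ex.Message}\nGenerate a corrected plan.";
            }
        }
        throw new InvalidOperationException($"No valid plan after {maxAttempts} attempts.");
    }
}
```

Note that the exception filter lets the final failure propagate to the caller, so the hard retry limit is always enforced.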
3. Reflection and Self-Correction Loops
Reflection in agentic patterns refers to the agent's ability to observe its own output and reasoning. This is a step beyond simple retrying; it involves an explicit validation step.
We can architect a system where the generation of a plan is decoupled from its execution. The agent generates a candidate plan, and then a secondary "Validator Agent" or a "Reflection Step" reviews this plan against a set of constraints (e.g., "Does this plan use only available plugins?", "Are the arguments within safe bounds?").
The Analogy: The Architect and the Structural Engineer
An Architect (the planning LLM) designs a building (the plan). Before construction begins, a Structural Engineer (the validation LLM or logic) reviews the blueprints. If the engineer sees a column that is too thin for the load (a logic error or hallucination of physics), they flag it. The architect must then revise the design. Only once the engineer signs off does the construction crew (the execution engine) begin work.
In C#, this pattern is elegantly implemented using Delegates and Middleware. We can wrap the execution of a plan in a middleware pipeline. The middleware acts as the structural engineer, inspecting the plan execution context before allowing it to proceed to the next stage.
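As a sketch of this idea, Semantic Kernel (at the time of writing) exposes a filter abstraction, IFunctionInvocationFilter, that sits between the planner's decision and the function's execution. The guard rule below is purely illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// The "structural engineer": inspects each function call before it runs.
public class GuardrailFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Illustrative rule: reject a hallucinated argument type before execution.
        if (context.Arguments.TryGetValue("maxResults", out object? raw) && raw is string)
        {
            throw new KernelException(
                $"'{context.Function.Name}': maxResults must be an integer, got '{raw}'.");
        }

        await next(context); // Sign-off: let the construction crew proceed.
    }
}
```

The filter is registered once, e.g. via kernel.FunctionInvocationFilters.Add(new GuardrailFilter()), and then applies to every function invocation in the pipeline.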
4. Fallback Mechanisms and Degradation
A resilient agent does not rely on a single execution strategy:
* Strategy A (High Fidelity): Use a complex, multi-step plan with strict validation.
* Strategy B (Degradation): If Strategy A fails after N retries, the system falls back to a simpler "Direct Execution" mode, bypassing the complex planning phase entirely.
This is crucial for Latency vs. Accuracy trade-offs. In a real-time chat application, a user asking "What is the capital of France?" does not need a 5-step plan. If the agent hallucinates a plan to search the web, summarize the result, and then translate it, it wastes time. A fallback mechanism detects the simplicity of the query and executes a direct function call.
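A minimal sketch of this degradation logic follows; runPlannerAsync and runDirectAsync are hypothetical stand-ins for the two execution paths:

```csharp
using System;
using System.Threading.Tasks;

public static class FallbackExecutor
{
    // runPlannerAsync = Strategy A (multi-step plan); runDirectAsync = Strategy B (single call).
    public static async Task<string> AskAsync(
        Func<string, Task<string>> runPlannerAsync,
        Func<string, Task<string>> runDirectAsync,
        string query,
        int maxPlannerRetries = 2)
    {
        for (int i = 0; i < maxPlannerRetries; i++)
        {
            try
            {
                return await runPlannerAsync(query); // Strategy A: high fidelity
            }
            catch (Exception)
            {
                // Log and fall through to the next attempt.
            }
        }
        // Strategy B: degrade to a direct call rather than failing the user.
        return await runDirectAsync(query);
    }
}
```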
Visualizing the Agentic Error Flow
To visualize how these theoretical components interact, consider the flow of data through a resilient Semantic Kernel agent. The diagram below illustrates the "Plan-Execute-Validate" loop.
Architectural Implications in C#
The theoretical foundation of error handling in AI engineering dictates specific architectural choices in C#. We move away from simple imperative code towards Functional Composition and Result Objects.
Instead of methods throwing exceptions that crash the application, we prefer returning Result<T> types (a concept familiar to those who use libraries like LanguageExt or OneOf). This allows the calling code to handle the "Left" (error) case or the "Right" (success) case explicitly.
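For readers who prefer not to take a dependency, a minimal home-grown Result type might look like the sketch below (this is not the LanguageExt or OneOf API, just the same idea in miniature):

```csharp
using System;

// A deliberately small Result type: callers must handle both branches.
public readonly record struct Result<T>
{
    public T? Value { get; }
    public string? Error { get; }
    public bool IsSuccess => Error is null;

    private Result(T? value, string? error) => (Value, Error) = (value, error);

    public static Result<T> Ok(T value) => new(value, null);
    public static Result<T> Fail(string error) => new(default, error);

    // Forces explicit handling of the success ("Right") and error ("Left") cases.
    public TOut Match<TOut>(Func<T, TOut> onSuccess, Func<string, TOut> onError)
        => IsSuccess ? onSuccess(Value!) : onError(Error!);
}
```

A caller then writes, for example, planResult.Match(plan => Execute(plan), err => Replan(err)), and the compiler prevents them from forgetting the error branch.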
Furthermore, the use of Interfaces is paramount. As discussed in previous chapters regarding model swapping, we rely on abstractions. For error handling, we define an IPlanValidator interface. This allows us to swap between different validation strategies—perhaps a strict validator for financial transactions and a lenient one for casual conversation—without changing the core execution logic.
// Conceptual Interface for Validation
public interface IPlanValidator
{
ValidationResult Validate(Plan plan);
}
public record ValidationResult(bool IsValid, string? ErrorMessage = null);
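As an illustration of one concrete strategy, the validator below (written in the spirit of IPlanValidator, using a hypothetical PlanStep shape rather than the Plan type) checks that every step references a function actually registered in the Kernel:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.SemanticKernel;

// Hypothetical shape for one parsed plan step.
public record PlanStep(string PluginName, string FunctionName);

public class RegisteredFunctionValidator
{
    private readonly Kernel _kernel;
    public RegisteredFunctionValidator(Kernel kernel) => _kernel = kernel;

    // Returns the ValidationResult record defined above.
    public ValidationResult Validate(IEnumerable<PlanStep> steps)
    {
        foreach (var step in steps)
        {
            bool exists = _kernel.Plugins.Any(p =>
                string.Equals(p.Name, step.PluginName, StringComparison.OrdinalIgnoreCase) &&
                p.Any(f => string.Equals(f.Name, step.FunctionName, StringComparison.OrdinalIgnoreCase)));

            if (!exists)
            {
                // The model referenced a tool that is not registered: a classic hallucination.
                return new ValidationResult(false,
                    $"Unknown function: {step.PluginName}.{step.FunctionName}");
            }
        }
        return new ValidationResult(true);
    }
}
```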
By strictly separating the generation, validation, and execution phases, and by utilizing modern C# features like Records for data contracts and Middleware for pipeline interception, we transform a fragile, hallucination-prone agent into a robust, self-correcting system capable of handling the inherent unpredictability of Large Language Models.
Basic Code Example
Here is a simple, self-contained example demonstrating how to implement a basic validation and retry loop to handle potential hallucinations in an AI agent's plan.
using Microsoft.SemanticKernel;
using System.ComponentModel;
using System.Text.Json;
using System.Text.Json.Serialization;
// 1. Define the data structure for the plan output.
// We enforce structured output to make validation easier and to reduce hallucination risks.
public class ShoppingPlan
{
[JsonPropertyName("items")]
public List<string> Items { get; set; } = new();
[JsonPropertyName("total_price")]
public decimal TotalPrice { get; set; }
[JsonPropertyName("currency")]
public string Currency { get; set; } = "USD";
}
// 2. Define a plugin with a validation function.
// This simulates a business rule check (e.g., budget constraint).
public class BudgetPlugin
{
[KernelFunction("validate_budget")]
[Description("Checks if the total price of the shopping list is within the budget.")]
public bool ValidateBudget(
[Description("The total price of the items")] decimal totalPrice,
[Description("The maximum budget allowed")] decimal budgetLimit)
{
return totalPrice <= budgetLimit;
}
}
// 3. The Main Application Logic
class Program
{
static async Task Main(string[] args)
{
// Initialize the Kernel. No AI service is registered, keeping this example self-contained.
// In a real scenario, use Kernel.CreateBuilder() with an OpenAI or Azure OpenAI chat completion service.
var kernel = Kernel.CreateBuilder().Build();
// Register the budget plugin
kernel.ImportPluginFromObject(new BudgetPlugin(), "Budget");
// Mock the AI Chat Completion Service
// We simulate an AI that might hallucinate a high price or invalid format.
var mockChatCompletion = new MockChatCompletionService();
kernel.Plugins.AddFromObject(new MockAIService(mockChatCompletion));
// Define the budget constraint
decimal budgetLimit = 50.00m;
Console.WriteLine($"Starting shopping plan generation with budget: ${budgetLimit}");
// 4. The Validation Loop
// We attempt to generate a plan and validate it. If it fails, we retry.
int maxRetries = 3;
int attempt = 0;
bool isValid = false;
ShoppingPlan? finalPlan = null;
while (attempt < maxRetries && !isValid)
{
attempt++;
Console.WriteLine($"\n--- Attempt {attempt} ---");
try
{
// Step A: Generate the plan (Simulated AI Call)
// The AI generates a JSON string representing the shopping list.
string planJson = await mockChatCompletion.GeneratePlanAsync();
Console.WriteLine($"AI Generated Plan: {planJson}");
// Step B: Parse the output
// We use System.Text.Json to strictly parse the expected structure.
var plan = JsonSerializer.Deserialize<ShoppingPlan>(planJson);
if (plan == null)
{
Console.WriteLine("Error: Failed to parse plan structure.");
continue; // Retry
}
// Step C: Validate the plan using the Kernel Function
// We invoke the registered plugin to check business logic (budget).
var result = await kernel.InvokeAsync<bool>("Budget", "validate_budget", new()
{
["totalPrice"] = plan.TotalPrice,
["budgetLimit"] = budgetLimit
});
if (result)
{
isValid = true;
finalPlan = plan;
Console.WriteLine("Success: Plan is valid and within budget.");
}
else
{
Console.WriteLine($"Validation Error: Total price ${plan.TotalPrice} exceeds budget ${budgetLimit}.");
// In a real loop, you might pass this error back to the AI to correct its next attempt.
}
}
catch (JsonException ex)
{
Console.WriteLine($"Parsing Error: AI output was not valid JSON. Details: {ex.Message}");
}
catch (Exception ex)
{
Console.WriteLine($"Unexpected Error: {ex.Message}");
}
}
if (finalPlan != null)
{
Console.WriteLine($"\nFinal Approved Plan: {string.Join(", ", finalPlan.Items)} for ${finalPlan.TotalPrice}");
}
else
{
Console.WriteLine("\nFailed to generate a valid plan after max retries.");
}
}
}
// --- Mock Services for Demonstration Purposes ---
// These classes simulate the AI behavior without requiring external API keys.
public class MockChatCompletionService
{
private int _callCount = 0;
public async Task<string> GeneratePlanAsync()
{
await Task.Delay(100); // Simulate network latency
_callCount++;
// Simulating Hallucinations:
// 1. First attempt: Hallucinates a price way over budget.
// 2. Second attempt: Hallucinates a malformed JSON.
// 3. Third attempt: Returns a valid, correct plan.
if (_callCount == 1)
{
return """
{
"items": ["Laptop", "Mouse", "Keyboard"],
"total_price": 999.99,
"currency": "USD"
}
""";
}
else if (_callCount == 2)
{
return "This is not JSON, this is a hallucination."; // Hallucination: Wrong format
}
else
{
return """
{
"items": ["Notebook", "Pen"],
"total_price": 45.00,
"currency": "USD"
}
""";
}
}
}
public class MockAIService(MockChatCompletionService service)
{
[KernelFunction("generate_shopping_plan")]
[Description("Generates a shopping list based on user intent.")]
public async Task<string> GeneratePlan()
{
return await service.GeneratePlanAsync();
}
}
Detailed Explanation
Data Structure Definition (ShoppingPlan class):
* We define a C# class representing the expected output of the AI agent.
* [JsonPropertyName(...)] attributes are used to map JSON properties to C# properties. This is crucial because AI models often return JSON with specific key names (e.g., total_price vs TotalPrice).
* By defining a strict schema, we create a "contract" that the AI must adhere to. If the AI hallucinates a different structure, the deserialization step will fail, allowing us to catch the error immediately.
Business Logic Plugin (BudgetPlugin class):
* This class acts as a "Validator" or "Guardrail."
* The ValidateBudget method encapsulates a simple business rule: the total cost must not exceed the budget.
* In a production environment, this could be complex logic involving inventory checks, regulatory compliance, or safety guidelines.
* By exposing this as a Kernel Function ([KernelFunction]), we allow the agentic workflow to invoke validation logic programmatically.
Kernel Initialization:
* We instantiate the Semantic Kernel.
* We import the BudgetPlugin into the kernel under the namespace "Budget." This makes the validation logic available to the agent loop.
The Validation Loop (The Core Logic):
* Retry Strategy: We implement a while loop with a maxRetries limit (set to 3). This is a fundamental pattern for handling transient errors or hallucinations.
* Attempt Counter: attempt++ tracks how many times we have tried to generate a valid plan.
* Try-Catch Block: This block wraps the entire generation and validation process.
* Generation: We call the mock AI service to get a plan.
* Parsing: JsonSerializer.Deserialize<ShoppingPlan> attempts to convert the string output into our strongly typed object. Why this matters: if the AI hallucinates invalid JSON (e.g., forgetting a comma or returning plain text), this line throws a JsonException, which we catch specifically to report that the format was wrong.
* Validation: kernel.InvokeAsync<bool> calls the validate_budget function we registered earlier. We pass the totalPrice from the parsed plan and the budgetLimit. If the result is true, we set isValid = true to break the loop. If false, we print an error message; in a more advanced loop, we might feed this error message back into the AI's context window so it can self-correct in the next iteration.
Mock Services:
* MockChatCompletionService simulates an LLM. It intentionally introduces errors to demonstrate the code's resilience:
* Attempt 1: Returns valid JSON with a price ($999.99) far exceeding the budget ($50.00). This tests the business logic validation.
* Attempt 2: Returns plain text instead of JSON. This tests the JsonException handling.
* Attempt 3: Returns valid JSON with a price ($45.00) within the budget. This represents a successful, non-hallucinated response.
Common Mistakes
Trusting Unvalidated Output:
* Mistake: Assuming the AI's response is structurally correct and logically sound without parsing or validation.
* Consequence: If the AI returns "I can't do that" or malformed JSON, and you try to cast it directly to an object or pass it to a database, your application will crash or corrupt data.
* Fix: Always use try-catch blocks around JSON deserialization and validate business rules explicitly (like the budget check).
Infinite Loops:
* Mistake: Implementing a retry mechanism without a maximum attempt limit (maxRetries).
* Consequence: If the AI consistently hallucinates, or the validation logic is impossible to satisfy (e.g., the user asks for a $1000 item with a $10 budget), the loop will run forever, consuming resources and hanging the application.
* Fix: Always set a hard limit on retries and handle the failure state gracefully (e.g., "I cannot fulfill this request within the constraints").
Ignoring Parsing Errors:
* Mistake: Catching only general exceptions or ignoring JsonException.
* Consequence: You might treat a formatting error as a logic error. Knowing the difference helps in debugging: formatting errors call for prompt engineering to enforce JSON structure, while logic errors call for better constraint handling.
* Fix: Catch specific exceptions (JsonException) to handle format issues separately from business logic failures.
Visualizing the Flow
The following diagram illustrates the control flow of the validation and retry mechanism.
The chapter continues with advanced code samples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.