
Chapter 11: Delegates - Callbacks for Token Streaming

Theoretical Foundations

Delegates in C# serve as the foundational mechanism for implementing callbacks, which are essential for handling token streams generated by Large Language Models (LLMs). In AI applications, generation is rarely a synchronous, atomic operation; it is a continuous flow of data (tokens) arriving over time. To manage this flow without blocking the main execution thread, we combine asynchronous code with delegates, which act as type-safe function pointers. These delegates allow the generation logic to "call back" to a consumer (such as a UI component or a logging service) whenever a new token is available or the stream terminates.

At its core, a delegate is a type that represents references to methods with a particular parameter list and return type. In the context of AI token streaming, we utilize delegates to decouple the Producer (the LLM generating tokens) from the Consumer (the UI or logging system). This decoupling is critical because the Producer should not need to know the specific implementation details of how a token is displayed or saved; it only needs to know that it has a mechanism to deliver the data.

To understand this, we must look back at Book 1, Chapter 4, where we established the concept of Interfaces. Just as interfaces allowed us to swap between different AI model providers (e.g., OpenAI vs. Local Llama) by defining a contract for the model's capabilities, delegates allow us to swap or chain different callback behaviors for handling the data those models produce. The delegate defines the contract for the callback: "I accept a string token and return void."

The Action and Func Delegates

C# provides built-in generic delegates to simplify this process. For token streaming, we primarily use Action<T>, which represents a method that takes one parameter and returns void.

using System;

// A standard Action delegate for handling a single token.
// This is the contract we will use for callbacks.
Action<string> onTokenReceived = token => Console.Write(token);

// Invoking the delegate runs the attached method.
onTokenReceived("Hello"); // prints "Hello"

Real-World Analogy: The Restaurant Pager System

Imagine a busy restaurant (the AI Model) that cannot seat you immediately. Instead of forcing you to stand at the counter waiting (blocking the thread), they give you a pager (the Delegate).

  1. Registration: You provide your phone number or take a pager. In code, this is registering the callback. You are telling the restaurant, "When the table is ready, notify this specific number."
  2. The Wait: The restaurant continues its work (cooking food, cleaning tables). You are free to walk around, check your phone, or talk to friends (the UI remains responsive).
  3. The Notification: When the table is ready (a token is generated), the restaurant presses a button on the pager (invoking the delegate). The pager buzzes (the callback method executes).
  4. The Action: You see the light and walk to the host stand (the UI updates with the new token).

If the restaurant had no pagers (no delegates), you would have to stare at the host continuously, unable to do anything else until your table was ready. This is the difference between synchronous blocking and asynchronous callback-driven architecture.

Implementing Delegates for Token Streaming

In an AI application, we define a custom delegate or use an existing one to handle the stream. The most common pattern involves an event-driven approach where the generation engine fires events as tokens arrive.

Let's define a class responsible for generating text. It will expose a method that accepts a callback delegate. When the generator produces a token, it invokes that delegate.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class LLMGenerator
{
    // We define a delegate type specifically for token streaming.
    // This signature takes a string (the token) and returns void.
    public delegate void TokenCallback(string token);

    // A list of simulated tokens representing the LLM's output.
    private readonly List<string> _simulatedTokens = new List<string> 
    { 
        "The", " quick", " brown", " fox", " jumps", " over", " the", " lazy", " dog." 
    };

    /// <summary>
    /// Generates tokens asynchronously and invokes the callback for each token.
    /// </summary>
    /// <param name="onToken">The delegate to call when a token is generated.</param>
    /// <param name="onComplete">The delegate to call when the stream finishes.</param>
    public async Task GenerateAsync(TokenCallback onToken, Action onComplete)
    {
        foreach (var token in _simulatedTokens)
        {
            // Simulate network latency
            await Task.Delay(100); 

            // INVOKE THE DELEGATE
            // This is the "callback". The generator calls back to the caller
            // with the specific data.
            onToken?.Invoke(token);
        }

        // Signal that the stream is finished
        onComplete?.Invoke();
    }
}

Lambda Expressions: Concise Callbacks

In C#, we rarely define named methods just to pass them as delegates. Instead, we use Lambda Expressions. A lambda expression is an anonymous function that you can use to create delegates or expression tree types. Introduced in C# 3.0, lambdas allow us to write inline logic that matches the delegate signature.

For our AI token stream, using a lambda expression allows us to define exactly what happens to a token at the point of registration, keeping the logic localized and readable.

Syntax: (parameters) => { body }

Usage in AI Context: Instead of creating a separate method void HandleToken(string t) { ... }, we can write:

public class AIClient
{
    public async Task RunQuery()
    {
        var generator = new LLMGenerator();

        Console.WriteLine("Starting generation...");

        // We register a Lambda Expression as the callback.
        // The 'token' parameter matches the delegate signature.
        // The code inside the lambda executes immediately when the delegate is invoked.
        await generator.GenerateAsync(
            onToken: (token) => 
            {
                // Real-time processing of the token
                Console.Write(token);
            },
            onComplete: () => 
            {
                Console.WriteLine("\n[Stream Complete]");
            }
        );
    }
}

Architectural Implications: Decoupling and Reusability

The power of delegates and lambdas in AI systems lies in their ability to compose complex behaviors from simple, reusable parts.

  1. Separation of Concerns: The LLMGenerator class is responsible only for producing tokens. It does not know if the token is being printed to a console, rendered in a Unity text mesh, or saved to a database. This adheres to the Single Responsibility Principle.
  2. Composition: Because delegates are first-class citizens, we can pass them around, store them in lists, or chain them. We can have a LoggingDelegate that writes to a file and a UIDelegate that updates a progress bar. Both can be invoked sequentially for every token.

Advanced Scenario: Chaining Delegates (Multicast Delegates)

C# delegates are multicast by default. This means a single delegate variable can hold references to multiple methods. When invoked, all methods are called in the order they were added.

This is incredibly useful for AI pipelines where one token might need to trigger multiple actions simultaneously.

public class AIPipeline
{
    public async Task RunPipelineAsync()
    {
        var generator = new LLMGenerator();

        // Create a multicast delegate.
        // Note: delegate types with identical signatures are still distinct
        // types in C#, so the variable must use the generator's own
        // TokenCallback type in order to be passed to GenerateAsync.
        LLMGenerator.TokenCallback pipeline = null;

        // Add a logging delegate (using a lambda)
        pipeline += (token) => { /* Log to file */ Console.WriteLine($"[LOG]: {token}"); };

        // Add a UI update delegate (using a lambda)
        pipeline += (token) => { /* Update UI */ Console.WriteLine($"[UI]: {token}"); };

        // We pass the entire chain as the callback and await the task;
        // blocking with .Wait() risks deadlocks in UI contexts.
        await generator.GenerateAsync(
            onToken: pipeline,
            onComplete: () => Console.WriteLine("Done")
        );
    }
}

Visualizing the Data Flow

The following diagram illustrates the flow of control when using delegates for token streaming. Note that the flow is not linear; it involves a "call back" mechanism.

This diagram illustrates the non-linear flow of control in token streaming, where a Wait() call pauses execution and a callback mechanism later resumes processing by invoking a delegate to handle the incoming data.

Edge Cases and Error Handling

When using delegates for asynchronous streams, specific edge cases must be managed:

  1. Null References: Always check if the delegate is null before invoking it (onToken?.Invoke(token)). If the caller does not provide a callback, the generator should not crash.
  2. Exception Propagation: If an exception occurs inside the lambda expression (e.g., a UI update fails), it propagates back to the generator. Because the callback is executed on the generator's thread (or the thread invoking the delegate), an unhandled exception can crash the entire generation loop. It is often safer to wrap delegate invocations in try-catch blocks within the generator or ensure the lambda handles its own exceptions.
  3. Thread Safety: In AI applications, tokens often arrive on background threads (e.g., from Task.Run). If the callback updates a UI element (which is usually bound to the main thread), you will encounter cross-thread access exceptions. While delegates handle the data transfer, the implementation of the lambda must account for thread marshaling (e.g., using Dispatcher.Invoke in WPF or MainThread.Invoke in MAUI).
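
The thread-safety point above can be sketched with SynchronizationContext, the abstraction that mechanisms like Dispatcher.Invoke build on. This is a minimal console sketch, not production marshaling code: in a console app the captured context is typically null, so the fallback branch runs; in WPF or MAUI the captured context would post the update back to the UI thread. The Demo class and onToken name are illustrative, not part of the chapter's earlier code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Demo
{
    public static async Task Main()
    {
        // Capture the current SynchronizationContext. In a UI framework this
        // would be the UI thread's context; in a console app it is null.
        SynchronizationContext? uiContext = SynchronizationContext.Current;

        Action<string> onToken = token =>
        {
            if (uiContext != null)
            {
                // Marshal the update back to the captured context (UI thread).
                uiContext.Post(_ => Console.Write(token), null);
            }
            else
            {
                // No context to marshal to: handle the token inline.
                Console.Write(token);
            }
        };

        // Simulate a token arriving on a background thread.
        await Task.Run(() => onToken("Hi"));
        Console.WriteLine();
    }
}
```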

Summary

In this subsection, we established that Delegates provide the architectural backbone for handling asynchronous token streams in AI applications. By treating functions as data (via delegates) and defining them concisely (via Lambda Expressions), we decouple the generation logic from the consumption logic. This allows for responsive applications that can process high-throughput data from LLMs without freezing the user interface, while simultaneously enabling complex, composable pipelines for logging, analysis, and rendering.

Basic Code Example

Here is a comprehensive guide to the "Basic Code Example" for Delegates and Lambda Expressions in the context of Token Streaming.

The Problem: Real-Time LLM Token Streaming

In the world of Large Language Models (LLMs), generating a response is rarely instantaneous. An LLM generates text one token (a word or sub-word) at a time. If you wait for the entire response to finish before showing it to the user, the experience feels sluggish and disconnected.

Imagine a conversational AI chatbot. When the AI replies, we want the text to appear on the screen as it is being generated, character by character, mimicking a human typing speed. This requires a mechanism to "listen" for each new token and immediately perform an action (like updating the UI) without stopping the main generation process.

This is where Delegates and Lambda Expressions become the architectural backbone. We need a way to say: "Hey, generation engine, every time you produce a token, please call this specific piece of code I'm giving you right now."

The Code Example: The TokenStreamer

Below is a C# simulation of this scenario. We define a delegate type that acts as a contract for any method interested in receiving tokens. We then create a generator class that accepts a delegate and invokes it repeatedly.

using System;
using System.Threading;
using System.Threading.Tasks;

namespace AI_Streaming_Delegates
{
    // 1. DEFINING THE DELEGATE
    // This is the "Contract". Any method matching this signature 
    // (void return, takes a string) can be registered as a callback.
    // We use 'delegate' to define a type that represents references to methods.
    public delegate void TokenReceivedHandler(string token);

    public class LLMEngine
    {
        // 2. THE FIELD FOR STORAGE
        // This private field holds the list of subscribers (callbacks).
        // It uses the built-in 'Delegate' class which is multicast-capable.
        private Delegate? _onTokenReceived;

        // 3. REGISTRATION METHOD
        // This allows external code to attach a method to our engine.
        // We restrict the input to our specific delegate type for type safety.
        public void RegisterTokenCallback(TokenReceivedHandler callback)
        {
            // We combine the new callback with any existing ones.
            _onTokenReceived = Delegate.Combine(_onTokenReceived, callback);
        }

        // 4. GENERATION LOGIC (SIMULATED)
        // This simulates the heavy tensor math of an LLM.
        public async Task GenerateResponseAsync(string prompt)
        {
            Console.WriteLine($"\n[Engine]: Processing prompt: '{prompt}'...");

            // Simulated tokens that the LLM would output
            string[] tokens = { "Hello", " ", "User", ", ", "I", " am", " an", " AI." };

            foreach (string token in tokens)
            {
                // Simulate network/processing delay
                await Task.Delay(300); 

                // 5. THE INVOCATION
                // Check if a callback exists, then invoke it.
                // This is the "Event-Driven" moment.
                if (_onTokenReceived != null)
                {
                    // Dynamic Invocation: The delegate calls all attached methods.
                    _onTokenReceived.DynamicInvoke(token);
                }
            }
        }
    }

    class Program
    {
        static async Task Main(string[] args)
        {
            // 6. INSTANCE CREATION
            LLMEngine engine = new LLMEngine();

            // 7. LAMBDA EXPRESSION INTRODUCTION
            // Here we define the callback logic inline.
            // The '(string token)' is the parameter list.
            // The '=>' separates parameters from the body.
            // The '{ ... }' is the execution logic.
            TokenReceivedHandler uiUpdater = (string token) => 
            {
                // In a real app, this would append to a TextBlock in WPF/WinUI
                Console.Write(token); 
            };

            // 8. REGISTERING THE LAMBDA
            engine.RegisterTokenCallback(uiUpdater);

            // --- Alternative: Registering directly without a variable ---
            // You can also pass the lambda directly to the registration method:
            /*
            engine.RegisterTokenCallback((token) => {
                Console.ForegroundColor = ConsoleColor.Cyan;
                Console.Write(token);
                Console.ResetColor();
            });
            */

            // 9. TRIGGERING THE STREAM
            Console.WriteLine("Starting Stream...");
            await engine.GenerateResponseAsync("Tell me a story.");
            Console.WriteLine("\n[Stream Complete]");
        }
    }
}

Architectural Breakdown: How It Works

Here is the step-by-step execution flow and the architectural reasoning behind the code.

  1. The Delegate Definition (TokenReceivedHandler):

    • What it is: A delegate is a type-safe function pointer. By defining public delegate void TokenReceivedHandler(string token), we are creating a new type named TokenReceivedHandler.
    • Why it matters: This enforces a contract. The LLMEngine guarantees that it will only call methods that accept exactly one string argument and return nothing. If you try to register a method that returns an int, the compiler will throw an error. This prevents runtime crashes in complex systems.
  2. The Registration Mechanism (RegisterTokenCallback):

    • What it is: This method accepts an instance of our delegate type.
    • Why it matters: This is Decoupling. The LLMEngine knows how to generate tokens (math, tensors, inference), but it does not know how to display them (Console, GUI, Log file). By accepting a delegate, the engine delegates the responsibility of "what to do with the data" to the caller.
  3. The Invocation (_onTokenReceived.DynamicInvoke):

    • What it is: We check whether the delegate field is null (i.e., no subscribers yet). If it is not, we call DynamicInvoke.
    • Why it matters: Because the field is typed as the base Delegate class, we cannot call _onTokenReceived(token) directly; DynamicInvoke performs a late-bound invocation that works for any delegate type. If multiple callbacks were registered (multicast), they are all called sequentially. Note that DynamicInvoke uses reflection (slower than a direct call) and wraps any exception thrown by a callback in a TargetInvocationException, so it does not make failures "graceful" by itself; in production you would typically type the field as TokenReceivedHandler? and invoke it directly.
  4. The Lambda Expression ((string token) => { ... }):

    • What it is: An anonymous function. It allows us to define the method body exactly where we need it, without cluttering the class with named methods like PrintToken or UpdateUI.
    • Why it matters: In AI streaming, the logic for handling a token is often simple and specific to the context (e.g., appending text to a specific text box). Lambdas reduce boilerplate code and keep the logic visually close to where it is used.
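
To make the DynamicInvoke point concrete, here is a small, self-contained sketch contrasting the late-bound call with a direct invocation after casting to the concrete delegate type. The Demo class and the [A]/[B] labels are illustrative only:

```csharp
using System;

public delegate void TokenReceivedHandler(string token);

class Demo
{
    static void Main()
    {
        // Build a multicast chain on a base-typed Delegate field.
        Delegate? chain = null;
        chain = Delegate.Combine(chain, new TokenReceivedHandler(t => Console.Write("[A]" + t)));
        chain = Delegate.Combine(chain, new TokenReceivedHandler(t => Console.Write("[B]" + t)));

        // Late-bound: works on the base Delegate type, calls every entry
        // in the multicast chain in order. Prints "[A]x[B]x".
        chain?.DynamicInvoke("x");
        Console.WriteLine();

        // Direct invocation after casting to the concrete type is faster
        // and equally multicast-aware. Prints "[A]y[B]y".
        if (chain is TokenReceivedHandler typed)
            typed("y");
        Console.WriteLine();
    }
}
```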

Visualizing the Data Flow

This diagram illustrates the relationship between the Generator (Producer) and the Callback (Consumer).

A Producer invokes a Generator to produce data, which it then passes to a Consumer via a Callback, illustrating the flow of data between these components.

Common Pitfalls

1. The Null Reference Exception (The "Forgotten Subscription") A frequent mistake is invoking the delegate without checking if it is null.

  • Bad Code: _onTokenReceived(token);
  • The Issue: If GenerateResponseAsync is called before anyone calls RegisterTokenCallback, _onTokenReceived is null. Calling a method on a null reference throws a NullReferenceException, crashing the application.
  • The Fix: Always check if (_onTokenReceived != null) before invoking, or use the null-conditional operator: _onTokenReceived?.DynamicInvoke(token).

2. Lambda Variable Capture (The "Loop Trap") If you are registering lambdas inside a loop, be careful about variable scope.

  • The Scenario: Imagine generating a stream of 1000 tokens and logging them with an index.
  • The Mistake: for(int i=0; i<1000; i++) { engine.RegisterTokenCallback(t => Console.WriteLine($"{i}: {t}")); }
  • The Issue: In C#, closures capture the variable itself, not a snapshot of its value. Every lambda created in the loop shares the same i, so by the time the callbacks execute the loop has finished and i is 1000 for every callback.
  • The Fix: Declare a local variable inside the loop: int current = i; then use current in the lambda.
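
The trap and its fix can be demonstrated end to end. This sketch stores the lambdas in a plain List<Action<string>> rather than the chapter's engine, purely to keep it self-contained:

```csharp
using System;
using System.Collections.Generic;

var callbacks = new List<Action<string>>();

// Without the copy, every lambda would share the single loop variable 'i'
// and print "3: token" three times after the loop finishes.
for (int i = 0; i < 3; i++)
{
    int current = i; // fresh variable per iteration; each closure captures its own copy
    callbacks.Add(t => Console.WriteLine($"{current}: {t}"));
}

foreach (var cb in callbacks)
    cb("token");
// Prints:
// 0: token
// 1: token
// 2: token
```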

3. Blocking the Stream Delegates in C# are synchronous by default. If your lambda performs a heavy operation (like writing to a slow disk or a complex calculation), it will block the LLMEngine's loop.

  • The Fix: Keep lambda bodies lightweight. If heavy work is needed, the lambda should trigger an asynchronous task (e.g., Task.Run(() => HeavyWork(token))).
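
A sketch of that fix, assuming a hypothetical SaveTokenToDisk operation standing in for slow I/O:

```csharp
using System;
using System.Threading.Tasks;

// Keep the callback body lightweight; offload heavy work to the thread pool.
Action<string> onToken = token =>
{
    Console.Write(token);                        // cheap: stays in the callback
    _ = Task.Run(() => SaveTokenToDisk(token));  // heavy: runs off the generator's path
};

onToken("Hi");

// Hypothetical slow operation (e.g., disk or network I/O).
static void SaveTokenToDisk(string token)
{
    // ... slow work here ...
}
```

Note that fire-and-forget Task.Run discards exceptions; in production code you would observe or log the returned Task rather than ignore it.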

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.



Code License: All code examples are released under the MIT License.

Content Copyright: Copyright © 2026 Edgar Milvus. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. Copying, redistribution, or reproduction is strictly prohibited.