
Chapter 7: SignalR - Building Real-Time Chat Channels

Theoretical Foundations

At its core, SignalR is a library for ASP.NET Core that simplifies the process of adding real-time web functionality to applications. Real-time web functionality is the ability to have server-side code push content to clients instantly as it becomes available, rather than the client having to poll for updates. In the context of AI Web APIs, this is not just a convenience; it is a fundamental architectural shift required to handle the streaming nature of modern Large Language Models (LLMs).

To understand why SignalR is indispensable for AI APIs, we must first contrast it with the traditional HTTP request-response model used in RESTful APIs, which was the focus of Book 4: Building AI Web APIs with ASP.NET Core. In a standard REST interaction, the client sends a request (e.g., "Generate a story about a robot") and waits. If the AI model takes 10 seconds to generate a response, the HTTP connection must remain open for 10 seconds. This is inefficient, prone to timeouts by proxies and load balancers, and provides no feedback to the user until the entire payload is ready. It is akin to ordering a complex meal at a restaurant and standing at the counter in silence until the entire meal is plated and handed to you at once.

SignalR changes this dynamic by establishing a persistent, bidirectional connection between the client and the server. Once the connection is established, the server can send data to the client at any time. This is analogous to a dedicated phone line. When you call customer service, the line stays open. The agent (server) can speak to you (client) immediately, provide updates ("Let me check that for you"), and finally deliver the answer, all without you having to redial the phone repeatedly.

The Mechanics of Persistent Connections

SignalR abstracts the underlying transport mechanism. It attempts to use the most efficient transport available, falling back to less efficient ones if necessary. The hierarchy of transports is:

  1. WebSockets: The ideal transport. A full-duplex communication channel over a single TCP connection. It allows the server to push data to the client instantly. This is the "phone line" analogy in action—both parties can talk simultaneously without waiting for the other to finish a sentence.
  2. Server-Sent Events (SSE): A unidirectional connection where the server can push data to the client, but the client cannot push data back over the same connection (it uses standard HTTP). This is like a radio broadcast; the DJ (server) speaks, and listeners (clients) receive, but they cannot talk back on that frequency.
  3. Long Polling: The least efficient fallback. The client sends a request, and the server holds it open until there is data to send. Once the client receives the data, it immediately sends another request. This creates a continuous loop of opening and closing connections, simulating real-time behavior.
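To make the long-polling fallback concrete, here is a minimal TypeScript sketch of the client-side loop. The `poll` function and its stub "server" are hypothetical stand-ins for a real held-open HTTP request; they only illustrate the request/re-request cycle, not a real transport.

```typescript
// Shape of one long-poll response: a batch of messages, plus a flag
// telling the client whether the channel has been closed.
type PollResult = { messages: string[]; closed: boolean };

// The long-polling loop: the client re-issues the request immediately
// after every response, simulating a continuous server-to-client channel.
async function longPoll(poll: () => Promise<PollResult>): Promise<string[]> {
  const received: string[] = [];
  while (true) {
    const result = await poll(); // a real server would hold this open
    received.push(...result.messages);
    if (result.closed) break;
  }
  return received;
}

// Stub "server" (hypothetical): returns one batch per poll, then closes.
function makeStubServer(batches: string[][]): () => Promise<PollResult> {
  let i = 0;
  return async () => {
    const messages = batches[i] ?? [];
    i++;
    return { messages, closed: i >= batches.length };
  };
}
```

Each call to `poll` models one held-open request that the client must immediately re-issue; that constant connection churn is exactly the overhead WebSockets avoid.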

For AI applications, specifically those utilizing IAsyncEnumerable (introduced in Book 3: Advanced C# Patterns for AI), WebSockets are critical. When an LLM generates text, it produces a stream of tokens. Using IAsyncEnumerable, we can yield these tokens as they are generated. SignalR transports this stream over WebSockets, allowing the client to render text word-by-word (or token-by-token) as if the AI were typing in real-time.

SignalR Architecture: The Hub

The central abstraction in SignalR is the Hub. A Hub is a high-level pipeline that allows a client and server to call methods on each other directly. In ASP.NET Core, a Hub is a class that inherits from Microsoft.AspNetCore.SignalR.Hub.

Unlike a standard API Controller, which maps HTTP verbs (GET, POST) to methods, a Hub maps logical method names to remote procedure calls (RPC). This means a client connected to a Hub can invoke a method named SendMessage on the server, and the server can invoke a method named ReceiveMessage on the client.

This RPC style is vital for AI chat applications. Consider a user interacting with an AI assistant. The client sends a prompt, but rather than waiting for a single massive response, the client expects a continuous flow of tokens. The architecture looks like this:

  1. Client calls hubConnection.invoke("Ask", prompt).
  2. Server receives the prompt, passes it to the AI model.
  3. Server begins streaming tokens via IAsyncEnumerable<string>.
  4. Server pushes each token to the specific client using Clients.Caller.SendAsync("ReceiveToken", token).

This decouples the transport layer from the business logic. The developer focuses on the Ask and ReceiveToken methods, not on managing WebSocket frames or HTTP keep-alives.
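The four-step flow above can be sketched with a tiny in-memory stand-in for the connection. `MockHubConnection` is hypothetical (a real client would use the `@microsoft/signalr` package talking to a real Hub); it only mirrors the `on`/`invoke` RPC shape and the token-by-token push.

```typescript
type Handler = (token: string) => void;

// Hypothetical in-memory stand-in for a SignalR connection, illustrating
// the RPC flow: the client invokes "Ask", and the "server" side pushes
// "ReceiveToken" calls back one token at a time.
class MockHubConnection {
  private handlers = new Map<string, Handler>();

  // Step 4's prerequisite: the client registers a callback the server can invoke.
  on(method: string, handler: Handler): void {
    this.handlers.set(method, handler);
  }

  // Step 1: the client calls a server method by name. The mock "server"
  // here tokenizes the prompt (steps 2-3) and streams each token back.
  async invoke(method: string, prompt: string): Promise<void> {
    if (method !== "Ask") throw new Error(`Unknown hub method: ${method}`);
    const receive = this.handlers.get("ReceiveToken");
    for (const token of prompt.split(" ")) {
      receive?.(token);        // server -> client push, one token at a time
      await Promise.resolve(); // yield, mimicking network asynchrony
    }
  }
}
```

The point of the sketch is the inversion: after the single `invoke`, all further traffic is server-initiated, which is impossible in plain request-response HTTP.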

Group Management and Multi-User Context

In complex AI applications, we often need to isolate conversations or create collaborative environments. SignalR handles this through Groups. A group is a collection of connections associated with a unique name. A connection can belong to multiple groups, and group membership is scoped to the hub: joining a group on one hub has no effect on any other hub.

Imagine a collaborative document editing tool where an AI assistant helps multiple users. We might create a group named Document_123. When User A asks the AI to "Summarize this paragraph," the AI's response should only be pushed to User A (or perhaps to all users in Document_123 if it's a shared feature). SignalR allows us to target specific groups:

// Server-side logic (conceptual)
await Groups.AddToGroupAsync(Context.ConnectionId, "Document_123");
// Later, broadcast to the group
await Clients.Group("Document_123").SendAsync("ReceiveSummary", summary);

This capability is essential for building AI agents that operate in specific contexts. If we were building a coding assistant that helps with a specific repository, we could assign all users working on that repo to a group named Repo_{RepoId}. This ensures that AI-generated code suggestions are routed correctly without leaking context to other users.
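Conceptually, group routing is just a mapping from group names to connection IDs. The `GroupManager` below is a hypothetical model of that bookkeeping, not SignalR's actual implementation; it shows why a message sent to `Document_123` never reaches a connection that only belongs to another group.

```typescript
// Conceptual model of SignalR group routing: group name -> connection IDs.
class GroupManager {
  private groups = new Map<string, Set<string>>();
  private inbox = new Map<string, string[]>(); // messages each connection received

  addToGroup(connectionId: string, group: string): void {
    if (!this.groups.has(group)) this.groups.set(group, new Set());
    this.groups.get(group)!.add(connectionId);
    if (!this.inbox.has(connectionId)) this.inbox.set(connectionId, []);
  }

  // Deliver only to connections in the named group, mirroring
  // Clients.Group("Document_123").SendAsync(...).
  sendToGroup(group: string, message: string): void {
    for (const id of this.groups.get(group) ?? []) {
      this.inbox.get(id)!.push(message);
    }
  }

  messagesFor(connectionId: string): string[] {
    return this.inbox.get(connectionId) ?? [];
  }
}
```

In the real library this routing table lives inside the SignalR server (or the backplane, when scaled out), but the isolation guarantee is the same: delivery is bounded by membership.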

Security and Authentication

Security in real-time systems is often an afterthought, but in SignalR, it is integrated deeply. Since SignalR connections persist, authentication must be handled at the handshake stage. In ASP.NET Core, this is achieved using the same JWT (JSON Web Token) or cookie-based authentication schemes used in standard HTTP APIs.

When a client initiates a connection, it passes an access token (for WebSockets, typically as a query-string parameter, since browsers cannot attach custom headers to WebSocket requests). The authentication middleware validates the token during the handshake, and the Authorize attribute on the Hub rejects unauthenticated connections before they are fully established.

[Authorize]
public class AIChatHub : Hub
{
    // Only authenticated users can invoke this method
    [Authorize(Policy = "PremiumUser")]
    public async IAsyncEnumerable<string> Ask(string prompt)
    {
        // ... AI logic
    }
}

This is critical for AI APIs because models often have usage quotas, different capabilities (e.g., GPT-4 vs. GPT-3.5), or sensitive data access. By enforcing authentication at the Hub level, we prevent unauthorized access to expensive compute resources. Furthermore, we can access the user's identity within the Hub to personalize the AI interaction.

public override async Task OnConnectedAsync()
{
    var userId = Context.UserIdentifier; // Extracted from the JWT
    // Log connection or retrieve user-specific AI settings
    await base.OnConnectedAsync();
}

Streaming AI Model Tokens with IAsyncEnumerable

The most powerful feature for AI applications in SignalR is the support for Streaming Hubs. In modern C#, IAsyncEnumerable<T> allows for asynchronous iteration over a sequence of values. When combined with SignalR, this enables the server to stream data to the client as it becomes available, without buffering the entire response in memory.

In the context of AI, this is the difference between a sluggish, high-latency application and a responsive, fluid chat experience. When an LLM generates a response, it does so token by token. If we wait for the full response, the user might stare at a loading spinner for 20 seconds. With streaming, the user sees the text appear instantly, character by character.

The flow is as follows:

  1. Client: Subscribes to the stream.

    // Client-side TypeScript/JavaScript
    // stream() returns an IStreamResult<T>, which is consumed via subscribe()
    const stream = hubConnection.stream<string>("Ask", "Explain quantum physics");
    stream.subscribe({
        next: (token) => displayToken(token),
        complete: () => { /* generation finished */ },
        error: (err) => console.error(err),
    });
    

  2. Server: Returns an IAsyncEnumerable<string>.

    // Server-side C#
    // SignalR supplies the CancellationToken and cancels it if the client
    // disconnects or aborts the stream ([EnumeratorCancellation] lives in
    // System.Runtime.CompilerServices).
    public async IAsyncEnumerable<string> Ask(
        string prompt,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var token in _aiService.GenerateStream(prompt)
            .WithCancellation(cancellationToken))
        {
            yield return token;
        }
    }
    

  3. SignalR: Handles the serialization and transport of each yielded token over the WebSocket connection.

This architecture mimics the natural flow of human conversation. We do not wait for a complete thought to form before speaking; we speak as the thoughts emerge. SignalR enables the AI to "speak" in the same manner.

Visualizing the SignalR Architecture for AI

To visualize the flow of data in a real-time AI chat system using SignalR, consider the following diagram. It illustrates the persistent connection, the role of the Hub, and the streaming of tokens from the AI model to the client.

This diagram illustrates how a client establishes a persistent connection to a SignalR Hub, which then streams individual tokens from an AI model back to the client in real-time.

Architectural Implications and Edge Cases

While SignalR provides a robust framework, building AI applications introduces specific challenges:

  1. Backpressure and Buffering: If the AI generates tokens faster than the network can transmit them, the server's memory buffer may fill up. SignalR has internal limits on the number of pending messages. If these are exceeded, the connection may be terminated. Developers must implement logic to handle backpressure, perhaps by pausing the AI generation stream if the client is not acknowledging messages fast enough.
  2. Connection Stability: Mobile networks are unstable. If a connection drops while the AI is streaming a response, the client loses the data. Advanced implementations require a resumable token stream. The server must track the state of the generation (e.g., via a session ID) so that if the client reconnects, it can resume receiving the stream from the last received token ID.
  3. Scalability and Sticky Sessions: In a load-balanced environment with multiple server instances, a client's WebSocket connection must remain connected to the same server instance (sticky sessions) because the Hub maintains in-memory state for that specific connection. If the client connects to Server A but the load balancer routes the next request to Server B, the state is lost. For AI streaming, this is critical; the stream is generated on the server instance holding the connection. Using a backplane like Redis allows multiple servers to coordinate, but it adds latency. For AI APIs, it is often preferable to ensure the user stays on the same server instance for the duration of the chat session.
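The backpressure concern in point 1 can be sketched as a bounded buffer whose producer awaits whenever the buffer is full. This is an illustrative model only; SignalR's internal buffering and limits differ in detail, and `BoundedBuffer` is not a SignalR type.

```typescript
// A bounded buffer that pauses the producer (e.g., AI token generation)
// whenever the consumer (e.g., the network) falls behind.
class BoundedBuffer<T> {
  private items: T[] = [];
  private waiters: (() => void)[] = [];

  constructor(private capacity: number) {}

  async push(item: T): Promise<void> {
    // Backpressure: the producer awaits here while the buffer is full.
    while (this.items.length >= this.capacity) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.items.push(item);
  }

  shift(): T | undefined {
    const item = this.items.shift();
    this.waiters.shift()?.(); // wake a paused producer; space is available
    return item;
  }

  get size(): number {
    return this.items.length;
  }
}
```

The key property is that the producer's `await` propagates the consumer's pace upstream: if the client drains slowly, token generation slows with it instead of filling server memory.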

Theoretical Foundations: A Synthesis

SignalR is not merely a "real-time library"; it is the nervous system of a distributed AI application. It bridges the gap between the stateless, request-response nature of HTTP and the stateful, continuous nature of AI conversations. By leveraging Hubs for RPC, Groups for context isolation, Authentication for security, and IAsyncEnumerable for streaming, we transform a static AI API into a dynamic, interactive assistant.

The theoretical foundation rests on the principle of asynchrony. Just as async/await in C# allows a thread to perform other work while waiting for an operation to complete, SignalR allows the server to push data to the client the moment it is available, maximizing throughput and minimizing perceived latency. This is the essential requirement for any AI application that aims to feel "intelligent" and responsive.

Basic Code Example

using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using System.Threading.Tasks;

// 1. Define the Hub Interface
// This interface defines the methods that clients can call on the Hub.
public interface IChatClient
{
    Task ReceiveMessage(string user, string message);
}

// 2. Define the Hub
// The Hub is the server-side class that handles communication.
// It inherits from Hub<T> to provide strong typing for client methods.
public class ChatHub : Hub<IChatClient>
{
    // This method is called by the client to send a message to all connected users.
    public async Task SendMessage(string user, string message)
    {
        // Broadcast the message to all clients connected to this hub.
        await Clients.All.ReceiveMessage(user, message);
    }

    // This method is called when a client connects.
    public override async Task OnConnectedAsync()
    {
        // Send a welcome message to the newly connected client.
        await Clients.Caller.ReceiveMessage("System", "Welcome to the chat!");
        await base.OnConnectedAsync();
    }
}

// 3. Define the Program
// This sets up the web application host and configures services and middleware.
public class Program
{
    public static void Main(string[] args)
    {
        // Create a web application builder.
        var builder = WebApplication.CreateBuilder(args);

        // Add services to the container.
        // SignalR requires the SignalR service to be registered.
        builder.Services.AddSignalR();

        // Build the application.
        var app = builder.Build();

        // Configure the HTTP request pipeline.
        // Map the SignalR Hub to the "/chatHub" endpoint.
        // This is where the WebSocket connection is established.
        app.MapHub<ChatHub>("/chatHub");

        // Run the application.
        app.Run();
    }
}

Detailed Explanation

1. The Problem: Real-Time Communication in a Web API

Imagine you are building an AI-powered chat application. A user sends a prompt, and the AI generates a response. In a traditional HTTP request-response cycle, the client sends a request and waits for the server to send the complete response. This creates a poor user experience for long-running tasks (like AI generation) because the user sees nothing until the entire response is ready. They might think the app is frozen.

Real-World Context: You are building a customer support chatbot. A user asks a complex question. The AI needs 10 seconds to process the query and generate a helpful answer. With traditional HTTP, the user waits 10 seconds staring at a loading spinner. With SignalR, the AI can stream the response token-by-token (word-by-word) as it is generated, providing immediate feedback and a much more engaging experience.

2. The Solution: SignalR Hubs

SignalR is a library for ASP.NET Core that simplifies adding real-time web functionality. It uses WebSockets under the hood (falling back to other techniques if WebSockets aren't available) to maintain a persistent, low-latency connection between the server and the client.

The core concept is the Hub. A Hub is a high-level pipeline that allows a server and client to call methods on each other directly.

3. Code Breakdown

Step 1: Defining the Client Interface (IChatClient)
public interface IChatClient
{
    Task ReceiveMessage(string user, string message);
}
  • Why an Interface? In modern C# with SignalR, we use strongly-typed Hubs. This interface defines the contract for methods that the client (e.g., a JavaScript browser app or a .NET desktop app) can call.
  • The Method: ReceiveMessage is a method that the server will invoke on the client. It takes a user and a message.
  • Return Type: It returns a Task. Asynchronous programming is fundamental in SignalR to prevent blocking threads, which is critical for scalability.
Step 2: Implementing the Hub (ChatHub)
public class ChatHub : Hub<IChatClient>
{
    public async Task SendMessage(string user, string message)
    {
        await Clients.All.ReceiveMessage(user, message);
    }

    public override async Task OnConnectedAsync()
    {
        await Clients.Caller.ReceiveMessage("System", "Welcome to the chat!");
        await base.OnConnectedAsync();
    }
}
  • Inheritance: ChatHub : Hub<IChatClient> inherits from the generic Hub<T> class, specifying our IChatClient interface. This enables IntelliSense and compile-time checking for client methods.
  • Hub Context: The Hub class provides a Context property to access connection information (ConnectionId, User, etc.) and a Clients property to communicate with clients.
  • SendMessage Method:
    • This is a public method. When a client calls this method (e.g., from JavaScript), the code inside executes on the server.
    • Clients.All: This targets every client currently connected to the hub on this server; reaching clients connected to other servers requires a backplane (see Architectural Implications below).
    • ReceiveMessage(...): This invokes the method defined in the IChatClient interface on all connected clients.
  • OnConnectedAsync Method:
    • This is a lifecycle method. SignalR calls this automatically when a client successfully establishes a connection.
    • Clients.Caller: This targets only the client that triggered this event (the one just connecting).
    • We send a system message specifically to the new user to acknowledge their connection.
Step 3: Configuring the Application (Program.cs)
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSignalR();
var app = builder.Build();
app.MapHub<ChatHub>("/chatHub");
app.Run();
  • Service Registration: builder.Services.AddSignalR() registers the SignalR services with the dependency injection container. This makes the Hub system available to the application.
  • Endpoint Mapping: app.MapHub<ChatHub>("/chatHub") is crucial. It tells the ASP.NET Core routing system that any request to the path /chatHub should be handled by the SignalR pipeline, not by standard MVC controllers or minimal API endpoints. This is where the WebSocket handshake occurs.

4. Architectural Implications

  • Scalability: SignalR is stateful. By default, if you scale out your application to multiple servers (e.g., using Kubernetes or Azure App Service), a client connected to Server A cannot send a message to a client connected to Server B.
  • Backplane: To solve this, you must use a backplane (like Azure SignalR Service, Redis, or SQL Server). The backplane acts as a message broker, ensuring all servers share connection state and messages.
  • Security: While this example is unsecured, production APIs must validate identity. SignalR integrates with ASP.NET Core Authentication. You can decorate the Hub or specific methods with [Authorize] attributes and access Context.User to ensure only authenticated users can send or receive messages.

5. Streaming (Context for AI APIs)

Although not shown in this basic "Hello World," the prompt mentions streaming AI tokens. SignalR supports Server-to-Client Streaming. Instead of returning a single Task<string>, a Hub method can return an IAsyncEnumerable<string>. As the AI model generates tokens, you yield return them. SignalR handles the network transmission, pushing each token to the client as soon as it's available, creating a typewriter effect for AI responses.
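The yield-as-you-go idea can be expressed in TypeScript with an async generator; `generateTokens` is a hypothetical stand-in for the model's token stream, not a SignalR API, and the consumer shows the typewriter effect in miniature.

```typescript
// Stand-in for the AI model's output: tokens arrive one at a time.
async function* generateTokens(text: string): AsyncGenerator<string> {
  for (const token of text.split(" ")) {
    await Promise.resolve(); // in a real service, each token arrives asynchronously
    yield token;
  }
}

// Consumer renders each token as it arrives -- the "typewriter effect".
// With SignalR, this loop would run on the client, with the generator
// living on the server behind the Hub.
async function render(stream: AsyncGenerator<string>): Promise<string> {
  let output = "";
  for await (const token of stream) {
    output += (output ? " " : "") + token;
  }
  return output;
}
```

Nothing is buffered up front: the consumer's loop body runs as each token is yielded, which is exactly what the `IAsyncEnumerable<string>` Hub method achieves over the wire.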

Common Pitfalls

  1. Missing Client-Side Library: The server-side code is only half the equation. You must include the SignalR client library on the frontend (e.g., @microsoft/signalr for npm or a CDN script). Without it, the browser cannot establish the WebSocket connection or call Hub methods.
  2. Blocking Hub Methods: Never write blocking code (e.g., Thread.Sleep or synchronous I/O) inside a Hub method. SignalR relies heavily on thread pool threads. Blocking one can lead to thread starvation and severely limit the number of concurrent connections your server can handle. Always use async/await and call asynchronous APIs.
  3. CORS Configuration: If your client (e.g., a React app on localhost:3000) is on a different domain than your API (e.g., localhost:5000), the browser's Cross-Origin Resource Sharing (CORS) policy will block the connection. You must configure CORS in Program.cs to allow the specific origin and to allow credentials (AllowCredentials()), which SignalR's negotiation requests require; note that a wildcard origin (AllowAnyOrigin()) cannot be combined with credentials.
  4. Hub Lifecycle Misunderstanding: Hubs are transient. A new instance of the ChatHub class is created for every Hub invocation (e.g., every method call). Do not store state in instance fields (e.g., private int _count;), as it will not persist between calls. Use Context or a singleton service for shared state if necessary.
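Pitfall 4 can be demonstrated in language-neutral terms: per-invocation objects lose their instance fields, while a shared singleton retains them. The classes below are illustrative, not SignalR types; constructing a fresh `TransientHub` per call models how SignalR creates a new Hub instance for every invocation.

```typescript
// State stored on the hub instance is reset on every invocation.
class TransientHub {
  private count = 0; // lost when the instance is discarded
  increment(): number { return ++this.count; }
}

// State stored on a shared singleton survives across invocations.
class SingletonCounter {
  private count = 0;
  increment(): number { return ++this.count; }
}

const shared = new SingletonCounter();

// Models one hub method invocation: SignalR constructs a fresh hub
// instance each time, so the transient counter never advances past 1.
function handleInvocation(): { transient: number; shared: number } {
  const hub = new TransientHub();
  return { transient: hub.increment(), shared: shared.increment() };
}
```

In ASP.NET Core terms, the singleton would be a service registered with `AddSingleton` and injected into the Hub's constructor.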

Visualizing the Connection Flow

The diagram shows a single, persistent singleton service (or the Context) holding shared state such as _count, which is read and updated by many short-lived hub invocations that do not retain instance data between calls.

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
