Chapter 11: Consuming OpenAI/Azure APIs with HttpClientFactory
Theoretical Foundations
The consumption of external AI services, such as OpenAI or Azure OpenAI, within an ASP.NET Core application introduces a distinct set of challenges that go beyond simple REST API calls. While the fundamental mechanism is HTTP, the nature of AI workloads—characterized by high latency, large payloads, streaming responses, and strict rate limits—demands a sophisticated approach to connection management, request construction, and resilience. This section establishes the theoretical bedrock for interacting with these services, focusing on the architectural patterns necessary to build robust, scalable, and maintainable AI backends.
The Perils of Naive HTTP Consumption
In early .NET applications, developers often instantiated a new HttpClient for every request or maintained a static singleton instance. Both approaches are fundamentally flawed for high-throughput AI applications.
- Socket Exhaustion: Creating a new HttpClient for each request (e.g., inside a controller action) relies on the operating system's socket allocation. Even after disposal, the underlying socket lingers in the TIME_WAIT state, so under the load of concurrent AI inference requests the OS runs out of available sockets, leading to SocketException errors.
- Stale DNS and Certificate Issues: A static HttpClient instance persists indefinitely. If the external AI service (such as Azure OpenAI) rotates its IP addresses or updates TLS certificates, the static client will not pick up these changes, resulting in connection failures until the application restarts.
Analogy: The Single Courier vs. The Fleet Manager. Imagine you run a company that constantly sends packages (API requests) to a distant supplier (OpenAI).
- The Naive Approach (new HttpClient per request): You hire a new courier and rent a new van for every single package. The hiring agency (OS sockets) eventually runs out of couriers and vans. It is incredibly wasteful and unsustainable.
- The Static Approach: You hire one courier and give them one van, telling them to drive back and forth forever. If the supplier moves their warehouse (DNS change) or the road rules change (certificate update), your one courier is stuck with outdated maps, unable to adapt.
- The IHttpClientFactory Approach: You hire a fleet manager. When you need to send a package, you ask the manager for a van. The manager gives you a van from a pool, ensures the vans are maintained, keeps the maps updated, and scales the fleet with demand.
IHttpClientFactory: The Connection Lifecycle Manager
IHttpClientFactory is .NET's solution to the socket exhaustion and DNS staleness problems. It does not hand out a single shared HttpClient instance; instead, it manages the lifecycle of the underlying HttpMessageHandler (the object that actually owns the HTTP connection).
How it Works:
When you request an HttpClient from the factory, it retrieves a handler from a pool. The factory tracks these handlers. By default, a handler is reused for approximately 2 minutes (configurable). After this period, or if the handler is marked as stale due to DNS changes, the factory discards it and creates a new one. This ensures that your application always uses fresh connections without exhausting the socket pool.
Why this is Critical for AI: AI API calls are distinct from standard CRUD operations.
- Latency: An OpenAI chat completion request might take 10–30 seconds to complete (or stream). Holding a socket open for this duration is resource-intensive. Efficient pooling ensures connections are reused across requests, and if HTTP/2 is enabled, multiplexing allows multiple in-flight requests to share a single connection.
- Concurrency: AI applications are inherently concurrent; multiple users might trigger inference simultaneously. IHttpClientFactory allows the creation of named clients with specific configurations (e.g., a client dedicated to the "GPT-4" model with a specific timeout) that share the same underlying handler pool while maintaining distinct settings.
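These ideas can be sketched as a named-client registration. This is a minimal sketch assuming the Microsoft.Extensions.Http package; the client name, endpoint, and timeout values are illustrative placeholders, not values mandated by any SDK:

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// A named client dedicated to one model/deployment. The name "gpt-4-client"
// and the endpoint below are hypothetical examples.
services.AddHttpClient("gpt-4-client", client =>
{
    client.BaseAddress = new Uri("https://example-openai-endpoint.example/");
    client.Timeout = TimeSpan.FromSeconds(90); // generous budget for slow completions
})
// The pooled handler is recycled after this lifetime (2 minutes is the default),
// which is how the factory picks up DNS and certificate changes.
.SetHandlerLifetime(TimeSpan.FromMinutes(2));

// Consumption: ask the factory for the named client. Each call returns a
// lightweight HttpClient wrapper, but the handler comes from the shared pool.
var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();
var client = factory.CreateClient("gpt-4-client");
```

Several named clients can coexist, each with its own base address and timeout, without multiplying the underlying connection pools.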
The Builder Pattern for AI Request Construction
AI APIs are not simple CRUD endpoints. They require complex JSON payloads containing arrays of messages, function definitions, and configuration parameters. Constructing these payloads manually using anonymous types or JObject is error-prone and lacks type safety.
The Builder Pattern is the architectural solution for constructing these complex requests. It separates the construction of a request object from its representation.
Theoretical Application:
In the context of AI, we often need to construct a ChatCompletionRequest. This object might include:
- A system prompt (defining the AI's behavior).
- A history of user/assistant messages.
- Temperature settings (creativity).
- Function tools (for function calling).
Using a Builder allows us to fluently construct this object:
// Conceptual representation of a Builder Pattern usage
var request = new ChatCompletionBuilder()
.WithSystemMessage("You are a helpful assistant.")
.WithUserMessage("What is the capital of France?")
.WithTemperature(0.7)
.WithMaxTokens(50)
.Build();
Why this matters:
- Validation: The Build() method can validate the request (e.g., ensuring the total token count doesn't exceed the model's limit) before serialization.
- Flexibility: It allows optional parameters without a constructor explosion (e.g., new ChatRequest(prompt, null, null, null, ...)).
- Readability: It makes the code self-documenting. When reading it, you immediately understand the intent of the request configuration.
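As a concrete sketch, a minimal builder along these lines might look as follows. The ChatCompletionBuilder, its defaults, and its validation rules are hypothetical illustrations of the pattern, not types from the OpenAI SDK:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request shape; property names mirror the eventual JSON payload.
public record ChatMessage(string Role, string Content);

public record ChatCompletionRequest(
    IReadOnlyList<ChatMessage> Messages, double Temperature, int MaxTokens);

public class ChatCompletionBuilder
{
    private readonly List<ChatMessage> _messages = new();
    private double _temperature = 1.0; // assumed default
    private int _maxTokens = 256;      // assumed default

    public ChatCompletionBuilder WithSystemMessage(string text)
    { _messages.Add(new ChatMessage("system", text)); return this; }

    public ChatCompletionBuilder WithUserMessage(string text)
    { _messages.Add(new ChatMessage("user", text)); return this; }

    public ChatCompletionBuilder WithTemperature(double t)
    { _temperature = t; return this; }

    public ChatCompletionBuilder WithMaxTokens(int n)
    { _maxTokens = n; return this; }

    public ChatCompletionRequest Build()
    {
        // Validation happens here, before serialization ever occurs.
        if (_messages.Count == 0)
            throw new InvalidOperationException("At least one message is required.");
        if (_temperature is < 0 or > 2)
            throw new InvalidOperationException("Temperature must be between 0 and 2.");
        return new ChatCompletionRequest(_messages, _temperature, _maxTokens);
    }
}
```

Because Build() is the single exit point, every request that reaches the serializer has already passed the same checks.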
Handling Streaming Responses (Server-Sent Events)
Unlike standard HTTP responses that return a complete payload, AI models often stream responses token-by-token to reduce perceived latency. This is typically implemented using Server-Sent Events (SSE) or chunked transfer encoding.
Theoretical Challenge:
A standard HttpClient.GetAsync() call waits until the entire response stream is downloaded before returning. For a 500-token AI response, this could mean waiting 15 seconds. The user sees nothing until the entire response is loaded.
The Solution:
We must consume the HttpResponseMessage as a Stream. Requesting the response with HttpCompletionOption.ResponseHeadersRead makes the call return as soon as the headers arrive, and the HttpMessageHandler then reads bytes from the network as they arrive. In C#, the body is exposed via HttpResponseMessage.Content.ReadAsStreamAsync().
The Pipeline:
- Request: Send headers and the prompt payload.
- Response Headers: Immediately receive HTTP 200 OK and headers (indicating streaming mode).
- Streaming: The connection stays open. The application reads from the stream continuously.
- Parsing: The stream yields data in specific formats (e.g., OpenAI uses data: {...} JSON lines). The application must parse these lines incrementally.
This requires a shift in mindset from "Request -> Response" to "Request -> Continuous Flow". The IHttpClientFactory ensures the underlying connection remains stable during this long-lived read operation.
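A minimal sketch of this continuous-flow consumption follows, assuming the OpenAI-style SSE format (lines prefixed with "data: ", terminated by a "[DONE]" sentinel). The SseReader class is illustrative, not a framework type:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Reads an SSE-style stream line by line and yields each "data:" payload
// as soon as it arrives, instead of buffering the whole response.
public static class SseReader
{
    public static async IAsyncEnumerable<string> ReadDataLinesAsync(Stream stream)
    {
        using var reader = new StreamReader(stream);
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            if (!line.StartsWith("data: ")) continue;    // skip blank/comment lines
            var payload = line["data: ".Length..];
            if (payload == "[DONE]") yield break;        // OpenAI end-of-stream sentinel
            yield return payload;                        // one JSON chunk per line
        }
    }
}

// Usage against a live endpoint (sketch; ParseDelta is a hypothetical helper
// that extracts the token text from each JSON chunk):
// var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
// await foreach (var json in SseReader.ReadDataLinesAsync(
//         await response.Content.ReadAsStreamAsync()))
//     Console.Write(ParseDelta(json));
```

The key detail is HttpCompletionOption.ResponseHeadersRead: without it, the default behavior buffers the entire body before the await completes, defeating the purpose of streaming.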
Authentication Strategies: API Keys vs. Azure AD
AI services require authentication, but the method varies by deployment.
- API Keys (OpenAI/Azure OpenAI):
  - Mechanism: A static string passed in the Authorization header (usually Bearer <key>) or, for Azure OpenAI, in a dedicated api-key header.
  - Theory: This is simple but risky. If the key leaks, anyone can use your quota.
  - Implementation: We use a DelegatingHandler to inject this header automatically for every request made by a specific named client, keeping the key out of controller logic.
- Azure AD (Entra ID):
  - Mechanism: OAuth 2.0 flows. The application requests a token from Azure AD using Client Credentials (service-to-service) or On-Behalf-Of (user-to-service) flows.
  - Theory: This is more secure and allows for auditing and RBAC (Role-Based Access Control).
  - Challenge: Tokens expire. A naive implementation might fail if a token expires mid-request.
  - Solution: We integrate IHttpClientFactory with the Azure.Identity library. A custom DelegatingHandler intercepts the request, checks whether a valid token is cached, acquires a new one if necessary, and injects it. This abstracts the complexity of token management away from the business logic.
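A minimal sketch of the API-key variant of such a handler is shown below. The api-key header name follows Azure OpenAI's convention; the ApiKeyHandler class itself is an illustration, not a library type:

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Injects the API key into every outgoing request of the client it is
// attached to, keeping the secret out of controllers and services.
public class ApiKeyHandler : DelegatingHandler
{
    private readonly string _apiKey;

    public ApiKeyHandler(string apiKey) => _apiKey = apiKey;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Azure OpenAI expects "api-key"; plain OpenAI uses "Authorization: Bearer <key>".
        request.Headers.Add("api-key", _apiKey);
        return base.SendAsync(request, cancellationToken);
    }
}

// Registration (sketch): attach the handler to a named or typed client.
// builder.Services.AddHttpClient("azure-openai")
//        .AddHttpMessageHandler(() => new ApiKeyHandler(builder.Configuration["ApiKey"]!));
```

An Azure AD variant would have the same shape, but SendAsync would ask a cached TokenCredential for a bearer token instead of adding a static header.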
Resilience with Polly: Handling Transient Faults
External AI APIs are not infallible. They suffer from:
- Rate Limits (429 Too Many Requests): You are sending requests faster than your quota allows.
- Transient Errors (5xx): Temporary server glitches.
- Timeouts: The model takes too long to generate a response.
Polly is a .NET resilience library that integrates seamlessly with IHttpClientFactory. It allows us to define policies that wrap our HTTP calls.
Key Policies for AI:
- Retry Policy: If a 429 (Rate Limit) or 503 (Service Unavailable) occurs, wait for a specific duration (often derived from the Retry-After header) and try again.
- Circuit Breaker: If a service fails repeatedly (e.g., 5 consecutive 5xx errors), "break" the circuit. Subsequent calls fail immediately without hitting the network, giving the external service time to recover.
- Timeout: AI calls can hang. A timeout policy ensures the request is canceled if no data is received within a set time (e.g., 60 seconds).
Integration:
Polly policies are added to the IHttpClientFactory pipeline. When a request is made, it passes through the Policy pipeline before hitting the network and again after receiving the response.
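The three policies above can be sketched as follows, assuming the Microsoft.Extensions.Http.Polly package; the retry counts, backoff curve, and break duration are illustrative choices, not recommendations from the Polly project:

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Extensions.Http;

// Attach resilience policies to a typed client's outgoing pipeline.
builder.Services.AddHttpClient<AiService>(client =>
    {
        client.BaseAddress = new Uri("https://api.example-ai-provider.com/");
    })
    // Retry: transient errors (5xx, 408, HttpRequestException) plus 429
    // rate limits, with exponential backoff (2s, 4s, 8s).
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .OrResult(msg => msg.StatusCode == HttpStatusCode.TooManyRequests)
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))))
    // Circuit breaker: after 5 consecutive failures, fail fast for 30 seconds.
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30)))
    // Timeout: cancel any single attempt that exceeds 60 seconds.
    .AddPolicyHandler(Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(60)));
```

Ordering matters: because policies wrap outward-in, the retry policy here wraps the circuit breaker, which wraps the per-attempt timeout.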
Architectural Visualization
The following diagram illustrates the flow of data and control when consuming an AI API using these patterns.
By combining these concepts, we establish a robust architecture:
- Decoupling: The Typed Client service (discussed in the next subsection) hides the complexity of the external API. The controller only knows about a domain interface (e.g., IChatService), not HTTP or JSON.
- Efficiency: IHttpClientFactory manages the socket lifecycle, preventing resource exhaustion under the high-latency loads typical of AI.
- Safety: The Builder pattern ensures requests are valid and type-safe before serialization.
- Security: DelegatingHandlers abstract authentication, allowing seamless switching between API keys and Azure AD without changing business logic.
- Stability: Polly policies transform transient network failures into manageable exceptions or automatic recoveries, essential for production-grade AI systems.
This theoretical foundation sets the stage for the implementation details in the following sections, where we translate these patterns into concrete C# code.
Basic Code Example
using System.Net.Http.Json;
using System.Text.Json.Serialization;

// A simple 'Hello World' example demonstrating how to configure and use
// IHttpClientFactory to call an external AI service (simulated here).
// This approach prevents socket exhaustion and allows for centralized configuration.
// Note: in a top-level Program.cs the statements must precede any type
// declarations, so the setup comes first and the models/service follow.

// 1. Program Setup (Minimal API style)
var builder = WebApplication.CreateBuilder(args);

// CRITICAL: Configure IHttpClientFactory
// We register the Typed Client 'AiService' and configure its HttpClient.
builder.Services.AddHttpClient<AiService>(client =>
{
    // Base address for the external API
    client.BaseAddress = new Uri("https://api.example-ai-provider.com/");

    // Set common headers (e.g., API Key)
    // In a real app, retrieve this from IConfiguration or Azure Key Vault.
    var apiKey = builder.Configuration["ApiKey"] ?? "sk-12345";
    client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");

    // Set timeout to prevent hanging indefinitely
    client.Timeout = TimeSpan.FromSeconds(30);
});

var app = builder.Build();

// 2. Define a simple endpoint to trigger the service
app.MapGet("/chat", async (AiService aiService, string prompt) =>
{
    var response = await aiService.GetCompletionAsync(prompt);
    return Results.Ok(new { response });
});

app.Run();

// 3. Define the request model (The "Builder" pattern concept)
public record class AiPromptRequest(
    [property: JsonPropertyName("prompt")] string Prompt,
    [property: JsonPropertyName("max_tokens")] int MaxTokens = 50
);

// 4. Define the response model
public record class AiResponse(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("choices")] List<AiChoice> Choices
);

public record class AiChoice(
    [property: JsonPropertyName("text")] string Text
);

// 5. The Typed Client Service
// This service encapsulates the logic for communicating with the AI provider.
public class AiService
{
    private readonly HttpClient _httpClient;
    private readonly ILogger<AiService> _logger;

    public AiService(HttpClient httpClient, ILogger<AiService> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    public async Task<string> GetCompletionAsync(string prompt)
    {
        try
        {
            // Construct the request payload
            var request = new AiPromptRequest(prompt);

            // PostAsJsonAsync handles serialization automatically
            var response = await _httpClient.PostAsJsonAsync("v1/completions", request);

            // Ensure success status code (throws HttpRequestException on failure)
            response.EnsureSuccessStatusCode();

            // Deserialize the response
            var aiResponse = await response.Content.ReadFromJsonAsync<AiResponse>();

            // Return the first choice's text
            return aiResponse?.Choices?.FirstOrDefault()?.Text ?? "No response generated.";
        }
        catch (HttpRequestException ex)
        {
            _logger.LogError(ex, "HTTP request failed while calling AI service.");
            throw; // Re-throw to let the caller handle the UI feedback
        }
    }
}
Line-by-Line Explanation
1. Model Definitions (The Builder Pattern Foundation)
- public record class AiPromptRequest(...): We use a record for immutability, which is ideal for data transfer objects (DTOs). The [JsonPropertyName] attributes instruct the JSON serializer (System.Text.Json) how to map our C# properties to the snake_case naming conventions often used by external APIs.
- public record class AiResponse(...): Similarly, we define the shape of the expected JSON response. This strongly typed approach prevents runtime errors caused by typos in property names (e.g., response["choies"] vs response["choices"]).
2. The Typed Client Service (AiService)
- public class AiService: Instead of injecting IHttpClientFactory directly into controllers and managing string-based keys, we use the Typed Client pattern. The DI container injects a pre-configured HttpClient instance directly into this class.
- private readonly HttpClient _httpClient;: This instance is unique to AiService but shares the underlying connection pool managed by IHttpClientFactory.
- public AiService(HttpClient httpClient, ...): Constructor injection. The HttpClient is provided by the framework.
- var request = new AiPromptRequest(prompt);: We instantiate our request model. This is the "Builder" step: constructing the payload.
- await _httpClient.PostAsJsonAsync(...): This extension method serializes the request object to JSON and sets the Content-Type header to application/json automatically.
- response.EnsureSuccessStatusCode();: A helper method that throws an HttpRequestException if the HTTP response status code is an error (4xx or 5xx). This centralizes error checking.
- await response.Content.ReadFromJsonAsync<AiResponse>();: We deserialize the JSON body directly into our strongly typed AiResponse record.
3. Dependency Injection Setup (Program.cs)
- builder.Services.AddHttpClient<AiService>(...): This is the core configuration method.
  - It registers AiService as a transient service.
  - It automatically creates an HttpClient instance for AiService.
  - It registers AiService itself as a Typed Client, meaning it can be injected into other classes.
- client.BaseAddress: Setting the base address ensures that subsequent requests in the service can use relative paths (e.g., "v1/completions").
- client.DefaultRequestHeaders.Add(...): We configure authentication headers globally. This prevents the need to add headers manually in every request method inside AiService.
- client.Timeout: We explicitly set a timeout. The default is 100 seconds, which is often too long for interactive AI applications.
4. Endpoint Execution
- app.MapGet("/chat", ...): We define a minimal API endpoint.
- async (AiService aiService, string prompt): The dependency injection container automatically resolves and injects the configured AiService instance.
- return Results.Ok(...): We wrap the AI response in a simple JSON object for the client.
Common Pitfalls
- Instantiating HttpClient Manually:
  - Mistake: Creating a new instance of HttpClient using new HttpClient() inside a service or controller.
  - Consequence: This leads to Socket Exhaustion. HttpClient implements IDisposable, but disposing it does not immediately close the underlying TCP connection. Under load, you will run out of available sockets, causing the application to hang or crash.
  - Solution: Always use IHttpClientFactory via AddHttpClient().
- Using IHttpClientFactory Incorrectly:
  - Mistake: Injecting IHttpClientFactory into a class and calling CreateClient("NamedClient") repeatedly inside a loop.
  - Consequence: While better than manual instantiation, this incurs unnecessary overhead. The factory retrieves a pre-configured handler from a cache; however, excessive calls still add minor GC pressure.
  - Solution: Prefer the Typed Client pattern (as shown in the example), where the HttpClient is injected directly into the service's constructor. The instance is created once per service scope.
- Swallowing Exceptions:
  - Mistake: Wrapping the HTTP call in a generic try-catch that simply returns null or a default value without logging.
  - Consequence: Debugging becomes impossible. You won't know if the API is down, the key is invalid, or the network failed.
  - Solution: Log the exception (using ILogger) and either re-throw it or return a result object that explicitly indicates failure (e.g., Result<T, Error>).
- Ignoring Timeouts:
  - Mistake: Relying on the default 100-second timeout for an AI API that should respond in milliseconds.
  - Consequence: Under high load or network latency, requests hang for the full 100 seconds, holding connections open and degrading throughput.
  - Solution: Set client.Timeout to a reasonable duration (e.g., 30s) and handle TaskCanceledException specifically.
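The timeout advice above can be sketched as follows; the TimeoutAware class, the endpoint path, and the null-sentinel convention are illustrative choices, not framework requirements:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Wraps a call so that a timeout (surfaced as TaskCanceledException when
// client.Timeout elapses) is distinguishable from other failures.
public static class TimeoutAware
{
    public static async Task<string?> GetCompletionOrNullAsync(HttpClient client, string prompt)
    {
        try
        {
            // "v1/echo" is a hypothetical endpoint used for illustration.
            var response = await client.GetAsync(
                $"https://api.example-ai-provider.com/v1/echo?prompt={Uri.EscapeDataString(prompt)}");
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (TaskCanceledException)
        {
            // The timeout elapsed (or the caller canceled): return a sentinel
            // so the caller can show a targeted "model took too long" message.
            return null;
        }
    }
}
```

Catching TaskCanceledException separately from HttpRequestException lets the UI distinguish "the model is slow" from "the service is down".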
Architectural Visualization
The following diagram illustrates the flow of a request through the IHttpClientFactory architecture. Note how the factory manages the handler pool, while the Typed Client provides a clean abstraction for the application logic.
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.