Chapter 11: Consuming OpenAI/Azure APIs with HttpClientFactory
Theoretical Foundations
The consumption of external AI services, such as OpenAI or Azure OpenAI, within an ASP.NET Core application introduces a distinct set of challenges that go beyond simple REST API calls. While the fundamental mechanism is HTTP, the nature of AI workloads—characterized by high latency, large payloads, streaming responses, and strict rate limits—demands a sophisticated approach to connection management, request construction, and resilience. This section establishes the theoretical bedrock for interacting with these services, focusing on the architectural patterns necessary to build robust, scalable, and maintainable AI backends.
The Perils of Naive HTTP Consumption
In early .NET applications, developers often instantiated a new HttpClient for every request or maintained a static singleton instance. Both approaches are fundamentally flawed for high-throughput AI applications.
- Socket Exhaustion: Creating a new HttpClient for each request (e.g., inside a controller action) relies on the operating system's socket allocation. Even after disposal, the underlying socket lingers in the TIME_WAIT state, so under the load of concurrent AI inference requests the OS runs out of available sockets, leading to SocketException errors.
- Stale DNS and Certificate Issues: A static HttpClient instance persists indefinitely. If the external AI service (such as Azure OpenAI) rotates its IP addresses or updates TLS certificates, the static client will not pick up these changes, resulting in connection failures until the application restarts.
Analogy: The Single Courier vs. The Fleet Manager. Imagine you run a company that constantly sends packages (API requests) to a distant supplier (OpenAI).
- The Naive Approach (new HttpClient per request): You hire a new courier and rent a new van for every single package. The hiring agency (OS sockets) eventually runs out of couriers and vans. It is incredibly wasteful and unsustainable.
- The Static Approach: You hire one courier and give them one van, telling them to drive back and forth forever. If the supplier moves their warehouse (DNS change) or the road rules change (certificate update), your one courier is stuck with outdated maps, unable to adapt.
- The IHttpClientFactory Approach: You hire a fleet manager. When you need to send a package, you ask the manager for a van. The manager gives you a van from a pool, ensures the vans are maintained, keeps the maps updated, and scales the fleet with demand.
IHttpClientFactory: The Connection Lifecycle Manager
IHttpClientFactory is .NET's solution to the socket exhaustion and DNS staleness problems. It does not hand out a single shared HttpClient instance; instead, it manages the lifecycle of the underlying HttpMessageHandler (the object that actually owns the HTTP connection).
How it Works:
When you request an HttpClient from the factory, it retrieves a handler from a pool. The factory tracks these handlers. By default, a handler is reused for approximately 2 minutes (configurable). After this period, or if the handler is marked as stale due to DNS changes, the factory discards it and creates a new one. This ensures that your application always uses fresh connections without exhausting the socket pool.
Why this is Critical for AI: AI API calls are distinct from standard CRUD operations.
- Latency: An OpenAI chat completion request might take 10–30 seconds to complete (or stream). Holding a socket open for this duration is resource-intensive. Efficient pooling ensures connections are reused across requests, and if HTTP/2 is enabled, multiplexing allows multiple in-flight requests to share a single connection.
- Concurrency: AI applications are inherently concurrent; multiple users might trigger inference simultaneously. IHttpClientFactory allows the creation of named clients with specific configurations (e.g., a client dedicated to the "GPT-4" model with a specific timeout) that share the same underlying handler pool while maintaining distinct settings.
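These ideas can be sketched as a named-client registration. This is a minimal sketch assuming the Microsoft.Extensions.Http package; the client name, endpoint, and timeout values are illustrative placeholders, not values mandated by any SDK:

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// A named client dedicated to one model/deployment. The name "gpt-4-client"
// and the endpoint below are hypothetical examples.
services.AddHttpClient("gpt-4-client", client =>
{
    client.BaseAddress = new Uri("https://example-openai-endpoint.example/");
    client.Timeout = TimeSpan.FromSeconds(90); // generous budget for slow completions
})
// The pooled handler is recycled after this lifetime (2 minutes is the default),
// which is how the factory picks up DNS and certificate changes.
.SetHandlerLifetime(TimeSpan.FromMinutes(2));

// Consumption: ask the factory for the named client. Each call returns a
// lightweight HttpClient wrapper, but the handler comes from the shared pool.
var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();
var client = factory.CreateClient("gpt-4-client");
```

Several named clients can coexist, each with its own base address and timeout, without multiplying the underlying connection pools.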
The Builder Pattern for AI Request Construction
AI APIs are not simple CRUD endpoints. They require complex JSON payloads containing arrays of messages, function definitions, and configuration parameters. Constructing these payloads manually using anonymous types or JObject is error-prone and lacks type safety.
The Builder Pattern is the architectural solution for constructing these complex requests. It separates the construction of a request object from its representation.
Theoretical Application:
In the context of AI, we often need to construct a ChatCompletionRequest. This object might include:
- A system prompt (defining the AI's behavior).
- A history of user/assistant messages.
- Temperature settings (creativity).
- Function tools (for function calling).
Using a Builder allows us to fluently construct this object:
// Conceptual representation of a Builder Pattern usage
var request = new ChatCompletionBuilder()
.WithSystemMessage("You are a helpful assistant.")
.WithUserMessage("What is the capital of France?")
.WithTemperature(0.7)
.WithMaxTokens(50)
.Build();
Why this matters:
- Validation: The Build() method can validate the request (e.g., ensuring the total token count doesn't exceed the model's limit) before serialization.
- Flexibility: It allows optional parameters without a constructor explosion (e.g., new ChatRequest(prompt, null, null, null, ...)).
- Readability: It makes the code self-documenting. When reading it, you immediately understand the intent of the request configuration.
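As a concrete sketch, a minimal builder along these lines might look as follows. The ChatCompletionBuilder, its defaults, and its validation rules are hypothetical illustrations of the pattern, not types from the OpenAI SDK:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request shape; property names mirror the eventual JSON payload.
public record ChatMessage(string Role, string Content);

public record ChatCompletionRequest(
    IReadOnlyList<ChatMessage> Messages, double Temperature, int MaxTokens);

public class ChatCompletionBuilder
{
    private readonly List<ChatMessage> _messages = new();
    private double _temperature = 1.0; // assumed default
    private int _maxTokens = 256;      // assumed default

    public ChatCompletionBuilder WithSystemMessage(string text)
    { _messages.Add(new ChatMessage("system", text)); return this; }

    public ChatCompletionBuilder WithUserMessage(string text)
    { _messages.Add(new ChatMessage("user", text)); return this; }

    public ChatCompletionBuilder WithTemperature(double t)
    { _temperature = t; return this; }

    public ChatCompletionBuilder WithMaxTokens(int n)
    { _maxTokens = n; return this; }

    public ChatCompletionRequest Build()
    {
        // Validation happens here, before serialization ever occurs.
        if (_messages.Count == 0)
            throw new InvalidOperationException("At least one message is required.");
        if (_temperature is < 0 or > 2)
            throw new InvalidOperationException("Temperature must be between 0 and 2.");
        return new ChatCompletionRequest(_messages, _temperature, _maxTokens);
    }
}
```

Because Build() is the single exit point, every request that reaches the serializer has already passed the same checks.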
Handling Streaming Responses (Server-Sent Events)
Unlike standard HTTP responses that return a complete payload, AI models often stream responses token-by-token to reduce perceived latency. This is typically implemented using Server-Sent Events (SSE) or chunked transfer encoding.
Theoretical Challenge:
A standard HttpClient.GetAsync() call waits until the entire response stream is downloaded before returning. For a 500-token AI response, this could mean waiting 15 seconds. The user sees nothing until the entire response is loaded.
The Solution:
We must consume the HttpResponseMessage as a Stream. Requesting the response with HttpCompletionOption.ResponseHeadersRead makes the call return as soon as the headers arrive, and the HttpMessageHandler then reads bytes from the network as they arrive. In C#, the body is exposed via HttpResponseMessage.Content.ReadAsStreamAsync().
The Pipeline:
- Request: Send headers and the prompt payload.
- Response Headers: Immediately receive HTTP 200 OK and headers (indicating streaming mode).
- Streaming: The connection stays open. The application reads from the stream continuously.
- Parsing: The stream yields data in specific formats (e.g., OpenAI uses data: {...} JSON lines). The application must parse these lines incrementally.
This requires a shift in mindset from "Request -> Response" to "Request -> Continuous Flow". The IHttpClientFactory ensures the underlying connection remains stable during this long-lived read operation.
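A minimal sketch of this continuous-flow consumption follows, assuming the OpenAI-style SSE format (lines prefixed with "data: ", terminated by a "[DONE]" sentinel). The SseReader class is illustrative, not a framework type:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Reads an SSE-style stream line by line and yields each "data:" payload
// as soon as it arrives, instead of buffering the whole response.
public static class SseReader
{
    public static async IAsyncEnumerable<string> ReadDataLinesAsync(Stream stream)
    {
        using var reader = new StreamReader(stream);
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            if (!line.StartsWith("data: ")) continue;    // skip blank/comment lines
            var payload = line["data: ".Length..];
            if (payload == "[DONE]") yield break;        // OpenAI end-of-stream sentinel
            yield return payload;                        // one JSON chunk per line
        }
    }
}

// Usage against a live endpoint (sketch; ParseDelta is a hypothetical helper
// that extracts the token text from each JSON chunk):
// var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
// await foreach (var json in SseReader.ReadDataLinesAsync(
//         await response.Content.ReadAsStreamAsync()))
//     Console.Write(ParseDelta(json));
```

The key detail is HttpCompletionOption.ResponseHeadersRead: without it, the default behavior buffers the entire body before the await completes, defeating the purpose of streaming.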
Authentication Strategies: API Keys vs. Azure AD
AI services require authentication, but the method varies by deployment.
- API Keys (OpenAI/Azure OpenAI):
  - Mechanism: A static string passed in the Authorization header (usually Bearer <key>) or, for Azure OpenAI, in a dedicated api-key header.
  - Theory: This is simple but risky. If the key leaks, anyone can use your quota.
  - Implementation: We use a DelegatingHandler to inject this header automatically for every request made by a specific named client, keeping the key out of controller logic.
- Azure AD (Entra ID):
  - Mechanism: OAuth 2.0 flows. The application requests a token from Azure AD using Client Credentials (service-to-service) or On-Behalf-Of (user-to-service) flows.
  - Theory: This is more secure and allows for auditing and RBAC (Role-Based Access Control).
  - Challenge: Tokens expire. A naive implementation might fail if a token expires mid-request.
  - Solution: We integrate IHttpClientFactory with the Azure.Identity library. A custom DelegatingHandler intercepts the request, checks whether a valid token is cached, acquires a new one if necessary, and injects it. This abstracts the complexity of token management away from the business logic.
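A minimal sketch of the API-key variant of such a handler is shown below. The api-key header name follows Azure OpenAI's convention; the ApiKeyHandler class itself is an illustration, not a library type:

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Injects the API key into every outgoing request of the client it is
// attached to, keeping the secret out of controllers and services.
public class ApiKeyHandler : DelegatingHandler
{
    private readonly string _apiKey;

    public ApiKeyHandler(string apiKey) => _apiKey = apiKey;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Azure OpenAI expects "api-key"; plain OpenAI uses "Authorization: Bearer <key>".
        request.Headers.Add("api-key", _apiKey);
        return base.SendAsync(request, cancellationToken);
    }
}

// Registration (sketch): attach the handler to a named or typed client.
// builder.Services.AddHttpClient("azure-openai")
//        .AddHttpMessageHandler(() => new ApiKeyHandler(builder.Configuration["ApiKey"]!));
```

An Azure AD variant would have the same shape, but SendAsync would ask a cached TokenCredential for a bearer token instead of adding a static header.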
Resilience with Polly: Handling Transient Faults
External AI APIs are not infallible. They suffer from:
- Rate Limits (429 Too Many Requests): You are sending requests faster than your quota allows.
- Transient Errors (5xx): Temporary server glitches.
- Timeouts: The model takes too long to generate a response.
Polly is a .NET resilience library that integrates seamlessly with IHttpClientFactory. It allows us to define policies that wrap our HTTP calls.
Key Policies for AI:
- Retry Policy: If a 429 (Rate Limit) or 503 (Service Unavailable) occurs, wait for a specific duration (often derived from the Retry-After header) and try again.
- Circuit Breaker: If a service fails repeatedly (e.g., 5 consecutive 5xx errors), "break" the circuit. Subsequent calls fail immediately without hitting the network, giving the external service time to recover.
- Timeout: AI calls can hang. A timeout policy ensures the request is canceled if no data is received within a set time (e.g., 60 seconds).
Integration:
Polly policies are added to the IHttpClientFactory pipeline. When a request is made, it passes through the Policy pipeline before hitting the network and again after receiving the response.
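The three policies above can be sketched as follows, assuming the Microsoft.Extensions.Http.Polly package; the retry counts, backoff curve, and break duration are illustrative choices, not recommendations from the Polly project:

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Extensions.Http;

// Attach resilience policies to a typed client's outgoing pipeline.
builder.Services.AddHttpClient<AiService>(client =>
    {
        client.BaseAddress = new Uri("https://api.example-ai-provider.com/");
    })
    // Retry: transient errors (5xx, 408, HttpRequestException) plus 429
    // rate limits, with exponential backoff (2s, 4s, 8s).
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .OrResult(msg => msg.StatusCode == HttpStatusCode.TooManyRequests)
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))))
    // Circuit breaker: after 5 consecutive failures, fail fast for 30 seconds.
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30)))
    // Timeout: cancel any single attempt that exceeds 60 seconds.
    .AddPolicyHandler(Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(60)));
```

Ordering matters: because policies wrap outward-in, the retry policy here wraps the circuit breaker, which wraps the per-attempt timeout.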
Architectural Visualization
The following diagram illustrates the flow of data and control when consuming an AI API using these patterns.
By combining these concepts, we establish a robust architecture:
- Decoupling: The Typed Client service (discussed in the next subsection) hides the complexity of the external API. The controller only knows about a domain interface (e.g., IChatService), not HTTP or JSON.
- Efficiency: IHttpClientFactory manages the socket lifecycle, preventing resource exhaustion under the high-latency loads typical of AI.
- Safety: The Builder pattern ensures requests are valid and type-safe before serialization.
- Security: DelegatingHandlers abstract authentication, allowing seamless switching between API keys and Azure AD without changing business logic.
- Stability: Polly policies transform transient network failures into manageable exceptions or automatic recoveries, essential for production-grade AI systems.
This theoretical foundation sets the stage for the implementation details in the following sections, where we translate these patterns into concrete C# code.
Basic Code Example
using System.Net.Http.Json;
using System.Text.Json.Serialization;

// A simple 'Hello World' example demonstrating how to configure and use
// IHttpClientFactory to call an external AI service (simulated here).
// This approach prevents socket exhaustion and allows for centralized configuration.
// Note: in a top-level Program.cs the statements must precede any type
// declarations, so the setup comes first and the models/service follow.

// 1. Program Setup (Minimal API style)
var builder = WebApplication.CreateBuilder(args);

// CRITICAL: Configure IHttpClientFactory
// We register the Typed Client 'AiService' and configure its HttpClient.
builder.Services.AddHttpClient<AiService>(client =>
{
    // Base address for the external API
    client.BaseAddress = new Uri("https://api.example-ai-provider.com/");

    // Set common headers (e.g., API Key)
    // In a real app, retrieve this from IConfiguration or Azure Key Vault.
    var apiKey = builder.Configuration["ApiKey"] ?? "sk-12345";
    client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");

    // Set timeout to prevent hanging indefinitely
    client.Timeout = TimeSpan.FromSeconds(30);
});

var app = builder.Build();

// 2. Define a simple endpoint to trigger the service
app.MapGet("/chat", async (AiService aiService, string prompt) =>
{
    var response = await aiService.GetCompletionAsync(prompt);
    return Results.Ok(new { response });
});

app.Run();

// 3. Define the request model (The "Builder" pattern concept)
public record class AiPromptRequest(
    [property: JsonPropertyName("prompt")] string Prompt,
    [property: JsonPropertyName("max_tokens")] int MaxTokens = 50
);

// 4. Define the response model
public record class AiResponse(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("choices")] List<AiChoice> Choices
);

public record class AiChoice(
    [property: JsonPropertyName("text")] string Text
);

// 5. The Typed Client Service
// This service encapsulates the logic for communicating with the AI provider.
public class AiService
{
    private readonly HttpClient _httpClient;
    private readonly ILogger<AiService> _logger;

    public AiService(HttpClient httpClient, ILogger<AiService> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    public async Task<string> GetCompletionAsync(string prompt)
    {
        try
        {
            // Construct the request payload
            var request = new AiPromptRequest(prompt);

            // PostAsJsonAsync handles serialization automatically
            var response = await _httpClient.PostAsJsonAsync("v1/completions", request);

            // Ensure success status code (throws HttpRequestException on failure)
            response.EnsureSuccessStatusCode();

            // Deserialize the response
            var aiResponse = await response.Content.ReadFromJsonAsync<AiResponse>();

            // Return the first choice's text
            return aiResponse?.Choices?.FirstOrDefault()?.Text ?? "No response generated.";
        }
        catch (HttpRequestException ex)
        {
            _logger.LogError(ex, "HTTP request failed while calling AI service.");
            throw; // Re-throw to let the caller handle the UI feedback
        }
    }
}
Line-by-Line Explanation
1. Model Definitions (The Builder Pattern Foundation)
- public record class AiPromptRequest(...): We use a record for immutability, which is ideal for data transfer objects (DTOs). The [JsonPropertyName] attributes instruct the JSON serializer (System.Text.Json) how to map our C# properties to the snake_case naming conventions often used by external APIs.
- public record class AiResponse(...): Similarly, we define the shape of the expected JSON response. This strongly typed approach prevents runtime errors caused by typos in property names (e.g., response["choies"] vs response["choices"]).
2. The Typed Client Service (AiService)
- public class AiService: Instead of injecting IHttpClientFactory directly into controllers and managing string-based keys, we use the Typed Client pattern. The DI container injects a pre-configured HttpClient instance directly into this class.
- private readonly HttpClient _httpClient;: This instance is unique to AiService but shares the underlying connection pool managed by IHttpClientFactory.
- public AiService(HttpClient httpClient, ...): Constructor injection. The HttpClient is provided by the framework.
- var request = new AiPromptRequest(prompt);: We instantiate our request model. This is the "Builder" step: constructing the payload.
- await _httpClient.PostAsJsonAsync(...): This extension method serializes the request object to JSON and sets the Content-Type header to application/json automatically.
- response.EnsureSuccessStatusCode();: A helper method that throws an HttpRequestException if the HTTP response status code is an error (4xx or 5xx). This centralizes error checking.
- await response.Content.ReadFromJsonAsync<AiResponse>();: We deserialize the JSON body directly into our strongly typed AiResponse record.
3. Dependency Injection Setup (Program.cs)
- builder.Services.AddHttpClient<AiService>(...): This is the core configuration method.
  - It registers AiService as a transient service.
  - It automatically creates an HttpClient instance for AiService.
  - It registers AiService itself as a Typed Client, meaning it can be injected into other classes.
- client.BaseAddress: Setting the base address ensures that subsequent requests in the service can use relative paths (e.g., "v1/completions").
- client.DefaultRequestHeaders.Add(...): We configure authentication headers globally. This prevents the need to add headers manually in every request method inside AiService.
- client.Timeout: We explicitly set a timeout. The default is 100 seconds, which is often too long for interactive AI applications.
4. Endpoint Execution
- app.MapGet("/chat", ...): We define a minimal API endpoint.
- async (AiService aiService, string prompt): The dependency injection container automatically resolves and injects the configured AiService instance.
- return Results.Ok(...): We wrap the AI response in a simple JSON object for the client.
Common Pitfalls
- Instantiating HttpClient Manually:
  - Mistake: Creating a new instance of HttpClient using new HttpClient() inside a service or controller.
  - Consequence: This leads to Socket Exhaustion. HttpClient implements IDisposable, but disposing it does not immediately close the underlying TCP connection. Under load, you will run out of available sockets, causing the application to hang or crash.
  - Solution: Always use IHttpClientFactory via AddHttpClient().
- Using IHttpClientFactory Incorrectly:
  - Mistake: Injecting IHttpClientFactory into a class and calling CreateClient("NamedClient") repeatedly inside a loop.
  - Consequence: While better than manual instantiation, this incurs unnecessary overhead. The factory retrieves a pre-configured handler from a cache; however, excessive calls still add minor GC pressure.
  - Solution: Prefer the Typed Client pattern (as shown in the example), where the HttpClient is injected directly into the service's constructor. The instance is created once per service scope.
- Swallowing Exceptions:
  - Mistake: Wrapping the HTTP call in a generic try-catch that simply returns null or a default value without logging.
  - Consequence: Debugging becomes impossible. You won't know if the API is down, the key is invalid, or the network failed.
  - Solution: Log the exception (using ILogger) and either re-throw it or return a result object that explicitly indicates failure (e.g., Result<T, Error>).
- Ignoring Timeouts:
  - Mistake: Relying on the default 100-second timeout for an AI API that should respond in milliseconds.
  - Consequence: Under high load or network latency, requests hang for the full 100 seconds, holding connections open and degrading throughput.
  - Solution: Set client.Timeout to a reasonable duration (e.g., 30s) and handle TaskCanceledException specifically.
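The timeout advice above can be sketched as follows; the TimeoutAware class, the endpoint path, and the null-sentinel convention are illustrative choices, not framework requirements:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Wraps a call so that a timeout (surfaced as TaskCanceledException when
// client.Timeout elapses) is distinguishable from other failures.
public static class TimeoutAware
{
    public static async Task<string?> GetCompletionOrNullAsync(HttpClient client, string prompt)
    {
        try
        {
            // "v1/echo" is a hypothetical endpoint used for illustration.
            var response = await client.GetAsync(
                $"https://api.example-ai-provider.com/v1/echo?prompt={Uri.EscapeDataString(prompt)}");
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (TaskCanceledException)
        {
            // The timeout elapsed (or the caller canceled): return a sentinel
            // so the caller can show a targeted "model took too long" message.
            return null;
        }
    }
}
```

Catching TaskCanceledException separately from HttpRequestException lets the UI distinguish "the model is slow" from "the service is down".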
Architectural Visualization
The following diagram illustrates the flow of a request through the IHttpClientFactory architecture. Note how the factory manages the handler pool, while the Typed Client provides a clean abstraction for the application logic.
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.