
Chapter 2: Controllers vs Minimal APIs - Performance Choices

Theoretical Foundations

The architectural decision between using traditional Controllers and the newer Minimal API framework in ASP.NET Core is not merely a stylistic preference; it is a foundational choice that dictates the performance profile, maintainability, and scalability of AI-driven applications. When serving Large Language Models (LLMs) or computer vision models, where inference latency is critical and request throughput is high, the overhead introduced by the web framework itself becomes a significant factor in the total response time. This section explores the theoretical underpinnings of both patterns, dissecting how they handle request processing pipelines, dependency injection (DI), and data serialization to help you make an informed choice for high-performance AI workloads.

The Request Processing Pipeline: A Tale of Two Lifecycles

At the heart of the performance discussion lies the request processing pipeline. In a traditional Controller-based architecture, the framework relies heavily on reflection and a complex middleware chain to instantiate controllers, resolve dependencies, and bind data. While powerful, this abstraction layer introduces latency.

Consider the lifecycle of a request in a Controller-based application. When an HTTP request arrives, the routing middleware identifies the endpoint and invokes the ControllerActionInvoker. This invoker must:

  1. Instantiate the Controller class (often via a factory).
  2. Resolve all constructor dependencies from the DI container.
  3. Execute the Action method, which involves binding parameters from the request (headers, query strings, body) to method arguments.
  4. Serialize the return value into the response stream.

In contrast, Minimal APIs flatten this hierarchy. At startup, each endpoint is compiled into a cached request delegate, and in .NET 8+ the Request Delegate Generator can move this work to compile time entirely. Instead of reflection-based invocation on the hot path, Minimal APIs map requests directly to endpoint handlers with minimal overhead. The DI resolution is streamlined, and parameter binding is often inferred and optimized, bypassing the heavy machinery of the action invocation context.

Analogy: The Restaurant Kitchen Imagine a high-end restaurant (Traditional Controllers) versus a fast-casual food truck (Minimal APIs).

  • The Restaurant (Controllers): You have a maĂ®tre d' (Routing Middleware), a head chef (Action Invoker), and specialized stations (Model Binders). When an order comes in, it passes through multiple hands. The head chef reads the ticket (Reflection), assigns the station (Dependency Injection), and ensures the plating meets standards (Serialization). This allows for complex, multi-course meals (Intricate Logic) but introduces "service time" overhead for every dish.
  • The Food Truck (Minimal APIs): The chef is also the cashier and the cook. They take the order, grab pre-prepped ingredients (Optimized Closures), and cook the meal immediately. There is no chain of command. The "overhead" per order is negligible, allowing for maximum throughput of simple, high-quality dishes (Focused Endpoints).
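The two shapes look like this side by side. This is a sketch, not a benchmark: the route names and the InferController class are illustrative, and both endpoints return the same trivial payload so only the dispatch path differs.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();   // required for the controller route below
var app = builder.Build();

// Minimal API style: the route maps straight to a handler delegate;
// no controller instance is created per request.
app.MapGet("/api/infer/{id}", (int id) => Results.Ok(new { id }));

app.MapControllers();                // discovers InferController via attribute scanning
app.Run();

// Controller style: a class discovered through attribute scanning;
// a new InferController instance is allocated for each request.
[ApiController]
[Route("api/controller-infer")]
public class InferController : ControllerBase
{
    [HttpGet("{id}")]
    public IActionResult Get(int id) => Ok(new { id });
}
```

Both routes answer the same GET request; the difference is everything that happens between routing and the handler body.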

Dependency Injection and the Cost of Abstraction

Dependency Injection is the backbone of modern .NET applications, enabling loose coupling and testability. However, the way DI is utilized differs significantly between the two patterns.

In Controllers, DI is typically constructor-based. Every request results in the instantiation of a new Controller instance and the resolution of its dependencies. If your AI service requires a ModelLoader, a Tokenizer, and a CacheService, the container must resolve three objects per request. While the .NET DI container is highly optimized, the act of resolving a chain of dependencies (especially if they are scoped or transient) adds CPU cycles to the request lifecycle.

Minimal APIs support both constructor-style injection (via closures) and parameter injection. However, their primary advantage lies in the ability to define endpoints as lightweight lambda expressions. These lambdas capture only the specific dependencies required for that specific endpoint. This reduces the "surface area" of injection.

Furthermore, in AI applications, we often deal with heavy singleton services—such as the model inference engine itself (e.g., an ONNX runtime session or a connector to an external LLM like OpenAI). In a Controller, accessing this singleton requires passing it through the constructor or the HttpContext.RequestServices. Minimal APIs allow for direct closure capture of these singletons, eliminating the lookup overhead within the hot path of the request.
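As a sketch of that closure capture (the InferenceEngine type and its Run method are hypothetical stand-ins for a real inference service), the singleton can be resolved once at startup so the per-request hot path never touches the DI container:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<InferenceEngine>();
var app = builder.Build();

// Resolve the heavy singleton ONCE at startup and capture it in the closure;
// each request uses the captured reference directly, with no container lookup.
var engine = app.Services.GetRequiredService<InferenceEngine>();
app.MapPost("/infer", (PromptRequest req) => Results.Ok(engine.Run(req.Prompt)));

app.Run();

public record PromptRequest(string Prompt);

// Stand-in for a heavy inference service such as an ONNX session or an LLM client.
public class InferenceEngine
{
    public string Run(string prompt) => $"echo: {prompt}";
}
```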

Analogy: The Toolbox

  • Controllers: Imagine a mechanic (the Controller) who has a massive toolbox (the DI Container). To tighten a bolt, they must walk to the toolbox, open it, find the wrench (resolve the service), use it, and put it back. Every bolt requires this walk.
  • Minimal APIs: Imagine the mechanic has the specific wrench tucked into their belt (the closure). They reach for it instantly. The tool is right there, attached to the execution context, eliminating the travel time to the toolbox.

Serialization and Payload Efficiency

When serving AI models, the payload is often large. A request to a chat endpoint might include a long conversation history (arrays of messages), and the response is a stream of tokens or a structured JSON object containing the generated text and usage statistics.

Traditional Controllers often rely on System.Text.Json serialization, but the process is wrapped in ObjectResult execution. The framework must inspect the object type, determine the appropriate formatter, and write to the response stream. While efficient, there is a layer of indirection.

Minimal APIs, by default, use Results.Json() or implicit serialization. More importantly, they encourage the use of IResult types that are optimized for the response lifecycle. For high-throughput AI inference, where we might want to stream tokens (Server-Sent Events or NDJSON), Minimal APIs offer a more direct way to write to the response stream using Results.Stream() or Results.Text() without the overhead of constructing complex action result objects.
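A minimal sketch of token streaming over Server-Sent Events follows; GenerateTokensAsync is a hypothetical stand-in for an LLM token source, and the handler writes to the response body directly rather than building an action result object:

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

app.MapGet("/chat/stream", async (HttpContext ctx, CancellationToken ct) =>
{
    ctx.Response.ContentType = "text/event-stream";

    // Write each token the moment the model emits it; the full completion
    // is never buffered in memory.
    await foreach (var token in GenerateTokensAsync(ct))
    {
        await ctx.Response.WriteAsync($"data: {token}\n\n", ct);
        await ctx.Response.Body.FlushAsync(ct); // push the token to the client now
    }
});

app.Run();

// Hypothetical token source simulating an LLM emitting tokens one by one.
static async IAsyncEnumerable<string> GenerateTokensAsync(
    [EnumeratorCancellation] CancellationToken ct)
{
    foreach (var token in new[] { "Hello", " world", "!" })
    {
        await Task.Delay(50, ct); // simulated per-token inference latency
        yield return token;
    }
}
```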

Visualizing the Pipeline Overhead

The following diagram illustrates the difference in steps required to process a request for an AI inference endpoint under both architectures.

The diagram contrasts a traditional MVC pipeline—laden with intermediate action result objects and controller overhead—against a streamlined Minimal API pipeline that processes AI inference requests with fewer steps and no unnecessary object construction.

Practical Implications for AI Scenarios

The choice between these patterns has profound implications for specific AI scenarios:

  1. High-Frequency Inference (e.g., Sentiment Analysis):

    • Context: Models like DistilBERT are lightweight and require sub-50ms latency.
    • Implication: The overhead of Controller instantiation and reflection can actually exceed the model inference time. Minimal APIs reduce the framework overhead to microseconds, ensuring the model is the bottleneck, not the web server.
  2. Streaming Chat Endpoints:

    • Context: LLMs generate text token-by-token. The API must stream this back to the client.
    • Implication: Controllers require ActionResult<Stream> or specific FileStreamResult handling. Minimal APIs provide Results.Stream() which allows for a direct HttpContext.Response.Body write loop. This direct access reduces buffering and memory allocation, crucial for maintaining a smooth streaming experience.
  3. Model Management (Admin Endpoints):

    • Context: Endpoints to load/unload models (e.g., POST /models/load).
    • Implication: These endpoints are low-traffic but high-complexity. They require robust validation and error handling. Here, the structure of Controllers (Filters, Action Constraints) might be preferred for maintainability, even if Minimal APIs offer slightly better raw performance.

The "What If": Scalability and Resource Contention

In a cloud environment where AI inference is CPU or GPU-bound, every cycle saved on request processing translates to higher throughput.

What if we choose Controllers for a high-throughput scenario? The application will likely function correctly, but the "Time to First Byte" (TTFB) will be higher. Under load, the thread pool may be occupied by serialization and DI resolution tasks rather than processing inference requests. The memory pressure from creating Controller instances and ActionContexts for every request adds to the Garbage Collection (GC) frequency, potentially causing "stop-the-world" pauses that disrupt real-time AI interactions.

What if we choose Minimal APIs for a complex enterprise system? While we gain performance, we might lose some built-in architectural scaffolding. Minimal APIs encourage a functional programming style within an object-oriented ecosystem. Without the discipline of organizing endpoints into groups (using MapGroup route groups or extension methods on classes), the codebase can become a "spaghetti" of lambda expressions, making it harder to enforce cross-cutting concerns like authorization or logging uniformly. However, modern .NET allows attributes and endpoint filters to be applied directly to Minimal API handlers, mitigating this risk.
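Route groups (MapGroup, available since .NET 7) supply that organizing discipline. A sketch, where the policy name, role, and handler signatures are illustrative:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddAuthorization(options =>
    options.AddPolicy("AdminPolicy", policy => policy.RequireRole("admin")));
var app = builder.Build();

// All model-management endpoints share one URL prefix and one authorization
// policy, applied uniformly at the group level instead of per endpoint.
var models = app.MapGroup("/models").RequireAuthorization("AdminPolicy");

models.MapPost("/load",   (string name) => Results.Ok($"loading {name}"));
models.MapPost("/unload", (string name) => Results.Ok($"unloading {name}"));

app.Run();
```

Filters, rate limiting, and OpenAPI metadata can be attached at the group level in the same way, which keeps cross-cutting concerns in one place.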

Decision Framework: The Performance/Complexity Matrix

To visualize the decision-making process, we can map the two patterns based on the complexity of the AI workload and the required throughput.

This diagram maps AI workload patterns onto a performance-complexity matrix, visually guiding the selection between Minimal API handlers for simple tasks and more complex architectures for high-throughput requirements.

Deep Dive: The Source Generator Advantage

A critical theoretical differentiator, often overlooked, is the role of compile-time code generation in Minimal APIs. In traditional Controllers, the framework uses runtime type inspection and reflection-built invokers (cached as compiled expression trees) to bind routes to action methods. This work happens at runtime, concentrated at startup and on the first request.

Minimal APIs compile each MapGet or MapPost handler into a cached request delegate at startup. In .NET 8+, the Request Delegate Generator goes further: at compile time, the C# compiler analyzes the MapGet or MapPost calls and generates code that directly invokes the endpoint handler and performs parameter binding. This code is baked into the assembly (it is enabled automatically when publishing for Native AOT, or explicitly via the EnableRequestDelegateGenerator MSBuild property).

Example of Conceptual Generated Code (Mental Model): Instead of:

// Runtime Reflection (Conceptual)
var method = controllerType.GetMethod("Infer");
var parameters = BindParameters(context);
method.Invoke(controllerInstance, parameters);

The Source Generator produces:

// Compile-time Generated (Conceptual)
public static Task InferenceEndpoint(HttpContext context)
{
    // Direct access to services
    var model = context.RequestServices.GetRequiredService<InferenceModel>();

    // Direct parameter binding without reflection
    var input = BindInputDirectly(context); 

    // Direct execution
    return model.PredictAsync(input, context.Response.Body);
}

This shift away from runtime reflection, toward cached delegates and compile-time code generation, is the primary reason Minimal APIs can outperform Controllers in raw request processing speed.

Conclusion

The theoretical foundation of choosing between Controllers and Minimal APIs for AI web APIs rests on the balance between structure and speed. Controllers provide a rigid, class-based structure that is beneficial for complex, stateful logic and large teams requiring strict separation of concerns. However, this structure introduces overhead via reflection, DI resolution, and object instantiation.

Minimal APIs strip away these layers, offering a direct line from the HTTP request to the AI inference logic. For AI workloads—where the goal is to minimize latency and maximize the number of inferences per second—the reduction in framework overhead is not just a micro-optimization; it is a fundamental architectural advantage. By understanding these theoretical underpinnings, developers can architect systems that are not only performant but also aligned with the specific demands of modern AI applications.

Basic Code Example

Scenario: You are building an internal tool for a data science team. They need a lightweight, high-performance endpoint to quickly get predictions from a pre-trained sentiment analysis model. The model is loaded in memory, and the API must handle many concurrent requests with minimal overhead. This example compares a minimal API implementation against a traditional controller approach for this specific task.

Minimal API Implementation

This approach uses the new Minimal API framework in ASP.NET Core, which is designed for maximum performance and minimal ceremony.

using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;

// --- 3. Application Entry Point & Configuration ---
// Note: C# requires top-level statements to precede type declarations,
// so the DTOs and model service (sections 1 and 2) are defined below app.Run().
var builder = WebApplication.CreateBuilder(args);

// Register the model as a Singleton.
// CRITICAL: The model is heavy; we load it once and share it across all requests.
builder.Services.AddSingleton<ISentimentModel, MockSentimentModel>();

// Configure JSON serialization options for consistent casing (camelCase).
builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.PropertyNamingPolicy = JsonNamingPolicy.CamelCase;
});

var app = builder.Build();

// --- 4. Define the Minimal API Endpoint ---
// This replaces the entire Controller class structure.
app.MapPost("/predict", async (HttpContext context, ISentimentModel model) =>
{
    // Read the request body asynchronously
    var request = await context.Request.ReadFromJsonAsync<PredictionRequest>();

    // Explicit validation: return a single 400 result with a JSON error body.
    // (Writing to the response manually AND returning another result would
    // attempt to send two responses.)
    if (request is null || string.IsNullOrWhiteSpace(request.Text))
    {
        return Results.BadRequest(new { error = "Text is required." });
    }

    // Execute the AI model prediction
    var result = model.Predict(request.Text);

    // Return the result with HTTP 200 OK
    return Results.Ok(result);
});

// --- 5. Run the Application ---
app.Run();

// --- 1. Define the Data Models ---
// Using records for immutable data transfer objects (DTOs).
public record PredictionRequest(string Text);
public record PredictionResult(string Sentiment, double Confidence);

// --- 2. Define the Model Service ---
// This simulates a loaded AI model. In a real app, this would be a
// complex class like an ONNX runtime session or a TensorFlow model wrapper.
public interface ISentimentModel
{
    PredictionResult Predict(string text);
}

public class MockSentimentModel : ISentimentModel
{
    // A simple dictionary to mock model inference logic.
    private static readonly Dictionary<string, PredictionResult> _knowledgeBase = new()
    {
        ["I love this product"] = new PredictionResult("Positive", 0.98),
        ["This is terrible"] = new PredictionResult("Negative", 0.95),
        ["It's okay"] = new PredictionResult("Neutral", 0.60)
    };

    public PredictionResult Predict(string text)
    {
        // Simulate computational delay (e.g., matrix multiplication)
        Thread.Sleep(10);

        if (_knowledgeBase.TryGetValue(text, out var result))
        {
            return result;
        }

        // Default fallback for unknown text
        return new PredictionResult("Unknown", 0.50);
    }
}

Detailed Line-by-Line Explanation

  1. Data Models (PredictionRequest, PredictionResult):

    • We define record types. In modern C#, records are preferred for DTOs because they are immutable by default, have value-based equality, and reduce boilerplate code for properties. This prevents accidental modification of request data after it is received.
  2. Service Interface (ISentimentModel):

    • We define an interface to abstract the AI model logic. This is crucial for dependency injection (DI) and unit testing. It adheres to the Dependency Inversion Principle.
  3. Mock Implementation (MockSentimentModel):

    • This class simulates a real AI model. In a production environment, this would wrap a library like Microsoft.ML or TorchSharp.
    • Thread.Sleep(10): This simulates the latency inherent in neural network inference. This latency is the primary reason we need efficient request handling—to ensure the thread isn't blocked unnecessarily.
    • Singleton Lifetime: We will register this as a Singleton later, meaning one instance handles all requests. This is standard for stateless, heavy services like AI models.
  4. Builder Configuration (WebApplication.CreateBuilder):

    • This initializes the ASP.NET Core host. It sets up default configuration sources (appsettings.json, environment variables) and logging.
  5. Dependency Injection Setup (builder.Services.AddSingleton):

    • AddSingleton<ISentimentModel, MockSentimentModel>() tells the DI container to create exactly one instance of our model service and reuse it for the application's lifetime.
    • Why Singleton? Loading an AI model can take seconds and consume gigabytes of RAM. You cannot afford to reload it for every HTTP request. This is a critical performance optimization for AI APIs.
  6. JSON Serialization Configuration:

    • We configure HttpJsonOptions to use CamelCase. This ensures that PredictionResult properties like Sentiment become sentiment in the JSON response, adhering to standard web API conventions.
  7. Endpoint Definition (app.MapPost):

    • MapPost("/predict", ...): This is the core of the Minimal API. It maps an HTTP POST request to the /predict route.
    • Lambda Expression: Instead of a separate class (Controller), we use a lambda function.
    • Parameter Injection: The framework automatically injects HttpContext and ISentimentModel. The DI container resolves ISentimentModel efficiently because it's a singleton.
    • ReadFromJsonAsync: This is a high-performance extension method that deserializes the request body stream directly into our PredictionRequest object. It avoids the overhead of traditional model binding used in Controllers.
  8. Validation & Error Handling:

    • We manually check whether the request body deserialized at all and whether the text is null or whitespace.
    • We return Results.BadRequest(...) with an anonymous error object, which sets status 400 and serializes the JSON error body in a single step. Minimal APIs give you granular control over the response, but an endpoint should produce exactly one IResult so the response is written only once.
  9. Execution & Response:

    • model.Predict(request.Text) performs the simulated inference.
    • Results.Ok(result) serializes the PredictionResult object to JSON and sends it back with a 200 status code.

Traditional Controller Implementation (For Comparison)

To understand the performance choices, we must look at the traditional alternative.

using Microsoft.AspNetCore.Mvc;

// --- Controller Definition ---
[ApiController]
[Route("[controller]")]
public class PredictController : ControllerBase
{
    private readonly ISentimentModel _model;

    // Constructor Injection
    public PredictController(ISentimentModel model)
    {
        _model = model;
    }

    [HttpPost]
    public ActionResult<PredictionResult> Predict([FromBody] PredictionRequest request)
    {
        if (request?.Text is null || string.IsNullOrWhiteSpace(request.Text))
        {
            return BadRequest(new { error = "Text is required." });
        }

        var result = _model.Predict(request.Text);
        return Ok(result);
    }
}

Detailed Line-by-Line Explanation

  1. Class Inheritance (ControllerBase):

    • The class inherits from ControllerBase. This brings in properties like HttpContext, Request, Response, and helper methods like Ok() and BadRequest(). This adds a small amount of memory overhead per controller instance compared to the static Results class in Minimal APIs.
  2. Attributes ([ApiController], [Route]):

    • These attributes enable API-specific behaviors (automatic model validation, route matching). While convenient, they rely on reflection and attribute scanning at startup, which can slightly increase cold-start times compared to the explicit code-based routing of Minimal APIs.
  3. Constructor Injection:

    • The dependency ISentimentModel is injected via the constructor. The DI container creates a new instance of PredictController for every request (by default). While the ISentimentModel is shared (Singleton), the controller wrapper itself is instantiated per request (Transient), adding allocation overhead.
  4. Action Method (Predict):

    • [FromBody] tells the model binder to deserialize the JSON body. This uses the System.Text.Json input formatter.
    • The method returns ActionResult<PredictionResult>, a wrapper type that allows returning either a successful result or an error.

Common Pitfalls

  1. Registering the AI Model as Transient or Scoped:

    • Mistake: Using builder.Services.AddScoped<ISentimentModel, MockSentimentModel>().
    • Consequence: The AI model (and its underlying memory, such as ONNX session weights) would be loaded from disk and initialized for every single HTTP request. This would cause massive memory spikes, high CPU usage, and likely crash the server under load. Always use Singleton for heavy, stateless services like AI models.
  2. Synchronous I/O in Minimal APIs:

    • Mistake: Using context.Request.Body.Read(...) synchronously instead of ReadFromJsonAsync.
    • Consequence: Blocking the thread while waiting for I/O reduces the server's ability to handle concurrent requests. ASP.NET Core relies on async/await to free up threads to serve other requests. Always use asynchronous methods (Async suffix) for network and file I/O.
  3. Over-Validation in Controllers vs Minimal APIs:

    • Mistake: Relying solely on built-in validation attributes (like [Required]) in Controllers without manual checks.
    • Consequence: While attributes are clean, they incur reflection overhead during startup and request processing. In Minimal APIs, manual validation (as shown in the example) is often faster and more explicit, though it requires more boilerplate code.
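To make pitfall 2 concrete, here is a sketch contrasting the blocking anti-pattern with the asynchronous read; the route names are illustrative, and PredictionRequest matches the DTO shape used earlier in the chapter:

```csharp
using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

app.MapPost("/predict-blocking", (HttpContext ctx) =>
{
    // ANTI-PATTERN: a synchronous body read blocks a thread-pool thread,
    // and Kestrel rejects it by default (InvalidOperationException) unless
    // AllowSynchronousIO is explicitly enabled.
    using var reader = new StreamReader(ctx.Request.Body);
    var json = reader.ReadToEnd(); // blocking I/O
    return Results.Text(json);
});

app.MapPost("/predict-async", async (HttpContext ctx) =>
{
    // Preferred: the thread returns to the pool while the read is in flight,
    // freeing it to serve other concurrent inference requests.
    var request = await ctx.Request.ReadFromJsonAsync<PredictionRequest>();
    return request is null ? Results.BadRequest() : Results.Ok(request);
});

app.Run();

// Same DTO shape as the chapter's example.
public record PredictionRequest(string Text);
```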

Visualizing the Request Flow

The following diagram illustrates the difference in request processing complexity between the two approaches.

This diagram contrasts the direct Minimal API flow, where the mapped handler executes immediately, with the Controller flow, which inserts controller instantiation and action invocation between routing and the same prediction logic.

Analysis of the Flow:

  • Minimal API: The path is direct. The MapPost handler is invoked immediately. There is no intermediate controller class instantiation. This reduces memory pressure (fewer object allocations) and CPU cycles spent on reflection/attribute scanning.
  • Controller API: The flow involves an extra step of instantiating the controller class. While modern .NET is highly optimized, this allocation still occurs per request. For high-throughput AI serving (e.g., 10,000+ requests per second), eliminating this per-request allocation can significantly improve throughput and reduce latency.

The chapter continues with advanced code examples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.