
Chapter 2: Configuring the Kernel - Azure OpenAI vs Ollama

Theoretical Foundations

The theoretical foundation of configuring the Semantic Kernel hinges on the principle of Abstraction. In AI engineering, we rarely interact with a Large Language Model (LLM) directly via raw HTTP requests; instead, we interact with an abstraction layer that standardizes the chaos of disparate AI providers into a predictable, programmable interface. This subsection establishes the architectural philosophy required to decouple your application’s core logic from the specific AI service provider, enabling seamless transitions between high-scale cloud services like Azure OpenAI and local, privacy-centric models like Ollama.

The Abstraction of the AI Service

At its core, the Semantic Kernel is an orchestration engine. It does not inherently "know" how to generate text; it delegates this responsibility to a service connector. In C#, this is modeled using interfaces. The most critical interface in this context is IChatCompletionService.

Imagine a translator in a high-stakes diplomatic negotiation. The diplomat (your application logic) delivers a speech. The translator (the AI service) converts that speech into a language the listener (the LLM) understands. The diplomat does not care whether the translator is a human, a sophisticated piece of software, or a dictionary; the diplomat only cares that the message is delivered and understood. The interface IChatCompletionService is the contract between the diplomat and the translator. It guarantees that GetChatMessageContentsAsync will return a list of chat messages, regardless of whether the underlying implementation is calling an Azure endpoint or a local Ollama process.
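To make the contract concrete, here is a minimal sketch (assuming the `Microsoft.SemanticKernel` package; the `Diplomat` class and `DeliverSpeechAsync` method are illustrative names, not part of the library) of application logic that depends only on `IChatCompletionService`; the diplomat never learns which translator was hired:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class Diplomat
{
    // Works identically whether the kernel was configured for Azure OpenAI or Ollama.
    public static async Task<string?> DeliverSpeechAsync(Kernel kernel, string speech)
    {
        // Resolve whichever implementation was registered at configuration time.
        var chat = kernel.GetRequiredService<IChatCompletionService>();

        var history = new ChatHistory();
        history.AddUserMessage(speech);

        // The contract guarantees a list of chat messages, regardless of provider.
        var replies = await chat.GetChatMessageContentsAsync(history);
        return replies.FirstOrDefault()?.Content;
    }
}
```

Swapping the registered connector changes nothing in this method; only the kernel configuration decides which service answers.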

The Real-World Analogy: The Universal Remote Consider a universal remote control. The buttons (Power, Volume Up, Input Select) represent the methods defined in the interface. The remote does not care if it is pointing at a Sony TV, a Samsung TV, or a projector. The "Sony TV" implementation knows how to convert the "Power" button press into an infrared signal specific to Sony, while the "Samsung" implementation converts the same button press into a different signal. By configuring the Semantic Kernel, you are essentially programming the universal remote to point at a specific device. Swapping from Azure OpenAI to Ollama is as simple as changing the "battery pack" (the service registration) without changing the buttons you press.

The Kernel as a Service Container

The Kernel instance in Semantic Kernel acts as a Dependency Injection (DI) container specifically tailored for AI orchestration. This builds directly upon the concept of the Kernel Instance introduced in Book 1, where we established the kernel as the central nervous system of our AI application.

In traditional software engineering, the DI container manages the lifecycle of objects. In AI Engineering, the kernel manages the lifecycle and configuration of intelligence sources. When we register a service, we are not merely storing an API key; we are defining a strategy for how the kernel resolves requests for intelligence.

1. Azure OpenAI (The Distributed Cloud): This represents a stateless, high-availability service. The kernel configuration here involves endpoints, API keys, and versioning. It relies on the network. It is the "Serverless" paradigm applied to intelligence.
2. Ollama (The Localized Edge): This represents a stateful, resource-constrained environment. The kernel configuration here involves local ports, model file names, and hardware acceleration (GPU/CPU). It relies on local resources. It is the "Embedded" paradigm applied to intelligence.
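Both strategies can even coexist in a single kernel. The sketch below (endpoints, model names, and the `"cloud"`/`"edge"` service IDs are illustrative placeholders; exact connector signatures may vary by package version) registers each provider under a distinct service ID so callers can request a specific source of intelligence:

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var builder = Kernel.CreateBuilder();

// The "Distributed Cloud" strategy: endpoint, API key, deployment name.
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-mini",
    endpoint: "https://your-resource.openai.azure.com/",
    apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY") ?? "",
    serviceId: "cloud");

// The "Localized Edge" strategy: local port and model name.
builder.AddOllamaChatCompletion(
    modelId: "llama3.2",
    endpoint: new Uri("http://localhost:11434"),
    serviceId: "edge");

var kernel = builder.Build();

// Requests for intelligence can now name the strategy they want.
var cloud = kernel.GetRequiredService<IChatCompletionService>("cloud");
var edge = kernel.GetRequiredService<IChatCompletionService>("edge");
```

This is the "strategy for resolving intelligence" in code form: the same interface, two registrations, and the service ID selects between them at the call site.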

The Role of Modern C# Features in Configuration

To manage this complexity efficiently, we utilize modern C# features, specifically Records and Configuration Binding.

1. Configuration Records (Immutability)

In older C# versions, configuration was often stored in mutable classes or dictionaries, leading to runtime errors if a property was overwritten. In modern AI applications, we treat configuration as immutable data. Once the kernel is built, its configuration should be frozen to ensure deterministic behavior.

using System.ComponentModel;

// Using records for immutable configuration data
[Description("Base class for service configuration")]
public abstract record ServiceConfig;

public record AzureOpenAIConfig(string DeploymentName, string Endpoint, string ApiKey) : ServiceConfig;
public record OllamaConfig(string ModelName, string Endpoint) : ServiceConfig;

Why this matters: If you pass a mutable configuration object to the kernel, and another thread modifies the API key or endpoint while the kernel is processing a request, you introduce race conditions. By using record types, we enforce immutability, ensuring that the "Translator" does not change dialect halfway through the conversation.
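A short, self-contained illustration of this guarantee, mirroring the record types above (`ImmutabilityDemo` is an illustrative name):

```csharp
using System;

public abstract record ServiceConfig;
public record OllamaConfig(string ModelName, string Endpoint) : ServiceConfig;

public static class ImmutabilityDemo
{
    public static void Run()
    {
        var config = new OllamaConfig("llama3.2", "http://localhost:11434");

        // config.Endpoint = "http://other:11434"; // Compile error: init-only property.

        // The only way to "change" a record is to create a new value, leaving the
        // original (and any kernel holding it) untouched: no mid-request surprises.
        var copy = config with { Endpoint = "http://192.168.1.10:11434" };

        Console.WriteLine(config.Endpoint); // http://localhost:11434
        Console.WriteLine(copy.Endpoint);   // http://192.168.1.10:11434
    }
}
```

The compiler, not a team convention, enforces that the "Translator" keeps its dialect for the lifetime of the request.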

2. The Adapter Pattern via Dependency Injection

The KernelBuilder uses the Adapter Pattern to wrap external services. When you call .AddAzureOpenAIChatCompletion(...), you are registering an adapter that translates the Semantic Kernel's internal prompt handling logic into the specific API contract of Azure OpenAI.

The "What If" Scenario: What if Azure OpenAI introduces a new v2024-08-01-preview API that changes the JSON structure of the response? If you had hardcoded HTTP calls throughout your application, you would have to refactor every call site. With the Kernel configuration approach, you only update the Adapter (the service connector). The rest of your application, which calls IChatCompletionService.GetChatMessageContentsAsync, remains untouched.

Visualizing the Configuration Flow

The following diagram illustrates the flow of a request through the abstraction layers. Note how the Kernel acts as a firewall between the application logic and the specific provider implementation.

A request flows from the Application Logic through the Kernel—acting as a firewall—before being routed to the specific Provider implementation.

Deep Dive: Token Management and Latency Implications

Configuration is not just about connectivity; it is about performance characteristics. When we configure the kernel, we implicitly accept the latency and token constraints of the underlying model.

Token Usage: In Azure OpenAI, token usage is a billing metric. The kernel configuration must be precise to avoid paying for bloated or redundant prompt tokens. By using the PromptTemplateEngine (discussed in Book 3), we can optimize prompts before they reach the model. However, the choice of model (e.g., gpt-4 vs gpt-3.5-turbo) dictates the token limit per request. The kernel must be aware of these limits to truncate or summarize context effectively.
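One concrete lever for respecting these limits is the execution-settings class of the OpenAI connector. The values below are purely illustrative:

```csharp
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Illustrative values: cap the completion so a request cannot blow past the
// model's context window once prompt, history, and function descriptions are added.
var settings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 256,     // hard cap on generated tokens for this request
    Temperature = 0.2    // lower randomness; useful for deterministic summarization
};
```

Passing such settings with each invocation keeps token budgets a per-request decision rather than a global accident.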

Latency: When configuring the kernel for Ollama, we often enable streaming (GetStreamingChatMessageContentsAsync) to give the user a perceived faster response, since the local model generates tokens sequentially. For Azure, the same streaming approach reduces "Time to First Token" (TTFT), but the network overhead remains a constant factor.
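The streaming path mentioned above can be sketched with the kernel's prompt-streaming API (a minimal sketch; `StreamingDemo` is an illustrative name):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public static class StreamingDemo
{
    public static async Task StreamAsync(Kernel kernel, string prompt)
    {
        // Tokens are printed as the model emits them, instead of after full generation,
        // improving Time to First Token for both local and cloud providers.
        await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt))
        {
            Console.Write(chunk);
        }
        Console.WriteLine();
    }
}
```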

The "Why" of Configuration Strategy

1. Testing and CI/CD: By abstracting the configuration, we can inject a "Mock" service connector during unit testing. This allows us to test the logic of our AI agents without actually calling an expensive LLM or requiring a local Ollama instance.
2. Fallback Strategies: Advanced configurations allow for "Circuit Breaker" patterns. If the Azure OpenAI service is down (or throttled), the kernel configuration can be set up to automatically fail over to a backup model (perhaps a smaller, locally hosted model via Ollama) so the application remains functional.
3. Context Awareness: The configuration defines the "Persona" of the AI. By binding the kernel to specific execution settings (Temperature, TopP, FrequencyPenalty), we tune the probabilistic nature of the model. Azure and Ollama may interpret these parameters slightly differently; the abstraction layer normalizes them.
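The "Mock" connector idea can be sketched as a fake `IChatCompletionService` that returns a canned reply, letting agent logic run in unit tests without any LLM or Ollama instance (a sketch against the interface as commonly published; member signatures may differ slightly between package versions):

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public sealed class MockChatCompletionService : IChatCompletionService
{
    public IReadOnlyDictionary<string, object?> Attributes { get; } =
        new Dictionary<string, object?>();

    public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Deterministic, free, offline: ideal for CI/CD pipelines.
        IReadOnlyList<ChatMessageContent> reply =
            new[] { new ChatMessageContent(AuthorRole.Assistant, "canned reply") };
        return Task.FromResult(reply);
    }

    public async IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        yield return new StreamingChatMessageContent(AuthorRole.Assistant, "canned reply");
        await Task.CompletedTask;
    }
}

// Registration during tests (a sketch): the kernel resolves the mock like any provider.
// var builder = Kernel.CreateBuilder();
// builder.Services.AddSingleton<IChatCompletionService>(new MockChatCompletionService());
// var kernel = builder.Build();
```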

Summary

The theoretical foundation of this chapter is that intelligence in software is a pluggable resource. By leveraging C# interfaces and dependency injection, we treat the AI model not as a monolithic entity, but as a swappable component. This allows us to architect systems that are resilient to provider outages, cost-effective in development, and scalable in production.

Whether the kernel is pointing to a cloud endpoint or a local process, the application's logic remains agnostic, focusing purely on the orchestration of thought rather than the mechanics of transmission.

Basic Code Example

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Connectors.Ollama;
using System.ComponentModel;

// --- KERNEL FACTORY PATTERN ---
// This class encapsulates the logic for creating a Kernel instance configured for a specific provider.
// It demonstrates the abstraction layer: the calling code doesn't need to know which provider is used.
public static class KernelFactory
{
    // --- CONFIGURATION ---
    // In a real application, these would come from secure configuration (e.g., Azure Key Vault, appsettings.json).
    // The constants live inside the class because C# does not allow const declarations
    // at file scope alongside type declarations.
    private const string AZURE_OPENAI_DEPLOYMENT_NAME = "gpt-4o-mini";
    private const string AZURE_OPENAI_ENDPOINT = "https://your-resource.openai.azure.com/";
    private const string AZURE_OPENAI_API_KEY = "YOUR_AZURE_API_KEY";

    private const string OLLAMA_MODEL_NAME = "llama3.2";
    private const string OLLAMA_ENDPOINT = "http://localhost:11434";
    public static Kernel CreateAzureOpenAIService()
    {
        // 1. Instantiate the Kernel builder.
        var builder = Kernel.CreateBuilder();

        // 2. Add the Azure OpenAI Chat Completion service.
        // This registers the service with the DI container within the kernel.
        builder.AddAzureOpenAIChatCompletion(
            deploymentName: AZURE_OPENAI_DEPLOYMENT_NAME,
            endpoint: AZURE_OPENAI_ENDPOINT,
            apiKey: AZURE_OPENAI_API_KEY);

        // 3. Build the Kernel.
        return builder.Build();
    }

    public static Kernel CreateOllamaService()
    {
        // 1. Instantiate the Kernel builder.
        var builder = Kernel.CreateBuilder();

        // 2. Add the Ollama Chat Completion service.
        // Note: Ollama typically runs locally, so no API key is required by default.
        builder.AddOllamaChatCompletion(
            modelId: OLLAMA_MODEL_NAME,
            endpoint: new Uri(OLLAMA_ENDPOINT));

        // 3. Build the Kernel.
        return builder.Build();
    }
}

// --- PLUG-IN DEFINITION ---
// A simple plugin to demonstrate kernel execution.
public class TimePlugin
{
    [KernelFunction, Description("Retrieves the current local time.")]
    public string GetCurrentTime() => DateTime.Now.ToString("T");
}

// --- MAIN EXECUTION LOGIC ---
// This simulates an application that needs to perform an AI task.
// It abstracts the provider choice, allowing us to swap Azure OpenAI for Ollama with minimal code change.
public class Program
{
    public static async Task Main(string[] args)
    {
        Console.WriteLine("Select AI Provider:\n1. Azure OpenAI\n2. Ollama");
        var choice = Console.ReadLine();

        // Determine which Kernel to create based on user input.
        // In a production app, this decision might be driven by configuration flags or runtime conditions.
        Kernel kernel = choice?.Trim() == "2" 
            ? KernelFactory.CreateOllamaService() 
            : KernelFactory.CreateAzureOpenAIService();

        // Import the plugin into the kernel.
        kernel.ImportPluginFromObject(new TimePlugin(), "time");

        // Define the prompt. 
        // We use a simple prompt that asks the AI to utilize the 'time' plugin.
        string prompt = "What is the current time? Please use the time plugin.";

        Console.WriteLine($"\n--- Executing Prompt: '{prompt}' ---");

        // Create execution settings.
        // We explicitly request the AI to call a function if necessary.
        // Note: these are OpenAI-specific settings; the Ollama connector may ignore them
        // or lack automatic tool calling, depending on connector and model version.
        var executionSettings = new OpenAIPromptExecutionSettings
        {
            ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
        };

        try
        {
            // Invoke the kernel. This is the core interaction point.
            // The kernel routes the prompt to the configured AI service, 
            // processes the response (including function calls), and returns the result.
            var result = await kernel.InvokePromptAsync(prompt, new KernelArguments(executionSettings));

            Console.WriteLine($"\n--- Result ---");
            Console.WriteLine(result);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"\n--- Error ---");
            Console.WriteLine($"An error occurred: {ex.Message}");
            // In a real app, log the full exception stack trace.
        }
    }
}

Detailed Line-by-Line Explanation

1. Imports and Configuration

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Connectors.Ollama;
using System.ComponentModel;

// --- CONFIGURATION ---
const string AZURE_OPENAI_DEPLOYMENT_NAME = "gpt-4o-mini";
const string AZURE_OPENAI_ENDPOINT = "https://your-resource.openai.azure.com/";
const string AZURE_OPENAI_API_KEY = "YOUR_AZURE_API_KEY";

const string OLLAMA_MODEL_NAME = "llama3.2";
const string OLLAMA_ENDPOINT = "http://localhost:11434";
*   **Imports**: We import the core `Microsoft.SemanticKernel` namespace, along with specific connector namespaces for both Azure OpenAI (`Connectors.OpenAI`) and Ollama (`Connectors.Ollama`). `System.ComponentModel` is used for decorating plugin methods with descriptions.
*   **Configuration Constants**: We define constants for the endpoints and keys. In a production environment, these should never be hardcoded. They should be injected via dependency injection or retrieved from a secure secret store like Azure Key Vault. This example hardcodes them for simplicity and self-containment.

#### 2. The Kernel Factory Pattern
```csharp
public static class KernelFactory
{
    public static Kernel CreateAzureOpenAIService()
    {
        var builder = Kernel.CreateBuilder();
        builder.AddAzureOpenAIChatCompletion(
            deploymentName: AZURE_OPENAI_DEPLOYMENT_NAME,
            endpoint: AZURE_OPENAI_ENDPOINT,
            apiKey: AZURE_OPENAI_API_KEY);
        return builder.Build();
    }

    public static Kernel CreateOllamaService()
    {
        var builder = Kernel.CreateBuilder();
        builder.AddOllamaChatCompletion(
            modelId: OLLAMA_MODEL_NAME,
            endpoint: new Uri(OLLAMA_ENDPOINT));
        return builder.Build();
    }
}
```

*   **Purpose**: This class encapsulates the complexity of initializing the Kernel. It acts as a Factory, a design pattern that creates objects without exposing the instantiation logic to the client.
*   **CreateAzureOpenAIService**:
    1.  `Kernel.CreateBuilder()`: Initializes a new instance of the `KernelBuilder`, which is the standard way to configure a Kernel in Semantic Kernel.
    2.  `builder.AddAzureOpenAIChatCompletion(...)`: This is the specific configuration for Azure OpenAI. It requires the deployment name (the specific model you deployed in Azure), the endpoint URL, and the API key. This method registers the chat completion service with the kernel's internal dependency injection container.
    3.  `return builder.Build()`: Finalizes the configuration and returns the configured `Kernel` instance.
*   **CreateOllamaService**:
    1.  Similar to the Azure method, it starts with a builder.
    2.  `builder.AddOllamaChatCompletion(...)`: This configures the kernel to use a local Ollama instance. It requires the model ID (e.g., "llama3.2") and the endpoint where Ollama is listening (usually `http://localhost:11434`). Note that Ollama does not require an API key by default.
    3.  `return builder.Build()`: Returns the configured `Kernel`.
*   **Architectural Implication**: By using this factory pattern, the rest of your application code doesn't need to know how the Kernel is created. It simply asks for a `Kernel` instance. This makes swapping providers trivial: change the factory method call, and the rest of the logic remains untouched.

3. The Plugin Definition

public class TimePlugin
{
    [KernelFunction, Description("Retrieves the current local time.")]
    public string GetCurrentTime() => DateTime.Now.ToString("T");
}
*   **TimePlugin Class**: This is a simple C# class that we intend to make available to the AI model.
*   **[KernelFunction] Attribute**: This attribute is crucial. It marks the `GetCurrentTime` method as a function that the Semantic Kernel can expose to the AI model. Without this, the Kernel would not recognize the method as a callable tool.
*   **[Description(...)] Attribute**: This provides a natural language description of what the function does. The AI model uses this description to understand when it should call this function. For example, if the user asks "What time is it?", the model sees the description "Retrieves the current local time" and decides to invoke the function.
*   **Method Implementation**: The method simply returns the current time formatted to show hours, minutes, and seconds.

4. Main Execution Logic

public class Program
{
    public static async Task Main(string[] args)
    {
        Console.WriteLine("Select AI Provider:\n1. Azure OpenAI\n2. Ollama");
        var choice = Console.ReadLine();

        Kernel kernel = choice?.Trim() == "2" 
            ? KernelFactory.CreateOllamaService() 
            : KernelFactory.CreateAzureOpenAIService();

        kernel.ImportPluginFromObject(new TimePlugin(), "time");

        string prompt = "What is the current time? Please use the time plugin.";

        var executionSettings = new OpenAIPromptExecutionSettings
        {
            ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
        };

        try
        {
            var result = await kernel.InvokePromptAsync(prompt, new KernelArguments(executionSettings));
            Console.WriteLine($"\n--- Result ---");
            Console.WriteLine(result);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"\n--- Error ---");
            Console.WriteLine($"An error occurred: {ex.Message}");
        }
    }
}

*   **Provider Selection**: The application starts by asking the user to select an AI provider. This simulates a real-world scenario where an application might support multiple backends (e.g., a cloud version using Azure and a local/offline version using Ollama).
*   **Kernel Instantiation**: Based on the user's choice, the `KernelFactory` is used to create the appropriate `Kernel` instance. This is the key demonstration of the abstraction layer.
*   **Plugin Import**: `kernel.ImportPluginFromObject(new TimePlugin(), "time")` adds the `TimePlugin` to the kernel. The first argument is the plugin instance, and the second (`"time"`) is the namespace under which the plugin's functions will be available to the AI. The model will refer to the function as `time.GetCurrentTime`.
*   **Prompt and Execution Settings**:
    *   `prompt`: The natural language request sent to the AI.
    *   `OpenAIPromptExecutionSettings`: This class configures how the prompt is processed. We set `ToolCallBehavior` to `AutoInvokeKernelFunctions`. This instructs the kernel to automatically handle the function calling loop: send the prompt to the model; if the model requests a function call, execute it locally; send the result back to the model; and finally return the natural language response.
*   **Kernel Invocation**: `kernel.InvokePromptAsync` is the core method that orchestrates the entire process. It communicates with the configured AI service (Azure or Ollama), handles the function calling logic if needed, and returns the final result.
*   **Error Handling**: The `try-catch` block is essential for handling potential issues, such as network errors, invalid API keys, or model unavailability.

Common Pitfalls

1. Missing [KernelFunction] Attribute: A frequent mistake is creating a plugin method without the [KernelFunction] attribute. The Semantic Kernel will not recognize the method as a callable function, and the AI model will not see it in its list of available tools. This often leads to the model ignoring the plugin entirely or stating it cannot perform the requested action.
2. Incorrect Endpoint Configuration: For Ollama, the endpoint must be accessible from where the application is running. A common error is using http://localhost:11434 in a containerized environment (like Docker) where "localhost" refers to the container itself, not the host machine. In such cases, the endpoint should be the host's IP or a Docker service name.
3. API Key Mismanagement: Hardcoding API keys directly in the source code is a severe security risk. In production, always use environment variables, secure configuration files, or secret management services.
4. Ignoring Model Context Length: Different models have different context window sizes (measured in tokens). If you provide a prompt that, combined with the chat history and function descriptions, exceeds this limit, the API will return an error. Always be mindful of token usage, especially when using plugins with complex descriptions.
5. Not Handling Asynchronous Operations Correctly: The Semantic Kernel's core methods are asynchronous. Forgetting to use await, or trying to run synchronous code in an async context, can lead to deadlocks or runtime exceptions.
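The remedy for the API-key pitfall can be sketched as follows (the environment variable names are illustrative; the endpoint is a placeholder):

```csharp
using System;
using Microsoft.SemanticKernel;

// Pull secrets from the environment instead of source code.
var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")
    ?? throw new InvalidOperationException(
        "Set the AZURE_OPENAI_API_KEY environment variable before running.");

var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? "https://your-resource.openai.azure.com/"; // placeholder fallback

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-mini",
    endpoint: endpoint,
    apiKey: apiKey);
var kernel = builder.Build();
```

Failing fast with a clear exception when the key is missing is preferable to letting a blank key surface later as an opaque HTTP 401.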

Visualizing the Kernel Configuration Flow

A diagram illustrating the Semantic Kernel configuration flow would show how asynchronous kernel methods (like `InvokeAsync`) require proper `await` handling to prevent deadlocks and runtime exceptions, contrasting correct asynchronous execution with the pitfalls of synchronous misuse.

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
