Chapter 9: Prompt Engineering Techniques in C# (Few-Shot, CoT)
Theoretical Foundations
The theoretical foundation of prompt engineering within the Microsoft Semantic Kernel rests on a fundamental shift in perspective: we are no longer merely calling an AI model; we are orchestrating a computational process. In earlier chapters, specifically when discussing Book 7: The Core of AI Engineering - Microsoft Semantic Kernel & Agentic Patterns, we established the architecture of the Kernel itself—the dependency injection, the connector patterns, and the basic function calling. However, raw model access is insufficient for deterministic, reliable enterprise applications. To bridge the gap between probabilistic language models and deterministic software logic, we must apply rigorous prompt engineering techniques directly within the C# type system.
This subsection focuses on the "Theoretical Foundations" of Few-Shot Prompting and Chain of Thought (CoT) reasoning. These are not mere text-wrangling tricks; they are mechanisms to manipulate the attention and inference pathways of the underlying Large Language Model (LLM) by leveraging the structural advantages of the C# language.
The Nature of Probabilistic Inference vs. Deterministic Logic
To understand why prompt engineering is necessary, we must first visualize the LLM not as a database, but as a high-dimensional probability distribution. When you send a raw query to a model, you are effectively asking it to complete a pattern based on its training data. Without guidance, the model relies on the most statistically likely completion, which is often generic, shallow, or hallucinated.
In C# engineering, we are accustomed to strict contracts. We define interfaces, classes, and methods with explicit types. The compiler enforces these contracts. An LLM has no compiler; it has a "context window." The prompt engineering techniques discussed here serve as the "compiler" for the AI, enforcing structure and logic before the request ever leaves the application.
Few-Shot Prompting: The "Show, Don't Tell" Paradigm
Few-Shot Prompting is the technique of providing the model with a set of input-output examples within the prompt itself before asking it to perform a task on new data. In the context of C# and Semantic Kernel, this moves beyond simple string concatenation.
Theoretical Foundations
In a zero-shot scenario (asking the model to perform a task without examples), the model must infer the user's intent from the instruction alone. This is prone to ambiguity. For instance, if we ask a model to "extract entities," it might return a list of names, dates, or a summary, depending on its training bias.
Few-Shot Prompting works by establishing a "local context" within the prompt. It essentially performs meta-learning at inference time. By providing examples, we are narrowing the model's search space. We are forcing the model to recognize a pattern and replicate it.
In C#, we can formalize this using the Strategy Pattern or the Template Method Pattern. We don't just hardcode strings. We define a structure where examples are first-class citizens.
Consider the analogy of teaching a new junior developer. If you simply hand them a spec sheet that says "Write clean code," the result is unpredictable. If you hand them a spec sheet and three code reviews of previous PRs showing exactly what "clean code" means in your repository (variable naming, structure, error handling), their output will align with your expectations. Few-Shot Prompting is that code review embedded directly into the execution context.
Architectural Implications in C#
In Semantic Kernel, we treat Few-Shot examples as data. We encapsulate them in classes. This allows us to swap out example sets dynamically based on the user's context or the specific domain (e.g., medical vs. legal extraction) without changing the underlying model call.
This relies heavily on Records (introduced in C# 9) and Immutable Collections. Because prompts are often cached or logged, immutability ensures that the examples provided to the model cannot be accidentally mutated during the orchestration pipeline.
```csharp
using System.Collections.Immutable;
using System.Linq;

namespace AI.Engineering.Core.Prompts
{
    // Using records for immutable data transfer objects representing Few-Shot examples.
    // This ensures thread safety when the Kernel processes parallel requests.
    public record FewShotExample(string Input, string Output);

    public class EntityExtractionPrompt
    {
        public string TaskDescription { get; init; } =
            "Extract the primary entity and sentiment from the text.";

        // We use an ImmutableList to guarantee that the examples provided
        // to the model remain constant throughout the prompt construction lifecycle.
        public ImmutableList<FewShotExample> Examples { get; init; } =
            ImmutableList<FewShotExample>.Empty;

        // The construction of the final prompt string is a pure function:
        // given the same inputs, it produces the exact same prompt structure.
        public string BuildPrompt(string query)
        {
            // Prompt layout:
            // 1. System role / task description
            // 2. Iteration of examples (Input -> Output)
            // 3. Delimiter ("---")
            // 4. The actual user query
            // In a real implementation, this would use a StringBuilder or
            // a templating engine like Handlebars.Net to avoid GC pressure.
            return $"{TaskDescription}\n\n" +
                   string.Join("\n", Examples.Select(ex => $"Input: {ex.Input}\nOutput: {ex.Output}")) +
                   $"\n---\nInput: {query}\nOutput:";
        }
    }
}
```
Chain of Thought (CoT): Decomposing Complexity
While Few-Shot provides examples, Chain of Thought provides reasoning steps. CoT prompting encourages the model to generate intermediate reasoning steps before arriving at a final answer. This is crucial for mathematical problems, logical deduction, and complex planning.
Theoretical Foundations
The "Why" behind CoT is rooted in the autoregressive nature of transformers. The model generates one token at a time. If we ask for a final answer immediately, the model must compress all logical reasoning into a single inference step, which often overflows its "working memory" (the attention window).
By forcing the model to output intermediate steps (e.g., "First, I need to calculate X, then I apply Y..."), we allow the model to use its own output as context for the next token. This effectively offloads the working memory from the model's internal state to the generated text stream.
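To make this concrete, here is a minimal sketch of a CoT scaffold in plain C#. The `BuildCotPrompt` helper and its exact wording are illustrative assumptions, not part of Semantic Kernel; the point is that the prompt explicitly reserves space for intermediate steps so the model can use its own output as working memory.

```csharp
using System;

// Hypothetical helper: wraps any task in a scaffold that forces the model
// to emit numbered reasoning steps before committing to a final answer.
string BuildCotPrompt(string problem) =>
    $"""
    Solve the following problem.
    Think step by step: write each intermediate step on its own line,
    numbered "Step 1:", "Step 2:", and so on.
    Only after the last step, write a final line starting with "Answer:".

    Problem: {problem}
    """;

string prompt = BuildCotPrompt(
    "A train travels 120 km in 90 minutes. What is its average speed in km/h?");
Console.WriteLine(prompt);
```

Because the scaffold is a pure function of the problem text, it can be unit-tested like any other string-building code.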
The "Grandma's Recipe" Analogy
* Zero-Shot: You ask, "How do I make cookies?" She might reply, "Mix flour, sugar, and eggs, then bake." This is technically correct but useless for a beginner.
* Chain of Thought: You ask, "Walk me through the process step-by-step." She says, "First, preheat the oven to 350. While that heats, cream the butter and sugar. Then, add the eggs one at a time..." Now you have a logical flow where Step A sets up Step B.
In software terms, CoT is the equivalent of breaking a complex LINQ query into multiple intermediate variables or using the let keyword in a query expression. It makes the execution traceable and debuggable.
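The analogy can be shown directly. The query below uses `let` to name each intermediate result (the sample data is invented for illustration), mirroring how CoT names each reasoning step so it can be inspected:

```csharp
using System;
using System.Linq;

// The same decomposition idea in ordinary C#: each `let` clause names an
// intermediate result, making the pipeline traceable and debuggable.
int[] orders = { 120, 45, 300, 80 };

var report =
    from amount in orders
    let tax = amount * 0.2          // step 1: compute tax
    let total = amount + tax        // step 2: compute total
    where total > 100               // step 3: filter on the named result
    select $"{amount} -> {total}";

foreach (var line in report)
    Console.WriteLine(line);
```

Without `let`, the same logic collapses into one opaque expression, exactly as a zero-shot answer collapses the model's reasoning into a single step.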
C# Implementation Strategy: The Planner Pattern
In Semantic Kernel, we don't just ask the model to "think step-by-step" in a raw string. We structure the prompt to enforce a specific output format that represents the chain of thought. We often use JSON Schema or XML to structure these thoughts, which C# can then parse strongly.
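As an illustration of that strongly-typed parsing, the snippet below hand-writes a model response in a JSON shape the prompt would enforce (the shape itself is an assumption for this example, not an SDK standard) and parses it with `System.Text.Json`:

```csharp
using System;
using System.Text.Json.Nodes;

// Simulated model output in a structured CoT format: reasoning steps
// plus a final answer, both machine-readable.
string modelOutput = """
    {
      "steps": [
        "Convert 90 minutes to 1.5 hours",
        "Divide 120 km by 1.5 hours"
      ],
      "answer": "80 km/h"
    }
    """;

var node = JsonNode.Parse(modelOutput)!;
var steps = node["steps"]!.AsArray();
string answer = node["answer"]!.GetValue<string>();

for (int i = 0; i < steps.Count; i++)
    Console.WriteLine($"Step {i + 1}: {steps[i]}");
Console.WriteLine($"Final answer: {answer}");
```

Once the steps are data rather than free text, each one can be logged, validated, or replayed independently of the final answer.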
We can utilize C# Source Generators to dynamically create prompts that enforce CoT. By analyzing a method's parameters and return type, a Source Generator could theoretically inject CoT instructions into the prompt template at compile time.
However, a more common runtime approach involves the Function Calling capabilities of the Kernel (introduced in Book 7). Instead of asking the model to output text that describes reasoning, we ask the model to invoke a sequence of native C# functions. Each function invocation represents a step in the chain of thought.
Unifying Concepts: Plugins as Reusable Prompt Structures
The ultimate goal of this chapter is to move these techniques out of ad-hoc strings and into reusable C# plugins. This is where the architecture of the application solidifies.
In previous chapters, we discussed Kernel Functions. We can now define a Kernel Function that encapsulates a CoT prompt or a Few-Shot prompt. This creates a "Prompt as Code" paradigm.
The Interface Analogy
Just as an IEnumerable<T> allows you to swap between an array and a list without changing the consuming code, a Prompt Plugin allows you to swap the underlying reasoning strategy (Few-Shot vs. Zero-Shot vs. CoT) or even the model provider (OpenAI vs. Azure OpenAI vs. Local Llama) without changing the agentic logic.
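A compact way to sketch this swap-ability, using delegates to stand in for full strategy classes (the names and prompt wording are illustrative only): the consuming code depends on one signature, while each entry builds the prompt differently.

```csharp
using System;
using System.Collections.Generic;

// Each strategy is a prompt builder with the same shape; the consumer
// never needs to know which reasoning style is in play.
var strategies = new Dictionary<string, Func<string, string>>
{
    ["zero-shot"] = q => $"Answer the question.\n\nQ: {q}\nA:",
    ["few-shot"]  = q => $"Q: 2+2\nA: 4\n\nQ: {q}\nA:",
    ["cot"]       = q => $"Think step by step, then answer.\n\nQ: {q}\nA:"
};

string Build(string strategy, string query) => strategies[strategy](query);

Console.WriteLine(Build("cot", "What is 17 * 3?"));
```

Swapping `"cot"` for `"few-shot"` changes the reasoning strategy without touching the calling code, which is the same decoupling a Prompt Plugin provides at the Kernel level.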
When we define a plugin in C#, we are essentially creating a wrapper around the LLM's inference engine. The C# method signature acts as the strict interface contract.
```csharp
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

namespace AI.Engineering.Core.Plugins
{
    public class ReasoningPlugin
    {
        // This method acts as the interface for the AI's reasoning process.
        // The Description attribute is crucial: it is the prompt the LLM uses
        // to decide WHEN to call this function.
        [KernelFunction]
        [Description("Solves a complex logical problem by breaking it down into intermediate steps.")]
        public async Task<string> SolveWithCoTAsync(
            Kernel kernel,
            [Description("The complex problem to solve")] string problem)
        {
            // Theoretical implementation:
            // 1. Construct a prompt that enforces Chain of Thought.
            // 2. Send it to the kernel.
            // 3. Parse the reasoning steps.
            // 4. Return the final answer.
            // In practice, this method would orchestrate multiple internal
            // calls to the LLM or other native functions.
            return await Task.FromResult("Simulated CoT Result");
        }
    }
}
```
Visualizing the Flow
To visualize how Few-Shot and CoT integrate into the C# execution flow, we can look at the data transformation pipeline. The C# application acts as the orchestrator, transforming user input into a structured prompt, sending it to the model, and parsing the output back into types.
Edge Cases and Architectural Considerations
1. Token Limits and Context Compression: Few-Shot examples consume tokens. If you provide too many examples, you may exceed the model's context window. In C#, we must implement logic to dynamically select the most relevant examples (perhaps using vector search, as discussed in previous books on RAG) rather than blindly appending all examples.
2. CoT Hallucination: CoT allows the model to "ramble." If the intermediate steps are wrong, the final answer will be wrong. In a C# agentic pattern, we should implement Validation Middleware. This middleware would parse the CoT steps and, using a smaller/faster model or heuristic checks, verify the logical consistency of each step before proceeding.
3. Type Safety: The output of an LLM performing CoT is often verbose text. We rely on C# System.Text.Json or custom parsers to extract the final answer. If the model deviates from the expected output format (e.g., it forgets to wrap the answer in JSON tags), the application must have fallback logic. This is where resilience patterns (like Polly) integrate with prompt engineering.
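The dynamic example selection described above can be sketched as follows. The 4-characters-per-token estimate and the relevance scores are stand-ins for illustration; a real system would use the model's tokenizer and vector-search scores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Crude token estimator standing in for a real tokenizer.
int EstimateTokens(string text) => text.Length / 4;

// (input, output, relevance) — relevance would come from vector search.
var examples = new List<(string Input, string Output, double Relevance)>
{
    ("I can't log in", "Technical Support", 0.91),
    ("Where is my invoice?", "Billing", 0.40),
    ("My router reboots randomly and drops every call", "Technical Support", 0.88),
    ("I want to cancel my plan", "Billing", 0.35),
};

int budget = 25; // token budget reserved for examples
var selected = new List<(string Input, string Output, double Relevance)>();
int used = 0;

// Greedily keep the most relevant examples that still fit the budget.
foreach (var ex in examples.OrderByDescending(e => e.Relevance))
{
    int cost = EstimateTokens($"Input: {ex.Input}\nOutput: {ex.Output}");
    if (used + cost > budget) continue;
    used += cost;
    selected.Add(ex);
}

Console.WriteLine($"Selected {selected.Count} examples using {used} tokens.");
```

Note that the loop skips an oversized example and keeps looking: a cheaper, less relevant example can still make the cut, which blind truncation would miss.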
Conclusion
The theoretical foundation of Prompt Engineering in C# is the application of software engineering principles—encapsulation, immutability, and interface design—to the inherently non-deterministic world of LLMs. By using Few-Shot Prompting, we provide the model with explicit examples, narrowing its search space to match our domain. By using Chain of Thought, we decompose complex problems into manageable steps, leveraging the autoregressive nature of transformers. Finally, by encapsulating these techniques within Semantic Kernel Plugins, we create reusable, maintainable, and robust AI capabilities that integrate seamlessly with the rest of the .NET ecosystem.
Basic Code Example
Here is a simple, self-contained 'Hello World' level code example demonstrating Few-Shot Prompting using Microsoft Semantic Kernel in C#.
The Problem: Automated Customer Support Categorization
Imagine you are building an automated support system for a tech company. You need to route incoming user tickets to the correct department (e.g., "Billing", "Technical Support", "General Inquiry"). A standard keyword search is often too rigid and misses context. Instead, we will use Few-Shot Prompting to provide the AI with explicit examples of how to classify text, ensuring it learns the pattern before making a prediction.
Code Example
This example uses the Semantic Kernel SDK. Ensure you have the Microsoft.SemanticKernel NuGet package installed.
```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

class Program
{
    static async Task Main()
    {
        // 1. SETUP: Initialize the Kernel with a chat completion service.
        // In a real scenario, configure this with your Azure OpenAI or OpenAI API key.
        var kernel = Kernel.CreateBuilder()
            .AddOpenAIChatCompletion(
                modelId: "gpt-4o-mini",          // or any available model
                apiKey: "fake-api-key-for-demo") // replace with a real key to run
            .Build();

        // 2. FEW-SHOT PROMPT CONSTRUCTION:
        // We construct a prompt template that includes "shots" (examples).
        // This guides the model on the expected input/output format.
        string fewShotPrompt = """
            You are a helpful assistant that categorizes support tickets.
            Classify the following ticket into one of these categories:
            [Billing, Technical Support, General Inquiry].

            Examples (Few-Shots):

            Ticket: "I can't login to my account."
            Category: Technical Support

            Ticket: "Where is my invoice for last month?"
            Category: Billing

            Ticket: "How do I reset my password?"
            Category: Technical Support

            Ticket: "I have a feature request."
            Category: General Inquiry

            Now classify this new ticket:

            Ticket: "{{ $input }}"
            Category:
            """;

        // 3. EXECUTION:
        // Simulate a user query and bind it to the {{ $input }} template variable.
        string userTicket = "My internet connection keeps dropping.";
        Console.WriteLine($"Input Ticket: \"{userTicket}\"");
        Console.WriteLine("Processing with Few-Shot prompt...\n");

        // InvokePromptAsync renders the template, sends the final prompt
        // to the configured LLM, and waits for the completion.
        var response = await kernel.InvokePromptAsync(
            fewShotPrompt,
            new KernelArguments { ["input"] = userTicket });

        // 4. OUTPUT:
        // The response should ideally be "Technical Support" based on the pattern
        // established in the examples (login issues, password resets).
        Console.WriteLine($"Predicted Category: {response.GetValue<string>()}");
    }
}
```

* `Kernel.CreateBuilder()`: This initializes the Semantic Kernel builder pattern. It is the entry point for configuring dependencies, services, and plugins. Even for a simple example, the Kernel acts as the orchestrator.
* `.AddOpenAIChatCompletion(...)`: This registers the `IChatCompletionService` in the kernel's service collection. In a production environment, you would likely use `AddAzureOpenAIChatCompletion` for enterprise-grade security and scalability. For this demo, the API key is a placeholder.
* `fewShotPrompt` (raw string literal): We use C# 11+ raw string literals (`"""..."""`) to define the prompt cleanly without escaping quotes.
  * Role definition: "You are a helpful assistant..." sets the system behavior.
  * Constraints: We explicitly list the allowed categories. This is a form of constrained decoding via prompt engineering.
  * The "shots": The block labeled "Examples" contains pairs of inputs and expected outputs. This is the core of Few-Shot learning; the model analyzes these to infer the logic (e.g., "login" -> "Technical Support").
  * `{{ $input }}`: A template variable. The kernel's prompt template engine binds it to the `input` entry of the `KernelArguments` at invocation time.
* `kernel.InvokePromptAsync(...)`: This renders the template with the supplied arguments, sends the final prompt to the configured LLM, and waits for the completion, decoupling the prompt logic from the underlying model implementation. The model processes the few-shot examples, recognizes the semantic similarity between "internet connection keeps dropping" and the earlier technical examples, and outputs the category.
Common Pitfalls
1. Hallucination due to Ambiguous Examples If your few-shot examples are contradictory or vague, the model will produce inconsistent results. For instance, if one example maps "login" to "Technical Support" and another maps "login" to "Billing" (perhaps regarding a login fee), the model will struggle. Always ensure your examples are consistent and representative of the complexity you expect in production data.
2. Token Limit Overflows Few-shot prompting increases the prompt length. LLMs have a finite context window (e.g., 8k, 32k, or 128k tokens). If you provide too many examples (e.g., 50 shots), you may exceed the token limit, causing the model to truncate the earliest examples or fail entirely. For large-scale systems, use Fine-Tuning instead of Few-Shot prompting.
3. Hardcoded Prompts vs. Semantic Kernel Plugins
In the example above, we hardcoded the prompt string. In a real agentic architecture, you should wrap this logic in a KernelFunction. This allows the prompt to be version-controlled, tested, and reused across different agents or workflows. Hardcoding makes refactoring difficult and breaks the modularity of the AI system.
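To illustrate the principle without pulling in the SDK, the sketch below moves the hardcoded string into a single, testable function (the function name and template wording are illustrative); in Semantic Kernel, the same template would be registered with `kernel.CreateFunctionFromPrompt(...)` so any agent or workflow can reuse it.

```csharp
using System;

// Illustrative "prompt as code": the template lives in one versionable,
// unit-testable place and takes typed parameters instead of being pasted
// inline wherever it is needed.
string BuildTicketClassifierPrompt(string ticket, string[] categories)
{
    string categoryList = string.Join(", ", categories);
    return $"""
        Classify the following ticket into one of these categories:
        [{categoryList}].

        Ticket: "{ticket}"
        Category:
        """;
}

string prompt = BuildTicketClassifierPrompt(
    "My internet connection keeps dropping.",
    new[] { "Billing", "Technical Support", "General Inquiry" });

Console.WriteLine(prompt);
```

Because the builder is an ordinary function, the prompt can be covered by unit tests (does it include every category? does it end with the completion cue?) before any model ever sees it.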
Visualizing the Flow
The following diagram illustrates the data flow within the Semantic Kernel architecture for this example.
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.