Chapter 20: Capstone - Building a Full 'Jarvis' Assistant for Windows
Theoretical Foundations
The theoretical foundation for building a "Jarvis" assistant using Microsoft Semantic Kernel and C# rests on a paradigm shift from static, deterministic programming to dynamic, probabilistic orchestration. In traditional software engineering, logic flows linearly: input → processing → output. In AI engineering, specifically with agentic patterns, we build systems that can reason, plan, and utilize tools to achieve a goal, where the path from input to output is not pre-defined but discovered at runtime.
The Agentic Cognitive Architecture
To understand how we build a desktop-native assistant, we must first understand the cognitive architecture of an AI agent. An agent is not merely a chatbot; it is an entity capable of perceiving its environment (via plugins), reasoning about it (via LLMs), and acting upon it (via function calling).
The Kernel (The Core Processor): The orchestrator that connects the AI model to the plugins. 2. The Plugins (The Senses and Limbs): The tools that allow the agent to interact with the world (both digital and physical). 3. The Memory (The Context): The store of persistent knowledge that allows the agent to personalize its behavior.
The Kernel as the Operating System of the Mind
Think of the Semantic Kernel not as a library, but as a microkernel operating system for AI. Just as Windows manages hardware resources and schedules processes, the Kernel manages LLM interactions and schedules function executions.
Model Selection: Deciding which "brain" (e.g., GPT-4, a local model) to use for a specific task. * Function Registration: Loading available tools (plugins) into the active context. * Execution Loop: Handling the iterative process of "Thought → Action → Observation."
Why this matters: By abstracting the LLM behind the Kernel interface, we decouple the logic of our assistant from the implementation of the AI model. This is crucial for future-proofing. As we discussed in Book 5, Chapter 12: "Model Abstraction and Provider Agnosticism," hard-coding calls to a specific API (like OpenAI) creates technical debt. The Kernel allows us to swap the underlying model provider without rewriting the orchestration logic.
The Concept of Native Plugins (The Bridge to Windows)
A "Jarvis" assistant is useless if it cannot control the environment it lives in. While an LLM can generate text, it cannot natively adjust the system volume, read a file, or send a notification. This is where Native Plugins come into play.
A Native Plugin in Semantic Kernel is a C# class annotated with specific attributes that map methods to AI-accessible functions. Unlike "Prompt Plugins" (which are YAML/JSON definitions of semantic functions), Native Plugins are compiled code.
The Analogy: The Butler vs. The Librarian Imagine an LLM as a brilliant Librarian. They know everything in books (training data) and can answer questions about them. However, they are sitting behind a desk. They cannot physically open a door, turn on a light, or fetch a physical book from a high shelf. A Native Plugin acts as the Butler who stands beside the Librarian. When the user asks, "Turn down the lights," the Librarian (LLM) understands the intent but hands the instruction to the Butler (Native Plugin). The Butler, having access to the house's electrical system (Windows APIs), executes the action.
The Technical Bridge: Function Calling How does the LLM communicate with the C# code? It uses a mechanism called Function Calling (or Tool Calling). When the Kernel sends a prompt to the model, it also sends a list of available functions (plugins) and their descriptions. If the model determines that executing a function is necessary to fulfill the user's request, it returns a structured response indicating which function to call and with what arguments.
The Kernel intercepts this response, executes the corresponding C# method, and feeds the result back into the model for the final response generation.
Agentic Patterns: Planners and the ReAct Loop
One of the core concepts in this capstone is moving from single-turn interactions to multi-step workflows. We achieve this using Planners.
A Planner is a strategy that takes a high-level goal and breaks it down into a sequence of executable steps (a plan). In the context of a "Jarvis" assistant, a user might say, "Organize my downloads folder and summarize the newest PDF."
List files in the Downloads directory.
2. Identify files with the .pdf extension.
3. Read the content of the most recent PDF.
4. Summarize the content.
5. Move the PDF to an "Archived" folder.
The ReAct Pattern Reasons: Analyzes the current state and the goal. 2. Acts: Selects a plugin to invoke. 3. Observes: Receives the output from the plugin. 4. Repeats: Continues until the goal is achieved.
This is a departure from traditional procedural programming. We do not write a for loop to iterate over files; we instruct the agent to "organize the folder," and the agent decides to loop internally. This is the essence of Generative AI Engineering: we engineer the capabilities (plugins) and the constraints (instructions), but the execution path is generated dynamically.
Stateful Memory: The Personalization Engine
A static assistant is a tool; a stateful assistant is a companion. To achieve the "Jarvis" persona, the assistant must remember previous interactions and user preferences.
In Semantic Kernel, this is handled through ISemanticTextMemory. However, for a desktop assistant, we go beyond simple text memory. We utilize vector databases (or local file-based vector stores) to enable semantic search.
The Analogy: The Elephant's Memory An LLM without memory is like a goldfish; it forgets everything the moment the conversation window closes. A vector store is like an elephant's memory—it doesn't just store facts; it stores associations. If you tell the assistant, "My preferred coding font is Fira Code," and later ask, "What font should I use for my IDE?", the vector store allows the assistant to retrieve the relevant memory based on semantic similarity, not just keyword matching.
Why Vector Embeddings? We convert text into high-dimensional vectors (embeddings). When a user asks a question, we convert the question into a vector and search the memory store for vectors that are "close" to it. This allows the assistant to retrieve contextually relevant information even if the exact phrasing differs.
Background Service Integration: The Daemon Pattern
Finally, for an assistant to feel like "Jarvis," it must be omnipresent but unobtrusive. It should not be a console window that opens and closes. It must be a Windows Background Service.
Lifetime Management: The service starts with the OS and runs until shutdown. This requires careful management of resources (like the LLM connection) to prevent memory leaks.
2. Asynchronous Processing: A background service must never block the main thread. All AI interactions (which can take seconds) must be async/await.
3. Inter-Process Communication (IPC): How does the background service interact with the UI? It might use named pipes, gRPC, or standard I/O to communicate with a foreground UI process if a visual interface is needed.
The Analogy: The Building's Infrastructure Think of the assistant as the HVAC (Heating, Ventilation, and Air Conditioning) system of a smart building. You don't interact with the HVAC directly (usually); it runs in the background, sensing the environment (temperature sensors) and adjusting the airflow (actuators) to maintain comfort. Similarly, the assistant runs silently, listening for triggers (voice, hotkeys) and adjusting the digital environment (files, apps, data) to maintain user productivity.
Architectural Visualization
To visualize how these components interact in the "Jarvis" system, we can map the flow of data and control.
Deep Dive: Modern C# Features in AI Engineering
To build this system effectively, we leverage modern C# features that align perfectly with the asynchronous and functional nature of AI workflows.
1. IAsyncEnumerable<T> and Streaming
AI responses are rarely instantaneous. When an LLM generates a response, it does so token-by-token. In a desktop application, blocking the UI thread while waiting for the full response creates a poor user experience.
Modern C# provides IAsyncEnumerable<T>, which allows us to iterate over a sequence asynchronously.
Why it matters for AI: It enables "typewriter effects" in the UI. We can stream the AI's output directly to the screen as it is generated, rather than waiting for the entire block of text to be ready. This reduces perceived latency and makes the assistant feel more responsive.
// Conceptual usage of streaming AI responses
public async Task StreamResponseToUser(string prompt)
{
// Get the streaming result from the Kernel
await foreach (var content in kernel.RunStreamingAsync(prompt))
{
// Append to UI immediately
Console.Write(content);
}
}
2. Records and Immutability for Memory
When storing context or user preferences, we want to ensure that data isn't accidentally mutated. C# record types provide value-based equality and immutability (by default).
Why it matters for AI: When saving a user's preference (e.g., "Theme: Dark Mode") to a vector store, we encapsulate this in a record. This ensures that the data structure remains consistent and thread-safe when multiple agents might be accessing the memory store simultaneously.
// Defining a memory entry as a record
public record UserPreference(string Key, string Value, DateTime CreatedAt);
3. Source Generators for Zero-Overhead Reflection
Semantic Kernel uses reflection heavily to discover plugins. However, reflection can be slow. Modern C# Source Generators allow us to generate code at compile-time based on attributes.
Why it matters for AI: By using Source Generators (or the newer AOT compilation support in .NET 8+), we can drastically reduce the startup time of our background service. A "Jarvis" assistant must start instantly; we cannot afford seconds of JIT compilation or reflection scanning when the user presses a hotkey.
4. ValueTask for High-Performance I/O
In a background service that might handle thousands of micro-requests (checking system stats, processing clipboard changes), allocating Task objects can generate garbage collection pressure.
Why it matters for AI: Many plugin operations (like checking a file path) are synchronous or complete synchronously. Using ValueTask instead of Task avoids heap allocations in these hot paths, keeping the assistant lightweight and efficient.
// A plugin method that might complete synchronously
[KernelFunction]
public ValueTask<string> CheckSystemStatusAsync()
{
// Fast operation, no async overhead needed
string status = System.Environment.MachineName;
return ValueTask.FromResult(status);
}
Theoretical Foundations
The LLM is the reasoning engine (the brain). 2. Native C# Plugins are the interface to the Windows OS (the senses and limbs). 3. The Semantic Kernel is the glue that binds them (the nervous system). 4. Vector Memory provides personalized context (the long-term memory). 5. Modern C# provides the performance and async capabilities to make it all run smoothly in the background (the circulatory system).
By understanding these theoretical underpinnings, we move from merely "using" an AI to "engineering" an AI system that is robust, scalable, and truly integrated into the user's digital life.
Basic Code Example
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.Memory.Sqlite;
using System.Text;
using System.Text.Json;
namespace JarvisMini;
public class Program
{
// Entry point for the console application
public static async Task Main(string[] args)
{
// 1. Setup: Initialize the Kernel with essential services
var kernel = new KernelBuilder()
// Using OpenAI GPT-3.5 Turbo for text generation (requires environment variable OPENAI_API_KEY)
.WithOpenAIChatCompletionService("gpt-3.5-turbo", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
// Using a local SQLite database for persistent memory storage
.WithMemoryStorage(new SqliteMemoryStorage("jarvis_memory.db"))
.Build();
// 2. Context: Load or create persistent user preferences
// We simulate a "real-world" scenario where the assistant remembers user preferences.
// We use a specific collection name to organize our data.
const string memoryCollection = "UserPreferences";
// Retrieve the user's preferred greeting style
var greetingPreference = await kernel.Memory.GetAsync(memoryCollection, "GreetingStyle");
string greetingStyle;
if (greetingPreference == null)
{
// First run: Default to formal, and save it for next time
greetingStyle = "formal";
await kernel.Memory.SaveInformationAsync(memoryCollection, "formal", "GreetingStyle");
}
else
{
greetingStyle = greetingPreference.Metadata.Text;
}
// 3. Agentic Workflow: Define a simple plan for the assistant
// In a full Jarvis system, the Planner would break down complex requests.
// Here, we simulate a plan to handle a user request to "Summarize my day".
var userRequest = "Summarize my day based on the meeting notes I have in memory.";
// 4. Execution: Orchestrating the Kernel to process the request
Console.WriteLine($"[System]: Assistant initialized. Memory loaded. Style: {greetingStyle}");
Console.WriteLine($"[User]: {userRequest}");
// Create a prompt template that incorporates the retrieved memory context
// This demonstrates "Stateful Memory & Personalization"
var promptTemplate = $"""
You are a helpful desktop assistant.
The user prefers a {{$style}} greeting style.
The user request is: "{{$input}}".
Please respond to the user request.
If the request mentions memory or notes, acknowledge that you are retrieving context.
Keep the response concise and in the style defined by the greeting preference.
""";
// Execute the function
var result = await kernel.RunAsync(
promptTemplate,
new KernelArguments(new KernelPromptTemplateConfig())
{
["input"] = userRequest,
["style"] = greetingStyle
}
);
// 5. Output: Displaying the result
Console.WriteLine($"\n[Jarvis]: {result.Result}");
// 6. Demonstration of Plugin Integration (Simulated)
// In the full capstone, we would invoke native C# plugins here.
// For this example, we simulate a "SystemNotification" plugin call.
await SimulateSystemNotification(kernel, "Jarvis Assistant", "Task completed successfully.");
}
// Simulates a native Windows system plugin (e.g., Toast Notification)
private static async Task SimulateSystemNotification(IKernel kernel, string title, string message)
{
// In a real scenario, this would be a native C# method registered as a skill.
// We are using a local function here to keep the example self-contained.
Console.WriteLine($"\n[System Plugin]: Triggering notification -> Title: {title}, Message: {message}");
await Task.CompletedTask; // Simulating async IO
}
}
Detailed Explanation
This code example demonstrates the foundational architecture of a desktop-native AI assistant using Microsoft Semantic Kernel. It focuses on three pillars: Orchestration, Persistent Memory, and Contextual Execution.
1. Setup: Initializing the Kernel
WithOpenAIChatCompletionService: This configures the chat model. In a production Windows service, you might swap this for Azure OpenAI or a local model via Ollama.
* WithMemoryStorage: We use SqliteMemoryStorage. This is critical for a "Jarvis" assistant running on Windows. It ensures that user preferences and interaction history survive application restarts, unlike in-memory storage which is volatile.
2. Context: Stateful Memory Retrieval
kernel.Memory.GetAsync: This performs a semantic search or a direct key lookup (depending on the memory store implementation) to retrieve information.
* Logic Flow:
1. Check if "GreetingStyle" exists in the "UserPreferences" collection.
2. If it doesn't exist (first run), default to "formal" and save it.
3. If it exists, load the preference.
* Why this matters: This allows the assistant to be personalized. If a user prefers casual language, the LLM prompt is dynamically adjusted, changing the output tone without retraining the model.
3. Agentic Workflow: Prompt Engineering with Context
{{$style}} and {{$input}}: These are template variables. Semantic Kernel replaces them with the values provided in the KernelArguments.
* Orchestration: When kernel.RunAsync is called, the Kernel:
1. Formats the prompt.
2. Sends it to the configured LLM (GPT-3.5).
3. Receives the text generation response.
* Simulated Planner: In a full implementation, the Planner would take "Summarize my day" and break it down into: 1. Retrieve calendar events, 2. Read email summaries, 3. Synthesize text. Here, we simulate the synthesis step directly.
4. Execution: The Kernel Run
Async/Await: Essential for Windows Background Services to prevent blocking the main thread during network calls to the LLM.
* Result Handling: The result object contains the LLM's response, metadata (like token usage), and any errors.
5. Plugin Integration (Simulated)
Architecture: In the full capstone, this would be a C# class decorated with [SKFunction].
* Real-world Application: This allows the AI to interact with the OS. For example, if the LLM decides to create a reminder, it would invoke a native C# plugin to write to the Windows Task Scheduler or send a Toast Notification.
Missing API Keys: The code relies on Environment.GetEnvironmentVariable("OPENAI_API_KEY"). If this variable is not set in your Windows environment or Docker container, the application will throw a NullReferenceException immediately. Always validate environment variables at startup.
2. Blocking Async Calls: In a Windows Background Service (e.g., IHostedService), you must never block the main thread. Always use await with kernel.RunAsync. Using .Result or .Wait() can cause deadlocks in UI applications or freeze the service loop.
3. Memory Collection Naming: SqliteMemoryStorage relies on collection names to organize data. Using generic names like "Data" or "Memory" across different features will lead to collisions. Always use specific, namespaced collection names (e.g., UserPreferences, ConversationHistory).
4. Prompt Injection via Memory: If you allow user input to save data to memory (e.g., await kernel.Memory.SaveInformationAsync(..., userInput)), a malicious user could inject prompt instructions into memory. When that memory is retrieved later and injected into the system prompt, the AI might execute unintended commands. Always sanitize inputs before saving to memory.
Visualizing the Workflow
The following diagram illustrates the flow of data through the Semantic Kernel in this example.
The chapter continues with advanced code, exercises and solutions with analysis, you can find them on the ebook on Leanpub.com or Amazon
Loading knowledge check...
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.