Chapter 16: TextMemory and Volatile Memory Store
Theoretical Foundations
The ITextMemory interface and the VolatileMemoryStore implementation represent the foundational layer of state management within the Microsoft Semantic Kernel. While Large Language Models (LLMs) are stateless functions that process input and generate output without retaining context between calls, AI agents often require continuity. They need to remember user preferences, previous interactions, or retrieved knowledge to function coherently. This chapter explores how to bridge the gap between the ephemeral nature of LLMs and the persistent requirements of intelligent applications using lightweight, in-memory vector storage.
The Problem of State in Agentic Systems
In the previous book, we discussed the Planner and how it orchestrates complex tasks by breaking them down into smaller steps. However, a Planner operating in a vacuum is limited. If a user asks, "Remind me to buy milk when I pass the grocery store," the system must retain that specific instruction ("buy milk") and the trigger condition ("pass the grocery store") beyond the lifespan of the immediate HTTP request.
The memory layer serving this need must meet three requirements:
1. Fast: Vector search must happen in milliseconds to maintain the illusion of "thought."
2. Ephemeral: Data should not persist after the application restarts, simplifying development and data privacy.
3. Semantic: We don't just want to search by keywords; we want to search by meaning.
This is where the VolatileMemoryStore comes in. It acts as a RAM-based vector database, allowing the Semantic Kernel to store text embeddings and retrieve them based on cosine similarity.
The ITextMemory Interface: The Contract of Recall
At the heart of this system lies the ITextMemory interface. In C#, interfaces are crucial for defining contracts without dictating implementation. In the context of AI engineering, ITextMemory decouples the logic of remembering from the mechanism of storage.
Just as we used the IChatCompletionService interface in Chapter 4 to swap seamlessly between OpenAI's GPT-4 and a local Llama model without changing our application logic, ITextMemory allows us to switch between a VolatileMemoryStore (for testing) and a persistent vector database like Azure Cognitive Search or Pinecone (for production) without altering the agent's core logic.
The interface defines methods for saving and retrieving information. Crucially, it treats memory not as simple key-value pairs, but as semantic concepts.
// Conceptual definition of the interface (simplified for explanation)
public interface ITextMemory
{
    // Semantic retrieval: finds the stored memory whose meaning best
    // matches the query, subject to an optional relevance threshold.
    Task<string?> RetrieveAsync(
        string collection,
        string query,
        double? minRelevanceScore = null,
        CancellationToken cancellationToken = default);

    Task SaveAsync(
        string collection,
        string key,
        string input,
        string? description = null,
        CancellationToken cancellationToken = default);

    // Additional methods for batch search and removal exist
}
Why this matters for AI:
In a traditional dictionary (Dictionary<string, string>), you retrieve a value by an exact key. If you save "The capital of France is Paris" under the key "fact_1", you can only retrieve it if you know the key "fact_1". An LLM, however, operates on concepts. If you ask, "What is the capital city of the French Republic?", you need the system to retrieve "Paris".
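The contrast is easy to see in a few lines of plain C# (a toy illustration, not Semantic Kernel code): an exact-key dictionary answers the literal key but fails the paraphrase.

```csharp
using System;
using System.Collections.Generic;

class ExactKeyDemo
{
    static void Main()
    {
        var facts = new Dictionary<string, string>
        {
            // The value is only reachable through its literal key.
            ["fact_1"] = "The capital of France is Paris."
        };

        // Exact key: works.
        Console.WriteLine(facts.TryGetValue("fact_1", out var v) ? v : "not found");

        // A semantically equivalent question is just a different string,
        // so the lookup fails -- this is the gap embeddings close.
        Console.WriteLine(facts.TryGetValue(
            "What is the capital city of the French Republic?", out var v2)
            ? v2 : "not found");
    }
}
```

The first lookup prints the fact; the second prints "not found", even though any human (or LLM) would consider the two questions identical.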
ITextMemory abstracts this by using embeddings. It converts the text into a vector (a list of floating-point numbers) that represents its semantic meaning. The retrieval method uses vector similarity to find the stored memory that is closest to the input query's meaning.
Vector Storage and Cosine Similarity
To understand VolatileMemoryStore, one must understand the mathematical foundation of vector search. When we save a memory, the Semantic Kernel calls an embedding generator (like text-embedding-ada-002). This model converts the text string into a high-dimensional vector.
For example, the sentence "I love hiking in the mountains" might be represented as:
[0.12, -0.45, 0.88, ..., 0.01]
When we later query the memory with "My favorite outdoor activity is climbing peaks," the embedding model generates a new vector. Even though the words are different, the semantic meaning is similar, so the vectors will be close in geometric space.
VolatileMemoryStore stores these vectors in memory. When RetrieveAsync is called, it calculates the Cosine Similarity between the query vector and every vector in the specified collection (a logical grouping of memories).
The result is a value between -1 and 1 (though typically 0 to 1 for embeddings). A score of 1.0 means the vectors are identical in direction (identical meaning), while 0 means they are orthogonal (unrelated). The store returns the memory with the highest score that exceeds the minRelevanceScore threshold.
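Cosine similarity itself is only a few lines of plain C# (independent of Semantic Kernel): the dot product of the two vectors divided by the product of their magnitudes.

```csharp
using System;

static class Cosine
{
    // Cosine similarity = dot(a, b) / (|a| * |b|).
    // 1.0 means identical direction; 0.0 means orthogonal (unrelated).
    public static double Similarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }

    static void Main()
    {
        // Same direction, different magnitude: similarity is 1.
        Console.WriteLine(Similarity(new float[] { 1, 2 }, new float[] { 2, 4 }));
        // Orthogonal vectors: similarity is 0.
        Console.WriteLine(Similarity(new float[] { 1, 0 }, new float[] { 0, 1 }));
    }
}
```

Note that magnitude cancels out: "I love hiking" and "I really, really love hiking" can point the same way even if one vector is "longer."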
The VolatileMemoryStore: Architecture and Trade-offs
The VolatileMemoryStore is a concrete implementation of IMemoryStore (which ITextMemory utilizes). It is a pure C# implementation that uses ConcurrentDictionary to hold collections of embeddings.
1. Collections: Data is partitioned into named collections (e.g., "UserPreferences", "ProjectDocs"). Think of these as tables in a database or indexes in a search engine.
2. Embeddings: Inside each collection, key-value pairs are stored. The key is a string identifier (e.g., "user_123_pref_1"), and the value is an Embedding<float> object containing the vector data and the original text string.
3. Indexing: Since the data is in memory, "indexing" is instantaneous. There is no complex B-Tree construction or inverted index creation as seen in SQL or Lucene. The store simply holds the data in a structure optimized for rapid iteration.
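To make this architecture concrete, here is a deliberately tiny toy store — illustrative only, not the actual VolatileMemoryStore source — with the same shape: a ConcurrentDictionary of named collections, each mapping string keys to a record holding the original text and its vector, searched by brute-force scan.

```csharp
using System;
using System.Collections.Concurrent;

// Toy record: the vector plus the original text it came from.
record ToyRecord(string Key, string Text, float[] Vector);

class ToyVolatileStore
{
    // Outer key: collection name. Inner key: record id.
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<string, ToyRecord>> _collections = new();

    public void Upsert(string collection, ToyRecord record) =>
        _collections.GetOrAdd(collection, _ => new())[record.Key] = record;

    // No index: a linear scan compares the query against every record.
    public (ToyRecord? Match, double Score) Nearest(string collection, float[] query, double minScore)
    {
        ToyRecord? best = null;
        double bestScore = minScore;
        if (_collections.TryGetValue(collection, out var records))
        {
            foreach (var r in records.Values)
            {
                double s = CosineSimilarity(query, r.Vector);
                if (s >= bestScore) { best = r; bestScore = s; }
            }
        }
        return (best, best is null ? 0 : bestScore);
    }

    static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, ma = 0, mb = 0;
        for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; ma += a[i] * a[i]; mb += b[i] * b[i]; }
        return dot / (Math.Sqrt(ma) * Math.Sqrt(mb));
    }
}

class Demo
{
    static void Main()
    {
        var store = new ToyVolatileStore();
        // Hand-made 2-D "embeddings" stand in for real model output.
        store.Upsert("prefs", new ToyRecord("pref2", "User likes spicy food", new float[] { 0.9f, 0.1f }));
        store.Upsert("prefs", new ToyRecord("pref3", "User reads sci-fi",    new float[] { 0.1f, 0.9f }));

        var (match, score) = store.Nearest("prefs", new float[] { 0.8f, 0.2f }, 0.5);
        Console.WriteLine($"{match?.Key}: {score:F3}"); // the food preference wins
    }
}
```

The point is the shape of the data: nothing is pre-built, so writes are O(1) and each search is O(n) over the collection — exactly the trade-off described above.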
Analogy: The Librarian vs. The Photographic Memory
* Persistent Storage (SQL/Vector DB): This is the Librarian. You ask for a book on "quantum physics." The Librarian goes to the catalog, finds the Dewey Decimal code, walks to the shelf, and retrieves the book. It is accurate and permanent, but it takes time.
* VolatileMemoryStore: This is a person with a photographic memory standing right next to you. They have read every book in the room. When you ask for "quantum physics," they instantly hand you the book because they remember the concept of the book, not just the title. However, if the room is cleared (the application restarts), their memory is wiped clean.
Trade-offs
1. Speed vs. Persistence: The primary trade-off is volatility. Data is lost on shutdown. This makes it unsuitable for long-term user profiles but perfect for:
* Session State: Remembering what was discussed in the last 10 messages.
* Scratchpad: A "working memory" for an agent to plan steps before committing to a final action.
* Testing: Verifying semantic search logic without spinning up a cloud database.
2. Scale vs. RAM: Since all vectors reside in RAM, the dataset size is limited by the available memory. Storing millions of high-dimensional vectors (e.g., 1536 dimensions per vector) can consume gigabytes of RAM. VolatileMemoryStore is not designed for "Big Data" but for "Right Data" (the immediate context needed for the current task).
3. Concurrency: The use of ConcurrentDictionary ensures thread safety, allowing multiple agents or parallel processes to read/write memories simultaneously without race conditions, a critical feature when building multi-agent systems.
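The RAM trade-off in point 2 is easy to quantify: at 4 bytes per float, a 1536-dimensional embedding costs about 6 KB before any text or metadata, so a million vectors already approach the memory budget of a small VM. The arithmetic:

```csharp
using System;

class RamFootprint
{
    static void Main()
    {
        const int dimensions = 1536;             // e.g. text-embedding-ada-002
        const int bytesPerFloat = sizeof(float); // 4 bytes
        const long vectorCount = 1_000_000;

        long bytesPerVector = dimensions * bytesPerFloat;  // 6,144 bytes per vector
        long totalBytes = bytesPerVector * vectorCount;    // vectors alone, no metadata

        Console.WriteLine($"Per vector: {bytesPerVector} bytes");
        Console.WriteLine($"One million vectors: {totalBytes / 1e9:F2} GB (excluding text and metadata)");
    }
}
```

The real footprint is higher still, since each record also holds the original text, an ID, and dictionary overhead.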
Integration with the Semantic Kernel
In the previous book, we utilized the Kernel Builder to assemble services. The memory system follows the same pattern. The ITextMemory interface is registered within the Kernel's dependency injection container. When an AI function (like a native method or a prompt template) requires memory, the Kernel injects the memory instance.
The VolatileMemoryStore acts as the backend, while a wrapper class (SemanticTextMemory) handles the orchestration of generating embeddings via an ITextEmbeddingGenerationService before storing or searching.
This separation of concerns allows for a modular architecture. If your application starts with VolatileMemoryStore but later needs to scale to Azure Cognitive Search, you only change the registration in the Kernel configuration. The agentic logic—memory.SaveAsync(...) and memory.RetrieveAsync(...)—remains untouched.
Visualizing the Memory Flow
The following diagram illustrates how a user query flows through the memory system using VolatileMemoryStore.
Real-World Analogy: The Chef's Mise en Place
* The Recipe (LLM): The chef knows how to cook generally, but specific orders vary.
* The Pantry (Persistent Database): This is where all raw ingredients are stored long-term. It is large and organized, but running to the pantry for every single pinch of salt is slow.
* The Mise en Place (VolatileMemoryStore): This is the cutting board and small bowls next to the stove. The chef preps the specific ingredients needed for the current set of orders (the session). They are right at hand (RAM speed), organized by dish (Collections), and discarded at the end of the shift (Volatile).
* The Chef's Intuition (Cosine Similarity): If the chef needs "garlic," they don't read the label on every clove. They recognize the shape, smell, and texture (semantic vector) instantly.
If the chef relies solely on the pantry (Database), the service is too slow. If they rely solely on the mise en place (VolatileMemoryStore), they cannot remember ingredients from yesterday's prep. A robust AI application uses VolatileMemoryStore for immediate context and persistent storage for long-term recall, orchestrated seamlessly by the ITextMemory interface.
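The pantry-plus-mise-en-place pattern boils down to a tiered lookup. The toy sketch below uses plain dictionaries as stand-ins for the volatile and persistent tiers (no embeddings, invented keys) purely to show the control flow: check the fast tier, fall back to the slow one, and cache the result on the way back.

```csharp
using System;
using System.Collections.Generic;

class TieredMemory
{
    // Fast tier: stands in for VolatileMemoryStore (RAM, wiped on restart).
    private readonly Dictionary<string, string> _volatile = new();
    // Slow tier: stands in for a persistent vector database.
    private readonly Dictionary<string, string> _persistent;

    public TieredMemory(Dictionary<string, string> persistent) => _persistent = persistent;

    public string? Retrieve(string key)
    {
        // 1. Mise en place: already prepped next to the stove?
        if (_volatile.TryGetValue(key, out var hot)) return hot;

        // 2. Pantry: slow trip to long-term storage, then prep it
        //    for the rest of the session.
        if (_persistent.TryGetValue(key, out var cold))
        {
            _volatile[key] = cold;
            return cold;
        }
        return null;
    }
}

class Kitchen
{
    static void Main()
    {
        var pantry = new Dictionary<string, string> { ["milk_reminder"] = "Buy milk near the grocery store" };
        var memory = new TieredMemory(pantry);

        Console.WriteLine(memory.Retrieve("milk_reminder")); // miss volatile, hit persistent, cache
        Console.WriteLine(memory.Retrieve("milk_reminder")); // now served from the fast tier
    }
}
```

A real implementation would call SearchAsync against both tiers rather than doing key lookups, but the control flow is identical.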
Edge Cases
1. Empty Collections: Calling RetrieveAsync on a non-existent or empty collection returns null. The application logic must handle this gracefully, perhaps by triggering a search in a broader, persistent store or asking the user for clarification.
2. Duplicate Keys: SaveAsync will overwrite existing entries if the key matches within the same collection. This is idempotent behavior, useful for updating memories (e.g., updating a user's current location).
3. Relevance Thresholds: Setting minRelevanceScore too high (e.g., 0.9) might result in no results even if relevant memories exist, because natural language is fuzzy. Setting it too low (e.g., 0.1) introduces noise, confusing the LLM with irrelevant context.
4. Embedding Model Drift: If you switch embedding models (e.g., from OpenAI to a local model), the vector dimensions and distribution change. VolatileMemoryStore data from the old model becomes useless because cosine similarity calculations between vectors from different models are invalid. You must clear and re-populate the store.
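The threshold pitfall in point 3 can be demonstrated with plain numbers (invented scores, not real embedding output): the same candidate set survives or vanishes depending on minRelevanceScore.

```csharp
using System;
using System.Linq;

class ThresholdDemo
{
    // Keep only results at or above the threshold, best first --
    // the same filtering rule the memory store applies.
    static double[] Filter(double[] scores, double minRelevance) =>
        scores.Where(s => s >= minRelevance).OrderByDescending(s => s).ToArray();

    static void Main()
    {
        double[] scores = { 0.82, 0.55, 0.31 }; // plausible cosine scores

        Console.WriteLine(Filter(scores, 0.9).Length); // 0: too strict, the relevant 0.82 is lost
        Console.WriteLine(Filter(scores, 0.6).Length); // 1: only the strong match survives
        Console.WriteLine(Filter(scores, 0.1).Length); // 3: noise included
    }
}
```

Reasonable production thresholds are usually tuned empirically per embedding model, because score distributions differ between models.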
Summary
The ITextMemory interface and VolatileMemoryStore provide the cognitive "working memory" for AI agents. By leveraging C# interfaces, we maintain architectural flexibility, allowing us to prototype rapidly with in-memory storage and scale to persistent solutions later. Understanding the vector-based nature of this memory is essential; it is not a simple lookup table but a semantic search engine that allows agents to retrieve information based on meaning rather than syntax. This capability transforms a stateless LLM into a stateful, context-aware agent capable of complex, multi-turn interactions.
Basic Code Example
// The memory abstractions are marked experimental in current Semantic Kernel
// releases; the exact suppression IDs may vary by SK version.
#pragma warning disable SKEXP0001, SKEXP0010, SKEXP0050
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using System;
using System.Threading.Tasks;

namespace TextMemoryBasics
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // 1. Initialize the Semantic Kernel
            // The kernel is the orchestrator of AI plugins and services.
            // An embedding generator MUST be registered here, or saving
            // memories will fail at runtime (see the pitfalls below).
            var builder = Kernel.CreateBuilder();
            builder.AddOpenAITextEmbeddingGeneration(
                modelId: "text-embedding-ada-002",
                apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
            var kernel = builder.Build();

            // 2. Initialize the VolatileMemoryStore
            // This is an in-memory vector store. It is ephemeral (lost on app
            // restart), optimized for speed, and implements IMemoryStore.
            var memoryStore = new VolatileMemoryStore();

            // 3. Create a SemanticTextMemory instance
            // This wrapper pairs the store with an embedding generator and
            // handles converting text to vectors on save and search.
            var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
            var memory = new SemanticTextMemory(memoryStore, embeddingService);

            // Define a collection name (like a table in a relational DB)
            const string collectionName = "UserPreferences";

            Console.WriteLine("--- Storing Memories ---");

            // 4. Store specific text memories with embeddings
            // We are creating a semantic relationship between a unique ID and the text.
            // The embedding service generates the vector behind the scenes.
            await memory.SaveInformationAsync(
                collection: collectionName,
                id: "pref1",
                text: "User prefers dark mode UI and compact layouts.");

            await memory.SaveInformationAsync(
                collection: collectionName,
                id: "pref2",
                text: "User likes spicy food and Italian cuisine.");

            await memory.SaveInformationAsync(
                collection: collectionName,
                id: "pref3",
                text: "User enjoys reading sci-fi novels on weekends.");

            Console.WriteLine("Memories saved successfully.\n");

            // 5. Perform a Semantic Search
            // We want to find relevant memories based on meaning, not just keywords.
            // "What food does the user like?" is semantically close to "spicy Italian".
            Console.WriteLine("--- Searching for 'Favorite cuisine' ---");
            var searchResults = memory.SearchAsync(
                collection: collectionName,
                query: "What food does the user like?",
                limit: 2,                // Top 2 results
                minRelevanceScore: 0.0); // Keep everything for demonstration

            await foreach (var result in searchResults)
            {
                Console.WriteLine($"ID: {result.Metadata.Id}");
                Console.WriteLine($"Text: {result.Metadata.Text}");
                Console.WriteLine($"Relevance Score: {result.Relevance:F4}");
                Console.WriteLine("-----------------------------");
            }

            // 6. Retrieve a specific memory by ID
            // Useful when you know the exact ID but need the full text/context.
            Console.WriteLine("\n--- Retrieving specific memory by ID 'pref3' ---");
            var specificMemory = await memory.GetAsync(collectionName, "pref3");
            if (specificMemory != null)
            {
                Console.WriteLine($"Retrieved: {specificMemory.Metadata.Text}");
            }

            // 7. List all memories in a collection
            // Useful for debugging or dumping state. Note: an empty query is a
            // hack; some embedding services reject empty input (see note below).
            Console.WriteLine("\n--- Listing all memories in collection ---");
            await foreach (var item in memory.SearchAsync(collectionName, "", limit: 10, minRelevanceScore: 0.0))
            {
                Console.WriteLine($"- {item.Metadata.Id}: {item.Metadata.Text}");
            }
        }
    }
}
Detailed Line-by-Line Explanation
* Kernel.CreateBuilder()...Build(): We instantiate the Semantic Kernel. Even though we aren't calling a chat model in this snippet, a text-embedding generation service must be registered on the builder; without one, SemanticTextMemory has no way to convert text into vectors, and saving will fail at runtime.
* new VolatileMemoryStore(): This creates the storage backend. It is a simple dictionary-backed store that holds vectors in RAM. It is strictly for prototyping; if the application crashes or restarts, all data is wiped.
* new SemanticTextMemory(...): This is the facade. It pairs the store with the embedding generator and abstracts away the complexity of the vector math. When you call SaveInformationAsync, this class:
1. Calls the embedding service (configured in the kernel) to convert the string text into a vector (array of floats).
2. Passes that vector to the VolatileMemoryStore along with metadata.
* SaveInformationAsync: This method performs two distinct operations:
1. Embedding Generation: It sends the text to the embedding model.
2. Upsert: It stores the resulting vector in the collectionName under the unique id.
* Collections: Think of a collection as a namespace or a table. You can store different types of data (e.g., "ProductCatalog" vs "UserPreferences") in separate collections to keep searches isolated.
* SearchAsync: This is the core functionality of TextMemory. Unlike a SQL LIKE query, this performs vector similarity search (typically Cosine Similarity).
* Query: "What food does the user like?"
* Mechanism: The query is converted into a vector. The system calculates the angle between the query vector and every stored vector in the collection. Smaller angles (closer to 1.0 relevance) indicate semantic closeness.
* limit: 2: We only want the top 2 most relevant results.
* minRelevanceScore: A threshold. If a result has a relevance score below this (e.g., 0.6), it is discarded. In this example, we set it to 0.0 to see all results for educational purposes.
* GetAsync: A direct key-value lookup. It does not perform vector search. It retrieves the stored text and metadata using the specific ID. This is O(1) complexity.
* SearchAsync with empty query: Passing an empty string to SearchAsync can sometimes act as a "dump" mechanism depending on the implementation, but strictly speaking, SearchAsync is designed for vector similarity. To list all items reliably, you would typically iterate the store directly, but for this simple example, we use the search interface.
Common Mistakes

1. Missing Embedding Service Configuration:
   * The Mistake: Creating SemanticTextMemory without registering an embedding generator in the Kernel (e.g., AzureOpenAITextEmbeddingGeneration).
   * The Result: The code will throw a runtime exception when SaveInformationAsync is called because it cannot convert text to a vector.
   * The Fix: Always ensure builder.Services.Add...TextEmbeddingGeneration() is called before building the kernel.

2. Confusing GetAsync with SearchAsync:
   * The Mistake: Using GetAsync to find "related" items.
   * The Result: GetAsync only returns data if you know the exact ID. It has zero semantic understanding.
   * The Fix: Use SearchAsync for intent-based retrieval (e.g., "Find relevant documents") and GetAsync for state retrieval (e.g., "Get user ID 123").

3. Ephemeral Storage Expectations:
   * The Mistake: Using VolatileMemoryStore in a production environment expecting data persistence.
   * The Result: Data disappears on every deployment or server restart, leading to inconsistent user experiences.
   * The Fix: VolatileMemoryStore is strictly for unit tests and prototyping. For production, use QdrantMemoryStore, PineconeMemoryStore, or AzureCosmosDBNoSqlMemoryStore.
Visualizing the Memory Flow
The following diagram illustrates how data flows from text input to vector storage and back during a search operation.
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.