Chapter 14: Transactions and Rollbacks

Theoretical Foundations

In modern AI applications, data is no longer monolithic. It lives in two distinct realms: the structured, transactional world of relational databases (like SQL Server or PostgreSQL) and the unstructured, high-dimensional world of vector embeddings (like Pinecone, Qdrant, or Weaviate). When we build systems that rely on Retrieval-Augmented Generation (RAG) or maintain persistent conversation memory, we often need to update both stores simultaneously. Consider a user updating a support ticket and attaching a new internal summary document. The relational record for the ticket must be updated, and the vector embedding of the summary must be indexed. If the relational update succeeds but the vector indexing fails, the system is left in an inconsistent state: the database thinks the document is attached, but the vector store cannot retrieve it. This is the classic atomicity problem, but it is complicated by the fact that these two systems are often completely different technologies with no native support for a distributed transaction.

To solve this, we rely on the Unit of Work (UoW) pattern, a concept that coordinates multiple changes across different data sources as a single, logical unit. While the UoW pattern is often associated with Entity Framework Core’s DbContext, its application in AI-driven architectures requires a more sophisticated approach, particularly when dealing with the eventual consistency of vector databases. This section explores the theoretical foundations of managing these distributed data operations, drawing parallels to the Repository Pattern introduced in Chapter 4, "Data Access Abstraction," to illustrate how we abstract away the complexity of data persistence.

The Unit of Work: Orchestrating Distributed Changes

At its core, the Unit of Work pattern is about maintaining a list of objects affected by a business transaction and coordinating the writing out of changes and the resolution of concurrency problems. In a traditional relational application using EF Core, the DbContext itself acts as a Unit of Work. It tracks changes to entities in memory and commits them to the database in a single transaction when SaveChangesAsync() is called.

However, in an AI application, the "work" extends beyond the relational database. Imagine a scenario where an AI agent processes a user request to "summarize this document and store it in the knowledge base." The agent performs two distinct operations:

It generates a vector embedding of the document using an AI model.
It creates a relational record linking that vector ID to the user's account.

If we treat these as two separate, independent operations, we risk data corruption. The Unit of Work pattern provides the scaffolding to treat these as a single logical operation.

Analogy: The Restaurant Kitchen Think of a high-end restaurant kitchen. The head chef (the orchestrator) receives a complex order (the business transaction). The order requires a steak (relational data) and a complex sauce (vector data).

The Steak: Prepared by the grill station (EF Core/SQL).
The Sauce: Prepared by the saucier (Vector Database).

If the steak is perfectly cooked but the sauce burns, the dish cannot be served. In a traditional kitchen, the head chef might hold the steak under a heat lamp while fixing the sauce. In a distributed system, we cannot simply "hold" the steak; we must decide whether to proceed or discard. The Unit of Work pattern acts as the head chef, ensuring that either both components are ready for plating (committed) or neither is (rolled back).

The Challenge of Distributed Transactions

The primary difficulty in AI applications is that vector databases and relational databases rarely support Two-Phase Commit (2PC), the standard protocol for distributed transactions. 2PC requires a coordinator to ask all participants if they can commit, and then instructing them to do so. Most modern NoSQL and vector stores are optimized for speed and availability (AP systems in the CAP theorem) and do not implement the blocking nature of 2PC.

Therefore, we cannot rely on a database-level transaction to guarantee atomicity across the relational and vector stores. We must implement application-level atomicity.

Visualizing the Data Flow The following diagram illustrates the flow of data within a Unit of Work that spans a relational database and a vector store.

This diagram illustrates the application-level atomicity of a Unit of Work, showing how data flows and coordinates between a relational database and a vector store to ensure consistent state management.

Strategies for Atomicity: Compensating Transactions

Since we cannot easily roll back a vector database insertion once it is committed (or at least, it is expensive to do so), we must adopt strategies that mimic atomicity. The most robust approach for non-ACID distributed systems is the implementation of Compensating Transactions.

A compensating transaction is an operation that undoes the effects of a previous operation. It is the "rollback" logic that we must write manually when the database cannot provide it automatically.

The Workflow:

Attempt Phase: Execute operations against the relational DB and the Vector Store.
Validation Phase: Check for errors.
Compensation Phase (if failure occurs): Execute inverse operations to restore the system to its original state.

Analogy: The Travel Booking System Booking a trip involves multiple vendors: a flight, a hotel, and a rental car. If the flight is booked successfully, but the hotel is fully booked, you cannot simply "roll back" the flight reservation in the same transaction scope. Instead, the booking system must issue a cancellation (a compensating transaction) for the flight to return the user to a consistent state.

In C#, we can model this using the Command Pattern combined with the Unit of Work. Each operation (e.g., AddDocumentCommand) knows how to execute itself and how to undo itself.

// Conceptual representation of a command with compensation logic
public interface IDistributedCommand
{
    Task ExecuteAsync();
    Task CompensateAsync(); // The inverse operation
}

// The Unit of Work coordinates these commands
public class DistributedUnitOfWork
{
    private readonly Stack<IDistributedCommand> _executedCommands = new();

    public async Task CommitAsync()
    {
        try
        {
            // In a real scenario, we might batch these, 
            // but for atomicity, we execute sequentially 
            // and track success.
            foreach (var command in _executedCommands)
            {
                await command.ExecuteAsync();
            }
        }
        catch (Exception ex)
        {
            // If any command fails, trigger compensation
            // for all previously successful commands.
            await RollbackAsync();
            throw new TransactionFailedException("Distributed transaction failed. Changes rolled back.", ex);
        }
    }

    private async Task RollbackAsync()
    {
        while (_executedCommands.Count > 0)
        {
            var command = _executedCommands.Pop();
            try
            {
                await command.CompensateAsync();
            }
            catch (Exception compEx)
            {
                // Log critical error: System is in an inconsistent state.
                // This requires manual intervention or a background reconciliation job.
            }
        }
    }

    public void Enqueue(IDistributedCommand command)
    {
        _executedCommands.Push(command);
    }
}

The Role of the Repository Pattern in AI Transactions

In Chapter 4, we discussed the Repository Pattern as a way to abstract the data layer, decoupling the business logic from the specific data access technology. In the context of AI transactions, the Repository acts as the boundary for the Unit of Work.

When we design repositories for AI features, we must be aware that they are not just wrappers around SQL queries. A ConversationMemoryRepository might abstract two distinct stores:

Relational Store: Stores metadata, timestamps, and user IDs.
Vector Store: Stores the embeddings of the conversation turns.

When we call repository.AddConversationAsync(), the repository should not immediately commit to both stores. Instead, it should register the changes with the active Unit of Work. This allows the business layer to orchestrate multiple repository calls (e.g., adding a message and updating a user's credit balance) and commit them all at once.

Why this matters for AI:

Swapping Providers: Just as interfaces allow us to swap between OpenAI and Local Llama for embeddings, the Repository pattern allows us to swap between Pinecone and Qdrant for vector storage without changing the transaction logic.
Consistency: It ensures that the relational data (which tracks what was stored) matches the vector data (which stores how it is represented semantically).

Edge Cases and Architectural Implications

When implementing these patterns in C# for AI applications, several edge cases must be considered:

The "Zombie" Vector: If the relational commit fails but the vector insertion succeeds, the vector exists in the store but is orphaned. Without a relational record pointing to it, it is effectively lost. The compensating transaction must delete this vector.
- Solution: Design the vector store to include metadata that allows for a "garbage collection" process. For example, store a transaction_id in the vector metadata. A background job can periodically scan for vectors whose transaction_id does not exist in the relational database.
Idempotency: Network timeouts can cause retries. If we retry a commit, we must ensure we don't insert the same data twice.
- Solution: Use unique identifiers (UUIDs) for vector points and relational entities. Before inserting, check for existence. This is a trade-off between performance (extra read) and safety.
Eventual Consistency in Vector Search: Unlike SQL, vector indexes are often updated asynchronously. Even if the Insert command returns successfully, the vector might not be immediately searchable.
- Solution: The Unit of Work should account for this latency. For user-facing features, we might need to read-from-write buffers (caches) immediately after the transaction commits, rather than relying solely on the vector store's query interface.

Theoretical Foundations

The integration of AI features into traditional applications introduces a distributed data problem. Because standard ACID transactions do not span relational and vector databases, we must rely on application-level patterns:

Unit of Work: To group disparate operations into a single logical transaction.
Compensating Transactions: To provide rollback capabilities in the absence of two-phase commit.
Repository Pattern: To abstract the complexity and provide a clean interface for the business logic.

By treating the AI vector store as a first-class citizen in the data architecture—subject to the same rigor as the relational database—we ensure that our AI features are robust, reliable, and maintainable. This theoretical foundation sets the stage for the practical implementation of these patterns in C#, which we will explore in the subsequent sections.

Basic Code Example

Here is a self-contained example demonstrating the Unit of Work pattern to manage atomicity between a relational database (EF Core) and a vector store (simulated), including a rollback mechanism.

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.ChangeTracking;
using System.Collections.Concurrent;
using System.Text.Json;

// 1. Domain Entities
public record Guideline(string Id, string Content, List<float> Embedding);
public record Conversation(string Id, string UserId, string Message, DateTime Timestamp);

// 2. Vector Store Simulation (Mocking a Vector DB like Pinecone or Qdrant)
// In a real scenario, this would be an external HTTP client.
public class VectorStore
{
    private readonly ConcurrentDictionary<string, List<float>> _vectors = new();

    public Task UpsertAsync(Guideline guideline)
    {
        // Simulate network latency
        Thread.Sleep(50); 
        _vectors[guideline.Id] = guideline.Embedding;
        Console.WriteLine($"[Vector Store] Vector upserted for ID: {guideline.Id}");
        return Task.CompletedTask;
    }

    public Task DeleteAsync(string id)
    {
        Thread.Sleep(50);
        _vectors.TryRemove(id, out _);
        Console.WriteLine($"[Vector Store] Vector deleted for ID: {id}");
        return Task.CompletedTask;
    }

    // Simulates a check to see if the vector exists (for rollback verification)
    public bool Exists(string id) => _vectors.ContainsKey(id);
}

// 3. Relational Context (EF Core)
public class AppDbContext : DbContext
{
    public DbSet<Conversation> Conversations => Set<Conversation>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseInMemoryDatabase("ChatDb");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Conversation>().HasKey(c => c.Id);
    }
}

// 4. Unit of Work Implementation
// This coordinates the changes across the relational DB and Vector Store.
public class ChatUnitOfWork : IDisposable
{
    private readonly AppDbContext _relationalContext;
    private readonly VectorStore _vectorStore;

    // Transaction Log: Tracks operations performed during this UoW session
    // to enable rollback (compensating transactions).
    private readonly List<Func<Task>> _compensationActions = new();

    public ChatUnitOfWork(AppDbContext relationalContext, VectorStore vectorStore)
    {
        _relationalContext = relationalContext;
        _vectorStore = vectorStore;
    }

    public async Task<Guid> AddChatEntryAsync(string userId, string message, List<float> embedding)
    {
        var transactionId = Guid.NewGuid();
        var guidelineId = $"guideline-{transactionId}";

        try
        {
            // Step A: Relational Operation
            var conversation = new Conversation(
                Id: transactionId.ToString(),
                UserId: userId,
                Message: message,
                Timestamp: DateTime.UtcNow
            );

            await _relationalContext.Conversations.AddAsync(conversation);
            Console.WriteLine($"[Relational DB] Queued insert for Conversation: {conversation.Id}");

            // Register compensation: Delete the conversation if later steps fail
            _compensationActions.Add(async () => {
                var existing = await _relationalContext.Conversations.FindAsync(transactionId.ToString());
                if (existing != null)
                {
                    _relationalContext.Conversations.Remove(existing);
                    await _relationalContext.SaveChangesAsync(); // Save immediately for compensation
                    Console.WriteLine($"[Compensation] Rolled back Relational Insert: {transactionId}");
                }
            });

            // Step B: Vector Store Operation
            var guideline = new Guideline(guidelineId, message, embedding);
            await _vectorStore.UpsertAsync(guideline);

            // Register compensation: Delete the vector if later steps fail
            _compensationActions.Add(async () => {
                if (_vectorStore.Exists(guidelineId))
                {
                    await _vectorStore.DeleteAsync(guidelineId);
                    Console.WriteLine($"[Compensation] Rolled back Vector Upsert: {guidelineId}");
                }
            });

            // Step C: Commit Phase
            // In a real distributed system, this is where the 2PC (Two-Phase Commit) 
            // would attempt to lock resources. Here, we simulate the commit.
            await _relationalContext.SaveChangesAsync();

            Console.WriteLine(">>> TRANSACTION COMMITTED SUCCESSFULLY <<<");
            return transactionId;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"!!! ERROR DETECTED: {ex.Message}. Initiating Rollback... !!!");
            await RollbackAsync();
            throw; // Re-throw to notify the caller of failure
        }
    }

    private async Task RollbackAsync()
    {
        // Execute compensation actions in Reverse Order (LIFO)
        // This ensures we undo operations in the opposite order they were created.
        for (int i = _compensationActions.Count - 1; i >= 0; i--)
        {
            try
            {
                await _compensationActions[i].Invoke();
            }
            catch (Exception ex)
            {
                // Log critical error: Manual intervention might be required
                Console.WriteLine($"!!! CRITICAL: Compensation action {i} failed: {ex.Message} !!!");
            }
        }
        _compensationActions.Clear();
    }

    public void Dispose()
    {
        _relationalContext.Dispose();
    }
}

// 5. Main Execution
public class Program
{
    public static async Task Main()
    {
        // Setup
        using var relationalContext = new AppDbContext();
        await relationalContext.Database.EnsureCreatedAsync();

        var vectorStore = new VectorStore();
        var uow = new ChatUnitOfWork(relationalContext, vectorStore);

        // Scenario 1: Happy Path
        Console.WriteLine("--- SCENARIO 1: SUCCESSFUL TRANSACTION ---");
        var embedding1 = new List<float> { 0.1f, 0.9f, 0.5f }; // Mock vector
        await uow.AddChatEntryAsync("user-123", "Hello, AI!", embedding1);

        // Scenario 2: Simulated Failure (Vector Store throws exception)
        // We create a new UoW to isolate the failure test
        using var uowFail = new ChatUnitOfWork(new AppDbContext(), vectorStore);
        Console.WriteLine("\n--- SCENARIO 2: FAILED TRANSACTION (Simulated Vector DB Crash) ---");

        // Let's hack the vector store to throw an exception on the next call
        // (In a real app, this would be a network timeout or constraint violation)
        var originalUpsert = vectorStore.UpsertAsync;
        vectorStore.UpsertAsync = (g) => throw new InvalidOperationException("Vector Store Connection Lost");

        try
        {
            var embedding2 = new List<float> { 0.2f, 0.8f, 0.4f };
            await uowFail.AddChatEntryAsync("user-456", "This will fail", embedding2);
        }
        catch (Exception)
        {
            Console.WriteLine("Main caught the exception as expected.");
        }

        // Verification
        Console.WriteLine("\n--- VERIFICATION ---");
        Console.WriteLine($"Total Conversations in DB: {await relationalContext.Conversations.CountAsync()}");
        Console.WriteLine($"Total Vectors in Store: {vectorStore.Exists("guideline-" + Guid.Empty) ? 1 : 0} (Check specific IDs for accuracy)");
    }
}

Explanation

This example solves the problem of maintaining data consistency across two distinct storage systems: a relational database (via EF Core) and a vector database. When an AI application generates a response and stores it, it often needs to save the conversation log (relational) and the semantic embedding (vector) simultaneously. If one succeeds and the other fails, the system enters an inconsistent state.

Here is the step-by-step breakdown of the code:

Domain Entities (Guideline, Conversation):
- These are simple immutable records representing our data.
- Guideline contains the Embedding (a list of floats), which is the data format required by vector databases.
- Conversation represents the chat log stored in the relational database.
Vector Store Simulation:
- Since we cannot connect to a real vector database in a simple snippet, VectorStore simulates one using an in-memory dictionary.
- It mimics asynchronous network calls (Thread.Sleep) to represent latency.
- Crucially, it tracks state. If we insert a vector, it stays there until removed.
Relational Context (AppDbContext):
- Standard EF Core setup using an InMemoryDatabase for portability.
- In a production environment, this would likely be SQL Server, PostgreSQL, or Azure SQL.
The Unit of Work (ChatUnitOfWork):
- Purpose: This class acts as the coordinator. It encapsulates the business logic and ensures that either all operations complete successfully or none do.
- Compensation Actions (_compensationActions):
  - This is the core of the rollback strategy for distributed systems. Since standard ACID transactions (like SQL Server transactions) do not span across different database technologies (e.g., SQL DB and a separate Vector DB service), we implement Compensating Transactions.
  - Every time we perform a "Write" operation (Insert to DB, Upsert to Vector), we immediately register a lambda function in the list that knows how to undo that specific operation.
  - Ordering: The list is treated as a stack (LIFO - Last In, First Out). If we fail at step 3, we undo step 3, then step 2, then step 1. This prevents foreign key violations or orphaned data.
- The Try-Catch Block:
  - The entire operation is wrapped in a try block.
  - If any step fails (e.g., the Vector Store throws a network exception), execution jumps to catch.
  - The catch block invokes RollbackAsync(), which iterates through the registered compensation actions in reverse order.
Execution (Main):
- Scenario 1: A standard flow where both the relational insert and vector upsert succeed. The transaction commits, and the compensation list is cleared.
- Scenario 2: A failure flow. We intentionally sabotage the UpsertAsync method to throw an InvalidOperationException.
- Result: The AddChatEntryAsync method catches the error, triggers the rollback (deleting the conversation entry that was added to the EF Core context), and re-throws the exception. The final verification shows that the database remains clean (or consistent with the failure point).

Common Pitfalls

Memory-Only Rollbacks: A common mistake when mocking Unit of Work is forgetting that EF Core's DbContext tracks entities in memory. If you call context.Conversations.Add(...) and an exception occurs, simply "not saving" is not a rollback—the entity is still sitting in the context's memory. In this example, the compensation action explicitly removes the entity from the context and calls SaveChangesAsync to ensure the database is actually reverted.
Order of Compensation: When registering undo actions, the order matters immensely. If you delete a parent record in step 1 and add a child record in step 2, your compensation list must undo step 2 (delete child) before step 1 (re-add parent). Failing to reverse the order will result in foreign key constraint violations during the rollback process.
Idempotency in Compensation: The compensation actions in this example check if the data exists before trying to delete it (e.g., if (_vectorStore.Exists(guidelineId))). This makes the rollback idempotent. If a rollback fails halfway through and the system retries the rollback later, it won't crash trying to delete data that is already gone.
Long-Running Transactions: The Unit of Work pattern holds resources open while processing. In this example, the VectorStore simulation uses Thread.Sleep. In a real application, holding a database connection open while waiting for an external HTTP call to a vector database is dangerous. It can exhaust connection pools. A better architectural pattern for distributed systems is Saga Pattern, where each step is a separate transaction, and failures trigger asynchronous "compensating events" rather than a synchronous rollback block.

The diagram illustrates the Saga Pattern, where a distributed transaction is broken into a series of independent steps, and if a failure occurs, compensating events are triggered to reverse previous steps without blocking the system.

The chapter continues with advanced code, exercises and solutions with analysis, you can find them on the ebook on Leanpub.com or Amazon

Loading knowledge check...

Code License: All code examples are released under the MIT License. Github repo.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.