Chapter 8: Storing Embeddings (List) in SQL Server
Theoretical Foundations
At the heart of every modern AI application lies the challenge of understanding meaning, not just keywords. When we ask a question, we rarely use the exact words found in the documentation we need. We rely on context, synonyms, and conceptual relationships. To bridge this gap, AI models convert text, images, or audio into dense numerical representations known as embeddings. These are arrays of floating-point numbers (typically 384, 768, or 1536 dimensions) that capture the semantic essence of the data. Storing and querying these high-dimensional vectors efficiently is the cornerstone of Retrieval-Augmented Generation (RAG) systems. This section lays the theoretical groundwork for how SQL Server, specifically through its vector data type, transforms from a traditional relational database into a high-performance vector store, enabling complex similarity searches that power intelligent data access.
The Nature of Embeddings: A Mathematical Compass
Imagine you are navigating a vast, multi-dimensional library. In a traditional library, books are organized by the Dewey Decimal System—a strict, categorical hierarchy. If you look for "quantum physics," you go to the 500s. If you look for "quantum computing," you might still be in the 500s, but the physical distance between books on different shelves is arbitrary; it doesn't reflect conceptual proximity.
In an embedding space, the library is reorganized. Every document is a point in a high-dimensional room. Points representing documents about "quantum physics" and "quantum computing" are placed close together, while a document about "Victorian poetry" is placed far away. The distance between these points is not arbitrary; it is calculated mathematically to represent semantic similarity.
An embedding is essentially a coordinate in this high-dimensional space. In C#, this is represented as List<float> or float[]. For example, a sentence might be converted into a vector like [0.12f, -0.45f, 0.88f, ...]. The challenge is that these vectors are not human-readable. We cannot look at them and understand the meaning. We must rely on algorithms to calculate the "distance" between these vectors.
Distance Metrics: The Rules of Measurement
To determine how similar two pieces of data are, we must define what "distance" means in this high-dimensional space. This is not as straightforward as measuring the distance between two cities on a map. In the context of AI and vector search, two primary metrics dominate: Euclidean distance and Cosine similarity.
Euclidean Distance is the straight-line distance between two points in space. It is the literal interpretation of distance. Imagine two points in a 2D plane; Euclidean distance is the length of the line segment connecting them. In high dimensions, it measures the magnitude of the difference between two vectors. If two vectors are identical, their Euclidean distance is 0. As they become more dissimilar, the distance increases. This metric is sensitive to the magnitude (length) of the vectors. A document that is very long and detailed might have a vector with a larger magnitude than a short summary, even if they discuss the same topic. Euclidean distance would consider them further apart simply due to this magnitude difference.
Cosine Similarity, conversely, measures the cosine of the angle between two vectors. It focuses on orientation rather than magnitude. Imagine two arrows pointing in the same direction; the angle between them is 0 degrees, and the cosine is 1 (maximum similarity). If they point in opposite directions (180 degrees), the cosine is -1 (maximum dissimilarity). If they are perpendicular (90 degrees), the cosine is 0 (no similarity). This metric is ideal for text embeddings because it ignores the length of the document. A short summary and a long detailed article on the same topic will have vectors pointing in roughly the same direction, resulting in a high cosine similarity score, even if their Euclidean distance is large.
In C# applications, when we retrieve a list of candidate documents for a user query, we calculate these distances to rank the results. The lower the Euclidean distance or the higher the Cosine similarity, the more relevant the document is deemed to be.
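Both metrics are easy to compute by hand. The sketch below (plain C#, no external libraries; the sample vectors are made up for illustration) shows each calculation and why cosine similarity ignores magnitude: doubling a vector leaves its cosine score at 1.0 while its Euclidean distance from the original grows.

```csharp
using System;
using System.Linq;

public static class DistanceMetrics
{
    // Euclidean (L2) distance: the straight-line length between two points.
    public static double Euclidean(float[] a, float[] b) =>
        Math.Sqrt(a.Zip(b, (x, y) => (double)(x - y) * (x - y)).Sum());

    // Cosine similarity: dot product divided by the product of magnitudes.
    // 1 = same direction, 0 = orthogonal, -1 = opposite direction.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot  = a.Zip(b, (x, y) => (double)x * y).Sum();
        double magA = Math.Sqrt(a.Sum(x => (double)x * x));
        double magB = Math.Sqrt(b.Sum(x => (double)x * x));
        return dot / (magA * magB);
    }
}
```

For example, with summary = [0.2, 0.4, 0.4] and article = [0.4, 0.8, 0.8] (the same direction at twice the magnitude), `CosineSimilarity` returns 1.0 while `Euclidean` returns 0.6 — exactly the "long document vs. short summary" effect described above.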
The Challenge of High-Dimensional Search
Searching for similar vectors in a database is fundamentally different from searching for a specific value in a column. A standard SQL query like WHERE Title = 'Quantum Physics' looks for an exact match. Vector search, however, looks for the "nearest neighbors" in a multi-dimensional space.
The computational complexity of this search is daunting. If you have a million documents, each with a 768-dimensional vector, comparing a query vector against every single document vector involves millions of floating-point multiplications and additions. A naive linear scan (calculating the distance to every vector in the table) is prohibitively slow for real-time applications like chatbots or search engines.
This is where indexing becomes critical. Just as a B-tree index allows a database to find a specific value in logarithmic time rather than linear time, a vector index allows us to find the nearest neighbors without checking every vector. However, traditional B-tree indexes are ineffective for multi-dimensional data because they are designed for one-dimensional sorting. We need specialized data structures designed for vector space.
Vector Indexing Strategies: Navigating the Library
To speed up similarity searches, databases employ approximate nearest neighbor (ANN) algorithms. These algorithms trade a small amount of accuracy for a massive gain in speed. Instead of finding the absolute closest vector, they find vectors that are "close enough."
One common approach is the Hierarchical Navigable Small World (HNSW) graph. Imagine the library again. Instead of a single index card system, HNSW creates a multi-layered map. The top layer contains a sparse network of "landmark" books that are very far apart conceptually. The next layer contains more landmarks, slightly closer together, and so on, down to the bottom layer where every book is a node.
To find a book similar to your query, you start at the top layer, find the landmark closest to your query, and move to the next layer down, always moving towards the direction of your query. This allows you to traverse the graph quickly, skipping vast sections of the library, and zeroing in on the most relevant books at the bottom layer. HNSW is efficient and accurate but can be memory-intensive.
Another approach, often used in disk-based storage, is Inverted File Indexing (IVF). This is similar to the index at the back of a textbook. The database partitions the vector space into clusters (like chapters). Each cluster has a centroid (a representative vector). When a query comes in, the system first finds the closest cluster centroids (scanning only a few centroids instead of all vectors). Then, it searches only the vectors within those specific clusters. This drastically reduces the search space.
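The clustering idea behind IVF fits in a few dozen lines. The sketch below is illustrative only — not SQL Server's implementation — and assumes the cluster centroids are already known (real systems learn them with k-means). Build time buckets every vector under its nearest centroid; query time scans only the single closest cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IvfSketch
{
    // Squared Euclidean distance (ranking by it is equivalent to ranking by L2).
    static double Dist2(float[] a, float[] b) =>
        a.Zip(b, (x, y) => (double)(x - y) * (x - y)).Sum();

    // Build the "inverted file": centroid index -> vectors assigned to it.
    public static Dictionary<int, List<float[]>> Build(float[][] centroids, IEnumerable<float[]> vectors)
    {
        var index = centroids.Select((_, i) => i).ToDictionary(i => i, _ => new List<float[]>());
        foreach (var v in vectors)
        {
            int nearest = Enumerable.Range(0, centroids.Length)
                                    .OrderBy(i => Dist2(centroids[i], v))
                                    .First();
            index[nearest].Add(v);
        }
        return index;
    }

    // Search: pick the closest centroid, then scan only that cluster's vectors.
    public static float[] Search(float[][] centroids, Dictionary<int, List<float[]>> index, float[] query)
    {
        int cluster = Enumerable.Range(0, centroids.Length)
                                .OrderBy(i => Dist2(centroids[i], query))
                                .First();
        return index[cluster].OrderBy(v => Dist2(v, query)).First();
    }
}
```

A real IVF index probes several nearby clusters rather than one, reducing the chance of missing the true nearest neighbor when it sits just over a cluster boundary — exactly the accuracy/speed trade-off described above.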
SQL Server's vector Data Type and Vector Indexes
Traditionally, storing List<float> in a relational database was cumbersome. You could store it as a JSON string, a binary blob, or a separate table with a row for each float. However, these methods made it difficult to perform mathematical operations efficiently. You couldn't easily calculate the distance between two vectors using SQL, and you certainly couldn't index them for fast search.
Azure SQL Database and SQL Server 2025 introduced the vector data type to solve this. The vector type is a first-class citizen in the database engine, optimized for storing dense embeddings. It allows you to define a column as VECTOR(1536), for example, to store a 1536-dimensional vector.
But the true power lies in how SQL Server indexes these vectors. The spatial indexes built for geographic data (R-trees and their variants) work well in two or three dimensions but degrade badly in the hundreds of dimensions that embeddings occupy. The engine therefore provides dedicated vector indexes based on DiskANN, a graph-based approximate nearest neighbor algorithm.
When you create a vector index on a vector column, SQL Server organizes the vectors into a navigable graph. At query time, the engine walks this graph toward the query vector, quickly discarding regions of the vector space that are too far away and focusing only on the small subset of vectors that might be relevant. This is the SQL Server counterpart of the HNSW and IVF strategies described above, integrated directly into the database engine.
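Concretely, the storage and query shape looks roughly like this in T-SQL. This is a sketch: the `VECTOR` type and `VECTOR_DISTANCE` function require Azure SQL Database or SQL Server 2025, and the `Documents` table mirrors this chapter's running example.

```sql
-- Table with a native vector column (Azure SQL / SQL Server 2025 syntax).
CREATE TABLE Documents (
    Id        INT IDENTITY PRIMARY KEY,
    Title     NVARCHAR(200) NOT NULL,
    Content   NVARCHAR(MAX) NOT NULL,
    Embedding VECTOR(1536)  NULL
);

-- Nearest-neighbor query: smallest cosine distance first.
-- @queryEmbedding is a 1536-dimensional vector supplied by the application.
SELECT TOP (3)
       Id,
       Title,
       VECTOR_DISTANCE('cosine', Embedding, @queryEmbedding) AS Distance
FROM Documents
ORDER BY Distance ASC;
```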
Integrating with EF Core: The Bridge to C#
Entity Framework Core (EF Core) acts as the translator between our C# application logic and the SQL Server database. In previous chapters, we learned how EF Core maps C# classes to database tables. Now, we extend this mapping to handle the vector type.
In our C# domain model, we represent an embedding as a List<float> or float[]. When EF Core saves an entity, it must convert this C# list into the SQL Server vector format. Conversely, when reading from the database, it must parse the vector data back into a C# list.
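Where the provider does not map List<float> out of the box, the round-trip can be made explicit with a pair of conversion functions (in EF Core you would register them on the property via .HasConversion). This is a minimal sketch: it serializes to the `[v1,v2,...]` literal form the vector type accepts, using the invariant culture so locale settings never change the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Round-trip between the C# representation and the '[v1,v2,...]' literal
// form. "G9" is the round-trip format for float: parsing the output
// recovers the exact same bits.
public static class VectorConversion
{
    public static string ToLiteral(List<float> v) =>
        "[" + string.Join(",", v.Select(x => x.ToString("G9", CultureInfo.InvariantCulture))) + "]";

    public static List<float> FromLiteral(string s) =>
        s.Trim('[', ']')
         .Split(',', StringSplitOptions.RemoveEmptyEntries)
         .Select(x => float.Parse(x, CultureInfo.InvariantCulture))
         .ToList();
}
```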
This integration is crucial for building RAG pipelines. A typical RAG flow involves:
1. Query Embedding: A user asks a question. The application sends this text to an embedding model (like OpenAI's text-embedding-ada-002), which returns a List<float>.
2. Vector Search: The application uses EF Core to query the database for documents whose embeddings are closest to the query vector.
3. Context Retrieval: The database returns the top N most relevant documents (e.g., the text content of the documents).
4. Generation: The application sends the user's question and the retrieved documents to a Large Language Model (LLM) to generate a context-aware response.
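The four steps can be wired together in a single orchestration method. Everything below is hypothetical scaffolding — IEmbeddingClient, IDocumentStore, and IChatClient are placeholder interfaces standing in for an embedding API client, the EF Core repository, and an LLM client; they are not a real library's API.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Placeholder abstractions for the three external dependencies of a RAG flow.
public interface IEmbeddingClient { Task<List<float>> EmbedAsync(string text); }
public interface IDocumentStore { Task<List<string>> NearestAsync(List<float> queryVector, int topK); }
public interface IChatClient { Task<string> CompleteAsync(string prompt); }

public class RagPipeline
{
    private readonly IEmbeddingClient _embedder;
    private readonly IDocumentStore _store;
    private readonly IChatClient _llm;

    public RagPipeline(IEmbeddingClient embedder, IDocumentStore store, IChatClient llm)
    {
        _embedder = embedder;
        _store = store;
        _llm = llm;
    }

    public async Task<string> AskAsync(string question)
    {
        // 1. Query Embedding: text -> List<float>
        var queryVector = await _embedder.EmbedAsync(question);

        // 2 + 3. Vector Search and Context Retrieval: closest document texts
        var context = await _store.NearestAsync(queryVector, topK: 3);

        // 4. Generation: question + retrieved context -> LLM answer
        var prompt = "Answer using only this context:\n" +
                     string.Join("\n---\n", context) +
                     "\n\nQuestion: " + question;
        return await _llm.CompleteAsync(prompt);
    }
}
```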
Without efficient vector storage and search in the database, step 2 would be too slow for interactive applications. EF Core's support for vector operations allows developers to write LINQ queries that translate into optimized SQL, leveraging the database's spatial indexes.
Practical Implications
Understanding these foundations has profound implications for building AI systems. First, it highlights the importance of choosing the right distance metric. For text search, Cosine similarity is generally preferred because it focuses on semantic direction rather than document length. For image embeddings, Euclidean distance might be more appropriate depending on the model used.
Second, it emphasizes the trade-off between accuracy and speed. ANN indexes like those in SQL Server are approximate. There is a chance that the true nearest neighbor is missed, but for most RAG applications, this is acceptable. The quality of the retrieved context is often more dependent on the embedding model itself than on the perfect recall of the index.
Third, it demonstrates the power of keeping data close to the compute. By storing embeddings directly in SQL Server, we avoid the complexity of maintaining a separate vector database (like Pinecone or Weaviate). We can join vector search results with relational data, enforce transactional integrity, and leverage existing SQL Server security and backup mechanisms.
Visualizing the Vector Space
To better understand how vectors are organized and searched, picture a simplified 2D vector space (real embeddings have hundreds of dimensions) in which an index has grouped related vectors into clusters — say a "Physics" cluster containing vectors V1 and V2, plus separate "Chemistry" and "Poetry" clusters.
A query vector about physics lands closest to the "Physics" cluster. The index would first identify that cluster as the most promising region, then scan only V1 and V2 to find the exact nearest neighbors, ignoring the "Chemistry" and "Poetry" clusters entirely.
Conclusion
Storing List<float> in SQL Server is not merely a technical implementation detail; it is a strategic architectural decision that brings the power of vector search directly into the heart of relational data management. By leveraging the vector data type and spatial indexes, we can perform efficient similarity searches that are essential for RAG pipelines. This allows us to build AI applications that are not only intelligent but also scalable, maintainable, and integrated with existing enterprise data systems. The theoretical foundations of distance metrics and indexing strategies provide the compass needed to navigate the complex landscape of high-dimensional data, ensuring that our applications can find the right information at the right time.
Basic Code Example
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;
// ---------------------------------------------------------
// REAL-WORLD CONTEXT
// ---------------------------------------------------------
// Imagine you are building a semantic search feature for a
// corporate knowledge base. Employees can ask questions like
// "How do I reset my VPN?" or "What is the parental leave policy?".
// Instead of keyword matching, we convert these questions and
// documents into high-dimensional vectors (embeddings).
//
// The problem: SQL Server's standard string search cannot compare
// these mathematical representations efficiently. We need to store
// these vectors natively and perform similarity searches (e.g.,
// finding the document vector closest to the question vector).
// ---------------------------------------------------------
namespace SqlServerVectorDemo
{
// ---------------------------------------------------------
// 1. DOMAIN ENTITY
// ---------------------------------------------------------
// Represents a document in our knowledge base.
public class KnowledgeBaseDocument
{
[Key]
public int Id { get; set; }
public string Title { get; set; } = string.Empty;
public string Content { get; set; } = string.Empty;
// CRITICAL: This property holds the semantic representation
// of the content. It is a list of floating-point numbers
// generated by an AI model (e.g., text-embedding-ada-002).
// Azure SQL and SQL Server 2025+ support the native 'vector' type.
// NOTE: mapping List<float> to 'vector' may additionally require
// the EFCore.SqlServer.VectorSearch package or a value converter,
// depending on your EF Core version.
[Column(TypeName = "vector(1536)")] // Dimension must match your embedding model
public List<float>? Embedding { get; set; }
}
// ---------------------------------------------------------
// 2. DB CONTEXT & CONFIGURATION
// ---------------------------------------------------------
public class KnowledgeBaseContext : DbContext
{
public DbSet<KnowledgeBaseDocument> Documents { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
// Replace with your actual connection string.
// Ensure you are targeting SQL Server 2025 or Azure SQL Database.
optionsBuilder.UseSqlServer("Server=localhost;Database=VectorDemo;Trusted_Connection=True;TrustServerCertificate=True;");
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
// Without an index, searching over high-dimensional vectors is O(N)
// and computationally expensive (slow).
// The SQL Server EF Core provider does not expose a fluent API for
// vector indexes, so the index is created with raw SQL — typically
// in a migration. Illustrative SQL Server 2025 syntax:
//
//   CREATE VECTOR INDEX IX_Documents_Embedding
//   ON Documents (Embedding)
//   WITH (METRIC = 'cosine', TYPE = 'DiskANN');
}
}
// ---------------------------------------------------------
// 3. SERVICE LAYER (LOGIC)
// ---------------------------------------------------------
public class SemanticSearchService
{
private readonly KnowledgeBaseContext _context;
public SemanticSearchService(KnowledgeBaseContext context)
{
_context = context;
}
// -----------------------------------------------------
// HELPER: Mock Embedding Generator
// -----------------------------------------------------
// In a real app, this calls an AI API (OpenAI, Azure AI).
// Here, we simulate the vector. The dimension MUST match the
// column definition (vector(1536)), or SQL Server will reject
// the value with a dimensionality error.
public static List<float> GenerateMockEmbedding(string text)
{
// Seed a PRNG from the text so the same input yields the same
// pseudo-embedding within a run. (String hashing is randomized
// per process in .NET, so values differ across runs — fine for
// this in-memory demo, not for persisted data.)
var rng = new Random(text.GetHashCode());
var vector = new List<float>(1536);
for (int i = 0; i < 1536; i++)
{
vector.Add((float)rng.NextDouble());
}
return vector;
}
// -----------------------------------------------------
// OPERATION: Similarity Search (Cosine Distance)
// -----------------------------------------------------
// Finds documents semantically closest to the query using
// cosine distance; SQL Server's VECTOR_DISTANCE function
// handles the math ('euclidean' is also supported).
public async Task<List<KnowledgeBaseDocument>> SearchAsync(string query, int topK = 3)
{
// 1. Convert query to vector
var queryEmbedding = GenerateMockEmbedding(query);
// 2. Execute Vector Search
// Vector operators require specific SQL syntax, so we use FromSqlRaw.
// VECTOR_DISTANCE computes the distance between two vectors; we order
// ascending because the closest match has the smallest distance.
// {0} and {1} become SQL parameters — never concatenate user input
// into the SQL string.
var sql = @"
SELECT TOP({0})
Id,
Title,
Content,
Embedding
FROM Documents
ORDER BY VECTOR_DISTANCE('cosine', Embedding, CAST({1} AS VECTOR(1536))) ASC";
// Convert List<float> to the literal form SQL Server understands: '[1.0, 2.0, 3.0]'
var vectorLiteral = FormatVectorForSql(queryEmbedding);
// Execute the query (both the TOP count and the vector are parameters)
var results = await _context.Documents
.FromSqlRaw(sql, topK, vectorLiteral)
.AsNoTracking()
.ToListAsync();
return results;
}
// -----------------------------------------------------
// HELPER: SQL Vector Formatting
// -----------------------------------------------------
// SQL Server expects vector literals in the format: '[v1, v2, v3]'.
// Use the invariant culture so decimal points are never replaced by
// locale-specific separators (e.g., commas on a German system).
// "G9" is the round-trip format for float, so no precision is lost.
private static string FormatVectorForSql(List<float> vector)
{
if (vector == null || vector.Count == 0) return "[]";
return "[" + string.Join(", ", vector.Select(v => v.ToString("G9", CultureInfo.InvariantCulture))) + "]";
}
}
// ---------------------------------------------------------
// 4. MAIN EXECUTION (HELLO WORLD)
// ---------------------------------------------------------
class Program
{
static async Task Main(string[] args)
{
// Ensure database is created/migrated
using (var context = new KnowledgeBaseContext())
{
await context.Database.EnsureCreatedAsync();
// Seed Data (if empty)
if (!context.Documents.Any())
{
var docs = new[]
{
new KnowledgeBaseDocument
{
Title = "VPN Setup Guide",
Content = "Instructions to configure Cisco AnyConnect.",
Embedding = SemanticSearchService.GenerateMockEmbedding("VPN Setup Guide Instructions Cisco AnyConnect")
},
new KnowledgeBaseDocument
{
Title = "Holiday Calendar",
Content = "List of public holidays for 2024.",
Embedding = SemanticSearchService.GenerateMockEmbedding("Holiday Calendar Public Holidays 2024")
},
new KnowledgeBaseDocument
{
Title = "Expense Policy",
Content = "How to submit travel expenses.",
Embedding = SemanticSearchService.GenerateMockEmbedding("Expense Policy Travel Submit Reimbursement")
}
};
context.Documents.AddRange(docs);
await context.SaveChangesAsync();
}
}
// Perform Search
using (var context = new KnowledgeBaseContext())
{
var service = new SemanticSearchService(context);
// We search for "travel costs", which semantically relates to "Expense Policy"
// but does not contain the keyword "expense".
string userQuery = "How do I claim money for flights?";
Console.WriteLine($"Query: \"{userQuery}\"\n");
var results = await service.SearchAsync(userQuery);
Console.WriteLine("Semantic Search Results:");
foreach (var doc in results)
{
Console.WriteLine($" - {doc.Title} (Relevance Score: N/A in this demo)");
}
}
}
}
}
Detailed Line-by-Line Explanation
1. Domain Entity (KnowledgeBaseDocument)
- `[Column(TypeName = "vector(1536)")]`: This is the most critical attribute. It instructs Entity Framework Core to map the `List<float>` property to the specific SQL Server `vector` data type.
  - Dimensionality: The number 1536 is the number of dimensions in the vector. Many standard text embedding models (like OpenAI's `text-embedding-ada-002`) produce vectors of this length.
  - Why not `varbinary` or `nvarchar`? While you could store floats in a JSON string or binary blob, the native `vector` type allows SQL Server to perform mathematical operations directly on the data without parsing or conversion overhead.
- `List<float>?`: We use a nullable list to handle cases where a document hasn't been embedded yet.
2. DbContext Configuration (KnowledgeBaseContext)
- `OnConfiguring`: Standard EF Core setup. Note: you must be connected to SQL Server 2025, Azure SQL Database, or Azure SQL Managed Instance; older versions do not support the `vector` type natively.
- `OnModelCreating`: This is where the vector index is addressed. Without one, searching a table with 1 million rows requires calculating the distance between the query vector and every single row (a brute-force scan). This is prohibitively slow.
  - Vector indexes (IVF-style clustering or DiskANN-style graphs) restrict each search to a small candidate set — for IVF, only the vectors in the cluster(s) closest to the query — drastically reducing computation.
  - Creating the index online ensures it can be built or rebuilt while the table is in use, preventing downtime.
3. Service Layer (SemanticSearchService)
- `GenerateMockEmbedding`: In a production environment, you would call an embedding API (like Azure OpenAI) here. For this "Hello World" example, we simulate vector generation deterministically so the results are repeatable within a run.
- `SearchAsync`:
  - The SQL query: We use `FromSqlRaw` because it gives us precise control over the SQL syntax required for vector operations.
  - `VECTOR_DISTANCE('cosine', Embedding, ...)`: This is the core function. Cosine distance measures the angle between two vectors; it is preferred for text embeddings because it captures semantic similarity regardless of document length (magnitude).
  - The vector is passed to the query as a parameter, never concatenated into the SQL string.
- `FormatVectorForSql`: SQL Server accepts vector literals in the format `[1.0, 2.0, 3.0]`. This helper converts a C# `List<float>` into that string format.
4. Program Execution
- `EnsureCreatedAsync`: Creates the database and tables if they don't exist.
- Seeding: We insert three distinct documents. Notice the embedding for "Expense Policy" is generated from keywords related to travel and money.
- The query: The user asks about "claim money for flights".
  - Keyword search failure: A standard SQL `LIKE '%flight%'` would find nothing in the "Expense Policy" document.
  - Vector search success: The vector for "claim money for flights" aligns closely with the vector for "Expense Policy" because they share semantic context (finance, travel), even if the specific words differ.
Common Pitfalls
1. Using the Wrong SQL Server Version
- Mistake: Attempting to use `vector(1536)` on a version without the vector type (SQL Server 2022 and older).
- Result: A runtime error stating that the data type is unknown.
- Fix: Ensure your target environment is SQL Server 2025+ or Azure SQL.
2. Missing Vector Index
- Mistake: Not creating a vector index on the embedding column.
- Result: Every search is a brute-force scan, so performance degrades linearly with table size. A table with 10,000 rows might be fine, but 1,000,000 rows will cause timeouts.
- Fix: Always index your vector columns.
3. Dimensionality Mismatch
- Mistake: Defining the column as `vector(1024)` but trying to insert a list of 1536 floats.
- Result: A SQL error. The vector dimensions must match exactly between the database schema and the application data.
- Fix: Verify the output dimension of your AI embedding model and configure the SQL column accordingly.
4. Incorrect Distance Metric
- Mistake: Using Euclidean distance (L2) when cosine similarity is appropriate (or vice versa).
- Result: Search results may not reflect semantic relevance. Cosine is generally the standard for text embeddings; Euclidean is often used for image embeddings or spatial data.
- Fix: Consult your embedding model's documentation. Most text models recommend cosine.
Visualization of Data Flow
The chapter continues with advanced code, exercises and solutions with analysis, you can find them on the ebook on Leanpub.com or Amazon
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.