
Chapter 10: Hybrid Search - Combining Keywords + Vectors

Theoretical Foundations

Hybrid search is the architectural answer to a fundamental limitation in modern information retrieval: no single retrieval mechanism is universally optimal. Traditional keyword-based search, like BM25 or TF-IDF, excels at exact term matching and is highly interpretable, but it suffers from the "vocabulary mismatch" problem. It fails to understand synonyms, context, or semantic intent. For instance, a user searching for "canine companions" might miss documents that only contain the word "dogs." Conversely, modern vector search (semantic search) using dense embeddings captures the meaning and context behind queries, making it robust to vocabulary mismatches. However, pure vector search can be imprecise for exact matches, proper nouns, or specific IDs, and it lacks the explainability of keyword matching.

The core concept of hybrid search is to combine these two orthogonal signals—lexical (keyword) and semantic (vector)—to produce a result set that is greater than the sum of its parts. In the context of building AI applications with C# and EF Core, this is not merely a theoretical exercise; it is a practical necessity for building robust Retrieval-Augmented Generation (RAG) systems. A RAG system that relies solely on vector search might retrieve a document about "neural networks" when the user asks about "brain neurons," which, while semantically related, is contextually wrong. By blending keyword constraints (e.g., requiring the document to contain the term "biology") with vector similarity, we create a retrieval pipeline that is both intelligent and precise.

The Core Problem: Orthogonal Strengths and Weaknesses

To understand hybrid search, we must first dissect the strengths and weaknesses of its components. This is analogous to a medical diagnosis. A general practitioner (vector search) looks at the holistic symptoms and patient history to form a broad hypothesis. A specialist (keyword search) runs specific tests to confirm or deny that hypothesis. Neither is sufficient alone; the generalist misses nuances, and the specialist lacks context.

Keyword Search (The Specialist): In our EF Core context, we often use Full-Text Search capabilities or simple string.Contains for smaller datasets. The underlying principle is inverted indexing. It answers the question: "Which documents contain these specific tokens?" Its strength is precision for exact terms. Its weakness is recall; it is brittle to typos, synonyms, and paraphrasing.

Vector Search (The Generalist): As discussed in previous chapters on vector databases, we represent text as high-dimensional vectors (embeddings) using models like OpenAI's text-embedding-ada-002 or open-source alternatives. The search then becomes a nearest neighbor problem in vector space, typically measured by Cosine Similarity. Its strength is recall; it finds conceptually related items. Its weakness is precision; it can retrieve irrelevant but semantically proximate results.

The Synergy: When we combine them, we mitigate the weaknesses. Keyword search prunes the vast vector space down to a candidate set that is lexically relevant. Vector search then re-ranks or filters this set based on semantic meaning. This is the essence of hybrid search.

In the context of a .NET application using EF Core, we are not just querying a database; we are orchestrating a multi-modal retrieval strategy. There are two primary architectural patterns for implementing this:

  1. Post-Search Fusion (Reciprocal Rank Fusion - RRF): This is the most flexible and common approach. We run two independent queries in parallel:

    • A keyword query against the database's text index.
    • A vector query against the vector store (which could be the same database using a vector extension, or a separate service like Pinecone).

    We then have two ranked lists of results, which we fuse using an algorithm like Reciprocal Rank Fusion: each document receives a score based on its rank in each list, and the final output is a unified, re-ranked list.
  2. Pre-Search Filtering (Vector Search with Keyword Constraints): This pattern uses the keyword query to filter the dataset before performing the vector search. For example, you might first retrieve all documents that contain the keyword "financial report" and then, within that subset, find the one most semantically similar to "quarterly earnings summary." This is more efficient but can be too restrictive if the keyword filter is too narrow.
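As a rough in-memory sketch of the pre-search filtering pattern (the document tuples, query embedding, and distance function below are hypothetical stand-ins; with EF Core and pgvector both stages would run as SQL):

```csharp
using System;
using System.Linq;

// In-memory stand-in for the two-stage pipeline. With EF Core + pgvector the
// same shape runs in the database (a WHERE clause followed by ORDER BY on
// vector distance).
var documents = new[]
{
    (Id: 1, Content: "annual financial report", Embedding: new[] { 0.9, 0.1 }),
    (Id: 2, Content: "financial report Q3",     Embedding: new[] { 0.2, 0.8 }),
    (Id: 3, Content: "team offsite notes",      Embedding: new[] { 0.9, 0.1 }),
};

var queryEmbedding = new[] { 0.85, 0.15 }; // hypothetical embedding of the user query

double Distance(double[] a, double[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());

// Stage 1: the keyword filter prunes the candidate set.
// Stage 2: vector distance ranks only what survives the filter.
var best = documents
    .Where(d => d.Content.Contains("financial report"))   // pre-search filter
    .OrderBy(d => Distance(d.Embedding, queryEmbedding))  // semantic ranking
    .First();

Console.WriteLine($"Best match: doc {best.Id}");
```

Note how document 3, despite having the embedding closest to the query, never enters the ranking stage — this is exactly the "too restrictive" risk the pattern carries.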

Reciprocal Rank Fusion (RRF) in Detail

RRF is the cornerstone of modern hybrid search. It is a rank-based fusion technique that is robust and does not require tuning weights between the two modalities. The formula is deceptively simple:

\[ \text{RRFScore}(d) = \sum_{r \in \{ \text{keyword\_rank}, \text{vector\_rank} \}} \frac{1}{60 + r} \]

Where d is a document, and r is the rank of that document in a result list (1 for the top result, 2 for the second, etc.). The constant 60 is a hyperparameter that dampens the dominance of top ranks: without it, rank 1 (score 1/1) would contribute twice as much as rank 2 (score 1/2), whereas with it the contributions (1/61 vs. 1/62) are nearly equal, so no single list position can overwhelm the fusion. A document missing from one list simply contributes nothing from that list.

Why RRF? RRF is advantageous because it is rank-based, not score-based. This means it doesn't matter if the vector similarity score is 0.9 and the keyword relevance score is 10.5; RRF only cares about their relative positions. This makes it incredibly stable across different data distributions and scoring mechanisms.
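The RRF tally can be sketched in a few lines of C#; the two ranked ID lists below are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int K = 60; // the conventional RRF damping constant

// Fuse any number of ranked lists (each ordered best-first) into RRF scores.
Dictionary<int, double> Fuse(params int[][] rankings)
{
    var scores = new Dictionary<int, double>();
    foreach (var ranking in rankings)
    {
        for (int i = 0; i < ranking.Length; i++)
        {
            int rank = i + 1; // ranks are 1-based
            scores[ranking[i]] = scores.GetValueOrDefault(ranking[i]) + 1.0 / (K + rank);
        }
    }
    return scores;
}

// Hypothetical result lists: document IDs ordered best-first.
int[] keywordHits = { 7, 3, 9 };
int[] vectorHits  = { 3, 5, 7 };

foreach (var (id, score) in Fuse(keywordHits, vectorHits).OrderByDescending(kv => kv.Value))
    Console.WriteLine($"doc {id}: {score:F4}");
// Docs 3 and 7 appear in both lists, so they outscore docs 5 and 9,
// each of which appears in only one.
```

Because only positions matter, the same code works unchanged whether the underlying scores were BM25 weights or cosine similarities.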

Analogy: The Election Imagine an election with two parties: the Keyword Party and the Vector Party. Each party ranks candidates. RRF is the final tally. A candidate gets points based on their rank in each party's list. The candidate who is consistently high-ranked across both parties wins, not necessarily the one who won by a landslide in one party but was unknown in the other. This prevents a single modality from dominating the results.

Implementing Hybrid Search with EF Core

In a .NET application, this architecture requires careful orchestration. We cannot rely on a single LINQ query. We need to manage state and combine results in memory.

Data Schema Considerations: To support hybrid search efficiently, our EF Core entity must be designed with both modalities in mind.

using Microsoft.EntityFrameworkCore;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class SearchableDocument
{
    public int Id { get; set; }

    [MaxLength(2000)]
    public string Title { get; set; }

    // The raw text for keyword indexing
    public string Content { get; set; }

    // The vector embedding. With the Npgsql + pgvector provider this is
    // typically mapped as Pgvector.Vector; a byte[] is shown here as a
    // provider-neutral placeholder.
    public byte[] Embedding { get; set; } 

    // Pre-computed Full-Text Search column (tsvector in PostgreSQL). With
    // Npgsql this maps naturally to NpgsqlTypes.NpgsqlTsVector; shown as a
    // string here for simplicity.
    public string SearchVector { get; set; } 
}

public class HybridSearchContext : DbContext
{
    public DbSet<SearchableDocument> Documents { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Configuration for a database that supports both Full-Text Search and Vectors
        // e.g., PostgreSQL with pgvector and pg_trgm extensions.
        optionsBuilder.UseNpgsql("YourConnectionString");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Configure the vector column for pgvector
        modelBuilder.Entity<SearchableDocument>()
            .Property(d => d.Embedding)
            .HasColumnType("vector(1536)"); // Dimensionality of the embedding model

        // Configure the GIN index for Full-Text Search
        modelBuilder.Entity<SearchableDocument>()
            .HasIndex(d => d.SearchVector)
            .HasMethod("GIN");
    }
}

Orchestrating the Search: The implementation logic involves three distinct phases: Query Execution, Result Fusion, and Ranking.

  1. Phase 1: Parallel Query Execution We issue two queries. The keyword query uses EF Core's FromSqlRaw or LINQ to execute a full-text search query. The vector query uses a vector similarity operator (e.g., <-> in PostgreSQL). These must be executed asynchronously and potentially in parallel using Task.WhenAll.

  2. Phase 2: Result Fusion (RRF) We receive two List<SearchResult> objects, each containing an Id and a Rank. We then apply the RRF algorithm. A ConcurrentDictionary is useful here to aggregate scores efficiently.

  3. Phase 3: Final Retrieval and Hydration Once we have the final RRF-scored list of document IDs, we perform a final, efficient query to retrieve the full SearchableDocument entities from EF Core, ordered by the RRF score.

The Role of Modern C# Features

In building this AI-driven search system, modern C# features are not just syntactic sugar; they are architectural enablers.

IAsyncEnumerable<T> and Task.WhenAll: Hybrid search is inherently I/O bound. We are waiting for the database to return two different result sets. Using async/await with Task.WhenAll allows us to execute these queries concurrently, significantly reducing latency.

public async Task<IReadOnlyList<SearchableDocument>> HybridSearchAsync(string query, string embedding)
{
    const int K = 60; // RRF damping constant

    // Phase 1: Parallel Execution. Each query returns rows ordered best-first;
    // rank is derived from list position after materialization.
    var keywordTask = _context.Documents
        .FromSqlInterpolated($@"
            SELECT * FROM documents
            WHERE search_vector @@ plainto_tsquery({query})
            ORDER BY ts_rank(search_vector, plainto_tsquery({query})) DESC
            LIMIT 50")
        .AsNoTracking()
        .ToListAsync();

    // The embedding is passed in its text form (e.g. "[0.1,0.2,...]")
    // and cast to pgvector's vector type.
    var vectorTask = _context.Documents
        .FromSqlInterpolated($@"
            SELECT * FROM documents
            ORDER BY embedding <-> {embedding}::vector
            LIMIT 50")
        .AsNoTracking()
        .ToListAsync();

    await Task.WhenAll(keywordTask, vectorTask);

    // Phase 2: RRF Fusion. Rank is the 1-based position in each list.
    var fusedScores = new Dictionary<int, double>();
    void Accumulate(IReadOnlyList<SearchableDocument> ranked)
    {
        for (int i = 0; i < ranked.Count; i++)
        {
            int rank = i + 1;
            fusedScores[ranked[i].Id] =
                fusedScores.GetValueOrDefault(ranked[i].Id) + 1.0 / (K + rank);
        }
    }
    Accumulate(keywordTask.Result);
    Accumulate(vectorTask.Result);

    // Phase 3: Final Retrieval. Contains() does not preserve order, so we
    // re-sort the hydrated entities by their fused score in memory.
    var topIds = fusedScores
        .OrderByDescending(kvp => kvp.Value)
        .Take(10)
        .Select(kvp => kvp.Key)
        .ToList();

    var documents = await _context.Documents
        .Where(d => topIds.Contains(d.Id))
        .ToListAsync();

    return documents.OrderByDescending(d => fusedScores[d.Id]).ToList();
}

record Types: For intermediate data transfer objects (DTOs) like SearchResult, using record types provides value-based equality and immutability, which is crucial when shuffling data between query execution and fusion logic.

public record SearchResult(int Id, int Rank, double? Score = null);

IReadOnlyList<T> and Span<T>: When dealing with large result sets from the vector database, using IReadOnlyList<T> prevents unnecessary copying. For high-performance scoring calculations, Span<T> can be used to operate on memory slices without allocations, which is critical when scaling to millions of documents.
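As a minimal sketch of the Span<T> point: a cosine-similarity helper operating on ReadOnlySpan<float>, so callers can score slices of a pooled buffer without copying (the Cosine helper and the buffer layout are illustrative, not a library API):

```csharp
using System;

// Allocation-free cosine similarity over spans; callers can pass slices of a
// shared buffer instead of copying each embedding into a new array.
static double Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length) throw new ArgumentException("dimension mismatch");
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return (magA == 0 || magB == 0) ? 0 : dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

float[] buffer = { 1f, 0f, 0f, 1f }; // two 2-d embeddings packed in one buffer
double sim = Cosine(buffer.AsSpan(0, 2), buffer.AsSpan(2, 2)); // slices, no copies
Console.WriteLine(sim); // orthogonal vectors -> 0
```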

Edge Cases and Nuances

  1. Empty Results: What if one query returns no results? RRF handles this gracefully. If the keyword search returns nothing, the final score is just the vector score. This is a feature, not a bug; it allows the system to fall back gracefully to the stronger modality.
  2. Tie-Breaking: If two documents have the exact same RRF score, we need a deterministic tie-breaker. A common strategy is to prefer the document with the higher keyword score (or vector score) or to use a secondary sort by date or relevance.
  3. Query Expansion: In sophisticated systems, the initial query might be expanded. For example, we might use a language model to generate synonyms for the keyword query before execution, or use the vector query to find related concepts to enrich the keyword search. This is a form of "query routing" where the system intelligently modifies the query based on its type.
  4. Performance at Scale: Running two separate queries is expensive. In a production system, we might use a dedicated vector database (like Pinecone or Milvus) that supports hybrid search natively, and use EF Core to orchestrate the process or to retrieve the final entities. The pattern remains the same: query both, fuse, retrieve.
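The tie-breaking strategy from point 2 can be sketched with chained ThenBy clauses; the fused tuples below are hypothetical data:

```csharp
using System;
using System.Linq;

// Hypothetical fused results: (Id, RrfScore, VectorScore, PublishedUtc).
var fused = new[]
{
    (Id: 1, Rrf: 0.032, Vector: 0.71, Published: new DateTime(2023, 1, 5)),
    (Id: 2, Rrf: 0.032, Vector: 0.94, Published: new DateTime(2024, 6, 1)),
    (Id: 3, Rrf: 0.016, Vector: 0.88, Published: new DateTime(2024, 2, 9)),
};

// Deterministic ordering: RRF first, then vector score, then recency,
// and finally Id as a stable last-resort tie-breaker.
var ordered = fused
    .OrderByDescending(r => r.Rrf)
    .ThenByDescending(r => r.Vector)
    .ThenByDescending(r => r.Published)
    .ThenBy(r => r.Id)
    .Select(r => r.Id)
    .ToList();

Console.WriteLine(string.Join(", ", ordered)); // 2, 1, 3
```

Documents 1 and 2 tie on RRF score, so the vector score decides between them; without the extra ThenBy clauses the order of tied rows would be unspecified.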

The "Why" in the Context of AI Applications

In the context of AI applications, specifically RAG, hybrid search is non-negotiable. The "retrieval" step is the foundation of RAG's accuracy. If the retrieved context is irrelevant, the LLM's generation will be hallucinated or incorrect.

Consider a legal document retrieval system. A user asks: "What is the precedent for res ipsa loquitur in California?"

  • Pure Keyword: Might miss documents that discuss "the thing speaks for itself" without the Latin term.
  • Pure Vector: Might retrieve documents about general legal principles in California but miss the specific tort doctrine.
  • Hybrid: The keyword search ensures we are in the "tort law" domain (by matching "precedent" and "California"), while the vector search finds the specific doctrine of res ipsa loquitur even if the Latin term isn't used.

By using EF Core to manage the structured data and orchestrate this hybrid retrieval, we maintain a clean separation of concerns. The database handles the storage and indexing, while the C# application logic handles the fusion and business rules. This architecture is scalable, maintainable, and leverages the full power of both traditional and modern search paradigms.

Visualization of the Hybrid Search Flow

The following diagram illustrates the data flow and decision points in a hybrid search system.

The diagram illustrates a hybrid search flow where a user query is processed through both traditional keyword-based retrieval and modern vector-based semantic search, with their respective results being fused and re-ranked before a final answer is generated.

This flow demonstrates the decoupling of retrieval strategies. The keyword and vector queries are independent, allowing them to be optimized separately (e.g., using different database indexes). The fusion step is purely computational and can be scaled horizontally. The final retrieval leverages EF Core's change tracking and materialization capabilities to provide fully hydrated entities to the application layer.

Conclusion

The theoretical foundation of hybrid search rests on the principle of complementarity. By combining the lexical precision of keyword search with the semantic recall of vector search, we create a retrieval system that is robust, accurate, and context-aware. In the .NET ecosystem, EF Core serves as the ideal orchestrator, providing a unified interface to manage structured data while integrating with specialized vector storage backends. The use of modern C# features like IAsyncEnumerable, record types, and parallel task execution ensures that this complex orchestration is both performant and maintainable, forming a solid foundation for building next-generation AI applications.

Basic Code Example

Here is a self-contained, "Hello World" level code example demonstrating a basic hybrid search implementation using EF Core and an in-memory vector calculation.

Imagine you are building a search engine for a digital library of technical documentation. Users need to find articles based on two criteria:

  1. Keyword Match: The article title or summary contains specific terms (e.g., "Performance").
  2. Semantic Similarity: The meaning of the user's query matches the article's content, even if the exact words don't appear (e.g., query "How to make code run faster" matching an article titled "Optimizing CPU Cycles").

This example simulates a simplified version of this hybrid retrieval system.

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// 1. Define the Data Model
// We represent a document with a text summary and a pre-computed vector embedding.
// In a real scenario, the vector would be a float[] or a specialized type like pgvector's Vector.
public class LibraryDocument
{
    public int Id { get; set; }
    public string Title { get; set; } = string.Empty;
    public string Summary { get; set; } = string.Empty;

    // Simulating a vector embedding (e.g., from Azure OpenAI or local ONNX model)
    // For this "Hello World", we use a simple 3-dimensional vector for readability.
    public string VectorEmbedding { get; set; } = string.Empty; 
}

// 2. Define the DbContext
public class LibraryContext : DbContext
{
    public DbSet<LibraryDocument> Documents { get; set; }

    public LibraryContext(DbContextOptions<LibraryContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Configure the vector storage. 
        // Note: EF Core doesn't natively support vector types out-of-the-box.
        // We store it as a string for this simulation, but in production (e.g., with pgvector),
        // you would map this to a 'vector' column type.
        modelBuilder.Entity<LibraryDocument>()
            .Property(e => e.VectorEmbedding)
            .HasColumnType("text");
    }
}

// 3. The Hybrid Search Service
public class HybridSearchService
{
    private readonly LibraryContext _context;

    public HybridSearchService(LibraryContext context)
    {
        _context = context;
    }

    // Main entry point for the hybrid query
    public async Task<List<SearchResult>> SearchAsync(string query)
    {
        // Step A: Keyword Search (BM25/Full-text approximation)
        // We filter documents where the title or summary contains the query terms.
        var keywordResults = await _context.Documents
            .Where(d => d.Title.Contains(query) || d.Summary.Contains(query))
            .Select(d => new { d.Id, d.Title })
            .ToListAsync();

        // Step B: Vector Search (Semantic Similarity)
        // We fetch all documents to calculate cosine similarity in memory (for this demo).
        // In production, this logic is pushed down to the DB (e.g., using 'pgvector').
        var allDocs = await _context.Documents.ToListAsync();

        // Convert query string to a vector (Mocked for this example)
        var queryVector = MockEmbeddingGenerator.Generate(query);

        var vectorResults = allDocs
            .Select(d => new 
            { 
                d.Id, 
                d.Title, 
                Score = CalculateCosineSimilarity(
                    queryVector, 
                    ParseVector(d.VectorEmbedding)
                ) 
            })
            .Where(x => x.Score > 0.1) // Threshold to filter noise
            .ToList();

        // Step C: Result Fusion (Reciprocal Rank Fusion - Simplified)
        // We combine the two lists, giving weight to both keyword matches and semantic matches.
        var fusedResults = new Dictionary<int, (string Title, double FinalScore)>();

        // Add Keyword Scores (Rank-based weighting)
        for (int i = 0; i < keywordResults.Count; i++)
        {
            // RRF formula: 1 / (rank + k), with the standard k = 60
            double score = 1.0 / (i + 1 + 60); 
            fusedResults[keywordResults[i].Id] = (keywordResults[i].Title, score);
        }

        // Add Vector Scores (Similarity-based weighting)
        foreach (var vRes in vectorResults)
        {
            // We normalize vector scores to a 0-1 range and apply a weight
            double weight = 0.5; // Give vector search 50% importance
            double score = vRes.Score * weight;

            if (fusedResults.ContainsKey(vRes.Id))
            {
                var existing = fusedResults[vRes.Id];
                fusedResults[vRes.Id] = (existing.Title, existing.FinalScore + score);
            }
            else
            {
                fusedResults[vRes.Id] = (vRes.Title, score);
            }
        }

        // Step D: Sort and Return
        return fusedResults
            .OrderByDescending(x => x.Value.FinalScore)
            .Select(x => new SearchResult { Id = x.Key, Title = x.Value.Title, Score = x.Value.FinalScore })
            .ToList();
    }

    // Helper: Parse string "1.0,2.0,3.0" to double[]
    private double[] ParseVector(string vectorStr)
    {
        return vectorStr.Split(',').Select(double.Parse).ToArray();
    }

    // Helper: Calculate Cosine Similarity
    private double CalculateCosineSimilarity(double[] vecA, double[] vecB)
    {
        if (vecA.Length != vecB.Length) throw new ArgumentException("Vectors must be same length");

        double dotProduct = 0.0;
        double magnitudeA = 0.0;
        double magnitudeB = 0.0;

        for (int i = 0; i < vecA.Length; i++)
        {
            dotProduct += vecA[i] * vecB[i];
            magnitudeA += vecA[i] * vecA[i];
            magnitudeB += vecB[i] * vecB[i];
        }

        magnitudeA = Math.Sqrt(magnitudeA);
        magnitudeB = Math.Sqrt(magnitudeB);

        if (magnitudeA == 0 || magnitudeB == 0) return 0;
        return dotProduct / (magnitudeA * magnitudeB);
    }
}

// 4. Mock Data Generator (To make the example runnable without external services)
public static class MockEmbeddingGenerator
{
    // Simulates an AI model turning text into numbers (3 dimensions here).
    // "Optimization" (length 12, contains "Optimization") -> [1.2, 0.9, 0.2]
    // "Performance"  (length 11, contains "Performance")  -> [1.1, 0.1, 0.8]
    public static double[] Generate(string text)
    {
        // Simple hash-based generation for deterministic output
        double x = text.Length * 0.1;
        double y = text.Contains("Optimization") ? 0.9 : 0.1;
        double z = text.Contains("Performance") ? 0.8 : 0.2;
        return new[] { x, y, z };
    }
}

public class SearchResult
{
    public int Id { get; set; }
    public string Title { get; set; } = string.Empty;
    public double Score { get; set; }
}

// 5. Main Execution
class Program
{
    static async Task Main(string[] args)
    {
        // Setup Dependency Injection (Simulated)
        var services = new ServiceCollection();
        services.AddDbContext<LibraryContext>(options => 
            options.UseInMemoryDatabase("HybridSearchDb"));
        services.AddScoped<HybridSearchService>();

        var serviceProvider = services.BuildServiceProvider();

        // Seed Data
        using (var scope = serviceProvider.CreateScope())
        {
            var context = scope.ServiceProvider.GetRequiredService<LibraryContext>();
            await context.Database.EnsureCreatedAsync();

            // Vector embeddings are pre-calculated. 
            // In reality, these come from a model (e.g., text-embedding-ada-002).
            context.Documents.AddRange(
                new LibraryDocument { Title = "Intro to C#", Summary = "Basics of the language", VectorEmbedding = "0.1,0.1,0.1" },
                new LibraryDocument { Title = "Performance Tuning", Summary = "How to optimize code", VectorEmbedding = "0.9,0.9,0.1" }, // High similarity to "Optimization"
                new LibraryDocument { Title = "Database Indexing", Summary = "Improving query speed", VectorEmbedding = "0.8,0.8,0.2" }  // Medium similarity
            );
            await context.SaveChangesAsync();
        }

        // Execute Search
        using (var scope = serviceProvider.CreateScope())
        {
            var searchService = scope.ServiceProvider.GetRequiredService<HybridSearchService>();

            // User Query: "Optimization" (Keywords match 'Optimize', Vector matches 'Performance Tuning')
            string userQuery = "Optimization";
            Console.WriteLine($"Searching for: '{userQuery}'\n");

            var results = await searchService.SearchAsync(userQuery);

            foreach (var result in results)
            {
                Console.WriteLine($"[Score: {result.Score:F4}] {result.Title}");
            }
        }
    }
}

Detailed Line-by-Line Explanation

1. Data Modeling

  • LibraryDocument Class: This represents our database entity.
    • VectorEmbedding: We store the vector as a string (e.g., "0.1,0.2,0.3"). In a production PostgreSQL database using the pgvector extension, EF Core would map this to a native vector type. For this "Hello World", the string format allows us to simulate the concept without requiring complex database setup.
  • LibraryContext Class: This is the standard EF Core DbContext.
    • OnModelCreating: We configure the VectorEmbedding property. By setting HasColumnType("text"), we ensure EF Core creates a text column in the database (or uses a string in InMemory provider) to hold our serialized vector data.

2. The Hybrid Search Logic (HybridSearchService)

This is the core engine of the example.

  • SearchAsync Method:

    • Input: Takes a raw text string (query).
    • Output: Returns a list of SearchResult objects, ranked by a combined relevance score.
  • Step A: Keyword Search:

    • var keywordResults = ...: We use standard LINQ, which EF Core translates to SQL.
    • .Where(d => d.Title.Contains(query) || d.Summary.Contains(query)): This performs a substring search. In a real SQL database, this would translate to a LIKE operation or Full-Text Search (FTS). This handles the "literal match" requirement.
  • Step B: Vector Search:

    • var allDocs = ...: For this simple example, we pull all documents into memory to calculate similarity. In a high-scale production system, you would push this calculation to the database engine (e.g., ORDER BY vector <=> @query_vector in PostgreSQL).
    • MockEmbeddingGenerator.Generate(query): Since we don't have a live AI model running, this helper function deterministically generates a vector based on the input text string to simulate an embedding.
    • CalculateCosineSimilarity: This implements the standard mathematical formula for Cosine Similarity: \(\frac{A \cdot B}{\|A\| \|B\|}\). It measures the angle between two vector embeddings. A score of 1.0 means identical direction (semantic meaning), while 0.0 means orthogonal (no relation).
  • Step C: Result Fusion:

    • The Problem: Keyword search returns a binary "match/no-match" or a rank. Vector search returns a continuous similarity score (0.0 to 1.0). You cannot simply add them together because the scales are different.
    • The Solution (Reciprocal Rank Fusion - Simplified):
      • We use a Dictionary to aggregate scores by Document ID.
      • Keyword Scoring: We assign a score based on the position in the list (Rank). The higher it appears in the keyword match list, the higher the score. Formula: \(1 / (rank + constant)\).
      • Vector Scoring: We take the raw cosine similarity and multiply it by a weight (e.g., 0.5). This allows us to tune how much we trust the AI's semantic understanding versus the literal keyword match.
      • Aggregation: If a document appears in both lists, we sum their scores. This boosts documents that are both textually and semantically relevant.
  • Step D: Sorting:

    • We order the final dictionary by FinalScore descending to show the most relevant results first.

3. Execution (Program.cs)

  • Dependency Injection: We set up a standard .NET DI container. We use UseInMemoryDatabase so you can run this code immediately without installing PostgreSQL or SQL Server.
  • Seeding:
    • We insert three documents.
    • Notice the VectorEmbedding values. "Performance Tuning" has a vector [0.9, 0.9, 0.1]. The mock generator turns the query "Optimization" into [1.2, 0.9, 0.2], which points in nearly the same direction, so the cosine similarity is close to 1, guaranteeing a high semantic score.
  • Search Execution:
    • We search for "Optimization".
    • Keyword Check: "Performance Tuning" does not contain the word "Optimization" (in this strict string match), so it gets a keyword score of 0.
    • Vector Check: "Performance Tuning" has a vector nearly parallel to the query vector, so it gets a high vector score.
    • Result: The document "Performance Tuning" appears in the results because the vector search found it semantically relevant, even though the keyword search missed it. This demonstrates the power of hybrid search.

Common Pitfalls

  1. Vector Dimension Mismatch:

    • The Mistake: Assuming you can compare vectors of different sizes (e.g., comparing a 384-dimensional vector from a Sentence-BERT model with a 1536-dimensional vector from OpenAI).
    • The Consequence: The CalculateCosineSimilarity function will throw an ArgumentException. In a real system, ensure all embeddings in your database were generated by the same model.
  2. The "Linear Combination" Fallacy:

    • The Mistake: Simply adding the raw keyword count to the vector similarity score: Score = KeywordCount + VectorSimilarity.
    • The Consequence: Keyword counts can be high (e.g., 10 occurrences of a word), while vector similarities are bounded between -1 and 1. The keyword score will completely dominate the vector score, rendering the semantic search useless.
    • The Fix: Normalize both scores to a similar range (e.g., 0.0 to 1.0) or use a ranking fusion algorithm like Reciprocal Rank Fusion (RRF) as demonstrated in the code.
  3. Filtering Before Fusion:

    • The Mistake: Applying a strict WHERE clause to the vector search (e.g., WHERE Similarity > 0.8) before combining it with keyword results.
    • The Consequence: You might discard a document that is a perfect keyword match but has a slightly lower semantic score. This breaks the "hybrid" nature of the search.
    • The Fix: Retrieve a broader set of results from both strategies, fuse them, and then apply the final top-N limit.
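The normalization fix from pitfall 2 can be sketched as a min-max rescale before blending; the score arrays and the 50/50 weighting below are illustrative assumptions:

```csharp
using System;
using System.Linq;

// Min-max normalization maps any score list onto [0, 1], so the two
// modalities can be blended without one scale dominating the other.
double[] Normalize(double[] scores)
{
    double min = scores.Min(), max = scores.Max();
    if (max == min) return scores.Select(_ => 1.0).ToArray(); // all scores equal
    return scores.Select(s => (s - min) / (max - min)).ToArray();
}

// Same three documents scored by each modality (index-aligned).
double[] keywordScores = { 2.0, 10.5, 6.0 };   // unbounded BM25-style scores
double[] vectorScores  = { 0.91, 0.40, 0.75 }; // cosine similarities in [0, 1]

var kw  = Normalize(keywordScores);
var vec = Normalize(vectorScores);

// Equal-weight blend per document; raw addition would have let the
// keyword scores (up to 10.5) drown out the vector scores.
var blended = kw.Zip(vec, (k, v) => 0.5 * k + 0.5 * v).ToArray();
Console.WriteLine(string.Join(", ", blended.Select(b => b.ToString("F3"))));
```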

Visualization of Data Flow

The diagram illustrates the data flow where initial retrieval strategies generate a combined set of results, which are then fused before a final top-N limit is applied to the output.

The chapter continues with advanced code and exercises with solutions and analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
