Chapter 14: Citations - Linking Answers Back to Sources
Theoretical Foundations
When we build a Retrieval-Augmented Generation (RAG) system, we are essentially giving an LLM access to a private library of information it wasn't trained on. We ask it a question, it looks up relevant documents, and it synthesizes an answer. However, there is a fundamental trust gap. The model is a black box. It might provide a confident, articulate, and seemingly correct answer, but as a user, you are left with a critical question: "How do I know that's true?"
This is the problem of hallucination. Without a mechanism to verify the information, the user has no choice but to either blindly trust the AI or spend time manually searching the source documents to fact-check the response. This defeats the purpose of an efficiency tool. If the user has to do the work anyway, the value proposition of the RAG system collapses.
Citations are the solution. They are the digital equivalent of a research paper's bibliography or a lawyer's reference to case law. They are the "receipts" that prove the AI's answer is not a creative invention but a grounded synthesis of verifiable source material. By linking every part of an answer back to its origin, we transform the system from a "magic box" into a transparent, trustworthy research assistant.
The Analogy: The Expert Researcher vs. The Confident Storyteller
Imagine you hire two researchers to answer a complex question about your company's historical financial data.
- Researcher A (The RAG System without Citations): This person locks themselves in a room with a massive archive of all your company's financial reports. They come out an hour later with a beautifully written, confident summary. They state, "In Q3 of 2019, revenue increased by 15% due to a successful product launch." It sounds great. But when you ask, "Where did you see that?", they just shrug and say, "I read it in the archives." You have no way to verify their claim without re-reading all the archives yourself. You might start to doubt their other statements.
- Researcher B (The RAG System with Citations): This person does the same thing but presents their findings differently. They say, "In Q3 of 2019, revenue increased by 15% [1] due to a successful product launch [2]." At the bottom of the page, they provide a list of sources:
- 2019 Annual Report, page 47, paragraph 2.
- Product Launch Post-Mortem Memo, October 2019, Executive Summary.
Now, you have complete trust. You can instantly jump to page 47 of the annual report to see the number for yourself. You can read the product launch memo to confirm the correlation. Researcher B is not just an answer generator; they are a trusted partner in your discovery process.
The Web Development Analogy: Embeddings as a Global Hash Map
In Book 2, Chapter 11, we explored how vector embeddings work. Let's revisit that concept with a web development analogy to understand the mechanics of citations.
Think of your entire corpus of documents (PDFs, web pages, text files) as a massive, unstructured database. Searching this database with traditional keyword search is like trying to find a value in a database without an index—you have to do a full table scan.
Vector embeddings transform this unstructured mess into a highly structured, searchable index. The analogy is a global hash map.
- The Key: The "key" in this hash map is the semantic meaning of a piece of text (the "chunk"). This meaning is represented by a long list of numbers called a vector. It's like the hash of a document's concept.
- The Value: The "value" is not just the text itself. The value is a rich object containing:
- The text content of the chunk.
- Metadata: Crucial information about where this chunk came from.
- A Unique Identifier: A primary key for this specific chunk.
```typescript
// A conceptual representation of what's stored in our vector database for each chunk.
type VectorStoreRecord = {
  // The vector embedding (the "hash key") for semantic search.
  embedding: number[];
  // The actual text content (the "hash value").
  content: string;
  // The metadata, which is the foundation of our citation system.
  metadata: {
    source: string;  // e.g., "2023-Annual-Report.pdf"
    chunkId: string; // e.g., "2023-Annual-Report.pdf_chunk_42"
    page: number;    // e.g., 42
    title: string;   // e.g., "Annual Report 2023"
    url?: string;    // If the source is a web page
    lastModified: Date;
  };
};
```
When a user asks a question, we don't just throw the question at the LLM. We first:
1. Embed the question: Convert the user's query into a vector (the hash key).
2. Query the Hash Map: Use a similarity search algorithm (like cosine similarity) to find the "keys" (vectors) in our database that are closest to the query's key.
3. Retrieve the Values: We get back the content and, most importantly, the metadata for the top N most relevant chunks.
This retrieved metadata is the raw ingredient for our citations. The chunkId is the unique pointer that allows us to say, "This specific piece of information came from this exact spot in the source material."
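The three retrieval steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a production vector database: the `StoreRecord` shape, `cosineSimilarity`, and `topN` are assumptions for demonstration, and a real system would call an embedding model and a dedicated vector store.

```typescript
// Minimal sketch of the embed → search → retrieve flow over an
// in-memory array. Real systems delegate this to a vector database.

type StoreRecord = {
  embedding: number[];
  content: string;
  metadata: { chunkId: string };
};

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the top-N records closest to the query embedding, carrying
// their metadata — the raw ingredient for citations.
function topN(query: number[], records: StoreRecord[], n: number): StoreRecord[] {
  return [...records]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, n);
}
```

The key point is that `topN` returns whole records, not bare text: the metadata (including `chunkId`) travels with each result all the way to the generation step.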
The Core Challenge: Bridging the Gap Between Retrieval and Generation
So, we have the user's question, and we have a set of relevant source chunks with their metadata. The next step is to feed this to the LLM to generate an answer. But how do we ensure the final answer is linked back to these sources?
The LLM doesn't inherently know about our VectorStoreRecord objects or our chunkIds. It just sees a block of text we provide as context. The core challenge is to design a system where the LLM's generated text is "annotated" with the source information in a way that is both machine-readable (for rendering in a UI) and human-readable (for verification).
There are two primary strategies for this, which we can think of as "Frontend" vs. "Backend" approaches to the citation problem:
- Post-Processing (The "Backend" Approach): The LLM generates a clean, unadorned text response. After the generation is complete, our application code analyzes the generated text, matches it back to the original retrieved chunks, and programmatically inserts citations. This is like a compiler that injects source-map information after the code is generated.
- In-Generation (The "Frontend" Approach): We prompt the LLM to generate the answer and the citations simultaneously. We might ask it to output a special format, such as Markdown with inline markers (`[citation]`) or JSON, which includes the source identifiers directly in the response. This is like writing JSX, where the component structure and its source are intrinsically linked.
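The post-processing strategy can be sketched with a deliberately naive matching heuristic. Everything here is illustrative: `annotate` and `wordOverlap` are toy functions, and word overlap stands in for what would usually be embedding similarity in a real system.

```typescript
// Naive post-processing sketch: after generation completes, match each
// sentence of the answer back to the retrieved chunk with the highest
// word overlap. Word overlap is a toy heuristic; production systems
// typically re-embed each sentence and compare vectors instead.

type Chunk = { id: string; content: string };

// Count words shared between two strings (case-insensitive).
function wordOverlap(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  let shared = 0;
  for (const w of wordsA) if (wordsB.has(w)) shared++;
  return shared;
}

// Annotate each sentence of the answer with the best-matching chunk id.
function annotate(answer: string, chunks: Chunk[]): { sentence: string; chunkId: string }[] {
  return answer
    .split(/(?<=[.!?])\s+/) // split on sentence boundaries
    .filter(Boolean)
    .map((sentence) => {
      const best = chunks.reduce((acc, c) =>
        wordOverlap(sentence, c.content) > wordOverlap(sentence, acc.content) ? c : acc
      );
      return { sentence, chunkId: best.id };
    });
}
```

The rest of the chapter focuses on the in-generation approach, which avoids this fuzzy matching entirely by having the model emit the IDs itself.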
The Importance of Unique Chunk Identifiers
The linchpin of the entire citation system is the Unique Chunk Identifier. Without it, the system falls apart. When we retrieve chunk_42 from 2023-Annual-Report.pdf, we must pass this ID along with the chunk's text to the LLM. The LLM might synthesize information from chunk_42 and chunk_118. Our system needs a way to know that the final synthesized statement is backed by both of these original sources.
This chunkId is the bridge between the raw data store and the final user interface. It's the foreign key that connects the generated answer back to the source of truth.
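One simple way to produce such identifiers is to derive them deterministically from the source file and the chunk's position. The scheme below (`makeChunkId`, `chunkDocument`, and the fixed-size split) is an illustrative convention, not the book's canonical one; what matters is only that the same chunk always maps to the same ID.

```typescript
// Derive a stable, human-readable chunk ID from the source document,
// page, and chunk index. Any deterministic scheme works; this one has
// the advantage of being debuggable at a glance.
function makeChunkId(source: string, page: number, index: number): string {
  return `${source}_p${page}_chunk_${index}`;
}

// Splitting a page of text into fixed-size chunks yields stable IDs we
// can pass to the LLM and later use as a foreign key back into the store.
// (Real chunkers split on sentence or paragraph boundaries, not raw length.)
function chunkDocument(source: string, page: number, text: string, size: number) {
  const chunks: { id: string; content: string }[] = [];
  for (let i = 0; i * size < text.length; i++) {
    chunks.push({
      id: makeChunkId(source, page, i),
      content: text.slice(i * size, (i + 1) * size),
    });
  }
  return chunks;
}
```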
Visualizing the Citation Flow
The following diagram illustrates the complete flow of information in a RAG system designed for robust citations. Notice the cyclical nature of the generation loop, which allows for iterative refinement if necessary.
The "Why": Building Trust, Ensuring Verifiability, and Enabling Traceability
Ultimately, the implementation of citations is not a technical "nice-to-have"; it is a fundamental requirement for any serious enterprise RAG application.
- Combating Hallucination: By forcing the model to ground its answers in provided, verifiable text, we dramatically reduce the chance it will invent facts. The citation system acts as a constraint, keeping the model "honest."
- Building User Trust: When users can see the sources, they can trust the system. They understand the system's capabilities and limitations. This transparency is the foundation of a positive user experience with AI.
- Enabling Verification and Traceability: In a business context, decisions are made based on data. If an AI suggests a course of action based on a report, a user must be able to audit that report. Citations provide an audit trail, instantly answering the question "Where did this information come from?"
- Facilitating Feedback and Correction: If a user spots an error in the AI's interpretation of a source, the citation allows them to pinpoint the exact chunk that caused the error. This feedback is invaluable for improving the chunking strategy, the embedding model, or the LLM prompt.
Basic Code Example
In a Retrieval-Augmented Generation (RAG) system, the Large Language Model (LLM) generates an answer based on retrieved context chunks. To ensure trust and verifiability, we must programmatically attach citations to the generated text. This is achieved by mapping specific phrases or sentences in the LLM's response back to the unique identifiers of the chunks used to generate them.
The most robust way to handle this in a TypeScript environment is to force the LLM to output a structured response (JSON) that contains both the text and an array of source references.
The Data Structure
Before writing the code, we must define the shape of our data. We will assume a Chunk object represents a piece of text from a document.
```typescript
/**
 * Represents a single segment of text retrieved from a vector database.
 */
type SourceChunk = {
  id: string;      // Unique identifier (e.g., UUID or Vector ID)
  content: string; // The actual text content
  metadata: {
    source: string; // e.g., "report_2023.pdf" or "https://example.com"
    page?: number;  // Optional page number
  };
};
```
The Implementation
Here is a self-contained TypeScript example simulating the retrieval, generation, and citation mapping process.
```typescript
/**
 * ============================================================================
 * 1. TYPE DEFINITIONS
 * ============================================================================
 */

// The structure of the raw data stored in our vector database.
type VectorChunk = {
  id: string;
  content: string;
  metadata: { source: string; page: number };
};

// The structured output we expect from the LLM to ensure citations are handled.
// We instruct the LLM to output JSON matching this interface.
interface LLMStructuredResponse {
  answer: string;      // The natural language response
  citations: string[]; // Array of IDs referencing VectorChunk.id
}

/**
 * ============================================================================
 * 2. MOCK SERVICES (Simulating Vector DB & LLM)
 * ============================================================================
 */

// Mock Vector Database Retrieval
function mockVectorSearch(query: string): VectorChunk[] {
  // In a real app, this queries Pinecone, Weaviate, or Qdrant.
  return [
    {
      id: "chunk_001",
      content: "The sky is blue because of Rayleigh scattering.",
      metadata: { source: "science_manual.pdf", page: 1 },
    },
    {
      id: "chunk_002",
      content: "Photosynthesis requires sunlight, water, and carbon dioxide.",
      metadata: { source: "biology_101.pdf", page: 42 },
    },
  ];
}

// Mock LLM Inference with Citation Generation
async function mockLLMInference(
  query: string,
  context: VectorChunk[]
): Promise<LLMStructuredResponse> {
  // Simulate network delay
  await new Promise((r) => setTimeout(r, 100));

  // In a real scenario, you would send a prompt like:
  // "Answer the question based ONLY on the context below.
  //  Output JSON: { answer: string, citations: string[] }"
  // Here, we mock the LLM correctly identifying that the answer
  // comes from chunk_001.
  return {
    answer: "The sky appears blue due to a phenomenon called Rayleigh scattering.",
    citations: ["chunk_001"],
  };
}

/**
 * ============================================================================
 * 3. CORE LOGIC: RECONCILING DATA WITH CITATIONS
 * ============================================================================
 */

type CitedSource = { id: string; sourceName: string; page: number };

/**
 * Fetches data, generates an answer, and hydrates the response with source links.
 * This mimics a Next.js Server Action or API Route handler.
 */
async function generateCitedAnswer(userQuery: string) {
  // Step 1: Retrieve relevant chunks from the Vector DB
  const retrievedChunks = mockVectorSearch(userQuery);

  // Step 2: Send query + context to the LLM to get a structured answer + citation IDs
  const llmResponse = await mockLLMInference(userQuery, retrievedChunks);

  // Step 3: Reconciliation (The Citation Step)
  // We map the citation IDs from the LLM back to the full metadata
  // of the retrieved chunks.
  const sources = llmResponse.citations
    .map((id): CitedSource | null => {
      const foundChunk = retrievedChunks.find((c) => c.id === id);
      if (!foundChunk) {
        // This happens if the LLM hallucinates an ID that wasn't retrieved.
        console.warn(`Hallucinated citation ID: ${id}`);
        return null;
      }
      return {
        id: foundChunk.id,
        sourceName: foundChunk.metadata.source,
        page: foundChunk.metadata.page,
      };
    })
    // Remove nulls. A type-guard predicate is needed here: plain
    // .filter(Boolean) would not narrow (CitedSource | null) to CitedSource.
    .filter((s): s is CitedSource => s !== null);

  // Step 4: Return the fully hydrated object to the frontend
  return {
    answer: llmResponse.answer,
    sources,
  };
}

/**
 * ============================================================================
 * 4. FRONTEND RENDERING (React Component Simulation)
 * ============================================================================
 */

// Simulating a React component rendering the result
function renderUI(result: Awaited<ReturnType<typeof generateCitedAnswer>>) {
  console.log("\n--- RENDERED UI ---");
  console.log(`Answer: ${result.answer}`);
  if (result.sources.length > 0) {
    console.log("Sources:");
    result.sources.forEach((src) => {
      // In a real React app, this would be: <a href={`/view/${src.id}`}>Source</a>
      console.log(`- [${src.sourceName}, Page ${src.page}] (ID: ${src.id})`);
    });
  }
  console.log("-------------------\n");
}

/**
 * ============================================================================
 * 5. EXECUTION FLOW
 * ============================================================================
 */

(async () => {
  const userQuery = "Why is the sky blue?";

  // 1. Fetch and process
  const finalData = await generateCitedAnswer(userQuery);

  // 2. Render (Simulate UI update)
  renderUI(finalData);
})();
```
Detailed Line-by-Line Explanation
- Type Definitions (VectorChunk, LLMStructuredResponse):
  - We strictly type our data. VectorChunk represents the raw data from the database.
  - LLMStructuredResponse is the crucial interface. By defining it, we are setting a contract for the LLM: we force the model to return an array of citations (IDs) alongside the text. This prevents the LLM from simply hallucinating sources.
- mockVectorSearch:
  - This simulates the semantic search step. It takes a query (e.g., "Why is the sky blue?") and returns a list of relevant document chunks.
  - Note: In a real system, this function queries a Vector Database like Pinecone or Qdrant.
- mockLLMInference:
  - This simulates the LLM call. In a production environment (using the Vercel AI SDK or OpenAI), you would pass the retrievedChunks as context.
  - Critical: The prompt engineering here is vital. You must instruct the LLM: "Analyze the context. Generate an answer. Identify the specific chunk IDs that support your answer. Output valid JSON."
  - The mock returns { answer: "...", citations: ["chunk_001"] }.
- generateCitedAnswer (The Reconciliation Logic):
  - Steps 1 & 2: We orchestrate the search and the generation.
  - Step 3 (Reconciliation): This is the most important logic block.
    - We iterate through the IDs returned by the LLM (llmResponse.citations).
    - We look up these IDs in our original list of retrieved chunks.
    - Why? The LLM only knows the ID (e.g., "chunk_001"). It doesn't know the source filename ("science_manual.pdf") or the page number. We use the ID to fetch that metadata.
    - Safety Check: We filter out null values. If the LLM hallucinates an ID (returns "chunk_999", which wasn't in the search results), we discard it to prevent broken links.
- renderUI:
  - This simulates the frontend. The data structure returned from the backend is now clean and ready for rendering.
  - We can confidently render a clickable link or a tooltip because the metadata has been verified against the actual database records.
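The prompt-engineering step described above can be sketched as a small builder function. The exact wording below is illustrative, not a canonical prompt; `buildPrompt` and `ContextChunk` are assumptions for demonstration.

```typescript
// Illustrative prompt builder: each chunk is labeled with its ID so the
// LLM can reference those IDs in its JSON output.

type ContextChunk = { id: string; content: string };

function buildPrompt(query: string, chunks: ContextChunk[]): string {
  // Label every chunk with its ID so the model can cite it.
  const context = chunks
    .map((c) => `[${c.id}]\n${c.content}`)
    .join("\n\n");

  return [
    "Answer the question based ONLY on the context below.",
    'Output valid JSON: { "answer": string, "citations": string[] },',
    "where citations contains the IDs of the chunks that support the answer.",
    "",
    "Context:",
    context,
    "",
    `Question: ${query}`,
  ].join("\n");
}
```

Because the IDs appear verbatim in the prompt, the model can only cite identifiers it has actually seen, which makes the later reconciliation step a simple lookup.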
Visualizing the Data Flow
Common Pitfalls
When implementing citations in TypeScript-based RAG systems, watch out for these specific issues:
- Hallucinated JSON / IDs:
  - The Issue: The LLM returns a string instead of valid JSON, or it returns a citation ID that doesn't exist in your retrieved context (e.g., ["chunk_999"] when you only retrieved ["chunk_001"]).
  - The Fix: Use Zod or JSON5 to parse and validate the LLM's output. If parsing fails, retry the request or fall back to a generic "No sources available" state. Never trust the LLM to output perfect JSON without validation.
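The validation step can be sketched without any dependencies. In production you would likely reach for a schema library such as Zod; the hand-written type guard below (`parseLLMOutput` is an illustrative name) shows the same principle.

```typescript
// Dependency-free sketch of validating the LLM's structured output.
// Returns null for anything that is not valid JSON of the expected shape,
// which the caller can treat as a signal to retry or fall back.

interface StructuredResponse {
  answer: string;
  citations: string[];
}

function parseLLMOutput(raw: string): StructuredResponse | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (
      typeof data === "object" && data !== null &&
      typeof (data as any).answer === "string" &&
      Array.isArray((data as any).citations) &&
      (data as any).citations.every((c: unknown) => typeof c === "string")
    ) {
      return data as StructuredResponse;
    }
    return null; // valid JSON, wrong shape
  } catch {
    return null; // not JSON at all
  }
}
```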
- Vercel/AWS Timeouts (Streaming):
  - The Issue: If you perform the reconciliation (Step 3) after the LLM stream finishes, you might hit the 10-second timeout limit on serverless functions (Vercel Edge/Lambda) for long responses.
  - The Fix: If you are streaming, perform the vector search before the stream starts and pass the retrieved chunks into the stream generation. If you need to map citations during the stream, you must embed the IDs directly into the stream tokens (e.g., "The sky is blue [1]") and parse them on the client side, or use a background job to hydrate citations later.
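The client-side half of that streaming fix can be sketched as a marker parser. The `Segment` shape and `parseInlineCitations` are illustrative assumptions: they split streamed text containing inline markers like "[1]" into plain-text segments and citation references that a UI can render separately.

```typescript
// Sketch of client-side parsing for inline citation markers such as "[1]".
// The output alternates text segments and citation references, ready to be
// mapped to <span> and <a> elements in a React component.

type Segment =
  | { kind: "text"; value: string }
  | { kind: "citation"; index: number };

function parseInlineCitations(text: string): Segment[] {
  const segments: Segment[] = [];
  const marker = /\[(\d+)\]/g;
  let last = 0;
  let m: RegExpExecArray | null;

  while ((m = marker.exec(text)) !== null) {
    // Text between the previous marker (or start) and this marker.
    if (m.index > last) {
      segments.push({ kind: "text", value: text.slice(last, m.index) });
    }
    segments.push({ kind: "citation", index: Number(m[1]) });
    last = m.index + m[0].length;
  }
  // Trailing text after the final marker.
  if (last < text.length) {
    segments.push({ kind: "text", value: text.slice(last) });
  }
  return segments;
}
```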
- Async/Await Loops in Rendering:
  - The Issue: Trying to fetch citation metadata inside a .map() call in a React component (e.g., data.citations.map(id => fetchMetadata(id))). This creates a "waterfall" of requests and breaks the rules of React Server Components.
  - The Fix: Always resolve all citations on the server (in the generateCitedAnswer function) before sending the data to the client. The client should only receive the final, fully hydrated array of sources.
Code License: All code examples are released under the MIT License.
Content Copyright: Copyright © 2026 Edgar Milvus. All rights reserved.