
Chapter 6: The Vercel AI SDK - Streaming from Edge to Client

Theoretical Foundations

In traditional web development, when a user submits a form or clicks a button that triggers a server-side process, the interaction typically follows a "request-response" cycle. The client sends a request, the server processes it, and once the entire computation is complete, the server sends back a single, monolithic block of data. This is analogous to ordering a custom-made piece of furniture: you place the order, the workshop builds it in its entirety, and only when it is 100% finished is it shipped to your door. You see nothing until the final product arrives.

Generative AI, particularly Large Language Models (LLMs), fundamentally changes this dynamic. Generating a coherent, multi-paragraph response can take several seconds. If we were to wait for the model to generate the entire response before sending it to the user, the user would be staring at a loading spinner for an uncomfortably long time, leading to a poor user experience and a perception of slowness.

This is where streaming becomes essential. Instead of waiting for the entire response, the server begins sending data to the client as soon as the first token (a word, or part of a word) is generated by the model. The client receives these tokens in real-time and renders them incrementally. This transforms the user experience from a long, silent wait into an engaging, live event, much like watching a typist type in real-time rather than waiting for a finished document.

The Vercel AI SDK provides the standard primitives and client-side tools to manage this complex flow of data from a serverless Edge function to the browser, abstracting away the low-level complexities of handling streams, managing backpressure, and parsing partial data.

The Architectural Analogy: The Assembly Line vs. The Finished Product Factory

To understand the shift the Vercel AI SDK facilitates, let's use an analogy of manufacturing.

  • Traditional Request-Response (The Finished Product Factory): Imagine a factory that produces cars. A customer places an order. The factory receives the order, manufactures the entire car from scratch—chassis, engine, body, interior—and only then ships the completed vehicle to the customer. The customer has zero visibility into the process and must wait for the entire production cycle to complete. This is analogous to a standard API call where the server waits for the LLM to complete its entire generation before sending the final 200 OK response.

  • Streaming with the Vercel AI SDK (The Assembly Line): Now, imagine the same factory reconfigured as a dynamic assembly line. As soon as the order is received, the first parts (e.g., the chassis) start moving down the line. The customer is given a live video feed of the assembly. As the engine is lowered in, the body panels are added, and the interior is installed, the customer sees the car taking shape piece by piece. The car is "delivered" incrementally. The final moment is just the last bolt being tightened, but the customer has been engaged with the product's creation from the very first second. This is streaming. The Vercel AI SDK is the infrastructure that manages this assembly line, ensuring the parts (tokens) flow smoothly from the generator (the Edge function running the LLM) to the observer (the client's browser) without interruption.

The Mechanics of Streaming: Chunks, Buffers, and State

Under the hood, streaming over HTTP is typically handled using Server-Sent Events (SSE) or a similar chunked transfer encoding mechanism. The Vercel AI SDK abstracts this, but it's crucial to understand the underlying principle.

  1. Chunking: The LLM doesn't produce a sentence at once; it generates one token at a time. A "token" can be a word, a punctuation mark, or even a sub-word unit (like "ing" in "running"). The server-side process (in our case, an Edge function) captures each token as it's generated by the model's inference engine.
  2. The Stream: Instead of holding these tokens in a buffer until the generation is complete, the server immediately writes each token to the HTTP response stream. It sends this data as a series of small chunks to the client.
  3. The Client-Side Buffer: The client (a web browser running JavaScript) receives these chunks. However, simply appending each chunk to the DOM (Document Object Model) as it arrives can be inefficient and can cause performance issues due to frequent re-renders. The Vercel AI SDK's client-side hooks manage an internal buffer. They accumulate the incoming tokens and use mechanisms like requestAnimationFrame or React's concurrent features to batch updates to the UI, ensuring a smooth, non-blocking rendering experience.
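The buffering idea in step 3 can be sketched framework-free. The TokenBuffer class below is an illustrative assumption, not part of the Vercel AI SDK: in a real app, flush would be driven by requestAnimationFrame and its result handed to a state setter.

```typescript
// A minimal token buffer: accumulate incoming tokens and flush them to
// the UI in batches, rather than re-rendering on every single token.
class TokenBuffer {
  private pending: string[] = [];
  private rendered = '';

  // Called for every token as it arrives from the network.
  push(token: string): void {
    this.pending.push(token);
  }

  // Called once per frame (e.g. from requestAnimationFrame) to merge
  // all tokens that arrived since the last flush into one UI update.
  flush(): string {
    if (this.pending.length > 0) {
      this.rendered += this.pending.join('');
      this.pending = [];
    }
    return this.rendered;
  }
}

const buf = new TokenBuffer();
['Hel', 'lo', ', ', 'world'].forEach((t) => buf.push(t));
const text = buf.flush(); // four tokens collapse into a single update
```

Here four network chunks cost only one render, which is the trade-off the SDK's hooks manage automatically.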

The Vercel AI SDK: A Standardized Bridge

The Vercel AI SDK introduces a standard format for these streamed values, surfaced through abstractions such as StreamableValue. This standardization is a critical innovation. Without a standard, every developer would have to invent their own format for chunking data, parsing it on the client, and managing the state. This would lead to fragmentation and incompatibility.

The SDK provides two key components:

  1. Server-Side (Edge Runtime): A set of helpers to easily stream responses from models (like OpenAI's GPT-4) directly from a serverless Edge function. It handles the connection to the AI provider and transforms the model's output into the standard StreamableValue format.
  2. Client-Side (React/Next.js): A set of hooks, like useChat and useCompletion, that abstract the complexity of managing the stream. They provide a simple interface to send messages, receive the streamed response, and update the UI state automatically.
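To make concrete what these hooks abstract, here is a minimal sketch of the read loop a hook like useCompletion runs internally. The helper name readTextStream and the fake three-chunk stream are illustrative assumptions, not SDK code.

```typescript
// Illustrative sketch of the decode-and-accumulate loop that client
// hooks manage for you (readTextStream is a made-up name).
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onUpdate: (accumulated: string) => void,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // Stream finished.
    // { stream: true } handles multi-byte characters split across chunks.
    text += decoder.decode(value, { stream: true });
    onUpdate(text); // In React, this would be a setState call.
  }
  return text;
}

// Simulate a server response arriving in three chunks.
const encoder = new TextEncoder();
const fakeStream = new ReadableStream<Uint8Array>({
  start(controller) {
    for (const chunk of ['Hel', 'lo', '!']) {
      controller.enqueue(encoder.encode(chunk));
    }
    controller.close();
  },
});
```

Calling readTextStream(fakeStream, console.log) would log the partial text after each chunk and resolve to the full string once the stream closes.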

Visualizing the Data Flow

The following diagram illustrates the complete flow from the user's action to the rendered text on the screen.

The diagram visually traces the complete data flow, starting from the user's action, moving through the message sending and streaming response processes, and culminating in the automatic UI state update that renders the final text on the screen.

Explicit Reference to Previous Concepts

In Chapter 5, we discussed LangChain.js and its Agents. An Agent is a reasoning engine that can decide on a sequence of actions to take, such as searching the web or querying a database, to answer a user's question.

Now, consider an Agent that needs to perform a web search to answer a complex query. The process is multi-step:

  1. The Agent receives the user's question.
  2. It decides a web search is necessary.
  3. It formulates a search query and executes it.
  4. It receives search results.
  5. It reasons over the results to formulate the final answer.

Without streaming, the user would wait for all these steps to complete before seeing any output. This could take many seconds.

With the Vercel AI SDK, we can stream the Agent's entire thought process. As the Agent decides to search, we can stream that thought: "I need to search for the latest information on this topic...". As it gets results, we can stream a summary: "Based on the search results, I found that...". Finally, we stream the conclusive answer. This makes the Agent's complex reasoning transparent and engaging, rather than a black box that takes a long time to respond. The Vercel AI SDK is the delivery mechanism for the Agent's internal monologue.
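A sketch of this idea under stated assumptions: the thought steps below are hard-coded stand-ins for a real Agent's reasoning, and the generator-to-stream adapter mirrors the ReadableStream pattern used later in this chapter rather than any specific SDK API.

```typescript
// Illustrative: stream an agent's intermediate "thoughts" as they
// happen, using an async generator as the thought source.
async function* agentThoughts(question: string): AsyncGenerator<string> {
  yield `Thinking about: "${question}"...\n`;
  yield 'I need to search for the latest information on this topic...\n';
  yield 'Based on the search results, I found that...\n';
  yield 'Final answer: ...\n';
}

// Adapt the generator into a web ReadableStream, the same primitive the
// Edge function later in this chapter returns to the browser.
function thoughtsToStream(question: string): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = agentThoughts(question);
  return new ReadableStream({
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(value));
    },
  });
}
```

Because the stream pulls one thought at a time, the client sees each reasoning step the moment it is produced instead of waiting for the final answer.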

Why This Matters: Latency, UX, and Efficiency

  • Perceived Performance: Streaming drastically reduces the perceived latency. Even if the total time to generate a full response is 5 seconds, the user starts seeing output in the first 200ms. This creates a feeling of immediacy and responsiveness.
  • State Management on the Client: The SDK's hooks (useChat, useCompletion) manage the complex state of a streaming conversation. They handle:
    • Appending new tokens to the existing message.
    • Managing the lifecycle of the stream (loading, error, finished states).
    • Updating the UI without manual DOM manipulation.
  • Efficiency: Edge functions are stateless and can be geographically distributed. Streaming allows the work to be done close to the user, and the incremental nature of the data transfer is more efficient than waiting for a large payload. It also allows the client to start processing the data (e.g., rendering text) before the entire payload has been received, which is a key principle of modern web performance optimization.

In essence, the Vercel AI SDK provides the standardized, high-level abstractions needed to build modern, real-time AI applications that feel alive and interactive, bridging the gap between powerful server-side models and the client-side user experience.

Basic Code Example

In a modern SaaS application, providing immediate feedback is crucial for user experience. Instead of waiting for a full server response, we can stream data incrementally. The Vercel AI SDK provides a standardized way to stream values from serverless Edge functions to the client. This example demonstrates a simple "Hello World" scenario where the server streams a text response, and the client renders it character by character.

The architecture involves two distinct parts:

  1. The Edge Function (Server): An API route that generates a response and writes it to a ReadableStream.
  2. The Client (Browser): A component that fetches the stream and updates the UI in real-time.

The Edge Function Implementation

This TypeScript code runs on Vercel's Edge runtime. It simulates a slow AI response by yielding characters one by one.

// File: app/api/stream/route.ts
import { NextRequest } from 'next/server';

// Opt this route into the Edge runtime (App Router routes default to Node.js).
export const runtime = 'edge';

/**
 * Handles GET requests to the /api/stream endpoint.
 * This function simulates a streaming response from an LLM.
 * @param {NextRequest} request - The incoming HTTP request object.
 * @returns {Response} A streaming HTTP response.
 */
export async function GET(request: NextRequest) {
  // 1. Define the data to stream.
  const textToStream = "Hello, World! This is a streamed response from the Edge.";

  // 2. Create a ReadableStream.
  // The stream controller allows us to enqueue data chunks.
  const stream = new ReadableStream({
    async start(controller) {
      // 3. Loop through the text and enqueue each character.
      // We use a loop with a delay to simulate network latency or processing time.
      for (const char of textToStream) {
        // Enqueue the character as a Uint8Array (standard for byte streams).
        controller.enqueue(new TextEncoder().encode(char));

        // Simulate a 50ms delay between characters.
        await new Promise((resolve) => setTimeout(resolve, 50));
      }

      // 4. Close the stream when done.
      controller.close();
    },
  });

  // 5. Return the stream as the HTTP response.
  return new Response(stream, {
    headers: {
      // Standard streaming header.
      'Content-Type': 'text/plain; charset=utf-8',
      // Enable CORS if needed (common in SaaS apps).
      'Access-Control-Allow-Origin': '*',
    },
  });
}

The Client Component Implementation

This React component (using Next.js App Router conventions) fetches the stream and renders the text incrementally.

// File: app/page.tsx
'use client'; // This is a Client Component

import { useState, useEffect } from 'react';

/**
 * A simple page component that fetches and displays a streamed response.
 */
export default function StreamPage() {
  // 1. State to hold the accumulated text from the stream.
  const [streamedText, setStreamedText] = useState<string>('');

  // 2. State to track loading status.
  const [isLoading, setIsLoading] = useState<boolean>(false);

  /**
   * Fetches the stream from the Edge function and processes it.
   */
  const fetchStream = async () => {
    setIsLoading(true);
    setStreamedText(''); // Reset text

    try {
      // 3. Make the request to the API route.
      const response = await fetch('/api/stream');

      if (!response.ok) {
        throw new Error('Network response was not ok');
      }

      // 4. Get the ReadableStream from the response body.
      const reader = response.body?.getReader();
      if (!reader) return;

      // 5. Create a TextDecoder to convert Uint8Array chunks to strings.
      const decoder = new TextDecoder();

      // 6. Loop to read the stream chunks.
      while (true) {
        const { done, value } = await reader.read();

        if (done) {
          // Stream finished.
          break;
        }

        // 7. Decode the chunk and append to state.
        const chunkText = decoder.decode(value, { stream: true });
        setStreamedText((prev) => prev + chunkText);
      }
    } catch (error) {
      console.error("Error fetching stream:", error);
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <div style={{ padding: '20px', fontFamily: 'sans-serif' }}>
      <h1>Vercel AI SDK: Basic Streaming Example</h1>

      <button 
        onClick={fetchStream} 
        disabled={isLoading}
        style={{ 
          padding: '10px 20px', 
          fontSize: '16px', 
          cursor: isLoading ? 'not-allowed' : 'pointer' 
        }}
      >
        {isLoading ? 'Streaming...' : 'Start Stream'}
      </button>

      <div style={{ marginTop: '20px', padding: '15px', border: '1px solid #ccc', minHeight: '100px' }}>
        <strong>Response:</strong>
        <br />
        {/* 8. Render the accumulated text */}
        {streamedText}
        {/* 9. Show a cursor while loading (define the 'blink' keyframes in global CSS for it to animate) */}
        {isLoading && <span style={{ animation: 'blink 1s infinite' }}>|</span>}
      </div>
    </div>
  );
}

Visualizing the Data Flow

The flow of data from the Edge function to the client involves reading chunks from the stream and writing them to the UI state.

A diagram illustrates the data flow from an Edge function to the client, showing how streamed text chunks are progressively accumulated and rendered in the UI while a loading indicator is displayed.

Line-by-Line Explanation

Edge Function (app/api/stream/route.ts)

  1. export async function GET(request: NextRequest): Defines a standard Next.js API route handler for HTTP GET requests. It receives the NextRequest object.
  2. const textToStream = ...: Defines the static string we intend to stream to the client. In a real app, this would be the output of an LLM call.
  3. new ReadableStream({ async start(controller) { ... } }): Creates a new web standard ReadableStream. The start method is called immediately when the stream is constructed. The controller argument provides methods to manipulate the stream (e.g., enqueue, close).
  4. for (const char of textToStream): Iterates over every character in the string.
  5. controller.enqueue(new TextEncoder().encode(char)): Converts the string character into a Uint8Array (binary data) and adds it to the stream's internal queue. This is the chunk of data sent over the network.
  6. await new Promise((resolve) => setTimeout(resolve, 50)): Pauses execution for 50 milliseconds. This simulates network latency or the time an AI model takes to generate the next token, making the streaming effect visible to the user.
  7. controller.close(): Signals that no more data will be added to the stream. This is crucial; without it, the client will hang waiting for more data.
  8. return new Response(stream, ...): Returns the ReadableStream directly as the HTTP response body. Next.js handles the transmission of these chunks to the client.

Client Component (app/page.tsx)

  1. 'use client': Informs Next.js that this component must be rendered on the client side, allowing the use of useState and browser APIs.
  2. const [streamedText, setStreamedText] = useState(''): Initializes state to store the accumulated text received from the stream.
  3. const fetchStream = async () => { ... }: Defines the asynchronous function triggered by the button click.
  4. const response = await fetch('/api/stream'): Initiates the HTTP request to our Edge function. This returns a standard Response object containing a ReadableStream in the body.
  5. const reader = response.body?.getReader(): Accesses the ReadableStream from the response and creates a ReadableStreamDefaultReader. This reader allows us to read the stream chunk by chunk.
  6. const decoder = new TextDecoder(): Creates a utility to convert binary Uint8Array data (received from the network) into standard JavaScript strings.
  7. while (true) { const { done, value } = await reader.read() }: An infinite loop that continues until the stream ends. reader.read() returns a promise that resolves with an object containing:
    • done: A boolean indicating if the stream has finished.
    • value: The chunk of data (a Uint8Array).
  8. const chunkText = decoder.decode(value, { stream: true }): Decodes the binary chunk into a string. The { stream: true } option is important for handling multi-byte characters that might be split across chunks.
  9. setStreamedText((prev) => prev + chunkText): Updates the React state with the new chunk. React batches these updates, but due to the await in the loop, the UI updates frequently, creating the streaming effect.
  10. {isLoading && <span style={{...}}>|</span>}: A simple visual indicator (blinking cursor) to show the user that data is still being fetched.

Common Pitfalls

  1. Vercel Edge Timeouts (25-Second Limit to First Byte):

    • Issue: Vercel Edge functions must begin sending a response within 25 seconds. Streaming can continue after the first chunk, but a function that performs all of its work before enqueuing any data risks being cut off before the client sees anything.
    • Solution: Start enqueuing data as early as possible. For long-running generations, consider the streamText helper from the ai package together with a persistent storage mechanism (like Redis), or break the generation into smaller, chained Edge function calls.
  2. Async/Await Loops in Streams:

    • Issue: Performing a slow await (e.g., await fetchExternalAPI()) inside the stream's loop before any data has been enqueued stalls the stream: while the promise is pending, nothing reaches the client, defeating the purpose of streaming.
    • Solution: Enqueue data as soon as it is available and keep heavy work off the critical path. If you must fetch external data, start the requests early (for example with Promise.all) or pipe the external API's response stream through directly.
  3. TextDecoder Misuse (Multi-byte Characters):

    • Issue: If a character (like an emoji 😊) is split across two network chunks, a naive decoder might produce garbage characters or errors.
    • Solution: Always use new TextDecoder().decode(chunk, { stream: true }). The stream: true flag tells the decoder to hold onto incomplete bytes until the next chunk arrives, ensuring correct reconstruction.
  4. Missing Content-Type Header:

    • Issue: If the server doesn't set Content-Type: text/plain; charset=utf-8, the browser might buffer the entire response before processing it, or try to interpret it as JSON/XML, causing parsing errors.
    • Solution: Explicitly set the content type in the Response headers on the server.
  5. React State Batching:

    • Issue: React may batch multiple setStreamedText calls into a single render update. While generally good for performance, it can make the stream appear "chunky" rather than smooth.
    • Solution: For very high-frequency updates, consider using a requestAnimationFrame loop to throttle UI updates, though for typical text streaming, the default behavior is usually sufficient.
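Pitfall 3 is easy to reproduce directly. The snippet below splits the UTF-8 bytes of an emoji across two simulated network chunks and compares naive per-chunk decoding with a single decoder using { stream: true }:

```typescript
// Pitfall 3 demonstration: an emoji's UTF-8 bytes split across chunks.
const bytes = new TextEncoder().encode('Hi 😊'); // the emoji is 4 bytes
const chunk1 = bytes.slice(0, 5); // cuts the emoji in half
const chunk2 = bytes.slice(5);

// Naive: a fresh decode per chunk produces replacement characters (�)
// because each call sees an incomplete byte sequence.
const naive =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);

// Correct: one shared decoder with { stream: true } buffers the partial
// bytes until the rest of the character arrives in the next chunk.
const decoder = new TextDecoder();
const correct =
  decoder.decode(chunk1, { stream: true }) +
  decoder.decode(chunk2, { stream: true });
```

The naive result is garbled while the streaming decoder reconstructs "Hi 😊" exactly, which is why the client component above passes { stream: true } on every decode.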

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.