
Chapter 15: Handling File Uploads & Attachments in Chat

Theoretical Foundations

In previous chapters, we established the foundation of streaming AI responses using the useChat hook. We treated the Large Language Model (LLM) as a highly sophisticated text processing engine: you feed it a string of text, and it streams back a string of text. This is the "Hello, World" of generative AI. However, the real world is not composed solely of text. It is a rich tapestry of images, PDFs, code snippets, and audio. To build truly useful AI applications, we must expand our mental model from a "text terminal" to a "multi-modal workspace."

This chapter introduces the concept of Multi-Part Messages. Just as an HTTP request can contain a body, headers, and metadata, a message sent to an AI model is evolving into a container that holds various data types simultaneously. The core challenge is no longer just managing a string in a React state (useState<string>), but rather managing a complex object that might contain a text prompt and a reference to a binary file.

The Analogy: The Executive Assistant and the Briefcase

Imagine you are a busy executive (the User), and you have an incredibly powerful assistant (the AI Model) who works behind a frosted glass window. In the past, you would slide a piece of paper under the door with a written question, and slide it back to receive a written answer. This was our previous chat interface.

Now, imagine you need to ask your assistant to analyze a competitor's marketing brochure (a PDF file) and a photo of a new product prototype (an image file). You cannot simply write "analyze this photo" on a piece of paper; the assistant needs to see the photo.

File Uploads & Attachments represent the act of sliding a briefcase through the door along with your note.

  1. The Note (Text Prompt): "Please analyze the attached brochure and photo, and draft a response comparing their product to ours."
  2. The Briefcase (Attachments): This contains the physical objects (the PDF and the image).

The assistant (AI) must now:

  1. Accept the Briefcase: Unlock the door and pull it in (Server-side file ingestion).
  2. Inspect the Contents: Open the briefcase and understand that one item is a document and the other is an image (File type parsing).
  3. Synthesize the Information: Read the text of the brochure and "look" at the photo to form a mental concept (Multi-modal processing).
  4. Write the Response: Generate the draft based on both sources (Streaming the response).

If the assistant refuses to open the briefcase, or if the briefcase is locked with a key the assistant doesn't have (security restrictions), the entire interaction fails.

Under the Hood: The Data Flow of a Multi-Part Message

When we move from text-only to text-plus-files, the architecture of the useChat hook interaction changes significantly. It is no longer a simple linear flow of strings.

1. The Client-Side: From File System to Data URL

On the client side, the user initiates a file selection. This is typically handled by a hidden <input type="file" /> element. When a file is selected, the browser provides a File object. This object is a lightweight reference to the file on the user's machine; the bytes themselves are read lazily when needed.

To send this to our Next.js server, we cannot just "paste" the file into a JSON string. We must serialize it. There are two primary ways this happens in the modern stack:

  • Base64 Encoding: The binary data of the file is converted into an ASCII string (Base64). This string can be embedded directly into a JSON payload. It's like translating a binary photo into a long string of letters and numbers.
  • Multipart Form Data: This is the traditional web way. The request body is split into distinct parts, separated by a "boundary" string. One part contains the text, and another part contains the raw binary data of the file.
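To make the first route concrete, here is a minimal sketch of Base64 serialization. It is written Node-style using the global Buffer (in the browser, FileReader.readAsDataURL produces the same kind of string); the `toDataUrl` helper and the `fakePng` bytes are illustrative stand-ins, not SDK code.

```typescript
// Sketch of Base64 serialization: raw bytes become an ASCII string that can
// be embedded directly in a JSON payload as a Data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// A tiny stand-in for a "file": the first four bytes of the PNG signature.
const fakePng = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const dataUrl = toDataUrl(fakePng, 'image/png');
// → "data:image/png;base64,iVBORw=="
```

Note how the MIME type travels inside the string itself, which is why a Data URL can be embedded in a JSON payload without any extra metadata.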

The Vercel AI SDK abstracts this complexity. When you hand the SDK a File object or a Data URL alongside a message, it automatically detects the content type and formats the request correctly.

2. The Server-Side: Ingestion and Normalization

Once the request hits your Next.js server (via the API route handler), the SDK's server-side utilities (such as streamText, or the specific provider adapter) receive the payload.

It performs Normalization. This is the process of converting the incoming file data into a format the underlying Language Model can understand. LLMs generally do not speak "HTTP" or "File Systems"; they speak "Tokens" and "Embeddings."

  • For Images: The model might need the image decoded into a pixel array or converted into a specialized token format (like OpenAI's gpt-4-vision, which uses a specific sequence of tokens to represent images).
  • For Text Files (PDF, DOCX): The server must extract the raw text from the binary file structure. This is often called "parsing" or "chunking."
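A toy version of that chunking step might look like the following. The chunk size and overlap values are arbitrary examples; real pipelines tune them to the target model's context window and usually split on sentence or paragraph boundaries rather than raw character offsets.

```typescript
// Illustrative sketch of "chunking": splitting extracted document text into
// fixed-size, overlapping segments that fit in a model's context window.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// 250 characters, 100-char chunks with a 20-char overlap → chunks start at
// offsets 0, 80, 160, and 240.
const chunks = chunkText('a'.repeat(250), 100, 20);
```

The overlap exists so that a sentence cut in half at a chunk boundary still appears whole in the neighboring chunk.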

The Visual Data Flow

The following diagram illustrates how a file moves from the user's computer to the AI model's context window.

This diagram illustrates the sequential data flow of a user's file being parsed and chunked into manageable segments before being processed within the AI model's context window.

Security and Storage: The "Bouncer" and the "Locker"

Allowing users to upload files is one of the most dangerous features you can add to a web application. If you aren't careful, a user could upload a malicious script, a massive file that crashes your server, or illegal content.

This is why the theoretical foundation of this chapter emphasizes Validation and Storage Strategies.

1. Validation (The Bouncer)

Before a file is ever allowed to be processed by the AI or stored on your server, it must pass a strict security check. This happens in two layers:

  • Client-side: Immediate feedback to the user ("File too large" or "Wrong file type"). This is for User Experience (UX).
  • Server-side: The definitive check. You must verify the MIME type (e.g., image/png) and the file size. Never trust the client.
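As a minimal sketch, the client-side half of this check can be a pure function over the file's MIME type and size. The whitelist and the 5MB limit below are arbitrary examples, and remember: this layer is UX only, and the server must repeat the same checks.

```typescript
// Client-side pre-check sketch. The allowed types and size limit are
// example values; the server must re-validate everything.
const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'application/pdf'];
const MAX_BYTES = 5 * 1024 * 1024; // 5MB example limit

// Returns an error message for immediate UI feedback, or null if the
// file passes the check.
function preflightCheck(file: { type: string; size: number }): string | null {
  if (!ALLOWED_TYPES.includes(file.type)) return 'Wrong file type';
  if (file.size > MAX_BYTES) return 'File too large';
  return null;
}

// In the browser you would call this with a real File from an <input>:
// const error = preflightCheck(input.files[0]);
```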

Analogy: Think of this as a bouncer at a nightclub. The bouncer checks the ID (file extension) and the dress code (file size) before letting anyone inside. If they pass, they get a wristband (a secure token) allowing them to proceed.

2. Storage (The Locker)

Once validated, where does the file live?

  • Temporary (In-Memory): For simple, one-off chats, you might keep the file in the server's RAM just long enough to process it and send it to the LLM. Once the request finishes, the file is gone. This is fast but volatile.
  • Ephemeral Storage (Vercel Blob): For slightly longer interactions or to handle the "streaming" nature of LLMs (where the model might need to "look" at the file multiple times during generation), you upload the file to a temporary blob store like Vercel Blob.
  • Permanent Storage (Database): If the chat is part of a persistent history (e.g., a legal document review app), the file is stored permanently, often linked to a user ID in a database.
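The first strategy can be sketched as a request-scoped, Map-based locker; the ids and the tiny API below are illustrative, not a production design.

```typescript
// Sketch of the "Temporary (In-Memory)" strategy: file bytes live in RAM
// only as long as the request needs them, then are explicitly released.
const locker = new Map<string, Uint8Array>();

function stash(id: string, bytes: Uint8Array): void {
  locker.set(id, bytes);
}

function release(id: string): boolean {
  // Returns true if the file existed and was evicted.
  return locker.delete(id);
}

stash('upload-1', new Uint8Array([1, 2, 3]));
// ...hand the bytes to the LLM, then:
release('upload-1'); // the file is gone; RAM is reclaimed
```

The trade-off is exactly as described above: fast, but the file does not survive a server restart or even the end of the request.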

The useChat Hook Evolution

To support this, the useChat hook has evolved conceptually. It is no longer just a manager of string arrays. It is a manager of Message Parts.

A single message object in the state might look like this conceptually:

// Conceptual representation of a multi-part message structure
interface MessagePart {
  type: 'text' | 'image' | 'file';
  data: string; // The text content or the Data URL/Path
  mimeType?: string; // e.g., 'image/png', 'application/pdf'
}

interface Message {
  id: string;
  role: 'user' | 'assistant';
  content: string | MessagePart[]; // The content can be a simple string OR an array of parts
}
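To make the shape concrete, here is a hypothetical message built against those conceptual types (repeated so the sketch is self-contained), plus a small helper that reads the text portion regardless of which shape content takes.

```typescript
// Conceptual types from above, repeated for a self-contained sketch.
interface MessagePart {
  type: 'text' | 'image' | 'file';
  data: string;
  mimeType?: string;
}

interface Message {
  id: string;
  role: 'user' | 'assistant';
  content: string | MessagePart[];
}

// A hypothetical multi-part user message: a prompt plus two attachments.
const briefcase: Message = {
  id: 'msg_1',
  role: 'user',
  content: [
    { type: 'text', data: 'Compare the attached brochure to our product.' },
    { type: 'file', data: '/tmp/brochure.pdf', mimeType: 'application/pdf' },
    { type: 'image', data: 'data:image/png;base64,iVBORw...', mimeType: 'image/png' },
  ],
};

// Extract the plain-text prompt whether content is a string or parts.
function textOf(message: Message): string {
  if (typeof message.content === 'string') return message.content;
  return message.content
    .filter((part) => part.type === 'text')
    .map((part) => part.data)
    .join(' ');
}
```

Helpers like `textOf` matter in practice because your rendering code must handle both the legacy string shape and the newer parts array.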

When you use the useChat hook's append function, you are now capable of pushing a message that contains both a text prompt and an array of file objects. The hook handles the complexity of bundling this data and sending it to your API endpoint.

Key Takeaways

Handling file uploads in chat is the bridge between a simple text-based Q&A bot and a true AI assistant. It requires a shift in thinking from linear text streams to complex, multi-part data structures.

  1. Serialization: Converting binary files into transmittable data (Base64/Blobs).
  2. Normalization: Converting files into a format the LLM can ingest (Text extraction/Image tokenization).
  3. Security: Validating inputs rigorously to prevent malicious attacks.
  4. Storage: Deciding where files live during the lifecycle of the request.

By mastering these concepts, you enable your application to see and read, not just listen and write.

Basic Code Example

In a modern SaaS application, allowing users to attach files (like PDFs, images, or logs) to a chat is a standard requirement. The Vercel AI SDK simplifies this by extending the useChat hook to handle multipart messages. This example focuses on the client-side implementation: selecting a file, converting it to a format the SDK can digest, and sending it alongside text.

The core mechanism relies on handleInputChange for text, plus a custom submit handler that hands the selected file(s) to the useChat hook at the moment the message is sent.

'use client';

import { useChat } from 'ai/react';
import { useState, ChangeEvent, FormEvent } from 'react';

/**
 * FileChatInterface Component
 * 
 * A minimal client-side component demonstrating how to attach files
 * to a message using the Vercel AI SDK's `useChat` hook.
 */
export default function FileChatInterface() {
  // 1. Initialize the useChat hook.
  // The hook manages message history, input state, and API communication.
  const { 
    messages, 
    input, 
    handleInputChange, 
    handleSubmit, 
    isLoading, 
    error 
  } = useChat({
    api: '/api/chat', // The backend route handling the AI logic
  });

  // 2. Local state for the selected file (for UI feedback) and the raw
  // FileList, which is the shape `experimental_attachments` expects.
  const [selectedFile, setSelectedFile] = useState<File | null>(null);
  const [attachments, setAttachments] = useState<FileList | undefined>(undefined);

  /**
   * Handles the file selection from the <input type="file"> element.
   * 
   * @param event - The change event from the file input.
   */
  const handleFileChange = (event: ChangeEvent<HTMLInputElement>) => {
    if (event.target.files && event.target.files.length > 0) {
      // Keep the first file for UI display and the whole FileList for the SDK.
      setSelectedFile(event.target.files[0]);
      setAttachments(event.target.files);

      // NOTE: We don't need to manually convert to Base64 here; the SDK
      // handles encoding and multipart preparation when we pass the FileList
      // via `experimental_attachments` in the submit handler below.
    }
  };

  /**
   * Custom submit handler to combine text input and file attachments.
   * 
   * @param event - The form submission event.
   */
  const handleCustomSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();

    // Check if we have files to attach
    if (attachments && attachments.length > 0) {
      // The `handleSubmit` function provided by `useChat` accepts an options
      // object. We pass `experimental_attachments` here to attach the files
      // to the outgoing message.
      handleSubmit(event, {
        experimental_attachments: attachments,
      });

      // Reset local UI state
      setSelectedFile(null);
      setAttachments(undefined);
    } else {
      // Standard text-only submission
      handleSubmit(event);
    }
  };

  return (
    <div className="chat-container">
      {/* Message Display Area */}
      <div className="messages-list">
        {messages.map((m) => (
          <div key={m.id} className={`message ${m.role}`}>
            <strong>{m.role === 'user' ? 'You: ' : 'AI: '}</strong>
            {/* If the message has attachments, we can display them (or a placeholder) */}
            {m.experimental_attachments && m.experimental_attachments.length > 0 && (
              <span>[Attachment: {m.experimental_attachments[0].name}] </span>
            )}
            <span>{m.content}</span>
          </div>
        ))}
      </div>

      {/* Input Area */}
      <form onSubmit={handleCustomSubmit} className="input-area">
        <input
          type="text"
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          disabled={isLoading}
        />

        {/* File Input: Hidden styling, triggered by a label for better UX */}
        <label htmlFor="file-upload" className="file-label">
          {selectedFile ? `Selected: ${selectedFile.name}` : '📎 Attach File'}
        </label>
        <input
          id="file-upload"
          type="file"
          onChange={handleFileChange}
          style={{ display: 'none' }}
          disabled={isLoading}
        />

        <button type="submit" disabled={isLoading || !input.trim()}>
          {isLoading ? 'Sending...' : 'Send'}
        </button>
      </form>

      {error && <div className="error">Error: {error.message}</div>}
    </div>
  );
}

Line-by-Line Explanation

  1. 'use client';: This directive marks the file as a Client Component in the Next.js App Router. This is necessary because we are using React hooks (useState, useChat) and handling browser-specific events like file selection.
  2. import { useChat } from 'ai/react';: Imports the primary hook from the Vercel AI SDK. This hook abstracts away the complexities of managing conversation state and streaming HTTP responses.
  3. const { ... } = useChat({ api: '/api/chat' });: We destructure necessary properties from the hook.
    • messages: Array of current chat history.
    • input: The current value of the text input field (controlled component).
    • handleInputChange: A pre-built handler that updates the input state.
    • handleSubmit: The function that triggers the API call.
    • isLoading: Boolean indicating if a request is in flight.
  4. const [selectedFile, setSelectedFile] = useState<File | null>(null);: A standard React state to hold the actual File object selected by the user. This is used purely for local UI feedback (showing the filename).
  5. handleFileChange: This function is triggered when the user selects a file.
    • It accesses event.target.files[0].
    • It updates the local state selectedFile so the UI can show "Selected: filename.pdf".
    • Note: We do not manually convert the file to Base64 here. The useChat hook handles the heavy lifting of encoding and multipart formatting internally when we pass the files in step 6.
  6. handleCustomSubmit: This overrides the default form submission behavior.
    • event.preventDefault(): Stops the browser from reloading the page.
    • The Logic Branch:
      • If a file is selected: We call handleSubmit with the experimental_attachments option (the same property name the SDK uses when displaying attachments). This is the magic line. It tells the SDK: "Send this request, and include these files as attachments alongside the text input."
      • Else: We call the standard handleSubmit(event) for text-only messages.
  7. The JSX (Return Statement):
    • Messages List: Maps over the messages array. It checks for m.experimental_attachments (the current property name for attachments in the SDK) to display that a file was sent.
    • File Input: We use a hidden <input type="file"> and a <label> for styling. The onChange triggers our handleFileChange.
    • Submit Button: Disabled based on isLoading or empty input.

The Flow Diagram

This diagram illustrates the data flow from the user's file selection to the API call.

This diagram illustrates the data flow from a user selecting a file, through validation that disables the submit button if the input is empty or a request is already in progress, to the final API call.

Common Pitfalls

When implementing file uploads with the AI SDK, several specific JavaScript and infrastructure issues can arise:

  1. The "Async Void" Trap in handleSubmit

    • The Issue: Developers often try to wrap handleSubmit in a useEffect or an async function to "process" the file before sending.
    • Why it fails: useChat handles the asynchronous network request internally. If you wrap the submission in your own async pre-processing, the request can fire before the file has finished reading, or the hook never sees your converted copy at all.
    • Fix: Trust the hook. Pass the file(s) directly to handleSubmit via the experimental_attachments option. Do not try to manually convert the file to a data URL (Base64) before passing it; the SDK does this efficiently.
  2. Vercel/Next.js Payload Limits (4.5MB)

    • The Issue: Vercel Serverless Functions enforce a request body limit (roughly 4.5MB). Sending a large PDF or high-res image results in a 413 Payload Too Large error or a generic timeout.
    • Why it fails: The file is being sent as part of the request body. If it exceeds the infrastructure limit, the request is rejected before it even hits your API logic.
    • Fix: For large files, do not send them directly to the AI API endpoint. Instead:
      1. Upload the file to Vercel Blob (or AWS S3) first.
      2. Get the URL of the uploaded file.
      3. Send only the URL to the /api/chat endpoint.
      4. The server then fetches the file from the URL and streams it to the model.
  3. Runtime Validation Neglect (Zod)

    • The Issue: You trust that event.target.files[0] is safe.
    • Why it fails: A malicious user could modify the HTML or use Postman to send a file with a fake extension (e.g., virus.exe renamed to image.png). If your backend blindly accepts any file and passes it to an LLM or stores it, you have a security vulnerability.
    • Fix: Always validate file types and sizes on the server (Runtime Validation).
      // Inside your API route
      import { z } from 'zod';
      
      const fileSchema = z.object({
        name: z.string().min(1),
        size: z.number().max(5 * 1024 * 1024), // 5MB limit
        type: z.enum(['image/png', 'application/pdf']), // Whitelist types
      });
      
  4. Missing 'use client' Directive

    • The Issue: You attempt to use useChat or useState in a default Server Component (.tsx file without directives).
    • Why it fails: React Server Components run exclusively on the server. They do not have access to browser APIs like window or File, nor do they support React hooks.
    • Fix: Ensure the file containing the file input logic starts with 'use client';.

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.