Chapter 7: The 'tool-call' Render Pattern
Theoretical Foundations
In the previous chapter, we explored the fundamental mechanics of streaming UI. We learned how to send a continuous flow of tokens from the server to the client, allowing the user to see the LLM's response as it is being generated, rather than waiting for a complete, static block of text. This pattern dramatically improves perceived performance and user engagement. However, this streaming model is fundamentally passive; the server sends data, and the client consumes it. The client remains a spectator.
The tool-call render pattern transforms this passive consumption into an active, collaborative execution. It introduces a mechanism where the LLM does not merely generate text but can request the execution of specific, server-side functions. The results of these functions are then streamed back into the UI, not as raw text, but as structured, interactive components.
Imagine the difference between reading a recipe and having a sous-chef. In the streaming model from the previous chapter, you are reading a recipe line-by-line as it's written. In the tool-call pattern, you are the head chef, and the LLM is your sous-chef. You tell it, "I need a list of nearby restaurants," and instead of it just describing the process, it goes to the kitchen (the server), executes the function to fetch that list, and brings the prepared ingredients (the data) back to you, ready for the next step in your cooking process.
The "Why": Bridging the Gap Between Language and Action
The primary limitation of a pure Large Language Model is its isolation. It is a brilliant text predictor, but it has no hands. It cannot query a database, call an external API, or perform a calculation. It exists in a vacuum of pure language.
The tool-call pattern solves this by giving the LLM a pair of hands: a controlled interface to the real world. This is achieved through a concept known as Function Calling.
Function Calling is the standardized protocol that allows an LLM to interact with external tools. It is the bridge between the probabilistic world of language and the deterministic world of code. When an LLM determines that a user's request requires information or an action it doesn't possess, it can request the invocation of a pre-defined function. It specifies the function name and provides the necessary parameters, structured according to a schema you provide.
This is a profound shift. The LLM is no longer just a conversational partner; it becomes a planner and a coordinator. It can reason about which tool to use and when, effectively breaking down a complex user intent into a series of executable steps.
To understand this, let's use an analogy from web development: The LLM as a Client-Side Router and the Tools as an API Layer.
- The Old Way (Pure Streaming): A user clicks a link, and the server sends back a fully rendered HTML page. The user is a passive recipient of a complete document.
- The tool-call Way: A user makes a request (e.g., "Show me my recent orders"). The LLM (acting as the client-side router) analyzes the request and determines it needs data from an API. It constructs a request to the `/api/orders` endpoint (the tool call). The server executes this function, fetches the data, and sends it back (the observation). The LLM then uses this data to construct the final UI (the success message or a list of orders).
This pattern allows us to build systems that are not just conversational but genuinely functional. The UI becomes a direct reflection of the LLM's internal reasoning and execution process, making the entire application feel alive and responsive.
The Underlying Mechanism: The ReAct Framework and the Thought-Action-Observation Loop
The tool-call pattern is a practical implementation of a powerful reasoning framework called ReAct (Reasoning and Acting). The core idea of ReAct is to interleave reasoning traces and actions. The LLM doesn't just jump to a conclusion; it thinks, acts, observes, and then thinks again. This creates a feedback loop that allows it to solve complex problems iteratively.
This loop is composed of three atomic units:
- Thought: The LLM's internal monologue. It analyzes the user's prompt, its available tools, and the history of the conversation. It reasons about what needs to be done next. This is often hidden from the final UI but is crucial for the model's decision-making process.
- Action: The LLM decides to use a specific tool. It generates a structured output (a JSON object) that specifies the tool's name and the parameters it requires. This is the point where the LLM hands off control to your server-side code.
- Observation: The server executes the function called in the Action step. The result (the data, a success message, or an error) is captured. This result is then fed back into the LLM's context as an "Observation," providing it with the new information it needs to continue its reasoning.
This cycle repeats until the LLM determines the user's request has been fulfilled.
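The three units of the loop can be modeled as a small discriminated union. The following is a minimal TypeScript sketch; the type names and the example trace are illustrative, not part of any SDK:

```typescript
// Illustrative types for the Thought-Action-Observation loop.
type Thought = { kind: 'thought'; text: string };
type Action = { kind: 'action'; tool: string; args: Record<string, unknown> };
type Observation = { kind: 'observation'; result: unknown };

type Step = Thought | Action | Observation;

// A transcript of one loop iteration for "What's the weather in Paris?"
const steps: Step[] = [
  { kind: 'thought', text: 'The user wants current weather; I should call a tool.' },
  { kind: 'action', tool: 'get_weather', args: { city: 'Paris' } },
  { kind: 'observation', result: { city: 'Paris', weather: 'Sunny, 21°C' } },
];

// The loop ends when the model emits a final answer instead of another action.
const isAction = (s: Step): s is Action => s.kind === 'action';
console.log(steps.filter(isAction).length); // number of tool calls in this trace
```

Modeling the steps as a union like this is also what makes the render pattern later in the chapter tractable: each step kind maps to a distinct piece of UI.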
The Blueprint: Function Calling Schema
Before the LLM can perform an Action, it must know what actions are possible. This is where the Function Calling Schema comes into play. This schema is a formal, machine-readable description of your available tools. It acts as a contract between your application and the LLM, defining the "API" that the LLM can call.
This schema is typically a JSON object or an array of objects, where each object describes a single tool. Key properties include:
- `name`: A unique identifier for the function (e.g., `get_weather`, `fetch_user_profile`).
- `description`: A clear, human-readable explanation of what the function does. This is critical, as the LLM uses this description to decide which tool is appropriate for a given task. A good description reads like good API documentation.
- `parameters`: A schema (often following the JSON Schema draft standards) that defines the structure of the input the function expects. It details the required properties, their data types (string, number, boolean), and optional descriptions for each parameter.
For example, a tool to fetch a user's profile might have a schema like this:
```javascript
// A conceptual representation of a function schema
const userProfileTool = {
  name: "get_user_profile",
  description: "Retrieves the public profile information for a specific user.",
  parameters: {
    type: "object",
    properties: {
      username: {
        type: "string",
        description: "The unique username of the user to look up.",
      },
    },
    required: ["username"],
  },
};
```
When the LLM receives a prompt like "What is the profile for user 'alice123'?", it analyzes this schema, recognizes that get_user_profile is the appropriate tool, and generates the Action step: a call to this function with the parameter { "username": "alice123" }.
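Before executing that Action, the server should check the model's arguments against the schema, since the model can emit malformed parameters. Here is a hand-rolled sketch of that check; real applications would typically use a JSON Schema validator or Zod, and the schema is restated so the snippet is self-contained:

```typescript
// The parameters schema from the example above, restated for self-containment.
const userProfileTool = {
  name: 'get_user_profile',
  parameters: {
    type: 'object',
    properties: { username: { type: 'string' } },
    required: ['username'],
  },
} as const;

// What the model emits as its Action step.
const action = { name: 'get_user_profile', arguments: { username: 'alice123' } };

// Minimal validation: every required property must be present with the right type.
function validateArgs(
  schema: typeof userProfileTool.parameters,
  args: Record<string, unknown>,
): boolean {
  return schema.required.every(
    (key) => typeof args[key] === schema.properties[key].type,
  );
}

console.log(validateArgs(userProfileTool.parameters, action.arguments)); // true
console.log(validateArgs(userProfileTool.parameters, {})); // false
```

Only when this gate passes should the server hand the arguments to the actual function; a failed check becomes an error Observation fed back to the model.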
The Render Pattern: Making Execution Visible
This is where the tool-call render pattern truly distinguishes itself. It's not enough to simply execute the function and return the result at the end. The pattern dictates that the state of the tool's execution must be streamed and rendered in the UI in real-time.
This creates a transparent and informative user experience. The user doesn't just see a final answer; they witness the process of how that answer was derived. The UI is broken down into distinct states:
- Initiation: When the LLM decides to call a tool, the UI can immediately render a placeholder. This could be a loading spinner, a skeleton loader, or a message like "Checking the database for user 'alice123'...". This tells the user that something is happening behind the scenes.
- Execution: While the server-side function is running, the streaming connection keeps the UI element active. The user knows the system is working.
- Completion (Success): Once the function returns its data, the streaming payload delivers this result. The loading state is replaced by the actual content. This could be a beautifully formatted user profile card, a list of items, or a success message. The key is that this content is rendered as a distinct, interactive component within the larger conversation flow.
- Error Handling: If the function fails (e.g., the user 'alice123' doesn't exist), the error is also streamed back and rendered, perhaps as a red alert box, providing immediate feedback.
This pattern turns the UI into a dynamic dashboard of the LLM's actions. Each tool call is a self-contained unit of work, with its own loading and success states, all orchestrated by the LLM's reasoning process.
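These UI states map naturally onto a discriminated union, which lets the type checker guarantee that every state has a renderer. A sketch follows; the state names mirror common SDK conventions but the types themselves are illustrative:

```typescript
// Each tool invocation moves through these states as the stream progresses.
type ToolInvocationState =
  | { state: 'call'; toolName: string; args: Record<string, unknown> } // Initiation
  | { state: 'result'; toolName: string; result: unknown }             // Completion
  | { state: 'error'; toolName: string; message: string };             // Error Handling

// A renderer picks the UI for each state; TypeScript flags any unhandled state.
function renderInvocation(inv: ToolInvocationState): string {
  switch (inv.state) {
    case 'call':
      return `⚡ Running ${inv.toolName}...`; // spinner / skeleton placeholder
    case 'result':
      return `✅ ${inv.toolName}: ${JSON.stringify(inv.result)}`;
    case 'error':
      return `❌ ${inv.toolName} failed: ${inv.message}`;
  }
}

console.log(renderInvocation({ state: 'call', toolName: 'get_weather', args: { city: 'Paris' } }));
```

In a React client the same switch would return components instead of strings, but the shape of the logic is identical.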
Chaining: The Power of Multi-Step Workflows
The true power of this pattern emerges when we chain multiple tool calls together. The Observation from the first tool call becomes the context for the LLM's next Thought, which may lead to a second Action (a call to a different tool).
Consider a complex request: "Find all open pull requests for the 'main' branch in the 'vercel/ai' repository and summarize the latest comment on each."
A single tool call is insufficient. The tool-call pattern allows for a sequence:
- Action 1: The LLM calls a `list_pull_requests` tool with parameters `{ repo: 'vercel/ai', branch: 'main' }`.
- Observation 1: The server returns a list of PRs (e.g., PR #123, PR #124).
- Thought 2: The LLM now knows the PRs exist. It needs to fetch comments for each one. It can decide to loop.
- Action 2 (for PR #123): The LLM calls a `get_pr_comments` tool with parameters `{ repo: 'vercel/ai', pr_number: 123 }`.
- Observation 2: The server returns the comments for PR #123. The UI streams a loading state for this specific PR, then the comment summary.
- Action 3 (for PR #124): The LLM calls `get_pr_comments` again with `{ repo: 'vercel/ai', pr_number: 124 }`.
- Observation 3: The server returns the comments for PR #124. The UI updates again.
- Final Thought: The LLM synthesizes all the observations into a final summary and streams it to the user.
By chaining these tool calls, we can build sophisticated, multi-step workflows that feel like a seamless conversation. The LLM acts as an intelligent agent, navigating a series of tools to gather information and perform actions, with the UI faithfully rendering each step of its journey. This moves us beyond simple Q&A and into the realm of true AI-powered application logic.
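The chained workflow above can be sketched as a driver loop. Everything here is a stand-in: `mockModel` plays the LLM's role and `runTool` plays the server's, so the shape of the cycle is visible without an API key or a real repository:

```typescript
// A mock "model": given the observations so far, return the next action or a final answer.
type NextStep =
  | { type: 'tool-call'; tool: string; args: Record<string, unknown> }
  | { type: 'final'; text: string };

function mockModel(observations: unknown[]): NextStep {
  if (observations.length === 0)
    return { type: 'tool-call', tool: 'list_pull_requests', args: { repo: 'vercel/ai', branch: 'main' } };
  if (observations.length === 1)
    return { type: 'tool-call', tool: 'get_pr_comments', args: { pr_number: 123 } };
  return { type: 'final', text: `Summarized ${observations.length} observations.` };
}

// A mock tool executor standing in for real server-side functions.
async function runTool(tool: string, args: Record<string, unknown>): Promise<unknown> {
  if (tool === 'list_pull_requests') return [123];
  if (tool === 'get_pr_comments') return ['LGTM, just one nit.'];
  throw new Error(`Unknown tool: ${tool}`);
}

// The driver loop: act, observe, repeat until the model produces a final answer.
async function drive(): Promise<string> {
  const observations: unknown[] = [];
  for (let i = 0; i < 10; i++) { // hard cap to avoid infinite loops
    const step = mockModel(observations);
    if (step.type === 'final') return step.text;
    observations.push(await runTool(step.tool, step.args));
  }
  return 'Step limit reached.';
}

drive().then(console.log); // → "Summarized 2 observations."
```

The hard step cap is worth copying into real implementations: a model that keeps requesting tools would otherwise loop forever.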
Basic Code Example
This example demonstrates a minimal implementation of the tool-call render pattern within a Next.js application using the Vercel AI SDK. We will build a simple SaaS feature where an AI assistant can fetch the current weather for a given city by executing a server-side tool. The UI will stream the tool's execution state in real-time.
The architecture follows an Edge-First Deployment Strategy. The tool execution logic is deployed to the Edge Runtime, ensuring low-latency responses. We enforce Strict Type Discipline using TypeScript to define the tool's input schema, preventing runtime errors from malformed requests.
The Code
```typescript
// app/api/chat/route.ts
import { streamText } from 'ai';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Deploy this route to the Edge Runtime for low-latency responses.
export const runtime = 'edge';

// IMPORTANT: This tool executes on the Edge Runtime.
// It simulates a network call to a weather API.
const fetchWeatherTool = {
  description: 'Get the current weather for a given city.',
  parameters: z.object({
    city: z.string().describe('The city name (e.g., "New York", "London")'),
  }),
  // The execute function runs ONLY when the LLM decides to call this tool.
  // It is executed on the server (Edge) before the result is streamed to the client.
  execute: async ({ city }: { city: string }) => {
    // Simulate a network delay to show loading states
    await new Promise((resolve) => setTimeout(resolve, 1500));
    // Mock data based on city (simple logic for demo)
    const weatherMap: Record<string, string> = {
      'new york': 'Sunny, 22°C',
      'london': 'Rainy, 15°C',
      'tokyo': 'Cloudy, 18°C',
    };
    const weather = weatherMap[city.toLowerCase()] || 'Unknown weather conditions';
    return {
      city,
      weather,
      timestamp: new Date().toISOString(),
    };
  },
};

export async function POST(req: Request) {
  const { messages } = await req.json();
  // Initialize the text streamer
  const result = await streamText({
    model: openai('gpt-4-turbo-preview'),
    messages,
    tools: {
      getWeather: fetchWeatherTool,
    },
    // This system prompt guides the LLM to use the tool when appropriate
    system: 'You are a helpful assistant. Use the getWeather tool to answer questions about the weather.',
  });
  // Stream the response back to the client
  // The SDK handles the serialization of tool calls and results automatically
  return result.toAIStreamResponse();
}
```
```tsx
// app/page.tsx (Client Component)
'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });
  return (
    <div className="flex flex-col w-full max-w-md mx-auto p-4 space-y-4">
      <div className="border rounded-lg p-4 h-64 overflow-y-auto space-y-2">
        {messages.map((message) => (
          <div key={message.id} className="p-2 rounded bg-gray-100">
            <strong>{message.role === 'user' ? 'You: ' : 'AI: '}</strong>
            {/*
              CRITICAL RENDER LOGIC:
              We check for the presence of a 'toolInvocations' property on the message.
              This property is injected by the Vercel AI SDK when a tool is called.
            */}
            {message.toolInvocations ? (
              message.toolInvocations.map((tool, toolIndex) => (
                <div key={toolIndex} className="mt-2 p-2 bg-blue-50 text-sm text-blue-800 rounded">
                  {/* Render based on tool state */}
                  {tool.state === 'call' && (
                    <span>⚡ Executing tool: {tool.toolName} for {tool.args.city}...</span>
                  )}
                  {tool.state === 'result' && (
                    <span>
                      ✅ Result: {tool.result.city} is {tool.result.weather}
                    </span>
                  )}
                </div>
              ))
            ) : (
              // Render standard text content
              <span>{message.content}</span>
            )}
          </div>
        ))}
        {/* Loading Indicator for the stream */}
        {isLoading && (
          <div className="text-gray-500 italic">AI is thinking...</div>
        )}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          type="text"
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about weather in New York..."
          className="flex-1 border p-2 rounded"
        />
        <button type="submit" className="bg-black text-white px-4 py-2 rounded">
          Send
        </button>
      </form>
    </div>
  );
}
```
Visualizing the Data Flow
The following diagram illustrates the lifecycle of a tool call within the streaming architecture.
Detailed Line-by-Line Explanation
1. Server-Side Tool Definition (app/api/chat/route.ts)
- `import { streamText } from 'ai';`: Imports the core Vercel AI SDK function. `streamText` handles the LLM interaction and streaming.
- `import { z } from 'zod';`: Imports Zod, a TypeScript-first schema validation library. This is crucial for Strict Type Discipline: it ensures the arguments passed to our tool match the expected structure at runtime.
- `const fetchWeatherTool = { ... }`: We define a plain object representing our tool. This keeps the logic modular.
- `parameters: z.object({ city: z.string() })`: Defines the input schema. If the LLM tries to call this tool with a number or a missing `city` field, the SDK rejects the execution before it reaches our code, preventing runtime crashes.
- `execute: async ({ city }: { city: string }) => { ... }`: This is the core server-side logic.
  - It runs only when the LLM decides to invoke the tool.
  - It executes in the Edge Runtime (assuming the `edge` runtime is configured in Next.js), minimizing latency.
  - We simulate a 1.5-second delay (`setTimeout`) to demonstrate how the UI handles loading states during network latency.
- `return { city, weather, ... }`: The return value of this function is automatically serialized and streamed back to the client as the tool result.
2. API Route Handler
- `export async function POST(req: Request)`: The standard Next.js App Router API endpoint.
- `const { messages } = await req.json();`: Parses the incoming request body, which contains the conversation history sent from the client.
- `const result = await streamText({ ... })`: Initializes the AI stream, connecting the user's messages to the OpenAI model and registering our tools.
- `tools: { getWeather: fetchWeatherTool }`: Attaches the tool to the stream. The key `getWeather` is the name the LLM will use to reference the tool.
- `return result.toAIStreamResponse()`: Converts the stream into a standard HTTP Response object that the client can consume. It handles the complex work of interleaving text tokens with tool call markers.
3. Client-Side Rendering (app/page.tsx)
- `'use client';`: Marks this component for client-side rendering in the Next.js App Router.
- `const { messages, ... } = useChat({ api: '/api/chat' });`: The `useChat` hook manages the streaming HTTP connection, message state, and loading indicators. It automatically handles the wire protocol used by the Vercel AI SDK.
- `{messages.map(...)}`: We iterate over the conversation history.
- `{message.toolInvocations ? ( ... ) : ( ... )}`: This is the critical render pattern. The Vercel AI SDK adds a `toolInvocations` array to message objects when a tool is called, which lets us conditionally render tool-specific UI instead of just raw text.
- `tool.state === 'call'`: This state occurs as soon as the LLM requests the tool execution. We render an "Executing..." indicator here, providing immediate feedback to the user.
- `tool.state === 'result'`: This state occurs once the server-side `execute` function completes. We render the actual weather data here.
Common Pitfalls
- Vercel Edge Timeouts
  - Issue: Vercel's Edge Runtime enforces a strict execution window (about 25 seconds to begin returning a response, at the time of writing). If your `execute` function performs heavy computation or calls slow external APIs, the tool call will fail with a timeout error.
  - Fix: Keep tool logic lightweight. For heavy tasks, offload them to background jobs or standard serverless functions and return a "job ID" to the client, rather than blocking the stream.
- Async/Await Loops in Tool Execution
  - Issue: Developers often attempt to run multiple tool calls sequentially inside a single `execute` function (e.g., `fetchWeather` -> `fetchForecast`). This blocks the stream until all tasks are done, defeating the purpose of streaming UI.
  - Fix: Use the ReAct loop pattern. Let the LLM decide the next step: return the result of the first tool, let the stream update the UI, and allow the LLM to reason and trigger the second tool call naturally.
- Hallucinated JSON / Schema Mismatch
  - Issue: Even with Zod validation, if the LLM generates a tool call with arguments that cannot be parsed (e.g., passing the string "twenty" instead of `20`), the `execute` function might throw an error or behave unexpectedly.
  - Fix: Use Zod's `.transform()` or `.refine()` methods within the `parameters` schema to coerce data types and validate logic before the `execute` function ever runs.
- Missing `'use client'` Directive
  - Issue: In the Next.js App Router, using `useChat` (which relies on React hooks and browser APIs) inside a Server Component results in a runtime error.
  - Fix: Ensure any component using `useChat` or `useState` is marked with `'use client'`. The API route, however, must remain server-side code (no directive needed).
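The coercion advice in the schema-mismatch pitfall can be illustrated without the SDK. The following is a hand-rolled stand-in for what a `z.coerce.number()` plus `.refine()` chain would do before `execute` ever runs; the function name is hypothetical:

```typescript
// Coerce a model-provided value to a positive integer, rejecting anything
// non-numeric, mimicking coerce-then-refine validation before tool execution.
function coercePositiveInt(value: unknown): number | null {
  const n = Number(value); // "20" -> 20, "twenty" -> NaN
  if (!Number.isInteger(n) || n <= 0) return null;
  return n;
}

console.log(coercePositiveInt('20'));     // → 20   (coerced; execute can run)
console.log(coercePositiveInt('twenty')); // → null (rejected before execute)
```

A `null` here should be surfaced back to the model as an error Observation so it can retry with corrected arguments rather than crashing the stream.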
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.