
Chapter 19: Observability - Logging with Axiom

Theoretical Foundations

In the previous chapter, we built the conversational memory of our application. We established how UIState and AIState work in concert to create a dynamic, streamable interface. Think of the AIState as the AI's internal monologue—a comprehensive log of its reasoning, tool calls, and the context it's gathering. The UIState, conversely, is the final performance on stage, the React components we choose to render based on that internal monologue. When we are debugging a conversation, we are essentially asking: "Why did the AI decide to render this specific component?" To answer that, we need to look not just at the final performance, but at the entire script, the director's notes, and the stagehands' movements that led to it.

This is where Observability enters the picture. It is the practice of understanding the internal state of a complex system by examining its external outputs. In our context, this means capturing a continuous, high-fidelity stream of events from our application as it runs. Without it, deploying an AI application is like launching a rocket with the cockpit windows painted black; you might be moving, but you have no idea what's happening inside until it's too late.

The Modern Dilemma: Why Traditional Logging Fails

For decades, developers have relied on console.log. It's the "Hello, World!" of debugging—simple, immediate, and universally understood. But in a modern, distributed SaaS environment, especially one powered by AI, console.log is like trying to understand a city's traffic patterns by asking a single driver where they are at one specific moment. It provides a point-in-time snapshot, but it lacks context, correlation, and persistence.

Consider the lifecycle of a single AI interaction:

  1. A user sends a message via a client-side component.
  2. The request hits our Next.js API route.
  3. The API route authenticates the user (potentially fetching a user profile from a database).
  4. It might call an external vector database to retrieve relevant context.
  5. It then streams tokens back from a Large Language Model (LLM).
  6. Finally, it updates the database with the new message.

A failure could happen at any of these steps. If our only instrumentation is a console.log('Got a new message') at the start, we have no visibility into whether the database query timed out, the LLM returned a malformed response, or the user's subscription had expired. We need to correlate all of these disparate events into a single, cohesive story.

Structured Logging: From Unreadable Text to Actionable Data

This brings us to the core of our solution: Structured Logging. Traditional logging is unstructured; it's just a string of text. A log line like Error: User 123 failed to load chat 456 at 14:30 is human-readable, but it's a nightmare for a machine to parse and analyze at scale.

Structured logging treats each log entry not as a sentence, but as a data object. Instead of concatenating a string, we emit a JSON object with key-value pairs.
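To make this concrete, here is a minimal, dependency-free sketch of a structured-log emitter. The field names mirror the example object shown later in this section; they are illustrative, not a fixed schema.

```typescript
// A minimal structured-log emitter: every entry is an object, not a sentence.
interface LogEntry {
  level: 'info' | 'warn' | 'error';
  message: string;
  timestamp: string;
  data?: Record<string, unknown>;
}

function logStructured(
  level: LogEntry['level'],
  message: string,
  data?: Record<string, unknown>
): string {
  const entry: LogEntry = { level, message, timestamp: new Date().toISOString(), data };
  const line = JSON.stringify(entry); // one JSON object per line, easy to parse downstream
  console.log(line);
  return line;
}

logStructured('info', 'Incoming API request', { method: 'POST', path: '/api/chat' });
```

Every call produces a machine-parseable record instead of a free-form string, which is exactly what an aggregation platform needs to index and query.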

Analogy: The Shipping Manifest vs. The Handwritten Note

Imagine you run a global shipping company.

  • Unstructured Logging is like a driver scribbling "Left box at blue house on Main St" on a sticky note. It's useful for that one driver, but you can't ask questions like "How many boxes did we deliver to Main St. yesterday?" or "Show me all deliveries to blue houses." You'd have to manually read every single sticky note.
  • Structured Logging is the shipping manifest. Every package has a tracking number, a destination address (broken into street, city, zip), a weight, and a timestamp. Now, you can instantly query: "Give me the total weight of all packages delivered to New York last week." You have turned a chaotic pile of notes into a queryable database of events.

In our application, a structured log for an API request wouldn't be API request received. It would be an object like this:

// Conceptual representation of a structured log object
{
  "level": "info",
  "message": "Incoming API request",
  "timestamp": "2023-10-27T10:00:00Z",
  "traceId": "abc-123-def", // A unique ID to link all events from this request
  "data": {
    "method": "POST",
    "path": "/api/chat",
    "userId": "user_987",
    "sessionId": "sess_456"
  }
}

This structure is what allows us to build powerful observability. We can filter by userId, aggregate by path, or trace the entire lifecycle of a request using the traceId. This is the fundamental principle we will be implementing.
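To see why this structure pays off, here is a local sketch of the kinds of questions Axiom answers server-side, expressed as plain filters and aggregations over an array of hypothetical log objects (the IDs and paths are invented for illustration):

```typescript
// Once logs are objects, questions become code.
interface StructuredLog {
  traceId: string;
  data: { path: string; userId: string };
}

const logs: StructuredLog[] = [
  { traceId: 'abc-123-def', data: { path: '/api/chat', userId: 'user_987' } },
  { traceId: 'abc-123-def', data: { path: '/api/chat', userId: 'user_987' } },
  { traceId: 'xyz-789-ghi', data: { path: '/api/billing', userId: 'user_111' } },
];

// "Filter by userId"
const byUser = logs.filter((l) => l.data.userId === 'user_987');

// "Aggregate by path"
const countByPath = logs.reduce<Record<string, number>>((acc, l) => {
  acc[l.data.path] = (acc[l.data.path] ?? 0) + 1;
  return acc;
}, {});

// "Trace the entire lifecycle of a request" via traceId
const oneRequest = logs.filter((l) => l.traceId === 'abc-123-def');
```

None of these queries are possible over unstructured strings without brittle regex parsing; with structured logs they are one-liners, whether run locally or in an aggregation platform.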

The Logger as a Microservice: Introducing Winston and Axiom

To handle this firehose of data, we need a robust pipeline. In our boilerplate, we use a combination of two powerful tools: Winston and Axiom.

Winston is our application's internal "log manager." It's a library that runs inside our Node.js process. Its job is to accept log messages from our code, format them (e.g., into JSON), and decide where to send them (to the console, to a file, or to an external service). It acts as a buffer and a router, ensuring that logging never blocks our application's primary functions.

Axiom is the "central command center." It's a cloud-based, high-performance log aggregation and observability platform. It's the destination for all our Winston logs. It ingests, indexes, and stores our structured logs, providing a powerful UI to search, visualize, and analyze them.

Analogy: The Restaurant Kitchen

Think of your Next.js application as a busy restaurant kitchen.

  • Your Code is the line cooks. When they finish a dish (an event, like a database query), they need to report it.
  • Winston is the Head Chef's expo station. The cooks don't just shout randomly. They place the ticket at the expo station, where the chef organizes it, checks it, and decides what to do with the information.
  • Axiom is the restaurant's owner, sitting in a remote office, looking at a real-time dashboard of every ticket from every restaurant in their chain. The owner doesn't need to be in the kitchen, but they can see that the "Pasta" station is backed up, or that "Table 5" has been waiting too long for their dessert.

Winston handles the local, high-frequency, low-latency logging within the kitchen. Axiom provides the global, persistent, high-power analysis for the entire chain.

The Flow of Information: From Code to Insight

Let's visualize how a single log event, generated by a user's interaction, flows through our system. This pipeline ensures that no event is lost and every event is enriched with context.

This diagram illustrates the sequential journey of a user-triggered log event as it travels from initial generation, through a series of enrichment and processing stages, to its final storage and analysis, ensuring data integrity and contextual completeness.

This diagram illustrates the decoupled nature of modern logging. The application's primary job is to serve the user, not to write logs. Winston acts as a non-blocking, asynchronous buffer. It collects logs and, in the background, sends them over the internet to Axiom's ingestion API. This means even if Axiom is temporarily slow or unavailable, our application doesn't crash or hang. The logs are queued and sent when possible.

The Power of Correlation: The traceId

The single most critical piece of context in a distributed system is the correlation ID. In our boilerplate, for every incoming HTTP request, we will generate a unique identifier (often called a traceId or requestId). This ID is the thread that ties every single log entry related to that one specific user request together.

When a user sends a message, we generate a traceId. We pass this ID to our logger. Every single log entry—from the initial API request, to the database call to fetch the user's profile, to the final token stream from the LLM—will include this exact same traceId.
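Winston supports this pattern directly via logger.child({ traceId }); the dependency-free sketch below illustrates the underlying idea — a per-request logger that stamps every entry with the same ID, so no downstream step can forget to include it.

```typescript
import { randomUUID } from 'node:crypto';

type RequestLogger = (message: string, data?: Record<string, unknown>) => Record<string, unknown>;

// Every logger produced here stamps its traceId onto every entry it emits.
function createRequestLogger(traceId: string): RequestLogger {
  return (message, data = {}) => {
    const entry = { traceId, timestamp: new Date().toISOString(), message, ...data };
    console.log(JSON.stringify(entry));
    return entry;
  };
}

// Generated once, at the top of the request handler:
const traceId = randomUUID();
const log = createRequestLogger(traceId);

log('Incoming API request', { path: '/api/chat' });
log('Fetched user profile', { durationMs: 42 });
// Both entries carry the same traceId, linking them into one story.
```

Passing the scoped logger down the call stack (rather than a bare traceId string) keeps correlation automatic instead of a per-call-site discipline.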

Analogy: The Patient's Wristband

Imagine a patient in a large hospital.

  • The patient is a single API request.
  • The wristband is the traceId.
  • The doctor's notes, lab results, and pharmacy records are the individual log entries.

Without the wristband, a note saying "Administered 50mg" is useless. Whose chart does it belong to? But with the wristband, that note is permanently and unambiguously linked to a specific patient. If something goes wrong, an investigator can pull the patient's entire file and see the complete, chronological story of their care: who saw them, what tests were run, what medications were given, and what the outcomes were.

In Axiom, we will be able to take a single traceId and instantly view the entire, ordered sequence of events for that one user's interaction, making debugging complex, multi-step AI workflows not just possible, but simple. This is the foundational principle of observability that we will be building in the following sections.

Basic Code Example

In a SaaS application, observability is not optional; it is the backbone of reliability. When an AI model hallucinates or a payment webhook fails, you need immediate, structured context. The following example demonstrates a "Hello World" level integration of Axiom with Winston within a Next.js API route. This setup captures incoming request metadata, processes it, and streams it to a centralized logging platform.

This code is self-contained. It mocks the database fetch (as defined in "Data Fetching in SCs") to focus purely on the logging pipeline.

// File: app/api/chat/init/route.ts
import { NextRequest, NextResponse } from 'next/server';
import winston from 'winston';
import { WinstonTransport as AxiomTransport } from '@axiomhq/winston';

/**
 * 1. CONFIGURATION & TYPE SAFETY
 * We define a strict interface for our log payload. This prevents
 * "hallucinated JSON" where log properties might be undefined or misspelled.
 */
interface ChatInitLog {
  userId: string;
  timestamp: string;
  action: 'chat_init' | 'context_fetch';
  metadata: {
    ip: string | null;
    userAgent: string | null;
    vectorSearchDuration: number; // Simulated pgvector latency
  };
  level: 'info' | 'error' | 'warn';
}

/**
 * 2. WINSTON LOGGER SETUP
 * We initialize Winston with a custom format. In production, we add
 * the AxiomTransport. In development, we fall back to console for readability.
 *
 * CRITICAL: Environment variables must be set in your Vercel/Env file:
 * - AXIOM_TOKEN: Your API token
 * - AXIOM_DATASET: The dataset name (e.g., 'production-logs')
 */
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json() // Structured logging is essential for querying
  ),
  transports: [
    // Only add Axiom in production to avoid noise during local dev
    process.env.NODE_ENV === 'production' && process.env.AXIOM_TOKEN
      ? new AxiomTransport({
          dataset: process.env.AXIOM_DATASET || 'boilerplate-logs',
          token: process.env.AXIOM_TOKEN,
        })
      : new winston.transports.Console({
          format: winston.format.simple(),
        }),
  ],
});

/**
 * 3. DATA FETCHING MOCK (SCs Pattern)
 * Simulates fetching initial context from a Supabase/pgvector database.
 * In a real app, this would be an async call to your DB.
 */
async function fetchContextMock(userId: string): Promise<{ context: string; latency: number }> {
  // Simulate network latency and vector search
  await new Promise((resolve) => setTimeout(resolve, 150));
  return { context: 'User prefers dark mode.', latency: 150 };
}

/**
 * 4. API HANDLER
 * The main entry point for the request.
 */
export async function POST(req: NextRequest) {
  const startTime = Date.now();

  try {
    // Parse incoming payload
    const body = await req.json();
    const { userId } = body;

    // Type Guard: Ensure userId is a string
    if (typeof userId !== 'string') {
      const errorLog: ChatInitLog = {
        userId: 'unknown',
        timestamp: new Date().toISOString(),
        action: 'chat_init',
        level: 'error',
        metadata: {
          ip: req.ip || null,
          userAgent: req.headers.get('user-agent'),
          vectorSearchDuration: 0,
        },
      };

      logger.error('Invalid User ID provided', errorLog);
      return NextResponse.json({ error: 'Invalid User ID' }, { status: 400 });
    }

    // 5. LOGGING THE INCOMING REQUEST
    const requestLog: ChatInitLog = {
      userId: userId,
      timestamp: new Date().toISOString(),
      action: 'chat_init',
      level: 'info',
      metadata: {
        ip: req.ip || null,
        userAgent: req.headers.get('user-agent'),
        vectorSearchDuration: 0, // Placeholder
      },
    };

    // Stream log to Axiom immediately
    logger.info('Incoming chat initialization', requestLog);

    // 6. EXECUTE BUSINESS LOGIC (Data Fetching)
    const { context, latency } = await fetchContextMock(userId);
    const totalTime = Date.now() - startTime;

    // 7. LOGGING PERFORMANCE METRICS
    const perfLog: ChatInitLog = {
      userId: userId,
      timestamp: new Date().toISOString(),
      action: 'context_fetch',
      level: 'info',
      metadata: {
        ip: req.ip || null,
        userAgent: req.headers.get('user-agent'),
        vectorSearchDuration: latency,
      },
    };

    logger.info('Context fetched successfully', perfLog);

    // Return response
    return NextResponse.json({ 
      status: 'success', 
      context,
      duration: totalTime 
    });

  } catch (error) {
    // 8. ERROR LOGGING
    const errorLog: ChatInitLog = {
      userId: 'unknown', // Safe fallback
      timestamp: new Date().toISOString(),
      action: 'chat_init',
      level: 'error',
      metadata: {
        ip: req.ip || null,
        userAgent: req.headers.get('user-agent'),
        vectorSearchDuration: 0,
      },
    };

    // Winston handles Error objects specially, but we attach our structured data
    logger.error('Critical failure in chat init', { ...errorLog, error });

    return NextResponse.json({ error: 'Internal Server Error' }, { status: 500 });
  }
}

Visualizing the Logging Pipeline

The following diagram illustrates the flow of data from the Next.js Server Component to the Axiom dataset.

This diagram visualizes the logging pipeline, showing the flow of data from a Next.js Server Component that captures errors and sends them to the Axiom dataset.

Line-by-Line Explanation

  1. Type Definitions (ChatInitLog):

    • We define a TypeScript interface ChatInitLog. This acts as a Type Guard for our logging data. By strictly typing the payload, we ensure that when we query Axiom later, fields like vectorSearchDuration are consistently numeric, preventing parsing errors in dashboards.
  2. Logger Initialization:

    • winston.createLogger: We initialize the logger.
    • format.json(): This is crucial. It serializes logs into JSON format (e.g., {"message": "...", "level": "info", "timestamp": "..."}). Axiom is optimized for querying JSON structures.
    • Transport Selection: We check process.env.NODE_ENV. In production, we instantiate AxiomTransport. This transport buffers logs in memory and flushes them to the Axiom API asynchronously to avoid blocking the main thread (non-blocking I/O). In development, we use Console for immediate feedback.
  3. The API Handler (POST):

    • NextRequest: We use the standard Next.js request object. We access req.ip and req.headers to capture request metadata, which is vital for debugging distributed systems.
  4. Type Guarding Input:

    • if (typeof userId !== 'string'): Before processing, we validate the input. If the input is invalid, we log an error immediately. This prevents "garbage in, garbage out" scenarios where the database query might fail or return unexpected results.
  5. Logging the Request:

    • We construct the requestLog object. Note that we include the action field. This allows us to filter logs in Axiom specifically for action: 'chat_init'.
    • logger.info(...): This call sends the data to the configured transport.
  6. Data Fetching (Mock):

    • We simulate a database call. In a real SaaS context, this is where you would fetch user profiles or perform a pgvector similarity search. We capture the latency manually to log performance metrics.
  7. Logging Performance:

    • We create a second log entry perfLog. By logging discrete events (Request vs. Context Fetch), we can visualize the breakdown of latency in Axiom's query language.
  8. Error Handling:

    • The catch block captures any runtime exceptions. We log the error with level: 'error'. Winston automatically captures the stack trace if we pass the error object. This ensures that even if the request crashes, the observability pipeline remains intact.

Common Pitfalls

When implementing logging in a high-concurrency environment like Vercel, specific issues often arise:

  1. Vercel Serverless Timeouts & Async Logging:

    • The Issue: Vercel functions have a strict timeout (e.g., 10s). If your logging transport (Axiom) hangs or is slow to respond, and you await the log call inside the API route, the serverless function might time out before returning a response to the user.
    • The Fix: Do not await logger calls. Winston transports are designed to be "fire-and-forget." They buffer logs and flush them in the background. Ensure your code does not look like await logger.info(...). It should simply be logger.info(...).
  2. Hallucinated JSON Structures:

    • The Issue: Developers often log objects dynamically without strict typing (e.g., logger.info('Event', { ...anyData })). Over time, as code evolves, the structure of these logs changes. A field that was once a string becomes an object, breaking your Axiom queries and dashboards.
    • The Fix: Always use the TypeScript interfaces defined in the code example. If you need to log dynamic data, serialize it to a string explicitly or nest it under a consistent property key (e.g., payload: { ... }).
  3. Over-Logging Sensitive Data:

    • The Issue: In a panic to debug, developers often log the entire req object or database response. This often includes PII (Personally Identifiable Information) or API keys, which then get stored in Axiom.
    • The Fix: Be explicit in your log objects. Only log the specific fields you defined in your interface. Never log req.headers.authorization or raw database rows.
  4. Environment Variable Mismatches:

    • The Issue: The code checks for process.env.AXIOM_TOKEN. If this is missing in the Vercel dashboard but present locally, the logger will fail silently or throw an initialization error.
    • The Fix: Always validate environment variables at startup or use a schema validation library (like Zod) to ensure required variables are present before the server starts.
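A dependency-free sketch of that fail-fast startup check follows (Zod's z.object(...).parse(process.env) achieves the same with richer error reporting). The requireEnv helper is hypothetical, introduced here for illustration, not part of the boilerplate:

```typescript
// requireEnv: throw at startup if any required variable is missing,
// instead of letting the Axiom transport fail silently at runtime.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined>
): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name] as string]));
}

// At server startup, before the logger is created:
// const { AXIOM_TOKEN, AXIOM_DATASET } = requireEnv(['AXIOM_TOKEN', 'AXIOM_DATASET'], process.env);
```

Running this once at module load turns a silent production misconfiguration into an immediate, descriptive deployment failure.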

The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.

All textual explanations, original diagrams, and illustrations are the intellectual property of the author. To support the maintenance of this site via AdSense, please read this content exclusively online. Copying, redistribution, or reproduction is strictly prohibited.