Chapter 20: Capstone - Building a Usage-Based Billing System

Theoretical Foundations

At its heart, a usage-based billing system is an event-driven state machine. It transforms raw, high-volume behavioral data (events) into financial obligations (invoices) and, if necessary, corrective actions (dunning). To understand the architecture of such a system, we must first dissect the three pillars that support it: the asynchronous nature of serverless event processing, the deterministic logic of state transitions, and the probabilistic intelligence of autonomous agents.

The Event-Driven Backbone: Non-Blocking I/O and the Lambda Lifecycle

In a traditional monolithic billing application, a user's action—say, an API call—might trigger a synchronous database transaction, a calculation, and an invoice update within a single request-response cycle. This approach is brittle; if the billing service is slow, the user's primary action is delayed. It also scales poorly, as every concurrent user requires a dedicated server process.

Our capstone system rejects this synchronous model in favor of an asynchronous, event-driven architecture. This is where the concept of Non-Blocking I/O, introduced in our discussion of Node.js in Book 2, becomes the foundational rhythm of the system.

Analogy: The Restaurant Kitchen vs. The Food Truck

Imagine a traditional restaurant kitchen (a monolithic server). A chef receives an order and works on it sequentially: chop vegetables, sear the meat, plate the dish. If the meat takes 10 minutes to cook, the chef is "blocked"—they cannot take another order until the first is complete. This is inefficient for high-volume periods.

Now, imagine a modern food truck assembly line (a serverless, non-blocking system). The order comes in. The cook places the meat on the grill (an I/O operation that takes time) but immediately turns to the next order to chop vegetables. They are not waiting for the meat to cook; they are processing the next task. When the meat is done (the I/O operation completes), a bell rings (an event), and the cook finishes plating that specific dish.

In our billing system, an AWS Lambda function acts as that efficient cook. When a "Usage Recorded" event arrives (e.g., via AWS EventBridge), the Lambda function doesn't wait to process the entire billing cycle. Instead, it performs a non-blocking operation: it writes the raw usage metric to a database (like DynamoDB) or pushes it to a stream (like Kinesis). The function completes its execution almost instantly, freeing up the compute resource to handle the next incoming event. The actual billing calculation and invoice generation happen asynchronously in a separate, triggered Lambda function. This decoupling ensures that the user's core application performance is never impacted by billing logic, no matter how complex.
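To make the ingestion step concrete, here is a minimal sketch of such a handler in TypeScript. An in-memory array stands in for DynamoDB or Kinesis, and the names (`UsageEvent`, `persistUsageEvent`, `ingestHandler`) are illustrative, not part of any AWS SDK:

```typescript
// Minimal sketch of the ingestion Lambda. An in-memory store stands in for
// DynamoDB/Kinesis; in production this would be a PutItem or PutRecord call.
interface UsageEvent {
  eventId: string;
  userId: string;
  metric: string;
  value: number;
  timestamp: number;
}

const usageStore: UsageEvent[] = [];

// Stands in for the fast write to durable storage. The handler awaits only
// this single I/O operation and performs no billing logic.
async function persistUsageEvent(event: UsageEvent): Promise<void> {
  usageStore.push(event);
}

// The Lambda-style handler: record the raw metric and return immediately.
// Billing calculation happens later, in a separately triggered function.
async function ingestHandler(event: UsageEvent): Promise<{ statusCode: number }> {
  await persistUsageEvent(event);
  return { statusCode: 202 }; // Accepted: processing continues asynchronously
}
```

The `202 Accepted` response signals the caller that the event was durably recorded but not yet billed, which is exactly the decoupling the food-truck analogy describes.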

The State Machine: Deterministic Logic and Idempotency

While the event ingestion is asynchronous and non-blocking, the logic that processes these events must be deterministic. A usage-based billing system is fundamentally a state machine. A user's account moves through states: Active, Payment Due, Payment Failed, In Grace Period, Suspended.

Analogy: The Toll Booth Transaction

Think of a highway toll booth system. Each car (a user) has a state: Toll Paid, Toll Unpaid, or Violation. The system processes events (car passing a sensor). The logic is deterministic:

  1. Event: Car passes sensor.
  2. Check State: Is the account Toll Paid?
  3. Action: If No, transition state to Toll Unpaid and generate a violation invoice. If Yes, do nothing.

In our architecture, this state machine is managed by a central orchestrator, often implemented using AWS Step Functions or a dedicated "Orchestrator" Lambda. This orchestrator consumes events from the usage stream. For example, a payment_intent.succeeded event from Stripe transitions a user's state from Payment Due back to Active. A payment_intent.payment_failed event transitions the state to Payment Failed.
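These transitions can be sketched as a pure function over a transition table. The event names follow Stripe's webhook conventions, but the table itself is a simplified illustration, not the full production state machine:

```typescript
type BillingStatus = "active" | "payment_due" | "payment_failed" | "suspended";
type StripeEventType =
  | "invoice.created"
  | "payment_intent.succeeded"
  | "payment_intent.payment_failed";

// Deterministic transition function: the same (state, event) pair always
// yields the same next state. Combinations not covered leave the state as-is.
function transition(current: BillingStatus, event: StripeEventType): BillingStatus {
  switch (event) {
    case "invoice.created":
      // A new invoice puts an active account on the hook for payment.
      return current === "active" ? "payment_due" : current;
    case "payment_intent.succeeded":
      // A successful payment always restores the account.
      return "active";
    case "payment_intent.payment_failed":
      // A failed payment moves the account into the dunning-eligible state.
      return "payment_failed";
  }
}
```

Because `transition` is a pure function with no I/O, it can be unit-tested exhaustively, which is precisely what financial accuracy demands.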

Crucially, in a distributed system where events can be delivered more than once (at-least-once delivery), every state transition must be idempotent. This means processing the same event twice produces the same result as processing it once. If a payment_failed event is received twice, the system shouldn't double the dunning attempts. This is typically handled by using a unique event ID as a key in a database and checking for its existence before processing. This deterministic, idempotent logic is the bedrock of financial accuracy.

The Intelligent Layer: Multi-Agent Systems and Supervisor Nodes

The previous two pillars handle data ingestion and state management with deterministic logic. However, billing systems often encounter ambiguous scenarios that require judgment, context analysis, and adaptive decision-making—tasks where rigid, rule-based logic fails. This is the domain of AI Customer Support Agents and the Smart Dunning workflow.

We model this intelligence as a Multi-Agent System. Instead of a single, monolithic "AI Billing Brain," we create specialized agents, each with a specific role. This mirrors the microservices architecture pattern from web development, where each service owns a specific business capability (e.g., User Service, Payment Service). Here, each agent owns a specific cognitive task.

Analogy: A Hospital Emergency Room

An ER is a multi-agent system. The patients are the billing inquiries or payment failures. The agents are the medical staff:

  • Triage Nurse (Supervisor Node): The first point of contact. They don't treat the patient but assess the situation and delegate. They look at the patient's chart (the Graph State) and decide: "This is a broken arm, send to Orthopedics. This is a heart attack, send to Cardiology."
  • Specialist Doctors (Worker Agents): Each is an expert in one area. The Cardiologist doesn't try to set a broken bone. They focus solely on the heart issue.

In our system, the Supervisor Node is the central orchestrator for all AI-driven interactions. It is a specialized LLM-powered agent whose only job is to analyze the current context (the Graph State) and decide which Worker Agent to invoke next. The Graph State is a shared data structure, like a patient's chart, that contains all relevant information: user ID, current billing state, recent usage, payment history, and the incoming query.

The Supervisor uses a sophisticated prompt to make its routing decision. It doesn't generate the final answer; it generates a command to invoke a specific tool (a Worker Agent).

The Worker Agents are the specialists. For example:

  • DunningAgent: Specializes in recovering failed payments. It can analyze the reason for failure (e.g., card_declined vs. insufficient_funds) and choose the appropriate communication strategy and retry timing.
  • InvoiceInquiryAgent: Specializes in explaining complex invoice line items to a customer.
  • UsageAnomalyAgent: Specializes in detecting and explaining unusual spikes in usage.

To ensure the Supervisor's decisions are reliable and can be programmatically acted upon, we enforce JSON Schema Output. When the Supervisor LLM generates a decision, we don't want it to return free-form text like "I think we should try the dunning agent now." We need a structured, predictable output. We define a JSON Schema that specifies the exact format: an object with a nextAgent property (a string) and a reasoning property (a string for logging).

This allows the system to parse the LLM's response reliably and execute the next step in the workflow; a response that fails schema validation can be retried or escalated rather than acted upon. It bridges the gap between the unstructured world of natural language and the structured world of software execution.

Visualization: The Integrated Workflow

The following diagram illustrates how these theoretical concepts integrate into a single, cohesive flow. It shows the non-blocking event ingestion, the deterministic state transitions, and the intelligent routing of the Supervisor Node.


Deep Dive: The Supervisor's Decision-Making Process

Let's dissect the Supervisor Node's role further, as it is the most novel part of this architecture. The Supervisor does not act on its own; it is a function of the current Graph State.

The Graph State is a JSON object that is passed between components. It is the "single source of truth" for a specific workflow instance. A simplified TypeScript interface for this state might look like this:

// Represents the shared state for a single billing workflow or inquiry
interface GraphState {
    // Core identifiers
    userId: string;
    stripeCustomerId: string;
    workflowId: string;

    // Current billing status (the state machine's current state)
    billingStatus: 'active' | 'payment_due' | 'payment_failed' | 'dunning' | 'suspended';

    // Contextual data
    recentUsage: Array<{ metric: string; value: number; timestamp: number }>;
    outstandingInvoices: Array<{ id: string; amount: number; status: string }>;
    lastPaymentError?: string;

    // The incoming trigger (e.g., a webhook from Stripe or a customer email)
    trigger: {
        type: 'stripe_event' | 'customer_query';
        payload: any; // The raw webhook payload or the customer's message
    };

    // The final decision made by the Supervisor (for audit trails)
    supervisorDecision?: {
        nextAgent: 'dunning_agent' | 'inquiry_agent' | 'anomaly_agent' | 'human_escalation';
        reasoning: string; // The LLM's justification for the decision
        timestamp: number;
    };
}

When the Orchestrator invokes the Supervisor, it passes this GraphState. The Supervisor's LLM prompt is engineered to analyze this state and produce a decision. The prompt would be something like:

"You are the Supervisor Node for a billing automation system. Analyze the provided Graph State. The user's status is payment_failed. The last payment error was card_declined. Based on this, determine the most appropriate next step. Choose from: dunning_agent (for automated recovery attempts), inquiry_agent (if the user has sent a message), or human_escalation (for complex, non-standard issues). Output your decision as a JSON object matching the provided schema."

The JSON Schema Output is the critical component that makes this reliable. Instead of parsing a paragraph of text, the system expects a clean object:

// The JSON Schema we enforce on the LLM's output
const supervisorDecisionSchema = {
    type: "object",
    properties: {
        nextAgent: {
            type: "string",
            enum: ["dunning_agent", "inquiry_agent", "anomaly_agent", "human_escalation"]
        },
        reasoning: {
            type: "string",
            description: "A brief explanation for the chosen agent."
        }
    },
    required: ["nextAgent", "reasoning"]
};

When the LLM responds with a valid JSON object adhering to this schema, the Orchestrator can confidently parse it and invoke the corresponding Worker Agent. This structured handoff is what allows the system to scale its intelligence without becoming an unpredictable "black box." The Supervisor is not just routing traffic; it is applying a layer of context-aware judgment to a deterministic workflow, creating a truly adaptive monetization engine.
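A sketch of what that parsing step might look like on the Orchestrator side. The validator is hand-rolled here so the example is self-contained; in practice a library such as Zod or Ajv would enforce the schema:

```typescript
// The agent names mirror the enum in supervisorDecisionSchema above.
const ALLOWED_AGENTS = [
  "dunning_agent",
  "inquiry_agent",
  "anomaly_agent",
  "human_escalation",
] as const;
type AgentName = (typeof ALLOWED_AGENTS)[number];

interface SupervisorDecision {
  nextAgent: AgentName;
  reasoning: string;
}

// Parse the raw LLM response and enforce the schema by hand.
// Returns null for anything that is not a valid decision object.
function parseSupervisorDecision(raw: string): SupervisorDecision | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (e.g. free-form prose)
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.reasoning !== "string") return null;
  if (!ALLOWED_AGENTS.includes(obj.nextAgent as AgentName)) return null;
  return { nextAgent: obj.nextAgent as AgentName, reasoning: obj.reasoning };
}
```

A `null` result is itself a signal: the Orchestrator can re-prompt the model or route to `human_escalation` instead of executing an unverified instruction.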

Basic Code Example

In a serverless usage-based billing system, a Supervisor Node acts as the central nervous system. It receives raw usage events (e.g., API calls, data processed) and orchestrates the flow: validating the event, checking the customer's billing status, and delegating the actual metering to a dedicated worker agent. This separation of concerns ensures that the core billing logic remains robust and scalable, even when downstream services (like Stripe or a database) experience latency.

The following TypeScript example demonstrates a minimal, self-contained Supervisor Node using the LangGraph library (a popular framework for building agent workflows). It simulates a workflow where a usage event triggers a check against a mock billing service before being logged.

// supervisor-node-billing.ts
import { StateGraph, END, START } from "@langchain/langgraph";
import { z } from "zod";

// 1. Define the State Schema
// We use Zod for runtime type validation, ensuring the graph state is always predictable.
const BillingState = z.object({
  customerId: z.string(),
  usageMetric: z.number().min(0),
  billingStatus: z.enum(["active", "past_due", "inactive"]).optional(),
  finalBill: z.number().optional(),
});

type BillingState = z.infer<typeof BillingState>;

// 2. Define the Supervisor's Tools (Worker Agents)
// In a real system, these would be API calls to Stripe or a database.
// Here, we mock them as async functions.

/**
 * Checks the customer's billing status in the database.
 * @param state - The current graph state containing the customerId.
 * @returns - A partial state update with the billing status.
 */
async function checkBillingStatus(state: BillingState): Promise<Partial<BillingState>> {
  console.log(`[Supervisor] Checking status for customer: ${state.customerId}`);

  // Simulate database latency
  await new Promise(resolve => setTimeout(resolve, 100));

  // Mock logic: 'user_123' is active, others are past_due
  const status = state.customerId === "user_123" ? "active" : "past_due";

  return { billingStatus: status };
}

/**
 * Calculates the cost based on usage and logs the meter.
 * This worker only runs if the billing status is 'active'.
 * @param state - The current graph state.
 * @returns - The final calculated bill.
 */
async function calculateAndLogUsage(state: BillingState): Promise<Partial<BillingState>> {
  console.log(`[Supervisor] Calculating usage for metric: ${state.usageMetric}`);

  // Simple pricing model: $0.01 per unit
  const cost = state.usageMetric * 0.01;

  return { finalBill: cost };
}

/**
 * Handles failed payments or inactive accounts.
 * In a real system, this might trigger a Smart Dunning workflow.
 * @param state - The current graph state.
 * @returns - An error state.
 */
async function handleFailedPayment(state: BillingState): Promise<Partial<BillingState>> {
  console.error(`[Supervisor] Payment failed or inactive for customer: ${state.customerId}`);

  // In a real app, we might throw an error or trigger a retry logic here.
  throw new Error(`Billing check failed for ${state.customerId}. Status: ${state.billingStatus}`);
}

// 3. Define the Routing Logic (The "Brain" of the Supervisor)
// This function analyzes the state and decides which worker to invoke next.
// It uses a simple conditional logic, but in production, this is often an LLM call.
function router(state: BillingState): string {
  if (!state.billingStatus) {
    return "check_status";
  }

  if (state.billingStatus === "active") {
    return "calculate_usage";
  }

  return "handle_failure";
}

// 4. Build the Graph
// We construct the workflow using LangGraph's StateGraph API.
const workflow = new StateGraph(BillingState)
  // Define nodes (agents/workers)
  .addNode("check_status", checkBillingStatus)
  .addNode("calculate_usage", calculateAndLogUsage)
  .addNode("handle_failure", handleFailedPayment)

  // Define the entry point
  .addEdge(START, "check_status")

  // Define conditional edges (The Supervisor's Routing Logic)
  .addConditionalEdges("check_status", router, {
    "calculate_usage": "calculate_usage",
    "handle_failure": "handle_failure",
    "check_status": "check_status", // Fallback, though unlikely in this logic
  })

  // End the graph after calculation or failure
  .addEdge("calculate_usage", END)
  .addEdge("handle_failure", END);

// 5. Compile the Graph
// This creates the executable agent.
const app = workflow.compile();

// 6. Execution Function
// This simulates an API endpoint (e.g., AWS Lambda) receiving a usage event.
async function main() {
  console.log("--- Starting Usage-Based Billing System ---");

  // Simulated incoming event from a serverless trigger
  const initialInput: Partial<BillingState> = {
    customerId: "user_123", // Try changing this to "user_999" to see failure logic
    usageMetric: 1000, // 1000 API calls
  };

  try {
    // Execute the graph
    const result = await app.invoke(initialInput);

    console.log("\n--- Final Result ---");
    console.log(JSON.stringify(result, null, 2));
  } catch (error) {
    console.error("\n--- Workflow Error ---");
    console.error(error);
  }
}

// Run the example
main();

Line-by-Line Explanation

  1. Imports & Zod Schema:

    • We import StateGraph (plus the START and END sentinels) from @langchain/langgraph. This library provides the framework for building stateful, cyclic agent workflows.
    • We define a BillingState using Zod. This is critical for production code. It ensures that the data flowing through the Supervisor Node adheres to a strict schema. If a worker tries to inject malformed data, the graph will throw an error immediately, preventing "hallucinated" data corruption.
  2. Worker Agent Functions:

    • checkBillingStatus: This represents the first logical step. It queries a data source (mocked here) to determine if the user can proceed. It returns a "partial state update" (just the billingStatus), which LangGraph merges into the central state.
    • calculateAndLogUsage: This is the "metering" worker. It performs the actual business logic (pricing). It is only intended to run if the user is active.
    • handleFailedPayment: This represents the "Smart Dunning" or error-handling path. In a real scenario, this would trigger email notifications or Stripe's retry logic.
  3. The Supervisor Router:

    • The router function is the core of the Supervisor Node. It receives the current state and returns a string indicating which node to visit next.
    • This logic is deterministic in this example, but a more advanced Supervisor might use an LLM call here to analyze complex, unstructured state data.
  4. Graph Construction:

    • new StateGraph(BillingState): Initializes the graph with our strict schema.
    • .addNode(...): Registers the worker functions as executable nodes.
    • .addEdge(START, "check_status"): Defines the entry point. Every execution begins here.
    • .addConditionalEdges(...): This is the Supervisor's decision-making power. It tells the graph: "After check_status finishes, look at the state and run the router function to decide where to go next."
    • .addEdge(..., END): Defines the terminal nodes. Once a calculation is done or a failure is handled, the workflow terminates.
  5. Execution:

    • app.invoke(initialInput): This kicks off the process. LangGraph manages the state updates and ensures the correct sequence of nodes is executed based on the conditional edges.

Visualizing the Workflow

The Supervisor Node orchestrates the flow from ingestion to collection. The graph below illustrates the logic paths.


Common Pitfalls in TypeScript Agent Workflows

When building Supervisor Nodes and multi-agent systems in TypeScript/Node.js, specific issues often arise due to the asynchronous and stateful nature of the logic.

  1. State Mutation & Reference Loss:

    • The Issue: In JavaScript, objects are passed by reference. If a worker agent mutates the state object directly (e.g., state.finalBill = 100), it can cause race conditions or unexpected side effects in concurrent executions.
    • The Fix: Always treat the state as immutable. Return new partial objects from your worker functions (as shown in the example). LangGraph handles the merging safely.
  2. Async/Await Loops and Vercel/Serverless Timeouts:

    • The Issue: Serverless functions (AWS Lambda, Vercel) have strict execution time limits (e.g., 10 seconds on Vercel Hobby plan). If your Supervisor Node calls a worker that awaits a slow external API (like a complex Stripe query or a legacy database), the entire workflow can timeout before completion.
    • The Fix:
      • Implement retries with exponential backoff for external calls.
      • For long-running workflows (e.g., Smart Dunning over days), do not keep the Lambda alive. Instead, use an event-driven architecture: The Supervisor triggers a Step Function or a background job queue (like AWS SQS) and returns an immediate 200 OK. The next step is triggered by a webhook or a scheduled event.
  3. Hallucinated JSON in Tool Calling:

    • The Issue: If you use an LLM as the Supervisor's router (rather than the deterministic function in the example), the model might return a string that looks like JSON but isn't parsable, or it might invent a tool name that doesn't exist.
    • The Fix: Use strict Zod schemas (as demonstrated) to validate the output of every LLM call before passing it to the execution engine. Additionally, use the model's native "tool calling" feature (function calling) rather than asking it to return raw text. This forces the model to adhere to a predefined JSON schema for its arguments.
  4. ESM vs. CommonJS Conflicts:

    • The Issue: The example uses ECMAScript Modules (import/export). If your package.json lacks "type": "module", or if you mix require and import, Node.js will throw ERR_REQUIRE_ESM errors.
    • The Fix: Standardize your project on ESM. Set "type": "module" in package.json and ensure your tsconfig.json has "module": "NodeNext" or "ESNext". Avoid mixing require and import statements.
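Pitfall 1 can be illustrated in a few lines. Here `mergeState` mimics (in simplified form) how a framework like LangGraph folds a worker's partial update into the shared state without mutating the original object:

```typescript
interface WorkflowState {
  customerId: string;
  finalBill?: number;
}

// Anti-pattern (do NOT do this): mutating the shared state in place.
//   function badWorker(state: WorkflowState) { state.finalBill = 100; }

// Preferred: return a fresh partial update and let the framework merge it.
function goodWorker(_state: WorkflowState): Partial<WorkflowState> {
  return { finalBill: 100 };
}

// Simplified stand-in for the framework's merge step: spread produces a new
// object, leaving the previous state untouched for concurrent readers.
function mergeState(
  prev: WorkflowState,
  update: Partial<WorkflowState>
): WorkflowState {
  return { ...prev, ...update };
}
```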

The chapter continues with advanced code examples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.





Code License: All code examples are released under the MIT License. Github repo.

Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
