Chapter 13: Generating Smart PDF Summaries on the Fly
Theoretical Foundations
The generation of a smart PDF summary on the fly is not merely a file creation task; it is the orchestration of a complex, multi-stage data transformation pipeline that bridges the gap between unstructured human language and structured, machine-rendered visual output. To understand this pipeline, we must first visualize it not as a monolithic block of code, but as a distributed system operating within the constraints of a serverless environment.
Imagine you are a librarian in a vast, chaotic warehouse of raw data (unstructured text). Your goal is to produce a beautifully bound, indexed book (the PDF) that captures the essence of a specific topic within that warehouse. You cannot simply photocopy a random stack of papers; you must first identify the relevant sections, synthesize them into a coherent narrative, and then format them according to specific typographic rules.
In the context of web development, this process mirrors the transition from a raw API endpoint returning a JSON blob to a fully rendered frontend component. Just as a frontend framework like React takes raw data and applies a component tree to render a UI, our serverless pipeline takes raw text, applies an LLM (Large Language Model) to summarize and structure it, and then applies a rendering engine to generate the visual PDF.
The "Why": The Limitations of Static PDF Generation
Traditional PDF generation often relies on static templates populated with fixed data. However, in modern applications—such as legal tech, financial reporting, or research aggregators—the content is dynamic and voluminous. The "why" behind this intelligent pipeline addresses three critical bottlenecks:
- Context Window Constraints: LLMs have finite context windows (e.g., 128k tokens). Feeding an entire 300-page book into a single prompt is impossible. We need a strategy to "chunk" and "digest" information, similar to how a web application uses pagination or infinite scrolling to handle large datasets without crashing the browser.
- Inference Latency: Generating text is computationally expensive. In a serverless environment, where cold starts and execution time limits are critical factors (e.g., serverless functions on Vercel commonly allow between 10 and 60 seconds, depending on plan and configuration), we cannot afford to wait for an LLM to generate verbose prose for every single line of a document.
- Structured Output Reliability: LLMs are probabilistic. Asking an LLM to output a raw block of text is easy; asking it to output a strictly formatted JSON object that defines a PDF's structure (headings, bullet points, page breaks) requires specific prompt engineering and validation logic.
To manage the complexity of transforming raw text into a structured summary, we utilize the ReAct Loop (Reasoning and Acting). While typically associated with agentic workflows (e.g., a bot that browses the web), we repurpose this pattern here as an internal Content Processing Engine.
In this context, the "Agent" is not a conversational bot, but a logical loop that iterates over the source text. It reasons about the content's structure and acts by extracting specific data points or generating summary sections.
Analogy: The Assembly Line vs. The Assembly Line Manager
- Traditional PDF Gen: A simple assembly line where raw material (text) is pushed through a single station (template) and comes out as a product.
- ReAct PDF Gen: An intelligent assembly line managed by a supervisor (the ReAct Loop). The supervisor looks at a chunk of material (Input Text), reasons about what it is (Is this a header? A data table?), and decides which machine (LLM Prompt) to use to process it (Action). The machine returns the processed part (Observation), and the supervisor moves to the next chunk.
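A minimal sketch of this supervisor loop, with a trivial heuristic standing in for the LLM's reasoning step (`classifyChunk` and `DocNode` are illustrative names, not a real API):

```typescript
// A ReAct-style content structuring loop. `classifyChunk` and `DocNode`
// are illustrative names; the heuristic below stands in for an LLM call.

type NodeKind = "heading" | "paragraph";

interface DocNode {
  type: NodeKind;
  text: string;
}

// Reason: decide what a chunk of raw text is.
function classifyChunk(chunk: string): NodeKind {
  // Heuristic stand-in for the LLM's reasoning: short lines without
  // sentence punctuation are treated as headings.
  return chunk.length < 60 && !chunk.includes(".") ? "heading" : "paragraph";
}

// Supervisor loop: reason about each chunk, act by emitting a structured
// node, observe the result, and move on to the next chunk.
function structureDocument(chunks: string[]): DocNode[] {
  const nodes: DocNode[] = [];
  for (const chunk of chunks) {
    const kind = classifyChunk(chunk); // Reasoning step
    nodes.push({ type: kind, text: chunk }); // Action + Observation
  }
  return nodes;
}

const nodes = structureDocument([
  "Financial Performance",
  "The report shows a 20% increase in quarterly revenue.",
]);
console.log(nodes.map((n) => n.type).join(", ")); // heading, paragraph
```

In a real pipeline the heuristic would be replaced by an LLM classification prompt, but the control flow — inspect, decide, emit, repeat — stays the same.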
The Role of the useCompletion Hook
While the ReAct loop handles the logical structuring of data, the useCompletion hook serves as the transport mechanism for generating the textual content. In a full-stack architecture, this hook is typically used on the client side to stream text from the server. However, for the PDF generation pipeline, we must understand its theoretical role in the context of Server-Side Rendering (SSR) and Streaming.
The useCompletion hook is optimized for non-conversational, single-turn generation. This is distinct from a chat completion which maintains a history of messages. For PDF generation, we treat each section of the PDF as a distinct "completion" request.
Why is this distinction vital?
When generating a PDF, we often need to generate content in parallel or stream it to the client to bypass serverless timeouts. The useCompletion hook abstracts away the complexities of the underlying fetch request and stream parsing (Server-Sent Events). It allows the application to treat the LLM output as a reactive stream of text tokens.
In our pipeline, we might not use useCompletion directly in the UI for the PDF display (since PDFs are binary blobs), but we use the underlying SDK principles (useChat, useCompletion) to drive the generation of the text content before it is passed to the PDF library.
Web Development Analogy:
Think of useCompletion as the fetch API specialized for text streams. Just as fetch allows you to request a resource and handle the response body as a stream, useCompletion allows you to request a text generation and handle the tokens as they arrive. In the PDF context, we are "fetching" structured text content to populate our document model.
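Under the hood, a completion hook essentially reads a `ReadableStream` of token chunks and folds them into state. A minimal sketch of that mechanism (`readCompletionStream` and `onToken` are illustrative names, not part of any SDK):

```typescript
// What a completion hook does under the hood: read a stream of token
// chunks and fold them into a growing completion string. The names
// `readCompletionStream` and `onToken` are illustrative, not from an SDK.

async function readCompletionStream(
  stream: ReadableStream<Uint8Array>,
  onToken: (partial: string) => void,
): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let completion = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    completion += decoder.decode(value, { stream: true }); // Append new tokens
    onToken(completion); // Reactive update, like a state setter in a hook
  }
  return completion;
}

// Usage with a mock stream standing in for an LLM response body:
const encoder = new TextEncoder();
const mockStream = new ReadableStream<Uint8Array>({
  start(controller) {
    for (const token of ["The ", "report ", "shows..."]) {
      controller.enqueue(encoder.encode(token));
    }
    controller.close();
  },
});

const full = await readCompletionStream(mockStream, () => {});
console.log(full); // "The report shows..."
```

In the PDF pipeline, the accumulated string would be handed to the document model rather than rendered in a UI.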
The Data Transformation Layer: From Tokens to Structured JSON
The most critical theoretical challenge in generating a "smart" PDF is ensuring the LLM outputs data that a PDF library (like PDFKit or React-PDF) can consume. We cannot simply ask the LLM to "write a PDF." We must ask it to output a JSON representation of the document structure.
This is where Prompt Engineering for Structured Output comes into play. We instruct the LLM to act as a data transformer. We provide it with raw text and ask it to return a JSON object following a strict schema.
For example, instead of generating "The report shows a 20% increase...", we prompt the LLM to generate:
{
"type": "section",
"title": "Financial Performance",
"content": "The report shows a 20% increase...",
"style": "h1"
}
Under the Hood:
- Input Chunking: The raw text is split into manageable chunks (e.g., 4000 tokens). This is similar to how a database paginates results to prevent memory overflows.
- Parallel Processing: Because we are using Edge Functions, we can theoretically process multiple chunks in parallel (if the LLM provider allows concurrent requests). This reduces the total wall-clock time, mitigating the inference latency.
- Schema Validation: Before the data is accepted into the PDF generation queue, it must pass a Zod schema validation. This ensures that no malformed JSON breaks the PDF rendering engine.
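The three steps above can be sketched as follows, assuming a mock `summarizeChunk` in place of a real LLM call and a hand-rolled type guard where the production pipeline would use a Zod schema:

```typescript
// Sketch of the chunk → parallel-summarize → validate pipeline.
// `summarizeChunk` is a mock standing in for an LLM call; `isSection`
// is a hand-rolled guard where the real pipeline would use a Zod schema.

type Section = { title: string; content: string };

// 1. Input chunking: split on paragraph boundaries, packing paragraphs
//    into chunks of at most `maxLen` characters.
function chunkText(text: string, maxLen: number): string[] {
  const paragraphs = text.split(/\n\n+/);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length > maxLen) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// 2. Parallel processing: each chunk is "summarized" concurrently.
async function summarizeChunk(chunk: string): Promise<unknown> {
  return { title: chunk.slice(0, 20), content: chunk }; // Mock LLM output
}

// 3. Schema validation: reject malformed results before they reach the
//    PDF generation queue.
function isSection(value: unknown): value is Section {
  const v = value as Section;
  return typeof v?.title === "string" && typeof v?.content === "string";
}

async function processDocument(text: string): Promise<Section[]> {
  const chunks = chunkText(text, 4000);
  const results = await Promise.all(chunks.map(summarizeChunk));
  return results.filter(isSection); // Drop anything that fails validation
}
```

`Promise.all` gives the parallelism; in practice you would cap concurrency to respect the LLM provider's rate limits.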
The Rendering Phase: Binary Generation in a Stateless Environment
Once the data is structured (via the ReAct loop) and the text is generated (via LLM calls), we enter the rendering phase. This is where we convert the JSON structure into a binary PDF file.
In a serverless Edge Function, generating a binary file requires careful memory management. Unlike a traditional Node.js server with persistent memory, an Edge Function is ephemeral.
Analogy: The Digital Printer. Imagine a printer (the Edge Function) that receives a digital layout file (the JSON structure). The printer must:
- Load the font files (assets).
- Calculate the layout (pagination, line breaks).
- Render the pixels into a PDF binary stream.
The challenge here is Inference Latency vs. Rendering Latency. If the LLM takes 10 seconds to generate the text, and the PDF rendering takes 5 seconds, the total request time is 15 seconds. This is dangerously close to the timeout limits of many serverless platforms.
To solve this, we often decouple the process:
- Trigger: The user requests a PDF.
- Async Generation: The server immediately returns a "Processing" status. The LLM generation happens in the background (or via a separate serverless function).
- Webhook/Notification: Once the PDF binary is generated and stored in object storage (e.g., S3), a webhook notifies the client.
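The decoupled flow above can be sketched with an in-memory job store; in production the store would be a database or queue, and completion would be delivered via webhook (all names here are illustrative):

```typescript
// Sketch of the trigger → async generation → notify pattern, using an
// in-memory job store. In production the store would be a database or
// queue and completion would be delivered via webhook; all names here
// are illustrative.

type JobStatus = "processing" | "done" | "failed";

const jobs = new Map<string, { status: JobStatus; url?: string }>();

// Trigger: return a "processing" status immediately instead of blocking
// the request on LLM generation and PDF rendering.
function triggerPdfJob(id: string, render: () => Promise<string>): JobStatus {
  jobs.set(id, { status: "processing" });
  // Fire-and-forget: the HTTP response has already been sent by now.
  render()
    .then((url) => jobs.set(id, { status: "done", url }))
    .catch(() => jobs.set(id, { status: "failed" }));
  return "processing";
}

// Polled by the client (or replaced by a webhook push) to learn when the
// PDF is ready in object storage.
function getJobStatus(id: string): JobStatus | undefined {
  return jobs.get(id)?.status;
}
```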
However, for "on-the-fly" generation, we optimize by streaming the PDF generation. We do not wait for the entire text to be generated before starting the PDF rendering. We stream tokens from the LLM directly into the PDF stream buffer.
Key Takeaways
- The ReAct Loop is repurposed as a Content Structuring Agent, iterating over text chunks to reason about their semantic meaning and act by extracting structured data.
- The useCompletion hook (and its underlying SDK) provides the mechanism for streaming text generation, allowing reactive data-fetching patterns to be applied to LLM outputs.
- Structured Output (JSON Schema) acts as the bridge between the probabilistic nature of LLMs and the deterministic requirements of PDF rendering libraries.
- Edge Functions provide the compute environment, offering low latency near the user but requiring strict management of memory and execution time, necessitating efficient chunking and parallel processing strategies.
This theoretical foundation sets the stage for implementing a pipeline that is not just functional, but resilient, scalable, and capable of handling the unpredictability of generative AI within the rigid constraints of serverless architecture.
Basic Code Example
This example demonstrates a minimal, self-contained serverless function that transforms raw text into a structured PDF using an LLM for intelligent summarization. The workflow is designed for a SaaS context where a user submits a document (e.g., a research paper or report) via a web app, and the backend generates a "Smart Summary" PDF on the fly.
The Logic Flow:
- Ingestion: The API receives a block of raw text.
- Intelligent Processing (LLM): The text is sent to an LLM with a specific prompt to extract metadata and generate a concise summary. We enforce a structured JSON output to ensure the data is machine-readable.
- Document Assembly: The extracted JSON data is passed to a PDF generation library (pdfkit).
- Rendering: The PDF is constructed in memory (Buffer) and returned as a binary stream to the client with the correct MIME type.
Code Example: generate-summary-pdf.ts
This code is intended to run as a Node.js serverless function (e.g., a Next.js API route). Note that pdfkit relies on Node.js streams and Buffer, so it requires the Node.js runtime rather than the Edge runtime.
import { NextResponse } from 'next/server'; // Assuming Next.js App Router for the HTTP wrapper
import PDFDocument from 'pdfkit';
import { z } from 'zod';
// NOTE: In a real project, import OpenAI and the Zod helper:
// import OpenAI from 'openai';
// import { zodResponseFormat } from 'openai/helpers/zod';
// For this standalone example, we mock the LLM response structure.
// ==========================================
// 1. TYPE DEFINITIONS & SCHEMAS
// ==========================================
/**
* Defines the expected structure of the LLM's response.
* Using Zod ensures type safety and runtime validation.
*/
const SummarySchema = z.object({
title: z.string().describe("The main title of the document."),
summary: z.string().describe("A concise 3-sentence summary of the text."),
keyPoints: z.array(z.string()).describe("A list of 3-5 critical bullet points."),
sentiment: z.enum(["positive", "neutral", "negative"]).describe("The overall sentiment of the text."),
});
type SummaryData = z.infer<typeof SummarySchema>;
// ==========================================
// 2. LLM INTEGRATION (MOCKED FOR SIMPLICITY)
// ==========================================
/**
* Simulates an LLM call (e.g., OpenAI GPT-4).
* In a real app, this would use `openai.chat.completions.create` with `response_format: { type: 'json_object' }`.
*
* @param text - The raw input text to analyze.
* @returns A Promise resolving to the structured JSON data.
*/
async function generateSmartSummary(text: string): Promise<SummaryData> {
// --- REAL IMPLEMENTATION WOULD LOOK LIKE THIS ---
// const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// const completion = await openai.chat.completions.create({
// model: "gpt-4-turbo-preview",
// messages: [
// { role: "system", content: "You are a precise document analyzer. Extract the following data strictly as JSON." },
// { role: "user", content: text }
// ],
// response_format: zodResponseFormat(SummarySchema, "summary"),
// });
// return JSON.parse(completion.choices[0].message.content!) as SummaryData;
// --- MOCK IMPLEMENTATION FOR THIS EXAMPLE ---
// Simulating a delay for network latency
await new Promise(resolve => setTimeout(resolve, 500));
return {
title: "Project Alpha Report",
summary: "This report details the initial findings of Project Alpha. The data indicates a 15% increase in efficiency. Further testing is required for Q3.",
keyPoints: [
"Efficiency metrics exceeded expectations.",
"User adoption rate is stabilizing.",
"Budget remains within projected limits."
],
sentiment: "positive"
};
}
// ==========================================
// 3. PDF GENERATION LOGIC
// ==========================================
/**
* Generates a PDF document from structured data.
* Uses pdfkit to create a binary buffer in memory.
*
* @param data - The structured summary data.
* @returns A Promise resolving to a Buffer containing the PDF data.
*/
async function createPdfBuffer(data: SummaryData): Promise<Buffer> {
return new Promise((resolve, reject) => {
// Create a new PDF document
const doc = new PDFDocument({ margin: 50 });
const chunks: Buffer[] = [];
// Collect stream data into chunks
doc.on('data', (chunk: Buffer) => chunks.push(chunk));
// When the stream ends, combine chunks into a single Buffer
doc.on('end', () => resolve(Buffer.concat(chunks)));
doc.on('error', reject);
// --- PDF STYLING & CONTENT ---
// Title
doc.fontSize(18).font('Helvetica-Bold').text(data.title, { align: 'center' });
doc.moveDown();
// Summary Section
doc.fontSize(12).font('Helvetica').text("Executive Summary:", { continued: false });
doc.font('Helvetica-Oblique').text(data.summary);
doc.moveDown();
// Key Points Section
doc.font('Helvetica-Bold').text("Key Takeaways:", { continued: false });
doc.font('Helvetica');
data.keyPoints.forEach((point) => {
doc.text(`• ${point}`, { indent: 15 });
});
doc.moveDown();
// Metadata Footer
doc.fontSize(10).fillColor('gray').text(`Sentiment: ${data.sentiment.toUpperCase()}`, { align: 'right' });
doc.text(`Generated on: ${new Date().toLocaleDateString()}`, { align: 'right' });
// Finalize the PDF
doc.end();
});
}
// ==========================================
// 4. API ENDPOINT (NEXT.JS APP ROUTER)
// ==========================================
/**
* POST /api/generate-pdf
*
* Request Body: { text: string }
* Response: PDF Binary Stream
*/
export async function POST(req: Request) {
try {
// 1. Parse Input
const { text } = await req.json();
if (!text || typeof text !== 'string') {
return NextResponse.json(
{ error: "Invalid input: 'text' field is required." },
{ status: 400 }
);
}
// 2. Generate Smart Summary (LLM Step)
const summaryData = await generateSmartSummary(text);
// 3. Generate PDF (Rendering Step)
const pdfBuffer = await createPdfBuffer(summaryData);
// 4. Return Binary Stream
return new NextResponse(pdfBuffer, {
headers: {
'Content-Type': 'application/pdf',
'Content-Disposition': `attachment; filename="summary-${Date.now()}.pdf"`,
'Content-Length': pdfBuffer.length.toString(),
},
});
} catch (error) {
console.error("Error generating PDF:", error);
return NextResponse.json(
{ error: "Failed to generate PDF." },
{ status: 500 }
);
}
}
Detailed Line-by-Line Explanation
- Imports & Setup:
  - NextResponse: the standard wrapper for HTTP responses in Next.js.
  - PDFDocument (pdfkit): the library used to construct the PDF document object.
  - zod: a schema validation library. We use it here to strictly define what the LLM should return, reducing hallucinations.
- Type Definitions (SummarySchema):
  - We define a z.object with specific fields (title, summary, keyPoints, sentiment).
  - Why? When communicating with an LLM, vague instructions lead to inconsistent outputs. By defining a schema, we can pass it to the LLM (via tools like zodResponseFormat) to force it to adhere to this JSON structure.
- generateSmartSummary Function:
  - The Mock: since this is a "Hello World" example without live API keys, we simulate network delay (setTimeout) and return a hardcoded object matching SummaryData.
  - The Real Logic: the commented-out section shows how you would actually integrate OpenAI. Note the response_format: zodResponseFormat(...) option. This is a newer feature in GPT-4 models that forces the model to output valid JSON matching your schema, which is critical for reliable automation.
- createPdfBuffer Function:
  - Stream Handling: PDFKit works as a Node.js stream. We cannot simply return the doc object; we must listen for data events (chunks of binary data) and collect them.
  - The Buffer Promise: we wrap the stream logic in a new Promise. This allows us to await the PDF generation until the end event fires, ensuring we have the complete file in memory.
  - Styling: we use standard PDF methods (fontSize, font, text) to lay out the content, mapping the summaryData object properties directly into the document structure.
- API Endpoint (POST):
  - Input Validation: we check that text exists. If not, we return a 400 error immediately. Never trust client input.
  - Orchestration: this function acts as the orchestrator. It calls the LLM function, waits for the result, then passes that result to the PDF generation function.
  - Binary Response: the most critical part is the return statement. We do not return JSON; we return a NextResponse with the raw pdfBuffer.
  - Headers: Content-Type: application/pdf tells the browser to render or download a PDF rather than display text; Content-Disposition suggests a filename to the browser; Content-Length helps the browser show an accurate progress bar during download.
Common Pitfalls
- Vercel/AWS Lambda Timeouts (The 10s Wall):
  - Issue: Serverless functions often have strict timeouts (e.g., 10 seconds on Vercel Hobby plans). LLM calls and PDF generation can easily exceed this, especially for large documents.
  - Fix:
    - Use Edge Functions for lower latency (though they have stricter size limits).
    - For heavy PDF generation, offload the task to a background job (e.g., BullMQ, Inngest) and notify the client via webhook when the PDF is ready.
    - Chunking: if the input text is massive, do not send it all to the LLM at once. Implement a chunking strategy (split text into paragraphs), summarize each chunk individually, and then summarize the summaries.
- LLM Hallucinated JSON:
  - Issue: Even with strict prompts, LLMs can output invalid JSON (missing commas, trailing commas, unescaped strings), causing JSON.parse() to crash the server.
  - Fix: Never JSON.parse() raw LLM output directly in production.
    - Use Zod to validate the parsed object.
    - Use output-parsing libraries (like jsonrepair or LangChain's output parsers) to sanitize the string before parsing.
    - If using OpenAI, strictly use response_format: { type: "json_object" } or the newer Zod integration.
- Memory Leaks in Streams:
  - Issue: In the createPdfBuffer function, if the PDF generation fails (e.g., an error in the PDFKit stream), the end event might never fire, leaving the Promise pending forever.
  - Fix: Always attach an error event listener to streams (doc.on('error', reject)). This ensures the Promise rejects and the API returns a 500 error instead of hanging.
- Async/Await Loop Blocking:
  - Issue: PDF generation is CPU-intensive. In a Node.js environment (standard serverless), this blocks the event loop, preventing other requests from being processed.
  - Fix: While await handles the I/O wait, the actual PDF rendering is synchronous. For very complex PDFs, consider using a dedicated worker thread or a specialized PDF microservice if you expect high concurrency. For "Hello World" and simple summaries, this single-threaded approach is acceptable but must be monitored.
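The repair-then-validate approach to hallucinated JSON can be sketched as follows; the fence-stripping and trailing-comma repair are minimal hand-rolled stand-ins for a library like jsonrepair, and the type guard stands in for Zod schema validation:

```typescript
// Defensive parsing of LLM output: strip Markdown fences, repair trailing
// commas, parse, then validate the shape. The repair step is a minimal
// hand-rolled stand-in for a library like jsonrepair, and the type guard
// stands in for Zod schema validation.

type Summary = { title: string; summary: string };

function repairJson(raw: string): string {
  const withoutFences = raw
    .split("\n")
    .filter((line) => !line.trim().startsWith("```")) // Drop fence lines
    .join("\n");
  return withoutFences.replace(/,\s*([}\]])/g, "$1"); // Remove trailing commas
}

function parseSummary(raw: string): Summary | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(repairJson(raw.trim()));
  } catch {
    return null; // Never let JSON.parse crash the request handler
  }
  const s = parsed as Summary;
  if (typeof s?.title === "string" && typeof s?.summary === "string") {
    return s;
  }
  return null; // Shape validation failed
}

// An LLM response wrapped in a code fence, with a trailing comma:
const result = parseSummary('```json\n{"title": "Q3", "summary": "Up 20%",}\n```');
console.log(result?.title); // "Q3"
```

Returning `null` instead of throwing lets the API route decide whether to retry the LLM call or return a clean 500.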
The chapter continues with advanced code, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License.
Content Copyright: Copyright © 2026 Edgar Milvus. All rights reserved. All textual explanations, original diagrams, and illustrations are the intellectual property of the author. Copying, redistribution, or reproduction is strictly prohibited.