Chapter 12: Structs vs. Classes - A Performance Deep Dive
Theoretical Foundations
In high-performance AI systems, particularly those handling massive token streams for Large Language Models (LLMs), the choice between C# structs and classes is not merely a stylistic preference; it is a fundamental architectural decision that dictates memory layout, garbage collection (GC) pressure, and CPU cache efficiency. To understand this deeply, we must move beyond simple "value vs. reference" definitions and analyze how data lives and moves within the .NET runtime and the underlying hardware.
The Memory Hierarchy: The Warehouse vs. The Backpack
To visualize the distinction, imagine a logistics network.
- Classes (Reference Types): These are like items stored in a massive, centralized warehouse (the Managed Heap). When you create an object, you are given a receipt (the reference or pointer) with the warehouse's address. To use the item, you must travel to the warehouse, locate the item using the receipt, and retrieve it. If you need to process thousands of items, you are constantly traveling back and forth. Furthermore, the warehouse manager (the Garbage Collector) periodically stops all traffic to reorganize shelves (compaction), which causes delays (GC pauses).
- Structs (Value Types): These are like items kept directly in your backpack (the Stack or embedded inline within other objects). When you move, the items move with you. There is no travel time to retrieve them, and there is no warehouse manager coming to clean up your backpack—it’s automatically cleared when you leave the area (scope exit) or the containing object is destroyed.
In the context of AI token processing, where we might handle arrays of millions of tokens (integers), embeddings (float vectors), or attention scores, the "warehouse" approach often incurs prohibitive overhead.
1. Allocation and the Heap vs. The Stack
In C#, every data structure requires memory allocation. The mechanism differs significantly between classes and structs.
Classes (Heap Allocation): When you instantiate a class, the runtime performs a specific sequence of operations:
- Size Calculation: The runtime calculates the total size required for the object header (containing type information and sync block index) plus the instance fields.
- Heap Check: It checks the Managed Heap for a contiguous block of free memory large enough to fit the object.
- Pointer Update: The heap pointer is advanced, and the memory is zeroed out.
- Constructor Execution: The constructor logic runs.
This process is non-trivial. In a high-throughput AI pipeline, if you are creating a new Token object for every token in a 100,000-token context window, you are triggering 100,000 individual heap allocations. This creates massive GC Pressure. The Garbage Collector must eventually identify these objects as "garbage," mark them, sweep them, and potentially compact the heap, which involves moving objects and updating every reference pointing to them.
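To make this allocation cost measurable, here is a minimal sketch using the real `GC.GetAllocatedBytesForCurrentThread` API (.NET Core 3.0+). The `HeapToken` and `ValueToken` types are illustrative stand-ins, not the chapter's later benchmark types:

```csharp
using System;

// Hypothetical token types for illustration only.
public class HeapToken { public int Id; public float Weight; }
public struct ValueToken { public int Id; public float Weight; }

public static class AllocationDemo
{
    public static (long classBytes, long structBytes) Measure()
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var classTokens = new HeapToken[100_000];        // 100,000 references...
        for (int i = 0; i < classTokens.Length; i++)
            classTokens[i] = new HeapToken { Id = i, Weight = 1.0f }; // ...plus 100,000 heap objects
        long classBytes = GC.GetAllocatedBytesForCurrentThread() - before;

        before = GC.GetAllocatedBytesForCurrentThread();
        var structTokens = new ValueToken[100_000];      // a single heap allocation: the array
        long structBytes = GC.GetAllocatedBytesForCurrentThread() - before;

        GC.KeepAlive(classTokens);
        GC.KeepAlive(structTokens);
        return (classBytes, structBytes);
    }

    public static void Main()
    {
        var (c, s) = Measure();
        Console.WriteLine($"Class tokens allocated {c:N0} bytes; struct array allocated {s:N0} bytes.");
    }
}
```

The exact byte counts vary by runtime and pointer size, but the class version always allocates roughly an order of magnitude more: one object header plus fields per token, versus one array.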
Structs (Stack or Inline Allocation): Structs are value types. Their allocation strategy is far more efficient:
- Stack Allocation: When a struct is declared as a local variable inside a method, it is allocated directly on the Call Stack. The stack pointer simply moves down to reserve the exact number of bytes needed. No heap lookup is required.
- Inline Allocation: When a struct is a field within a class, it is not stored as a separate object on the heap. It is embedded directly into the class's memory layout. Similarly, if a class holds an array of structs, all the struct data resides contiguously within the array's single heap block.
The AI Implication:
Consider a Tokenizer component. If it returns a List<Token> where Token is a class, every token is a separate heap object. Iterating over this list involves chasing pointers across the heap, likely causing CPU Cache Misses (explained below). If Token is a struct, the entire array of tokens is a contiguous block of memory. The CPU can load a cache line containing dozens of tokens at once.
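The contiguity claim can be verified directly. This sketch (the `Token` struct and `LayoutDemo` are hypothetical names for illustration) uses `Unsafe.SizeOf` and `MemoryMarshal.AsBytes` to show that an array of four 8-byte structs really is one 32-byte block with no headers or pointers inside:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// A hypothetical 8-byte token (4-byte int + 4-byte float) for illustration.
public struct Token { public int Id; public float Weight; }

public static class LayoutDemo
{
    public static void Main()
    {
        // The struct has no object header: it is exactly its fields.
        Console.WriteLine(Unsafe.SizeOf<Token>()); // prints 8

        var tokens = new Token[4];
        for (int i = 0; i < tokens.Length; i++)
            tokens[i] = new Token { Id = i, Weight = i };

        // Reinterpret the struct array as raw bytes without copying.
        // This is only possible because the data is one contiguous block.
        Span<byte> raw = MemoryMarshal.AsBytes(tokens.AsSpan());
        Console.WriteLine(raw.Length); // prints 32 (4 structs x 8 bytes)
    }
}
```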
2. Garbage Collection and GC Pressure
The .NET Garbage Collector is a generational collector. Objects are allocated in Generation 0. If they survive a collection, they move to Gen 1, then Gen 2.
- The Problem with Classes: Short-lived objects (like temporary tokens during text generation) flood Gen 0. When Gen 0 fills up, the GC must pause execution to collect. In real-time AI inference (e.g., chatbots), these pauses introduce latency spikes (jitter), making the application feel unresponsive.
- The Struct Solution: Because structs on the stack are reclaimed automatically when the method returns (by simply moving the stack pointer), they generate zero GC pressure. Even structs on the heap (embedded in classes) are collected only when the parent class is collected, significantly reducing the frequency of collections.
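The effect on Gen 0 can be observed with `GC.CollectionCount`. The sketch below (hypothetical `TempToken` type; exact counts vary by runtime, GC mode, and JIT optimizations) compares a hot loop that allocates a class per token against one that uses a value type:

```csharp
using System;

// Hypothetical short-lived token class for illustration.
public class TempToken { public int Id; public TempToken(int id) => Id = id; }

public static class GcPressureDemo
{
    private static TempToken _sink; // forces the allocation to survive the loop body

    public static (int classGen0, int structGen0) Measure()
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < 5_000_000; i++)
        {
            _sink = new TempToken(i);      // one heap allocation per token
        }
        int classGen0 = GC.CollectionCount(0) - before;

        before = GC.CollectionCount(0);
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++)
        {
            var t = (Id: i, Weight: 1.0f); // a value tuple is a struct: no heap allocation
            sum += t.Id;
        }
        int structGen0 = GC.CollectionCount(0) - before;

        Console.WriteLine(sum);            // keep the struct loop observable
        return (classGen0, structGen0);
    }

    public static void Main()
    {
        var (c, s) = Measure();
        Console.WriteLine($"Gen 0 collections: {c} (classes) vs {s} (structs)");
    }
}
```

On a typical workstation the class loop triggers dozens of Gen 0 collections while the struct loop triggers none.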
Analogy: Imagine a chef (the CPU) cooking a complex dish (AI inference).
- Classes: The chef has to walk to a pantry (Heap) for every single ingredient (Token). If the pantry gets crowded (GC Pressure), the chef has to stop and clean the pantry before getting the next ingredient.
- Structs: The ingredients are laid out on the countertop (Stack/Inline) in the exact order needed. The chef grabs them sequentially without moving.
CPU Cache Locality: The Speed of Light
Modern CPUs are orders of magnitude faster than RAM. To bridge this gap, CPUs use Caches (L1, L2, L3). When the CPU requests data, it fetches a "Cache Line" (typically 64 bytes). If the next piece of data is right next to the current one (spatial locality), it is already in the cache and takes ~1 nanosecond to access. If it's elsewhere in RAM, it takes ~100 nanoseconds.
The Class Problem:
Classes are allocated randomly on the heap. An array of class references looks like this in memory:
[Ref1, Ref2, Ref3, ...]
These references point to objects scattered all over the heap. To process Ref2, the CPU must:
- Read the reference.
- Jump to the random heap address.
- Load the object data.
- Repeat.
This is Pointer Chasing. It destroys cache locality and forces the CPU to wait for RAM constantly.
The Struct Advantage:
Structs are stored contiguously. An array of structs looks like this:
[StructData1, StructData2, StructData3, ...]
When the CPU loads StructData1, it likely loads StructData2, StructData3, etc., into the cache line automatically. The CPU can process the entire array at full speed without waiting for RAM.
SIMD (Single Instruction, Multiple Data) Optimization
This is where structs become critical for AI. SIMD allows the CPU to perform the same operation on multiple data points simultaneously (e.g., adding 8 floats in one instruction cycle).
Why Classes Fail at SIMD: To use SIMD, data must be:
- Contiguous: No gaps between elements.
- Aligned: Starting at specific memory boundaries (e.g., 16-byte or 32-byte alignment).
- Packed: No object headers or references mixed in.
Classes violate all three. An array of classes is an array of references. Even if the objects are adjacent on the heap (rare), they contain an object header (8-16 bytes) before the actual data. You cannot load an object header into a vector register.
Why Structs Excel:
If you define a struct with Vector<T> fields (from System.Numerics), the JIT compiler can lay out the struct to match hardware vector registers.
- Example: An AttentionScore struct containing 8 floats can be loaded into an AVX2 register in a single instruction.
- Token Processing: When processing embeddings (arrays of floats), iterating over a struct[] allows the JIT to emit SIMD instructions (like Add, Multiply, DotProduct) that operate on 4, 8, or 16 floats per cycle.
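As a minimal sketch of this vectorized path, the following applies the chapter's temperature-scaling operation to a contiguous float[] of weights using System.Numerics.Vector<T> (the SimdDemo name is illustrative; Vector<float>.Count is hardware-dependent, typically 8 on AVX2 machines):

```csharp
using System;
using System.Numerics;

public static class SimdDemo
{
    // Scale every weight by `factor`, processing Vector<float>.Count floats
    // per iteration instead of one. Requires the data to be contiguous.
    public static void Scale(float[] weights, float factor)
    {
        var vFactor = new Vector<float>(factor);
        int i = 0;
        int lastBlock = weights.Length - Vector<float>.Count;
        for (; i <= lastBlock; i += Vector<float>.Count)
        {
            var v = new Vector<float>(weights, i); // load a contiguous block
            (v * vFactor).CopyTo(weights, i);      // multiply and store back
        }
        for (; i < weights.Length; i++)            // scalar tail for leftovers
            weights[i] *= factor;
    }

    public static void Main()
    {
        var weights = new float[1000];
        for (int i = 0; i < weights.Length; i++) weights[i] = 2.0f;
        Scale(weights, 0.5f);
        Console.WriteLine($"First weight: {weights[0]}, vector width: {Vector<float>.Count}");
    }
}
```

Note that this only works because the floats sit side by side in memory; an array of class references has no equivalent bulk-load operation.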
The ref struct and Span<T> Synergy
In modern C# (C# 7.2 and later), we combine structs with Span<T> and ref struct.
- Span<T>: A ref struct that represents a contiguous region of arbitrary memory (stack, heap, or unmanaged).
- The ref struct Constraint: A struct that must live on the stack. It cannot be boxed, cannot be a field in a class, and cannot be used in async methods.
This is vital for AI pipelines. We can parse a massive text file (e.g., 1GB of training data) into a ReadOnlySpan<byte> without allocating a single string or byte array on the heap. We can then tokenize this span into a Span<int> (token IDs) living on the stack. The entire pipeline runs with zero allocations and zero GC pressure.
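A minimal sketch of such an allocation-free pass (the Tokenize helper and its whitespace splitting rule are hypothetical, chosen only to illustrate the Span pattern; a real tokenizer would use a vocabulary): it writes token lengths into a caller-provided, stack-allocated Span<int> instead of allocating substrings.

```csharp
using System;
using System.Text;

public static class SpanTokenizerDemo
{
    // Writes the length of each whitespace-delimited token into tokenLengths
    // and returns the token count. No heap allocation occurs in this method.
    public static int Tokenize(ReadOnlySpan<byte> text, Span<int> tokenLengths)
    {
        int count = 0, start = 0;
        for (int i = 0; i <= text.Length; i++)
        {
            bool atEnd = i == text.Length;
            if (atEnd || text[i] == (byte)' ')
            {
                if (i > start) tokenLengths[count++] = i - start;
                start = i + 1;
            }
        }
        return count;
    }

    public static void Main()
    {
        // The demo bytes come from GetBytes for convenience; in a real
        // pipeline they would come from a memory-mapped file.
        ReadOnlySpan<byte> text = Encoding.UTF8.GetBytes("the quick brown fox");
        Span<int> lengths = stackalloc int[16]; // lives on the stack: zero GC pressure
        int n = Tokenize(text, lengths);
        Console.WriteLine(n); // prints 4
    }
}
```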
Practical Guidelines for AI Architecture
When to use Structs:
- Data-Oriented Design: When you have collections of small, immutable data (e.g., Token, Vector3, QuantizedWeight).
- Hot Paths: Inside loops processing millions of items (e.g., the forward pass of a neural network or token embedding lookup).
- Memory-Mapped I/O: When reading model weights directly from disk into memory buffers.
- SIMD Vectors: Any data structure designed to be processed by Vector<T>.
When to use Classes:
- Identity and Mutation: When the object's reference matters (e.g., a ModelSession that maintains state across multiple requests).
- Large Data: If a struct exceeds 16-24 bytes, passing it by value (copying) becomes expensive. Classes pass by reference (a single pointer).
- Polymorphism: When you need inheritance or interfaces (e.g., IModel implemented by OpenAIModel and LlamaModel). Structs cannot inherit from classes or other structs (though they can implement interfaces).
Visualizing the Memory Layout
The VisualizeMemoryLayout method in the code example below emits a Graphviz (DOT) diagram illustrating the stark difference in memory layout between a class-based approach and a struct-based approach for an array of 4 Token objects.
The "What If": Quantization and Memory Mapped Files
In Book 9, we discussed Span<T> for zero-copy I/O. Now, imagine loading a quantized LLM (e.g., 3.7 billion parameters) whose weights are stored in compact formats such as packed 4-bit values, int8, or float16.
- Using Classes: You would have to allocate a Weight class for each parameter. This is impossible; the overhead of the object headers would exceed the size of the model weights themselves.
- Using Structs: You define a QuantizedWeight struct (e.g., 1 byte). You map the file directly into memory using MemoryMappedFile and wrap it in a Span<QuantizedWeight>. This allows you to load a 4GB model into a 4GB memory space with zero allocation overhead, enabling the AI to run within strict memory constraints (like mobile devices) using C#.
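A sketch of the mapping step, assuming a hypothetical raw weights file of one byte per parameter (here a 4-byte stand-in). It uses the unsafe AcquirePointer path because the BCL has no fully safe Span-over-MemoryMappedFile helper; compiling it requires AllowUnsafeBlocks:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MmapWeightsDemo
{
    public static unsafe void Main()
    {
        // Stand-in weights file; a real model file would be gigabytes.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 10, 20, 30, 40 });

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
        {
            byte* ptr = null;
            view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try
            {
                // Wrap the mapped region in a Span: no copy, no heap allocation.
                var weights = new ReadOnlySpan<byte>(ptr, 4);
                Console.WriteLine(weights[2]); // prints 30
            }
            finally
            {
                view.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
        File.Delete(path);
    }
}
```

With a 1-byte QuantizedWeight struct, the same span could be reinterpreted via MemoryMarshal.Cast without copying a single byte.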
Conclusion: From OOP to Data-Oriented Design
The shift from classes to structs in AI token processing is a shift from Object-Oriented Programming (OOP) to Data-Oriented Design (DOD).
- OOP focuses on the identity and behavior of data.
- DOD focuses on the transformation of data.
In high-performance AI, we care about transforming tokens into embeddings as fast as possible. Structs provide the memory density and cache locality required to feed the CPU's execution units efficiently. By eliminating the indirection of references and the overhead of the Garbage Collector, we unlock the raw throughput necessary for real-time, large-scale AI inference in C#.
Basic Code Example
Here is a code example demonstrating the memory layout and performance differences between class and struct in the context of processing a stream of data tokens for an AI inference engine.
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
namespace HighPerformanceAI.TokenProcessing
{
// ---------------------------------------------------------
// CONTEXT: AI Token Processing
// ---------------------------------------------------------
// In an AI inference engine, we process a sequence of tokens.
// Each token has a vocabulary ID and a weight (logit).
// We need to perform vectorized operations (like scaling weights)
// on these tokens millions of times per second.
//
// PROBLEM: Using classes creates heap allocations and pointer chasing,
// which destroys CPU cache locality. Using structs allows data to be
// packed contiguously in memory, enabling SIMD (Single Instruction, Multiple Data).
/// <summary>
/// Represents a Token as a CLASS (Reference Type).
/// This is the "slow" path for high-throughput numeric processing.
/// </summary>
public class TokenClass
{
public int Id;
public float Weight;
public TokenClass(int id, float weight)
{
Id = id;
Weight = weight;
}
}
/// <summary>
/// Represents a Token as a STRUCT (Value Type).
/// This is the "fast" path for high-throughput numeric processing.
/// </summary>
[StructLayout(LayoutKind.Sequential)] // Ensures specific memory layout
public struct TokenStruct
{
public int Id;
public float Weight;
public TokenStruct(int id, float weight)
{
Id = id;
Weight = weight;
}
}
public class TokenBenchmark
{
const int ITERATIONS = 1_000_000; // 1 Million tokens
public static void RunDemo()
{
Console.WriteLine($"--- Token Processing Benchmark (Iterations: {ITERATIONS:N0}) ---\n");
// 1. BENCHMARK CLASS-BASED PROCESSING
// ---------------------------------------------------------
// We allocate an array of references. Each token is a separate object
// allocated on the Managed Heap.
TokenClass[] tokenClasses = new TokenClass[ITERATIONS];
// Pre-fill to ensure allocation overhead is accounted for
for (int i = 0; i < ITERATIONS; i++)
{
tokenClasses[i] = new TokenClass(i, 1.0f);
}
Stopwatch sw = Stopwatch.StartNew();
// SCENARIO: Apply a temperature scaling factor to the weights.
// In a real AI model, this is a vectorized operation.
for (int i = 0; i < ITERATIONS; i++)
{
// Accessing a class involves:
// 1. Load the reference from the array.
// 2. Dereference the pointer to find the object on the heap.
// 3. Access the field.
tokenClasses[i].Weight *= 0.5f;
}
sw.Stop();
long classTime = sw.ElapsedMilliseconds;
Console.WriteLine($"[Class] Processing Time: {classTime} ms");
// 2. BENCHMARK STRUCT-BASED PROCESSING
// ---------------------------------------------------------
// We allocate an array of values. The structs are packed contiguously
// in memory (no pointers, no heap objects).
TokenStruct[] tokenStructs = new TokenStruct[ITERATIONS];
// Pre-fill
for (int i = 0; i < ITERATIONS; i++)
{
tokenStructs[i] = new TokenStruct(i, 1.0f);
}
sw.Restart();
// SCENARIO: Apply the same temperature scaling.
for (int i = 0; i < ITERATIONS; i++)
{
// Accessing a struct involves:
// 1. Calculate offset in the contiguous array.
// 2. Access the data directly (CPU cache friendly).
// Note: For arrays, the indexer actually returns a reference, so
// tokenStructs[i].Weight *= 0.5f would also work. The explicit
// copy-modify-store shown here is the pattern required for List<T>,
// whose indexer returns a copy.
TokenStruct t = tokenStructs[i];
t.Weight *= 0.5f;
tokenStructs[i] = t;
}
sw.Stop();
long structTime = sw.ElapsedMilliseconds;
Console.WriteLine($"[Struct] Processing Time: {structTime} ms");
// 3. ADVANCED: SIMD OPTIMIZATION (The "Why")
// ---------------------------------------------------------
// Structs allow us to use System.Numerics.Vector<T> for SIMD.
// We cannot easily do this with arrays of classes because the data
// is scattered all over the heap.
Console.WriteLine("\n--- SIMD Optimization (Vector<T>) ---");
// Reset data for fair comparison
for (int i = 0; i < ITERATIONS; i++) tokenStructs[i] = new TokenStruct(i, 1.0f);
sw.Restart();
ProcessStructsSimd(tokenStructs);
sw.Stop();
long simdTime = sw.ElapsedMilliseconds;
Console.WriteLine($"[Struct + SIMD] Processing Time: {simdTime} ms");
// 4. MEMORY LAYOUT VISUALIZATION
// ---------------------------------------------------------
VisualizeMemoryLayout();
}
/// <summary>
/// Illustrates where SIMD (vectorization) would apply.
/// Requires the data to be contiguous (structs in an array);
/// the loop body itself is scalar for clarity (see comments).
/// </summary>
private static void ProcessStructsSimd(TokenStruct[] tokens)
{
// In a real scenario, we would use Span<T> and Vector<T>.
// This example simulates the concept by processing chunks.
// Note: We cannot use Vector<T> directly on custom structs easily
// without unsafe code or explicit layout, but for floats,
// we can treat the memory as floats if we ignore the ID.
// For this demo, we will simply iterate, but imagine using:
// Vector<float> scale = new Vector<float>(0.5f);
// This processes 8 floats (AVX2) or 16 floats (AVX-512) at once.
for (int i = 0; i < tokens.Length; i++)
{
// In SIMD, this loop would be unrolled and vectorized automatically
// by the JIT if we were using Vector<T> types.
tokens[i].Weight *= 0.5f;
}
}
private static void VisualizeMemoryLayout()
{
Console.WriteLine("\n--- Memory Layout Visualization ---");
Console.WriteLine("Graphviz (DOT) diagram of the two layouts:");
string dot = @"digraph MemoryLayout {
  rankdir=LR;
  node [shape=record];
  refs [label=""Ref1|Ref2|Ref3|Ref4"", xlabel=""TokenClass[] (array of references)""];
  obj1 [label=""header|Id|Weight""]; obj2 [label=""header|Id|Weight""];
  obj3 [label=""header|Id|Weight""]; obj4 [label=""header|Id|Weight""];
  refs -> obj1; refs -> obj2; refs -> obj3; refs -> obj4;
  structs [label=""Id|Weight|Id|Weight|Id|Weight|Id|Weight"", xlabel=""TokenStruct[] (contiguous data)""];
}";
Console.WriteLine(dot);
}
}
class Program
{
static void Main(string[] args)
{
TokenBenchmark.RunDemo();
}
}
}
Detailed Explanation
1. The Real-World Context: Token Processing in AI
In an AI inference engine (like GPT), the model generates a probability distribution over a vocabulary of 50,000+ tokens. To select the next token, we often apply operations like Temperature Scaling, Top-K Sampling, or Log-Softmax. These operations must be performed on arrays of millions of floating-point numbers per inference step.
- The Goal: Minimize latency (time per token) and maximize throughput (tokens per second).
- The Bottleneck: Memory access patterns. The CPU spends more time waiting for data from RAM than actually calculating.
2. Code Breakdown: TokenClass vs. TokenStruct
Step 1: Defining the Data Structures
- TokenClass (Reference Type):
  - When you declare TokenClass[] tokens = new TokenClass[1_000_000], you are allocating an array of 1 million pointers (references).
  - Each individual TokenClass object must be allocated separately on the Managed Heap.
  - Memory Fragmentation: The objects are scattered randomly across the heap. To process tokens[0], the CPU fetches the pointer, then jumps to a random memory address. To process tokens[1], it jumps to another random address. This causes frequent Cache Misses.
- TokenStruct (Value Type):
  - When you declare TokenStruct[] tokens = new TokenStruct[1_000_000], you allocate a single contiguous block of memory large enough to hold 1 million structs.
  - Data Locality: tokens[0] is immediately followed by tokens[1] in memory. When the CPU fetches the first struct, it automatically pulls the next few into the L1/L2 cache (Cache Line Fetch). This results in near 100% cache hit rates during sequential processing.
Step 2: The Benchmark Loop
- Class Loop: tokenClasses[i].Weight *= 0.5f;
  - Indirection: The JIT compiler generates code that loads the array reference, bounds-checks the index, loads the object reference from that array slot, and finally dereferences that to find the Weight field.
  - GC Pressure: Creating these objects generates work for the Garbage Collector. In a high-throughput AI scenario, allocating millions of temporary token objects would cause "GC Pauses," freezing the inference engine.
- Struct Loop: TokenStruct t = tokenStructs[i]; t.Weight *= 0.5f; tokenStructs[i] = t;
  - Stack/Inline Operations: Structs are stored inline. The data is copied to the CPU registers for processing.
  - Zero Allocation: No heap allocation occurs here. The memory for the array was allocated once upfront. This results in Zero Garbage Collection Pressure.
Step 3: SIMD (Single Instruction, Multiple Data)
- While the example shows a scalar multiplication for clarity, the true power of structs in AI is SIMD.
- The System.Numerics.Vector<T> type allows the CPU to process 4, 8, or 16 floats in a single CPU cycle (using AVX2 or AVX-512 instructions).
- Why Structs are Required: SIMD instructions require data to be packed tightly in memory (contiguous). You cannot easily apply a vectorized operation (e.g., multiplying 8 floats at once) to an array of classes because the data is scattered across the heap. Structs guarantee the layout required for SIMD intrinsics.
3. Memory Layout Visualization
The Graphviz diagram embedded in the code visualizes the stark difference in memory organization:
- Left Side (Heap): Shows the "Array Object" containing pointers. These pointers point to distinct objects (Object 1, Object 2, etc.) located elsewhere in memory. This is a "pointer chase."
- Right Side (Contiguous): Shows a solid block of memory. The Id and Weight of the first struct sit right next to the Id and Weight of the second struct. The CPU Prefetcher can predict this pattern and load data before it is even requested.
Common Pitfalls
1. The "Mutable Struct" Trap
A frequent mistake when optimizing with structs is making them mutable and then accessing them through an indexer or property that returns a copy.
// BAD PRACTICE
public struct TokenStruct
{
public int Id;
public float Weight;
}
// Usage
List<TokenStruct> tokens = new List<TokenStruct>(new TokenStruct[10]);
tokens[0].Weight = 5.0f; // Compile error CS1612: cannot modify the copy returned by the indexer
- Why it fails: The List<T> indexer (like any property or method that returns a struct) returns a copy of the struct. Modifying tokens[0].Weight would change the copy, not the data stored in the list, so the compiler rejects it. Arrays are the exception: the array indexer returns a reference to the element, which is why tokens[0].Weight = 5.0f compiles and works on a TokenStruct[].
- The Fix: Copy the struct to a local variable, modify it, and write it back (as shown in the example: TokenStruct t = tokens[i]; ... tokens[i] = t;).
- Modern C# Feature: Use ref returns (e.g., public ref TokenStruct GetToken(int i)) or CollectionsMarshal.AsSpan(list) to avoid copying, but be aware of the lifetime constraints (a ref struct such as Span<T> cannot escape the stack).
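The ref-return escape hatch can be sketched as follows (RefAccessDemo and its backing array are hypothetical names for illustration; CollectionsMarshal.AsSpan requires .NET 5 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public struct TokenStruct { public int Id; public float Weight; }

public static class RefAccessDemo
{
    private static TokenStruct[] _tokens = new TokenStruct[10];

    // A ref return hands the caller a reference into the array, not a copy.
    public static ref TokenStruct GetToken(int i) => ref _tokens[i];

    public static void Main()
    {
        GetToken(0).Weight = 5.0f;             // mutates the array element in place
        Console.WriteLine(_tokens[0].Weight);  // prints 5

        // For List<TokenStruct>, CollectionsMarshal.AsSpan exposes the
        // backing array, so elements can be mutated without the copy dance.
        var list = new List<TokenStruct> { new TokenStruct { Id = 1 } };
        CollectionsMarshal.AsSpan(list)[0].Weight = 2.0f;
        Console.WriteLine(list[0].Weight);     // prints 2
    }
}
```

Beware that the span returned by CollectionsMarshal.AsSpan is invalidated if the list grows, so never add items while holding it.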
2. Excessive Struct Size
- The Mistake: Putting too much data in a struct (e.g., > 32 bytes).
- Consequence: While structs avoid heap allocation, passing large structs by value copies all the data. If a struct is 100 bytes, passing it to a method copies 100 bytes to the stack. This can actually be slower than passing a reference (8 bytes) to a class.
- Guideline: Keep structs small (ideally fitting in a CPU register or a cache line). For AI tokens, the 8 bytes used here (4-byte int + 4-byte float) is perfect.
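When a large struct is unavoidable, the copy cost on method calls can be sidestepped with an in parameter. A sketch, assuming a hypothetical 64-byte Embedding struct:

```csharp
using System;

// A hypothetical 64-byte struct: large enough that by-value copies add up.
public readonly struct Embedding
{
    public readonly float A, B, C, D, E, F, G, H,
                          I, J, K, L, M, N, O, P;
    public Embedding(float a) : this() { A = a; }
}

public static class InParamDemo
{
    // 'in' passes a read-only, pointer-sized reference instead of copying 64 bytes.
    public static float First(in Embedding e) => e.A;

    public static void Main()
    {
        var e = new Embedding(1.5f);
        Console.WriteLine(First(in e));
    }
}
```

Marking the struct readonly matters here: with a mutable struct, member access through an in parameter forces the compiler to make hidden defensive copies, silently undoing the optimization.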
3. Assuming Structs are Always Faster
- The Mistake: Converting everything to structs blindly.
- Consequence: If you need to store a token object in multiple lists or pass it around by reference, a struct might be copied excessively. Classes are better when you need shared ownership or identity (reference equality).
- Context: In the AI pipeline, use structs for the mathematical data arrays (tensors, token streams) inside the hot loop, but use classes for the model architecture (layers, configuration) which is initialized once.
The chapter continues with advanced code samples, exercises, and solutions with analysis; you can find them in the ebook on Leanpub.com or Amazon.
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.
All textual explanations, original diagrams, and illustrations are the intellectual property of the author. Copying, redistribution, or reproduction is strictly prohibited.