Chapter 5: Memory and Span - Zero Allocation Slicing
Theoretical Foundations: Memory<T> and Span<T>
In previous sections, we explored the performance characteristics of standard collections and the algorithmic complexity of data lookups. We established that while List<T> provides O(1) indexed access, searching for an element without a key is O(n), and Dictionary<TKey, TValue> offers O(1) average-case lookups at the cost of memory overhead and hashing complexity. Now, we pivot to the underlying memory model that powers these high-performance structures and enables zero-allocation data manipulation.
The Limitations of Traditional Array Slicing
Historically, in C# (prior to Span<T> and Memory<T>), slicing a data structure involved allocating a new array on the heap. Consider a scenario in AI model inference where we process a large buffer of tokens (integers) representing a prompt. If we need to process a specific sequence of tokens (a window) within that buffer, the traditional approach looks like this:
// Traditional approach: Heap allocation for every slice
public int[] ProcessWindow(int[] fullTokenBuffer, int start, int length)
{
// This allocates a new array on the heap (GC pressure)
int[] window = new int[length];
Array.Copy(fullTokenBuffer, start, window, 0, length);
// Perform processing...
return window;
}
This approach has three critical drawbacks for high-performance AI pipelines:
- GC Pressure: Each slice creates a new object on the heap. In a tight loop processing millions of tokens (e.g., in a transformer model), this generates massive garbage collection pauses, degrading throughput.
- Memory Fragmentation: Frequent allocations of varying sizes lead to fragmented memory, reducing cache locality.
- Copy Overhead: Array.Copy performs a linear O(n) memory copy, doubling the work for a simple windowing operation.
The Concept: Zero-Cost Abstraction via Stack Allocation and Ref Structs
To solve this, C# introduced Span<T>. Conceptually, a Span<T> is a type-safe view over a contiguous region of memory. It is a ref struct, meaning it lives only on the stack and cannot escape the current method call: it cannot be stored in a field of a class, boxed, or used inside an async method. This constraint is intentional; it guarantees that the Span<T> never outlives the memory it points to, preventing memory safety violations.
Real-World Analogy:
Imagine a massive warehouse of books (the heap array). Instead of photocopying a specific shelf of books to take to your desk (allocating a new array), you simply take a clipboard with a notepad containing the shelf number and range (e.g., "Shelf 5, Books 10-15"). Span<T> is that clipboard. It allows you to access and modify the books on that shelf directly without moving them. It is lightweight, fast to create, and incurs zero memory overhead for the data itself.
Memory vs. Span
While Span<T> is the stack-only view, Memory<T> is its heap-compatible counterpart. Memory<T> can be stored in fields and passed across async boundaries, but it cannot be directly dereferenced (accessed). To access the data, you must request a Span<T> from the Memory<T>.
- Span<T>: The "slicing" tool. Used for synchronous, stack-only data manipulation.
- Memory<T>: The "transport" tool. Used to hold a reference to memory that lives on the heap (like an array) or in native, unmanaged memory, and can be passed around before being sliced.
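The split becomes concrete when a type needs to hold a buffer as state. A minimal sketch (the TokenWindow type and its members are illustrative, not from the chapter): Memory<T> is legal as a field, and the data is dereferenced only by materializing a Span<T> inside a synchronous method.

```csharp
using System;

public class TokenWindow
{
    // Memory<T> may live in a field; a Span<T> field would not compile (ref struct).
    private readonly Memory<int> _buffer;

    public TokenWindow(int[] tokens) => _buffer = tokens.AsMemory();

    public void ZeroFirst(int count)
    {
        // Accessing the data requires materializing a Span<T> in a sync path.
        Span<int> view = _buffer.Span.Slice(0, count);
        view.Fill(0); // writes through to the original array
    }
}
```

Because the field is Memory<T>, an instance of TokenWindow can be stored, passed to other objects, or awaited across, none of which a Span<T> would allow.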
Internal Mechanics: How Slicing Works
When you slice a Span<T> or Memory<T>, you are not creating a new data structure. You are creating a new descriptor that points to the same underlying memory but with adjusted start and length properties.
using System;
public void ZeroAllocationSlicing()
{
// Large array on the heap (e.g., a batch of embeddings)
float[] embeddings = new float[1000];
// Create a Span over the entire array (no allocation)
Span<float> fullSpan = embeddings.AsSpan();
// Create a slice (no allocation, no copy)
// This is essentially a pointer arithmetic operation: start + offset
Span<float> slice = fullSpan.Slice(100, 50);
// Modifying the slice modifies the original array
slice[0] = 1.0f;
Console.WriteLine(embeddings[100]); // Output: 1
}
Under the Hood:
A Span<T> is essentially a struct containing two fields:
- A reference (pointer) to the start of the data.
- An integer representing the length.
When you call .Slice(offset, length), the runtime calculates a new pointer (original pointer + offset) and validates that the new length fits within the original bounds. This is an O(1) operation with minimal CPU instructions.
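To make the two-field layout concrete, here is a deliberately simplified, span-like struct. This is an illustration only: the real Span<T> wraps an interior managed pointer with JIT intrinsics, not an array-plus-offset pair, but the descriptor arithmetic in Slice is the same idea.

```csharp
using System;

public readonly ref struct SimpleSpan<T>
{
    private readonly T[] _array;  // stand-in for the real managed pointer
    private readonly int _start;
    public int Length { get; }

    public SimpleSpan(T[] array, int start, int length)
    {
        if ((uint)start > (uint)array.Length ||
            (uint)length > (uint)(array.Length - start))
            throw new ArgumentOutOfRangeException();
        _array = array; _start = start; Length = length;
    }

    public T this[int index]
    {
        get
        {
            // Bounds check on every access, exactly as described above.
            if ((uint)index >= (uint)Length) throw new IndexOutOfRangeException();
            return _array[_start + index];
        }
    }

    // O(1): just arithmetic on the descriptor, no data is touched or copied.
    public SimpleSpan<T> Slice(int offset, int length)
        => new SimpleSpan<T>(_array, _start + offset, length);
}
```

Note that this toy version validates a slice against the whole backing array rather than the parent span's bounds; the real type is stricter.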
Stack Allocation with stackalloc
To further optimize, Span<T> can be used with memory allocated directly on the stack using stackalloc. This is crucial for temporary buffers in algorithms where the size is known at compile time or is small.
public int SumSmallBuffer(Span<int> data)
{
    // CopyTo throws if the source is larger than the destination, so guard first.
    if (data.Length > 64)
        throw new ArgumentException("Input exceeds the stack buffer size.", nameof(data));

    // Allocate a small buffer on the stack.
    // This memory is reclaimed automatically when the method returns.
    Span<int> tempBuffer = stackalloc int[64];
    data.CopyTo(tempBuffer);

    int sum = 0;
    foreach (int val in tempBuffer.Slice(0, data.Length))
    {
        sum += val;
    }
    return sum;
}
Constraint Warning: Because Span<T> is a ref struct, it cannot be used in iterators (yield return) or async methods. If you need to slice data asynchronously, you must use Memory<T> and convert it to Span<T> inside the synchronous execution path.
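A common shape for that pattern (the method names here are illustrative): hold the data as Memory<T> across the await, then drop down to Span<T> only inside a synchronous helper.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncSlicing
{
    // Memory<T> is legal across await boundaries...
    public static async Task<float> AverageAsync(ReadOnlyMemory<float> data)
    {
        await Task.Yield(); // simulate asynchronous work (e.g., awaiting I/O)
        return Average(data.Span); // ...Span<T> appears only in the sync path
    }

    private static float Average(ReadOnlySpan<float> span)
    {
        float sum = 0;
        foreach (float v in span) sum += v;
        return span.Length == 0 ? 0 : sum / span.Length;
    }
}
```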
Practical Application: Tokenizer Vocabulary Lookups
In Natural Language Processing (NLP), tokenizers map strings (tokens) to integer IDs. A vocabulary might contain 50,000 entries. We need an efficient way to look up substrings of a large input string (the prompt) against the vocabulary.
Using string.Substring() creates a new string object (heap allocation) for every check. Using Span<char> allows us to create a "view" of the substring without allocation.
However, Dictionary<TKey, TValue> in .NET does not natively support Span<char> as a key because Span<T> cannot be used as a generic type argument (it is a ref struct). To solve this, we often use a custom hash map or a technique called "string interning" via a custom comparer, but for the sake of this theoretical foundation, let's look at how we might manually implement a lookup loop using Span<char> to demonstrate zero-allocation comparison.
Scenario: We have a vocabulary stored as an array of strings (for simplicity, though a Dictionary is usually used). We want to find the ID of a token present in a large text buffer.
using System;
using System.Collections.Generic;
public class TokenizerLookup
{
// Simulated vocabulary: Array of token strings
private string[] _vocabulary = new string[] { "hello", "world", "transformer", "attention" };
// Lookup dictionary for O(1) access (Standard approach)
private Dictionary<string, int> _vocabMap;
public TokenizerLookup()
{
_vocabMap = new Dictionary<string, int>();
for (int i = 0; i < _vocabulary.Length; i++)
{
_vocabMap[_vocabulary[i]] = i;
}
}
// Zero-Allocation Lookup using Span<char>
// We iterate manually to avoid LINQ (which allocates delegates/enumerators)
public int? FindTokenId(ReadOnlySpan<char> tokenSpan)
{
for (int i = 0; i < _vocabulary.Length; i++)
{
// Compare the Span<char> with the string
// This creates a Span<char> from the string and compares memory
if (tokenSpan.SequenceEqual(_vocabulary[i].AsSpan()))
{
return i;
}
}
return null;
}
public void ProcessStream(ReadOnlyMemory<char> stream)
{
// We can slice the stream (Memory<T>) and pass it to synchronous processing
ReadOnlySpan<char> view = stream.Span;
// Example: Extracting a word "hello" from the stream without allocation
// Assuming we know the start and length (e.g., from a previous scan)
ReadOnlySpan<char> token = view.Slice(0, 5); // "hello"
int? id = FindTokenId(token);
Console.WriteLine($"Token ID: {id}");
}
}
Complexity Analysis:
- Standard Dictionary<string, int> lookup: O(1) average case. However, looking up a substring first requires allocating that substring.
- Manual Span<char> loop: O(n), where n is the vocabulary size, but zero heap allocation (no GC pressure). This is a trade-off: we spend CPU cycles on a linear search to save memory bandwidth and avoid GC pauses. For small vocabularies, or hot paths where allocation is the bottleneck, this is superior.
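To tie the trade-off back to the class above, a caller can slice the prompt with AsSpan() instead of Substring. This assumes FindTokenId accepts ReadOnlySpan<char> (a string's span is read-only, as the ProcessStream method already relies on):

```csharp
var tokenizer = new TokenizerLookup();
string prompt = "attention is all you need";

// prompt.Substring(0, 9) would allocate a new string for every probe;
// AsSpan().Slice only creates a new descriptor over the same characters.
ReadOnlySpan<char> candidate = prompt.AsSpan().Slice(0, 9); // "attention"
int? id = tokenizer.FindTokenId(candidate);
Console.WriteLine(id.HasValue ? $"Token ID: {id}" : "Not in vocabulary");
// prints "Token ID: 3" with the sample vocabulary above
```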
Graph Traversal Optimization
In graph algorithms (e.g., Breadth-First Search for recommendation systems), we often maintain a queue of nodes to visit. A standard Queue<T> uses a circular array internally. When the queue grows beyond capacity, it allocates a new, larger array and copies elements (amortized O(1) enqueue).
Using Span<T> allows us to operate on a pre-allocated buffer for the queue storage. If we can predict the maximum queue size (e.g., in a fixed-size graph traversal), we can allocate a single array on the stack or heap once and use a Span<T> to manage the "view" of the active queue elements.
public struct Node { public int Id; public float Weight; }
public void BfsTraversal(Span<Node> storageBuffer, int startNodeId)
{
// We use the span as a backing store for our queue logic
// This avoids the overhead of the Queue<T> class wrapper and resizing
int head = 0;
int tail = 0;
storageBuffer[tail++] = new Node { Id = startNodeId, Weight = 1.0f };
while (head < tail)
{
Node current = storageBuffer[head++];
// Simulate visiting neighbors
// In a real scenario, we would calculate neighbors and add them to storageBuffer[tail++]
// ensuring we don't exceed storageBuffer.Length
}
}
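A usage sketch for the routine above: because Node contains only unmanaged fields (an int and a float), the backing store can be stack-allocated by the caller. The capacity of 64 is arbitrary and assumes a small, bounded traversal.

```csharp
// The caller owns the storage; BfsTraversal itself never allocates.
Span<Node> storage = stackalloc Node[64];
BfsTraversal(storage, startNodeId: 0);
```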
Memory Safety and Bounds Checking
A common misconception is that Span<T> is "unsafe" like pointers. In reality, Span<T> is safer. While it uses pointer arithmetic internally, the runtime performs bounds checking on every access. If you attempt to access span[10] on a span of length 5, an IndexOutOfRangeException is thrown immediately. This prevents buffer overflows while maintaining the performance of contiguous memory access.
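The bounds check can be observed directly (a minimal sketch):

```csharp
Span<int> span = stackalloc int[5];
try
{
    span[10] = 42; // index checked against Length on every access
}
catch (IndexOutOfRangeException)
{
    Console.WriteLine("Caught IndexOutOfRangeException; no buffer overflow occurred.");
}
```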
Theoretical Foundations
- Zero-Copy Slicing: Span<T> and Memory<T> provide a view into existing memory without copying data.
- Stack vs. Heap: Span<T> is stack-only (fast, scoped), while Memory<T> is heap-compatible (flexible, async-safe).
- Performance Trade-offs: Using these types reduces GC pressure significantly, which is critical for AI applications processing large datasets (embeddings, token streams) in real time.
- Constraints: The ref struct nature of Span<T> restricts its use in async/await and iterators, requiring careful architectural design.
This foundation enables the high-performance data manipulation pipelines required in modern AI systems, where minimizing latency and memory footprint is paramount.
Basic Code Example
Here is a simple, 'Hello World'-level code example that demonstrates how a Dictionary and a HashSet solve a real-world problem: deduplicating a stream of user IDs and looking up user names. The example avoids LINQ shortcuts so the hashing and bucket behavior explained afterward stays visible in plain loops.
Code Example: User ID Deduplication and Lookup
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
// 1. Simulate a stream of incoming user events (IDs and Names).
// In a real scenario, this might come from a log file or API.
var userEvents = new List<(int Id, string Name)>
{
(101, "Alice"),
(102, "Bob"),
(101, "Alice"), // Duplicate ID
(103, "Charlie"),
(102, "Bob"), // Duplicate ID
(104, "David")
};
// 2. Create a HashSet to track unique IDs efficiently.
// HashSet<T> uses hashing to achieve O(1) average time complexity for lookups and insertions.
var uniqueUserIds = new HashSet<int>();
// 3. Create a Dictionary to map IDs to Names for quick retrieval.
// Dictionary<K,V> also uses hashing to map keys to values.
var userDictionary = new Dictionary<int, string>();
// 4. Process the stream manually (no LINQ).
// We iterate through the list and check for duplicates using the HashSet.
foreach (var user in userEvents)
{
// HashSet<T>.Add returns false if the item already exists, making it a safe duplicate check.
if (uniqueUserIds.Add(user.Id))
{
// If the ID was unique, add it to the dictionary.
userDictionary.Add(user.Id, user.Name);
Console.WriteLine($"Added: ID={user.Id}, Name={user.Name}");
}
else
{
Console.WriteLine($"Skipped Duplicate: ID={user.Id}");
}
}
// 5. Demonstrate efficient lookup.
// This is the "Tokenizer" or "Graph Node" scenario: finding data instantly.
Console.WriteLine("\n--- Lookup Test ---");
int searchId = 102;
if (userDictionary.TryGetValue(searchId, out string name))
{
Console.WriteLine($"Found ID {searchId}: {name}");
}
else
{
Console.WriteLine($"ID {searchId} not found.");
}
}
}
Explanation of Internal Mechanics
- The Problem Context: The code simulates a high-throughput system (like a tokenization pipeline or a graph traversal) where data arrives as a stream. We need to filter out duplicates and store relationships (ID to Name) efficiently. Using a List for this would be slow because checking whether an item exists (List.Contains) requires scanning every element.
- How HashSet<T> Works (The "Set" Concept):
  - Hashing: When you call uniqueUserIds.Add(101), the value 101 is passed to a hashing function, which returns an integer (the hash code).
  - Buckets: This hash code maps to a specific "bucket" in memory. The HashSet stores the value in that bucket.
  - Lookup: To check whether 101 exists, the HashSet hashes 101, goes directly to that bucket, and compares the stored value. It never looks at the other entries. This is why it is O(1) (constant time) on average.
- How Dictionary<K,V> Works (The "Map" Concept):
  - It works like HashSet, but each bucket entry stores both the Key and the Value.
  - Key Hashing: The Key (102) is hashed to find the bucket.
  - Value Storage: The Value ("Bob") is stored alongside the Key in that bucket.
  - Retrieval: userDictionary.TryGetValue(102, ...) hashes 102, jumps to the bucket, and retrieves "Bob" instantly.
- Complexity Comparison:
  - List.Contains (O(N)): If we used a List<int> to store IDs, checking for duplicates requires scanning the list from start to finish. For 1,000,000 items, that can mean up to 1,000,000 checks per item.
  - HashSet.Add (O(1)): With a HashSet, checking for duplicates is nearly instant, regardless of how many items are already stored.
Visualizing the Buckets
The following diagram illustrates how the Dictionary stores the IDs 101, 102, and 103 in memory buckets based on their hash values.
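Since bucket placement boils down to hash modulo bucket count, the mapping can also be sketched in code. The bucket count of 8 is illustrative; the real Dictionary chooses prime-based sizes and resizes as it grows.

```csharp
int bucketCount = 8;
foreach (int id in new[] { 101, 102, 103 })
{
    int hash = id.GetHashCode(); // for int, the hash code is the value itself
    int bucket = (hash & 0x7FFFFFFF) % bucketCount;
    Console.WriteLine($"ID {id} -> bucket {bucket}");
}
// ID 101 -> bucket 5, ID 102 -> bucket 6, ID 103 -> bucket 7
```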
Common Pitfalls
- Assuming O(1) is Always O(1): While HashSet and Dictionary are O(1) on average, they can degrade toward O(N) in the worst case. This happens when many keys map to the same bucket (hash collisions). The built-in .NET types handle this by chaining colliding entries within a bucket, but heavy collisions are still a performance hit.
- Modifying the Collection While Iterating: Never add or remove items from a HashSet or Dictionary while iterating over it with a foreach loop; doing so throws an InvalidOperationException. Build the collection first, then iterate, or record items to remove in a temporary list.
- Using Reference Types as Keys without Overriding GetHashCode: If you use a custom class (e.g., class UserToken) as a Key in a Dictionary, you must override GetHashCode() and Equals(). By default, reference types hash by object identity, which is unique for every new object. Two UserToken objects with the same text content would get different hash codes, and the Dictionary would treat them as different keys, which is usually not what you want.
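A hedged sketch of the fix for that last pitfall, using the chapter's hypothetical UserToken as the key type:

```csharp
using System;
using System.Collections.Generic;

public sealed class UserToken : IEquatable<UserToken>
{
    public string Text { get; }
    public UserToken(string text) => Text = text;

    // Equal content must produce an equal hash, so lookups land in the same bucket.
    public override int GetHashCode() => Text.GetHashCode();
    public override bool Equals(object obj) => Equals(obj as UserToken);
    public bool Equals(UserToken other) => other != null && Text == other.Text;
}

// Two distinct objects with the same content now act as the same key:
// var map = new Dictionary<UserToken, int> { [new UserToken("hello")] = 1 };
// map.ContainsKey(new UserToken("hello")) is true with the overrides above.
```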
The chapter continues with advanced code, exercises and solutions with analysis, you can find them on the ebook on Leanpub.com or Amazon
Code License: All code examples are released under the MIT License. Github repo.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.