Chapter 7: Generics - Creating Reusable Data Pipelines
Theoretical Foundations
Generics are the mechanism that lets us build a single, reusable blueprint for a data structure or algorithm, and then instantiate that blueprint with specific types that are checked at compile time. This prevents the runtime cast errors that plagued pre-generics collections (ArrayList in .NET 1.x, and early Java collections) and eliminates the need for unsafe casting. In the context of building AI data pipelines, where we process heterogeneous data (images, text, or numerical tensors), we need a way to enforce type safety while maintaining flexibility.
The Core Concept: The Blueprint Analogy
Imagine an architectural blueprint for a house. The blueprint defines the structure: walls, a roof, and connections for plumbing and electricity. It does not, however, specify the exact materials. You can use the same blueprint to build a wooden cottage or a concrete high-rise. The structure remains consistent, but the material type changes.
In C#, a generic class is that blueprint.
- The Class Name (Processor) is the blueprint.
- The Type Parameter (<T>) is the placeholder for the material (e.g., int, string, Tensor).
- The Instantiation (new Processor<Tensor>(tensor)) is the actual construction using specific materials.
Without generics, we would have to build a ProcessorForIntegers, a ProcessorForStrings, and a ProcessorForTensors, duplicating logic for every data type. Generics allow us to write the logic once and apply it anywhere.
Generic Classes and Interfaces
A generic class is defined with angle brackets < > containing type parameters. These parameters act as placeholders for actual types used when creating an instance.
Syntax:
public class Processor<T>
{
    private T data;

    public Processor(T input)
    {
        this.data = input;
    }

    public T Process()
    {
        // Perform some operation on data
        return data;
    }
}
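As a minimal sketch of how one blueprint serves several types, the snippet below repeats the Processor<T> class and binds T to int and then to string. ProcessorDemo and its Run method are hypothetical names added only for illustration.

```csharp
using System;

public class Processor<T>
{
    private readonly T data;

    public Processor(T input)
    {
        data = input;
    }

    // Placeholder operation: returns the stored value unchanged.
    public T Process()
    {
        return data;
    }
}

// Hypothetical demo class, not part of the chapter's own code.
public static class ProcessorDemo
{
    public static void Run()
    {
        // T is bound to int: Process() returns int, no cast needed.
        var intProcessor = new Processor<int>(42);
        int number = intProcessor.Process();

        // T is bound to string: same class, different "material".
        var textProcessor = new Processor<string>("raw text");
        string text = textProcessor.Process();

        Console.WriteLine($"{number}, {text}");

        // string wrong = intProcessor.Process(); // compile-time error: int is not a string
    }
}
```

The commented-out line is the point of the exercise: the mismatch is rejected by the compiler instead of surfacing as a failed cast at runtime.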
Usage in AI Data Structures:
In an AI pipeline, we often deal with datasets that contain raw data and labels. A generic Dataset<T, L> class allows us to handle different types of raw data and labels while keeping the structure identical.
// A dataset containing raw data of type T and a label of type L
public class Dataset<T, L>
{
    public T RawData { get; set; }
    public L Label { get; set; }

    public Dataset(T data, L label)
    {
        this.RawData = data;
        this.Label = label;
    }
}
We can instantiate this for image data (byte arrays) or text data (strings):
// Image dataset: Raw data is a byte array, Label is an integer (class ID)
Dataset<byte[], int> imageSet = new Dataset<byte[], int>(new byte[1024], 1);
// Text dataset: Raw data is a string, Label is a boolean (sentiment)
Dataset<string, bool> textSet = new Dataset<string, bool>("The model performed well", true);
Constraints on Type Parameters
Sometimes we need to restrict the types that can be used as arguments to ensure they support specific operations or structure. In C#, this is done using the where clause.
Unlike Java, C# does not use extends or super for generics. Instead, it offers a robust set of constraints:
- Reference Type Constraint (where T : class): T must be a reference type (class, interface, delegate, array).
- Value Type Constraint (where T : struct): T must be a non-nullable value type (int, double, custom struct). This is critical for AI performance to avoid memory allocation (boxing).
- Interface/Base Class Constraint (where T : SomeBaseClass, ISomeInterface): T must inherit from the class or implement the interface.
- Constructor Constraint (where T : new()): T must have a public parameterless constructor, allowing you to create new instances (new T()) inside the generic class.
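The constraints listed above can be combined on a single type parameter. The following is a minimal sketch, assuming a hypothetical IFeature interface, PixelFeature class, and FeatureFactory<T> class that do not appear elsewhere in the chapter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface for illustration.
public interface IFeature
{
    string Name { get; set; }
}

// Hypothetical feature type with an implicit public parameterless constructor.
public class PixelFeature : IFeature
{
    public string Name { get; set; } = "unnamed";
}

// T must implement IFeature and expose a public parameterless constructor.
public class FeatureFactory<T> where T : IFeature, new()
{
    public List<T> CreateBatch(int count, string prefix)
    {
        var batch = new List<T>(count);
        for (int i = 0; i < count; i++)
        {
            // new T() compiles only because of the new() constraint.
            T feature = new T();
            // Name is accessible only because of the IFeature constraint.
            feature.Name = $"{prefix}_{i}";
            batch.Add(feature);
        }
        return batch;
    }
}
```

With these constraints, new FeatureFactory<PixelFeature>() compiles, while new FeatureFactory<string>() is rejected because string neither implements IFeature nor has a public parameterless constructor.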
Usage in AI Data Structures: In model training, we often need to perform math. We can enforce that a generic type implements a numeric interface (available in modern .NET) or simply enforce that it is a value type to ensure performance.
// Constraint: T must be a value type (struct) and implement IComparable
public class TensorAdapter<T> where T : struct, IComparable<T>
{
    public T[] Data { get; set; }

    public bool IsFirstElementPositive(T zeroValue)
    {
        // Because of IComparable, we can use CompareTo
        return Data[0].CompareTo(zeroValue) > 0;
    }
}
Real-World AI Application:
In model training, we often need to normalize features. If we have a generic Normalizer<T> class, we can constrain T to a numeric interface such as INumber<T> (System.Numerics, .NET 7+) to ensure we can perform division and subtraction. With that constraint in place, Normalizer<string> is rejected at compile time: we cannot normalize a string.
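A minimal sketch of such a normalizer follows, assuming .NET 7 or later. The MinMax method name and the choice of min-max scaling are illustrative assumptions; the chapter does not prescribe a particular normalization scheme.

```csharp
using System;
using System.Numerics;

// Sketch of a constrained normalizer; INumber<T> requires .NET 7 or later.
public class Normalizer<T> where T : INumber<T>
{
    // Scales each value into [0, 1] using (x - min) / (max - min).
    public T[] MinMax(T[] values)
    {
        if (values.Length == 0) return Array.Empty<T>();

        T min = values[0];
        T max = values[0];
        foreach (T v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        T range = max - min;
        var result = new T[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            // Guard against division by zero when all values are equal.
            result[i] = range == T.Zero ? T.Zero : (values[i] - min) / range;
        }
        return result;
    }
}
```

Calling new Normalizer<double>().MinMax(new[] { 2.0, 4.0, 6.0 }) yields 0, 0.5, and 1. Integer types satisfy the constraint too, but their division truncates, so floating-point types are the sensible choice for this method.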
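A Note on Lower Bounds
Java can restrict a wildcard to SomeClass or one of its superclasses (? super SomeClass), which matters when writing to generic structures. C# has no equivalent: the where clause can only express upper bounds such as a base class or an interface.
Why does this matter? If you have a collection or sink that you want to write to, you need to ensure it can accept the specific type you are writing. In C#, that flexibility comes from contravariant interfaces and delegates (the in keyword), described in the next section.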
Variance: Covariance and Contravariance
Variance defines how subtyping between generic types relates to the subtyping of their type arguments. Unlike Java, C# does not support wildcards (like List<? extends Number>). Instead, variance is defined at the definition level (in the interface or delegate declaration) using the in and out keywords.
1. Covariance (out T) - Producers
Use the out keyword when the generic type T is only used as a return type (output). It allows you to treat a collection of a derived type as a collection of a base type.
* Example: IEnumerable<out T>
* Code: IEnumerable<string> can be assigned to IEnumerable<object>.
2. Contravariance (in T) - Consumers
Use the in keyword when the generic type T is only used as an input parameter. It allows you to use a handler for a base type to handle a derived type.
* Example: Action<in T>
* Code: An Action<object> (that can handle anything) can be assigned to a variable of type Action<string>.
3. Invariance
Classes in C# (like List<T>) are always invariant. You cannot assign List<string> to List<object>. This is for type safety: if it were allowed, you could try to add an int to the List<object> variable, which would corrupt the underlying list of strings.
Visualizing the Flow in a Pipeline:
In AI pipelines, we often use covariant interfaces for reading data streams (IDataSource<out T>) and contravariant interfaces for writing data or logging (IDataSink<in T>).
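The sketch below illustrates both directions with the IDataSource<out T> and IDataSink<in T> names used above. Their members (ReadAll, Write) and the Sample, ImageSample, ImageSource, SampleLog, and VarianceDemo types are assumptions made for this example only.

```csharp
using System;
using System.Collections.Generic;

public class Sample { public string Id { get; set; } = ""; }
public class ImageSample : Sample { public int Width { get; set; } }

// Covariant: T appears only in output positions.
public interface IDataSource<out T>
{
    IEnumerable<T> ReadAll();
}

// Contravariant: T appears only in input positions.
public interface IDataSink<in T>
{
    void Write(T item);
}

public class ImageSource : IDataSource<ImageSample>
{
    public IEnumerable<ImageSample> ReadAll()
    {
        yield return new ImageSample { Id = "img-1", Width = 640 };
        yield return new ImageSample { Id = "img-2", Width = 1280 };
    }
}

// A general-purpose sink that records the Id of any Sample.
public class SampleLog : IDataSink<Sample>
{
    public List<string> Entries { get; } = new List<string>();
    public void Write(Sample item) => Entries.Add(item.Id);
}

public static class VarianceDemo
{
    public static List<string> Run()
    {
        IDataSource<ImageSample> images = new ImageSource();

        // Covariance: a source of ImageSample can be read as a source of Sample.
        IDataSource<Sample> samples = images;

        var log = new SampleLog();
        // Contravariance: a sink for Sample can be used as a sink for ImageSample.
        IDataSink<ImageSample> imageSink = log;

        foreach (ImageSample image in images.ReadAll())
        {
            imageSink.Write(image);
        }

        foreach (Sample sample in samples.ReadAll())
        {
            Console.WriteLine(sample.Id);
        }
        return log.Entries;
    }
}
```

Removing out or in from the interface declarations turns both assignments into compile-time errors, which is a quick way to see what the keywords buy.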
PECS in C#: Producer out, Consumer in
Java developers know this rule as PECS (Producer Extends, Consumer Super) and apply it with wildcards (? extends T, ? super T) at each use site. C# has no wildcards, so the same rule is applied once, on the interface or delegate declaration.
1. Producer (out T; Java: ? extends T)
Use this when you only read from a structure and do not modify it.
- Analogy: Think of a vending machine that dispenses Fruit. You know it dispenses Fruit, but you don't know if it dispenses Apple or Banana. You can take a Fruit out, but you cannot put anything back in (because you don't know the exact slot type).
- Usage: An IEnumerable<Apple> can be passed wherever an IEnumerable<Fruit> is expected. You can read Fruit objects from it, and because IEnumerable<T> has no Add method, nothing can be written through it.
2. Consumer (in T; Java: ? super T)
Use this when you only write to a structure.
- Analogy: Think of a recycling bin labeled "Recyclables". You can throw a PlasticBottle (a specific type) into it because it is a Recyclable. However, you cannot reliably pull a specific item out of it; you only know you got a Recyclable.
- Usage: An Action<Recyclable> can be passed wherever an Action<PlasticBottle> is expected. You can push PlasticBottle objects into it, but the delegate gives you nothing to read back.
Visualizing the Flow in a Pipeline:
An Action<object> acts as a flexible sink that accepts string values (or any other reference type) flowing into it. Data only flows in, so there is nothing to read back out.
Variance applies only to reference types: an IEnumerable<int> cannot be assigned to an IEnumerable<object>, which is why the example below uses string rather than int.
using System;
using System.Collections.Generic;

public class Pipeline
{
    // Producer: IEnumerable<out T> is covariant, so an IEnumerable<string>
    // (or a sequence of any other reference type) can be passed here.
    public void ProcessItems(IEnumerable<object> items)
    {
        foreach (object item in items)
        {
            // We can safely read because every element is at least an object
            Console.WriteLine(item);
        }
        // IEnumerable<T> exposes no Add method, so nothing can be written through it.
    }

    // Consumer: Action<in T> is contravariant, so an Action<object>
    // can be passed where an Action<string> is expected.
    public void EmitLabels(Action<string> sink)
    {
        // We can safely write because the sink accepts string or any of its base types
        sink("cat");
        sink("dog");
        // The sink only consumes values; there is nothing to read back.
    }
}
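C# generics are reified: the runtime keeps the type arguments instead of erasing them.
* Reference Types: List<string> and List<object> share the same compiled code (JIT) but maintain distinct type metadata.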
* Value Types: Crucially for AI, List<int> and List<double> generate separate specialized native code. List<int> holds raw integers, not boxed objects.
Architectural Implication:
1. Performance: Because value types (structs) are not erased to Object, there is no Boxing/Unboxing overhead. A Tensor<float> in C# is extremely performant compared to an ArrayList or a type-erased Java generic.
2. Runtime Reflection: You can check types at runtime. The code if (obj is List<int>) is valid and works in C#. You can also use new T() if the new() constraint is present, which is impossible in languages with type erasure.
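The runtime capabilities described above can be sketched in a few lines. ReificationDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReificationDemo
{
    // Reports the runtime type argument; this works because T is not erased.
    public static string Describe<T>(List<T> list)
    {
        return $"List of {typeof(T).Name} with {list.Count} item(s)";
    }

    // Runtime pattern match against a closed generic type.
    public static bool IsIntList(object obj)
    {
        return obj is List<int>;
    }

    // new T() is permitted because of the new() constraint.
    public static T CreateDefault<T>() where T : new()
    {
        return new T();
    }
}
```

For example, ReificationDemo.IsIntList(new List<int>()) returns true while ReificationDemo.IsIntList(new List<double>()) returns false, a distinction the runtime can only make because the type argument is still present.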
Contrast: Type Erasure in Java
How it works in Java:
1. Unbounded: T is erased to Object.
2. Bounded: T extends Number is erased to Number.
Architectural Implication:
Because Java discards the type arguments at runtime, you cannot use reflection to determine the generic type of an object dynamically.
* if (obj instanceof T) is invalid in Java.
* new T() is invalid in Java (because the runtime no longer knows which class T stands for).
Why this matters for AI:
When building a generic Tensor<T> class in C#, you can inspect typeof(T) at runtime, for example to select a float or double code path for GPU acceleration. Compile-time constraints are still the better first line of defense, because they reject unsupported types before the program runs. If you use reflection to load a model configuration dynamically, you give up that compile-time checking and errors surface only at runtime, much as they do with the legacy non-generic collections.
### Generic Methods
Generics can also be applied to methods, independent of the class they are in. This is useful for utility classes where the type is determined by the arguments passed.
Syntax:
public class DataUtils
{
    // <U> after the method name declares the type parameter for this method
    // U is inferred from the argument at the call site
    public static U[] CopyArray<U>(U[] input)
    {
        U[] copy = new U[input.Length];
        Array.Copy(input, copy, input.Length);
        return copy;
    }
}
A generic method such as Swap below works on a List<int> (indices) or a List<double> (weights) without writing two separate methods.
public class Preprocessor
{
    // Swaps two elements in a list of any element type
    public static void Swap<T>(IList<T> list, int i, int j)
    {
        T temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }
}
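To show type inference at the call site, here is a minimal sketch of another generic utility method. BatchUtils and Batch are hypothetical names, not part of the chapter's code; the method splits a list of any element type into fixed-size batches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical utility class for illustration.
public static class BatchUtils
{
    // Splits any list into consecutive batches of at most batchSize items.
    public static List<List<T>> Batch<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            int size = Math.Min(batchSize, items.Count - start);
            var batch = new List<T>(size);
            for (int i = 0; i < size; i++)
            {
                batch.Add(items[start + i]);
            }
            batches.Add(batch);
        }
        return batches;
    }
}
```

A call such as BatchUtils.Batch(new List<int> { 0, 1, 2, 3, 4 }, 2) needs no explicit type argument: the compiler infers T as int from the argument, and the same method accepts a List<double> of weights unchanged.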
C# has no raw types: you cannot declare a variable of type List<> without specifying a type.
However, legacy code from .NET 1.0 (before Generics were introduced in 2005) used non-generic collections like ArrayList or Hashtable. These collections store everything as System.Object.
The Danger:
Using these legacy collections eliminates type safety and forces expensive boxing operations for numeric data.
// BAD: Legacy ArrayList (No type safety, causes Boxing)
ArrayList mixedList = new ArrayList();
mixedList.Add(1); // Boxes int to object
mixedList.Add("Hello");
// GOOD: Generic List (Type safe, No Boxing)
List<int> numbers = new List<int>();
numbers.Add(1);
// numbers.Add("Hello"); // Compile-time Error
Always use the collections in System.Collections.Generic (like List<T>, Dictionary<K,V>) and avoid the legacy System.Collections namespace.
### Summary of Constraints and Best Practices
1. Do not use the legacy non-generic collections (ArrayList, Hashtable). Always specify type arguments.
2. Use variance to increase API flexibility. If an interface or delegate only produces T, declare it with out T. If it only consumes T, declare it with in T.
3. Remember that C# generics are reified, not erased. Overloads such as void Method(List<int> l) and void Method(List<string> l) are distinct and legal, and typeof(T) is available at runtime.
4. Use generic methods for utility operations. They provide type inference for the caller.
By mastering these theoretical foundations, we lay the groundwork for building the FeaturePipeline and TensorAdapter classes mentioned in the chapter outline, ensuring our AI systems are both flexible and robust.
### Basic Code Example
Context:
In AI data processing, we often deal with raw data (like sensor readings) that needs to be converted into a specific format (like a normalized vector) before being stored. We need a reusable "pipeline" component that handles this transformation. We will create a generic ITransformer interface, and a NumericNormalizer class that implements it, to convert an input type TInput to an output type TOutput. This ensures type safety so we don't accidentally pass a string where a number is expected.
Code Example:
using System;

// A generic interface representing a component that transforms data.
// <TInput>: The type of data entering the transformer.
// <TOutput>: The type of data leaving the transformer.
public interface ITransformer<TInput, TOutput>
{
    TOutput Transform(TInput input);
}

// A concrete implementation of the transformer.
// We use 'where' constraints to ensure TInput is a value type (struct)
// and TOutput is a reference type (class).
public class NumericNormalizer<TInput, TOutput> : ITransformer<TInput, TOutput>
    where TInput : struct   // Constraint: TInput must be a value type (e.g., int, double, float)
    where TOutput : class   // Constraint: TOutput must be a reference type (e.g., string, byte[])
{
    private readonly TOutput _placeholderValue;

    // Constructor to initialize the transformer with a default output value.
    public NumericNormalizer(TOutput placeholderValue)
    {
        _placeholderValue = placeholderValue;
    }

    public TOutput Transform(TInput input)
    {
        // In a real AI scenario, this would perform complex normalization math.
        // Here, we simply check if the input is an int and return the placeholder.
        // We use runtime type checking because we cannot perform arithmetic
        // on 'TInput' directly without a constraint like 'where TInput : INumber<TInput>'.
        if (typeof(TInput) == typeof(int))
        {
            // Box to object, then unbox as int, to reach the concrete value.
            int val = (int)(object)input;
            Console.WriteLine($"Normalizing integer: {val}");
            // Return the placeholder (simulating a processed byte array or string)
            return _placeholderValue;
        }
        throw new NotSupportedException("This example only supports integers for simplicity.");
    }
}

// Main application to demonstrate the pipeline.
public class Program
{
    public static void Main()
    {
        // 1. Create a transformer that takes an 'int' and outputs a 'string' (representing a processed feature).
        // This mimics converting raw sensor data (int) into a normalized string descriptor.
        ITransformer<int, string> sensorProcessor = new NumericNormalizer<int, string>("Normalized_Feature_Vector");

        // 2. Process a raw data point.
        int rawSensorValue = 42;
        string processedFeature = sensorProcessor.Transform(rawSensorValue);
        Console.WriteLine($"Input: {rawSensorValue}, Output: {processedFeature}");

        // 3. Attempting to use an incompatible type will cause a compile-time error.
        // Uncommenting the line below will fail because 'sensorProcessor' expects an 'int', not a 'double'.
        // sensorProcessor.Transform(3.14);
    }
}
1. Interface Definition (ITransformer<TInput, TOutput>):
We define an interface with two generic type parameters. This allows us to describe a relationship between an input and an output type without committing to specific data types yet. This is the foundation of a reusable pipeline.
2. Class Implementation (NumericNormalizer<TInput, TOutput>):
We implement the interface. Notice the where clauses. These are generic constraints (the C# counterpart of Java's bounded type parameters).
* where TInput : struct: Restricts TInput to value types (integers, doubles, etc.), which is appropriate for raw numerical data.
* where TOutput : class: Restricts TOutput to reference types (objects, strings, arrays), which is useful for complex data structures like feature vectors.
3. The Transform Logic:
Inside Transform, we face a limitation of generics: mathematical operations aren't defined for an unconstrained type parameter. To handle this, we use a runtime check (typeof(TInput) == typeof(int)). We then perform a "double cast", (int)(object)input, which boxes the value to object and immediately unboxes it as an int. This lets us work with the concrete integer value inside the generic method.
4. Instantiation and Usage:
In Main, we create a specific instance: ITransformer<int, string>. This locks the generic types into concrete C# types. The compiler now enforces that Transform must accept an int and must return a string.
5. Type Safety:
The compiler prevents us from passing a double to Transform (as seen in the commented line). This eliminates runtime casting errors common in non-generic collections (like ArrayList).
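Because each stage declares its input and output types, stages can be chained with the compiler checking every joint. The sketch below repeats the ITransformer interface so that it compiles on its own; ScaleTransformer, LabelTransformer, and ChainedTransformer are hypothetical classes introduced only for this illustration.

```csharp
using System;

// Repeated from the example above so this sketch is self-contained.
public interface ITransformer<TInput, TOutput>
{
    TOutput Transform(TInput input);
}

// Hypothetical stage: scales a raw integer reading by a fixed maximum.
public class ScaleTransformer : ITransformer<int, double>
{
    private readonly int _max;
    public ScaleTransformer(int max) { _max = max; }
    public double Transform(int input) => (double)input / _max;
}

// Hypothetical stage: turns a scaled value into a coarse label.
public class LabelTransformer : ITransformer<double, string>
{
    public string Transform(double input) => input >= 0.5 ? "high" : "low";
}

// Chains two stages. The compiler checks that the first stage's output
// type matches the second stage's input type (TMid).
public class ChainedTransformer<TIn, TMid, TOut> : ITransformer<TIn, TOut>
{
    private readonly ITransformer<TIn, TMid> _first;
    private readonly ITransformer<TMid, TOut> _second;

    public ChainedTransformer(ITransformer<TIn, TMid> first, ITransformer<TMid, TOut> second)
    {
        _first = first;
        _second = second;
    }

    public TOut Transform(TIn input) => _second.Transform(_first.Transform(input));
}
```

Constructing new ChainedTransformer<int, double, string>(new ScaleTransformer(100), new LabelTransformer()) gives a pipeline whose Transform(42) returns "low" and whose Transform(80) returns "high". Swapping the two stages does not compile, because a string-producing stage cannot feed an int-consuming one.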
### Common Pitfalls
The "Boxing" Performance Trap
A frequent mistake when using generics with value types (structs) is accidental boxing, which happens whenever a value is converted to object or to an interface type.
In the example above, the cast (int)(object)input that follows the typeof(TInput) == typeof(int) check boxes the value. Boxing is the process of converting a value type (like int) into a reference type (like object) by allocating memory on the heap. In high-performance AI data pipelines processing millions of data points, this allocation can cause significant garbage collection pressure and slow down the application.
How to avoid it:
1. Keep value types generic whenever possible (e.g., List<int> is much faster than ArrayList).
2. Avoid casting generics to object unless absolutely necessary.
3. In modern C#, use interfaces like INumber<T> (available in .NET 7+) to perform arithmetic on generics without boxing.
[Figure: INumber<T> used to perform arithmetic operations on a type parameter T without boxing, illustrating how the interface enables direct, efficient operations on numeric types like int or double.]
Code License: All code examples are released under the MIT License.
Content Copyright: Copyright © 2026 Edgar Milvus | Privacy & Cookie Policy. All rights reserved.