Beyond the Struct: Crafting a "Zero-Overhead" Generic Dynamic Array in C

If you've spent any time writing C, you know the drill when it comes to dynamic arrays. You define a struct, pack it with a pointer to your data, a size_t size, and a size_t capacity, and then write a half-dozen helper functions to manage allocations, resizing, and freeing. It’s a rite of passage. But let's be honest: it’s boilerplate-heavy, it ruins the "feel" of native arrays, and passing structs around by value or pointer quickly turns your clean code into a soup of dereferences.

What if you could have a generic dynamic array in C that behaves exactly like a native pointer? No explicit struct definitions for every type, no manually tracking capacity in your application code, and no macro-induced headaches. This week, a fascinating technique made waves on Hacker News, demonstrating how to build a generic dynamic array in pure C that stores no visible capacity, requires no user-facing structs, and maintains absolute type safety.

Today, we are going down the rabbit hole of memory layout hacks, macro magic, and API design to see how this works, why it is incredibly clever, and how you can implement it in your own systems-level projects.

The Core Problem: The Clunkiness of C Vectors

In languages like C++ or Rust, we have std::vector and Vec. They use generics (templates) to abstract away the underlying memory management. In C, we don't have templates. Historically, developers have solved this in one of two ways.

The Struct Approach

You define a struct for every single type you want a vector for:

typedef struct {
    int* data;
    size_t size;
    size_t capacity;
} IntVector;

This is type-safe, but it’s incredibly verbose. If you need a vector of float, double, and MyCustomStruct, you are duplicating this code (or abusing the preprocessor to generate these structs for you).

The void* Approach

You write a single generic struct using void*:

typedef struct {
    void* data;
    size_t element_size;
    size_t size;
    size_t capacity;
} Vector;

This eliminates code duplication, but you lose all compile-time type safety. You are constantly casting pointers, and the compiler won't warn you if you accidentally push a double into a vector of int.

The Solution: Hiding the Metadata in Plain Sight

To build a dynamic array that acts like a normal C pointer (e.g., int* my_array), we have to use a clever memory layout trick: allocating hidden metadata just before the returned pointer.

When a developer requests a dynamic array, we don't allocate just enough space for the elements. Instead, we allocate a single block of memory that contains our metadata (size and capacity) first, followed immediately by the array elements. We then return a pointer that points directly to the first element of the array, not the start of the allocated block.

Here is what the memory layout looks like conceptually:

Allocated Block Start
      ▼
┌───────────────────┬───────────────────┬─────────────────────────────────────┐
│  capacity (size_t)│    size (size_t)  │  element 0  │  element 1  │ ...     │
└───────────────────┴───────────────────┴─────────────────────────────────────┘
                                        ▲
                                User-Facing Pointer (Returned to Caller)

Because the user gets a pointer directly to the elements, they can index it using standard C syntax: my_array[i]. They don't need to know that just a few bytes to the left, in "negative space," lies the metadata that manages the array's growth.

Implementing the "No-Struct" Vector

Let's write a minimal, working implementation of this pattern. We will define our metadata header as a simple structure, but the user will never instantiate it or interact with it directly.

Step 1: The Internal Header

First, we define our hidden header. We'll use size_t for both capacity and size to keep memory alignment straightforward.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t capacity;
    size_t size;
} VectorHeader;

Step 2: Helper Macros for Metadata Access

Since the user only holds a pointer to the payload, we need a way to go backward from the payload pointer to the header pointer. We can do this using simple pointer arithmetic.

#define vector_header(v) ((VectorHeader*)((char*)(v) - sizeof(VectorHeader)))

With this macro, if we have a pointer v pointing to the elements, we cast it to a char* (so pointer arithmetic moves byte-by-byte), subtract the size of VectorHeader, and cast it back to a VectorHeader*. Now we can easily define macros to read and write the size and capacity:

#define vector_capacity(v) ((v) ? vector_header(v)->capacity : 0)
#define vector_size(v)     ((v) ? vector_header(v)->size : 0)

Step 3: The Resizing Logic

To grow the array, we need an internal function that handles the reallocation. Because this function needs to modify the pointer itself (since realloc might move the entire block to a new memory address), and because we want this to be generic, we will write a helper function that returns the new payload pointer.

void* _vector_grow(void* v, size_t element_size) {
    size_t capacity = vector_capacity(v);
    size_t size = vector_size(v);
    
    // Double the capacity, or start at 4 if empty
    size_t new_capacity = capacity == 0 ? 4 : capacity * 2;
    size_t total_size = sizeof(VectorHeader) + (new_capacity * element_size);
    
    VectorHeader* new_header = NULL;
    
    if (v) {
        // If the vector already exists, reallocate the whole block
        VectorHeader* old_header = vector_header(v);
        new_header = (VectorHeader*)realloc(old_header, total_size);
    } else {
        // If the vector is NULL, perform the initial allocation
        new_header = (VectorHeader*)malloc(total_size);
        new_header->size = 0;
    }
    
    if (!new_header) {
        perror("Out of memory");
        exit(EXIT_FAILURE);
    }
    
    new_header->capacity = new_capacity;
    
    // Return the pointer shifted past the header
    return (void*)((char*)new_header + sizeof(VectorHeader));
}

Step 4: Putting It Together with Type-Safe Macros

To make the interface completely seamless and maintain type safety, we use macros for the user-facing API. The most critical macro is vector_push, which needs to check if the array is full, grow it if necessary, append the value, and increment the size.

#define vector_free(v) \
    do { \
        if (v) { \
            free(vector_header(v)); \
            (v) = NULL; \
        } \
    } while(0)

#define vector_push(v, value) \
    do { \
        if (!vector_capacity(v) || vector_size(v) == vector_capacity(v)) { \
            (v) = _vector_grow((v), sizeof(*(v))); \
        } \
        (v)[vector_header(v)->size++] = (value); \
    } while(0)

Notice the magic in sizeof(*(v)). Because v is a typed pointer (like int* or float*), dereferencing it inside sizeof gives us the correct size of a single element at compile-time, preserving absolute generics without any runtime overhead!

Let’s See It in Action

Now, let's write a short program to see how incredibly clean this makes C code look. No custom structs, no casting, and normal array subscripting work out of the box.

int main() {
    // Declared as a normal native pointer!
    int* my_numbers = NULL;

    // Push elements dynamically
    for (int i = 0; i < 10; i++) {
        vector_push(my_numbers, i * 10);
    }

    // Access elements using standard array syntax
    printf("Vector size: %zu\n", vector_size(my_numbers));
    printf("Vector capacity: %zu\n", vector_capacity(my_numbers));
    
    for (size_t i = 0; i < vector_size(my_numbers); i++) {
        printf("Element [%zu]: %d\n", i, my_numbers[i]);
    }

    // Free the memory
    vector_free(my_numbers);
    
    return 0;
}

The Trade-offs: Is There a Catch?

This pattern feels like magic, but as software engineers, we know there's no such thing as a free lunch. Let’s look at the engineering trade-offs of this "hidden header" approach.

The Good

  • Beautiful Syntax: You get to use standard [] indexing without dereferencing a struct member (e.g., vec->data[i]).
  • Type Safety: The compiler checks that the type you pass to vector_push matches the type of the pointer v.
  • Zero Runtime Overhead: There is zero indirection when reading elements. The pointer you hold is a direct pointer to the data array, making it incredibly cache-friendly and fast.

The Gotchas

  • Pointer Invalidation: Because vector_push can call realloc under the hood, the address of the array can change. This means you must always assign the result back to your variable (which our macro does automatically), and you should never store long-lived pointers to elements inside the vector.
  • Safety Risks: If you accidentally pass a regular dynamically allocated pointer (like one created with malloc(sizeof(int) * 10)) to vector_size() or vector_free(), your program will attempt to read memory before the allocation block, leading to undefined behavior or segmentation faults.
  • Debugger Visibility: Debuggers like GDB or LLDB will see my_numbers as a simple pointer. They won't inherently know it's a dynamic vector with metadata, making it slightly harder to inspect the vector's size or capacity in a debugging session without manually casting the pointer offset.

Wrapping Up: When to Use This Pattern

The hidden metadata pattern is a classic example of C's raw power and flexibility. It bridges the gap between high-level syntactic convenience and low-level memory control. It is widely used in production-grade C libraries, including the popular redis-style dynamic string library (SDS).

If you are writing a performance-critical C application, a game engine, or an embedded system where you want to keep boilerplates to an absolute minimum while retaining type safety, this zero-overhead generic array pattern is an incredibly elegant tool to have in your software engineering toolkit.

What are your thoughts on this approach? Do you prefer explicit, self-documenting structs, or do you love this kind of syntactic wizardry in C? Let me know in the comments below!

Happy coding,
Alex

Post a Comment

Previous Post Next Post