Today, we’re going to begin a (series?) on the intricacies of memory management, focusing exclusively on Linux and C++, and especially concentrated on our use-case: real-time efficiency.
Today we’re going to discuss a fairly low-level aspect of C/C++: `malloc`. We’re going to dive right in, and then spend the next entry going over the different types of allocators. Stack vs. heap, anyone?
Not So Simple
A fairly typical assumption to make is that calling `free`, which ‘frees’ memory, releases said memory back to the operating system. In case you didn’t know, the operating system manages the computer’s memory. Memory in C (and C++, but that’s a hassle to type) works by taking memory from the operating system and using it within the program. This is true of both the stack and the heap, where the stack is simply the program’s ‘scratch pad’ – the program grabs a block of memory where it will keep track of variables, etc. If you look at the HP-50’s stack, it’s the same idea. The heap, on the other hand, is memory used for dynamic allocation. Heap allocation is more complicated than stack allocation for reasons we’ll get into at a later date. In the meantime, simply think of the heap as a pile of unused memory which we can divvy up.
Anyway, we call `free` on heap objects to dispose of those blocks of memory. `delete` in C++ is typically implemented as a thin wrapper around `free`, so this explanation works both ways. Freed memory, however, isn’t typically returned to the operating system right away. There’s a heavy cost to allocating memory from the OS, so that memory is usually cached in some way to make your program run faster.
This brings us to the following series of problems:
- How do we organize free memory such that we can allocate a large block of memory without making calls to `free` too expensive?
- How do we reduce fragmentation and waste in a general-purpose allocator? We can’t expect allocations to come in any particular size, order, or frequency.
- Concurrency! (we’re going to ignore this)
Fragmentation
I digress for a moment to talk about fragmentation. Imagine we have 32 bytes of free memory:
----------------------------------
| |
----------------------------------
Now, let’s perform five allocations, creating a through e:
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Then let’s free the first four allocations:
----------------------------------
| eeee |
----------------------------------
And finally, let’s try to allocate a really big (16 byte) object. Oops! We can’t do that, even though there’s 28 bytes of free memory!
This is less of a problem with virtual memory, which I encourage you to google. Basically, memory is organized into pages, and memory only has to be contiguous in this virtual space, not in physical space. So if we pretend our page size is 2 bytes, we could do this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
And then because we’re using virtual memory, we can set up each two byte chunk so that the entire virtual memory space looks like the following:
------------------------------------------------------etc
| eeeeffffffffffffffff
------------------------------------------------------etc
Anyway, what really sucks about the `malloc` problem is that our two primary objectives oftentimes run directly counter to each other. Keeping a linked list of free blocks makes `free` best-case constant time, but unless we accept waste and fragmentation, `malloc` is going to hit its worst case more often than not.
Implementation
The simplest `malloc` implementation uses the system call sbrk(2) to acquire memory and a linked list to store free chunks. In this case, `free` is constant time, but `malloc` is O(n).
But wait! There’s more! We also need to store metadata about each chunk: its size, whether it’s free or in use, free-list pointers, etc. Well, you can’t really call `malloc` from inside the allocator, so the metadata is typically stored in a header which precedes the address handed to the application. If the header is 16 bytes in size, the allocator will hand the program `ptr`, and the header is stored at `ptr - 16`:
header ptr
v v
------------------------------------------------------
| chunk | |
------------------------------------------------------
Typically, `malloc` implementations will use this metadata to coalesce adjacent free chunks into one larger chunk.
header ptr header ptr
v v v v
---------------------------------------------------------------------------
| chunk | | chunk | |
---------------------------------------------------------------------------
We can find the chunk to the right using this:
struct malloc_chunk *right = (struct malloc_chunk *)((char *)ptr + header(ptr)->size);
However, we don’t know anything about the size of the chunk to the left, so mallocs which do this will typically put the size of each block in a footer as well!
header ptr header ptr size
v v v v v
---------------------------------------------------------------------------
| chunk | |s| chunk | |s|
---------------------------------------------------------------------------
We can find the left chunk using this:
struct malloc_chunk *left = (struct malloc_chunk *)((char *)ptr - footer_left(ptr) - sizeof(size_t));
But wait! There’s more! Successive calls to sbrk or mmap don’t necessarily return contiguous virtual addresses! We have to watch out that we don’t dereference an invalid pointer by adding to a pointer that’s at the edge of the memory we’re managing (this would segfault).
The solution to a lot of this complexity comes from creating a separate data structure to keep track of the virtual addresses we’re managing. Allocators such as tcmalloc simply store their metadata in that structure, which avoids a lot of these pointer issues.
Further information can be found in CMU’s lectures.