Today, we’re going to begin a (series?) on the intricacies of memory management, focusing exclusively on Linux and C++, and especially concentrated on our use-case: real-time efficiency.
Today we’re going to discuss a fairly low-level aspect of C/C++: `malloc`. We’re going to dive right in, and then spend the next entry going over the different types of allocators. Stack vs. heap, anyone?
Not So Simple
A fairly typical assumption to make is that calling `free`, which ‘frees’ memory, releases said memory back to the operating system. In case you didn’t know, the operating system manages the computer’s memory. Memory in C (and C++, but that’s a hassle to type) works by taking memory from the operating system and using it within the program. This is true of both the stack and the heap, where the stack is simply the program’s ‘scratch pad’ – the program grabs a block of memory where it will keep track of variables, etc. If you look at the HP-50’s stack, it’s the same idea. The heap, on the other hand, is memory used for dynamic allocation. Heap allocation is more complicated than stack allocation for reasons we’ll get into at a later date. In the meantime, simply think of the heap as a pile of unused memory which we can divvy up.
Anyway, we call `free` on heap objects to dispose of those blocks of memory. `delete` in C++ is typically implemented as a thin wrapper around `free`, so this explanation works both ways. Freed memory, however, isn’t typically returned to the operating system right away. There’s a heavy cost to allocating memory from the OS, so that memory is usually cached in some way to make your program run faster.
This brings us to the following series of problems:
- How do we organize free memory such that we can allocate a large block of memory without making calls to `free` too expensive?
- How do we reduce fragmentation and waste in a general-purpose allocator? We can’t expect allocations to come in any particular size, order, or frequency.
- Concurrency! (we’re going to ignore this)
Fragmentation
I digress for a moment to talk about fragmentation. Imagine we have 32 bytes of free memory:
----------------------------------
| |
----------------------------------
Now, let’s perform five allocations, creating a through e:
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Then let’s free the first four allocations:
----------------------------------
| eeee |
----------------------------------
And finally, let’s try to allocate a really big (16 byte) object. Oops! We can’t do that, even though there’s 28 bytes of free memory!
This is less of a problem with virtual memory, which I encourage you to google. Basically, memory is organized into pages, and memory only has to be contiguous in this virtual space, not in physical space. So if we pretend our page size is 2 bytes, we could do this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
And then because we’re using virtual memory, we can set up each two byte chunk so that the entire virtual memory space looks like the following:
------------------------------------------------------etc
| eeeeffffffffffffffff
------------------------------------------------------etc
Anyway, what really sucks about the `malloc` problem is that our two primary objectives oftentimes run directly counter to each other. Keeping a linked list of free blocks makes `free` best-case constant time, but unless we accept waste and fragmentation, `malloc` is going to hit its worst case more often than not.
Implementation
The simplest `malloc` implementation uses the system call sbrk(2) to acquire memory and a linked list to store free chunks. In this case, `free` is constant time, but `malloc` is O(n).
But wait! There’s more! We also need to store metadata about each chunk: its size, whether it’s free or in use, free-list pointers, etc. Well, you can’t really call `malloc` from inside the allocator, so the metadata is typically stored in a header which precedes the address handed to the application. If the header is 16 bytes in size, the allocator will hand the program `ptr`, and the header is stored at `ptr - 16`:
header ptr
v v
------------------------------------------------------
| chunk | |
------------------------------------------------------
Typically, `malloc` implementations will use this metadata to coalesce adjacent free chunks into one larger chunk.
header ptr header ptr
v v v v
---------------------------------------------------------------------------
| chunk | | chunk | |
---------------------------------------------------------------------------
We can find the chunk to the right using this:
struct malloc_chunk *right = (struct malloc_chunk *)((char *)ptr + header(ptr)->size);
However, we don’t know anything about the size of the chunk to the left, so mallocs which do this will typically put the size of each block in a footer as well!
header ptr header ptr size
v v v v v
---------------------------------------------------------------------------
| chunk | |s| chunk | |s|
---------------------------------------------------------------------------
We can find the left chunk using this:
struct malloc_chunk *left = (struct malloc_chunk *)((char *)ptr - footer_left(ptr) - sizeof(size_t));
But wait! There’s more! Successive calls to sbrk or mmap don’t necessarily return contiguous virtual addresses! We have to watch out that we don’t dereference an invalid pointer by adding to a pointer that’s at the edge of the memory we’re managing (this would segfault).
The solution to a lot of this complexity comes from creating a separate data structure to keep track of the virtual addresses we’re managing. Allocators such as tcmalloc simply store their metadata in that structure, which avoids a lot of these pointer issues.
Further information can be found in CMU’s lectures.