Why is all of this so complicated?
Compiling C++ can be described by one word: complex. Most of this really harks back to how C++ was designed back in the day, but the reality is that the compilation process for any language is complex. C++ works as hard as it can to create an uber-fast binary, and as such the steps are a little nastier to deal with. Here, we’ll break the process down into its component steps.
Basically, we’re pretending that we just ran g++ main.cpp thepolice.cpp.
Preprocessor
The first thing that happens is the preprocessor runs through each file, and the associated header file. In this case, our tree for the project looks like this:
$ tree .
|-- main.cpp
|-- main.hpp
|-- thepolice.cpp
|-- thepolice.hpp
The .hpp files are called header files, and typically only contain declarations (thank you, templates). The .cpp files are the actual source files, and contain implementations. This is sort of out of scope for this post, but basically in the .hpp you put this (Java analogy):
public int foo(short blah, Object o);
and in the .cpp file:
public int foo(short blah, Object o) {
// do stuff
return (int) blah;
}
Back to the preprocessor. The preprocessor reads each source file as text and produces another text file as output. Any lines which begin with ‘#’ are not ordinary C++ statements; they’re directives written in the preprocessor’s own little language. #pragma once, #include and #ifndef are all preprocessor directives. I can’t go into a lot of detail here, but the important thing to note is that any file which contains an #include of a header file has that line replaced by the content of that header file. As a result, the compiler sees a declaration for each thing named in the header and trusts that it is implemented (written out) in some source file. That trust is what the linker makes good on when it combines everything (we’re getting there).
Compiler
The compiler is probably the most straightforward step, but it does have one relevant subtlety. Basically, it takes your source code and converts it to machine code. So
--- contents of iostream library from #include ---
int main() {
std::cout << "Hello, World\n";
return 0;
}
ends up looking more like
8020 78
8021 A9 80
8023 8D 15 03
8026 A9 2D
8028 8D 14 03
802B 58
802C 60
802D EE 20 D0
8030 4C 31 EA
The numbers on the left are memory addresses, and those on the right are the contents of the bytes starting at those addresses. (The dump is only there to show the flavor; it isn’t the actual output for the program above.)
The compiler takes in the output files from the preprocessor (with GCC, .i extension for preprocessed C and .ii for preprocessed C++) and outputs .o extension files, called “object files”.
Now for the subtlety. The object files are not just machine code. They also contain tags (symbols) which reference external functions, meaning those written in other files. Clearly, our project is nowhere near complete.
Note that the source code doesn’t go straight to machine code: it is first translated into assembly code (.s files) and then assembled into machine code, so it’s really two distinct steps.
Linker
This step is where it all comes together, hence the name ‘Linker.’ Some people who build C++, such as myself, actually break this whole process into two steps: preprocessor/compiler and linker. Basically they run gcc with -c, which generates object files, and then ld, which is the GNU linker.
As we noted before, the compiler couldn’t quite write pure machine code. The reason? The compiler only knows where stuff is in its own file. So instead it just writes notes on how it assumed stuff was laid out. Then the linker uses these notes to assign actual memory addresses to everything. This allows one file to call functions in another file. If you’re confused, do email me, because I can explain this better, just in more words.
Doing this by hand (why would you?)
- Run cpp, which is the name of the preprocessor executable. (Preprocessor)
- Run gcc with -S on the .i files output by the preprocessor, so that it stops after producing assembly. (Compiler)
- Run as on the .s files output by the compiler. (Assembler)
- Run ld on the .o files output by the assembler. (Linker)
- ???
- Profit?
edit: See this post
So that’s the full overview of the C++ compilation process. I totally didn’t cover multiarch like I meant to, so we’ll get to that next post, I suppose!
edit: Post is up, click here to read it!