Computer Systems Practical - Part 0: The Compilation System

Abstract

This article describes the workflow of the gcc/g++ compilation system and provides different methods to manually complete each stage, so as to better understand the program translation process.

The compilation system

Preprocessor

First, the C preprocessor (an executable named cpp, usually located in the /usr/bin/ directory) is called to expand the source code of each source file (for example, files ending in .c, .cc, or .cpp).

It inserts the contents of every file named in an #include directive and expands every macro defined with #define.

The output of preprocessing a source file is an intermediate file ending in .i, that is, the expanded source code stored as plain text.

cpp -std=c++11 main.cpp -o main.i

g++ -E main.cpp -o main.i

Note

  • main.cpp includes a header file introduced in the C++11 standard. Therefore, you need to add the -std=c++11 option when calling the preprocessor cpp directly, but you do not need this option when you use g++ -E.

  • If the source file and the user header files it includes are not in the same directory, you need to add the -I option to specify the header search path; otherwise the .i file cannot be generated.

  • Example:

    1. When the user header files are all in one directory, the corresponding command is: g++ -E main.cpp -o main.i -I <directory>.
    2. When the user header files are spread across several directories, the corresponding command is: g++ -E main.cpp -o main.i -I <directory 1> -I <directory 2> -I <directory n>.
  • In some gcc/g++ versions, preprocessing is integrated into the compiler proper (cc1 or cc1plus) rather than run as a separate step by a stand-alone program.

Compiler

Second, the compiler is called (an executable named cc1 for compiling C programs or cc1plus for compiling C++ programs; on my machine both are located in /usr/lib/gcc/x86_64-linux-gnu/6) to compile the expanded source code (files ending in .i) into assembly code (files ending in .s, that is, assembly language stored as plain text).

# /usr/lib/gcc/x86_64-linux-gnu/6/cc1plus -o main.s main.cpp <other arguments>
/usr/lib/gcc/x86_64-linux-gnu/6/cc1plus -o main.s main.cpp -quiet -v -imultiarch x86_64-linux-gnu -D_GNU_SOURCE -quiet -dumpbase main.cpp -mtune=generic -march=x86-64 -auxbase-strip main.s -version -fstack-protector-strong -Wformat -Wformat-security

g++ -S main.i -o main.s

Note

  • How to determine <other arguments>:

    1. Add the -v option so that g++ prints the commands it runs:

       g++ -S main.cpp -o main.s -v

    2. The complete set of compiler options is everything that follows cc1plus in that output:

       -quiet -v -imultiarch x86_64-linux-gnu -D_GNU_SOURCE main.cpp -quiet -dumpbase main.cpp -mtune=generic -march=x86-64 -auxbase-strip main.s -version -o main.s -fstack-protector-strong -Wformat -Wformat-security

  • If you need debugging information, add the -g option at this step. Adding -g in the other steps (step 1, step 3, and step 4) has no effect. The assembly file generated with -g contains debugging sections such as .debug_aranges, .debug_info, .debug_abbrev, .debug_line, .debug_str, and .debug_ranges.
  • One way to tell whether an object file is a DEBUG build or a RELEASE build: readelf -S main | grep debug. For a DEBUG build, the output lists the .debug* sections; otherwise, nothing is printed.
  • The input of the compiler can be either the source file or the preprocessed file.
  • For source files ending in .cpp, the compiler that actually runs is cc1plus, whether you invoke gcc or g++.

Assembler

Next, the assembler is called (an executable named as, usually located in the /usr/bin/ directory) to convert the assembly code (files ending in .s) into relocatable object code (files ending in .o). An object file is the binary representation of the assembly code, but the addresses of global symbols have not yet been filled in.

as main.s -o main.o

g++ -c main.s -o main.o

Linker

Finally, the linker is called (an executable named ld, usually located in the /usr/bin/ directory) to combine one or more relocatable object files with the necessary system object files and generate the final executable object file.

# ld -o main <list.o> <system object files and args>
ld -o main main.o other.o -lpthread <system object files and args>

g++ -o main main.o other.o -pthread

Note

  • The command g++ -o main main.o other.o -pthread combines multiple relocatable object files (main.o, other.o) with the required system object files and libraries (including thread support, requested with the -pthread option) to generate the final executable object file, main.
  1. Add the -v option to print the commands that g++ runs.