Computer Systems Practical - Part 1: Object Files
Abstract
This article covers the basics of object files: it introduces the ELF-64 object file format in detail and shows common uses of two object file tools on Linux, readelf and objdump.
Object files
Object files come in three forms:
Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
Object file type | Generated by | Can be loaded directly into memory and executed | Suffix |
---|---|---|---|
Relocatable object file | Assembler | No | .o |
Executable object file | Linker | Yes | None or arbitrary (the file type is recorded in the ELF header) |
Shared object file | Compiler/Linker | No | .so |
- How to generate a relocatable object file
g++ -c main.cpp -o main.o
- How to generate an executable object file
g++ main.cpp
- How to generate a shared object file
g++ -shared -fPIC main.cpp # output defaults to a.out
-shared is a linker option; -fPIC is a compiler option.
Object file formats
System | Object file format |
---|---|
Windows | PE (Portable Executable) |
Mac OS X | Mach-O (Mach object) |
x86-64 Linux / AArch64 Linux | ELF-64 |
- Generate PE
/usr/bin/x86_64-w64-mingw32-g++ -o main_w64.exe main.cpp
A cross-compiler is required, for example g++-mingw-w64, which can be installed with: sudo apt-get install g++-mingw-w64
- Generate ELF-64
g++ -c main.cpp -o main.o
Pass -m32 for ELF-32 or -m64 for ELF-64. The resulting format can be checked in the ELF header:
readelf -h main.o
Relocatable object files (ELF-64 format)
ELF header
The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file.
readelf -h <object file>
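To make the 16-byte sequence concrete, here is a minimal Python sketch that decodes the word size and byte ordering from it, plus the file type field that follows. The function name parse_elf_header_start and the sample bytes are made up for illustration; the field positions and values are those of the ELF specification.

```python
import struct

# Values of e_ident[EI_CLASS], e_ident[EI_DATA], and e_type from the ELF specification.
ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object"}


def parse_elf_header_start(header: bytes) -> dict:
    """Decode e_ident (16 bytes) and the 2-byte e_type field that follows it."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident is a plain byte array, so it can be read before the byte order is known.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return {
        "class": ELF_CLASS.get(header[4], "unknown"),  # word size
        "data": ELF_DATA.get(header[5], "unknown"),    # byte ordering
        "type": ELF_TYPE.get(e_type, "other"),
    }


# Hypothetical header bytes for a 64-bit little-endian relocatable file (not read from disk).
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", 1)
print(parse_elf_header_start(sample))
```

To try it on a real file, pass the first 18 bytes of the file, for example open("main.o", "rb").read(18). The result should agree with the Class, Data, and Type lines printed by readelf -h.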
.text
The machine code of the compiled program.
objdump -d <object file> (disassembly only; the output covers sections such as .init, .plt, .plt.got, .text, and .fini)
objdump -S <object file> (disassembly interleaved with the corresponding source code; compile with -g so that the source can be printed)
objdump -d --section .text sum.so (disassembly only, restricted to the .text section)
.rodata
Read-only data such as the format strings in printf statements, and jump tables for switch statements.
.data
Initialized global and static C variables whose initial value is not 0. Local C variables are maintained at run time on the stack and do not appear in either the .data or .bss sections.
.bss
Uninitialized global and static C variables, along with any global or static variables that are initialized to zero. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file. At run time, these variables are allocated in memory with an initial value of zero.
.symtab
A symbol table with information about functions and global variables that are defined and referenced in the program. Some programmers mistakenly believe that a program must be compiled with the -g option to get symbol table information. In fact, every relocatable object file has a symbol table in .symtab (unless the programmer has specifically removed it with the strip command). However, unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables.
readelf -s <object file>
- The command to remove the .symtab section from an object file is: strip <object file>
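As an illustration of what a single .symtab entry holds, the following Python sketch decodes one 24-byte Elf64_Sym record. It assumes a little-endian ELF-64 file; the helper name parse_symbol and the sample entry are hypothetical, and the binding and type tables list only the most common values.

```python
import struct

# Elf64_Sym: st_name (4), st_info (1), st_other (1), st_shndx (2), st_value (8), st_size (8)
SYM_FORMAT = "<IBBHQQ"  # little-endian, 24 bytes in total
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def parse_symbol(entry: bytes) -> dict:
    """Decode one symbol table entry."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = struct.unpack(SYM_FORMAT, entry)
    return {
        "name_offset": st_name,                       # offset of the name in .strtab
        "bind": BINDINGS.get(st_info >> 4, "OTHER"),  # high 4 bits of st_info
        "type": TYPES.get(st_info & 0xF, "OTHER"),    # low 4 bits of st_info
        "section_index": st_shndx,
        "value": st_value,
        "size": st_size,
    }


# Hypothetical entry: a global function of 42 bytes defined in section 1.
sample_entry = struct.pack(SYM_FORMAT, 1, (1 << 4) | 2, 0, 1, 0, 42)
print(parse_symbol(sample_entry))
```

The bind and type fields correspond to the Bind and Type columns in the output of readelf -s.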
.rel.text
A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified. Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it.
readelf -r <object file>
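Each line printed by readelf -r corresponds to one relocation entry. The sketch below decodes a single entry, assuming the 24-byte Elf64_Rela layout (with an explicit addend) that x86-64 Linux uses in its .rela.* sections. The name parse_rela and the sample values are illustrative, and the table of relocation types is deliberately incomplete.

```python
import struct

# Elf64_Rela: r_offset (8), r_info (8), r_addend (8, signed); little-endian, 24 bytes
RELA_FORMAT = "<QQq"
# A few x86-64 relocation types; anything else is reported as "other".
RELOC_TYPES = {1: "R_X86_64_64", 2: "R_X86_64_PC32", 4: "R_X86_64_PLT32"}


def parse_rela(entry: bytes) -> dict:
    """Decode one relocation entry."""
    r_offset, r_info, r_addend = struct.unpack(RELA_FORMAT, entry)
    return {
        "offset": r_offset,                                      # location to patch within the section
        "symbol_index": r_info >> 32,                            # index into .symtab
        "type": RELOC_TYPES.get(r_info & 0xFFFFFFFF, "other"),   # how to compute the patched value
        "addend": r_addend,
    }


# Hypothetical entry: patch offset 0x1a with a PLT32 reference to symbol 5, addend -4.
sample_entry = struct.pack(RELA_FORMAT, 0x1A, (5 << 32) | 4, -4)
print(parse_rela(sample_entry))
```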
.rel.data
Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.
readelf -r <object file>
.debug
A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. It is only present if the compiler driver is invoked with the -g option.
readelf --debug-dump <object file>
objdump -g <object file>
- The command to remove only the debugging-related sections, such as .debug, from an object file is: strip --strip-debug <object file>
- To remove more than the debugging-related sections (the symbol table is removed as well): strip <object file>
.line
A mapping between line numbers in the original C source program and machine code instructions in the .text section. It is only present if the compiler driver is invoked with the -g option.
.comment
Version information, typically the name and version of the compiler that produced the file.
readelf -p .comment <object file>
.shstrtab
A string table containing the names of all sections.
readelf -p .shstrtab <object file>
.strtab
A string table containing the names of all symbols.
readelf -p .strtab <object file>
- The command to remove .strtab is: strip <object file>
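Both .shstrtab and .strtab use the same simple layout: null-terminated strings packed back to back, with each name identified by its byte offset into the table. A minimal Python sketch of the lookup follows; the helper read_string and the sample table are made up for illustration.

```python
def read_string(table: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at the given offset."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")


# Hypothetical string table contents; offset 0 always holds the empty string.
sample_table = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.text\x00"

print(read_string(sample_table, 1))
print(read_string(sample_table, 27))
```

The offsets passed here play the role of the name fields stored in section headers (for .shstrtab) and in symbol table entries (for .strtab).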
Section header table
Describes the location and size of every section in the object file.
readelf -S <object file>
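The location and size information can be seen by decoding one entry of the section header table. The sketch below assumes the 64-byte Elf64_Shdr layout of a little-endian ELF-64 file; parse_section_header and the sample values are hypothetical. It also shows why .bss costs no disk space: a section of type SHT_NOBITS has a size but contributes no bytes to the file.

```python
import struct

# Elf64_Shdr: sh_name, sh_type (4 bytes each), sh_flags, sh_addr, sh_offset, sh_size
# (8 bytes each), sh_link, sh_info (4 bytes each), sh_addralign, sh_entsize (8 bytes each)
SHDR_FORMAT = "<IIQQQQIIQQ"  # little-endian, 64 bytes in total
SHT_NOBITS = 8               # section type used by .bss


def parse_section_header(entry: bytes) -> dict:
    """Decode one section header table entry."""
    (sh_name, sh_type, sh_flags, sh_addr, sh_offset,
     sh_size, sh_link, sh_info, sh_addralign, sh_entsize) = struct.unpack(SHDR_FORMAT, entry)
    return {
        "name_offset": sh_name,    # offset of the section name in .shstrtab
        "type": sh_type,
        "file_offset": sh_offset,  # where the section starts in the file
        "size": sh_size,           # size of the section in memory
        "bytes_in_file": 0 if sh_type == SHT_NOBITS else sh_size,
    }


# Hypothetical .bss-like header: 16 bytes at run time, nothing stored on disk.
sample_bss = struct.pack(SHDR_FORMAT, 27, SHT_NOBITS, 3, 0, 0x40, 16, 0, 0, 4, 0)
print(parse_section_header(sample_bss))
```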
Note
The short options of the readelf command used above stand for:
-h is short for --file-header
-s is short for --symbols
-r is short for --relocs
-p is short for --string-dump
-S is short for --section-headers
The short options of the objdump command used above stand for:
-d is short for --disassemble
-S is short for --source
-g is short for --debugging
Note: CS:APP (Third Edition), Section 7.4, states that the names of the sections in the section header table are stored in the .strtab section. The author's own check on x86-64 Linux (Ubuntu 20.04), however, shows that the section names are stored in .shstrtab, not in .strtab.
Executable object files (ELF-64 format)
ELF header - read only - loaded into memory at run time
The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file.
readelf -h <object file>
Program header table - read only - loaded into memory at run time
Describes the location and size of each segment in the executable object file, along with other information such as access permissions.
readelf -l <object file>
objdump -p <object file>
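To show what one program header entry records, here is a minimal Python sketch that decodes a 56-byte Elf64_Phdr, assuming a little-endian ELF-64 file. The names parse_program_header and describe_flags, and the sample segment values, are hypothetical; only three common segment types are listed.

```python
import struct

# Elf64_Phdr: p_type, p_flags (4 bytes each), then p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (8 bytes each)
PHDR_FORMAT = "<IIQQQQQQ"  # little-endian, 56 bytes in total
SEGMENT_TYPES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP"}


def describe_flags(p_flags: int) -> str:
    """Render the permission bits (PF_R=4, PF_W=2, PF_X=1) as a string such as 'R-X'."""
    return "".join(letter if p_flags & bit else "-"
                   for bit, letter in ((4, "R"), (2, "W"), (1, "X")))


def parse_program_header(entry: bytes) -> dict:
    """Decode one program header table entry."""
    (p_type, p_flags, p_offset, p_vaddr,
     p_paddr, p_filesz, p_memsz, p_align) = struct.unpack(PHDR_FORMAT, entry)
    return {
        "type": SEGMENT_TYPES.get(p_type, "other"),
        "flags": describe_flags(p_flags),
        "file_offset": p_offset,   # where the segment starts in the file
        "virtual_address": p_vaddr,
        "size_in_file": p_filesz,
        "size_in_memory": p_memsz,
    }


# Hypothetical loadable code segment: readable and executable, 0x200 bytes.
sample_segment = struct.pack(PHDR_FORMAT, 1, 5, 0x1000, 0x401000, 0x401000, 0x200, 0x200, 0x1000)
print(parse_program_header(sample_segment))
```

A segment whose size in memory exceeds its size in the file usually contains .bss, whose bytes exist only at run time.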
.interp - read only - loaded into memory at run time
This section stores the path of the dynamic linker that the executable object file requires.
readelf -p .interp <object file>
readelf -l <object file> | grep interpreter
objdump -s --section .interp <object file>
.dynsym - read only - loaded into memory at run time
A symbol table that contains only the symbols needed for dynamic linking.
readelf --dyn-syms <object file>
readelf -s <object file>
.dynstr - read only - loaded into memory at run time
The string table for dynamic linking. It holds only the names of the symbols that need dynamic linking.
readelf -p .dynstr <object file>
.rela.dyn - read only - loaded into memory at run time
The relocation table for data in dynamic linking. It is analogous to .rela.data in static linking.
readelf -r <object file>
.rela.plt - read only - loaded into memory at run time
The relocation table for code (function calls) in dynamic linking. It is analogous to .rela.text in static linking.
readelf -r <object file>
.init - readable, executable - loaded into memory at run time
This section defines a function named _init, which will be called by the program initialization code.
objdump -d <object file>
.text - readable, executable - loaded into memory at run time
The machine code of the compiled program.
objdump -d <object file>
objdump -S <object file>
objdump -d --section .text sum.so
.rodata - read only - loaded into memory at run time
Read-only data, such as string constants and const-qualified global and static variables.
.dynamic - read only - loaded into memory at run time
This section holds the basic information needed for dynamic linking, such as the shared libraries that the file depends on.
readelf -d <object file>
objdump -p <object file>
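As one example of the information kept in .dynamic, the sketch below walks its entries and collects the names of the required shared libraries. It assumes the 16-byte Elf64_Dyn layout of a little-endian ELF-64 file and that the raw bytes of .dynamic and .dynstr have already been read; the function name needed_libraries and the sample data are hypothetical.

```python
import struct

DT_NULL, DT_NEEDED = 0, 1  # tag values from the ELF specification


def needed_libraries(dynamic: bytes, dynstr: bytes) -> list:
    """Collect the library names referenced by DT_NEEDED entries."""
    names = []
    # Elf64_Dyn: d_tag (8 bytes, signed) followed by d_val (8 bytes)
    for d_tag, d_val in struct.iter_unpack("<qQ", dynamic):
        if d_tag == DT_NULL:  # DT_NULL marks the end of the array
            break
        if d_tag == DT_NEEDED:
            # d_val is an offset into .dynstr, which stores null-terminated names
            end = dynstr.index(b"\x00", d_val)
            names.append(dynstr[d_val:end].decode("ascii"))
    return names


# Hypothetical section contents for a program that depends on two libraries.
sample_dynstr = b"\x00libc.so.6\x00libstdc++.so.6\x00"
sample_dynamic = (struct.pack("<qQ", DT_NEEDED, 11)
                  + struct.pack("<qQ", DT_NEEDED, 1)
                  + struct.pack("<qQ", DT_NULL, 0))
print(needed_libraries(sample_dynamic, sample_dynstr))
```

The same names appear on the (NEEDED) lines of readelf -d.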
.data - read, write - loaded into memory at run time
Global and static variables that are initialized with a nonzero value.
.bss - read, write - loaded into memory at run time
Global and static variables that are uninitialized or initialized to 0.
.symtab
A symbol table with information about the functions, global variables, and static variables that are defined and referenced in the object file.
readelf -s <object file>
.debug
A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. It is generated only when compiling with -g.
readelf --debug-dump <object file>
objdump -g <object file>
.comment
readelf -p .comment <object file>
.shstrtab
readelf -p .shstrtab <object file>
.strtab
readelf -p .strtab <object file>
Section header table
readelf -S <object file>
Note: Section 7.8 of CS:APP (Third Edition) describes the ELF header, the program header table, and the .rodata section as all belonging to the code segment. The author's own check on x86-64 Linux (Ubuntu 20.04), however, shows that the ELF header and the program header table share one segment (the first segment in file order), which is not the code segment that contains .text. Likewise, the segment that contains .rodata (the third segment in file order) is not the code segment either.
Differences between .symtab and .dynsym
Section | Content | Loaded into memory | Removed by the strip command |
---|---|---|---|
.symtab | All symbols | No | Yes |
.dynsym | Only the symbols needed for dynamic linking | Yes | No |