Computer Systems Practical - Part 1: Object Files

Abstract

This article covers the basics of object files. It describes the ELF-64 object file format in detail and shows common uses of two object file processing tools on Linux systems: readelf and objdump.

Object files

Object files come in three forms:

  • Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.

  • Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.

  • Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Object file type          Generator         Loaded directly into memory for execution   Suffix
Relocatable object file   Assembler         No                                          .o
Executable object file    Linker            Yes                                         None or arbitrary (the file type is recorded in the ELF header)
Shared object file        Compiler/Linker   No                                          .so
  1. How to generate a relocatable object file

     g++ -c main.cpp -o main.o

  2. How to generate an executable object file

     g++ main.cpp            # default output: a.out
     g++ main.cpp -o main    # output: main

  3. How to generate a shared object file

     g++ -shared -fPIC main.cpp             # default output: a.out
     g++ -shared -fPIC main.cpp -o main.so  # output: main.so

  • -shared is a linker option.
  • -fPIC is a compiler option.

Object files format

System                         Object file format
Windows                        PE (Portable Executable)
Mac OS X                       Mach-O (Mach object)
x86-64 Linux / AArch64 Linux   ELF-64
  1. Generate PE

     /usr/bin/x86_64-w64-mingw32-g++ -o main_w64.exe main.cpp

     A cross-compiler is required, such as g++-mingw-w64:

     sudo apt-get install g++-mingw-w64

  2. Generate ELF-64

     g++ -c main.cpp -o main.o

     Use -m32 for ELF-32 and -m64 for ELF-64.

     To check the result, print the ELF header:

     readelf -h main.o

Relocatable object files (ELF-64 format)

ELF header

ELF header: The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file.

  1. readelf -h <object file>
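The 16-byte sequence described above can be decoded by hand. The following is a minimal Python sketch, not what readelf itself does; the function name describe_ident and the hand-built sample bytes are assumptions made for illustration.

```python
def describe_ident(ident: bytes) -> dict:
    """Decode the 16-byte e_ident array at the start of an ELF file."""
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS) gives the word size, byte 5 (EI_DATA) the byte order.
    word_size = {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown")
    byte_order = {1: "little endian", 2: "big endian"}.get(ident[5], "unknown")
    return {"class": word_size, "data": byte_order, "version": ident[6]}

# Hand-built e_ident for a 64-bit little-endian file (illustrative input).
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
print(describe_ident(sample))
```

To try it on a real file, pass the first 16 bytes of that file instead of the sample, for example the bytes read from main.o opened in binary mode.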

.text

.text: The machine code of the compiled program.

  1. objdump -d <object file> (outputs assembly only; the output covers sections such as .init, .plt, .plt.got, .text, and .fini)
  2. objdump -S <object file> (outputs assembly together with the corresponding source code; compile with the -g option so that the source code can be printed)
  3. objdump -d --section .text <object file> (outputs assembly only, and only for the .text section)

.rodata

.rodata: Read-only data such as the format strings in printf statements, and jump tables for switch statements.

.data

.data: Initialized global and static C variables whose initial value is not 0. Local C variables are maintained at run time on the stack and do not appear in either the .data or .bss sections.

.bss

.bss: Uninitialized global and static C variables, along with any global or static variables that are initialized to zero. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file. At run time, these variables are allocated in memory with an initial value of zero.
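The space saving shows up in the section headers: a .bss section normally has the type SHT_NOBITS, which means its size counts only in memory. A minimal Python sketch of that rule follows; the helper name bytes_on_disk and the 4096-byte sizes are hypothetical.

```python
SHT_PROGBITS = 1   # contents are stored in the file (for example .text, .data)
SHT_NOBITS = 8     # occupies no space in the file (for example .bss)

def bytes_on_disk(sh_type: int, sh_size: int) -> int:
    """Return how many bytes of the file a section with this header occupies."""
    return 0 if sh_type == SHT_NOBITS else sh_size

# Hypothetical 4 KiB of variables, once zero-initialized and once not.
print(bytes_on_disk(SHT_NOBITS, 4096))    # as .bss
print(bytes_on_disk(SHT_PROGBITS, 4096))  # as .data
```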

.symtab

.symtab: A symbol table with information about functions and global variables that are defined and referenced in the program. Some programmers mistakenly believe that a program must be compiled with the -g option to get symbol table information. In fact, every relocatable object file has a symbol table in .symtab (unless the programmer has specifically removed it with the strip command). However, unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables.

  1. readelf -s <object file>
  2. To remove the .symtab section from the object file: strip <object file>
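Each row printed by readelf -s corresponds to one fixed-size entry in .symtab. The sketch below decodes a single entry, assuming the little-endian ELF-64 layout (Elf64_Sym, 24 bytes); the helper name decode_symbol and the sample entry are made up for illustration.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM = struct.Struct("<IBBHQQ")
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def decode_symbol(entry: bytes) -> dict:
    """Decode one 24-byte symbol table entry."""
    st_name, st_info, _st_other, st_shndx, st_value, st_size = SYM.unpack(entry)
    return {
        "name_offset": st_name,                      # offset into .strtab
        "bind": BINDINGS.get(st_info >> 4, "OTHER"),  # high 4 bits of st_info
        "type": TYPES.get(st_info & 0xF, "OTHER"),    # low 4 bits of st_info
        "section_index": st_shndx,
        "value": st_value,
        "size": st_size,
    }

# Made-up entry: a global function of 42 bytes defined in section 1.
entry = SYM.pack(7, (1 << 4) | 2, 0, 1, 0, 42)
print(decode_symbol(entry))
```

The name itself is not stored in the entry; name_offset points into the .strtab string table.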

.rel.text

.rel.text: A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified. Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it.

  1. readelf -r <object file>
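A relocation entry tells the linker which location to patch, which symbol to use, and how to compute the value. The sketch below decodes one entry, assuming the Elf64_Rela layout used on x86-64 (24 bytes with an explicit addend); decode_rela and the sample values are hypothetical.

```python
import struct

# Elf64_Rela: r_offset, r_info, r_addend (little endian)
RELA = struct.Struct("<QQq")
RELOC_NAMES = {1: "R_X86_64_64", 2: "R_X86_64_PC32", 4: "R_X86_64_PLT32"}

def decode_rela(entry: bytes) -> dict:
    """Decode one relocation entry with an explicit addend."""
    r_offset, r_info, r_addend = RELA.unpack(entry)
    r_type = r_info & 0xFFFFFFFF          # low 32 bits: relocation type
    return {
        "offset": r_offset,               # where in the section to patch
        "symbol_index": r_info >> 32,     # high 32 bits: index into .symtab
        "type": RELOC_NAMES.get(r_type, str(r_type)),
        "addend": r_addend,
    }

# Made-up entry: patch the location at offset 0x1a for a call to symbol 5.
entry = RELA.pack(0x1A, (5 << 32) | 4, -4)
print(decode_rela(entry))
```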

.rel.data

.rel.data: Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.

  1. readelf -r <object file>

.debug

.debug: A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. It is only present if the compiler driver is invoked with the -g option.

  1. readelf --debug-dump <object file>
  2. objdump -g <object file>
  3. To remove only the debugging sections, such as .debug: strip --strip-debug <object file>
  4. Running strip <object file> without options also removes the debugging sections, but it strips other symbol information as well.

.line

.line: A mapping between line numbers in the original C source program and machine code instructions in the .text section. It is only present if the compiler driver is invoked with the -g option.

.comment

.comment: Version control information.

  1. readelf -p .comment <object file>

.shstrtab

.shstrtab: A string table containing the names of all sections.

  1. readelf -p .shstrtab <object file>

.strtab

.strtab: A string table containing the names of all symbols.

  1. readelf -p .strtab <object file>
  2. To remove .strtab: strip <object file>
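Both .shstrtab and .strtab share the same simple format: a sequence of NUL-terminated strings, referenced by byte offset. A minimal Python sketch of a lookup follows; the function name name_at and the sample table are made up for illustration.

```python
def name_at(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")

# Illustrative table; a real one is the raw content of .shstrtab or .strtab.
table = b"\x00.text\x00.data\x00.bss\x00"
print(name_at(table, 1))
print(name_at(table, 7))
print(name_at(table, 13))
```

Offset 0 always yields the empty string, which is how an entry without a name is represented.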

Section header table: Describes the location and size of every section in the object file.

  1. readelf -S <object file>
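The ELF header records where the section header table lives and which section holds the section names. The sketch below reads those fields, assuming the little-endian ELF-64 header layout (Elf64_Ehdr, 64 bytes); section_table_info and the sample header values are hypothetical.

```python
import struct

# Elf64_Ehdr: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx (little endian, 64 bytes in total).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

def section_table_info(header: bytes) -> dict:
    """Locate the section header table from the first 64 bytes of a file."""
    fields = EHDR.unpack(header[:EHDR.size])
    e_shoff = fields[6]
    e_shentsize, e_shnum, e_shstrndx = fields[11], fields[12], fields[13]
    return {
        "offset": e_shoff,                      # file offset of the table
        "entry_size": e_shentsize,              # bytes per section header
        "count": e_shnum,                       # number of section headers
        "table_bytes": e_shentsize * e_shnum,   # total size of the table
        "names_section_index": e_shstrndx,      # section that stores the names
    }

# Made-up header for a relocatable x86-64 file with 13 sections.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
header = EHDR.pack(ident, 1, 62, 1, 0, 0, 0x400, 0, 64, 0, 0, 64, 13, 12)
print(section_table_info(header))
```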

Note

  • The meaning of the abbreviations in the readelf command:

    -h is the abbreviation of --file-header

    -s is the abbreviation of --symbols

    -r is the abbreviation of --relocs

    -p is the abbreviation of --string-dump

    -S is the abbreviation of --section-headers

  • The meaning of the abbreviations in the objdump command:

    -d is the abbreviation of --disassemble

    -S is the abbreviation of --source

    -g is the abbreviation of --debugging

Note: Section 7.4 of CS:APP (Third Edition) states that the section names used by the section header table are stored in the .strtab section.

However, the author verified that on x86-64 Linux (Ubuntu 20.04) the section names are stored in .shstrtab, not in .strtab. Keep this difference in mind.

Executable object files (ELF-64 format)

ELF header - read only - loaded into memory at execution

ELF header: The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file.

  1. readelf -h <object file>

Program header table - read only - loaded into memory at execution

Describes the location and size of each segment in the executable object file, along with other information about the segment.

  1. readelf -l <object file>
  2. objdump -p <object file>
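Each line that readelf -l prints for a segment comes from one program header entry. The sketch below decodes a single entry, assuming the little-endian ELF-64 layout (Elf64_Phdr, 56 bytes); decode_segment and the sample addresses are made up for illustration.

```python
import struct

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (little endian)
PHDR = struct.Struct("<IIQQQQQQ")
SEGMENT_TYPES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 6: "PHDR"}

def decode_segment(entry: bytes) -> dict:
    """Decode one 56-byte program header entry."""
    (p_type, p_flags, p_offset, p_vaddr,
     _p_paddr, p_filesz, p_memsz, _p_align) = PHDR.unpack(entry)
    # Permission bits: PF_R = 4, PF_W = 2, PF_X = 1.
    perms = (("R" if p_flags & 4 else "-")
             + ("W" if p_flags & 2 else "-")
             + ("X" if p_flags & 1 else "-"))
    return {
        "type": SEGMENT_TYPES.get(p_type, str(p_type)),
        "perms": perms,
        "file_offset": p_offset,
        "vaddr": p_vaddr,
        "file_size": p_filesz,
        "mem_size": p_memsz,
    }

# Made-up entry: a readable and executable LOAD segment.
entry = PHDR.pack(1, 5, 0x1000, 0x401000, 0x401000, 0x1F5, 0x1F5, 0x1000)
print(decode_segment(entry))
```

A segment whose mem_size is larger than its file_size is the usual sign that it contains .bss, since that section takes up memory but no file space.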

.interp - read only - loaded into memory at execution

This section stores the path of the dynamic linker that the executable object file requires.

  1. readelf -p .interp <object file>
  2. readelf -l <object file> | grep interpreter
  3. objdump -s --section .interp <object file>

.dynsym - read only - loaded into memory at execution

Contains the symbols needed for dynamic linking.

  1. readelf --dyn-syms <object file>
  2. readelf -s <object file>

.dynstr - read only - loaded into memory at execution

The string table for dynamic linking. It contains only the names of the symbols needed for dynamic linking.

  1. readelf -p .dynstr <object file>

.rela.dyn - read only - loaded into memory at execution

The data relocation table for dynamic linking. Similar to .rela.data in static linking.

  1. readelf -r <object file>

.rela.plt - read only - loaded into memory at execution

The code relocation table for dynamic linking. Similar to .rela.text in static linking.

  1. readelf -r <object file>

.init - readable, executable - loaded into memory at execution

This section defines a function named _init, which will be called by the program initialization code.

  1. objdump -d <object file>

.text - readable, executable - loaded into memory at execution

The machine code of the compiled program.

  1. objdump -d <object file>
  2. objdump -S <object file>
  3. objdump -d --section .text <object file>

.rodata - read only - loaded into memory at execution

Read-only data, such as string constants and const-qualified global and static variables.

.dynamic - read only - loaded into memory at execution

This section stores the basic information needed for dynamic linking.

  1. readelf -d <object file>
  2. objdump -p <object file>
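The .dynamic section is an array of tag and value pairs that ends with a DT_NULL entry. The sketch below walks such an array, assuming the little-endian ELF-64 layout (Elf64_Dyn, 16 bytes per entry); decode_dynamic and the sample values are hypothetical.

```python
import struct

# Elf64_Dyn: d_tag (signed), d_val or d_ptr (unsigned), little endian
DYN = struct.Struct("<qQ")
TAGS = {0: "NULL", 1: "NEEDED", 5: "STRTAB", 6: "SYMTAB"}

def decode_dynamic(section: bytes) -> list:
    """Return (tag, value) pairs up to, but not including, DT_NULL."""
    entries = []
    for d_tag, d_val in DYN.iter_unpack(section):
        if d_tag == 0:  # DT_NULL marks the end of the array
            break
        entries.append((TAGS.get(d_tag, str(d_tag)), d_val))
    return entries

# Made-up section: one needed library, the addresses of .dynstr and
# .dynsym, and the terminating DT_NULL entry.
section = (DYN.pack(1, 1) + DYN.pack(5, 0x4003D8)
           + DYN.pack(6, 0x4002B8) + DYN.pack(0, 0))
print(decode_dynamic(section))
```

For a NEEDED entry the value is an offset into .dynstr where the library name is stored, while STRTAB and SYMTAB hold the addresses of .dynstr and .dynsym.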

.data - read, write - loaded into memory at execution

Initialized global and static variables whose initial value is not zero.

.bss - read, write - loaded into memory at execution

Global and static variables that are uninitialized or initialized to zero.

.symtab

A symbol table that stores information about the functions, global variables, and static variables that are defined and referenced in the object file.

  1. readelf -s <object file>

.debug

A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. (It is generated only when compiling with -g.)

  1. readelf --debug-dump <object file>
  2. objdump -g <object file>

.comment

  1. readelf -p .comment <object file>

.shstrtab

  1. readelf -p .shstrtab <object file>

.strtab

  1. readelf -p .strtab <object file>

Section header table

  1. readelf -S <object file>

Note: Section 7.8 of CS:APP (Third Edition) states that the ELF header, the program header table, and the .rodata section are all located in the code segment. However, the author verified that on x86-64 Linux (Ubuntu 20.04) the ELF header and the program header table share one segment (called the first segment here, after its position), and this is not the code segment that contains .text. In addition, the segment that contains the .rodata section (called the third segment here) is not the code segment either. Keep both differences in mind.

Differences between .symtab and .dynsym

Section   Content                                       Loaded into memory   Removed by the strip command
.symtab   All symbols                                   No                   Yes
.dynsym   Only the symbols needed for dynamic linking   Yes                  No