How Does the “File Size is Smaller Than Mem Size” Optimization Work in GNU ld for the .bss Section?
This article explains how the GNU linker (GNU ld) can save disk space for symbols in the .bss section.
TL;DR: If a section is of type SHT_NOBITS and if it is the last section in a LOAD segment, GNU ld uses this optimization.
When you build software with a compiled language such as C, some symbols, typically originating from variable declarations, are either left uninitialized or initialized to zero. An example is the following pair of global variable declarations (C code), both of which end up in the .bss section:
    // Those two symbols (variables) will end up in the `.bss` section.
    char global_buffer_uninitialized[512];
    int flag;

    int main() {
        return 0;
    }
By definition, uninitialized memory of this kind, i.e., objects with static storage duration, is zeroed before the program starts. The ELF specification says about .bss:
“This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.”
(ELF Specification Version 1.2, Page 1-15)
The ISO C standard makes a matching guarantee: objects with static storage duration that are not explicitly initialized are initialized to zero (null for pointers) before program startup (C11, section 6.7.9, paragraph 10). Because the content of those symbols, i.e., variables, is zero anyway, it does not need to be stored in the ELF file. This is where the SHT_NOBITS section type comes into play. SHT stands for section type. Each section in an ELF file has a certain type. The ELF specification describes the SHT_NOBITS type as:
“A section of this type occupies no space in the file but otherwise resembles SHT_PROGBITS. Although this section contains no bytes, the sh_offset member contains the conceptual file offset.”
(ELF Specification Version 1.2, Page 1-12)
In GNU ld’s source code, we find the following snippet:
    // GNU binutils @ 658ba81aef5 > bfd/elf.c > line 2627
    static const struct bfd_elf_special_section special_sections_b[] =
    {
      { STRING_COMMA_LEN (".bss"), -2, SHT_NOBITS, SHF_ALLOC + SHF_WRITE },
So, every symbol that belongs to .bss benefits from this shortcut. Compilers place symbols in this section if their memory is uninitialized or initialized to zero. As a result, the output of readelf -Wl for a typical Linux program written in C looks like this:
    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
      ...
      LOAD           0x002df0 0x00003df0 0x00003df0 0x000224 0x000450 RW  0x1000
      ...

     Section to Segment mapping:
      Segment Sections...
       ...
       05     .init_array .fini_array .dynamic .got .data .bss
       ...
We see that .bss is part of a LOAD segment that is readable and writable, and that the segment’s file size (0x224) is smaller than its memory size (0x450). Another interesting detail is that .bss is the last section of that segment. I do not know if this optimization has a special name, but the difference of 0x22c bytes is clearly not stored in the file. Hence, an ELF loader cannot simply load the ELF into memory as it is: it has to provide the extra memory beyond the file-backed part and fill it with zeros.
In a small experiment, based on code you can find on my GitHub (experiment 1, experiment 2), I was able to verify these statements. To summarize: GNU ld uses this space-saving optimization if a section is of type SHT_NOBITS and is placed as the last section in its LOAD segment. Otherwise, it does not.
In another blog post, I explained how we can work around this and guarantee that all memory that .bss takes also lands “as is” in the file.