How Does the “File Size is Smaller Than Mem Size” Optimization Work in GNU ld for the .bss Section?

Published by Philipp Schuster on

This article explains how the GNU linker (GNU ld) can save disk space for symbols in the .bss section. TL;DR: If a section is of type SHT_NOBITS and it is the last section in a LOAD segment, GNU ld leaves its bytes out of the file, so the segment's file size is smaller than its memory size.

When you build software in a compiled language such as C, some symbols, which typically originate from variable declarations, are uninitialized or initialized to zero. One example is the following pair of global variable declarations (C code), which end up in the .bss section:

// Those two symbols (variables) will end up in the `.bss` section.

char global_buffer_uninitialized[512];
int flag;

int main() { return 0; }

By definition, the memory behind such uninitialized variables is zeroed before the program starts to run. The ELF specification says about .bss:

This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

ELF Specification Version 1.2, Page 1-15
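The zeroing described in the quote can be observed from within a program. The following is a minimal sketch, not part of the original experiments: it repeats the two declarations from above and adds a hypothetical helper, all_bytes_zero, that checks a memory range byte by byte. It relies on ISO C zero-initializing objects with static storage duration that have no explicit initializer.

```c
#include <stddef.h>

// Same declarations as in the example above: static storage duration,
// no explicit initializer, so a typical toolchain places them in `.bss`.
char global_buffer_uninitialized[512];
int flag;

// Hypothetical helper: returns 1 if every byte in [p, p + n) is zero,
// and 0 otherwise.
int all_bytes_zero(const void *p, size_t n) {
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling all_bytes_zero(global_buffer_uninitialized, sizeof global_buffer_uninitialized) before anything writes to the buffer should return 1, even though the ELF file stores none of those 512 bytes.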

The ISO C standard gives the same guarantee at the language level: objects with static storage duration that have no explicit initializer are initialized to zero, or to a null pointer for pointer types (C11, section 6.7.9, paragraph 10). As the content of those symbols, i.e., variables, is zero anyway, it does not need to be stored in the ELF file. This is where the SHT_NOBITS section type comes into play. SHT is the prefix for section types; each section in an ELF file has exactly one type. The ELF specification describes the SHT_NOBITS type as:

A section of this type occupies no space in the file but otherwise resembles SHT_PROGBITS. Although this section contains no bytes, the sh_offset member contains the conceptual file offset.

ELF Specification Version 1.2, Page 1-12
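To make the difference between file space and memory space concrete, here is a small sketch that works on section headers. It assumes a Linux system that ships the elf.h header (for Elf64_Shdr, SHT_NOBITS, and SHF_ALLOC). The two helper functions are hypothetical names introduced for illustration, and the memory-size rule is a simplification that ignores special cases such as thread-local sections.

```c
#include <elf.h>     // Elf64_Shdr, SHT_NOBITS, SHF_ALLOC (assumes Linux)
#include <stdint.h>

// Hypothetical helper: bytes the section occupies in the ELF file.
// SHT_NOBITS sections have a size but contribute no bytes to the file.
uint64_t section_file_bytes(const Elf64_Shdr *shdr) {
    return shdr->sh_type == SHT_NOBITS ? 0 : shdr->sh_size;
}

// Hypothetical helper: bytes the section occupies in memory at run time.
// Only sections flagged SHF_ALLOC are part of the memory image.
uint64_t section_mem_bytes(const Elf64_Shdr *shdr) {
    return (shdr->sh_flags & SHF_ALLOC) ? shdr->sh_size : 0;
}
```

For a .bss header with sh_type set to SHT_NOBITS, the SHF_ALLOC flag set, and sh_size of 516, the first helper returns 0 while the second returns 516.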

In the source code of GNU binutils, which GNU ld is part of, we find the following snippet:

// GNU binutils @ 658ba81aef5 > bfd/elf.c > line 2627
static const struct bfd_elf_special_section special_sections_b[] =
{
  { STRING_COMMA_LEN (".bss"), -2, SHT_NOBITS,   SHF_ALLOC + SHF_WRITE },

So, every symbol that belongs to .bss benefits from this shortcut. Compilers place symbols in this section if their memory is uninitialized or initialized to zero. As a result, the output of readelf -Wl for a typical Linux program written in C looks like this:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
...
  LOAD           0x002df0 0x00003df0 0x00003df0 0x000224 0x000450 RW  0x1000
...
 Section to Segment mapping:
  Segment Sections...
...
   05     .init_array .fini_array .dynamic .got .data .bss 
...

We see that .bss is part of a LOAD segment that is readable and writable, and that the segment's file size (0x224) is smaller than its memory size (0x450). Another interesting detail is that .bss is the last section of that segment. I do not know if this optimization has a special name, but we can clearly see that the file size doesn’t equal the memory size. Hence, an ELF loader needs to provide and zero this extra memory and can’t just load the ELF into memory as it is.
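What a loader has to do for such a segment can be sketched in a few lines of C. This is a simplified illustration, not the code of any real loader: load_segment is a hypothetical helper, the file is assumed to be fully available in memory as file_image, and dest is assumed to point to at least memsz bytes of writable memory.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper: places one LOAD segment into memory.
// Copies `filesz` bytes from the file image at `offset`, then zero-fills
// the remaining `memsz - filesz` bytes (this is where `.bss` lives).
// Returns the number of zero-filled bytes, or -1 if filesz > memsz.
int64_t load_segment(uint8_t *dest, const uint8_t *file_image,
                     uint64_t offset, uint64_t filesz, uint64_t memsz) {
    if (filesz > memsz) {
        return -1; // malformed segment: more file bytes than memory
    }
    memcpy(dest, file_image + offset, filesz);
    memset(dest + filesz, 0, memsz - filesz);
    return (int64_t)(memsz - filesz);
}
```

With the values from the readelf output above (offset 0x2df0, file size 0x224, memory size 0x450), such a loader would copy 0x224 bytes and zero-fill 0x450 - 0x224 = 0x22c (556) bytes.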

In a small experiment, based on code you can find on my GitHub (experiment 1, experiment 2), I was able to verify these statements. To summarize: GNU ld applies this space-saving optimization if a section is of type SHT_NOBITS and is placed as the last section in its LOAD segment. Otherwise, it does not.

In another blog post, I explained how we can work around this and guarantee that all memory that .bss takes also lands “as is” in the file.


