GNU ld Discards Section Containing Code – Section Flags are Important for ELF Files
Update: Since I published this article, I have learned more from my own research and from hints by others, and I adjusted the blog post accordingly. Thanks for the help!
When it comes to low-level systems engineering and producing binaries, such as firmware or kernels, dealing with linkers and especially linker scripts feels like the end boss in a video game. Only very little (good) documentation can be found on the Web. This blog post is the result of a 3 to 4-hour troubleshooting session where my colleague Thomas Prescher (known from the Meltdown paper) supported me – Thanks a lot!
TL;DR
In the end, we didn’t encounter a bug, but a behavior during linking that we did not expect. Specifying section flags in GNU assembly (GAS) is important and should not be omitted when using the GNU Assembler (as). The flags end up in the resulting object file, and the linker takes them into account at link time. This is described in the ELF specification. There is a minimal reproducible example on GitHub if you prefer to read code instead of a write-up.
Scope
This blog post covers topics such as the GNU Assembler (as), assembly language with GNU syntax (GAS), ELF internals, and linking assembled code together with code produced from a high-level language. The findings of this blog post apply to C/C++ projects with global assembly as well as Rust projects with global assembly.
Objective
I was about to compile a kernel with Rust. My goal was to create an ELF file that looks like this output from readelf:
    Elf file type is EXEC (Executable file)
    Entry point 0x800000
    There are 2 program headers, starting at offset 64

    Program Headers:
      Type   Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD   0x001000 0x0000000000800000 0x0000000000800000 0x000011 0x000011 R E 0x1000
      LOAD   0x002000 0xffffffff88000000 0xffffffff88000000 0x00000f 0x00000f R E 0x1000

     Section to Segment mapping:
      Segment Sections...
       00     .init_asm
       01     .text
I want two load segments. Inside the segment containing the .text section(s), I want the compiled x86_64 code from a high-level language that uses 64-bit virtual addresses. Inside the segment containing the .init_asm section, I want to place code that is relevant for bootstrapping the x86_64 CPU: a mixture of 16-bit, 32-bit, and 64-bit code.
Issue Hunting Story
Next to the high-level code (C or Rust), I need some assembly code that does the early bootstrapping of the CPU. My initial intuition was to assign a dedicated section in the assembly code that holds the boot code:
    # start symbol must be globally available (linker must find it, don't discard it)
    .GLOBAL entry_asm
    .EXTERN entry_highlevel_lang

    .section .boot

    # Entry referenced by the ELF file
    entry_asm:
        movabs $0xdeadbeef, %r15
        jmp entry_highlevel_lang
        ud2
The corresponding linker script looks like the following:
    /*
     * Custom linker script that ensures that boot code (written in assembly) and code from high
     * level language (.text section) are placed in different segments.
     */

    /* Symbol comes from start.S */
    ENTRY(entry_asm)

    PHDRS
    {
        /*
           PT_LOAD FLAGS(5): The flags of an ELF program header. Always 32 bit long, also for
           64-bit ELFs. Also called "Segment Permissions" in ELF specification or "p_flags".
        */
        init_asm  PT_LOAD FLAGS(5); /* 0b101 */
        kernel_rx PT_LOAD FLAGS(5); /* 0b101 */
    }

    SECTIONS {

        .init_asm 8M :
        {
            /* By the way: KEEP(*(.boot)); didn't work here either. */
            *(.boot)
        } : init_asm

        /* High level code (C or Rust) will be linked here. */
        .text 0xffffffff88000000 : ALIGN(4K)
        {
            *(.text .text.*)
        } : kernel_rx

        /DISCARD/ :
        {
            *(.comment .comment.*)
            *(.eh_frame)
        }
    }
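The FLAGS(5) value in the PHDRS block is a bit mask. As a minimal sketch, assuming the standard ELF permission bits PF_X = 1, PF_W = 2, and PF_R = 4, the following Python snippet decodes such a value into the three-character form that readelf prints in its Flg column. The helper name decode_p_flags is made up for illustration and is not part of any tool.

```python
# Segment permission bits (p_flags) as defined by the ELF specification.
PF_X = 0x1  # executable
PF_W = 0x2  # writable
PF_R = 0x4  # readable


def decode_p_flags(value: int) -> str:
    """Render p_flags like the 'Flg' column of `readelf -l` (e.g. 'R E').

    Hypothetical helper for illustration; not part of any tool.
    """
    return "".join(
        letter if value & bit else " "
        for bit, letter in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
    )


if __name__ == "__main__":
    # FLAGS(5) from the linker script: 0b101 = PF_R | PF_X
    print(repr(decode_p_flags(5)))
```

Both segments in the linker script use FLAGS(5), so both are readable and executable but not writable, which matches the R E columns in the readelf output.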
However, the readelf output looked like this:
    Program Headers:
      Type   Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD   0x0000b0 0x0000000000000000 0x0000000000000000 0x000000 0x000000 R E 0x1000
      LOAD   0x001000 0xffffffff88000000 0xffffffff88000000 0x00000f 0x00000f R E 0x1000

     Section to Segment mapping:
      Segment Sections...
       00
       01     .text
Notice that the virtual address of the code inside the first load segment is zero. This should be 8 MiB. Furthermore, Segment 00 contains no sections but it should contain the .init_asm section defined in the linker script. By the way, KEEP(*(.boot)); inside the linker script didn’t work either. With objdump, I could find that the code seems to be inside the binary at least:
    Disassembly of section .init_asm:

    0000000000800000 <entry_asm>:
      800000:   49 bf ef be ad de 00    movabs $0xdeadbeef,%r15
      800007:   00 00 00
      80000a:   e9 f1 ff 7f 87          jmp    ffffffff88000000 <entry_highlevel_lang>
      80000f:   0f 0b                   ud2

    Disassembly of section .text:

    ffffffff88000000 <entry_highlevel_lang>:
    ffffffff88000000:   f3 0f 1e fa             endbr64
    ffffffff88000004:   55                      push   %rbp
    ffffffff88000005:   48 89 e5                mov    %rsp,%rbp
    ffffffff88000008:   b8 00 00 00 00          mov    $0x0,%eax
    ffffffff8800000d:   5d                      pop    %rbp
    ffffffff8800000e:   c3                      ret
However, this seemed strange and broken. Why wasn’t the section showing up in readelf? After a long investigation, I could trace the problem down to a minimal reproducible example. I set up a GitHub repository. The commits in it correspond to the steps in the following paragraphs. For simplicity, I decided to use assembly in combination with C, as the problem was in the interaction between the object files generated from the assembly file and the linker. However, the same applies if you use global assembly from a Rust project. I verified that.
Anyway, back to the actual problem. Next, I found out that if I name the section .init in my assembly file and adjust the linker script to link the code from the *(.init) section, it works. This step corresponds to this commit in my demo project. Like, what the heck? Is the GNU linker using some kind of obscure allow list for section names?
In the next step, I found out that naming the section .text.boot, and using *(.text.boot) accordingly in the linker script, also works. The readelf output then looks fine (it is identical to the previous step where I named the section .init):
    Program Headers:
      Type   Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD   0x001000 0x0000000000800000 0x0000000000800000 0x000011 0x000011 R E 0x1000
      LOAD   0x002000 0xffffffff88000000 0xffffffff88000000 0x000038 0x000038 R E 0x1000

     Section to Segment mapping:
      Segment Sections...
       00     .init_asm
       01     .text
The virtual address is at 8 MiB and the final ELF file also looks good when I verify it in objdump. Segment 00 contains the section .init_asm. Great!
But this is a workaround and not a solution. Why can’t we use custom names for sections? Why is .init fine as a name? We tried different versions of GNU ld and readelf. We thought there might be an obscure bug somewhere. Close to giving up, I had the idea to compare my code one more time to other projects. I found out that some use .section .foo, "ax" inside the assembly. And folks, believe it or not, this does the trick! Also check out this commit. This small suffix is called section flags and is described in the GAS manual. It marks the section as allocatable and executable. However, the manual doesn’t say why this is necessary or how it relates to linker scripts.
I did some further investigation of the object file from start.S. Below, you find the output of readelf -WS start.o where start.S doesn’t contain section flags but a custom section name:
    There are 10 section headers, starting at offset 0x158:

    Section Headers:
      [Nr] Name               Type     Address          Off    Size   ES Flg Lk Inf Al
      [ 0]                    NULL     0000000000000000 000000 000000 00      0   0  0
      [ 1] .text              PROGBITS 0000000000000000 000040 000000 00  AX  0   0  1
      [ 2] .data              PROGBITS 0000000000000000 000040 000000 00  WA  0   0  1
      [ 3] .bss               NOBITS   0000000000000000 000040 000000 00  WA  0   0  1
      [ 4] .boot              PROGBITS 0000000000000000 000040 000011 00      0   0  1
      [ 5] .rela.lol          RELA     0000000000000000 0000f0 000018 18   I  7   4  8
      [ 6] .note.gnu.property NOTE     0000000000000000 000058 000030 00   A  0   0  8
      [ 7] .symtab            SYMTAB   0000000000000000 000088 000048 18      8   1  8
      [ 8] .strtab            STRTAB   0000000000000000 0000d0 000020 00      0   0  1
      [ 9] .shstrtab          STRTAB   0000000000000000 000108 000049 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      D (mbind), l (large), p (processor specific)
If I add the section flags, or use .init or .text.boot, the diff of readelf -WS start.o looks like this:
    <   [ 4] .boot              PROGBITS 0000000000000000 000040 000011 00  AX  0   0  1
    ---
    >   [ 4] .boot              PROGBITS 0000000000000000 000040 000011 00      0   0  1
You can see that the AX flags are added to the section flags of the .boot section. If I analyze the linked binary with readelf -WS c_kernel, I can see that the AX flags show up there as well in that case. Hence, the output section seems to inherit the flags of its input sections.
ELF Internals
In the end, this is not a specific behavior of the GNU assembler or the GNU linker. Actually, the answer can be found in the ELF specification in the “Section Attribute Flags” section. There, the allocatable and the executable flags, thus AX, are described. They are called SHF_ALLOC and SHF_EXECINSTR inside the spec. The description of the A flag says:
“The section occupies memory during process execution. Some control sections do not reside in the memory image of an object file; this attribute is off for those sections.”
ELF Spec (v1.2), Page 28 / Sections 1-14
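So yeah, it all makes sense now. Without SHF_ALLOC, the .boot section is not part of the memory image, so the linker has no reason to place it in a load segment. Section names such as .init or .text.boot are fine because the AX flags are set for them automatically, as the diff of the object file above shows. The predefined section names and their attributes can be found in Figure 1-2. Special Sections (Page 67 / Object Files 1-3).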
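To make the relation between the GAS flag string and the ELF section header more tangible, here is a minimal Python sketch. It assumes the standard ELF values SHF_WRITE = 0x1, SHF_ALLOC = 0x2, and SHF_EXECINSTR = 0x4. The helper names gas_flags_to_sh_flags and occupies_memory are hypothetical, and the sketch only handles the three letters a, w, and x, although GAS accepts more.

```python
# Section attribute flags (sh_flags) as defined by the ELF specification.
SHF_WRITE = 0x1      # 'w' in GAS, 'W' in readelf
SHF_ALLOC = 0x2      # 'a' in GAS, 'A' in readelf
SHF_EXECINSTR = 0x4  # 'x' in GAS, 'X' in readelf

# Assumption: only the three letters relevant to this post are covered.
GAS_LETTERS = {"a": SHF_ALLOC, "w": SHF_WRITE, "x": SHF_EXECINSTR}


def gas_flags_to_sh_flags(flags: str) -> int:
    """Translate a GAS flag string such as "ax" into an sh_flags bit mask.

    Hypothetical helper for illustration; not part of any tool.
    """
    value = 0
    for letter in flags:
        value |= GAS_LETTERS[letter]
    return value


def occupies_memory(sh_flags: int) -> bool:
    """True if the section is allocatable, i.e. part of the memory image."""
    return bool(sh_flags & SHF_ALLOC)


if __name__ == "__main__":
    # First the failing variant without flags, then the working one with "ax".
    for flags in ("", "ax"):
        value = gas_flags_to_sh_flags(flags)
        print(f'.section .boot, "{flags}" -> sh_flags={value:#x}, '
              f"occupies memory: {occupies_memory(value)}")
```

With an empty flag string, the mask is zero and the section does not count as part of the memory image, which corresponds to the empty Flg column of .boot in the readelf -WS output of the failing variant.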
Outcome & Findings
We finally know how to solve this issue. I learned that section flags exist, that section flags are stored in object files (start.o assembled from start.S), and that they are relevant for linking. This comes from the ELF specification. Without the "ax" flags, the linker doesn’t behave as you would expect from your linker script, depending on the name of your section.
In the end, it looks easy of course, but it was quite a long road to get there. There was little that helped with debugging along the way. However, we will never make this mistake again. Furthermore, I’ve gained a lot of experience with linker scripts and linking in general in the process. That’s awesome! 🙂
As advice for the future: write test scripts if you produce special ELF files, such as kernel binaries. With grep, objdump, and readelf, you can write basic tests that ensure that sections are in the segments where you expect them.
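Such a test could look roughly like the following Python sketch. It parses the "Section to Segment mapping" part of the readelf program header output and asserts the expected layout. The helper name parse_segment_mapping is hypothetical, and the parser assumes the output layout shown in this post. To keep the example self-contained, it runs on a sample string instead of invoking readelf.

```python
import re


def parse_segment_mapping(readelf_output: str) -> dict:
    """Parse the 'Section to Segment mapping' part of `readelf -lW` output.

    Returns a dict from segment index to the list of section names in it.
    Hypothetical helper; it assumes the layout shown in this post.
    """
    mapping = {}
    in_mapping = False
    for line in readelf_output.splitlines():
        if "Section to Segment mapping" in line:
            in_mapping = True
            continue
        if not in_mapping:
            continue
        # Data lines start with the segment index, e.g. "   00     .init_asm".
        match = re.match(r"\s*(\d+)\s*(.*)$", line)
        if match:
            mapping[int(match.group(1))] = match.group(2).split()
    return mapping


# Sample taken from the desired readelf output in this post.
SAMPLE = """\
 Section to Segment mapping:
  Segment Sections...
   00     .init_asm
   01     .text
"""

if __name__ == "__main__":
    mapping = parse_segment_mapping(SAMPLE)
    assert mapping[0] == [".init_asm"], "boot code is missing from segment 00"
    assert mapping[1] == [".text"], "kernel code is missing from segment 01"
    print(mapping)
```

In a real test, the input string would come from running readelf -lW on the linked binary (c_kernel in the demo project), for example via subprocess.run with capture_output=True and text=True. With the broken variant from the beginning of this post, segment 00 maps to an empty list and the first assertion fails.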