Kernel/Boot: Cope with a Relocation by a Bootloader in 32-bit x86 Assembly Code

Published by Philipp Schuster on

A kernel should be relocatable by bootloaders, as this is the only way to guarantee that it doesn’t collide with memory that is already in use by the firmware. This is especially true for the legacy boot flow on x86, i.e., without UEFI.

Update 2024-01-29: I just published a new blog post focusing more on the background. It might be worth a read for you!

TL;DR

# find the relocation offset and store it in %edx
call    1f
1:
pop     %edx
sub     $1b,    %edx 

Introduction

There are multiple ways to make a kernel relocatable. One option is to use the “relocatable” header tag when the kernel uses the Multiboot2 boot flow. The bootloader may then load the kernel at an address other than its link address, so the assembly code must cope with the relocation at runtime. For example, symbols, such as a buffer, are usually referenced by their absolute link address, which is wrong once the image has been moved. Hence, the assembly code must find its load offset and use it to resolve the actual address of the symbols it needs.
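
To make the “relocatable” tag more concrete, here is a minimal sketch of a Multiboot2 header in GAS syntax that contains it. The section name, the label names, and all address and alignment values are assumptions for illustration only; pick values that match your own linker script and memory layout.

```asm
/* Sketch of a Multiboot2 header with a "relocatable" header tag (type 10). */
.section .multiboot2_header, "a"    # section name is an assumption
.balign 8
mb2_header_start:
    .long   0xe85250d6                              # Multiboot2 header magic
    .long   0                                       # architecture: 0 = i386
    .long   mb2_header_end - mb2_header_start       # total header length
    # checksum: magic + architecture + length + checksum must be 0 (mod 2^32)
    .long   -(0xe85250d6 + 0 + (mb2_header_end - mb2_header_start))

    .balign 8
    .word   10              # type: relocatable header tag
    .word   0               # flags: bit 0 clear = tag is not optional
    .long   24              # size of this tag in bytes
    .long   0x00200000      # min_addr: lowest allowed load address (assumed)
    .long   0xf0000000      # max_addr: highest allowed end address (assumed)
    .long   0x00200000      # align: required image alignment (assumed, 2 MiB)
    .long   0               # preference: 0 = none, 1 = lowest, 2 = highest

    .balign 8
    .word   0               # type: end tag
    .word   0               # flags
    .long   8               # size
mb2_header_end:
```

With such a tag, the bootloader is free to choose any suitably aligned load address inside the given window, which is exactly the situation the boot code has to handle.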

In this blog post, I show one approach that works for me: a boot code/init routine (or however you want to call it) that copes with being relocated. This init routine is written in x86 32-bit assembly code. Eventually, the code might jump into higher-level C code after the page tables are initialized, but that is not covered by this post.

To get the relocation offset, we need to calculate the difference between the link address of a known instruction and the address at which that instruction actually runs. There are different ways of achieving this. For example, in 32-bit x86 assembly, this can be done with a call to a known place and an extraction of the previous instruction pointer, i.e., the return address, from the stack. On x86 in 64-bit mode, one would use the lea instruction with RIP-relative addressing instead. The approach with call works like this:

# find the relocation offset and store it in %edx
call    1f
1:
pop     %edx
sub     $1b,    %edx
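
For comparison, the 64-bit variant with lea could look like the following minimal sketch. It is not taken from my kernel. It assumes a statically linked, non-PIE image, so that the immediate `$1b` resolves to the link address of the label, and it uses %rax as an arbitrarily chosen scratch register.

```asm
.code64
    # Sketch: find the relocation offset in 64-bit mode, result in %rdx.
    # No stack is needed. Clobbers: %rax, %rdx, flags
1:
    lea     1b(%rip),   %rdx    # runtime address of label "1" (RIP-relative)
    movabs  $1b,        %rax    # link address of label "1" (absolute)
    sub     %rax,       %rdx    # offset = runtime address - link address
```

The rest of this post stays with the 32-bit variant that uses call.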

As we use call, we need a valid stack, i.e., a sane value in the %esp register. But where do we get one? We could guess, but do we really want that? Of course not. Multiboot’s boot flow doesn’t give us a stack. Luckily, after a Multiboot handoff (either version, 1 or 2), the %ebx register holds a pointer to the Multiboot information structure, which lives in valid physical RAM. We borrow a few bytes of it as our temporary stack and restore them afterwards. 😈

I created the following macros in GNU Assembler (GAS) syntax that perform all relevant steps. They are thoroughly commented:

/*
 * Calculates the relocation offset of the assembly code during runtime for x86
 * in 32-bit mode. In 64-bit mode, you should use the `lea` instruction as it
 * doesn't need the stack. The relocation offset is stored in %edx.
 *
 * This macro only needs to be called once. The macro itself is position
 * independent. It requires a valid stack with room for one 4-byte push.
 *
 * Clobbers: %edx, flags
 */
.macro M_CALC_RELOCATION_OFFSET_32BIT
    /*
     * Relative call to local label "1". This pushes the instruction pointer
     * (ip) of the next instruction onto the stack.
     */
    call    1f  # 1f: forward search for label "1"

    /*
     * After the call, the stack contains the return address, i.e., the
     * runtime address of the next instruction, which is label "1". Then, the
     * ip is popped from the stack into %edx.
     */

    1: pop  %edx

    /*
     * We subtract the link address of label "1" from the actual ip. This
     * register now contains the offset between link address and load address
     * (=runtime address).
     */

    sub     $1b,    %edx    # 1b: backward search for label "1"
.endm

/*
 * Wrapper around `M_CALC_RELOCATION_OFFSET_32BIT` that sets up a temporary
 * stack from a provided memory address. The macro restores the original
 * content of that memory after the relocation offset was calculated.
 *
 * Parameters:
 * - reg_tmp_stack: Register that holds the begin address of an at least 4-byte
 *   long physical memory area that is writeable. Must not be %eax, %edx, or
 *   %esp.
 * Clobbers: %eax, %edx, %esp, flags
 */
.macro M_CALC_RELOCATION_OFFSET_32BIT_WRAPPER reg_tmp_stack
        # prepare minimal stack for the next call
        mov     (\reg_tmp_stack),   %eax    # save original memory content
        # `call` decrements %esp by 4 before it stores the return address, so
        # %esp must point to the END of the 4-byte area. The return address
        # then lands exactly in the bytes that were saved above.
        lea     4(\reg_tmp_stack),  %esp

        M_CALC_RELOCATION_OFFSET_32BIT

        # restore memory that was used as stack
        mov     %eax,           (\reg_tmp_stack)
.endm

The macro can be used like this:

/* include macros here */

.code32
.global start
start:

    # Find offset of the relocation and store it in %edx.
    M_CALC_RELOCATION_OFFSET_32BIT_WRAPPER     %ebx

Now we have the load offset in register %edx. As long as %edx is not overwritten, we can add this value to any known link address to get the actual runtime address. We can create a dedicated macro for that:

/*
 * Resolves the runtime address of a symbol, i.e., of a link address, and
 * stores the value in %eax. For this, it adds the relocation offset onto that
 * address. The relocation offset is expected in %edx, which is only read.
 *
 * This is useful to find symbols if the code was relocated.
 *
 * Parameters:
 * - link_addr: Link address either as symbol name or as numeric value.
 * Clobbers: %eax, flags
 */
.macro M_RESOLVE_RUNTIME_ADDR link_addr
    mov     $\link_addr,    %eax
    add     %edx,           %eax    # add relocation offset
.endm
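
As a design alternative, the same addition can be done with a single lea, which leaves the flags untouched and lets the caller choose the destination register. The following is a minimal sketch; the macro name `M_RESOLVE_RUNTIME_ADDR_LEA` and the parameter `reg_dst` are hypothetical and not part of my kernel, and the addresses in the usage line are made-up example values.

```asm
/*
 * Hypothetical alternative to M_RESOLVE_RUNTIME_ADDR: computes
 * link_addr + relocation offset with a single `lea`.
 *
 * Expects the relocation offset in %edx (only read).
 *
 * Parameters:
 * - link_addr: Link address either as symbol name or as numeric value.
 * - reg_dst:   Register that receives the runtime address.
 * Clobbers: reg_dst
 */
.macro M_RESOLVE_RUNTIME_ADDR_LEA link_addr, reg_dst
    lea     \link_addr(%edx),   \reg_dst
.endm

.code32
    # Example: with a relocation offset of 0x00100000 in %edx, the link
    # address 0x00200000 resolves to the runtime address 0x00300000 in %eax.
    M_RESOLVE_RUNTIME_ADDR_LEA 0x00200000, %eax
```

Both variants produce the same address. The mov/add version is a bit easier to read, whereas the lea version is handy when the flags or %eax must be preserved.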

All together, it might look like the following:

/* include macros here */

.code32
.global start
start:

    # Find offset of the relocation and store it in %edx.
    M_CALC_RELOCATION_OFFSET_32BIT_WRAPPER     %ebx

    # Find the runtime address of the top of the stack.
    M_RESOLVE_RUNTIME_ADDR stack_top

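    # Point %esp at the runtime address of the top of the stack.
    mov     %eax,   %esp

    # ... continue with the rest of the boot code here ...

# Backing storage for the stack (1 KiB). Execution must never fall through
# into this data.
.align 4
.zero 1024
stack_top: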

This code example calculates its relocation offset and finds the actual address of the top of the stack. Afterwards, the calculated address is moved to the %esp register. Now, we have a fully initialized and valid stack. Yeah! And that is all the magic that is required for relocatable 32-bit code. A working example that performs these actions and sets up page tables next can be found on my GitHub.

That was easy, wasn’t it? Feel free to copy my macros to your project. By the way: If you ask yourself how you can now jump into the higher half of the address space, where the code written in a high-level language lives, check out my GitHub link above. You need to set up proper page tables. For setting up page tables, you also need to find their backing memory, which might be relocated as well. My example takes care of that.
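
To give an idea of what finding the backing memory of page tables means, here is a minimal sketch. It is not the code from my GitHub example. It assumes classic 32-bit paging without PAE, that %edx still holds the relocation offset, and that the bootloader relocates the image by a multiple of 4096 bytes so that the tables stay page-aligned. The symbols `boot_page_dir` and `boot_page_table` are hypothetical.

```asm
.code32
.section .text
    # Assumption: %edx still holds the relocation offset.

    # Runtime (= physical) address of the page table, plus entry flags.
    mov     $boot_page_table,   %eax
    add     %edx,               %eax
    or      $0x3,               %eax    # bit 0: present, bit 1: writable

    # Runtime address of the page directory.
    mov     $boot_page_dir,     %edi
    add     %edx,               %edi

    mov     %eax,               (%edi)  # directory entry 0 -> page table
    mov     %edi,               %cr3    # %cr3 needs the runtime address, too

    # ... fill the page table and enable paging in %cr0 here ...

.section .bss
.balign 4096
boot_page_dir:                          # hypothetical symbol
    .zero   4096
boot_page_table:                        # hypothetical symbol
    .zero   4096
```

The important point is that every address written into a table entry or into %cr3 is a runtime address. Using the plain link address of a table would point the MMU at the wrong memory after a relocation.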


Philipp Schuster

Hi, I'm Philipp and interested in Computer Science. I especially like low level development, making ugly things nice, and de-mystify "low level magic".
