Rust insight: memory layout of “&str”

Published by Philipp Schuster on

TL;DR: At least on x86_64, a &str is 16 bytes long: two 64-bit words. The first is a pointer to the string data and the second is the length of the string in bytes.

During a small procrastination session, I wanted to know what a &str looks like in memory (on x86_64). For this I used a trick called type punning and created a union that contains a &str and a u8 array. On my system std::mem::size_of::<&str>() returns 16. Therefore the type definition must look as follows.

union RustStrLayout {
    // sizeof "&str"
    bytes: [u8; 16],
    str: &'static str,
}
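
The 16 in the array length is an observation about the target, not something the language promises. As a small sketch (the helper name str_ref_size is made up for this example), one can compare the size of a &str with the size of a usize and of a thin reference. On current compilers a &str takes two machine words, but this should be treated as an assumption rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Size in bytes of a `&str` on the current target (hypothetical helper).
fn str_ref_size() -> usize {
    size_of::<&str>()
}

fn main() {
    // A thin reference is one machine word ...
    println!("size_of::<&u8>()   = {}", size_of::<&u8>());
    println!("size_of::<usize>() = {}", size_of::<usize>());
    // ... while `&str` currently takes two of them (pointer + length).
    println!("size_of::<&str>()  = {}", str_ref_size());
    // Holds on current compilers; not a documented guarantee.
    assert_eq!(str_ref_size(), 2 * size_of::<usize>());
}
```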

I created a variable that contains the string “Na Moin :)” (it’s a German phrase). Using our knowledge of ASCII, UTF-8 and C, this string would be 10+1 bytes long as a C string (one extra byte for the null terminator). But because a &str in Rust is more than a regular C string, the output of println!("bytes: {:#?}", foobar.bytes); gives us

bytes: [
    32,
    176,
    76,
    109,
    187,
    85,
    0,
    0,
    10,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
]

As you can clearly see, the string content itself is not contained in the union. The first 8 bytes are most probably a 64-bit address (stored little-endian), and the 10 in the second 64-bit value should be the length of the string slice in bytes (without a null byte). So I just tried using the address as a c_char pointer (as in C), and it worked. Please note: this only works because a zero byte happens to follow the string data in the binary. Rust does not guarantee that a &str is null-terminated, since it always knows the length, so reading past the end of the slice like this is undefined behavior and must not be relied on, not even for static strings. For a real C string, use std::ffi::CString or CStr. But anyway, here is my full code. You can also execute it in the Rust playground.

use std::ffi::CStr;
use std::os::raw::c_char;

union RustStrLayout {
    // sizeof "&str"
    bytes: [u8; 16],
    str: &'static str,
}

fn main() {
    println!("sizeof str: {}", std::mem::size_of::<&str>());

    // initialize with zeroes
    let mut foobar = RustStrLayout {
        bytes: [0; 16],
    };
    foobar.str = "Na Moin :)";

    unsafe {
        println!("{:#?}", foobar.str);
        println!("bytes: {:#?}", foobar.bytes);

        // reinterpret the 16 bytes as two native-endian u64 values (assumes a 64-bit target)
        let double_words = std::mem::transmute::<[u8; 16], [u64; 2]>(foobar.bytes);
        let ascii_addr = double_words[0];
        let ascii_len = double_words[1];
        let ascii_ptr = ascii_addr as *const c_char;
        // UNSOUND in general: assumes a null byte follows the string data,
        // which Rust does not guarantee for a &str
        let ascii_str = CStr::from_ptr(ascii_ptr);
        println!(
            "ascii_addr = 0x{:x}, len = {}, c_string = {:?}, is_null_terminated={}",
            ascii_addr,
            ascii_len,
            ascii_str,
            *ascii_ptr.add(ascii_len as usize) == 0,
            // *(ascii_ptr.add(ascii_len as usize) as *const char) == '\0',
        );
    }
}

A possible output is:

sizeof str: 16
"Na Moin :)"
bytes: [
    32,
    176,
    76,
    109,
    187,
    85,
    0,
    0,
    10,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
]
ascii_addr = 0x55bb6d4cb020, len = 10, c_string = "Na Moin :)", is_null_terminated=true
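
The same two values can be read without a union or transmute. The following is a minimal sketch using the standard str::as_ptr and str::len methods; the helper names raw_parts and rebuild are invented for illustration. The printed address will differ from run to run.

```rust
use std::slice;
use std::str;

/// Splits a string slice into data pointer and length in bytes
/// (hypothetical helper built on the standard accessors).
fn raw_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Rebuilds the string slice from its two parts (hypothetical helper).
fn rebuild(s: &'static str) -> &'static str {
    let (ptr, len) = raw_parts(s);
    // SAFETY: `ptr` and `len` come straight from a live `&'static str`,
    // so they describe `len` readable bytes that are never freed.
    let bytes: &'static [u8] = unsafe { slice::from_raw_parts(ptr, len) };
    str::from_utf8(bytes).expect("bytes came from a valid &str")
}

fn main() {
    let s: &'static str = "Na Moin :)";
    let (ptr, len) = raw_parts(s);
    println!("addr = {:p}, len = {}", ptr, len);

    // Pointer plus length is enough to get the original string back;
    // no null terminator is involved.
    assert_eq!(rebuild(s), s);
}
```

Unlike the union version, this never reads beyond the len bytes that belong to the slice.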

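Whether a zero byte follows the string data depends on what happens to sit next to it in memory. Here is a sketch that stays within safe Rust (the helper names byte_after_prefix and to_c_string are hypothetical): a sub-slice shares the pointer of the original string, and the byte right after it is simply the next character, not a terminator. When a null-terminated string is actually needed, std::ffi::CString appends one explicitly.

```rust
use std::ffi::CString;

/// Returns the byte that follows the first `prefix_len` bytes of `full`,
/// if there is one (hypothetical helper, fully bounds-checked).
fn byte_after_prefix(full: &str, prefix_len: usize) -> Option<u8> {
    full.as_bytes().get(prefix_len).copied()
}

/// Copies `s` into an owned, null-terminated C string (hypothetical helper).
fn to_c_string(s: &str) -> CString {
    CString::new(s).expect("string must not contain a NUL byte")
}

fn main() {
    let full = "Na Moin :)";
    let sub = &full[..2]; // "Na"

    // The sub-slice starts at the same address as the full string ...
    assert_eq!(sub.as_ptr(), full.as_ptr());
    // ... and the byte after it is the space character, not a 0 byte.
    assert_eq!(byte_after_prefix(full, sub.len()), Some(b' '));

    // CString stores the 10 content bytes plus one terminating 0 byte.
    let c_string = to_c_string(full);
    assert_eq!(c_string.as_bytes_with_nul().len(), full.len() + 1);
    assert_eq!(c_string.as_bytes_with_nul().last(), Some(&0u8));
    println!("{:?}", c_string);
}
```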
