Rust insight: memory layout of “&str”
Update: My knowledge of Rust has improved since I wrote this. A &str is simply a fat pointer: a pointer to the data plus a length. The original article follows below.
TL;DR: At least on x86_64, a &str is two u64 values (16 bytes) long. The first u64 is a pointer to the string data and the second u64 is the length of the string in bytes.
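The TL;DR can be checked without any unsafe code, as a minimal sketch: `as_ptr()` and `len()` expose the two halves of the fat pointer directly. The helper name `str_parts` is made up for this example, and the order of the two words in memory is an implementation detail rather than a language guarantee.

```rust
use std::mem::size_of;

/// Splits a string slice into the two values that make up its fat pointer:
/// the address of the first byte and the length in bytes.
/// (`str_parts` is a hypothetical helper, not a std function.)
fn str_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let s = "Na Moin :)";
    let (ptr, len) = str_parts(s);

    // On current compilers a &str is two machine words wide: pointer + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    println!("ptr = {:p}, len = {}", ptr, len);
}
```

On x86_64 both words are 8 bytes wide, which gives the 16 bytes mentioned above.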
During a small procrastination session, I wanted to know what the layout of &str looks like in memory (on x86_64). For this I used a trick called type punning and created a union that contains a &str and a u8 array. On my system, std::mem::size_of::<&str>() returns 16 (bytes). Therefore the type definition must look as follows.
    union RustStrLayout {
        // sizeof "&str"
        bytes: [u8; 16],
        str: &'static str,
    }
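The hard-coded 16 above only fits 64-bit targets. As a sketch of a more portable variant, the array length can be derived from `size_of`, which is a `const fn`. The names `STR_REF_SIZE`, `PortableStrLayout` and `raw_bytes` are invented for this example, and `#[repr(C)]` is an addition of mine so that both fields are guaranteed to start at offset 0.

```rust
use std::mem::size_of;

// Size of a `&str` on the target we compile for (16 on x86_64).
const STR_REF_SIZE: usize = size_of::<&'static str>();

// `repr(C)` guarantees that both fields start at offset 0.
#[repr(C)]
union PortableStrLayout {
    bytes: [u8; STR_REF_SIZE],
    str: &'static str,
}

/// Returns the raw bytes of the fat pointer behind `s`.
fn raw_bytes(s: &'static str) -> [u8; STR_REF_SIZE] {
    let layout = PortableStrLayout { str: s };
    // SAFETY: both fields have the same size and any byte is a valid u8.
    unsafe { layout.bytes }
}

fn main() {
    assert_eq!(size_of::<PortableStrLayout>(), STR_REF_SIZE);
    println!("bytes: {:?}", raw_bytes("Na Moin :)"));
}
```

The printed address differs from run to run, so only the size is asserted here.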
I created a variable that contains the string “Na Moin :)” (a German phrase). Coming from ASCII, UTF-8 and C, one would expect this string to occupy 10+1 bytes (one extra for the null terminator). But a Rust &str is more than a regular C string, and the output of println!("bytes: {:#?}", foobar.bytes); gives us
bytes: [ 32, 176, 76, 109, 187, 85, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, ]
Therefore, as you can clearly see, the string content itself is not contained in the union. The first 8 bytes are a 64-bit address (stored little-endian, so they read 0x55bb6d4cb020) and the 10 in the second 64-bit value is the length of the string slice in bytes (without any null byte). So I tried using the address as a c_char pointer (as in C) and it worked. Please note: Rust does not guarantee that a &str is null-terminated, because it always carries its length. In my runs there happened to be a zero byte right after this static string, but that is incidental and nothing to rely on; strictly speaking, CStr::from_ptr is only sound if a terminator is really there. If any reader knows more about this, please tell me! But anyway, here is my full code. You can also execute it in Rust playground.
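The pointer-plus-length reading can also be tested without leaning on a null terminator, which Rust, as far as I know, does not promise for string slices. The following sketch rebuilds a string from its two parts with `std::slice::from_raw_parts` and `std::str::from_utf8`; the helper `str_from_parts` is a name made up for this example.

```rust
use std::{slice, str};

/// Rebuilds a string slice from a raw pointer and a byte length.
/// (`str_from_parts` is a hypothetical helper, not a std function.)
///
/// # Safety
/// `ptr` must point to `len` readable bytes that stay alive and unmodified for `'a`.
unsafe fn str_from_parts<'a>(ptr: *const u8, len: usize) -> &'a str {
    // SAFETY: the caller guarantees that `ptr` and `len` describe valid memory.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    str::from_utf8(bytes).expect("bytes are not valid UTF-8")
}

fn main() {
    let original = "Na Moin :)";
    let sub = &original[3..7]; // "Moin", a slice into the middle of the literal

    // SAFETY: pointer and length come straight from a live &str.
    let rebuilt = unsafe { str_from_parts(sub.as_ptr(), sub.len()) };
    assert_eq!(rebuilt, "Moin");

    // The byte that follows "Moin" in memory is the space of the original
    // string, not a 0, so a C-style scan would run past the end of the slice.
    assert_eq!(original.as_bytes()[7], b' ');

    println!("rebuilt = {:?}", rebuilt);
}
```

The sub-slice case is the reason the length is part of the fat pointer: a &str may point into the middle of a larger buffer, where no terminator can exist.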
    use std::ffi::CStr;
    use std::os::raw::c_char;

    union RustStrLayout {
        // sizeof "&str"
        bytes: [u8; 16],
        str: &'static str,
    }

    fn main() {
        println!("sizeof str: {}", std::mem::size_of::<&str>());
        // initialize with zeroes
        let mut foobar = RustStrLayout { bytes: [0; 16] };
        foobar.str = "Na Moin :)";
        unsafe {
            println!("{:#?}", foobar.str);
            println!("bytes: {:#?}", foobar.bytes);
            let double_words = std::mem::transmute::<[u8; 16], [u64; 2]>(foobar.bytes);
            let ascii_addr = double_words[0];
            let ascii_len = double_words[1];
            let ascii_ptr = ascii_addr as *const c_char;
            // assumes/requires that string is null-terminated
            let ascii_str = CStr::from_ptr(ascii_ptr);
            println!(
                "ascii_addr = 0x{:x}, len = {}, c_string = {:?}, is_null_terminated={}",
                ascii_addr,
                ascii_len,
                ascii_str,
                *ascii_ptr.add(ascii_len as usize) == 0,
                // *(ascii_ptr.add(ascii_len as usize) as *const char) == '\0',
            );
        }
    }
A possible output is:
    sizeof str: 16
    "Na Moin :)"
    bytes: [ 32, 176, 76, 109, 187, 85, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, ]
    ascii_addr = 0x55bb6d4cb020, len = 10, c_string = "Na Moin :)", is_null_terminated=true