Live-Migration of QEMU/KVM VMs with libvirt: Insights, Cheat Sheet, Tips

Update: The original post is from November 2024. I have since updated it a couple of times to reflect my latest knowledge.

During my work as a virtualization engineer at Cyberus Technology, I work with libvirt from time to time. Specifically, I work on live-migration of QEMU/KVM VMs (QEMU as the VMM and KVM as the hypervisor). I want to share what I’ve learned, highlight the caveats I’ve encountered, and discuss the solutions that have worked for me. This article is constantly being expanded and improved.

Everything I describe here focuses on one of several use-cases of libvirt. libvirt supports different migration strategies and different VMMs/hypervisors; I’m focusing on live-migration with QEMU/KVM. All my experiments were done with Linux 6.12, libvirt 10.10, and QEMU 9.2.0.

VM (Live) Migration in Context & Terminology

The libvirt documentation discusses the multiple “Network data transport” and “Communication control path” strategies that libvirt supports:

Network data transport strategies:

  • Hypervisor native transport: The hypervisor performs the VM state transfer directly between two instances/processes. (In the terminology I prefer: Hypervisor == Virtual Machine Monitor (VMM))
  • libvirt tunneled transport: libvirt fetches the VM state from the VMM, tunnels the data through a libvirt-managed connection, and feeds the state into the VMM on the receiving side

Communication control path:

  • Managed direct migration: The libvirt driver for the given hypervisor (VMM) completely manages the migration from the client side (this can be a virsh client running on neither the VM source host nor the VM destination host)
  • Managed peer-to-peer migration: The libvirt driver for the given hypervisor (VMM) completely manages the migration (prepare/perform/finish) from the VM source host side (this is, by the way, the only strategy supported by the QEMU driver in libvirt).
  • Unmanaged direct migration: The libvirt driver for the given hypervisor (VMM) on the client forwards the request to a hypervisor-management layer that is not libvirt

So… what the heck does that mean for QEMU?! The QEMU libvirt driver actually only supports what is called “peer to peer” above, which we can learn from the code. This also means that the --p2p flag for virsh migrate is implicitly always set!

As we want to use QEMU’s built-in functionality for the VM state transport, we are not going to use --tunneled. This is also the default and refers to what is described above as “hypervisor native transport”. But for hypervisor native transport, we need to talk about --desturi and --migrateuri.

--desturi and --migrateuri

A virsh command to live-migrate a VM directly from a source host to a destination host (the command is invoked on the source host) might look as follows:

$ sudo virsh migrate \
  --live \
  --p2p \
  --domain <name-of-vm-domain> \
  --desturi qemu+ssh://<vm-dest-host>/system \
  --migrateuri tcp://192.168.123.1 \
  --parallel --parallel-connections 8 \
  --verbose

Let’s break that down: this spawns a blocking process that performs the whole live migration and waits until it is done. We want the migration to be live (the vCPUs of the VM keep running and keep making progress). We pass --p2p only for explicitness here; it is the only viable option we have for QEMU anyway. But what are --desturi and --migrateuri?

desturi:

  • Describes where to find the VM destination host and how to establish the connection with the corresponding libvirt daemon + driver (e.g., libvirt qemu driver) on that host. The URI is specified from the perspective of the source host. This also includes whether we connect to the driver in system or session mode.
    • Technical insight: Currently, the libvirt “remote” driver connects to the monolithic daemon on the receiver side to perform all necessary steps. This will change in the future with the modular daemon design where the connection directly lands at the virtqemud daemon, but this is an implementation detail.
  • Possible schemes are: qemu+ssh://, qemu+tcp://, qemu+tls://
  • The scheme only specifies the control path communication, not the transfer of the VM state (as we use the “hypervisor native transport”, see below)! A quick way to verify this control connection is shown right after this list.
  • 💡 The line qemu+tls://<host>/system is read as:
    “Connect to the QEMU libvirt driver on the receiver side in the “system” context using the libvirt TLS socket on the given host”
    –> +tcp instead would use the libvirt TCP socket to talk to the QEMU libvirt driver
    –> +ssh instead would use the normal SSH service to talk to the QEMU libvirt driver
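Before the actual migration, it can help to confirm that the chosen desturi works on its own. A minimal sketch, assuming the destination host vm-dest.example.com is reachable via SSH as root (both are placeholders); any read-only virsh command over that URI exercises the same control connection:

# verify the control connection (desturi) to the destination libvirt/QEMU driver
$ sudo virsh -c qemu+ssh://root@vm-dest.example.com/system hostname
$ sudo virsh -c qemu+ssh://root@vm-dest.example.com/system list --all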

migrateuri:

  • Describes the network destination as seen by QEMU. This may differ from the --desturi, for example when you want to use a specific interface (identified by its IP network); see the sketch after this list
  • Possible schemes are: tcp://, rdma:// (and more)
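For illustration, here is a sketch where the control connection uses the management hostname while the VM state is sent over a dedicated migration network (the hostname, the 10.0.50.2 address, and the port 49152 are assumptions for this example):

$ sudo virsh migrate \
  --live \
  --domain <name-of-vm-domain> \
  --desturi qemu+ssh://vm-dest.example.com/system \
  --migrateuri tcp://10.0.50.2:49152 \
  --verbose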

This means we have the following options to transfer data over the network (simplified commands):

  • virsh migrate --desturi qemu+tcp://host/system --migrateuri tcp://192.168.123.2
    –> communication path unencrypted, VM state transfer unencrypted
  • virsh migrate --desturi qemu+tls://host/system --migrateuri tcp://192.168.123.2
    –> communication path encrypted, VM state transfer unencrypted
  • virsh migrate --desturi qemu+ssh://host/system --migrateuri tcp://192.168.123.2
    –> communication path encrypted, VM state transfer unencrypted
  • virsh migrate --desturi qemu+tcp://host/system --migrateuri tcp://192.168.123.2 --tls
    –> communication path unencrypted, VM state transfer encrypted
  • virsh migrate --desturi qemu+ssh://host/system --migrateuri tcp://192.168.123.2 --tls
    –> communication path encrypted, VM state transfer encrypted

So, please learn the difference between these two properties! Both are important.

PS: The full desturi template is: driver[+transport]://[username@][hostname][:port]/[path][?extraparameters]. You can read more about it in the official docs.
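As a small sketch of those extra parameters, assuming an SSH transport on a non-standard port 2222 and a dedicated key file (host, port, and path are placeholders; keyfile is one of the parameters documented for the SSH transport in the libvirt URI docs):

$ sudo virsh -c "qemu+ssh://root@vm-dest.example.com:2222/system?keyfile=/root/.ssh/migration_key" list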

Command Cheat Sheet

Trigger a live migration (native QEMU transport for the VM state, with the control connection going through an SSH channel).

This spawns a blocking process that waits until the migration is done. The flags can be adapted accordingly to fit your use-case.

sudo virsh migrate --domain $DOMAIN --desturi qemu+ssh://$USER@$HOST/system --migrateuri tcp://$HOST --live --auto-converge --verbose
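For example, with placeholder values (adapt them to your environment), set:

DOMAIN=my-vm
USER=root
HOST=vm-dest.example.com

and then run the command above unchanged.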

Get Information and Statistics about an Ongoing Migration

This must be invoked on the sender side (the VM source host):

sudo virsh domjobinfo --domain $DOMAIN

The output looks something like this:

Job type:         Unbounded   
Operation:        Outgoing migration
Time elapsed:     280866       ms
Data processed:   26.086 GiB
Data remaining:   637.297 MiB
Data total:       16.009 GiB
Memory processed: 26.086 GiB
Memory remaining: 637.297 MiB
Memory total:     16.009 GiB
Memory bandwidth: 106.762 MiB/s
Dirty rate:       29459        pages/s
Page size:        4096         bytes
Iteration:        28          
Postcopy requests: 0           
Constant pages:   3782873     
Normal pages:     6815548     
Normal data:      25.999 GiB
Expected downtime: 9502         ms
Setup time:       64           ms
Compression cache: 64.000 MiB
Compressed data:  30.065 MiB
Compressed pages: 403027       
Compression cache misses: 6392597      
Compression overflows: 6739         
Auto converge throttle: 99    

If you poll this data frequently, you can derive nice statistics, especially (but not limited to) the vCPU throttling applied by the auto-converge feature and its impact on the memory dirty rate.
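A minimal polling sketch (assuming $DOMAIN is set and a migration is currently running); it prints the iteration, dirty rate, and auto-converge throttle once per second until the job disappears:

# poll the outgoing migration job once per second
while sudo virsh domjobinfo --domain $DOMAIN | grep -q 'Outgoing migration'; do
  sudo virsh domjobinfo --domain $DOMAIN | grep -E 'Iteration|Dirty rate|Auto converge throttle'
  sleep 1
done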

Get Information about a Completed Migration

This has to be invoked once the migration has finished, i.e., when the command from above no longer reports an active job:

sudo virsh domjobinfo --domain $DOMAIN --completed [--keep-completed]

The output looks something like this:

Job type:         Completed
Operation:        Outgoing migration
... as above
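To pull a few headline numbers out of the completed statistics, a simple grep over the fields shown above works; as far as I can tell, --keep-completed keeps the statistics around so they can be queried more than once:

sudo virsh domjobinfo --domain $DOMAIN --completed | grep -E 'Time elapsed|Memory total|Memory bandwidth'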

Abort an Ongoing Migration

sudo virsh domjobabort --domain $DOMAIN

Tips & Tricks

  • The VM that you want to migrate needs a CPU model configured that is compatible with both hosts. Configure something more portable, such as “IvyBridge” in virt-manager, rather than “host-passthrough” (unless you have identical CPUs). You can check which model names the hosts know; see the first sketch after this list.
  • The VM image must live on network storage that both hosts can reach. The corresponding network storage pool (libvirt terminology) must be configured manually on all libvirt hosts beforehand; see the second sketch after this list.
  • Always specify --migrateuri in conjunction with --desturi! Otherwise, weird behavior can happen regarding the path the VM state transfer takes! Set it for example like this: --desturi $migration_scheme://$DST_HOST/system and --migrateuri tcp://$DST_HOST. Without --migrateuri, I experienced situations where the sending VM host performed additional (reverse) DNS magic, which caused the migration to use a different network connection than the intended one (specifically, not the direct link that was attached to the machine alongside its public internet connection)!
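Two small sketches for the first two tips (the pool name, the NFS server, and the paths are assumptions for this example). First, list the CPU model names libvirt knows for an architecture so you can pick one that both hosts support; second, define and start the same network storage pool on every host:

# list CPU model names known for x86_64 (run on both hosts, pick a common model)
sudo virsh cpu-models x86_64

# define an NFS-backed storage pool for the VM images (repeat on every libvirt host)
sudo virsh pool-define-as vm-images netfs \
  --source-host nfs.example.com \
  --source-path /export/vm-images \
  --target /var/lib/libvirt/images/vm-images
sudo virsh pool-start vm-images
sudo virsh pool-autostart vm-images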
