Network Stall when doing live migration
W Kern
wkmail at bneit.com
Tue Sep 5 18:29:58 UTC 2023
Greetings.
I have a testbed of two stock Ubuntu 22.04 LTS libvirt hosts
using shared storage (MooseFS in this case, because it was readily available).
I have configured the MooseFS mount as a 'dir' storage pool on each machine;
both hosts use the same mount point, /MFS, via the MooseFS FUSE mount.
I am using an OVS bridge on each server to provide a live IP to the VMs.
Each OVS bridge is assigned its own Ethernet card, and the two
machines are on the same Cisco switch.
The Cisco switch is set up as a trunk with a VLAN, and virsh connects the
VM to the OVS bridge with that VLAN tag.
I can install and boot up individual VMs on each Host with no problem.
I can --offline migrate domains from one host to the other using virsh
migrate:
    virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --offline \
        --persistent --undefinesource --abort-on-error
Note the --unsafe flag, which seems to be required and prevents me from
using Cockpit for migration.
I then run virsh start U22-TEST, and that works fine.
So I am now trying a live migration using:
    virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --live \
        --verbose --persistent --undefinesource --abort-on-error
This works as well. I see the migration percentage climb, and at
100% the handover occurs, with the VM down on the source and up on the
second host. virsh console works at that point.
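
For reference, the offline and live invocations above differ only in their mode flags. A small sketch of a wrapper that prints either command (the function name migrate_cmd is hypothetical; the domain, URI, and flags are the ones used in this post):

```shell
# Hypothetical helper: print the virsh command for "offline" or "live" mode.
# It only echoes the command so it can be reviewed before being run.
migrate_cmd() {
    common="--unsafe --persistent --undefinesource --abort-on-error"
    case "$1" in
        offline) mode="--offline" ;;
        live)    mode="--live --verbose" ;;
        *)       echo "usage: migrate_cmd offline|live" >&2; return 1 ;;
    esac
    echo "virsh migrate U22-TEST qemu+ssh://x.x.x.126/system $mode $common"
}
```

Running `migrate_cmd live` prints the live-migration command; piping the output to `sh` would execute it.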
However, there is always a 2-3 minute period after the VM migrates (i.e.
comes up on the destination host) when the networking is dead.
After the 3 minute wait, the VM suddenly responds to a ping, ports are
open etc. Most of the time any SSH connections have timed out by then.
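
To put a number on the stall, something like the following could be run from a third machine on the same VLAN while the migration is in progress. This is only a sketch: the function names are made up for illustration, the guest IP is a placeholder, and the one-second probe interval is an arbitrary choice.

```shell
# Hypothetical helper: probe an address once per second and log
# "<epoch-seconds> up" or "<epoch-seconds> down" for each probe.
probe_loop() {
    target=$1   # guest IP (placeholder)
    count=$2    # number of probes to send
    i=0
    while [ "$i" -lt "$count" ]; do
        if ping -c 1 -W 1 "$target" >/dev/null 2>&1; then
            echo "$(date +%s) up"
        else
            echo "$(date +%s) down"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Read the log on stdin and print the longest outage in seconds, measured
# from the first failed probe to the next successful one. An outage that
# is still ongoing at the end of the log is not counted.
longest_outage() {
    awk '
        $2 == "down" && !gap { gap = 1; start = $1 }
        $2 == "up" && gap    { gap = 0; d = $1 - start; if (d > max) max = d }
        END { print max + 0 }
    '
}
```

For example, `probe_loop x.x.x.x 600 | tee ping.log | longest_outage` would record ten minutes of probes and report the longest gap once the loop finishes.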
I assume this is some sort of ARP issue, but where? Libvirt, OVS, or the
Cisco switch?
Is there some sort of additional step, flag, or even IOS config
suggestion that I can use to limit the network downtime?
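
One way to narrow down the "where" might be to compare the OVS MAC learning table on each host before and after the migration. The sketch below assumes that `ovs-appctl fdb/show <bridge>` prints the columns port, VLAN, MAC, and Age in that order; the bridge name ovsbr0, the MAC address, and the function name are all placeholders.

```shell
# Hypothetical helper: given `ovs-appctl fdb/show` output on stdin, print
# "<port> <age>" for the entry matching a MAC address (case-insensitive).
# Assumes the column order: port, VLAN, MAC, Age.
fdb_lookup() {
    awk -v mac="$1" 'tolower($3) == tolower(mac) { print $1, $4 }'
}

# Example use on each host (bridge name and MAC are placeholders):
#   ovs-appctl fdb/show ovsbr0 | fdb_lookup 52:54:00:aa:bb:cc
```

If the bridges look correct right after the migration but the guest is still unreachable, that would point toward the switch rather than OVS or libvirt.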
As a minor secondary issue, is there some additional XML flag (<shared>) I
can pass in the storage pool XML to indicate that it really is shared
media and doesn't need the --unsafe flag?
-wk