[virt-tools-list] VMs died due to hanging httpd processes
Dennis Jacobfeuerborn
dennisml at conversis.de
Sun Dec 12 14:40:35 UTC 2010
Hi,
about an hour ago two web-serving VMs died at the same time with the
following error on the console:
INFO: task httpd:4304 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 00af1f714d1112e2 0 4304 22471 4305 4303 (NOTLB)
ffff88006574bdc8 0000000000000282 00000000000041f8 ffff88006574bea8
000000000000000a ffff88009747b820 ffffffff804f4b00 00000000001a5eee
ffff88009747ba08 ffff880095be5015
Call Trace:
[<ffffffff8022d03c>] mntput_no_expire+0x19/0x89
[<ffffffff8020eeae>] link_path_walk+0xa6/0xb2
[<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff80223f33>] __path_lookup_intent_open+0x56/0x97
[<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
[<ffffffff8021b52d>] open_namei+0xea/0x6d5
[<ffffffff8029cb30>] set_process_cpu_timer+0xc7/0xd2
[<ffffffff80227caa>] do_filp_open+0x1c/0x38
[<ffffffff8021a364>] do_sys_open+0x44/0xbe
[<ffffffff802602f9>] tracesys+0xab/0xb6
Monitoring shows that in a timeframe of about 3 minutes the load on the
systems shot up to over 400 before they died. Since MaxClients is set to
512 I suspect that the processes had a mass-lockup with each process
constantly causing a load of 1 (similar to what happens when a process
hangs on an NFS mount point). One of the two VMs acts as an NFS server and
exports directories to the other VM (but doesn't mount any external NFS
sources itself).
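
For reference, on Linux the load average counts tasks in uninterruptible
sleep (state "D", as in the console output above) in addition to runnable
ones, which is why a mass lockup of httpd children can push the load into
the hundreds. Below is a minimal sketch of how one could list such tasks
while a guest is still responsive. It assumes a Python 3 interpreter is
available (the stock Python on CentOS 5 is older), and the helper names
parse_stat and blocked_tasks are made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, IndexError):
            continue  # process exited or stat was unreadable; skip it
        if state == "D":
            blocked.append((int(entry), comm))
    return blocked


if __name__ == "__main__":
    tasks = blocked_tasks()
    print(f"{len(tasks)} task(s) in uninterruptible sleep")
    for pid, comm in tasks:
        print(pid, comm)
```

Run periodically (for example from cron), this would show whether the
number of D-state httpd processes climbs along with the load before the
guest stops responding.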
What is strange is that both systems locked up at the same time, since they
are running on two different physical hosts. The hosts run CentOS 5.3 while
the VMs run CentOS 5.5 as PV Xen guests.
Since the call trace looks identical in both cases I wonder if anyone has
an idea what exactly went wrong here?
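
In case it helps anyone comparing traces from several guests, here is a
rough sketch of how the comparison could be done mechanically instead of
by eye: it reduces each trace to its sequence of symbol names and ignores
the addresses and offsets. The regular expression assumes the
"[<address>] symbol+0xoff/0xlen" frame format shown above, and the
function names are made up for illustration.

```python
import re

# Matches one frame such as "[<ffffffff8022d03c>] mntput_no_expire+0x19/0x89"
# and captures only the symbol name.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def symbols(trace_text):
    """Return the list of symbol names in a call trace, in order."""
    return FRAME.findall(trace_text)


def same_path(trace_a, trace_b):
    """True if both traces contain at least one frame and walk through
    the same functions in the same order."""
    a, b = symbols(trace_a), symbols(trace_b)
    return bool(a) and a == b


if __name__ == "__main__":
    sample = """\
[<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
[<ffffffff8021b52d>] open_namei+0xea/0x6d5
"""
    print(symbols(sample))
```

Matching symbol sequences only show that the tasks were blocked in the
same code path; they do not by themselves say what was holding the mutex.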
Regards,
dennis