Bug 91350

Summary: webdav: memory not being freed makes qemu crash
Product: Spice
Component: server
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: critical
Priority: medium
Reporter: Victor Toso <bugzilla>
Assignee: Spice Bug List <spice-bugs>
CC: bugzilla
Attachments: valgrind 1
valgrind 2
massif: without wip patches
massif: with wip patches

Description Victor Toso 2015-07-15 16:43:35 UTC
spice-server and QEMU from git master.

The memory consumption of the qemu process grows by roughly the size of the file being copied from client to guest with webdav. Copying a 4 GB file made my system freeze.

Steps to reproduce:

1-) build spice-gtk with --enable-webdav=yes
2-) enable webdav in your VM by following:
https://elmarco.fedorapeople.org/manual.html#_folder_sharing
3-) using remote-viewer with the webdav patches, connect to a Fedora guest
4-) Open nautilus, go to 'Browse Network'
5-) On remote-viewer, enable the shared folder via File > Preferences > [X] Share folder
6-) The spice client folder should appear: Double-click to mount it.
7-) Check the memory of your qemu process
8-) Copy a big file (let's say, 300 MB) from the shared folder to the local VM
9-) See the memory consumption of qemu grow by a lot

As far as I could follow in the code, the server does free the memory after reading/writing. I'm not sure whether it could be a bug in QEMU itself.

* Valgrind did not help, as the memory is (apparently) released correctly when the machine is shut down
* Disconnecting the client does not free the memory
Comment 1 Victor Toso 2015-07-15 16:49:19 UTC
Created attachment 117142 [details]
valgrind 1
Comment 2 Victor Toso 2015-07-15 16:49:49 UTC
Created attachment 117143 [details]
valgrind 2
Comment 3 Marc-Andre Lureau 2015-07-16 10:28:47 UTC
massif might be better to find where allocation takes place if you can't find it as a leak.
Comment 4 Victor Toso 2015-08-20 09:16:36 UTC
(In reply to Marc-Andre Lureau from comment #3)
> massif might be better to find where allocation takes place if you can't
> find it as a leak.

Thanks for the suggestion! I missed your comment, hence the big delay :(

I'm going to attach two massif outputs: the first one with current spice and qemu master (well, qemu is actually v2.4.0-rc3+), the second one with my WIP patches applied.

I ran the following test:

1-) run vm with massif
2-) connect with remote-viewer
3-) enable share-folders
4-) copy a 327M video from the shared folder to the guest fs
5-) umount webdav
6-) shutdown -h 1 # on terminal
7-) disconnect the client # as it should dealloc resources IMO
Comment 5 Victor Toso 2015-08-20 09:18:05 UTC
Created attachment 117802 [details]
massif: without wip patches
Comment 6 Victor Toso 2015-08-20 09:18:41 UTC
Created attachment 117803 [details]
massif: with wip patches
Comment 7 Marc-Andre Lureau 2015-08-20 11:08:03 UTC
so the two massif profiles aren't that different. But the second one has a weird peak spike, it seems this is the bad guy:

->44.37% (239,139,529B) 0x4EAA766: spice_realloc (mem.c:123)
| ->44.37% (239,137,425B) 0x4E37B98: __spice_char_device_write_buffer_get (char_device.c:544)
| | ->44.37% (239,137,069B) 0x4E8EAD7: spicevmc_red_channel_alloc_msg_rcv_buf (spicevmc.c:326)
| | | ->44.37% (239,137,069B) 0x4E4D184: red_channel_client_receive (red_channel.c:272)


240M... it looks wrong :)
Comment 8 Victor Toso 2015-08-20 13:04:36 UTC
Thanks for taking a look at this,

(In reply to Marc-Andre Lureau from comment #7)
> so the two massif profiles aren't that different. But the second one has a
> weird peak spike, it seems this is the bad guy:
> 
> ->44.37% (239,139,529B) 0x4EAA766: spice_realloc (mem.c:123)
> | ->44.37% (239,137,425B) 0x4E37B98: __spice_char_device_write_buffer_get
> (char_device.c:544)
> | | ->44.37% (239,137,069B) 0x4E8EAD7:
> spicevmc_red_channel_alloc_msg_rcv_buf (spicevmc.c:326)
> | | | ->44.37% (239,137,069B) 0x4E4D184: red_channel_client_receive
> (red_channel.c:272)
> 
> 
> 240M... it looks wrong :)

Well, the file is 327M :P

__spice_char_device_write_buffer_get tries to get a buffer from the memory pool queue; if the queue is empty, it creates another WriteBuffer, and after the data is written to the guest it inserts the WriteBuffer back into the memory pool queue.

The WIP patches limit the memory pool's maximum size to (10 * 65535 B) and also free the memory pool queue when the client disconnects.

But even after disconnection the memory is not freed in the qemu process.
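
To make the idea behind the WIP patches concrete, here is a minimal sketch of such a bounded pool. This is not the actual spice-server code: apart from the WriteBuffer name and the 10 * 65535 cap mentioned above, every identifier below is made up.

/* Minimal sketch of a bounded WriteBuffer pool (hypothetical names). */
#include <stdlib.h>

#define POOL_MAX_SIZE (10 * 65535)   /* cap used by the WIP patches */

typedef struct WriteBuffer {
    struct WriteBuffer *next;
    size_t size;
    unsigned char data[];
} WriteBuffer;

static WriteBuffer *pool_head = NULL;
static size_t pool_bytes = 0;

/* Take a buffer from the pool, or allocate a new one if the pool is
 * empty (or its head is too small). */
static WriteBuffer *write_buffer_get(size_t size)
{
    if (pool_head != NULL && pool_head->size >= size) {
        WriteBuffer *buf = pool_head;
        pool_head = buf->next;
        pool_bytes -= buf->size;
        return buf;
    }
    WriteBuffer *buf = malloc(sizeof(*buf) + size);
    if (buf != NULL)
        buf->size = size;
    return buf;
}

/* Called once the data has been written to the guest: keep the buffer
 * for reuse only while the pool stays under the cap, otherwise free it. */
static void write_buffer_release(WriteBuffer *buf)
{
    if (pool_bytes + buf->size > POOL_MAX_SIZE) {
        free(buf);
        return;
    }
    buf->next = pool_head;
    pool_head = buf;
    pool_bytes += buf->size;
}

/* Drop the whole pool, e.g. when the client disconnects. */
static void write_buffer_pool_free(void)
{
    while (pool_head != NULL) {
        WriteBuffer *next = pool_head->next;
        free(pool_head);
        pool_head = next;
    }
    pool_bytes = 0;
}

The unpatched behaviour is the same minus the cap check and the pool free on disconnect, which is why the pool can grow to roughly the size of the write backlog and then never shrink.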

QEMU also does use a lot of memory on this write

->49.64% (267,580,319B) 0x308B89: malloc_and_trace (vl.c:2724)
| ->49.38% (266,167,561B) 0x67CE678: g_malloc (gmem.c:97)
| | ->49.03% (264,241,152B) 0x511D8E: qemu_coroutine_new (coroutine-ucontext.c:106)
| | | ->49.03% (264,241,152B) 0x510E24: qemu_coroutine_create (qemu-coroutine.c:74)
(...)
Comment 9 Marc-Andre Lureau 2015-08-20 13:16:50 UTC
(In reply to Victor Toso from comment #8)
> Thanks for taking a look at this,
> 
> (In reply to Marc-Andre Lureau from comment #7)
> > so the two massif profiles aren't that different. But the second one has a
> > weird peak spike, it seems this is the bad guy:
> > 
> > ->44.37% (239,139,529B) 0x4EAA766: spice_realloc (mem.c:123)
> > | ->44.37% (239,137,425B) 0x4E37B98: __spice_char_device_write_buffer_get
> > (char_device.c:544)
> > | | ->44.37% (239,137,069B) 0x4E8EAD7:
> > spicevmc_red_channel_alloc_msg_rcv_buf (spicevmc.c:326)
> > | | | ->44.37% (239,137,069B) 0x4E4D184: red_channel_client_receive
> > (red_channel.c:272)
> > 
> > 
> > 240M... it looks wrong :)
> 
> Well, the file is 327M :P

OK, but the webdav channel uses 64k messages at most, iirc.

It's weird that webdav would have memory issues but not usbredir, for example.

> 
> __spice_char_device_write_buffer_get tries to get a buffer from the memory
> pool queue; if the queue is empty, it creates another WriteBuffer, and after
> the data is written to the guest it inserts the WriteBuffer back into the
> memory pool queue.
> 
> The WIP patches limit the memory pool's maximum size to (10 * 65535 B) and
> also free the memory pool queue when the client disconnects.

ah..

> 
> But even after disconnection the memory is not freed in the qemu process.

the pool may keep the memory across reconnections, no?

> QEMU also does use a lot of memory on this write
> 
> ->49.64% (267,580,319B) 0x308B89: malloc_and_trace (vl.c:2724)
> | ->49.38% (266,167,561B) 0x67CE678: g_malloc (gmem.c:97)
> | | ->49.03% (264,241,152B) 0x511D8E: qemu_coroutine_new
> (coroutine-ucontext.c:106)
> | | | ->49.03% (264,241,152B) 0x510E24: qemu_coroutine_create
> (qemu-coroutine.c:74)
> (...)

weird, it's like qemu would create 256 coroutines, maybe it does :)
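
(For scale: assuming the ~1 MiB stack qemu allocates per coroutine, the 264,241,152 B above is exactly 252 coroutine stacks, so a count in that ballpark is plausible.)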
Comment 10 Victor Toso 2015-08-20 13:30:07 UTC
> > > 240M... it looks wrong :)
> > 
> > Well, the file is 327M :P
> 
> OK, but the webdav channel uses 64k messages at most, iirc.

Yes, but with big files that means many chunks of 64k.

> It's weird that webdav would have memory issues but not usbredir, for example.

It might, if the data flow is fast enough to make the pool queue grow in spice-server and keep qemu busy with the I/O.
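
For a rough sense of scale (assuming one pooled WriteBuffer per 64k message): the ~239 MB that massif shows retained in the pool is about 239,137,425 B / 65,535 B ≈ 3,650 buffers, i.e. at some point that many chunks of the 327 MB file were queued for the guest at once, and once written they all went back into the pool and stayed there.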

> 
> > 
> > __spice_char_device_write_buffer_get tries to get a buffer from the
> > memory pool queue; if the queue is empty, it creates another WriteBuffer,
> > and after the data is written to the guest it inserts the WriteBuffer back
> > into the memory pool queue.
> > 
> > The WIP patches limit the memory pool's maximum size to (10 * 65535 B) and
> > also free the memory pool queue when the client disconnects.
> 
> ah..
> 
> > 
> > But even after disconnection the memory is not freed in the qemu process.
> 
> the pool may keep the memory across reconnections, no?

Usually it does, and for !webdav that was fine. With webdav it should not keep a huge pool, IMO.

The WIP frees the (10 * 65k) pool when no client is connected; I'll send this WIP to the spice mailing list shortly.

> 
> > QEMU also does use a lot of memory on this write
> > 
> > ->49.64% (267,580,319B) 0x308B89: malloc_and_trace (vl.c:2724)
> > | ->49.38% (266,167,561B) 0x67CE678: g_malloc (gmem.c:97)
> > | | ->49.03% (264,241,152B) 0x511D8E: qemu_coroutine_new
> > (coroutine-ucontext.c:106)
> > | | | ->49.03% (264,241,152B) 0x510E24: qemu_coroutine_create
> > (qemu-coroutine.c:74)
> > (...)
> 
> weird, it's like qemu would create 256 coroutines, maybe it does :)

Maybe! Should I open a bug there? :-)
Comment 11 Victor Toso 2015-08-24 12:43:04 UTC
> > 
> > > QEMU also does use a lot of memory on this write
> > > 
> > > ->49.64% (267,580,319B) 0x308B89: malloc_and_trace (vl.c:2724)
> > > | ->49.38% (266,167,561B) 0x67CE678: g_malloc (gmem.c:97)
> > > | | ->49.03% (264,241,152B) 0x511D8E: qemu_coroutine_new
> > > (coroutine-ucontext.c:106)
> > > | | | ->49.03% (264,241,152B) 0x510E24: qemu_coroutine_create
> > > (qemu-coroutine.c:74)
> > > (...)
> > 
> > weird, it's like qemu would create 256 coroutines, maybe it does :)
> 
> Maybe! Should I open a bug there? :-)

Just did:
https://bugzilla.redhat.com/show_bug.cgi?id=1256376
Comment 12 Victor Toso 2015-11-13 08:01:13 UTC
Patches that address the memory pool were pushed:
http://cgit.freedesktop.org/spice/spice/commit/?id=d7bee1bc56e2d3ea6af399
http://cgit.freedesktop.org/spice/spice/commit/?id=2832fdf25a9a2097efc585
http://cgit.freedesktop.org/spice/spice/commit/?id=a263c651e18266c2f05ca9

This bug was also split into:
-> Handling the write-queue size
https://bugs.freedesktop.org/show_bug.cgi?id=91903

-> Large number of coroutines in qemu
[fedora] https://bugzilla.redhat.com/show_bug.cgi?id=1256376
[upstream] https://bugs.launchpad.net/qemu/+bug/1493033

As the memory pool now has a limit and is freed on disconnect, I'll close this bug.
