Bug 34912 - [Piketon bisected]System hangs when run firefox-talos-gfx.trace
Summary: [Piketon bisected]System hangs when run firefox-talos-gfx.trace
Status: VERIFIED FIXED
Alias: None
Product: cairo
Classification: Unclassified
Component: xcb backend (show other bugs)
Version: 1.10.3
Hardware: All Linux (All)
: medium major
Assignee: Chris Wilson
QA Contact: cairo-bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-02 03:59 UTC by meng
Modified: 2011-03-06 16:40 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Workaround which makes the problem go away (447 bytes, patch)
2011-03-02 04:41 UTC, Uli Schlachter
Details | Splinter Review
Disable use of _cairo_xcb_picture_copy (527 bytes, patch)
2011-03-02 14:28 UTC, Uli Schlachter
Details | Splinter Review

Description meng 2011-03-02 03:59:31 UTC
System Environment:
--------------------------
Platform:              Piketon
Libdrm:		      (master)2.4.23-16-ge6018c25ca63fa6066d8fa6e57373030d07b0392
Mesa:		      (master)6ddfb322f58c7e00db73e25689ee55ffa1111bd9
Xserver:              (master)xorg-server-1.10.0
Xf86_video_intel:     (master)2.14.0-38-g6b926444c7c66e5d14eb911d9ab3d0e2d512bca4
Cairo:		      (master)c0dc933efda7672b07e188a1195821340f911a66
Kernel:	              (drm-intel-next)710f957846cff998c681f3701f6f90eda896458f

Bug detailed description:
-------------------------
System hangs when run firefox-talos-gfx.trace on Piketon.It's Cariro
regression.By bisected,12e41acf9c23618748036052f3403e6ece295796 is the first bad commit.

Reproduce steps:
----------------
1 xinit&
2 ./cairo-perf-trace firefox-talos-gfx.trace
Comment 1 Uli Schlachter 2011-03-02 04:39:20 UTC
To quote the commit message from 12e41acf9c2361:

    The only downside is that the cache enforced a limit of 16 MiB of pixel data
    that was used for source surfaces on the server. After this commit the Picture
    will be kept alive until the snapshot is detached. If this becomes a problem
    memory-wise, a new solution will have to be invented...

I did a quick awk script that parses the output of xtrace and counts the number of CreateFoo and FreeFoo requests and the number of live Pictures and Pixmaps goes through the roof. Removing the call to _cairo_surface_attach_snapshot() from _cairo_xcb_surface_picture() solves this problem (and AFAIK this causes the same behavior as with cairo-xlib).

Any ideas how we could limit the number of living cairo_xcb_picture_t without having a cache that keeps them alive longer than needed? (And especially not longer than cairo_surface_finish())
Comment 2 Uli Schlachter 2011-03-02 04:41:32 UTC
Created attachment 44018 [details] [review]
Workaround which makes the problem go away
Comment 3 Chris Wilson 2011-03-02 05:11:31 UTC
We should be able to keep around the Picture for a Pixmap, we just need to make sure that doing so doesn't incur a reference.

However, go for simplicity replace the attach_snapshot with a comment:
/* XXX Consider keeping the Picture associated with this source until
 * it is modified or destroyed; a snapshot. However, it must not
 * increase the lifetime of the underlying source.
 */
Comment 4 Uli Schlachter 2011-03-02 14:27:26 UTC
Turns out the problem lies deeper than this. Thanks to Andrea, I dug a little deeper and found a cyclic reference:

- _cairo_xcb_surface_picture() is called for a XCB surface
- This calls _copy_to_picture(source, FALSE) to make use of the existing Picture
- Let's assume the source owns its pixmap, then this calls _cairo_xcb_picture_copy()
- This creates a surface with: surface->owner = cairo_surface_reference (&target->base); 
  (-> The _cairo_xcb_picture_t has a reference to the "source" surface)
-  _cairo_xcb_surface_picture() now calls _cairo_surface_attach_snapshot() which causes the reference count for "source" to be increased.

So the result is:
"source" has the _cairo_xcb_picture_t as a snapshot and owns a reference for it.
At the same time, _cairo_xcb_picture_t owns a reference for the "source" since it co-uses its Picture.

(valgrind won't detect this because cairo_device_finish for the xcb device "cleans up" by finishing all surfaces which are still alive)

Easiest fix would be to remove _cairo_xcb_picture_copy.
Comment 5 Uli Schlachter 2011-03-02 14:28:33 UTC
Created attachment 44036 [details] [review]
Disable use of _cairo_xcb_picture_copy
Comment 6 meng 2011-03-02 21:52:45 UTC
(In reply to comment #5)
> Created an attachment (id=44036) [details]
> Disable use of _cairo_xcb_picture_copy

Test in cmmmit c0dc933efda7672b07e188a1195821340f911a66 with above patch,it works fine.
Comment 7 Uli Schlachter 2011-03-04 07:47:11 UTC
Fixed in master by removing _cairo_xcb_picture_copy (f1d313e042).
Comment 8 meng 2011-03-06 16:40:29 UTC
Verified with the commit f1d313e042af89b2f5f5d09d3eb1703d0517ecd7,it works fine when run firefox-talos-gfx.trace.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.