Lately my X setup has started hitting

kgem.h:341: kgem_bo_submit: Assertion `bo->refcnt' failed.

(along with similar bo->refcnt asserts in other locations of the intel ddx). This happens quite often: maybe once a day, sometimes even more. I am not sure what triggers it, though. Please tell me how I can help debug this issue. Regarding the previous IRC discussions: the badalloc issue still lingers, but this assert issue is hitting me rather badly.

-- Window manager: WindowMaker 0.95.5
-- chipset: 00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10)
-- system architecture: i686 / 32bit
-- xf86-video-intel: GIT 4d8f78bc95f8dd36693f74365dbc3c442fbbf8a9
-- xserver: X.Org X Server 1.14.5-1 from latest Debian testing
-- mesa: 9.2.2-1 from Debian testing
-- libpixman: 0.30.2-1
-- libdrm version: 2.4.50-1
-- kernel version: 3.12.6 (vanilla+grsec)
-- Linux distribution: current Debian Testing
-- Machine or mobo model: Asus P5KPL-CM
-- Display connector: VGA
Other than hoping that debug=full catches it within the first 2GiB, all I can suggest is to wait until I have even more assertions in there to try to catch the bug earlier. It will still be nigh impossible to find the double unref, but the hope is that with many stacktraces (i.e. who called the fatal submit) we should be able to build a picture of which bo is being unreffed too often, and, even more hopefully, when.
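To illustrate why a double unref is hard to pin down, and why recording who dropped each reference helps, here is a minimal sketch in C. It is not the driver's code: `struct toy_bo`, `toy_bo_init`, `toy_bo_ref`, `toy_bo_unref` and the `site` string are hypothetical names invented for this example, and the real `struct kgem_bo` is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; NOT the real struct kgem_bo. */
struct toy_bo {
    int refcnt;
    const char *last_unref_site; /* label of the caller that last dropped a ref */
};

static void toy_bo_init(struct toy_bo *bo)
{
    bo->refcnt = 1;              /* the creator holds the first reference */
    bo->last_unref_site = NULL;
}

static void toy_bo_ref(struct toy_bo *bo)
{
    assert(bo->refcnt > 0);      /* taking a reference on a dead bo is a bug */
    bo->refcnt++;
}

/* Drop one reference. Returns 0 on success, or -1 if there was no
 * reference left to drop (the double-unref case). On failure the bo is
 * left untouched, so last_unref_site still names the previous caller. */
static int toy_bo_unref(struct toy_bo *bo, const char *site)
{
    if (bo->refcnt <= 0)
        return -1;
    bo->refcnt--;
    bo->last_unref_site = site;
    return 0;
}
```

In this sketch the failing call only shows where the damage was noticed; `last_unref_site` names the most recent caller that dropped a reference, which is the kind of extra evidence that additional assertions and stacktraces are meant to collect.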
As promised:

commit bfabdb7ebf5e491da1e74f8b362f9c2f0b6f1ac5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 9 12:34:58 2014 +0000

    sna: Add regular refcnt checks on pixmap bo

    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Thanks, will try it out.
Any news? Tons of nice new paranoia in xf86-video-intel.git that should be interesting... :)
Sorry, nothing to report. The issue seems to have magically vanished with the additional asserts (or whatever commits came before those) .. but will keep on running to see if anything happens.
I have seen this (twice now), but have failed to capture it myself. However, there has been an interesting development,

commit c6a21f0355447d398a8b857ad046cd27141d4744
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 08:51:17 2014 +0000

    sna/glyphs: Reset composite state between switching glyph formats

    One path uses the mask channel, the other does not. We cannot rely on
    overwriting all reused state in this case, and so we must clear the
    composite state prior to use each time.

    Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74494
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>

which had a slightly different path - but could just conceivably be the cause here as well.
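The idea in that commit message, clearing a reused state structure before each use rather than trusting every path to overwrite every field, can be sketched as follows. This is only an illustration under assumed names: `struct toy_composite`, `setup_with_mask` and `setup_without_mask` are hypothetical and do not correspond to the driver's actual composite structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical composite state shared by two setup paths. */
struct toy_composite {
    int src;
    int mask;
    bool uses_mask;
};

/* Path A: uses the mask channel. */
static void setup_with_mask(struct toy_composite *op, int src, int mask)
{
    assert(op != NULL);
    memset(op, 0, sizeof(*op));  /* start from a known-clean state */
    op->src = src;
    op->mask = mask;
    op->uses_mask = true;
}

/* Path B: never touches the mask fields, so without the memset it
 * would inherit whatever path A left behind. */
static void setup_without_mask(struct toy_composite *op, int src)
{
    assert(op != NULL);
    memset(op, 0, sizeof(*op));
    op->src = src;
}
```

Without the `memset` in `setup_without_mask`, a structure previously used by `setup_with_mask` would still report `uses_mask` as true, which is the sort of stale state the commit message describes.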
Hmm, I've managed to reproduce one kgem_bo_submit() assert failure - but your original report suggests that the assert(bo->refcnt) was more widespread.

commit 0906769c1b92520351729c4d8f2ab684d3ddf2eb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 17:51:51 2014 +0000

    sna: Rearrange assertion to ease use of substitute cached bo

    Since we call kgem_bo_submit() along one path when synchronising a
    cached bo (which is known to be inactive) but still want to keep the
    assertion on the refcnt, simply rearrange the code to only assert on
    the active path.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Reported-by: Matti Hamalainen <ccr@tnsp.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit de73c5fd1cd4f948b8bd3582ae788f6f855c5b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 20:57:24 2014 +0000

    sna: Tweak assert_bo_retired() to be callable on cached bo

    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

These fix up a couple of assert(bo->refcnt) for paths where refcnt was genuinely 0. However, I don't think this explains your original bug. :|
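The rearrangement described in the first commit message can be pictured roughly like this. It is a sketch under assumptions, not the actual patch: `struct sketch_bo`, its `queued` flag and `sketch_bo_submit` are hypothetical names, and the real kgem_bo_submit() has a different signature and body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer object; `queued` stands in for "has work pending". */
struct sketch_bo {
    int refcnt;
    bool queued;
};

/* Returns true if there was pending work to flush. A cached, inactive
 * bo may legitimately have refcnt == 0, so the refcnt assertion is only
 * checked once we know we are on the active path. */
static bool sketch_bo_submit(struct sketch_bo *bo)
{
    if (!bo->queued)
        return false;        /* inactive (e.g. cached) bo: nothing to do */

    assert(bo->refcnt);      /* an active bo must still be referenced */
    bo->queued = false;      /* pretend the pending work was flushed */
    return true;
}
```

With the assertion placed before the early return, calling the function on a cached bo with a zero refcnt would abort even though nothing is wrong; placing it after the check keeps the assertion for the case where it is meaningful.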
Matti, please reopen if you have another sighting. I'm confident that, between bug 74496 and the bogus asserts I hit, I have the underlying issue and symptoms resolved.
Will update and monitor the situation. Thanks for your hard work, Chris, despite my near-inability to assist in this issue :|