Bug 73406

Summary: [SNA G33/31] xorg/ddx hits assertion `bo->refcnt' failed maybe once per day
Product: xorg
Reporter: Matti Hämäläinen <ccr>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high    
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:    
Bug Blocks: 79238    

Description Matti Hämäläinen 2014-01-08 17:53:01 UTC
Lately my X setup has started hitting kgem.h:341: kgem_bo_submit: Assertion `bo->refcnt' failed. (Similar asserts on bo->refcnt also fire in other locations in the intel ddx.) This happens quite often: maybe once a day, sometimes even more. I am not sure what triggers it, though.

Please instruct in how to help debug this issue.

In re: previous IRC discussions, the badalloc issue still lingers, but this assert issue is hitting me rather badly.

-- Window manager: WindowMaker 0.95.5
-- chipset: 00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10)
-- system architecture: i686 / 32bit
-- xf86-video-intel: GIT 4d8f78bc95f8dd36693f74365dbc3c442fbbf8a9
-- xserver: X.Org X Server 1.14.5-1 from latest Debian testing
-- mesa: 9.2.2-1 from Debian testing
-- libpixman: 0.30.2-1
-- libdrm version: 2.4.50-1
-- kernel version: 3.12.6 (vanilla+grsec)
-- Linux distribution: current Debian Testing
-- Machine or mobo model: Asus P5KPL-CM
-- Display connector: VGA
Comment 1 Chris Wilson 2014-01-08 21:44:04 UTC
Other than hoping that debug=full catches it within the first 2GiB, all I can suggest is to wait until I have even more assertions in there to try to catch the bug earlier. It will still be nigh impossible to find the double unref, but the hope is that with many stacktraces (i.e. who called the fatal submit) we should be able to build a picture of which bo is being unreffed too often and, even more hopefully, when.
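
[Illustration, not part of the original comment] The idea of scattering refcnt assertions to catch an over-unreferenced bo earlier can be sketched roughly as below. This is a minimal sketch under stated assumptions: `struct demo_bo` and the `demo_bo_*` helpers are hypothetical names invented for this example and are not the driver's `struct kgem_bo` or its API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer object; NOT the driver's struct kgem_bo. */
struct demo_bo {
    int refcnt;      /* number of live references */
    int unref_calls; /* debugging aid: how often unref was called */
};

/* Take a reference; referencing a bo that is already dead is a bug. */
static inline struct demo_bo *demo_bo_ref(struct demo_bo *bo)
{
    assert(bo->refcnt > 0);
    bo->refcnt++;
    return bo;
}

/* Drop a reference. Returns true when the last reference went away.
 * An unref past zero trips the assertion here, at the faulty caller,
 * rather than much later at some unrelated point of use. */
static inline bool demo_bo_unref(struct demo_bo *bo)
{
    assert(bo->refcnt > 0);
    bo->unref_calls++;
    return --bo->refcnt == 0;
}

/* Check intended for points of use, similar in spirit to asserting
 * bo->refcnt before a submit. */
static inline bool demo_bo_is_live(const struct demo_bo *bo)
{
    return bo->refcnt > 0;
}
```

The more call sites carry such a check, the closer the failing assertion (and its stacktrace) lands to the extra unref, which is the "picture" the comment above hopes to build.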
Comment 2 Chris Wilson 2014-01-09 12:36:23 UTC
As promised:

commit bfabdb7ebf5e491da1e74f8b362f9c2f0b6f1ac5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 9 12:34:58 2014 +0000

    sna: Add regular refcnt checks on pixmap bo
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 3 Matti Hämäläinen 2014-01-09 18:58:04 UTC
Thanks, will try it out.
Comment 4 Chris Wilson 2014-01-17 10:59:53 UTC
Any news? Tons of nice new paranoia in xf86-video-intel.git that should be interesting... :)
Comment 5 Matti Hämäläinen 2014-01-18 10:45:52 UTC
Sorry, nothing to report. The issue seems to have magically vanished with the additional asserts (or whatever commits came before those) .. but will keep on running to see if anything happens.
Comment 6 Chris Wilson 2014-02-04 10:00:04 UTC
I have seen this (twice now), but have failed to capture it myself. However, there has been an interesting development:

commit c6a21f0355447d398a8b857ad046cd27141d4744
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 08:51:17 2014 +0000

    sna/glyphs: Reset composite state between switching glyph formats
    
    One path uses the mask channel, the other does not. We cannot rely on
    overwriting all reused state in this case, and so we must clear the
    composite state prior to use each time.
    
    Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74494
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>

which had a slightly different path - but could just conceivably be the cause here as well.
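
[Illustration, not part of the original comment] The hazard described in that commit message, where one path fills in a mask channel and the other path leaves it untouched, can be sketched as follows. This is a minimal sketch under stated assumptions: `struct demo_composite_op`, `struct demo_channel` and both setup functions are hypothetical and do not reproduce the driver's composite structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical channel and operation state; not the driver's types. */
struct demo_channel {
    int bo_id;   /* which buffer the channel reads from */
    int in_use;  /* non-zero when the channel is active */
};

struct demo_composite_op {
    struct demo_channel src;
    struct demo_channel mask;
};

/* Path A: glyph format that needs both a source and a mask channel. */
static void demo_setup_with_mask(struct demo_composite_op *op, int src, int mask)
{
    memset(op, 0, sizeof(*op)); /* clear everything before reuse */
    op->src.bo_id = src;
    op->src.in_use = 1;
    op->mask.bo_id = mask;
    op->mask.in_use = 1;
}

/* Path B: glyph format that only uses the source channel. It never
 * writes op->mask, so without the memset a stale mask left behind by
 * path A would survive into this operation. */
static void demo_setup_without_mask(struct demo_composite_op *op, int src)
{
    memset(op, 0, sizeof(*op));
    op->src.bo_id = src;
    op->src.in_use = 1;
}
```

Clearing the whole structure on every setup costs a few bytes of writes but removes the dependency on each path overwriting every field the other path may have set.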
Comment 7 Chris Wilson 2014-02-04 21:49:47 UTC
Hmm, I've managed to reproduce one kgem_bo_submit() assert failure - but your original report suggests that the assert(bo->refcnt) was more widespread.

commit 0906769c1b92520351729c4d8f2ab684d3ddf2eb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 17:51:51 2014 +0000

    sna: Rearrange assertion to ease use of substitute cached bo
    
    Since we call kgem_bo_submit() along one path when synchronising a
    cached bo (which is known to be inactive) but still want to keep the
    assertion on the refcnt, simply rearrange the code to only assert on the
    active path.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Reported-by: Matti Hamalainen <ccr@tnsp.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit de73c5fd1cd4f948b8bd3582ae788f6f855c5b16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 4 20:57:24 2014 +0000

    sna: Tweak assert_bo_retired() to be callable on cached bo
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=73406
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Fixes up a couple of assert(bo->refcnt) for paths where refcnt was genuinely 0. However, I don't think this explains your original bug. :|
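
[Illustration, not part of the original comment] The rearrangement described in the first commit message, keeping the refcnt assertion but only on the active path, might look roughly like this. This is a minimal sketch under stated assumptions: `struct demo_cached_bo`, `demo_flush_batch()` and `demo_bo_submit()` are hypothetical and are not copied from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bo; a bo parked in a cache legitimately has refcnt == 0. */
struct demo_cached_bo {
    int refcnt;   /* 0 while the bo sits in a cache */
    bool active;  /* true while queued work still references the bo */
};

static int demo_submit_count; /* counts how many flushes actually happened */

/* Stand-in for flushing the pending batch to the hardware. */
static void demo_flush_batch(struct demo_cached_bo *bo)
{
    demo_submit_count++;
    bo->active = false;
}

/* The assertion sits inside the active branch: an inactive cached bo
 * with refcnt == 0 passes through untouched, while an active bo must
 * still hold at least one reference. */
static void demo_bo_submit(struct demo_cached_bo *bo)
{
    if (bo->active) {
        assert(bo->refcnt > 0);
        demo_flush_batch(bo);
    }
}
```

An unconditional assertion at the top of the function would fire for the cached bo even though nothing is wrong, which is the kind of bogus failure the comment above refers to.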
Comment 8 Chris Wilson 2014-02-05 10:47:48 UTC
Matti, please reopen if you have another sighting. I'm confident that between bug 74496 and the bogus asserts I hit, I have the underlying issue and symptoms resolved.
Comment 9 Matti Hämäläinen 2014-02-05 17:43:03 UTC
Will update and monitor the situation. Thanks for your hard work, Chris, despite my near-inability to assist in this issue :|
