Bug 53353 - Corruption and gpu hang running pgadmin3 on gen5/SNA
Summary: Corruption and gpu hang running pgadmin3 on gen5/SNA
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Xorg Project Team
Reported: 2012-08-10 20:27 UTC by Clemens Eisserer
Modified: 2012-08-13 20:39 UTC

Attachments
screenshot (60.34 KB, image/png)
2012-08-10 20:27 UTC, Clemens Eisserer
error-state of wedged gpu (1.38 MB, text/plain)
2012-08-10 20:27 UTC, Clemens Eisserer

Description Clemens Eisserer 2012-08-10 20:27:23 UTC
Created attachment 65398
screenshot

When running pgadmin3 on my i5-540M, I get rendering corruption, as shown in the screenshot, which results in a GPU hang.
The error state is attached; the debug=full log is available at: http://93.83.133.214/Xorg_pgadmin3.log.7za

I also get rendering errors with the wedged GPU, which seem to be a problem with copyArea. Please let me know if I should open another report about that.
Comment 1 Clemens Eisserer 2012-08-10 20:27:54 UTC
Created attachment 65399
error-state of wedged gpu
Comment 2 Chris Wilson 2012-08-10 20:39:21 UTC
If you can reproduce the corruption whilst wedged using #define NO_HW 1, then yes I'd worry. As it stands, the GPU is performing a stray write so any memory is suspect.
Comment 3 Chris Wilson 2012-08-10 20:44:19 UTC
I've just pushed a patch to add a debug option, and I've added some more assertions based on the debug log as well. This is just a quick guess based on the error state.

Can you update and set #define NO_TILE_8x8 1 in src/sna/sna_accel.c and see if that causes the hang to go away?
Comment 4 Clemens Eisserer 2012-08-10 20:50:12 UTC
The corruptions I get in wedged mode are different, no need to get nervous ;)
I'll give your new patches a try...
Comment 5 Clemens Eisserer 2012-08-10 20:59:38 UTC
NO_TILE_8x8 helps - I don't get any corruption or hang with it set.

However, the added assertions don't catch the cause.
Comment 6 Chris Wilson 2012-08-10 21:03:23 UTC
Found a couple of bugs so far in the tiled_8x8 function:

commit b33f6754a99f6d11e423d6a03739fa2c04eeed88
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 10 21:59:36 2012 +0100

    sna: Add assertions to 8x8 tiled BLTs and reset BLT state afterwards
    
    Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=53353
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Judging by your error-state, that may indeed be the root cause.
Comment 7 Clemens Eisserer 2012-08-10 21:13:44 UTC
Unfortunately I still don't hit any assertion :/
Comment 8 Chris Wilson 2012-08-10 21:19:04 UTC
Hmm, can you try:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 935f1bc..5385c02 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -9414,8 +9414,8 @@ sna_poly_fill_rect_tiled_8x8_blt(DrawablePtr drawable,
        int16_t dx, dy;
        uint32_t *b;
 
-       if (NO_TILE_8x8)
-               return false;
+       if (1)
+               return true;
 
        DBG(("%s x %d [(%d, %d)+(%d, %d)...], clipped=%x\n",
             __FUNCTION__, n, r->x, r->y, r->width, r->height, clipped));
Comment 9 Clemens Eisserer 2012-08-10 21:28:18 UTC
Yep, that made the hang as well as the corruption go away.
Comment 10 Chris Wilson 2012-08-10 21:29:51 UTC
Also be sure to be running with --enable-debug[=full] to enable assertions.
Comment 11 Clemens Eisserer 2012-08-10 21:36:31 UTC
Just double-checked and did a clean build - however I don't hit any assertions, without that unconditional return true, should I try with it?
Comment 12 Chris Wilson 2012-08-10 21:48:39 UTC
(In reply to comment #11)
> Just double-checked and did a clean build - however I don't hit any assertions,
> without that unconditional return true, should I try with it?

No, I was just double checking that the hang didn't go away as a side-effect of forcing a fallback at that point. Returning true instead meant we kept going as usual, just omitting the tiled_8x8 blt.
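
The difference between the two return values can be sketched roughly as follows. This is a minimal illustration, not the driver's code: try_tiled_8x8, fill_rects, the mode enum and the counters are all hypothetical names, and it assumes the caller treats false as "not handled, take the fallback" and true as "handled, nothing more to do".

```c
#include <stdbool.h>

/* Hypothetical counters standing in for real work; not part of the driver. */
static int blt_emitted;
static int fallback_taken;

/* Hypothetical switch modelling the two experimental early returns. */
enum mode { MODE_NORMAL, MODE_FORCE_FALSE, MODE_FORCE_TRUE };

/* Hypothetical fast path: returns true when the request counts as handled. */
static bool try_tiled_8x8(enum mode mode)
{
	if (mode == MODE_FORCE_FALSE)
		return false;	/* caller falls back, so a different path runs */
	if (mode == MODE_FORCE_TRUE)
		return true;	/* caller carries on as usual, but no blt is emitted */

	blt_emitted++;		/* stands in for writing the blt commands */
	return true;
}

/* Hypothetical caller: only a false return triggers the fallback. */
static void fill_rects(enum mode mode)
{
	if (!try_tiled_8x8(mode))
		fallback_taken++;
}
```

With MODE_FORCE_TRUE neither counter changes, which is the point of the experiment: everything around the blt still runs the same way, and only the blt itself is missing.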

Ok, the cause isn't immediately obvious. I've checked that the rendering in that function stays within the target pixmap, so I need to find the bit of state that isn't being reset such that the GPU is interpreting these or subsequent commands in an unexpected manner. :|
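
A bounds check of that kind might look like the sketch below. The types and names (struct rect, struct pixmap_size, rect_within_pixmap, emit_fill) are simplified, hypothetical stand-ins rather than the X server's or the driver's own, and the standard assert() is used here in place of whatever assertion macro the driver enables with --enable-debug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified types; the real structures carry more fields. */
struct rect { int16_t x, y; uint16_t width, height; };
struct pixmap_size { int width, height; };

/* True when the rectangle lies entirely inside the pixmap. */
static bool rect_within_pixmap(const struct rect *r, const struct pixmap_size *p)
{
	/* Widen to int so x + width cannot overflow the 16-bit fields. */
	int x2 = (int)r->x + r->width;
	int y2 = (int)r->y + r->height;

	return r->x >= 0 && r->y >= 0 && x2 <= p->width && y2 <= p->height;
}

/* Hypothetical emit step: abort on a stray rectangle unless NDEBUG is set. */
static void emit_fill(const struct rect *r, const struct pixmap_size *p)
{
	assert(rect_within_pixmap(r, p));
	/* ... the blt commands would be written to the batch here ... */
}
```

A check like this only rules out writes that start from bad coordinates; it cannot catch state left over from an earlier command.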
Comment 13 Chris Wilson 2012-08-10 22:08:18 UTC
Another possibility:

commit 5d6d9231cd2003fda1c6f2dd3174014317a45704
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 10 23:07:07 2012 +0100

    sna: Reset BLT state after copy-boxes
Comment 14 Clemens Eisserer 2012-08-11 05:22:45 UTC
Unfortunately still no luck :/
Comment 15 Chris Wilson 2012-08-11 08:08:01 UTC
Do you mind refreshing the error-state? Hopefully I've eliminated a few of the broken commands...
Comment 16 Chris Wilson 2012-08-11 18:46:16 UTC
Third time lucky?

commit 44f848f9b2f2a2dcd9087210ea46bc4fdb63c057
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 11 19:44:15 2012 +0100

    sna: Fix typo in computation of texel offsets for tiled 8x8 blts
    
    Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53353
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Oh boy, that was a stupid typo.
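
For context on what a texel offset is here: an 8x8 pattern repeats every 8 pixels, so a tiled fill has to work out which texel of the pattern lands on a given destination pixel. The commit's diff is not quoted in this thread, so the sketch below is not the actual fix. It only shows the generic wrap-around arithmetic, and wrap8, tile_phase and struct point are hypothetical names.

```c
#include <stdint.h>

struct point { int16_t x, y; };

/* Wrap a possibly negative offset into the range 0..7. */
static int wrap8(int v)
{
	return ((v % 8) + 8) % 8;
}

/* Texel of an 8x8 tile anchored at `origin` that covers pixel `dst`. */
static struct point tile_phase(struct point dst, struct point origin)
{
	struct point p;

	/* Each axis is computed from its own coordinate pair. */
	p.x = (int16_t)wrap8(dst.x - origin.x);
	p.y = (int16_t)wrap8(dst.y - origin.y);
	return p;
}
```

Because the two lines are near-identical, a one-character slip in this sort of computation compiles cleanly and trips no bounds assertion.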
Comment 17 Clemens Eisserer 2012-08-13 20:39:14 UTC
Thanks! :)