Bug 71983

Summary: libdrm 2.4.49 makes gpu crash (HD7770)
Product: Mesa
Component: Drivers/Gallium/radeonsi
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: git
Hardware: Other
OS: All
Reporter: Arek Ruśniak <arek.rusi>
Assignee: Default DRI bug account <dri-devel>
CC: dawitbro
Attachments: kernel log when i run some ogl apps (unigine-*;some 3d games; even google-chrome)
kernel log when i run some ogl apps (unigine-*;some 3d games; even google-chrome)
SI: Update unaligned offset for 2D->1D transition

Description Arek Ruśniak 2013-11-25 12:31:42 UTC
Created attachment 89740 [details]
kernel log when i run some ogl apps (unigine-*;some 3d games; even google-chrome)

I've gzipped the log because it is ~14 MB.

Most of it consists of a huge number of lines like these:
<3>[  414.876442] radeon 0000:01:00.0: GPU fault detected: 146 0x0fe32004
<3>[  414.876446] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000021FF
<3>[  414.876448] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03020004
<4>[  414.876450] VM fault (0x04, vmid 1) at page 8703, write from CB (32)
<3>[  414.876455] radeon 0000:01:00.0: GPU fault detected: 146 0x0fc32004
<3>[  414.876457] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
<3>[  414.876458] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
<4>[  414.876460] VM fault (0x00, vmid 0) at page 0, read from unknown (0)

Usually my PC stops responding, but sometimes, as with unigine-sanctuary, I can see graphics corruption for a while. When I downgrade libdrm, everything works as before.
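
For readers decoding the fault lines in the log excerpt above by hand: the "VM fault (...)" summary line can be reproduced from the two register values printed just before it. The sketch below is only an illustration. The bit layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS used here (protections in bits 0-3, memory client id in bits 12-19, read/write flag in bit 24, VMID in bits 25-28) is an assumption that happens to match the log above and should be checked against the radeon driver headers. The function name `decode_vm_fault` is hypothetical.

```python
def decode_vm_fault(addr, status):
    """Decode a radeon SI VM protection fault (assumed bit layout)."""
    return {
        "protections": status & 0xF,         # bits 0-3 (assumed)
        "client_id": (status >> 12) & 0xFF,  # bits 12-19 (assumed)
        "access": "write" if (status >> 24) & 0x1 else "read",  # bit 24 (assumed)
        "vmid": (status >> 25) & 0xF,        # bits 25-28 (assumed)
        "page": addr,                        # the log prints this register as the page number
    }

# Register pairs copied from the log excerpt above.
first = decode_vm_fault(0x000021FF, 0x03020004)
second = decode_vm_fault(0x00000000, 0x00000000)
print(first)
print(second)
```

With the first pair this gives protections 0x04, vmid 1, page 8703 and a write from client 32, which agrees with the kernel's own summary line. The second pair decodes to all zeros, a read from client 0 at page 0.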
Comment 1 Arek Ruśniak 2013-11-25 13:09:37 UTC
Created attachment 89745 [details]
kernel log when i run some ogl apps (unigine-*;some 3d games; even google-chrome)

my software/hardware:
archlinux x86/x86_64
LIBDRM 2.4.49 / 2.4.48

linux 3.13rc1 / 3.12.1
mesa 10.1 git-e6a0eca
LLVM 3.5 r195632
glamor - latest from git repo
xf86-video-ati - latest from git repo
xorg-server 1.14.4

GPU HD7770 GHz Ed. Chipset: "VERDE" (ChipID = 0x683d)


I am wild and woolly today, sorry for that.
Comment 2 Maarten Lankhorst 2013-11-25 13:10:20 UTC
Can you bisect?
Comment 3 Marek Olšák 2013-11-25 13:51:23 UTC
Could you bisect?
Comment 4 Arek Ruśniak 2013-11-25 13:58:39 UTC
yep:) 
If I understand the word "bisect" correctly, here it is:

http://cgit.freedesktop.org/mesa/drm/commit/?id=ce8af454259279c14c44bcd32c429640ca5e1691

BTW, I tried turning off tiling, but without success: the GPU still crashes. Before this commit it worked OK.
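
For anyone else unsure about the term: bisecting means binary-searching the history between a known-good and a known-bad version for the first commit that breaks things. A minimal sketch of the idea follows, assuming a linear history and a hypothetical `is_bad` test (in practice `git bisect` does the bookkeeping, and the test is "does the GPU hang"). The commit names are made up for illustration.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the breakage is at mid or earlier
        else:
            good = mid  # the breakage is after mid
    return commits[bad]

# Hypothetical linear history; "c5" stands in for the offending commit.
history = ["c%d" % i for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

Each step halves the remaining range, so even a few hundred commits between two releases need fewer than ten builds to test.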
Comment 5 Dave Witbrodt 2013-11-25 14:49:22 UTC
Same problem here on HD 7850 (PITCAIRN 0x1002:0x6819 0x1787:0x2320).

Yesterday I was upgrading my X server to test 1.15 RC2, which requires some libraries not available on Debian yet.  All was well when I began:

    libdrm     :  2.4.48
    mesa       :  10.1.0-devel (commit 53f89a43 of Nov. 17)
    xorg-server:  1.14.99.3
    radeon DDX :  7.2.99 (commit d571d6af of Nov. 13)
    glamor     :  0.5.1 (commit 890a7738 of Nov. 6)
    linux      :  3.12.1 (+ some DRM from 3.13rc1)

I had some build failures last time I tried to update the X server, and I saw bigger than usual changes coming for 1.15, so I thought I should invest some time into trying to get things working.  After getting the dependencies satisfied and having successfully built 1.14.99.902 (with the drivers rebuilt against it) everything was fine.  In fact, I saw a very noticeable performance boost.  On a roll, I decided to update Mesa and libdrm, ending up with this set of packages:

    libdrm     :  2.4.49
    mesa       :  10.1.0-devel (commit f56f875b of Nov. 21)
    xorg-server:  1.14.99.902
    radeon DDX :  7.2.99 (commit d571d6af of Nov. 13)
    glamor     :  0.5.1 (commit 71e7168d of Nov. 13)
    linux      :  3.12.1 (+ some DRM from 3.13rc1)

That's when it all stopped working.  Even downgrading the X stack and Mesa (in any combination) to the previous working versions still causes failures.  The symptoms are that everything seems to be fine (kernel boots, X starts, you can use the desktop, etc.) until you touch OpenGL.  Any attempt to try a game like 'torcs' (or even 'prboom-plus', which is much less GL intensive) causes X to freeze.  It is sometimes possible to switch to a VT, but the screen goes black in a few moments so there is nothing to see; if the system does not reboot on its own, it is possible to use sysrq keys to shut it down somewhat sanely.

When changing everything back (except libdrm) to previous working versions _also_ was not working, I began to suspect libdrm.  After downgrading to libdrm 2.4.48, everything was working again flawlessly; even upgrading Mesa and the X stack continued to work perfectly, so I can once again enjoy the 1.15 performance improvements!

It took so many hours to identify libdrm 2.4.49 as the culprit -- I was assuming that xorg-server, mesa, or glamor were at fault -- that I ran out of time last night.  I only have to work 2 days this week, so I should be able to bisect it in 2 days (if no developer can reproduce it by then).
Comment 6 Michel Dänzer 2013-11-26 07:06:21 UTC
Created attachment 89812 [details] [review]
SI: Update unaligned offset for 2D->1D transition

Does this patch fix the problem?
Comment 7 Arek Ruśniak 2013-11-26 07:26:24 UTC
Yes it does, thank you Michel.
Comment 8 Michel Dänzer 2013-11-26 09:20:32 UTC
Thanks for testing! Fixed in Git:

commit c8a437f4c76527b3c8385699ccee07f35fe3f166
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Tue Nov 26 18:16:03 2013 +0900

    radeon: Update unaligned offset for 2D->1D tiling transition on SI
Comment 9 Dave Witbrodt 2013-11-26 14:53:54 UTC
(In reply to comment #8)

Confirmed.  Thanks!
