Bug 33139

Summary: Radeon HD 5750 locks up when using 3D apps with r600g
Product: Mesa
Reporter: Dave Witbrodt <dawitbro>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: bugs.xorg, jlp.bugs
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: Backtrace of /usr/bin/X once GPU was locked
dmesg from kernel 2.6.37 (with cherry-pick mentioned in report)
Xorg.0.log with r600c
possible fix

Description Dave Witbrodt 2011-01-14 19:13:27 UTC
Created attachment 42065
Backtrace of /usr/bin/X once GPU was locked

Overview:

I have recently begun experimenting with r600g on my HD 5750 (JUNIPER) card to see if I can get it to work.  My distribution is Debian, and the Debian X Strike Force is not currently providing r600g (only r600c) so I have been packaging my own builds.

I have found that DOSBox, which I configure to use OpenGL 2D acceleration, runs fine with r600g.  However, any program which uses 3D causes my GPU to lock up.  I do not think I was able to obtain any useful debugging info when I was able to SSH into the locked up machine, but I tried; the most recent version of Mesa I tried locks the kernel, so I cannot even use SSH once the GPU locks.  (Details below.)

  Steps to reproduce:

1.  Stop X and install r600g Mesa drivers.
2.  Start X and run a program using 3D (prboom, torcs, etc.)

  Actual results:

Both prboom and torcs will run their menuing system without crashing.  Starting an actual game in prboom will work for a few moments, then lock the GPU.  Attempting to start the game in torcs causes the GPU to lock before the first 3D frame is rendered (the final text message, "Get Ready," does display... then it locks).

  Expected results:

Eventually, I hope r600g will work without locking the GPU.  Currently, r600c works fine on this hardware, other than the fact that classic is clearly inferior in performance to gallium at this point.  (See below.)

  System info:

GPU:  Powercolor Radeon HD 5750 SCS 1GB

Kernel:  2.6.37 + cherry-pick drm-core-next 17db7042 (Jan. 4, 2011)

Linux distribution:
    Debian unstable

Machine:  self-built
    AMD Phenom II X4 955
    MSI 790FX-GD70 motherboard
    4x2GB DDR3 1600

Software versions:
    libdrm-2.4.23 (built against kernel source listed above)

    mesa:  7.10-devel at commit ada9c78 (Jan. 4, 2011)
           7.11-devel at commit 69191d4 (Jan. 9, 2011)

    xorg-server-1.9.3.901

    xf86-video-ati-6.13.99 at commit f9bbb26 (Dec. 3, 2010)


  Additional Information:

This bug may be related to one or more of the following fdo bugs:

    29978  HD 3200 locks up with r600c and r600g
    31530  HD 5750 has problems with r600g
    31532  r600g lockup (no hardware mentioned)

With the Mesa I pulled from git on Jan. 4 (see above), I was able to SSH to the GPU-locked machine and try to debug with 'gdb'.  I was able to get a backtrace on /usr/bin/X, but not on the program causing the lock.  I doubt this is useful, but I am attaching it anyway.

Strangely, if I used 'strace' to attach to 'prboom' by process ID, it would prevent the GPU from hanging!  (This is why I was able to comment above that I know that r600g performance is superior to r600c on this hardware; with 'strace' attached, nothing I tried in 'prboom' would make the GPU lock!)  I tried building a debug package of 'prboom', but this (instead of the stripped binary provided by Debian) also would not lock the GPU.

Using 'torcs' always locks the GPU, and trying to attach to its PID with 'gdb' provides no information:  it simply becomes unresponsive.  (Sorry.  I tried everything I could, but maybe I'm doing it wrong.)  I was able to get 'strace' to work, but once the GPU (and kernel) locked only garbage was sent to the file.  The machine runs 'fsck' after I reboot because it was not properly shut down, so the garbage might just be random bits from the hard disk.  That file is 8.6 MB raw, and truncating the garbage and gzip'ing results in something just over 300 KB.  I don't think this bugzilla will take something that big, but if someone wants to see it I can attach it to an email.
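
For anyone who needs to do the same cleanup, here is a minimal sketch of the "truncate the garbage, then gzip" step. It is not the procedure actually used for this report. It assumes the valid part of the strace log is plain printable ASCII with newline-terminated lines, and that everything from the first line containing other bytes onward is junk. The function names and file names are hypothetical.

```python
import gzip
import string

# Bytes treated as "clean": printable ASCII plus ordinary whitespace.
# Assumption: valid strace output contains nothing else.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def clean_prefix(data: bytes) -> bytes:
    """Return the leading run of complete lines made only of printable bytes."""
    kept = []
    for line in data.splitlines(keepends=True):
        # Stop at the first unterminated line or the first line with junk bytes.
        if not line.endswith(b"\n") or any(b not in PRINTABLE for b in line):
            break
        kept.append(line)
    return b"".join(kept)


def compress_clean_prefix(src: str, dst: str) -> int:
    """Gzip the clean prefix of src into dst; return the number of bytes kept."""
    with open(src, "rb") as f:
        clean = clean_prefix(f.read())
    with gzip.open(dst, "wb") as out:
        out.write(clean)
    return len(clean)
```

Calling compress_clean_prefix("torcs.strace", "torcs.strace.gz") (hypothetical file names) would leave the raw log untouched and write only the readable part, compressed, to the second file.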

The Mesa I pulled on Jan. 9 causes the kernel to hang once the GPU locks, so I can't even attempt to use 'gdb'; I can get 'strace' going, but the output after the lockup is random garbage.  (Please, if someone knows tricks for getting useful debugging info when a GPU locks, let me know.  It's very frustrating knowing I'm this close to superior Mesa performance, and I hate going back to r600c now that I've seen what r600g can do!!!)
Comment 1 Dave Witbrodt 2011-01-14 19:20:47 UTC
Created attachment 42066
dmesg from kernel 2.6.37 (with cherry-pick mentioned in report)
Comment 2 Dave Witbrodt 2011-01-14 19:22:21 UTC
Created attachment 42067
Xorg.0.log with r600c
Comment 3 Rubén Fernández 2011-01-19 19:44:35 UTC
I also experience GPU lockups with about half the games I've tried on an HD 5750
(all of which work on an older r300g card).

In my case, in all but one I can still SSH in (even with the latest Mesa), so I'll
try to post debugging information here.

GPU: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series]
Kernel: 2.6.38
libdrm-2.4.23
xf86-video-ati git 57fbddfc21d8c6794f378489b764cc2a0ad4a48c
Mesa git 3ee60a3558a3546b3c3a0a9732d384afcf02994a
X.Org X Server 1.9.0
Comment 4 Siganderson 2011-01-26 09:26:13 UTC
It could be the same as https://bugs.freedesktop.org/show_bug.cgi?id=33381
I should add that with the 2.6.35 kernel all games seem to work without this problem.
Comment 5 Alex Deucher 2011-01-27 14:14:20 UTC
Created attachment 42615
possible fix

Does this drm patch help?
Comment 6 Benjamin Franzke 2011-01-28 06:36:27 UTC
(In reply to comment #5)
> Created an attachment (id=42615) [details]
> possible fix
> 
> Does this drm patch help?

Fixes it here (tested with 2.6.37 + patch on an HD 5770).
Comment 7 Dave Witbrodt 2011-01-28 09:05:00 UTC
(In reply to comment #5)
> Created an attachment (id=42615) [details]
> possible fix
> 
> Does this drm patch help?

Well Alex, I have tried and tried to make something lock my 5750 with this patch applied, but I just can't do it!

May I ask how you can work such magic from such a distance?  Do you have similar hardware with which you can reproduce the bug, and then just make the bug go away on your hardware?  Or do you know the code so well that you just make educated guesses about what might be the cause of the problem?


Either way, much thanks!
Dave W.
Comment 8 Alex Deucher 2011-01-28 13:10:17 UTC
(In reply to comment #7)
> May I ask how you can work such magic from such a distance?  Do you have
> similar hardware with which you can reproduce the bug, and then just make the
> bug go away on your hardware?  Or do you know the code so well that you just
> make educated guesses about what might be the cause of the problem?

Another bug pointed to a similar issue beginning around the time I reworked the blit code, so I figured it was a likely candidate.
Comment 9 Rubén Fernández 2011-01-31 20:04:00 UTC
*** Bug 31870 has been marked as a duplicate of this bug. ***
