Bug 59592

Summary: Radeon HD 5670: reproducible GPU lockups with htile enabled
Product: Mesa
Reporter: nine
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: major
Priority: medium
CC: anonymous, lakostis
Version: git
Keywords: regression
Hardware: Other
OS: All
Attachments: dmesg after lockups; Mesa fix

Description nine 2013-01-19 18:31:55 UTC
Created attachment 73293
dmesg after lockups

Since upgrading my kernel to 3.8.0-rc3 using openSUSE packages, I reproducibly get GPU lockups when running FlightGear, right after the simulation starts. It takes a couple of minutes, with some screen blanking, before I can quit the application. Afterwards the system continues running fine. Activating KDE's desktop effects may produce the same behaviour.

I'm using the following versions:
Mesa-9.1_git20130117-230.1.x86_64
libdrm2-2.4.99_git20130117-1.1.x86_64

X.Org X Server 1.12.3
Release Date: 2012-07-09
[    17.264] X Protocol Version 11, Revision 0
[    17.264] Build Operating System: openSUSE SUSE LINUX
[    17.264] Current Operating System: Linux sphinx 3.8.0-rc3-1-desktop #1 SMP PREEMPT Thu Jan 10 20:49:22 UTC 2013 (7ce28dd) x86_64
[    17.264] Kernel command line: BOOT_IMAGE=/vmlinuz-3.8.0-rc3-1-desktop root=/dev/mapper/system-root resume=/dev/system/swap quiet splash=silent
[    17.264] Build Date: 08 January 2013  11:56:04AM


I could do some git bisecting on the kernel or on Mesa if it helps, and I'm fluent in C. Just tell me how I can help fix this.
Comment 1 nine 2013-01-20 18:46:19 UTC
I spent the afternoon bisecting the kernel and found the commit that seems to cause the GPU lockups:

4ac0533abaec2b83a7f2c675010eedd55664bc26 is the first bad commit
commit 4ac0533abaec2b83a7f2c675010eedd55664bc26
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Thu Dec 13 12:08:11 2012 -0500

    drm/radeon: fix htile buffer size computation for command stream checker
    
    Fix the size computation of the htile buffer.
    
    Signed-off-by: Jerome Glisse <jglisse@redhat.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 cf30bb09a4096c41959a27c6fc7d391dfa718028 fc571d6379b3b697a2bad0e5d097797f77c0a1b6 M      drivers
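
For anyone repeating this kind of bisection: each step comes down to booting the candidate kernel, reproducing the workload, checking the kernel log for a lockup, and marking the commit accordingly. Below is a minimal Python sketch of the log check only. The function names and the "GPU lockup" pattern are assumptions made for illustration (the attached dmesg is not reproduced in this report), so adjust the pattern to whatever the log actually prints.

```python
import re

# Assumption: a hang shows up in the kernel log as a line containing
# "GPU lockup". Change this pattern to match the real dmesg output.
LOCKUP_PATTERN = re.compile(r"GPU lockup", re.IGNORECASE)


def has_gpu_lockup(log_text: str) -> bool:
    """Return True if any line of the log matches the lockup pattern."""
    return any(LOCKUP_PATTERN.search(line) for line in log_text.splitlines())


def bisect_verdict(log_text: str) -> str:
    """Map a kernel log to the word passed to `git bisect`: "bad" or "good"."""
    return "bad" if has_gpu_lockup(log_text) else "good"
```

The returned word is what would be passed to `git bisect good` or `git bisect bad` for the commit under test. A clean log only means that no lockup was observed during that run, so intermittent hangs can still produce a wrong "good".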
Comment 2 Alex Deucher 2013-01-21 21:27:19 UTC
I think this is actually a Mesa bug.  The kernel commit you bisected just allows the problematic feature to be enabled in Mesa.  The Mesa commits are:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=24b1206ab2dcd506aaac3ef656aebc8bc20cd27a
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6532eb17baff6e61b427f29e076883f8941ae664
Comment 3 nine 2013-01-24 22:03:31 UTC
I can confirm that http://cgit.freedesktop.org/mesa/mesa/commit/?id=6532eb17baff6e61b427f29e076883f8941ae664 is the first Mesa commit where the lockups occur.

With http://cgit.freedesktop.org/mesa/mesa/commit/?id=24b1206ab2dcd506aaac3ef656aebc8bc20cd27a it still works on kernel 3.8.0-rc4.

Is there anything else I can do to help fix this bug?
Comment 4 Alex Deucher 2013-02-08 02:10:40 UTC
*** Bug 60347 has been marked as a duplicate of this bug. ***
Comment 5 Jerome Glisse 2013-02-12 18:57:06 UTC
Created attachment 74707
Mesa fix

Please check whether the attached Mesa patch fixes it.
Comment 6 nine 2013-02-12 22:05:22 UTC
The patch does indeed seem to fix the problem. I've been trying for more than half an hour to get the GPU to lock up, without success. Thank you very, very much!
Comment 7 Jerome Glisse 2013-02-13 01:09:50 UTC
Proper fix committed to Mesa.
Comment 8 nine 2013-03-02 07:00:03 UTC
While it did seem to work in my first test with your patch, I've experienced frequent GPU lockups since. Last weekend I tried to bisect it, since it had gone back to reliable, immediate lockups during the fade-out of the splash screen. Unfortunately the lockups were not as reliable as they looked, which made the whole bisection useless.

Unfortunately I don't have more information yet. I'm running with --enable-debug but never got any assertion failures.

What can I do to debug this problem?
Comment 9 Jerome Glisse 2013-04-24 19:23:44 UTC
Please check whether the patch below fixes the issue:

http://people.freedesktop.org/~glisse/0001-r600g-force-full-cache-for-hyperz.patch
Comment 10 nine 2013-04-30 20:03:22 UTC
Sorry for the late reply. It took me some time to get my test setup working after a distribution upgrade. It seems your patch does indeed fix the problem. I've played around with FlightGear for several hours with R600_HYPERZ=1 without any lockups whatsoever. Before that, I tried it without your patch and got an immediate lockup.

It seems like your patch is already committed to master. After upgrading to current master it continues to work flawlessly. Thanks!
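
The with/without comparison above can be scripted. This is a minimal Python sketch, assuming the application is started from a wrapper script. The helper names `hyperz_env` and `run_with_hyperz` are hypothetical, and only the `R600_HYPERZ=1` setting comes from this report. To disable the feature, the sketch removes the variable instead of guessing at another value.

```python
import os
import subprocess


def hyperz_env(enabled, base=None):
    """Return a copy of the environment with R600_HYPERZ set to 1 or removed."""
    env = dict(os.environ if base is None else base)
    if enabled:
        env["R600_HYPERZ"] = "1"
    else:
        # Drop the variable so the driver falls back to its default.
        env.pop("R600_HYPERZ", None)
    return env


def run_with_hyperz(command, enabled):
    """Run `command` (a list of arguments) and return its exit status."""
    return subprocess.run(command, env=hyperz_env(enabled)).returncode
```

For example, `run_with_hyperz(["fgfs"], enabled=True)` would start FlightGear with the feature on, assuming the FlightGear binary is installed as `fgfs` on the test system.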
Comment 11 Jerome Glisse 2013-05-06 14:47:01 UTC
Closing: pushed to master, and going to push to 9.1.
