Bug 33381

Summary: [RADEON:KMS:R600G] es2gears freeze
Product: Mesa
Reporter: Siganderson <dj_def>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: major
Priority: medium
CC: dj_def
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
  possible fix
  dmesg trace from radeon 5750 running es2gears

Description Siganderson 2011-01-23 06:21:26 UTC
I see strange behaviour when using OpenGL output. I ran some tests:

Supertuxkart:
When selecting the first option in the menu (1 player) the game stops.
After a few seconds the screen goes black, the monitor loses sync, and the system can freeze (Alt Gr + SysRq + R/E/I/S/U/B does not work).

Nexuiz:
The same happens, but this time after selecting "start single player" in the menu (before the actual game starts).

I get this behaviour only with 2.6.37/2.6.38 kernels and not with 2.6.35.
I use r600g (Radeon HD 5670 - Redwood).

The dmesg output contains these lines:

[  354.164019] radeon 0000:03:00.0: GPU lockup CP stall for more than 10040msec
[  354.164023] ------------[ cut here ]------------
[  354.164048] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:248 radeon_fence_wait+0x2e2/0x330 [radeon]()
[  354.164051] Hardware name: System Product Name
[  354.164053] GPU lockup (waiting for 0x0000AF5F last fence id 0x0000AF5A)
[  354.164055] Modules linked in: binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq arc4 rt61pci crc_itu_t rt2x00pci rt2x00lib mac80211 snd_timer snd_seq_device radeon snd soundcore snd_page_alloc ttm drm_kms_helper usbhid drm intel_agp hwmon_vid asus_atk0110 coretemp hid intel_gtt i2c_algo_bit ppdev agpgart cfg80211 eeprom_93cx6 parport_pc lp parport floppy
[  354.164094] Pid: 869, comm: Xorg Not tainted 2.6.38-rc2-ubuntu #1
[  354.164096] Call Trace:
[  354.164105]  [<c0149ac2>] ? warn_slowpath_common+0x72/0xa0
[  354.164121]  [<f8508832>] ? radeon_fence_wait+0x2e2/0x330 [radeon]
[  354.164138]  [<f8508832>] ? radeon_fence_wait+0x2e2/0x330 [radeon]
[  354.164142]  [<c0149b93>] ? warn_slowpath_fmt+0x33/0x40
[  354.164158]  [<f8508832>] ? radeon_fence_wait+0x2e2/0x330 [radeon]
[  354.164164]  [<c01673c0>] ? autoremove_wake_function+0x0/0x50
[  354.164180]  [<f8508f91>] ? radeon_sync_obj_wait+0x11/0x20 [radeon]
[  354.164188]  [<f839ce87>] ? ttm_bo_wait+0xe7/0x180 [ttm]
[  354.164195]  [<f839dba2>] ? ttm_bo_list_ref_sub+0x22/0x30 [ttm]
[  354.164213]  [<f851fd62>] ? radeon_gem_wait_idle_ioctl+0x82/0xe0 [radeon]
[  354.164226]  [<f82e2d1f>] ? drm_ioctl+0x1df/0x430 [drm]
[  354.164244]  [<f851fce0>] ? radeon_gem_wait_idle_ioctl+0x0/0xe0 [radeon]
[  354.164250]  [<c010ad33>] ? restore_i387_fxsave+0x83/0x90
[  354.164260]  [<f82e2b40>] ? drm_ioctl+0x0/0x430 [drm]
[  354.164265]  [<c022a0cc>] ? do_vfs_ioctl+0x8c/0x5e0
[  354.164269]  [<c010aff0>] ? restore_i387_xstate+0xe0/0x210
[  354.164272]  [<c021ace8>] ? rw_verify_area+0x68/0x120
[  354.164276]  [<c0171338>] ? ktime_get_ts+0xd8/0x110
[  354.164280]  [<c01020b6>] ? restore_sigcontext+0xc6/0xe0
[  354.164283]  [<c022a697>] ? sys_ioctl+0x77/0x80
[  354.164286]  [<c010301f>] ? sysenter_do_call+0x12/0x28
[  354.164289] ---[ end trace 39eda9dd9d49fde9 ]---
[  354.165372] radeon 0000:03:00.0: GPU softreset 
[  354.165375] radeon 0000:03:00.0:   GRBM_STATUS=0xB2733828
[  354.165378] radeon 0000:03:00.0:   GRBM_STATUS_SE0=0x1C000007
[  354.165380] radeon 0000:03:00.0:   GRBM_STATUS_SE1=0x00000007
[  354.165383] radeon 0000:03:00.0:   SRBM_STATUS=0x200000C0
[  354.165393] radeon 0000:03:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  354.165496] radeon 0000:03:00.0:   GRBM_STATUS=0x00003828
[  354.165498] radeon 0000:03:00.0:   GRBM_STATUS_SE0=0x00000007
[  354.165501] radeon 0000:03:00.0:   GRBM_STATUS_SE1=0x00000007
[  354.165504] radeon 0000:03:00.0:   SRBM_STATUS=0x200000C0
[  354.166500] radeon 0000:03:00.0: GPU reset succeed
[  354.166502] radeon 0000:03:00.0: GPU softreset 
[  354.166505] radeon 0000:03:00.0:   GRBM_STATUS=0x00003828
[  354.166507] radeon 0000:03:00.0:   GRBM_STATUS_SE0=0x00000007
[  354.166510] radeon 0000:03:00.0:   GRBM_STATUS_SE1=0x00000007
[  354.166512] radeon 0000:03:00.0:   SRBM_STATUS=0x200000C0
[  354.166522] radeon 0000:03:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  354.166625] radeon 0000:03:00.0:   GRBM_STATUS=0x00003828
[  354.166627] radeon 0000:03:00.0:   GRBM_STATUS_SE0=0x00000007
[  354.166630] radeon 0000:03:00.0:   GRBM_STATUS_SE1=0x00000007
[  354.166632] radeon 0000:03:00.0:   SRBM_STATUS=0x200000C0
[  354.172484] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(3).
[  354.172487] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.186436] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(4).
[  354.186439] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.235194] radeon 0000:03:00.0: WB enabled
[  354.235635] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[  354.235638] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.240171] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
[  354.240173] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.245072] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
[  354.245075] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.249889] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(8).
[  354.249892] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  354.251416] [drm] ring test succeeded in 1 usecs
[  354.251422] [drm] ib test succeeded in 1 usecs
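
For anyone comparing their own logs against the trace above, the relevant lines can be filtered out of a saved kernel log. This is only a minimal sketch: the function name `radeon_lockup_lines` and the file name `dmesg.txt` are made up for illustration, and the patterns are copied from the messages quoted in this report, so other kernel versions may word them differently.

```shell
# Print the radeon lockup and reset lines from a saved kernel log.
# Usage (file name is illustrative): dmesg > dmesg.txt && radeon_lockup_lines dmesg.txt
radeon_lockup_lines() {
    # Patterns taken from the messages in this report only.
    grep -E 'GPU lockup|GPU softreset|GPU reset|Failed to schedule IB' "$1"
}
```

If the function prints nothing, the log contains none of these messages, which matches the case where the machine freezes without leaving a trace.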
Comment 1 Alex Deucher 2011-01-26 09:32:03 UTC
If this is a regression can you bisect?
Comment 2 Siganderson 2011-01-26 10:16:31 UTC
(In reply to comment #1)
> If this is a regression can you bisect?

Since it happens with Gallium and not with classic Mesa, I found out on IRC that this is the same bug as https://bugs.freedesktop.org/show_bug.cgi?id=33139

I really don't know if this is a regression, because I have had this issue ever since I started trying kernels >= 2.6.37.
Comment 3 Alex Deucher 2011-01-26 10:40:40 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > If this is a regression can you bisect?
> 
> As it happens with gallium and not with classic mesa i discovered in IRC that
> this is the same bug as https://bugs.freedesktop.org/show_bug.cgi?id=33139
> 
> I really don't know if this is a regression because I always had this issue
> since I tried kernels >= 2.6.37

You said in the description that you didn't have the issue with 2.6.35.  So to clarify:

2.6.35:
r600c works?
r600g works?

2.6.37:
r600c works?
r600g doesn't work?

If changing the kernel to 2.6.35 fixes r600g, can you bisect the kernel to see what change broke r600g for you?
Comment 4 Siganderson 2011-01-26 11:06:53 UTC
2.6.35:
r600c "works" (supertuxkart works, nexuiz doesn't start, if I remember correctly... I will retest it after the kernels)
r600g works
 
2.6.37:
r600c works
r600g doesn't work

I'm going to test 2.6.36 and possibly the 2.6.36.x versions.
Comment 5 Alex Deucher 2011-01-26 12:13:02 UTC
(In reply to comment #4)
> 2.6.35:
> r600c "works" (supertuxkart works, nexuiz doesn't start If I well remember... I
> will retest it after the kernels)
> r600g works
> 
> 2.6.37:
> r600c works
> r600g doesn't work
> 
> I'm going to test 2.6.36 and eventually the .36.x versions.

If 2.6.36 works, then use git to bisect between 2.6.36 and 2.6.37.  If 2.6.36 doesn't work, then use git to bisect between 2.6.35 and 2.6.36.
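
The bisect described above could be started roughly as follows. This is a sketch, not a tested recipe for this bug: the helper name `start_kernel_bisect` is made up for illustration, it has to be run inside a clone of the kernel git tree, and the tag names in the usage comment assume the usual `v2.6.x` release tags.

```shell
# Start a bisect between a known-good and a known-bad ref.
# Usage (inside a kernel git clone): start_kernel_bisect v2.6.36 v2.6.37
start_kernel_bisect() {
    # $1 = last ref where r600g works, $2 = first ref where it freezes
    git bisect start &&
    git bisect bad "$2" &&
    git bisect good "$1"
}

# git then checks out a commit in between. Build and boot that kernel,
# run the r600g test, and report the result:
#   git bisect good    # the freeze does not occur
#   git bisect bad     # the freeze occurs
# Repeat until git prints the first bad commit, then save the log and clean up:
#   git bisect log > bisect.log
#   git bisect reset
```

If the culprit is expected to be in the DRM code, `git bisect start -- drivers/gpu/drm` limits the search to commits touching that directory, at the risk of missing a change elsewhere in the tree.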
Comment 6 Siganderson 2011-01-26 14:23:41 UTC
2.6.36.3 works (I'm testing only the Gallium driver, as it seems to be the buggy one)
2.6.37-rc5 doesn't work...
I'm going to test 2.6.37-rc1.
Comment 7 Siganderson 2011-01-27 02:55:45 UTC
2.6.37-rc1 does not work while 2.6.36.3 does
Comment 8 Alexandros Frantzis 2011-01-27 04:13:34 UTC
I can confirm this issue (or at least something that is *very* similar) with a 5750 card using r600g and kernel >= 2.6.37-x. Things work fine with 2.6.36.3. FWIW, I am using 64-bit builds.

etracer:
Computer freezes when selecting a stage.

supertuxkart:
Computer freezes when selecting single player game from the menu.

es2gears (from mesa-demos):
Sometimes it works, sometimes 1-2 gears are missing or have corrupted geometry, sometimes it freezes the computer.

I don't get any interesting output in dmesg or any other standard log.
Comment 9 Alex Deucher 2011-01-27 09:04:37 UTC
Please try and bisect between 2.6.36 and 2.6.37 and see if you can track down the problematic commit.
Comment 10 Siganderson 2011-01-27 09:36:58 UTC
I have just started bisecting between 2.6.36 and 2.6.37-rc1. I need a few hours because every compilation takes about 50 minutes on my system.
Comment 11 Alex Deucher 2011-01-27 13:50:46 UTC
Created attachment 42614
possible fix

Does this drm patch help?
Comment 12 Siganderson 2011-01-28 05:32:20 UTC
These are the results:

ec46475f3e3163dd80bfee086fa71b36455ecc0b is the first bad commit
commit ec46475f3e3163dd80bfee086fa71b36455ecc0b
Author: Heikki Krogerus <ext-heikki.krogerus@nokia.com>
Date:   Thu Aug 19 15:09:36 2010 +0300

    power_supply: Add isp1704 charger detection driver
    
    NXP ISP1704 is Battery Charging Specification 1.0 compliant USB
    transceiver. This adds a power supply driver for ISP1704 and
    ISP1707 USB transceivers.
    
    Signed-off-by: Heikki Krogerus <ext-heikki.krogerus@nokia.com>
    Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>

:040000 040000 7b10b20275d71a6ba2062426bc1f0a63693a4e5e db3ba3e86de1e90a947de26f5e040468886c7e85 M      drivers


I'm going to test the patch.
Comment 13 Siganderson 2011-01-28 09:11:50 UTC
yes, the patch solves the problem!
Comment 14 Siganderson 2011-01-28 09:12:47 UTC
yes, the patch solves the problem!
Comment 15 Alexandros Frantzis 2011-01-28 12:32:37 UTC
Created attachment 42670
dmesg trace from radeon 5750 running es2gears
Comment 16 Alexandros Frantzis 2011-01-28 12:33:32 UTC
The latest kernel from mainline (which includes the proposed fix) only partially solves the issue in my case. Specifically, the two games I tried (etracer and supertuxkart) now work without problems; however, issues remain when running es2gears from mesa-demos.

es2gears usually runs fine. Sometimes there is flickering in the GL window, but more importantly, sometimes the monitor turns off and on again repeatedly (every couple of seconds).

The good news is that this time I have an interesting dmesg trace (attached).
Comment 17 Alex Deucher 2011-01-28 13:03:30 UTC
(In reply to comment #16)

> When running es2gears it usually runs fine, sometimes there is flickering in
> the GL window but more importantly: sometimes the monitor turns off and on
> again continuously (every couple of seconds).
> 
> The good news is that this time I have an interesting dmesg trace (attached).

This is a result of the GPU being reset due to a detected lockup.
Comment 18 Michel Dänzer 2011-07-15 04:40:32 UTC
If you can still reproduce this, please attach the output of

EGL_LOG_LEVEL=debug es2gears

and maybe the output of es2_info.
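
One way to collect both outputs into files that can be attached here is sketched below. The helper name `capture_egl_debug` and the log file names are made up for illustration, and the sketch assumes `es2gears` and `es2_info` are both installed and on `PATH`.

```shell
# Capture the requested debug output into files for attaching to the bug.
# Usage: capture_egl_debug /tmp    (defaults to the current directory)
capture_egl_debug() {
    outdir="${1:-.}"
    # es2gears keeps running until its window is closed; stdout and stderr are both kept.
    EGL_LOG_LEVEL=debug es2gears > "$outdir/es2gears.log" 2>&1
    es2_info > "$outdir/es2_info.log" 2>&1
}
```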
Comment 19 Jerome Glisse 2012-02-22 10:05:43 UTC
Closing; reopen if it's still an issue with r600g from Mesa 8.0 or newer.
