Bug 32447

Summary: GPU lockup with Braid
Product: DRI
Reporter: Marti Raudsepp <marti>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: voas0113
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Marti Raudsepp 2010-12-16 10:16:19 UTC
When I try to run the game "Braid", recently released for Linux as part of the Humble Indie Bundle, it renders with corrupt graphics and then segfaults after a short while. After several such attempts, it started locking up the GPU. I'm using a Radeon HD 4870 (RV770) on 64-bit Arch Linux. The game is 32-bit.

In order to run this game in the first place, I had to set force_s3tc_enable in drirc and install the libtxc_dxtn library.
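
For anyone reproducing this setup, the drirc entry mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reporter's actual file: the `r600` driver name, `screen="0"`, and the per-application `Braid`/`braid` entry are guesses, and the helper `build_drirc` is a made-up name. Only the `force_s3tc_enable` option itself comes from the report.

```python
import xml.etree.ElementTree as ET


def build_drirc(driver="r600", executable="braid"):
    """Build a drirc fragment that enables force_s3tc_enable for one program.

    The driver and executable names are assumptions for illustration.
    """
    root = ET.Element("driconf")
    device = ET.SubElement(root, "device", {"screen": "0", "driver": driver})
    app = ET.SubElement(
        device, "application", {"name": "Braid", "executable": executable}
    )
    # The only setting taken from the report itself.
    ET.SubElement(app, "option", {"name": "force_s3tc_enable", "value": "true"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_drirc())
```

The printed XML could be saved as ~/.drirc. The option presumably only has an effect once libtxc_dxtn is installed, as described above.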

Versions:
kernel 2.6.36.2
xorg-server 1.9.2
ati-dri 7.9.0.git20101207
mesa 7.9.0.git20101207
xf86-video-ati 6.13.2
glproto 1.4.12


At first, when I run it, I get:
% gdb --args ./braid -windowed
(gdb) run
unsupported texture format in setup_hardware_state
failed to validate texture for unit 0.
... ^ repeats lots of times

Program received signal SIGSEGV, Segmentation fault.
0xf63e40ed in radeonEmitVec4 () from /usr/lib32/xorg/modules/dri/r600_dri.so
(gdb) bt
#0  0xf63e40ed in radeonEmitVec4 () from /usr/lib32/xorg/modules/dri/r600_dri.so
#1  0xf63c1fbe in r700DrawPrims () from /usr/lib32/xorg/modules/dri/r600_dri.so
#2  0xf64ab187 in vbo_exec_DrawArrays () from /usr/lib32/xorg/modules/dri/r600_dri.so
#3  0xf64a1797 in neutral_DrawArrays () from /usr/lib32/xorg/modules/dri/r600_dri.so
#4  0x081974cb in Display_System_OGL::immediate_flush() ()
#5  0x080cdfd1 in draw_intro_notification(float, char*, char*) ()
#6  0x080cf9e4 in draw_overlays_for_gameplay() ()
#7  0x080d0c2a in draw_world_view() ()
#8  0x080eaed1 in draw_game_mode() ()
#9  0x080eda1e in app_main(int, char**) ()
#10 0x0815223e in main ()

----

After a few tries, I start getting these GPU lockups, which turn my screen into a garbled mess. After a while the desktop environment re-appears, but then crashes again and finally suspends both my screens.

I can switch to a virtual terminal and kill braid with 'kill -9'. After switching back to Xorg, everything is back to normal. :)

Here's the relevant output from dmesg:

radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x376/0x3e0 [radeon]()
Hardware name: System Product Name
GPU lockup (waiting for 0x0001D61D last fence id 0x0001D61C)
Modules linked in: fuse ip6table_filter ip6_tables xt_CHECKSUM bridge stp llc hwmon_vid cpufreq_ondemand sit tunnel4 iptable_mangle xt_state ipt_REJECT xt_NFQUEUE iptable_filter ipt_REDIRECT xt_tcpudp ipt_MASQUERADE xt_owner iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables btrfs zlib_deflate crc32c libcrc32c usbhid hid snd_hda_codec_atihdmi usb_storage snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec_via radeon snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd ttm soundcore snd_page_alloc drm_kms_helper drm i2c_piix4 i2c_algo_bit i2c_core ohci_hcd dm_mod ehci_hcd edac_core edac_mce_amd evdev sg asus_atk0110 parport_pc usbcore ppdev button thermal psmouse wmi serio_raw shpchp pci_hotplug k10temp lp parport kvm_amd kvm r8169 mii powernow_k8 freq_table processor mperf ipv6 autofs4 ext4 mbcache jbd2 crc16 sr_mod cdrom floppy pata_atiixp pata_acpi sd_mod ahci libahci libata scsi_mod
Pid: 3274, comm: X Not tainted 2.6.36-ARCH #1
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x376/0x3e0 [radeon]()
Hardware name: System Product Name
GPU lockup (waiting for 0x0001D61F last fence id 0x0001D61C)
Modules linked in: fuse ip6table_filter ip6_tables xt_CHECKSUM bridge stp llc hwmon_vid cpufreq_ondemand sit tunnel4 iptable_mangle xt_state ipt_REJECT xt_NFQUEUE iptable_filter ipt_REDIRECT xt_tcpudp ipt_MASQUERADE xt_owner iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables btrfs zlib_deflate crc32c libcrc32c usbhid hid snd_hda_codec_atihdmi usb_storage snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec_via radeon snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd ttm soundcore snd_page_alloc drm_kms_helper drm i2c_piix4 i2c_algo_bit i2c_core ohci_hcd dm_mod ehci_hcd edac_core edac_mce_amd evdev sg asus_atk0110 parport_pc usbcore ppdev button thermal psmouse wmi serio_raw shpchp pci_hotplug k10temp lp parport kvm_amd kvm r8169 mii powernow_k8 freq_table processor mperf ipv6 autofs4 ext4 mbcache jbd2 crc16 sr_mod cdrom floppy pata_atiixp pata_acpi sd_mod ahci libahci libata scsi_mod
Pid: 3830, comm: braid Not tainted 2.6.36-ARCH #1
Call Trace:
Call Trace:
 [<ffffffff81054f7a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81054f7a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81055051>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff81055051>] warn_slowpath_fmt+0x41/0x50
 [<ffffffffa04445c6>] radeon_fence_wait+0x376/0x3e0 [radeon]
 [<ffffffffa04445c6>] radeon_fence_wait+0x376/0x3e0 [radeon]
 [<ffffffff81075b40>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81075b40>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa045c3c1>] radeon_ib_get+0x121/0x1e0 [radeon]
 [<ffffffffa0444e7c>] radeon_sync_obj_wait+0xc/0x10 [radeon]
 [<ffffffffa02b5189>] ttm_bo_wait+0xf9/0x1b0 [ttm]
 [<ffffffffa045dbd9>] radeon_cs_ioctl+0x89/0x1e0 [radeon]
 [<ffffffffa0288e4f>] ? drm_mode_cursor_ioctl+0xbf/0x170 [drm]
 [<ffffffffa045bcbe>] radeon_gem_wait_idle_ioctl+0x8e/0x110 [radeon]
 [<ffffffffa027a914>] drm_ioctl+0x3d4/0x4b0 [drm]
 [<ffffffffa027a914>] drm_ioctl+0x3d4/0x4b0 [drm]
 [<ffffffffa045db50>] ? radeon_cs_ioctl+0x0/0x1e0 [radeon]
 [<ffffffffa045bc30>] ? radeon_gem_wait_idle_ioctl+0x0/0x110 [radeon]
 [<ffffffff811b8411>] ? tomoyo_path_number_perm+0x41/0x140
 [<ffffffff81065c06>] ? recalc_sigpending+0x16/0x40
 [<ffffffff8100a30d>] ? do_signal+0x17d/0x7c0
 [<ffffffff8112d8a2>] ? do_sync_read+0xd2/0x110
 [<ffffffff812b9820>] ? input_event_to_user+0x50/0x60
 [<ffffffffa04b8f8f>] radeon_kms_compat_ioctl+0xf/0x30 [radeon]
 [<ffffffff81172fa1>] compat_sys_ioctl+0xe1/0x11c0
 [<ffffffff811bc185>] ? tomoyo_init_request_info+0x35/0x60
 [<ffffffff8113e535>] do_vfs_ioctl+0x95/0x540
 [<ffffffff811b0896>] ? security_file_permission+0x76/0xa0
 [<ffffffff8113ea61>] sys_ioctl+0x81/0xa0
 [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
 [<ffffffff81012fe9>] ? read_tsc+0x9/0x20
 [<ffffffff8107f9e0>] ? getnstimeofday+0x60/0xf0
---[ end trace ccbb302085d73014 ]---
 [<ffffffff8107fad5>] ? do_gettimeofday+0x15/0x50
[drm] Disabling audio support
 [<ffffffff8103d670>] cstar_dispatch+0x7/0x2e
---[ end trace ccbb302085d73015 ]---
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE77324A4
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00FF0102
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: ffff8801277c9c00 unpin not necessary
radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: ffff88012b3f6800 unpin not necessary
radeon 0000:01:00.0: ffff8801277c8200 unpin not necessary
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xB0003028
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000002
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000002
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000002
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: GPU reset succeed
radeon 0000:01:00.0: GPU reset succeed
[drm] ring test succeeded in 1501 usecs
[drm] ib test succeeded in 1 usecs
[drm] Enabling audio support
[drm] ring test succeeded in 1 usecs
[drm] ib test succeeded in 0 usecs
[drm] Enabling audio support
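
When triaging a log like the one above, it can help to pull out the fence numbers from the "GPU lockup" warnings. The sketch below does that for the two warnings in this report. The function name `pending_fences` is hypothetical. It also assumes, without confirmation from the driver source, that "last fence id" is the last fence the GPU signaled.

```python
import re

# Matches the warning text printed by radeon_fence_wait in the log above.
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)


def pending_fences(dmesg_text):
    """Return (waiting_for, last_fence, gap) for each lockup warning."""
    results = []
    for match in FENCE_RE.finditer(dmesg_text):
        waiting = int(match.group(1), 16)
        last = int(match.group(2), 16)
        results.append((waiting, last, waiting - last))
    return results


# The two warnings from the dmesg output in this report.
SAMPLE = """\
GPU lockup (waiting for 0x0001D61D last fence id 0x0001D61C)
GPU lockup (waiting for 0x0001D61F last fence id 0x0001D61C)
"""

if __name__ == "__main__":
    for waiting, last, gap in pending_fences(SAMPLE):
        print(f"waiting for {waiting:#x}, last fence {last:#x}, gap {gap}")
```

For the sample, the gaps are 1 and 3. Both waiters (X and braid) are stuck behind the same last fence, 0x0001D61C, which suggests the GPU stopped making progress right after it.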




In my Xorg.0.log I get:
[ 37336.384] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 37336.384] 
Backtrace:
[ 37336.406] 0: /usr/bin/X (xorg_backtrace+0x28) [0x49f1a8]
[ 37336.406] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49e5b4]
[ 37336.406] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47b724]
[ 37336.406] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f47b1a72000+0x42cf) [0x7f47b1a762cf]
[ 37336.406] 4: /usr/bin/X (0x400000+0x69737) [0x469737]
[ 37336.407] 5: /usr/bin/X (0x400000+0x118413) [0x518413]
[ 37336.407] 6: /lib/libpthread.so.0 (0x7f47b5a5d000+0xf1c0) [0x7f47b5a6c1c0]
[ 37336.407] 7: /lib/libc.so.6 (ioctl+0x7) [0x7f47b4a8b7f7]
[ 37336.407] 8: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7f47b3875568]
[ 37336.407] 9: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1c) [0x7f47b38778dc]
[ 37336.407] 10: /usr/lib/libdrm_radeon.so.1 (0x7f47b2b44000+0x1b39) [0x7f47b2b45b39]
[ 37336.407] 11: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f47b2d4a000+0xc9bd3) [0x7f47b2e13bd3]
[ 37336.407] 12: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f47b2d4a000+0xca5fb) [0x7f47b2e145fb]
[ 37336.407] 13: /usr/lib/libdrm.so.2 (drmHandleEvent+0xe4) [0x7f47b3879954]
[ 37336.407] 14: /usr/bin/X (WakeupHandler+0x4b) [0x4312bb]
[ 37336.407] 15: /usr/bin/X (WaitForSomething+0x1a4) [0x459754]
[ 37336.407] 16: /usr/bin/X (0x400000+0x2cf32) [0x42cf32]
[ 37336.407] 17: /usr/bin/X (0x400000+0x212ce) [0x4212ce]
[ 37336.408] 18: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f47b49e2c4d]
[ 37336.408] 19: /usr/bin/X (0x400000+0x20e79) [0x420e79]
[ 37352.023] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 37352.024] 
Backtrace:
[ 37352.024] 0: /usr/bin/X (xorg_backtrace+0x28) [0x49f1a8]
[ 37352.024] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49e5b4]
[ 37352.024] 2: /usr/bin/X (xf86PostKeyEventP+0x67) [0x47bd47]
[ 37352.024] 3: /usr/bin/X (xf86PostKeyboardEvent+0x19) [0x47beb9]
[ 37352.024] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f47b1a72000+0x430f) [0x7f47b1a7630f]
[ 37352.024] 5: /usr/bin/X (0x400000+0x69737) [0x469737]
[ 37352.024] 6: /usr/bin/X (0x400000+0x118413) [0x518413]
[ 37352.025] 7: /lib/libpthread.so.0 (0x7f47b5a5d000+0xf1c0) [0x7f47b5a6c1c0]
[ 37352.025] 8: /lib/libc.so.6 (ioctl+0x7) [0x7f47b4a8b7f7]
[ 37352.025] 9: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7f47b3875568]
[ 37352.025] 10: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1c) [0x7f47b38778dc]
[ 37352.025] 11: /usr/lib/libdrm_radeon.so.1 (0x7f47b2b44000+0x1b39) [0x7f47b2b45b39]
[ 37352.025] 12: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f47b2d4a000+0xc9bd3) [0x7f47b2e13bd3]
[ 37352.025] 13: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f47b2d4a000+0xca5fb) [0x7f47b2e145fb]
[ 37352.025] 14: /usr/lib/libdrm.so.2 (drmHandleEvent+0xe4) [0x7f47b3879954]
[ 37352.025] 15: /usr/bin/X (WakeupHandler+0x4b) [0x4312bb]
[ 37352.025] 16: /usr/bin/X (WaitForSomething+0x1a4) [0x459754]
[ 37352.025] 17: /usr/bin/X (0x400000+0x2cf32) [0x42cf32]
[ 37352.025] 18: /usr/bin/X (0x400000+0x212ce) [0x4212ce]
[ 37352.025] 19: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f47b49e2c4d]
[ 37352.025] 20: /usr/bin/X (0x400000+0x20e79) [0x420e79]
[ 37355.569] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 37366.442] (II) AIGLX: Resuming AIGLX clients after VT switch
[ 37366.578] (II) RADEON(0): EDID vendor "VSC", prod id 58651
[ 37366.578] (II) RADEON(0): Using hsync ranges from config file
[ 37366.578] (II) RADEON(0): Using vrefresh ranges from config file
[ 37366.578] (II) RADEON(0): Printing DDC gathered Modelines:
*snip modelines*
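
To see how long the server stayed stuck, the timestamps of the "EQ overflowing" messages can be extracted from Xorg.0.log. This is a minimal sketch, assuming the log keeps the "[ seconds.millis]" prefix shown above. The function name `eq_overflow_times` is made up for the example.

```python
import re

# Timestamp prefix as it appears in the Xorg.0.log excerpt above.
EQ_RE = re.compile(r"^\[\s*(\d+\.\d+)\] \[mi\] EQ overflowing", re.MULTILINE)


def eq_overflow_times(log_text):
    """Return the timestamps (in seconds) of each EQ overflow message."""
    return [float(stamp) for stamp in EQ_RE.findall(log_text)]


# The two overflow messages from the log in this report.
SAMPLE = """\
[ 37336.384] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 37352.023] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
"""

if __name__ == "__main__":
    times = eq_overflow_times(SAMPLE)
    for earlier, later in zip(times, times[1:]):
        print(f"{later - earlier:.3f} s between overflow messages")
```

In both backtraces the server is inside drmIoctl while evdev is still enqueueing input events. That fits X being blocked in the kernel during the lockup rather than looping on its own, although the log alone does not prove it.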
Comment 1 Marti Raudsepp 2010-12-16 10:30:25 UTC
Another data point: when running with MESA_DEBUG=verbose, I get these:

Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB fragment program option)

Mesa: 19 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
unsupported texture format in setup_hardware_state
failed to validate texture for unit 0.
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
unsupported texture format in setup_hardware_state
failed to validate texture for unit 0.
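
Since the same few errors repeat many times, a small tally by error code and GL entry point makes the output easier to compare between runs. This is a sketch only, and `tally_mesa_errors` is a hypothetical helper name. It counts explicit "User error" lines and deliberately ignores the "N similar ... errors" summary lines and the driver's own messages.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
ERROR_RE = re.compile(r"Mesa: User error: (GL_[A-Z_]+) in (\w+)")


def tally_mesa_errors(log_text):
    """Count explicit Mesa user-error lines per (error code, GL function)."""
    counts = Counter()
    for match in ERROR_RE.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


# First few lines of the MESA_DEBUG=verbose output above.
SAMPLE = """\
Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB fragment program option)
Mesa: 19 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
unsupported texture format in setup_hardware_state
"""

if __name__ == "__main__":
    for (code, func), count in tally_mesa_errors(SAMPLE).most_common():
        print(f"{count:3d}  {code} in {func}")
```

Typical usage would be to redirect the game's stderr to a file and pass the file's contents to the function.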
Comment 2 marek 2010-12-21 00:08:31 UTC
(In reply to comment #1)
> Another data point, when running with MESA_DEBUG=verbose I get these:
> 
> Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB
> fragment program option)
> 
> Mesa: 19 similar GL_INVALID_OPERATION errors
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> unsupported texture format in setup_hardware_state
> failed to validate texture for unit 0.
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
> Mesa: User error: GL_INVALID_ENUM in glFramebufferTexture2DEXT(attachment)
> unsupported texture format in setup_hardware_state
> failed to validate texture for unit 0.


I have exactly the same symptoms as Marti,
but on Intel GMA 945, i686, intel-dri, kernel
Linux beruska 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:01:53 UTC 2010 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz


Should I file a separate bug report or just join this one for Intel & i686?


[marek@beruska Braid]$ MESA_DEBUG=verbose braid -windowed
ALSA lib pcm.c:7245:(snd_pcm_recover) underrun occured
Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB fragment program option)

Mesa: 19 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
ALSA lib pcm.c:7245:(snd_pcm_recover) underrun occured
i915_program_error: Exceeded max nr indirect texture lookups (8 out of 4)
i915_program_error: Exceeded max nr indirect texture lookups (8 out of 4)
i915_program_error: Exceeded max ALU instructions (83 out of 64)
Mesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
^CMesa: User error: GL_INVALID_VALUE in glDrawBuffersARB(n)
^Z
[1]+  Stopped                 MESA_DEBUG=verbose braid -windowed
[marek@beruska Braid]$ kill -9 %1

[1]+  Stopped                 MESA_DEBUG=verbose braid -windowed
[marek@beruska Braid]$ yaourt -Qs intel-dri
testing/intel-dri 7.9.99.git20101217-1 [8.87 M]
    Mesa DRI drivers for Intel


Thanks, Marek
Comment 3 Marti Raudsepp 2010-12-21 00:15:13 UTC
(In reply to comment #2)
> I have exactly same symptoms as Marti, 
> but on intel gma945, i686 intel-dri, kernel

> Should I file in another bug report or just join this for intel & i686?

What you quoted is probably just the application misusing OpenGL, not a symptom of any bug. You should report another bug.
Comment 4 Laurent carlier 2011-03-03 12:23:20 UTC
Braid works for me with Mesa 7.10.1 without s3tc, no blank screen or lockup.

Perhaps this bug report should be closed?
Comment 5 Marti Raudsepp 2011-03-03 12:32:12 UTC
Works for me as well, full screen and windowed.
Marking this resolved. Thanks to all developers involved!
