Created attachment 38388: GPU lockup call trace

GPU lockup loading a savefile in Vega Strike (SVN) with the r600 classic driver.

Bisected to mesa: 5ad74779cea07cc6a19a52874cdaef8b018e2f1b (Eric Anholt, ir_to_mesa: Load all the STATE_VAR elements of a builtin uniform to a temp.)

Verified by reverting the above commit against mesa head (6e3cbeb3614152ea3aa188666d6166b484ee3f56).

System environment:
-- system architecture: amd64
-- Linux distribution: Gentoo
-- GPU: RS780G
-- Model: ATI Radeon HD 3200
-- Display connector: VGA
-- xf86-video-ati: 2b98ec1f7e931019a4ab699a56d5dfaa395946fb
-- xserver: 1.9.0
-- mesa: git
-- drm: 8a76244a0fd09d0e3298fe68af812d7eaa4dbcb5
-- kernel: 2.6.35.3
From testing with piglit glsl-vs-texturematrix2, the likely cause seems to be that we now use more temp registers in the VS than we've programmed into SQ_GPR_RESOURCE_MGMT. The commit above increased the register usage, but we should not hang the card in this case anyway. I don't know how to fix this, though: SQ_GPR_RESOURCE_MGMT is hardcoded for now, and to fix this correctly we should probably set the values to the maximum we will use in a cs (command stream). However, there's no infrastructure in the driver to change values that were already emitted at the beginning of the cs. Also, I don't know the exact constraints/dependencies for the RESOURCE_MGMT registers.
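To make the "size it to the maximum used in a cs" idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the names shader_gpr_usage, gpr_budget, max_gprs and cs_fits_budget are hypothetical, not r600 driver code, and the sketch only compares a VS/PS split against per-shader temp counts rather than encoding the real SQ_GPR_RESOURCE_MGMT register fields, whose exact constraints are the open question here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-shader summary (not the real r600 driver structures). */
struct shader_gpr_usage {
    unsigned num_temps; /* temporaries the compiled shader needs */
};

/* Hypothetical per-stage GPR split, standing in for the values the driver
 * programs into SQ_GPR_RESOURCE_MGMT at the start of a command stream. */
struct gpr_budget {
    unsigned vs_gprs;
    unsigned ps_gprs;
};

/* Largest temp count among the shaders of one stage in a command stream,
 * i.e. the value the allocation for that stage would have to cover. */
static unsigned max_gprs(const struct shader_gpr_usage *shaders, size_t count)
{
    unsigned max = 0;
    assert(shaders != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (shaders[i].num_temps > max)
            max = shaders[i].num_temps;
    return max;
}

/* Returns 1 if every VS and PS in the command stream fits the programmed
 * split, 0 if at least one shader needs more GPRs than were allocated
 * (the situation suspected of hanging the GPU in this report). */
static int cs_fits_budget(const struct gpr_budget *budget,
                          const struct shader_gpr_usage *vs, size_t nvs,
                          const struct shader_gpr_usage *ps, size_t nps)
{
    return max_gprs(vs, nvs) <= budget->vs_gprs &&
           max_gprs(ps, nps) <= budget->ps_gprs;
}
```

For example, with a hypothetical budget of 4 VS GPRs, a command stream whose vertex shaders need 2 and 5 temporaries is flagged, because max_gprs returns 5 for that stage.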
Narrowing it down, as of mesa commit f061524f0737bf59dad6ab9bb2e0015df804e4b5: this GPU lockup doesn't appear with shaders disabled. It does appear with shaders set to "simplest shaders", and it can be reproduced by starting a new campaign. "Simplest shaders" are defined in http://vegastrike.svn.sourceforge.net/viewvc/vegastrike/trunk/data/techniques/2_ps1.4/default_simple.technique?revision=12869&view=markup which points to http://vegastrike.svn.sourceforge.net/viewvc/vegastrike/trunk/data/programs/fixed5.vp?revision=12671&view=markup
For now, fixed with 280665be7026c978acead9713c10271c36a571ee
OK, with the new version after the revert, this should (still) be fixed. If it isn't, could you attach the output of MESA_GLSL=dump? (In general, this is probably a good idea for shader-related issues.)
Created attachment 38535: output of MESA_GLSL=dump

Locking up again. I'm using mesa-git at commit a09a8ec12d76e1fb1583fa99cf9f48246c108d7b. This is the output of "MESA_GLSL=dump vegastrike -j nothing.mission", which skips the game menu in order to minimize output. According to /var/log/messages, there were 26 lockups in a row. I'm not sure how useful this output is: my impression was that the lockups started right after initialisation, but at the end of the dump I still see initialisation messages.
Created attachment 38540: working output of MESA_GLSL=dump

For comparison, this is the same output with mesa-git a09a8ec12d76e1fb1583fa99cf9f48246c108d7b and with acd7c21541110d7ae6b9e63647391f65946e5c5d and 6c0ba32fd1466e8c1700acab3003dc1fe1deb337 reverted. Working nicely. Apparently, there's not much output after the final initialisation messages.
(Still) locking up using mesa-git at commit 777f352e6087e3ef05f7a88232f23e4f971bc5a0
I noticed that I can make some other applications lock up the GPU in a quite similar way by using MESA_GLSL=nopt. Celestia locks up on start, but not immediately: the Sun is displayed without problems, and the GPU locks up as soon as Earth should be rendered. Oolite also locks up on start, as soon as a spaceship should be rendered. Currently I'm using mesa 9476efe77ff196993937c3aa2e5bca725ceb0b41 and kernel 2.6.35.4 (with Marek Olšák's 10 seconds patch http://comments.gmane.org/gmane.comp.video.dri.devel/49821 ). Without MESA_GLSL=nopt, both work without a problem. However, with it, both also lock up using mesa a09a8ec12d76e1fb1583fa99cf9f48246c108d7b with acd7c21541110d7ae6b9e63647391f65946e5c5d and 6c0ba32fd1466e8c1700acab3003dc1fe1deb337 reverted, as in comment #6.
Removing myself from CC. The underlying bug is that the radeon driver needs to cleanly reject, at link time, shaders that it can't handle. (Incidentally, i915 has the same issue, and i965 does to some extent.)
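A minimal sketch of the link-time rejection described above, assuming a single fixed per-stage limit on temporaries. HYPOTHETICAL_MAX_VS_TEMPS and check_vs_temps are illustrative names, not Mesa or radeon driver API, and the real limit would depend on the chip and on how SQ_GPR_RESOURCE_MGMT is programmed. The point is that an over-budget shader fails to link with a readable message (which an application would see as GL_LINK_STATUS being GL_FALSE, with the message in the program info log) instead of being submitted to the hardware.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limit, for illustration only. */
#define HYPOTHETICAL_MAX_VS_TEMPS 32u

/* Hypothetical link-time check. Returns 1 and clears info_log if the
 * vertex shader fits; returns 0 and writes an explanation into info_log
 * if it needs more temporaries than the driver can provide, so linking
 * can fail cleanly rather than risking a GPU lockup. */
static int check_vs_temps(unsigned num_temps, char *info_log, size_t log_size)
{
    assert(info_log != NULL && log_size > 0);
    if (num_temps > HYPOTHETICAL_MAX_VS_TEMPS) {
        snprintf(info_log, log_size,
                 "vertex shader uses %u temporaries, limit is %u",
                 num_temps, HYPOTHETICAL_MAX_VS_TEMPS);
        return 0;
    }
    info_log[0] = '\0';
    return 1;
}
```

Rejecting at link time keeps the failure visible to the application, which can then fall back to a simpler shader, whereas the current behaviour only shows up as a CP stall in the kernel log.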
Still the same. Here's a GPU lockup with r600g:

System environment:
-- system architecture: amd64
-- Linux distribution: Gentoo
-- GPU: RS780G
-- Model: ATI Radeon HD 3200
-- Display connector: VGA
-- xf86-video-ati: f9bbb26dd97254b66de11bb2abd821aa293ecba5
-- xserver: 1.9.2.901
-- mesa: 859106f196ade77f59f8787b071739901cd1a843
-- drm: 8420743301a36dc1316fadf53bf8e1478068400a
-- kernel: 2.6.37-rc4-next-20101203-03967-g43cebba

Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: GPU lockup CP stall for more than 10000msec
Dec 4 20:38:23 absol kernel: ------------[ cut here ]------------
Dec 4 20:38:23 absol kernel: WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:244 radeon_fence_wait+0x235/0x2d3 [radeon]()
Dec 4 20:38:23 absol kernel: Hardware name: P-M3A3200
Dec 4 20:38:23 absol kernel: GPU lockup (waiting for 0x0001BF3E last fence id 0x0001BF3D)
Dec 4 20:38:23 absol kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT xt_limit ipt_ULOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter xt_multiport xt_iprange xt_mark ip_tables x_tables dm_mod hid_logitech usbhid radeon ttm drm_kms_helper drm i2c_piix4 i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect speedtch usbatm [last unloaded: usb_storage]
Dec 4 20:38:23 absol kernel: Pid: 14067, comm: vegastrike Not tainted 2.6.37-rc4-next-20101203-03967-g43cebba #115
Dec 4 20:38:23 absol kernel: Call Trace:
Dec 4 20:38:23 absol kernel: [<ffffffff81033180>] ? warn_slowpath_common+0x78/0x8c
Dec 4 20:38:23 absol kernel: [<ffffffff81033233>] ? warn_slowpath_fmt+0x45/0x4a
Dec 4 20:38:23 absol kernel: [<ffffffffa00d1829>] ? radeon_fence_wait+0x235/0x2d3 [radeon]
Dec 4 20:38:23 absol kernel: [<ffffffff81048b69>] ? autoremove_wake_function+0x0/0x2a
Dec 4 20:38:23 absol kernel: [<ffffffffa00987a8>] ? ttm_bo_wait+0xca/0x171 [ttm]
Dec 4 20:38:23 absol kernel: [<ffffffffa00e3e7a>] ? radeon_gem_wait_idle_ioctl+0x7d/0xe9 [radeon]
Dec 4 20:38:23 absol kernel: [<ffffffffa00470c4>] ? drm_ioctl+0x236/0x2ea [drm]
Dec 4 20:38:23 absol kernel: [<ffffffffa00e3dfd>] ? radeon_gem_wait_idle_ioctl+0x0/0xe9 [radeon]
Dec 4 20:38:23 absol kernel: [<ffffffff8102244d>] ? do_page_fault+0x306/0x33f
Dec 4 20:38:23 absol kernel: [<ffffffff81089dc1>] ? mmap_region+0x3a7/0x4bc
Dec 4 20:38:23 absol kernel: [<ffffffff810abe64>] ? do_vfs_ioctl+0x3f3/0x440
Dec 4 20:38:23 absol kernel: [<ffffffff810abeed>] ? sys_ioctl+0x3c/0x5c
Dec 4 20:38:23 absol kernel: [<ffffffff81001f3b>] ? system_call_fastpath+0x16/0x1b
Dec 4 20:38:23 absol kernel: ---[ end trace 9edad903af395b1b ]---
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: GPU softreset
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA25334E0
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000103
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Dec 4 20:38:23 absol kernel: radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Dec 4 20:38:24 absol kernel: radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Dec 4 20:38:24 absol kernel: radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Dec 4 20:38:24 absol kernel: radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20008040
Dec 4 20:38:24 absol kernel: radeon 0000:01:05.0: GPU reset succeed
Dec 4 20:38:24 absol kernel: radeon 0000:01:05.0: WB enabled
Dec 4 20:38:24 absol kernel: [drm] ring test succeeded in 1 usecs
Dec 4 20:38:24 absol kernel: [drm] ib test succeeded in 1 usecs
Yay! One week ago it was still locking up, but now it's fixed. No more lockups, working with r600g as well as r600c. Thanks!

System environment:
-- system architecture: amd64
-- Linux distribution: Gentoo
-- GPU: RS780
-- Model: ATI Radeon HD 3200 (780G)
-- Display connector: VGA
-- xf86-video-ati: 6.14.0
-- xserver: 1.9.4
-- mesa: 0adeaf00e6c4592e78cca36c3b365110b83c965d
-- drm: 550fe2ca3b29ad2191eab4fdfbed9ed21e25492d
-- kernel: 2.6.38-rc5