Created attachment 17433 [details]
strace of glxgears

Running any 3D application on my DEC Alpha causes X to lock up solid. Quake3 triggers it after two or three seconds in the Q3 menu. glxgears triggers it sporadically, and more frequently if any part of the gears window is covered, for example by dragging the cursor over it or placing another window on top of it.

Specs:
Dual 833MHz EV68
Radeon 9100
xserver-1.4.2
mesa-7.0.3
libdrm-git 5d27fd94afaaf434c3a92af0075420b550055bfb (June 5, 2008)
x11-drm-git
xf86-video-ati-6.8.0 or 6.9.0

Attached is the contents of glxgears.log as generated by 'strace glxgears &> glxgears.log'.

Note: it is 9MB of text generated in only a few seconds, so I've bz2'd it to < 200 K. The log was originally 26MB, but I've trimmed the 17MB of the exact same repeating line from the end of the file.
Created attachment 17434 [details] dmesg from a fresh boot and X startup
Created attachment 17435 [details] kernel config for 2.6.24-gentoo-r3
Can you attach your xorg log and config as well?
Created attachment 17447 [details] xorg.conf I can't believe I forgot to attach this.
Created attachment 17448 [details] output of `dmesg` with drm debug before running glxgears This is the output of `dmesg` before glxgears has been run. I modprobed drm with `modprobe drm debug=1` and modprobed radeon before starting X.
Created attachment 17449 [details] output of `dmesg` with drm debug after running glxgears This is the output of `dmesg` after glxgears has been run. I modprobed drm with `modprobe drm debug=1` and modprobed radeon before starting X. Note, pid=3770 was glxgears during this run.
Created attachment 17450 [details] Xorg.0.log This log is as it is after glxgears has been run. Running glxgears adds nothing to the log.
I just tried this again on my UP1500 with the Radeon 9100, kernel 2.6.29_rc2 with DRM/radeon built as modules, xserver-1.5.3, libdrm-2.4.4, mesa-7.2, xf86-video-ati-6.10.

glxgears fails with -22. The following is output to dmesg, but other than that it does not kill the system like before.

[drm:radeon_cp_cmdbuf] *ERROR* radeon_cp_cmdbuf called without lock held, held 0 owner fffffc00f24a1600 fffffc00f24a1600

What can I do to debug further?
(In reply to comment #8)
> [drm:radeon_cp_cmdbuf] *ERROR* radeon_cp_cmdbuf called without lock held, held
> 0 owner fffffc00f24a1600 fffffc00f24a1600

Looks like maybe the DRM lock code is broken on alpha. Maybe try writing 1 to /sys/module/drm/parameters/debug before starting glxgears to get more DRM debugging output.

This may be a separate issue which just masks the one originally reported here, though.
Created attachment 22217 [details] output of `dmesg` with drm debug before running glxgears In this, pid=3409 is /usr/bin/X as seen by `ps aux` root 3409 0.4 0.2 21808 9232 tty7 Ss+ 17:55 0:02 /usr/bin/X :0 vt7 -auth /etc/X11/xdm/authdir/authfiles/A:0-7ZT4Pa
Created attachment 22218 [details] output of `dmesg` with drm debug after running glxgears pid 3409 is /usr/bin/X. I can only assume pid 3454 is glxgears.
Using libdrm-2.3.0, mesa-6.5.2 and xserver-1.3 -> DRI works Nothing in the locking code in libdrm/xf86drm.h (DRM_CAS etc) has changed from libdrm 2.3.0 to 2.4.4. Other suggestions for where to look?
(In reply to comment #12)
> Using libdrm-2.3.0, mesa-6.5.2 and xserver-1.3 -> DRI works

What about the problem you reported originally?

> Nothing in the locking code in libdrm/xf86drm.h (DRM_CAS etc) has changed from
> libdrm 2.3.0 to 2.4.4.

Is that using the same kernel / DRM modules? If so, you should be able to isolate the userspace change which broke things for you.

Anyway, looking at the debugging output:

[drm:drm_lock] 3 (pid 3454) requests lock (0x00000001), flags = 0x00000000
[drm:drm_lock] 3 has lock
[drm:drm_ioctl] pid=3454, cmd=0x80206450, nr=0x50, dev 0xe200, auth=1
[drm:radeon_cp_cmdbuf] *ERROR* radeon_cp_cmdbuf called without lock held, held 0 owner fffffc00f2a57c00 fffffc00f2a57c00

The first two lines indicate that drm_lock_take() succeeds for glxgears' context, but then LOCK_TEST_WITH_RETURN() fails in radeon_cp_cmdbuf(). There does seem to be an inconsistency.
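For anyone reading the "held 0 owner X X" message for the first time, here is a minimal C sketch of the kind of check that produces it. This is not the kernel source: the names prefixed `sketch_` are hypothetical, and the bit layout (held flag in the top bit, contended flag in the next bit, context number in the remaining bits) is an assumption modeled on the DRM lock word. It only illustrates why a matching owner is not enough when the held bit is clear.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed lock-word layout, modeled on the DRM hardware lock (hypothetical names). */
#define SKETCH_LOCK_HELD 0x80000000u /* set while some context holds the lock */
#define SKETCH_LOCK_CONT 0x40000000u /* set when the lock is contended */

/* True if the held bit is set in the lock word. */
static bool sketch_lock_is_held(uint32_t lock_word)
{
    return (lock_word & SKETCH_LOCK_HELD) != 0;
}

/* Context number stored in the low bits of the lock word. */
static uint32_t sketch_locking_context(uint32_t lock_word)
{
    return lock_word & ~(SKETCH_LOCK_HELD | SKETCH_LOCK_CONT);
}

/*
 * Returns 0 if the caller may proceed, -EINVAL otherwise.
 * Both conditions must hold: the held bit is set AND the caller owns the lock.
 */
static int sketch_lock_test(uint32_t lock_word, const void *owner, const void *caller)
{
    if (!sketch_lock_is_held(lock_word) || owner != caller)
        return -EINVAL;
    return 0;
}
```

With a lock word of 0x00000003 the owner pointers can match and the check still returns -EINVAL (-22 on Linux), which is consistent with the "held 0" in the log and the -22 that glxgears reports.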
(In reply to comment #13)
> (In reply to comment #12)
> > Using libdrm-2.3.0, mesa-6.5.2 and xserver-1.3 -> DRI works
>
> What about the problem you reported originally?

I haven't tried to reproduce it again, but I would think it is a symptom of the locking bug.

> > Nothing in the locking code in libdrm/xf86drm.h (DRM_CAS etc) has changed from
> > libdrm 2.3.0 to 2.4.4.
>
> Is that using the same kernel / DRM modules? If so, you should be able to
> isolate the userspace change which broke things for you.

Yes, same kernel and DRM modules.

I have tested the following configurations. All fail the same way.

Holding constant:
kernel-2.6.29_rc2 (and DRM modules)
xorg-server-1.5.3-r1
mesa-7.2

Variations:
libdrm-2.4.4 xf86-video-ati-6.10.0
libdrm-2.4.4 xf86-video-ati-6.8.0-r1
libdrm-2.3.1 xf86-video-ati-6.10.0
libdrm-2.3.1 xf86-video-ati-6.7.197

Including so much code from libdrm statically in Mesa and the X.Org server makes finding this bug difficult.
I retested with a Radeon 9800 AGP (forced into PCI mode) and can reproduce with the following versions (glxgears fails with drmRadeonCmdBuf: -22):

linux kernel-2.6.29_rc2
xserver-1.5.3
mesa-7.3
libdrm-2.4.4
xf86-video-ati-6.8.0 or xf86-video-ati-6.9.0

Using 6.10.0, X, hal, and sshd all fail with out of memory errors. (I did not see this running the 9100 PCI card.)
Created attachment 22419 [details] output of `dmesg` using 9800 with 6.10.0 (shows out of memory errors)
Created attachment 22420 [details] Xorg.0.log using 9800 with 6.10.0 (shows where X stops loading)
Created attachment 22506 [details]
Logs for Rage128+DRM

The attached file contains the xorg log output and dmesg output (with and without debug=1 for drm) for my Rage 128.

01:05.0 VGA compatible controller: ATI Technologies Inc Rage 128 Pro Ultra TF (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited Device 7106
        Flags: bus master, stepping, 66MHz, medium devsel, latency 255, IRQ 5
        Memory at fc000000 (32-bit, prefetchable) [size=64M]
        I/O ports at 8000 [size=256]
        Memory at f9000000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at fa000000 [disabled] [size=128K]
        Capabilities: [50] AGP version 2.0
        Capabilities: [5c] Power Management version 2

# uname -sr
Linux 2.6.29-rc2 (+patches)

Xorg is v1.5.3
I retested xserver-1.3, mesa-6.5.2, libdrm-2.3 with 2.6.29_rc2. It fails the same way. Now I don't have any idea where the problem is. Going to try an older kernel.
Tried with 2.6.26, same thing.

Also, I ran the 'lock' test program included with libdrm. Got the following output for both 2.6.29_rc2 and 2.6.26 kernels.

lt-lock: Unlocking unlocked lock succeeded: Invalid argument

I think we're finally on the right track.
Created attachment 22697 [details] [review]
First try at fixing DRM_CAS on Alpha

First try at fixing DRM_CAS on Alpha.

Whereas before dmesg showed stuff like
> [drm:radeon_cp_cmdbuf] *ERROR* radeon_cp_cmdbuf called without lock held, held 0 owner fffffc00f2a57c00 fffffc00f2a57c00
and glxgears would die with -22, with the patch I only get in dmesg
> [drm:radeon_cp_init] *ERROR* radeon_cp_init called without lock held, held 0 owner (null) fffffc00e8add980
and glxgears runs (albeit software rendered).

Looks like I've got a corner case to fix with the patch.

Tobias, could you test the patch? To do so, patch and build libdrm, then rebuild mesa, xorg-server and your video driver.
Created attachment 22898 [details] [review]
alpha locking fix

I see two problems with the alpha DRM_CAS() implementation:
- it doesn't retry on lock contention;
- the return value is stored on the stack (ouch...), which may produce "interesting" results depending on the compiler and on how the macro is used.

This patch makes DRM_CAS() on alpha behave the same way as on x86 or powerpc. It is also better to define DRM_CAS_RESULT() as "long", which eliminates an extra instruction for sign extension.

Matt says that the patch works for him.

Ivan.
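To illustrate the first point (retrying), here is a portable C11 sketch of a compare-and-swap that retries when the underlying primitive fails spuriously. It is not the patch in attachment 22898 and does not use Alpha assembly. `sketch_cas` and `SKETCH_LOCK_HELD` are hypothetical names, and the convention of returning 0 on success and nonzero on failure is an assumption of this sketch. C11's `atomic_compare_exchange_weak` is used because, like a load-locked/store-conditional pair, it is allowed to fail even when the value matches.

```c
#include <stdatomic.h>

/* Assumed "held" flag for the lock word (hypothetical name). */
#define SKETCH_LOCK_HELD 0x80000000u

/*
 * Try to replace *lock with new_val if it currently equals old_val.
 * Returns 0 on success, 1 if the lock word held a different value.
 * The weak exchange may fail spuriously, so failure is only reported
 * when the observed value really differs from old_val.
 */
static int sketch_cas(atomic_uint *lock, unsigned int old_val, unsigned int new_val)
{
    unsigned int expected = old_val;

    while (!atomic_compare_exchange_weak(lock, &expected, new_val)) {
        /* On failure, expected now holds the value that was observed. */
        if (expected != old_val)
            return 1; /* genuine mismatch: someone else holds the lock */
        /* Spurious failure: the value still matches, so try again. */
    }
    return 0;
}
```

A caller taking the lock for context 3 would call `sketch_cas(&lock, 3, SKETCH_LOCK_HELD | 3)` and fall back to the slow path (the lock ioctl) only when the result is nonzero. Without the retry loop, a spurious failure is indistinguishable from real contention.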
Applied, thanks!