Summary: | drmCommandNone( fd, DRM_R128_CCE_IDLE ) - gives errno 22 | ||
---|---|---|---|
Product: | Mesa | Reporter: | Miroslav Šustek <sustmidown> |
Component: | Drivers/DRI/r128 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED WONTFIX | QA Contact: | |
Severity: | normal | ||
Priority: | high | CC: | alexdeucher, jhansonxi, marauder, peter, whysogreen |
Version: | git | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
patch to disable span locking
log showing two consecutive locking attempts
patch to fix various span locking issues
span locking patch
patch to add support for projective textures |
Description
Miroslav Šustek
2006-12-17 11:07:13 UTC
I can confirm this bug: Paintball2_alpha016 exits while loading the first map. In dmesg:

[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[drm] Initialized r128 2.5.0 20030725 on minor 0
agpgart: Found an AGP 1.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode
[drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner c947c5a0 c947c5a0

I'm not sure whether this is related to the kpovmodeler lockup. I can see the program window, but after all UI elements are rendered, the whole X server freezes:

[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
SysRq : Terminate All Tasks

KDE 3.5.5, X.org 7.1.1, Mesa/drm from git (22/12/2006) (also with Mesa 6.5.2/kernel 2.6.17.14 drm).

AlphaPC 164SX/UP2000, XOrg-7.1, Mesa-6.5 and Mesa-6.5.2, Kernel-2.6.13.1 [alpha linux], Radeon 7500 PCI: I have the same error with GLCORE/GLX enabled... GL screensavers generate this same error 100% of the time, along with ANY other application using DRI/OpenGL:

Jan 28 20:46:18 jericho kernel: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Created attachment 8627: patch to disable span locking

I found a similar problem in the R300 driver, along with a patch that should solve it (though I don't know whether it is the real solution). I adapted that patch for r128 with one small change: span locking is disabled by default. Set R128_ENABLE_SPAN_LOCKING if you want to temporarily disable this patch.

Edit:
> Set R128_SPAN_ENABLE_LOCKING if you want to temporarily disable this patch.
Patch from comment #3 works fine! No more lockups with kpovmodeler; paintball2 nearly hangs, but the system stays responsive: I can switch to other VTs and kill the game. More or less, it works! (Today's Mesa git + patch, 2-3 day old drm git, kernel 2.6.19.7, X.org 7.2, xf86-video-ati git from 22 Feb 2007.)

HW: P2-420MHz/416 MB RAM
Video HW:
(--) R128(0): Chipset: "ATI Rage 128 Pro GL PF (AGP)" (ChipID = 0x5046)
(--) R128(0): VideoRAM: 16384 kByte (64-bit SDR SGRAM 1:1)

I'm afraid the patch is not acceptable in this form. If anything, it should default to span locking being enabled, with the possibility of disabling it via an environment variable, as is done in the r300 driver. However, AFAIK the r300 driver hasn't actually needed that in a while (certainly not in my experience), so it would be even better to identify and fix the actual cause of the hangs. It sounds like other parts of the driver may not handle the locking correctly.

On my Core 2 Duo running Ubuntu Edgy amd64, with Mesa and DRI from git (as of March 2) and a dual-seat i965+r128 setup (X -sharevts ...), I can reproduce the "r128_cce_idle called without lock held" error, after which the program exits, with armagetron on the r128 X server:

peter@tesla:~$ armagetron
(the menu comes up; start a game and...)
Error: Rage 128 timed out... exiting
peter@tesla:~$ echo $?
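A minimal sketch of the arrangement suggested above: span locking stays on by default, and an environment variable turns it off. The variable name R128_SPAN_DISABLE_LOCKING and both function names are hypothetical, not taken from the r128 or r300 sources, and the value is assumed to be read once, for example at context creation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define SPAN_DISABLE_ENV "R128_SPAN_DISABLE_LOCKING"

/* Locking stays on unless the variable is set to something other than
 * an empty string or "0". */
bool span_locking_from_value(const char *value)
{
    if (value == NULL || value[0] == '\0' || strcmp(value, "0") == 0)
        return true;
    return false;
}

/* Read the environment once, e.g. when the context is created, and keep
 * the result in a (hypothetical) per-context flag that the span
 * functions consult before taking the lock. */
bool span_locking_enabled(void)
{
    return span_locking_from_value(getenv(SPAN_DISABLE_ENV));
}
```

Keeping the decision in a pure function of the variable's value makes the default easy to check without touching the process environment.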
255

dmesg:
[1465378.259327] [drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff81003b14e180 ffff81003b14e180

peter@tesla:~$ LIBGL_DEBUG=verbose MESA_DEBUG=1 armagetron
libGL: XF86DRIGetClientDriverName: 4.1.0 r128 (screen 0)
libGL: OpenDriver: trying /usr/local/lib/dri/r128_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: Searching for BusID pci:0000:06:01.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: drmOpenMinor returns 5
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card1
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: drmOpenMinor returns 5
drmOpenByBusid: drmGetBusid reports pci:0000:06:01.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error: Can't open configuration file /etc/drirc: No such file or directory.
Error: Rage 128 timed out... exiting
Segmentation fault
peter@tesla:~$ echo $?
139

dmesg:
[1465931.617255] [drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff81002c9ce0c0 ffff81002c9ce0c0
[1465931.733266] armagetron.real[3733]: segfault at 00002ad9f4b427f9 rip 00002ad9f4b427f9 rsp 00007fffb8ab5948 error 14

It didn't segfault the previous two times, without the debug environment variables...

My armagetron settings include full mirroring of everything, a dual-textured plane, and a bunch of other things that my i965 handles just fine. It's probably unplayably slow on r128. If anyone tries to reproduce this but can't, I can send my armagetron config file...

BTW, https://bugs.freedesktop.org/show_bug.cgi?id=7994 mentions seeing the same kernel log message.

Does mesa commit 25f21b5331d27225b1f6b7aaf2c9bf3f32764d91 (see bug 7994) help?

> does mesa commit 25f21b5331d27225b1f6b7aaf2c9bf3f32764d91 (see bug 7994) help?

Unfortunately no, it doesn't. Armagetron still crashes.
In System Settings->Display Settings->Detail Settings, if I set "object textures" to "off", the game runs fine when I start it. In game, if I hit Escape and change the "object textures" setting, the program crashes right away with this on stderr:

Error: Rage 128 timed out... exiting

kernel log:
[drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff810052cc0680 ffff810052cc0680

Another problem that's still present (and was mentioned in this bug re: kpovmodeler) is the lockup of the whole server until the client program is killed. This is https://bugs.freedesktop.org/show_bug.cgi?id=3493. If you have access to any r128 hardware, progs/demos/engine is the best test case for it. It locks up, with 100% repeatability, before it can even draw its window the first time (the previous contents of that screen area are still visible inside the window frame), and "killall engine" restores everything to normal. The behaviour of the "engine" lockup is completely unchanged by that commit. It still says:

[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Created attachment 11428:
log showing two consecutive locking attempts
(Sorry for the double post, I'm a first-time Bugzilla user.)

re: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

I have attached some heavily formatted log output, which shows drm debug messages at the time of the hang (the messages come from the kernel log, obtained using "modprobe drm debug=1"). You can see that after the cce_idle ioctl call, context 3 tries to take the lock again without unlocking first. I recompiled the Mesa DRI driver with the DEBUG_LOCKING flag set in r128_lock.h and found that this double-locking behavior occurs in the depth buffer functions in r128_span.c. Basically, commenting out the HW_LOCK/HW_UNLOCK overrides:

 /* These functions require locking */
+/*
 #undef HW_LOCK
 #undef HW_UNLOCK
 #define HW_LOCK()    LOCK_HARDWARE(R128_CONTEXT(ctx));
 #define HW_UNLOCK()  UNLOCK_HARDWARE(R128_CONTEXT(ctx));
+*/
 /* 16-bit depth buffer functions */

fixed the problem for me. Apparently, the lock is already taken before these functions get called (AFAICR, r128SpanRenderStart() does the job, but I'm not sure).

re: *ERROR* r128_cce_idle called without lock held

drmP.h has a LOCK_TEST_WITH_RETURN macro, which returns -EINVAL along with this error message when the lock isn't held. Because of this, the otherwise infinite retry loop in r128WaitForIdleLocked exits with ret == -EINVAL, so the error message the application prints ("Rage 128 timed out") is quite misleading in this case. The real error is that r128WaitForIdleLocked got called without the lock held.

I modified DEBUG_LOCK in r128_lock.h to print all lock operations to standard error, and modified the code to call DEBUG_LOCK on unlocks too. r128WaitForIdleLocked was also replaced with a macro to show where it's called from. It turned out that r128SpanRenderFinish() is sometimes called without r128SpanRenderStart() being called first, so no lock is held during the call to r128WaitForIdleLocked. I think this may be a problem with the software fallback. Armagetron exhibits this behavior, while gl-117 calls RenderStart and RenderFinish in pairs, and its lock operations are paired as well.
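To make both failure modes concrete, here is a simplified, self-contained model. Everything in it (sim_lock, sim_cce_idle, wait_for_idle) is hypothetical and mirrors neither the kernel nor the Mesa code exactly: the idle call returns -EINVAL when the caller does not hold the lock, as LOCK_TEST_WITH_RETURN is described to do above, the wait loop retries only on -EBUSY, and a second lock request from the owning context is reported instead of hanging. On Linux, EINVAL is 22, the errno in this bug's summary.

```c
#include <errno.h>
#include <stdio.h>

/* Simplified model of the DRM heavyweight lock; all names are hypothetical. */
typedef struct {
    int held;   /* nonzero while a context owns the lock */
    int owner;  /* id of the owning context */
} sim_lock;

/* Returns 0 on success, -EBUSY if another context owns the lock, and
 * -EDEADLK if the caller already owns it (the double-locking case,
 * which the reports above describe as a hang; here it is only reported). */
int sim_lock_take(sim_lock *l, int ctx)
{
    if (l->held && l->owner == ctx) {
        fprintf(stderr, "*ERROR* %d holds heavyweight lock\n", ctx);
        return -EDEADLK;
    }
    if (l->held)
        return -EBUSY;
    l->held = 1;
    l->owner = ctx;
    return 0;
}

void sim_lock_release(sim_lock *l)
{
    l->held = 0;
}

typedef struct {
    sim_lock *lock;
    int busy_polls;  /* remaining polls that report the engine as busy */
} sim_engine;

/* Models an idle ioctl guarded like LOCK_TEST_WITH_RETURN:
 * a caller that does not hold the lock gets -EINVAL. */
int sim_cce_idle(sim_engine *e, int ctx)
{
    if (!e->lock->held || e->lock->owner != ctx) {
        fprintf(stderr, "*ERROR* cce_idle called without lock held\n");
        return -EINVAL;
    }
    if (e->busy_polls > 0) {
        e->busy_polls--;
        return -EBUSY;
    }
    return 0;
}

/* Retries only while the engine is busy; any other error ends the loop. */
int wait_for_idle(sim_engine *e, int ctx, int max_retries)
{
    int ret;
    int i = 0;
    do {
        ret = sim_cce_idle(e, ctx);
    } while (ret == -EBUSY && i++ < max_retries);
    return ret;
}

/* Distinguishes a real timeout from a missing lock. */
const char *describe_idle_result(int ret)
{
    switch (ret) {
    case 0:       return "idle";
    case -EBUSY:  return "timed out";
    case -EINVAL: return "called without lock held";
    default:      return "other error";
    }
}
```

A caller that reports every nonzero result as a timeout prints the misleading message seen in this bug; checking for -EINVAL separately points at the missing lock instead.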
Using kernel 2.6.22 with drm and mesa from the freedesktop git.

Created attachment 13855: patch to fix various span locking issues

Here is my proposed patch for fixing some locking issues with the r128. I think most of the problems are caused by primitives that end up in the software rasterizer without the software path being set up first.

The HW_LOCK() and HW_UNLOCK() macro definitions mentioned in comment #11 caused double locking ("*ERROR* 3 holds heavyweight lock") when the correct fallback path was taken and r128SpanRenderStart had already been called before reading/writing buffers. Removing these #defines, as in the radeon source, and leaving the job solely to ...SpanRenderStart and ...SpanRenderFinish, in turn left the software rasterizer without *any* locking for some primitives. That resulted in early application termination, because now e.g. r128WaitForIdleLocked could be called without acquiring the lock (hence the "*ERROR* r128_cce_idle called without lock held").

I added two new fallback flags based on error logs: after removing the #defines, blender crashed shortly after enabling GL_LINE_STIPPLE, and armagetron had some problems with GL_POLYGON_SMOOTH. (armagetron also uses GL_LINE_SMOOTH, but it didn't crash when that was enabled, so I didn't add a separate case for it.)

Another issue is caused by projective textures (seen in armagetron). When r128RenderStart determines from the vertex format (texture coordinate size) that software rendering should be enabled, it is already "too late": some of the function pointer changes that _swsetup_Wakeup makes only take effect (if at all) at the beginning of the next frame. In particular, _swsetup_Wakeup changes tnl->Driver.Render.Start to _swsetup_RenderStart, which in turn would call r128SpanRenderStart to acquire the lock. So, once again, no locking takes place for that iteration.
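A minimal model of the timing problem described above, under stated assumptions. All names are hypothetical: render_ctx, begin_frame and the FB_* bits are invented for this sketch, render_start stands in for tnl->Driver.Render.Start, and sw_render_start stands in for a start hook that ends up taking the span lock. Treating a texture coordinate size above 2 as projective is also an assumption of the sketch, not a statement about the r128 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fallback bits; only the idea of a PROJTEX bit comes
 * from the report (R128_FALLBACK_PROJTEX). */
#define FB_LINE_STIPPLE   0x1u
#define FB_POLYGON_SMOOTH 0x2u
#define FB_PROJTEX        0x4u

typedef struct render_ctx render_ctx;

struct render_ctx {
    bool line_stipple;    /* GL_LINE_STIPPLE enabled */
    bool polygon_smooth;  /* GL_POLYGON_SMOOTH enabled */
    int texcoord_size;    /* assumption: > 2 means projective coordinates */
    unsigned fallback;    /* currently active fallback bits */
    int lock_depth;       /* > 0 while the span lock is held */
    /* Stand-in for tnl->Driver.Render.Start; NULL selects the hardware path. */
    void (*render_start)(render_ctx *);
};

/* Software-path start hook: like a span RenderStart, it takes the lock. */
static void sw_render_start(render_ctx *c)
{
    c->lock_depth++;
}

unsigned compute_fallback(const render_ctx *c)
{
    unsigned fb = 0;
    if (c->line_stipple)
        fb |= FB_LINE_STIPPLE;
    if (c->polygon_smooth)
        fb |= FB_POLYGON_SMOOTH;
    if (c->texcoord_size > 2)
        fb |= FB_PROJTEX;
    return fb;
}

/* Called at the beginning of each frame. If a fallback is only discovered
 * here, installing the software hook affects later frames; with run_now
 * set, the hook is also run so the current frame holds the lock. */
void begin_frame(render_ctx *c, bool run_now)
{
    if (c->render_start != NULL) {
        c->render_start(c);
        return;
    }
    c->fallback = compute_fallback(c);
    if (c->fallback != 0) {
        c->render_start = sw_render_start;
        if (run_now)
            c->render_start(c);
    }
}

/* End of frame: drop the lock if the start hook took it. */
void end_frame(render_ctx *c)
{
    if (c->lock_depth > 0)
        c->lock_depth--;
}
```

With run_now false, the first frame that needs the fallback runs with lock_depth 0, which is the unlocked iteration described above; run_now true corresponds to calling the start hook immediately once the fallback bit is set.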
I added a check to see whether R128_FALLBACK_PROJTEX was actually set in FALLBACK, and a call to run tnl->Driver.Render.Start if needed. This part is commented out because I'm not sure it is the right way to handle this, so for now projective textures are disabled by this patch.

Comment on attachment 13855: patch to fix various span locking issues

I tried to run more games after applying the patch, and some of them terminated with a segmentation fault and/or put multiple "r128_cce_* called without lock held" messages into the kernel log. So it's not working properly.

I believe I've encountered this same bug on Ubuntu with Armagetron Advanced and Chromium: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-r128/+bug/214442

Created attachment 24949: span locking patch

The previous patch left three calls to the software rasterizer without locking (these are called when both hardware-accelerated and software-rendered primitives are drawn); this one wraps them correctly. Texture projection is still commented out, because not many OpenGL applications have an option to disable it (if I remember correctly, armagetronad recently removed this setting), and rendering this kind of primitive in software hits the frame rate quite hard. Since only the first and second texture coordinate components are left in place, leaving it this way (unhandled) can cause rendering artifacts. Some texcoord transformation magic could fix this, but it could only happen at the vertex level, and the texture would still be linearly interpolated per pixel.
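The artifacts mentioned above can be shown with a one-dimensional example along a single edge. This sketch assumes w = 1 at both ends, so no perspective correction is involved, and compares three ways of computing the texture coordinate: the correct projective lookup (interpolate s and q, divide per pixel), dropping q entirely, and dividing s by q at each vertex before interpolating. The type and function names are hypothetical.

```c
/* Texture coordinate s and projective coordinate q at one end of an
 * edge (w is assumed to be 1, so no perspective correction applies). */
typedef struct {
    float s;
    float q;
} texcoord_sq;

/* Correct projective lookup: interpolate s and q, divide per pixel. */
float projective_s(texcoord_sq a, texcoord_sq b, float t)
{
    float s = a.s + t * (b.s - a.s);
    float q = a.q + t * (b.q - a.q);
    return s / q;
}

/* q ignored: only the first component is interpolated. */
float dropped_q_s(texcoord_sq a, texcoord_sq b, float t)
{
    return a.s + t * (b.s - a.s);
}

/* s/q computed at each vertex, then interpolated linearly. */
float vertex_divided_s(texcoord_sq a, texcoord_sq b, float t)
{
    float sa = a.s / a.q;
    float sb = b.s / b.q;
    return sa + t * (sb - sa);
}
```

For a = {0, 1} and b = {2, 2} at the midpoint, the correct value is 1/1.5 (about 0.667), dropping q gives 1.0, and dividing at the vertices gives 0.5. The vertex-level division is right at both endpoints but wrong in between, which is the limitation of any per-vertex transformation noted above.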
The attached patch applies to git commit dc1153ce83041a397b1d1815db4133ce8c53eaa1 and to the mesa_7.4-0ubuntu3 source package from the Ubuntu Jaunty repository, but was tested only with the latter. To test, start a free flight in gl-117, leave the camera in first-person mode and fly your plane into the Sun :)

Created attachment 27576: patch to add support for projective textures

Enables hardware-accelerated rendering of primitives whose vertices carry the 'q' texture coordinates used in projective texture mapping. It must be applied after attachment 24949. The patch is generally based on the "ptex hack" found in the S3 Savage sources (which uses the standard vertex format, then shuffles values around to their correct positions just before sending the vertices to the DMA buffer), modified to support two sets of texture coordinates; the vertex layout used by the r128, along with the additional vertex format flag needed, were obtained from earlier Mesa/DRI sources (e.g. [1][2]). The calculations performed are essentially the same in both cases (savage & r128). Both patches apply against the mesa_7.4-0ubuntu3.1 source package and git commit eb33c0ab8b3594f0b1d58534a13a26e3fb050cff; however, only the former has been tested.

(P.S. I'm unsure which vertex size the SAREA structure should store, but I also couldn't find any piece of code where that member gets referenced, so it may not matter either way.)

[1] http://www.koders.com/c/fid0D7829772F2499BEA37AC17768976CD64AD3459C.aspx?s=rhw2#L62
[2] http://www.koders.com/c/fidDF4ACA3257B4F85D79E4E734EA089EC50D1DBB7F.aspx#L79

Comment on attachment 27576: patch to add support for projective textures

Well, apparently it *does* matter, or something else is amiss, because after further testing I managed to hard-lock the machine and got a few related segfaults in the process. I will try to look into this as time permits.
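The "ptex hack" mentioned for attachment 27576 can be illustrated with one way such a hack can work; this is an assumption, not code taken from the Savage or r128 sources. A perspective-correcting rasterizer is assumed to interpolate s·rhw and rhw linearly in screen space and divide, so sending s/q, t/q and rhw = q/w yields (s/w)/(q/w) = s/q per pixel, which is the correct projective lookup. The sketch covers a single texture unit only; hw_interp_s is a model of the hardware, and all names and the vertex layout are hypothetical.

```c
/* One vertex for a single texture unit (hypothetical layout). */
typedef struct {
    float w;     /* clip-space w */
    float s, t;  /* texture coordinates */
    float q;     /* projective texture coordinate */
} ptex_vertex;

/* What a perspective-correcting rasterizer is assumed to consume. */
typedef struct {
    float rhw;   /* reciprocal w, here scaled by q */
    float s, t;
} hw_vertex;

/* Fold q into rhw and divide s and t by q. */
hw_vertex fold_q(ptex_vertex v)
{
    hw_vertex out;
    out.rhw = v.q / v.w;
    out.s = v.s / v.q;
    out.t = v.t / v.q;
    return out;
}

/* Model of the hardware: interpolate s*rhw and rhw linearly in screen
 * space along an edge (u in [0,1]) and divide. */
float hw_interp_s(hw_vertex a, hw_vertex b, float u)
{
    float num = (1.0f - u) * a.s * a.rhw + u * b.s * b.rhw;
    float den = (1.0f - u) * a.rhw + u * b.rhw;
    return num / den;
}

/* Reference projective lookup: s/w and q/w are linear in screen space,
 * and the per-pixel coordinate is their ratio. */
float reference_s(ptex_vertex a, ptex_vertex b, float u)
{
    float num = (1.0f - u) * a.s / a.w + u * b.s / b.w;
    float den = (1.0f - u) * a.q / a.w + u * b.q / b.w;
    return num / den;
}
```

Because only one rhw is folded per vertex here, a second set of texture coordinates with its own q would need its own scaled reciprocal w; this sketch does not model that case.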
Mass version move, cvs -> git

Miroslav Šustek, do you still experience this issue with newer drivers? Please check the status of your issue.

(In reply to comment #19)
> Miroslav Šustek,
> Do you still experience this issue with newer drivers?
> Please check the status of your issue.

I'm sorry, I no longer have the HW to test it.

Please check the status of your issue. Or close this bug.

Guys, thank you for all your work here. I also had a good time hacking on the r128 driver. Unfortunately, I gave the video card away four years ago, so I can't participate in this bug anymore.

Closing. *** drying nostalgic tear ***