Summary: | radeon-rewrite: DirectX 8 SDK samples crash the X server with KMS enabled | | |
---|---|---|---|
Product: | Mesa | Reporter: | Stefan Dösinger <stefandoesinger> |
Component: | Drivers/DRI/r200 | Assignee: | Default DRI bug account <dri-devel> |
Status: | NEW --- | QA Contact: | |
Severity: | normal | | |
Priority: | medium | | |
Version: | unspecified | | |
Hardware: | Other | | |
OS: | All | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | Don't crash if pPriv is NULL; Different crash; Another backtrace | | |
Description
Stefan Dösinger
2009-05-20 15:06:12 UTC
Git revisions used:
Mesa: a13e96359baaa0331561f86ef6487feba6540464
DDX: 48156758ec2c406c28b52b3cd65e77f29d98f79b (airlied's KMS ddx)
DRM: 1edb70f1b909d06f1c0ee7c9c794aec99454e488
Kernel: ed70a54c74f78f426497654676dda7ac53055d13 (airlied's KMS kernel)

Could this be a duplicate of the radeonRefillCurrentDmaRegion bug? (Or wherever that crash occurred?) I cannot find the bug to check the details, but I remember it from reading the (long deleted) dri-devel mails.

I talked to Maciej Cencora on IRC, and he suggested that this is not a dupe of 21582 because I am not using Compiz.

Wine uses AIGLX in some cases of child window OpenGL rendering, but this is not the case in the crashing samples. Ironically, mfcfog, the only working sample, does use child window GL rendering and AIGLX (it did not work prior to radeon-rewrite or without KMS because GL redirecting, or whatever it is called, is missing).

Another update: I recompiled Mesa with -O0 -g and the crash seemed to go away. I will see if the default compile flags bring the crash back. If they do, I guess there's some uninitialized variable somewhere.

Yup, recompiled with standard CFLAGs and the crash is back (i.e., just run configure without specifying any CFLAGs).

I tried to retest this with jglisse's DDX and kernel, but bug #21851 crossed my way.

(In reply to comment #5)
> Yup, recompiled with standard CFLAGs and the crash is back(ie, just run
> configure without specifying any CFLAGs)

What are the compiler flags it ends up using in that case?

The problem persists after a kernel and DDX update. I had to disable AGP in the kernel module to make the X server start though (bug #21851). I'll get you the compile flags in a second.

This is a compiler invocation for one of the source files with the default compile flags:

    gcc -c -I.
    -I../../../../../src/mesa/drivers/dri/common -Iserver -I../../../../../include -I../../../../../src/mesa -I../../../../../src/egl/main -I../../../../../src/egl/drivers/dri -I/usr/local/include -I/usr/local/include/drm -g -O2 -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fPIC -DUSE_X86_ASM -DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM -D_GNU_SOURCE -DPTHREADS -DHAVE_POSIX_MEMALIGN -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DGLX_DIRECT_RENDERING -DGLX_INDIRECT_RENDERING -DHAVE_ALIAS -DHAVE_LIBDRM_RADEON=1 -I/usr/local/include -I/usr/local/include/drm -DRADEON_COMMON=1 -DRADEON_COMMON_FOR_R200 r200_pixel.c -o r200_pixel.o

The relevant flags are, I think, "-g -O2".

Can you get a backtrace of the X server crash? Preferably with gdb, but even just a log file with a backtrace might be a start.

Hmm. When I run the X server in gdb it never crashes. Even more things work. The clear color issue (21569) is gone too.

I get a broken pipe signal every now and then, but that happens when I terminate an app, and if I just enter 'c' the server continues just fine. Here's a backtrace though (pretty helpless because my server was compiled without debug symbols. dang)

    Program received signal SIGPIPE, Broken pipe.
    0xb7ee6424 in __kernel_vsyscall ()
    (gdb) bt
    #0 0xb7ee6424 in __kernel_vsyscall ()
    #1 0xb7b6caa2 in ?? () from /lib/libc.so.6
    #2 0x0810cd2d in ?? ()
    #3 0x00000025 in ?? ()
    #4 0xbfd007b4 in ?? ()
    #5 0x00000001 in ?? ()
    #6 0x00000188 in ?? ()
    #7 0x0005b77d in ?? ()
    #8 0x09511800 in ?? ()
    #9 0x08193ff4 in ?? ()
    #10 0x0810bedd in _XSERVTransWritev ()
    #11 0x09bdffd8 in ?? ()
    #12 0xbfd007b4 in ?? ()
    #13 0x00000001 in ?? ()
    #14 0x00001428 in ?? ()
    #15 0x09a4be70 in ?? ()
    #16 0xb7ebd633 in __read_nocancel () from /lib/libpthread.so.0
    #17 0x08108ccd in FlushClient ()
    #18 0x09a4c86c in ?? ()
    #19 0x00000025 in ?? ()
    #20 0x09bdffd8 in ?? ()
    #21 0x00000000 in ?? ()

I'll see if I can get some help in the logs

I'm sorry, this issue just seems to have dissolved entirely.
I still suspect that there's some uninitialized variable or buffer overrun or something like that. If it returns I'll try to collect more info. And no, I did not change anything since the last time I saw the crash.

Ok, I got the crash again out of the blue, without changing anything. I just let the X server sit for a while and ran an SDK sample again and it crashed. The backtrace in the log was completely useless unfortunately. Just one line, magic address somewhere in the X server. I'll recompile the server with debug symbols enabled.

I also saw a similar crash on my r500 card where I got a somewhat more useful backtrace, but the host from which I ssh'ed kernel panicked shortly afterwards. (Unrelated. Nvidia binary driver troubles I think)

Ok, now I have something usable. Here's a gdb backtrace from the r500 card:

    Program received signal SIGSEGV, Segmentation fault.
    DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30, attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
    143 if (pPriv->buffers == NULL ||
    (gdb) bt
    #0 DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30, attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
    #1 0xb7fc878f in ProcDRI2Dispatch (client=0x9432f60) at dri2ext.c:212
    #2 0x0808ceaf in Dispatch () at dispatch.c:437
    #3 0x08071c1d in main (argc=5, argv=0xbfff0084, envp=0x93cd160) at main.c:397
    (gdb)

X server: 5cd5a01259ba349f1868ca4af04207cf120d69e4
DDX: fba534017e581fcd9b9e49ba0b281fb500f576a7
Mesa: d7cc0eb47930d6c8ebfd18fefbe48fe8eec696a0
kernel: 9bf2b46874ff4b59f79ba4d2984727ddf056496d
libdrm: 1edb70f1b909d06f1c0ee7c9c794aec99454e488

I'll see if I can get useful info like this from the r250 card too.

This one is from the r250. It does not look that nice because the X server and libdri2 are distro-provided, but the crash seems to be the same:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb7af0b00 (LWP 13286)]
    0xb7f16dc1 in DRI2GetBuffers () from /usr/lib/xorg/modules/extensions//libdri2.so
    (gdb) bt
    #0 0xb7f16dc1 in DRI2GetBuffers () from /usr/lib/xorg/modules/extensions//libdri2.so
    #1 0xb7f1797b in ?? () from /usr/lib/xorg/modules/extensions//libdri2.so
    #2 0x09a93128 in ?? ()
    #3 0xbfc46398 in ?? ()
    #4 0xbfc46394 in ?? ()
    #5 0x09a93bd4 in ?? ()
    #6 0x00000002 in ?? ()
    #7 0xbfc46390 in ?? ()
    #8 0xbfc46378 in ?? ()
    #9 0x081652f9 in Xfree ()
    #10 0xb7f17c75 in ?? () from /usr/lib/xorg/modules/extensions//libdri2.so
    #11 0x09a17048 in ?? ()
    #12 0x0939fc48 in ?? ()
    #13 0x01000000 in ?? ()
    #14 0x00000000 in ?? ()

Mesa: d7cc0eb47930d6c8ebfd18fefbe48fe8eec696a0
DDX: fba534017e581fcd9b9e49ba0b281fb500f576a7
libdrm: 1edb70f1b909d06f1c0ee7c9c794aec99454e488
kernel: 422756249a1717608618d6c940538ac8bdd851e8 (HEAD - 1)
server: xorg-server-1.6.1.901-r1 from gentoo

(In reply to comment #11)
> Hmm. When I run the X server in gdb it never crashes. Even more things work.
> The clear color issue(21569) is gone too.

Is it loading the driver you think it is?

    grep _dri /var/log/Xorg.0.log

Also, when testing something with direct rendering you may want to set LIBGL_DEBUG=verbose to verify libGL is loading the right driver.

> I get a broken pipe signal every now and then, but that happens when I
> terminate an app, and if I just enter 'c' the server continues just fine.

Right, that's normal as part of the X server operation. Use

    handle SIGPIPE nostop noprint

to make gdb ignore it.

(In reply to comment #12)
> I still suspect that there's some uninitialized variable or buffer overrun or
> something like that.

Running the X server in valgrind might be interesting if so.

(In reply to comment #14)
> Program received signal SIGSEGV, Segmentation fault.
> DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30,
> attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
> 143 if (pPriv->buffers == NULL ||

Is pPriv NULL?
(Try 'bt full' next time)

> X server: 5cd5a01259ba349f1868ca4af04207cf120d69e4

Apparently you're using server-1.6-branch. Can you try master or at least cherry-picking the DRI2 changes nominated on http://wiki.x.org/wiki/Server16Branch ?

Ok, I switched to the X server master branch (and dri2proto master), and so far no crashes. If they show up again I'll attach the requested info. So far I only updated my r500 box, I'll play with the r250 card later.

Ok, got the crash again. Similar user-visible symptoms, slightly different backtrace (I guess the code changed):

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb7c006b0 (LWP 21043)]
    0xb80b5056 in do_get_buffers (pDraw=0xa254668, width=0xbf9eb920, height=0xbf9eb924, attachments=0xa24b1f4, count=1, out_count=0xbf9eb928, has_format=1) at dri2.c:190
    190 && (pDraw->height == pPriv->height);
    (gdb) bt full
    #0 0xb80b5056 in do_get_buffers (pDraw=0xa254668, width=0xbf9eb920, height=0xbf9eb924, attachments=0xa24b1f4, count=1, out_count=0xbf9eb928, has_format=1) at dri2.c:190
            ds = (DRI2ScreenPtr) 0xa19b130
            pPriv = (DRI2DrawablePtr) 0x0
            buffers = <value optimized out>
            need_real_front = <value optimized out>
            need_fake_front = <value optimized out>
            have_fake_front = <value optimized out>
            front_format = <value optimized out>
            dimensions_match = 0
            i = <value optimized out>
    #1 0xb80b5d50 in ProcDRI2Dispatch (client=0xa240128) at dri2ext.c:280
            stuff = <value optimized out>
    #2 0x08090b8f in Dispatch () at dispatch.c:432
            result = <value optimized out>
            client = (ClientPtr) 0xa240128
            nready = 0
            start_tick = 280
    #3 0x0806932d in main (argc=5, argv=0xbf9eba74, envp=0x21000e) at main.c:283
            i = 1
            alwaysCheckForInput = {0, 1}
    (gdb)

The dri driver is the correct one:

    libGL: OpenDriver: trying /opt/gfx-test/lib/dri/r300_dri.so
    (II) AIGLX: Loaded and initialized /opt/gfx-test/lib/dri/r300_dri.so

Created attachment 26128
Don't crash if pPriv is NULL

Okay, so it looks like the client side calls
DRI2GetBuffers without a previous DRI2CreateDrawable. This can't work; however, the server shouldn't crash... Does this patch fix it?

Created attachment 26130
Different crash
With this patch applied I get different crashes at two different places. I'll attach the backtraces as files because they are longer this time.
There's one peculiarity in WineD3D that may be related to the crashes: when WineD3D loads, it creates a GL context and attaches it to a hidden X11 window. This context is used to read the OpenGL extension string, OpenGL limits and other GL characteristics. No rendering is done (that's undefined on a hidden window), but we do try to load some textures to test PBO and FBO functionality to detect a few known bugs on some implementations. We have to do this because we have to be able to give information about our capabilities before a Direct3D device is created (D3D device ~ GL context). I'll make a Wine log to see how far the WineD3D init gets.
I don't know the code, but it seems to me that the patch just tries to hide the crash rather than addressing the core issue.
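As a rough illustration of the failure mode discussed above (a GetBuffers-style request arriving for a drawable that was never registered), the following C sketch shows a lookup that can return NULL and a handler that rejects the request instead of dereferencing the result. It is not the attached patch and not X server code: `drawable_priv`, `create_drawable`, `lookup_priv`, `get_buffers`, the fixed-size table and the error codes are all hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the X server. */
enum { REQ_OK = 0, REQ_BAD_DRAWABLE = 1, REQ_NO_SPACE = 2 };

#define MAX_DRAWABLES 8

struct drawable_priv {
    int in_use;
    unsigned id;
    int width;
    int height;
};

static struct drawable_priv table[MAX_DRAWABLES];

/* The "create drawable" step: records per-drawable private state. */
static int create_drawable(unsigned id, int width, int height)
{
    for (size_t i = 0; i < MAX_DRAWABLES; i++) {
        if (!table[i].in_use) {
            table[i] = (struct drawable_priv){ 1, id, width, height };
            return REQ_OK;
        }
    }
    return REQ_NO_SPACE;
}

/* Returns NULL when the drawable was never registered. */
static struct drawable_priv *lookup_priv(unsigned id)
{
    for (size_t i = 0; i < MAX_DRAWABLES; i++) {
        if (table[i].in_use && table[i].id == id)
            return &table[i];
    }
    return NULL;
}

/* The "get buffers" step: fail the request instead of crashing on NULL. */
static int get_buffers(unsigned id, int *width, int *height)
{
    struct drawable_priv *priv = lookup_priv(id);

    if (priv == NULL)
        return REQ_BAD_DRAWABLE;

    *width = priv->width;
    *height = priv->height;
    return REQ_OK;
}
```

With a guard like this, a client that skips the create step gets an error back rather than taking the server down. As the comment above notes, that only contains the symptom; whichever caller sends the requests out of order would still need fixing.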
Created attachment 26131
Another backtrace
Looks like the patch fixed the xserver crash, now the Mesa code needs to be fixed. Not sure if we should keep tracking that here or in a separate report.

BTW, are you intentionally using indirect rendering? With direct rendering it looks like only the client should crash with this patch.

No, I am not using indirect rendering intentionally. I just noticed that I forgot to set LD_LIBRARY_PATH, so I was using fglrx's libGL. Doh!

Anyway, my debugging suggests that the problem does occur during the GL context init, either in winex11.drv or wined3d.dll. In both cases we're creating a GL context on a hidden window to figure out what we can advertise to the Windows app (winex11: OpenGL supported at all? Which extension functions to load? WineD3D: Which d3d features can we advertise to the D3D app?)

To clarify: I forgot to set LD_LIBRARY_PATH in that specific shell instance where I recorded the backtraces; the other times I am pretty sure I set it.

I did not get another X server crash with direct rendering + your patch, in fact I did not even get a client app crash.

(In reply to comment #24)
> I did not get another X server crash with direct rendering + your patch, in
> fact I did not even get a client app crash.

Hmm, so maybe there's also something wrong in xserver/glx/glxdri2.c or so...