21842 – radeon-rewrite: DirectX 8 SDK samples crash the X server with KMS enabled

Summary: radeon-rewrite: DirectX 8 SDK samples crash the X server with KMS enabled

Status:	NEW

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/r200
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-05-20 15:06 UTC by Stefan Dösinger
Modified:	2009-05-23 04:00 UTC
CC List:	0 users

See Also:

Attachments
Don't crash if pPriv is NULL (730 bytes, patch) 2009-05-22 13:07 UTC, Michel Dänzer
Different crash (2.78 KB, text/plain) 2009-05-22 14:14 UTC, Stefan Dösinger
Another backtrace (1.96 KB, text/plain) 2009-05-22 14:14 UTC, Stefan Dösinger

Description Stefan Dösinger 2009-05-20 15:06:12 UTC

Running the DirectX 8 SDK samples with Wine crashes the X server on my r250 card. This happens with kernel modesetting (KMS) enabled. With KMS off, the X server does not crash, but there are other issues (that's a separate bug report, though).

Steps to reproduce:
1) Install Wine (www.winehq.org). I am using the git version, but I think any semi-recent Wine should be OK.

2) Download the DX8 SDK, e.g. from http://www.darwinbots.com/numsgil/dx81sdk_full.exe . I am not sure whether it fully installs with Wine, but it should at least unpack the .exe, which gives you all the files you need.

3) Run some samples. You can find them in {sdk_root}/samples/Multimedia/Direct3D/bin, and run them with "wine foo.exe". The Lighting.exe sample is a good one to use, others may require a native d3dxof.dll helper library. If the sample complains that required media files are not found, cd into {sdk_root}/samples/Multimedia/Media and run ../Direct3D/bin/foo.exe.

Pretty much all samples crash. The only working one I found was mfcfog.exe. I tried Lighting, DolphinVS, Text3D, BumpLens, BumpEarth.

On a side note, all D3D8 samples should work with Wine (some may need a native d3dxof.dll), so they're quite helpful for testing how well the driver works with Wine and D3D apps.

Comment 1 Stefan Dösinger 2009-05-20 15:09:54 UTC

Git revisions used:

Mesa: a13e96359baaa0331561f86ef6487feba6540464
DDX: 48156758ec2c406c28b52b3cd65e77f29d98f79b (airlied's KMS DDX)
DRM: 1edb70f1b909d06f1c0ee7c9c794aec99454e488
Kernel: ed70a54c74f78f426497654676dda7ac53055d13 (airlied's KMS kernel)

Comment 2 Stefan Dösinger 2009-05-20 15:13:06 UTC

Could this be a duplicate of the radeonRefillCurrentDmaRegion bug (or wherever that crash occurred)? I cannot find the bug to check the details, but I remember it from reading the (long since deleted) dri-devel mails.

Comment 3 Stefan Dösinger 2009-05-20 15:31:30 UTC

I talked to Maciej Cencora on IRC, and he suggested that this is not a duplicate of bug #21582 because I am not using Compiz.

Wine uses AIGLX in some cases of child-window OpenGL rendering, but this is not the case in the crashing samples. Ironically, mfcfog, the only working sample, does use child-window GL rendering and AIGLX (it did not work before radeon-rewrite or without KMS because GL redirection, or whatever it is called, is missing).

Comment 4 Stefan Dösinger 2009-05-20 15:42:04 UTC

Another update: I recompiled Mesa with -O0 -g and the crash seemed to go away. I will see if the default compile flags bring the crash back.

If they do, I guess there's an uninitialized variable somewhere.

Comment 5 Stefan Dösinger 2009-05-20 16:07:18 UTC

Yup, I recompiled with the standard CFLAGS and the crash is back (i.e., I just ran configure without specifying any CFLAGS).

Comment 6 Stefan Dösinger 2009-05-21 05:58:48 UTC

I tried to retest this with jglisse's DDX and kernel, but ran into bug #21851.

Comment 7 Michel Dänzer 2009-05-22 02:25:22 UTC

(In reply to comment #5)
> Yup, recompiled with standard CFLAGs and the crash is back(ie, just run
> configure without specifying any CFLAGs)

What are the compiler flags it ends up using in that case?

Comment 8 Stefan Dösinger 2009-05-22 03:10:30 UTC

The problem persists after a kernel and DDX update. I had to disable AGP in the kernel module to make the X server start, though (bug #21851).

I'll get you the compile flags in a second.

Comment 9 Stefan Dösinger 2009-05-22 03:12:31 UTC

This is a compiler invocation for one of the source files with the default compile flags:

gcc -c -I. -I../../../../../src/mesa/drivers/dri/common -Iserver -I../../../../../include -I../../../../../src/mesa -I../../../../../src/egl/main -I../../../../../src/egl/drivers/dri -I/usr/local/include -I/usr/local/include/drm    -g -O2 -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing  -fPIC  -DUSE_X86_ASM -DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM -D_GNU_SOURCE -DPTHREADS -DHAVE_POSIX_MEMALIGN -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DGLX_DIRECT_RENDERING -DGLX_INDIRECT_RENDERING -DHAVE_ALIAS -DHAVE_LIBDRM_RADEON=1 -I/usr/local/include -I/usr/local/include/drm   -DRADEON_COMMON=1 -DRADEON_COMMON_FOR_R200 r200_pixel.c -o r200_pixel.o

I think the relevant flags are "-g -O2".

Comment 10 Michel Dänzer 2009-05-22 03:33:27 UTC

Can you get a backtrace of the X server crash? Preferably with gdb, but even just a log file with a backtrace might be a start.

Comment 11 Stefan Dösinger 2009-05-22 04:21:18 UTC

Hmm. When I run the X server in gdb it never crashes. Even more things work: the clear color issue (bug #21569) is gone too.

I get a broken pipe signal every now and then, but that happens when I terminate an app, and if I just enter 'c' the server continues just fine. Here's a backtrace anyway (not very helpful, because my server was compiled without debug symbols; dang):

Program received signal SIGPIPE, Broken pipe.
0xb7ee6424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7ee6424 in __kernel_vsyscall ()
#1  0xb7b6caa2 in ?? () from /lib/libc.so.6
#2  0x0810cd2d in ?? ()
#3  0x00000025 in ?? ()
#4  0xbfd007b4 in ?? ()
#5  0x00000001 in ?? ()
#6  0x00000188 in ?? ()
#7  0x0005b77d in ?? ()
#8  0x09511800 in ?? ()
#9  0x08193ff4 in ?? ()
#10 0x0810bedd in _XSERVTransWritev ()
#11 0x09bdffd8 in ?? ()
#12 0xbfd007b4 in ?? ()
#13 0x00000001 in ?? ()
#14 0x00001428 in ?? ()
#15 0x09a4be70 in ?? ()
#16 0xb7ebd633 in __read_nocancel () from /lib/libpthread.so.0
#17 0x08108ccd in FlushClient ()
#18 0x09a4c86c in ?? ()
#19 0x00000025 in ?? ()
#20 0x09bdffd8 in ?? ()
#21 0x00000000 in ?? ()

I'll see if I can find anything helpful in the logs.

Comment 12 Stefan Dösinger 2009-05-22 04:33:36 UTC

I'm sorry, this issue just seems to have dissolved entirely. I still suspect that there's some uninitialized variable or buffer overrun or something like that. If it returns I'll try to collect more info.

And no, I did not change anything since the last time I saw the crash.

Comment 13 Stefan Dösinger 2009-05-22 04:50:19 UTC

OK, I got the crash again out of the blue, without changing anything. I just let the X server sit for a while, ran an SDK sample again, and it crashed. Unfortunately, the backtrace in the log was completely useless: just one line, a magic address somewhere in the X server.

I'll recompile the server with debug symbols enabled. I also saw a similar crash on my r500 card, where I got a somewhat more useful backtrace, but the host I had ssh'ed in from kernel-panicked shortly afterwards (unrelated; Nvidia binary driver trouble, I think).

Comment 14 Stefan Dösinger 2009-05-22 05:10:10 UTC

OK, now I have something usable:

Here's a gdb backtrace from the r500 card:
Program received signal SIGSEGV, Segmentation fault.
DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30,
    attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
143         if (pPriv->buffers == NULL ||
(gdb) bt
#0  DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30,
    attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
#1  0xb7fc878f in ProcDRI2Dispatch (client=0x9432f60) at dri2ext.c:212
#2  0x0808ceaf in Dispatch () at dispatch.c:437
#3  0x08071c1d in main (argc=5, argv=0xbfff0084, envp=0x93cd160) at main.c:397
(gdb)

X server: 5cd5a01259ba349f1868ca4af04207cf120d69e4
DDX: fba534017e581fcd9b9e49ba0b281fb500f576a7
Mesa: d7cc0eb47930d6c8ebfd18fefbe48fe8eec696a0
kernel: 9bf2b46874ff4b59f79ba4d2984727ddf056496d
libdrm: 1edb70f1b909d06f1c0ee7c9c794aec99454e488

I'll see if I can get useful info like this from the r250 card too.

Comment 15 Stefan Dösinger 2009-05-22 05:14:45 UTC

This one is from the r250. It does not look as nice because the X server and libdri2 are distro-provided, but the crash seems to be the same:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7af0b00 (LWP 13286)]
0xb7f16dc1 in DRI2GetBuffers () from /usr/lib/xorg/modules/extensions//libdri2.so
(gdb) bt
#0  0xb7f16dc1 in DRI2GetBuffers () from /usr/lib/xorg/modules/extensions//libdri2.so
#1  0xb7f1797b in ?? () from /usr/lib/xorg/modules/extensions//libdri2.so
#2  0x09a93128 in ?? ()
#3  0xbfc46398 in ?? ()
#4  0xbfc46394 in ?? ()
#5  0x09a93bd4 in ?? ()
#6  0x00000002 in ?? ()
#7  0xbfc46390 in ?? ()
#8  0xbfc46378 in ?? ()
#9  0x081652f9 in Xfree ()
#10 0xb7f17c75 in ?? () from /usr/lib/xorg/modules/extensions//libdri2.so
#11 0x09a17048 in ?? ()
#12 0x0939fc48 in ?? ()
#13 0x01000000 in ?? ()
#14 0x00000000 in ?? ()

Mesa: d7cc0eb47930d6c8ebfd18fefbe48fe8eec696a0
DDX: fba534017e581fcd9b9e49ba0b281fb500f576a7
libdrm: 1edb70f1b909d06f1c0ee7c9c794aec99454e488
kernel: 422756249a1717608618d6c940538ac8bdd851e8 (HEAD - 1)
server: xorg-server-1.6.1.901-r1 from gentoo

Comment 16 Michel Dänzer 2009-05-22 05:45:05 UTC

(In reply to comment #11)
> Hmm. When I run the X server in gdb it never crashes. Even more things work.
> The clear color issue(21569) is gone too.

Is it loading the driver you think it is?

grep _dri /var/log/Xorg.0.log

Also, when testing something with direct rendering you may want to set LIBGL_DEBUG=verbose to verify libGL is loading the right driver.

> I get a broken pipe signal every now and then, but that happens when I
> terminate an app, and if I just enter 'c' the server continues just fine.

Right, that's normal as part of the X server operation. Use

handle SIGPIPE nostop noprint

to make gdb ignore it.


(In reply to comment #12)
> I still suspect that there's some uninitialized variable or buffer overrun or
> something like that.

Running the X server in valgrind might be interesting if so.


(In reply to comment #14)
> Program received signal SIGSEGV, Segmentation fault.
> DRI2GetBuffers (pDraw=0x9434fa0, width=0xbffeff34, height=0xbffeff30,
>     attachments=0xb70fd05c, count=2, out_count=0xbffeff2c) at dri2.c:143
> 143         if (pPriv->buffers == NULL ||

Is pPriv NULL? (Try 'bt full' next time)


> X server: 5cd5a01259ba349f1868ca4af04207cf120d69e4

Apparently you're using server-1.6-branch. Can you try master or at least cherry-picking the DRI2 changes nominated on http://wiki.x.org/wiki/Server16Branch ?

Comment 17 Stefan Dösinger 2009-05-22 07:52:26 UTC

OK, I switched to the X server master branch (and dri2proto master), and so far no crashes. If they show up again I'll attach the requested info.

So far I only updated my r500 box, I'll play with the r250 card later.

Comment 18 Stefan Dösinger 2009-05-22 08:07:41 UTC

OK, got the crash again. Similar user-visible symptoms, slightly different backtrace (I guess the code changed):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7c006b0 (LWP 21043)]
0xb80b5056 in do_get_buffers (pDraw=0xa254668, width=0xbf9eb920, height=0xbf9eb924,
    attachments=0xa24b1f4, count=1, out_count=0xbf9eb928, has_format=1) at dri2.c:190
190             && (pDraw->height == pPriv->height);
(gdb) bt full
#0  0xb80b5056 in do_get_buffers (pDraw=0xa254668, width=0xbf9eb920,
    height=0xbf9eb924, attachments=0xa24b1f4, count=1, out_count=0xbf9eb928,
    has_format=1) at dri2.c:190
        ds = (DRI2ScreenPtr) 0xa19b130
        pPriv = (DRI2DrawablePtr) 0x0
        buffers = <value optimized out>
        need_real_front = <value optimized out>
        need_fake_front = <value optimized out>
        have_fake_front = <value optimized out>
        front_format = <value optimized out>
        dimensions_match = 0
        i = <value optimized out>
#1  0xb80b5d50 in ProcDRI2Dispatch (client=0xa240128) at dri2ext.c:280
        stuff = <value optimized out>
#2  0x08090b8f in Dispatch () at dispatch.c:432
        result = <value optimized out>
        client = (ClientPtr) 0xa240128
        nready = 0
        start_tick = 280
#3  0x0806932d in main (argc=5, argv=0xbf9eba74, envp=0x21000e) at main.c:283
        i = 1
        alwaysCheckForInput = {0, 1}
(gdb)

The DRI driver is the correct one:
libGL: OpenDriver: trying /opt/gfx-test/lib/dri/r300_dri.so
(II) AIGLX: Loaded and initialized /opt/gfx-test/lib/dri/r300_dri.so

Comment 19 Michel Dänzer 2009-05-22 13:07:23 UTC

Created attachment 26128 [details] [review]
Don't crash if pPriv is NULL

Okay, so it looks like the client side calls DRI2GetBuffers without a previous DRI2CreateDrawable. This can't work, but the server still shouldn't crash. Does this patch fix it?

Comment 20 Stefan Dösinger 2009-05-22 14:14:23 UTC

Created attachment 26130 [details]
Different crash

With this patch applied I get different crashes at two different places. I'll attach the backtraces as files because they are longer this time.

There's one peculiarity in WineD3D that may be related to the crashes: when WineD3D loads, it creates a GL context and attaches it to a hidden X11 window. This context is used to read the OpenGL extension string, OpenGL limits and other GL characteristics. No rendering is done (that's undefined on a hidden window), but we do try to load some textures to test PBO and FBO functionality, in order to detect a few known bugs in some implementations. We have to do this because we must be able to report our capabilities before a Direct3D device is created (D3D device ~ GL context). I'll make a Wine log to see how far the WineD3D init gets.

I don't know the code, but it seems to me that the patch just tries to hide the crash rather than addressing the core issue.

Comment 21 Stefan Dösinger 2009-05-22 14:14:56 UTC

Created attachment 26131 [details]
Another backtrace

Comment 22 Michel Dänzer 2009-05-22 14:25:14 UTC

Looks like the patch fixed the xserver crash, now the Mesa code needs to be fixed. Not sure if we should keep tracking that here or in a separate report.

BTW, are you intentionally using indirect rendering? With direct rendering it looks like only the client should crash with this patch.

Comment 23 Stefan Dösinger 2009-05-22 14:46:42 UTC

No, I am not using indirect rendering intentionally. I just noticed that I forgot to set LD_LIBRARY_PATH, so I was using fglrx's libGL. Doh!

Anyway, my debugging suggests that the problem occurs during GL context init, either in winex11.drv or in wined3d.dll. In both cases we create a GL context on a hidden window to figure out what we can advertise to the Windows app (winex11: is OpenGL supported at all, and which extension functions should be loaded? WineD3D: which D3D features can we advertise to the D3D app?).

Comment 24 Stefan Dösinger 2009-05-22 16:35:43 UTC

To clarify: I forgot to set LD_LIBRARY_PATH in that specific shell instance where I recorded the backtraces, the other times I am pretty sure I set this.

I did not get another X server crash with direct rendering + your patch, in fact I did not even get a client app crash.

Comment 25 Michel Dänzer 2009-05-23 04:00:05 UTC

(In reply to comment #24)
> I did not get another X server crash with direct rendering + your patch, in
> fact I did not even get a client app crash.

Hmm, so maybe there's also something wrong in xserver/glx/glxdri2.c or so...
