Bug 97909 - X-Plane 10 crashes with SIGSEGV on radeonsi
Summary: X-Plane 10 crashes with SIGSEGV on radeonsi
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Duplicates: 98492
Depends on:
Blocks:
 
Reported: 2016-09-23 23:06 UTC by Christian Inci
Modified: 2018-08-31 05:59 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
Patch/Hack (605 bytes, patch)
2016-09-23 23:06 UTC, Christian Inci
gdb backtrace with some previous errors (2.80 KB, text/plain)
2016-09-23 23:07 UTC, Christian Inci
glxinfo output (99.61 KB, text/plain)
2016-09-23 23:08 UTC, Christian Inci

Description Christian Inci 2016-09-23 23:06:46 UTC
Created attachment 126749
Patch/Hack

The SIGSEGV is being raised because ib->buffer is NULL.

I don't know whether my solution is correct or not.

Is it okay to simply return if ib->buffer is NULL? Is it okay to run the last lines of that if block only when ib->buffer isn't NULL? Is it okay to call si_emit_draw_packets with ib->buffer being NULL? Or something else?

All I can say is that X-Plane is working fine with this patch/hack.
Comment 1 Christian Inci 2016-09-23 23:07:52 UTC
Created attachment 126750
gdb backtrace with some previous errors
Comment 2 Christian Inci 2016-09-23 23:08:07 UTC
Created attachment 126751
glxinfo output
Comment 3 Nicolai Hähnle 2016-09-28 13:59:32 UTC
Hi Christian, that really shouldn't happen. Can you provide an apitrace that shows the problem? My guess is that the index buffer tracking gets into an odd state because of the previous BufferData-related errors, but I don't yet see where that would be exactly.
Comment 4 Christian Inci 2016-10-25 03:59:36 UTC
I'm sorry for the long delay.

Here's the apitrace trace file: http://home.broke-the-inter.net:8082/
If there are any problems with the server, please mail me.
Comment 5 Nicolai Hähnle 2016-11-04 20:20:49 UTC
Okay, so I could reproduce this after all with the web demo.

There is a bug in X-Plane and also questionable behaviour of the driver. The bug in X-Plane is that it uses GL_AMD_pinned_memory with a size that is not a multiple of a page; as per the spec, the driver is allowed to reject that, and we do (apparently unlike the closed source driver...). X-Plane doesn't check this error condition, and continues rendering, hence the crash, which would also happen with a simple sequence of:

  glGenBuffers(1, &bo);
  glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bo);
  glDrawElements(...);

Somewhat surprisingly, the OpenGL spec never states that a draw call that goes outside the element/index buffer should flag a GL_INVALID_OPERATION. There is also no mention of this in the GL_ARB_robust_buffer_access_behavior extension, which is surprising.
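As an illustration only (a hypothetical sketch, not Mesa's api_validate.c; the function name and parameters are invented here), this is the kind of range check being discussed. An indexed draw reads `count` indices of `index_size` bytes each, starting at byte `offset` of the element buffer, so it stays in bounds only if offset + count * index_size <= buffer_size. Dividing the remaining space, instead of multiplying, avoids integer overflow for huge counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `count` indices of `index_size` bytes,
 * starting at byte `offset`, fit inside an element buffer of
 * `buffer_size` bytes. */
bool indexed_draw_in_bounds(uint64_t buffer_size, uint64_t offset,
                            uint64_t count, unsigned index_size)
{
    /* 1, 2 or 4 bytes for GL_UNSIGNED_BYTE / _SHORT / _INT indices */
    if (index_size != 1 && index_size != 2 && index_size != 4)
        return false;
    if (offset > buffer_size)
        return false;
    /* Compare against the remaining space to avoid overflowing
     * offset + count * index_size. */
    return count <= (buffer_size - offset) / index_size;
}
```

Note that a zero-sized buffer (the state left by glGenBuffers + glBindBuffer without glBufferData) fails this check for any non-zero count.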

The patch you provide may or may not go in the right direction - I'm not sure. If we want to check that, we should do it in api_validate.c, but I'm not convinced that we should. Meanwhile, that check wouldn't properly fix the issue in X-Plane. To work around the bug in X-Plane, you need to run with:

MESA_EXTENSION_OVERRIDE=-GL_AMD_pinned_memory ./X-Plane-x86_64 --force_run

which will work with an unmodified driver.
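On the application side, the fix implied above is to check the error that the driver raises after creating a pinned buffer, or to avoid the rejected case altogether. A hypothetical pre-check (function names invented for illustration) that only uses GL_AMD_pinned_memory when the size is a whole number of pages might look like this; whether a particular driver accepts other sizes is implementation-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: true if `size` is a non-zero whole number of pages,
 * i.e. a size a driver should not reject for pinned memory on those grounds. */
bool pinned_size_is_page_multiple(size_t size, size_t page_size)
{
    return page_size != 0 && size != 0 && size % page_size == 0;
}

/* Page size as reported by the OS; falls back to 4 KiB if sysconf fails. */
size_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}
```

If the check fails, an application could fall back to a regular glBufferData upload instead of pinning the memory.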
Comment 6 Marek Olšák 2016-11-04 21:05:04 UTC
I think we should just drop draw calls with a 0-sized index buffer.
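A minimal sketch of that rule, assuming a simplified, hypothetical index-buffer state (the real radeonsi structures differ): an indexed draw is dropped before any packets are emitted when there is no index buffer or it has no storage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bound index buffer state. */
struct index_buffer_state {
    const void *buffer; /* NULL when no storage was ever allocated */
    size_t size;        /* storage size in bytes */
};

/* True if an indexed draw should be silently dropped instead of
 * being sent to the hardware. */
bool should_drop_indexed_draw(const struct index_buffer_state *ib)
{
    return ib == NULL || ib->buffer == NULL || ib->size == 0;
}
```

The minimal glGenBuffers/glBindBuffer/glDrawElements sequence from comment 5 leaves the bound buffer without storage, which is exactly the case this kind of check targets.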

For GL_AMD_pinned_memory, I think we can just map whole pages that intersect the mapped range.
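"Mapping whole pages that intersect the range" amounts to rounding the start address down and the end address up to page boundaries. A hypothetical sketch (names invented; assumes a power-of-two page size and a non-zero range size):

```c
#include <stddef.h>
#include <stdint.h>

/* Page-aligned span covering every page touched by [addr, addr + size). */
struct page_span {
    uintptr_t start; /* first byte of the first touched page */
    size_t length;   /* a multiple of page_size */
};

/* page_size must be a power of two (e.g. 4 KiB on x86-64). */
struct page_span page_span_for_range(uintptr_t addr, size_t size,
                                     size_t page_size)
{
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t start = addr & ~mask;                  /* round down */
    uintptr_t end = (addr + size + mask) & ~mask;    /* round up */
    struct page_span span = { start, (size_t)(end - start) };
    return span;
}
```

For example, a 32-byte range starting at 0x1ff0 straddles a page boundary and therefore needs two 4 KiB pages (0x1000 to 0x3000).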
Comment 7 Marek Olšák 2016-11-04 21:07:46 UTC
The pinned_memory mapping gets complicated if 2 memory ranges don't intersect but the pages they touch do intersect.
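The problematic case can be stated precisely: the byte ranges are disjoint, but their page-rounded spans overlap, so pinning whole pages for each would map the same page twice. A hypothetical sketch of detecting it (how a driver should then handle the shared page is not addressed here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte ranges [a, a + a_size) and [b, b + b_size) overlap. */
bool ranges_overlap(uintptr_t a, size_t a_size, uintptr_t b, size_t b_size)
{
    return a < b + b_size && b < a + a_size;
}

/* True if both ranges touch at least one common page. Sizes must be
 * non-zero; page_size must be a power of two. */
bool ranges_share_page(uintptr_t a, size_t a_size, uintptr_t b,
                       size_t b_size, size_t page_size)
{
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t a_first = a & ~mask, a_last = (a + a_size - 1) & ~mask;
    uintptr_t b_first = b & ~mask, b_last = (b + b_size - 1) & ~mask;
    return a_first <= b_last && b_first <= a_last;
}
```

Two 100-byte buffers at 0x1000 and 0x1100 do not overlap, yet both live in the page at 0x1000, which is the complication described above.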
Comment 8 Joonas Sarajärvi 2017-11-24 21:14:01 UTC
Just as an extra data point, this issue and the workaround from comment #5 seem to apply also to X-Plane 11. Tested with a Radeon RX 560 on Fedora 27 with the stock drivers. Currently this would be:

kernel 4.13.12-300
mesa 17.2.4-2

With the MESA_EXTENSION_OVERRIDE=-GL_AMD_pinned_memory ./X-Plane-x86_64 workaround, the simulator works ok.
Comment 9 Thomas Rohloff 2017-12-29 16:31:30 UTC
(In reply to Nicolai Hähnle from comment #5)
> Okay, so I could reproduce this after all with the web demo.
> 
> There is a bug in X-Plane and also questionable behaviour of the driver. The
> bug in X-Plane is that it uses GL_AMD_pinned_memory with a size that is not
> a multiple of a page; as per the spec, the driver is allowed to reject that,
> and we do (apparently unlike the closed source driver...). X-Plane doesn't
> check this error condition, and continues rendering, hence the crash, which
> would also happen with a simple sequence of:
> 
>   glGenBuffers(1, &bo);
>   glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bo);
>   glDrawElements(...);
> 
> Somewhat surprisingly, the OpenGL spec never states that a draw call that
> goes outside the element/index buffer should flag a GL_INVALID_OPERATION.
> There is also no mention of this in the GL_ARB_robust_buffer_access_behavior
> extension, which is surprising.
> 
> The patch you provide may or may not go in the right direction - I'm not
> sure. If we want to check that, we should do it in api_validate.c, but I'm
> not convinced that we should. Meanwhile, that check wouldn't properly fix
> the issue in X-Plane. To work around the bug in X-Plane, you need to run
> with:
> 
> MESA_EXTENSION_OVERRIDE=-GL_AMD_pinned_memory ./X-Plane-x86_64 --force_run
> 
> which will work with an unmodified driver.

I opened a bug report with X-Plane and will let you know if they reply.

BTW: Should I open a new bug report for the r600 bug (see below)?

Here is the message I wrote to the X-Plane devs:

Subject: Bug report: Wrong usage of GL_AMD_pinned_memory leads to undefined result on Mesa drivers

From= v10lator@myway.de
IP= [SNIPPED]
Product= XPlane
Version= 11.11
OS= Linux
Summary= Wrong usage of GL_AMD_pinned_memory leads to undefined behavior on Mesa drivers
Description= "The bug in X-Plane is that it uses GL_AMD_pinned_memory with a size that is not a multiple of a page; as per the spec, the driver is allowed to reject that, and we do (apparently unlike the closed source driver...). X-Plane doesn't check this error condition, and continues rendering, hence the crash" - Source: https://bugs.freedesktop.org/show_bug.cgi?id=97909#c5

Similar problems to those described in the linked bug report are happening with other Mesa drivers, too. For example, see this stack trace from r600:

[ 1930.559125] general protection fault: 0000 [#1] PREEMPT SMP
[ 1930.559980] Modules linked in: snd_seq_midi snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi vboxpci(O) vboxnetadp(O) vboxnetflt(O) nfsd vboxdrv(O)
[ 1930.560867] CPU: 2 PID: 646 Comm: kworker/2:2 Tainted: G           O    4.13.0 #9
[ 1930.561771] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016
[ 1930.562657] Workqueue: events radeon_mn_destroy
[ 1930.563588] task: ffffa246244e5b00 task.stack: ffffa5f0409a0000
[ 1930.564509] RIP: 0010:__mutex_lock.isra.1+0x82/0x518
[ 1930.565425] RSP: 0018:ffffa5f0409a3d60 EFLAGS: 00010282
[ 1930.566312] RAX: 800000015e292268 RBX: ffffa2435d99c228 RCX: 800000015e29226f
[ 1930.567238] RDX: 800000015e29226f RSI: ffffa246244e5b00 RDI: ffffffffb2a04c10
[ 1930.568174] RBP: ffffa5f0409a3df0 R08: ffffa2435d99c200 R09: 0000000100200007
[ 1930.569124] R10: ffffa5f0409a3e10 R11: ffffa2462c079ac0 R12: ffffa2463ec9c400
[ 1930.570037] R13: ffffa2435d99f9e8 R14: ffffa2463ec98300 R15: 0000000000000002
[ 1930.570982] FS:  0000000000000000(0000) GS:ffffa2463ec80000(0000) knlGS:0000000000000000
[ 1930.571938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1930.572894] CR2: 0000000000ed7000 CR3: 00000003c8c51000 CR4: 00000000000406e0
[ 1930.573857] Call Trace:
[ 1930.574783]  ? __slab_free.isra.68+0x7a/0x210
[ 1930.575744]  ? __slab_free.isra.68+0x7a/0x210
[ 1930.576698]  ? radeon_mn_destroy+0x3a/0x188
[ 1930.577650]  ? radeon_mn_destroy+0x3a/0x188
[ 1930.578577]  ? process_one_work+0x151/0x2d0
[ 1930.579515]  ? worker_thread+0x1f0/0x380
[ 1930.580450]  ? kthread+0xf2/0x128
[ 1930.581381]  ? process_one_work+0x2d0/0x2d0
[ 1930.582309]  ? kthread_create_on_node+0x40/0x40
[ 1930.583195]  ? ret_from_fork+0x22/0x30
[ 1930.584100] Code: 85 c0 0f 84 31 02 00 00 65 48 8b 04 25 80 c2 00 00 48 8b 00 a8 08 75 23 e8 dc be 72 ff 49 8b 45 00 48 83 e0 f8 0f 84 3e 02 00 00 <8b> 58 60 e8 ee be 72 ff 85 db 0f 85 33 02 00 00 65 48 8b 04 25 
[ 1930.585083] RIP: __mutex_lock.isra.1+0x82/0x518 RSP: ffffa5f0409a3d60
[ 1930.592739] ---[ end trace 397a922d2c74a9bd ]---
[ 1932.388592] sched: RT throttling activated
[ 1935.978694] note: kworker/2:2[646] exited with preempt_count 1 
Steps= Run X-Plane 11 on Linux with Mesa drivers.
Comment 10 Timothy Arceri 2018-04-12 02:06:20 UTC
*** Bug 98492 has been marked as a duplicate of this bug. ***
Comment 11 Timothy Arceri 2018-04-12 06:05:10 UTC
Is this still an issue for you? I just gave the X-Plane 10 demo a try and had no crash, although I did see artifacts flickering across the screen, similar to the apitrace in bug 87059.
Comment 12 Christian Inci 2018-04-12 06:53:41 UTC
Unfortunately I won't be able to test this for three weeks. I'll let you know the result after that, unless something unexpected happens.

What do you think about closing this bug as WONTFIX? Working around a third-party application bug at the operating-system or library level looks like a Microsoft thing to do (shimming).
Comment 13 Joonas Sarajärvi 2018-04-12 07:45:49 UTC
I can test this week.

In my opinion, leaving this as WONTFIX would be quite unfortunate, because as far as I know, the free AMD GPU drivers are pretty much the only way to run X-Plane reasonably well without proprietary drivers. The more applications a driver can successfully run, the more useful it is. The more useful the driver, the more likely users will use it, and thus the more likely developers will actually bother testing on that driver.

Right now I would expect this driver to be really uncommon among X-Plane users despite it actually being capable of running that program pretty well. If X-Plane did work out of the box, the number could be much greater.

Leaving this as WONTFIX will just contribute to the status quo where most X-Plane users run the program with proprietary drivers.
Comment 14 Amarildo 2018-04-12 11:32:26 UTC
It's a shame they don't officially support OSS drivers despite them running X-Plane better.

All this contributes to driver separation, confusion, and Linux still being regarded as "not ready for quite a few games".

BTW, if you happen to run OSS drivers and want to try X-Plane, be it v10 or v11, set a launch option (this works on Steam, too) to:

--no_pinned

This does get rid of the crashes, which are observable when e.g. clouds are enabled.

Cheers
Comment 15 Joonas Sarajärvi 2018-04-12 16:46:52 UTC
Still crashes for me on Fedora 27.

kernel 4.15.14
mesa 17.3.6
x-plane 11.11r2

The workaround with MESA_EXTENSION_OVERRIDE=-GL_AMD_pinned_memory reported in earlier comments is also still useful.
Comment 16 Joonas Sarajärvi 2018-06-10 17:42:06 UTC
Now that I tried removing the workaround again, it looks like things still work for me. So now at least one working combination that does not trigger this bug looks like this:

- Fedora 28
- kernel 4.16.14-300.fc28.x86_64
- mesa 18.0.2
- x-plane 11.21

Compared to my earlier report, all the bits have changed their versions. Does anyone have any idea whether this was addressed in the driver stack or if it is actually X-Plane behaving nicer?
Comment 17 Timothy Arceri 2018-08-31 05:59:14 UTC
(In reply to Joonas Sarajärvi from comment #16)
> Now that I tried removing the workaround again, it looks like things still
> work for me. So now at least one working combination that does not trigger
> this bug looks like this:
> 
> - Fedora 28
> - kernel 4.16.14-300.fc28.x86_64
> - mesa 18.0.2
> - x-plane 11.21
> 
> Compared to my earlier report, all the bits have changed their versions.
> Does anyone have any idea whether this was addressed in the driver stack or
> if it is actually X-Plane behaving nicer?

It would seem it's something in the driver, since I doubt they are still updating X-Plane 10 now that X-Plane 11 has been out for some time. Anyway, I'm going to close this as fixed.

