Bug 93004 - Guild Wars 2 crash on nouveau DX11 cards
Summary: Guild Wars 2 crash on nouveau DX11 cards
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Nouveau Project
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-19 07:59 UTC by Patrick Rudolph
Modified: 2015-12-09 20:51 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Patrick Rudolph 2015-11-19 07:59:00 UTC
Guild Wars 2 (32-bit) crashes on DirectX 11-capable Nvidia cards when using OpenGL and the Gallium Nine state tracker.
It doesn't crash on BARTS (AMD HD 6850) or on NV84 (Nvidia GeForce 8600 GT).

Bug reports:
https://bugs.winehq.org/show_bug.cgi?id=34342
https://github.com/iXit/Mesa-3D/issues/153

While this is reported as "Out of Memory", my guess is that the Guild Wars 2 crash reporter shows the wrong memory usage. Don't focus on that.

Using d3dretrace and valgrind I got this:
https://github.com/iXit/Mesa-3D/issues/153#issuecomment-157977989
Comment 1 Ilia Mirkin 2015-11-19 16:59:03 UTC
Where is the trace? Could you educate me on how to replay it myself?
Comment 2 Patrick Rudolph 2015-11-19 19:02:05 UTC
Here's the trace (86 MB):
https://drive.google.com/file/d/0ByOfJQh38LRvSHk5YjgwdzVuRWc/view?usp=sharing

You need apitrace:
http://people.freedesktop.org/~jrfonseca/apitrace/

Run it using a Gallium Nine-enabled Wine:
wine ./apitrace-msvc/x86/bin/d3dretrace.exe Gw2.trace

To get the valgrind output I used:
valgrind -v --track-origins=yes --leak-check=full --trace-children=yes --vex-iropt-register-updates=allregs-at-mem-access --workaround-gcc296-bugs=yes wine ./apitrace-msvc/x86/bin/d3dretrace.exe Gw2.trace
Comment 3 Patrick Rudolph 2015-11-20 07:15:02 UTC
I found that the crash in nvc0_clear() is likely a use-after-free.
It accesses a pipe_resource that has been destroyed but is still bound.
The correct behaviour for Nine would be to call set_vertex_buffers(..., NULL) first and then destroy the resource.
What does OpenGL do when a vertex buffer is destroyed?
And why does it work on other drivers, such as r600?

For the second crash in nvc0_draw_vbo() I'm still investigating.
Comment 4 Patrick Rudolph 2015-12-01 08:23:04 UTC
For the second crash I found a simple solution:
It crashes in static void nvc0_validate_vertex_buffers_shared(struct nvc0_context *nvc0) in nvc0_vbo.c (lines 396-398), because buf is NULL:

    buf = nv04_resource(vb->buffer);
    offset = vb->buffer_offset;
    limit = buf->base.width0 - 1;

I'm not sure how it is possible to reach this point with both a NULL vertex buffer and a NULL user_buffer. Nine seems to take care to set only non-NULL buffers, yet for some reason nvc0->num_vtxbufs always includes an additional NULL vb...

I fixed this problem by adding a NULL check:

    buf = nv04_resource(vb->buffer);
    if (!buf) continue; /* added */
    offset = vb->buffer_offset;
    limit = buf->base.width0 - 1;

With this fix, Guild Wars 2 no longer crashes in pipe->draw_vbo.
I was able to play the game for a few minutes.
Comment 5 Patrick Rudolph 2015-12-03 17:57:34 UTC
The first crash, when calling nvc0_clear(), happens in nvc0_context.c, in the function

void nvc0_bufctx_fence(struct nvc0_context *nvc0, struct nouveau_bufctx *bufctx, bool on_flush)

at lines 403-404:

    if (res)
        nvc0_resource_validate(res, (unsigned)ref->priv_data);

because the pipe_resource that res points to has already been freed.
Comment 6 Ilia Mirkin 2015-12-03 18:28:22 UTC
    nvc0->dirty |= NVC0_NEW_ARRAYS;
    nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_VTX);

Can you put these two lines under the if (!vb) branch of nvc0_set_vertex_buffers() and see if that improves things? I need to think about why they aren't already there... perhaps there's a reason. Doubtful, though.
Comment 7 Ilia Mirkin 2015-12-09 20:51:28 UTC
Pushed this out as:

commit 432a798cf5c7fab18a3e32d4073840df7d0d37cb
Author: Patrick Rudolph <siro@das-labor.org>
Date:   Sun Dec 6 10:11:59 2015 +0100

    nv50,nvc0: fix use-after-free when vertex buffers are unbound
    
    Always reset the vertex bufctx to make sure there's no pointer to
    an already freed pipe_resource left after unbinding buffers.
    Fixes use after free crash in nvc0_bufctx_fence().
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
    Signed-off-by: Patrick Rudolph <siro@das-labor.org>
    [imirkin: simplify nvc0 fix, apply to nv50]
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>

I believe this should fix everything. Not sure why you didn't see issues with a G84... it probably just gets lucky somehow. Thanks for debugging this and tracking the issue down!
