Summary: | [NV30/NV40] nouveau/pushbuf.c:238: pushbuf_krel: Assertion `bkref` failed. | ||
---|---|---|---|
Product: | Mesa | Reporter: | Severin Pappadeux <pappadeux> |
Component: | Drivers/DRI/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED MOVED | QA Contact: | Nouveau Project <nouveau> |
Severity: | normal | ||
Priority: | medium | CC: | fourdan, jeffbai |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Attachments: | Backtrace of Xwayland on a G71 |
Description
Severin Pappadeux
2016-03-28 03:01:19 UTC
nouveau doesn't handle multithreaded GL usage (even if it's done by having separate contexts per thread). If the application has a mode which adds an interlock to GL calls, please try enabling it. (The issue isn't with multiple contexts, but with concurrent use of multiple contexts.)

Apparently it also affects some KDE applications: https://bugs.debian.org/822220#55

This makes any qt app using qtwebengine which "integrates chromium's fast moving web capabilities into Qt." crash. Affecting me by making kmail unusable.

I was also able to reproduce this issue on my PowerMac G5 with GeForce 6600LE, but the crash was pretty random, in most of the cases this crash happened while I was using Linthesia.

This is affecting Xwayland as well, as reported downstream in Fedora 25:
https://bugzilla.redhat.com/show_bug.cgi?id=1372878

wrt comment 1, Xwayland is not using multithreaded GL, even through glamor, is it?

A full backtrace of Xwayland that led to this abort() is found in the downstream bug here: https://bugzilla.redhat.com/attachment.cgi?id=1197379

(In reply to Olivier Fourdan from comment #5)
> This is affecting Xwayland as well, as reported downstream in Fedora 25:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1372878
>
> wrt comment 1, Xwayland is not using multithreaded GL, even through glamor,
> is it?

Oh dear. That's using glamor on a nv30 (or nv40) board. The nv30 driver is ... erm ... imperfect.

And looking back at the first comments, the Quadro 1500M appears to be a G71, and the 6600LE is obviously some nv4x as well. In which case these issues are just as likely to not be caused by the multithreading stuff at all.

But you're in (limited) luck - I have a NV34 plugged in. If you can send me an apitrace I can replay to cause this to happen, I can investigate [and the NV34 doesn't support as much stuff as a NV4x, but I can fake it]. I'll also stare at the nv30_resource_copy_region code to see what it's missing.

OK, well staring at the code didn't do me much good.

bkref == NULL means that the pushbuf thing doesn't know about the bo we're trying to reloc. However just a few lines above in nv30_transfer_rect_m2mf we do nouveau_pushbuf_refn (push, refs, 2) which should tell it about the src/dst bo's. So either there's concurrency going on, or something's not working as advertised. In the no-concurrency case, I'd need repro steps (that can't start with "install tons of software").

Created attachment 128368
Backtrace of Xwayland on a G71
Thanks for pointing out that it was on a G71, I have been able to reproduce using my own old G71 (Dell m1710) with Fedora 25!
Didn't have much luck trying to run Xwayland through apitrace, but I have a corefile of Xwayland, backtrace attached, if that's of any help...
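For readers following the bkref == NULL reasoning above, here is a minimal toy model in C of the invariant being described: a buffer has to be registered with the push buffer (the role nouveau_pushbuf_refn plays) before a relocation against it can be resolved (the lookup whose failure trips the assertion in pushbuf_krel). Every toy_* name below is hypothetical and is not libdrm or Mesa code. toy_reset in particular only stands in for "something cleared the list in between" and is an assumption for illustration, not a diagnosis of this bug.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REFS 8

/* Hypothetical stand-in for a buffer object; not the libdrm type. */
struct toy_bo {
    int handle;
};

/* Hypothetical stand-in for a push buffer: it only tracks which
 * buffers have been registered with it. */
struct toy_pushbuf {
    const struct toy_bo *refs[TOY_MAX_REFS];
    size_t nr_refs;
};

/* Look up a buffer in the registered list; NULL if it is unknown. */
static const struct toy_bo *
toy_find_ref(const struct toy_pushbuf *push, const struct toy_bo *bo)
{
    for (size_t i = 0; i < push->nr_refs; i++) {
        if (push->refs[i] == bo)
            return push->refs[i];
    }
    return NULL;
}

/* Toy analogue of registering buffers before use.
 * Returns 0 on success, -1 if the list is full. */
static int
toy_refn(struct toy_pushbuf *push, const struct toy_bo *const *bos, size_t n)
{
    assert(push != NULL && bos != NULL);
    for (size_t i = 0; i < n; i++) {
        if (toy_find_ref(push, bos[i]))
            continue; /* already registered */
        if (push->nr_refs == TOY_MAX_REFS)
            return -1;
        push->refs[push->nr_refs++] = bos[i];
    }
    return 0;
}

/* Toy analogue of emitting a relocation. Returns -1 at the point
 * where the real code would hit its assertion (lookup came back NULL). */
static int
toy_reloc(const struct toy_pushbuf *push, const struct toy_bo *bo)
{
    const struct toy_bo *bkref = toy_find_ref(push, bo);
    return bkref ? 0 : -1;
}

/* Assumption for illustration only: some event empties the list
 * between registration and relocation. */
static void
toy_reset(struct toy_pushbuf *push)
{
    push->nr_refs = 0;
}
```

In this model, registering src and dst and then relocating either one succeeds. If toy_reset runs in between, the same relocation returns -1, which corresponds to the "something's not working as advertised" branch of the reasoning: the registration happened, but the list no longer contains the buffer when the relocation is emitted.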
(In reply to Ilia Mirkin from comment #7)
> [...] If you can send me an apitrace I can replay to cause this to happen,
> I can investigate [and the NV34 doesn't support as much stuff as a NV4x,
> but I can fake it]. [...]

Ah! I managed to reproduce with apitrace \o/

Xwayland: pushbuf.c:238: pushbuf_krel: Assertion `bkref' failed.
apitrace: warning: caught signal 6
apitrace: flushing trace due to an exception
/usr/bin/../lib64/apitrace/wrappers/glxtrace.so+0x224c8c
[...]
apitrace: info: taking default action for signal 6

The capture is there: https://people.freedesktop.org/~ofourdan/xwayland-apitrace.bz2

Please let me know if that helps!

Looks like this is very similar to #98039

(In reply to Tomasz Paweł Gajc from comment #11)
> Looks like this is very similar to #98039

Eventually, the assertion hit is the same, but it doesn't mean it's the same problem/root cause.

Olivier, can you see if this patch with mesa helps?

https://patchwork.freedesktop.org/patch/132414/

It helped in the repro of bug #99354.

(In reply to Ilia Mirkin from comment #13)
> Olivier, can you see if this patch with mesa helps?
>
> https://patchwork.freedesktop.org/patch/132414/
>
> It helped in the repro of bug #99354.

Sure! Thanks Ilia!

The patch did not apply cleanly on top of 13.0.3 nor master though (in src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c line 406 [1]) so I changed "nouveau_pushbuf_space(push, 16, 2, 0);" to "nouveau_pushbuf_space(push, 32, 3, 0);" to match the content of your patch, I hope this is right (sorry, I'm not familiar with this code).

But the issue is quite random and rather hard to reproduce at will, so I'll run with this for some time to see if the issue reoccurs.

I shall also prepare a test package for Fedora downstream with the (modified) patch included so that the original reporter can give it a try as well.

[1] https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c#n406
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1372878

(In reply to Olivier Fourdan from comment #14)
> The patch did not apply cleanly on top of 13.0.3 nor master though (in
> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c line 406 [1]) so I changed
> "nouveau_pushbuf_space(push, 16, 2, 0);" to "nouveau_pushbuf_space(push, 32,
> 3, 0);" to match the content of your patch, I hope this is right (sorry, I'm
> not familiar with this code).

Ah yeah, that sounds right. I have some local changes trying to fix that code (unsuccessfully). But that in no way affects your situation - the issue there is a functional one, not a crashing one.

Haven't heard back from the original reporter downstream, but FAF (Fedora Analysis Framework) reports thousands (literally) of similar bugs:

https://retrace.fedoraproject.org/faf/problems/2833836/

I see the patch in comment 13 has landed and is included in mesa-13.0.4. Fedora has an updated mesa package for 13.0.4 and, interestingly, I don't see any mention of 13.0.4 in the FAF report, so it could be that this patch indeed fixes the issue with glamor/Xwayland on nv30.

Thanks again for your help, Ilia!

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1099.