Summary: | [NVE7] NULL deref when putting card back to sleep after unsuccessful init (HUB_INIT timeout) | ||
---|---|---|---|
Product: | Mesa | Reporter: | Patrick Burroughs <celti> |
Component: | Drivers/DRI/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | ariscop, celti |
Version: | 10.2 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Full output of journalctl, including kernel logs, between system boot and poweroff after crash.
Output from crashing glxinfo.
Kernel logs filtered from journal output.
Xorg.0.log from crash.
dmesg output post initial patch.
dmesg output using DRI3
dmesg output using firmware ripped from the blob
dmesg output with errors from successful load |
Created attachment 104018 [details]
Output from crashing glxinfo.
Created attachment 104019 [details]
Kernel logs filtered from journal output.
Created attachment 104020 [details]
Xorg.0.log from crash.
There are two issues:
(a) the NULL deref in the kernel when putting the card back to sleep, and
(b) the fact that init of the card fails.

To mitigate the first, you could boot with "nouveau.runpm=0". However, you still wouldn't get working acceleration with nouveau.

NVIDIA's claim was that the graph-not-powered-up problem was restricted to GK104/GK106. But looking at the latest code, it seems like that code path runs on GK107 as well (not in Ben's repo anymore, but still in linux-3.16), and perhaps it has the reverse effect there. I wonder if a patch like

    diff --git a/nvkm/engine/graph/nve4.c b/nvkm/engine/graph/nve4.c
    index 51e0c07..4dd376e 100644
    --- a/nvkm/engine/graph/nve4.c
    +++ b/nvkm/engine/graph/nve4.c
    @@ -350,7 +350,7 @@ nve4_graph_oclass = &(struct nvc0_graph_oclass) {
     		.ctor = nvc0_graph_ctor,
     		.dtor = nvc0_graph_dtor,
     		.init = nve4_graph_init,
    -		.fini = nve4_graph_fini,
    +		.fini = _nouveau_graph_fini,
     	},
     	.cclass = &nve4_grctx_oclass,
     	.sclass = nve4_graph_sclass,

will help you out. (You'll need to apply it with care: cd into drivers/gpu/drm/nouveau/core and apply it with patch -p2.)

I get the same crash and HUB_INIT timeout after the patch. Attaching dmesg.

Created attachment 104026 [details]
dmesg output post initial patch.
Looking at the system and the kernel bug, this looks similar to a problem I was facing some weeks ago, so I'd suggest trying DRI3 with the whole package:

- Update your packages: xf86-video-intel and mesa (with all dependencies, of course).
- Remove xf86-video-nouveau (with DRI3 you won't need it to run: DRI_PRIME=1 myprog).
- You'll need a kernel with render nodes enabled (boot with drm.rnodes=1).
- You may need to add a file to /etc/udev/rules.d/ containing SUBSYSTEM=="drm", IMPORT{builtin}="path_id" to get ID_PATH tags for render nodes.

Created attachment 104031 [details]
dmesg output using DRI3
Using DRI3 defers all errors until after attempting to run an OpenGL application with DRI_PRIME=1, and prevents the crash from bringing down X or the kernel.
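Because the DRI3 route depends on the kernel exposing render nodes (the drm.rnodes=1 boot option suggested above), one quick sanity check is whether any `renderD*` entries exist under `/dev/dri`. The C sketch below counts them with the POSIX `opendir`/`readdir` calls. The helper name `count_render_nodes` is hypothetical, and the `renderD` prefix assumes the usual Linux naming (`renderD128` and up).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Count entries in `dir` whose names start with "renderD" (e.g. renderD128).
 * Returns -1 if the directory cannot be opened. */
static int count_render_nodes(const char *dir)
{
    assert(dir != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        /* Only the name prefix is checked; the node itself is not opened. */
        if (strncmp(ent->d_name, "renderD", strlen("renderD")) == 0)
            count++;
    }

    closedir(d);
    return count;
}
```

Calling `count_render_nodes("/dev/dri")` and getting 0 would suggest render nodes are not enabled on the running kernel, while -1 means the directory could not be opened at all.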
Created attachment 104033 [details]
dmesg output using firmware ripped from the blob
Using DRI3 and the firmware from the blob, I still get crashes, but I finally have a different error message in dmesg.
Tried again with Linux 3.19.3 and Mesa 10.5.2; no changes.

Created attachment 122992 [details]
dmesg output with errors from successful load
With Linux 4.5.0-ARCH, Mesa 11.1.2-3, DRI3, and using modesetting_drv instead of intel_drv for the main display (not sure if that's relevant)... everything works! (If "everything" consists of glxinfo, glxgears, and a few minutes of Darwinia.)
I do still get errors in dmesg, though, as attached. I'll be happy to follow along and do whatever digging is necessary to eradicate them, if someone wants to take up that task.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1066.
Created attachment 104017 [details]
Full output of journalctl, including kernel logs, between system boot and poweroff after crash.

Any OpenGL application, even one as minor as glxinfo, either crashes Xorg or locks up the machine entirely (no network, magic SysRq fails) when started with DRI_PRIME=1. This has happened across multiple Mesa and kernel versions, most recently with Mesa 10.2.4 and Linux 3.15.8 on Arch Linux.