Summary: | Crash when creating a depth buffer on GeForce 320M | ||
---|---|---|---|
Product: | Mesa | Reporter: | Timo Wiren <timo.wiren> |
Component: | Drivers/DRI/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED FIXED | QA Contact: | Nouveau Project <nouveau> |
Severity: | critical | ||
Priority: | medium | ||
Version: | 18.2 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
glxinfo
dmesg |
Please include your dmesg. The fact that nouveau_bo_new fails is extremely unexpected.

Created attachment 142112 [details]
dmesg

Added dmesg.
Hrmph. Well, nothing in there. So ... what's different about your environment?

I'm on Xorg 1.19, windowmaker, no compositor of any sort. glxgears works fine.

Tell me about your setup.

(In reply to Ilia Mirkin from comment #3)
> Hrmph. Well, nothing in there. So ... what's different about your
> environment?
>
> I'm on Xorg 1.19, windowmaker, no compositor of any sort. glxgears works
> fine.
>
> Tell me about your setup.

Nothing custom, just the freshly installed Lubuntu 18.10 64-bit, English localization, laptop's internal display, no encrypted disks, no compositor AFAIK.

But I just found a workaround! I downloaded mesa 18.2.3, edited nv50_miptree.c and disabled compression in nv50_mt_choose_storage_type() for PIPE_FORMAT_Z24X8_UNORM. That is, I put "compressed = false;" after "tile_flags = 0x128 + ms;", compiled mesa and ran glxgears with my compiled version and it didn't crash. I don't know if it's a proper workaround. The issue seems to be with depth compression, I guess.

Could you boot with

nouveau.debug=mmu=debug

and see what gets printed? I think I see why the -22 (EINVAL) is being generated -- the RAM is marked as stolen, and it rejects compressed memory on it. However I'm not sure that's actually correct.

(In reply to Ilia Mirkin from comment #5)
> Could you boot with
>
> nouveau.debug=mmu=debug
>
> and see what gets printed? I think I see why the -22 (EINVAL) is being
> generated -- the RAM is marked as stolen, and it rejects compressed memory
> on it. However I'm not sure that's actually correct.

Thanks for looking into this.
Here's the output:

[ 110.000223] nouveau 0000:04:00.0: mmu: user: comp 3 0a
[ 110.000227] nouveau 0000:04:00.0: mmu: user: invalid -22
[ 110.000275] nouveau 0000:04:00.0: mmu: user: comp 3 0a
[ 110.000277] nouveau 0000:04:00.0: mmu: user: invalid -22
[ 110.003636] nouveau 0000:04:00.0: mmu: user: comp 3 0a
[ 110.003639] nouveau 0000:04:00.0: mmu: user: invalid -22
[ 110.003676] nouveau 0000:04:00.0: mmu: user: comp 3 0a
[ 110.003678] nouveau 0000:04:00.0: mmu: user: invalid -22
[ 110.003697] glxgears[1196]: segfault at 1a ip 00007fa0080b533d sp 00007ffc0a624b00 error 4 in nouveau_dri.so[7fa007ee1000+819000]
[ 110.003705] Code: c6 44 24 07 00 49 8b 9d c0 01 00 00 48 85 db 0f 84 e4 00 00 00 80 bb 91 00 00 00 00 0f 85 03 01 00 00 48 8b 4b 58 0f b7 14 24 <0f> b7 41 1a 66 39 51 18 48 89 4c 24 48 66 0f 46 51 18 66 39 44 24

OK, "good". This is what I expected based on my reading of the code, so ... comforting to know that I can read code.

Ben -- this seems wrong. I don't know if compression is or is not truly supported on the MCP89-stolen ram, but nouveau_bo_new should not fail there. There's various logic in there to turn off compression if it's not found, but that does not appear to be triggering here.

Could you edit drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c:nv50_vmm_valid, and change it to set aper = 0 in the if (ram->stolen) case (so aper=0 in both of the VRAM cases). Let me know if you need me to make you a proper patch.

Then see if ... stuff works. At least glxgears, but would be good to test more complex things too which make extensive use of depth as well as compressible color formats (something like xonotic would be more than sufficient). Unfortunately we're not sure if this works on MCP89 or not.

(In reply to Ilia Mirkin from comment #8)
> Could you edit
> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c:nv50_vmm_valid, and change
> it to set aper = 0 in the if (ram->stolen) case (so aper=0 in both of the
> VRAM cases).
I tested it on kernel 4.18.16 and it caused a crash/hang on bootup. Some of the log messages (copied by hand):

nouveau 0000:04:00.0: fb: trapped read at 015bc9dedc on channel 2 [0fba0000 DRM] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000b [PT_NOT_PRESENT]
nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 2 [DRM] get 015c25a7a4 put 015c25a7a4 ib_get 000000a2 ib_put 000000a7 state a0000000 (err: IB_EMPTY)
nouveau 0000:04:00.0: DRM: GPU lockup - switching to software fbcon
nouveau 0000:04:00.0: fb: trapped write at 0100131fc0 on channel -1 [0fedf000 unknown] engine 06 [BAR] client 04 [PFIFO_WRITE] subclient 01 [IN] reason 000000b [VRAM_LIMIT]

Since Lubuntu 19.04 the crash has disappeared but I get broken depth testing instead in all GL applications, including glxgears. My workaround (disabling depth compression) still works.

Current kernel: 5.0.0-13-generic
Mesa: 19.0.2

Given the amount of time that this has gone on unfixed, I think we should just make mcp89 point at mcp77_mmu_new instead of g84_mmu_new (in nvkm/engine/device/base.c). Literally the only difference between those two is the ability to use compression. The quick test in comment #9 didn't yield positive results. Let's not make things extra-broken for people -- even if compression is somehow enableable on those chips, it's never worked on nouveau, I think.

Timo - are you up to sending a change to fix the above in the kernel? If not, I can do it.

(In reply to Ilia Mirkin from comment #11)
> Timo - are you up to sending a change to fix the above in the kernel? If
> not, I can do it.

Well, I have never submitted a patch to the kernel before, but this is a good opportunity to learn the process :-). I'll try to make it happen in a few days.

My fix seems to be included in Linux 5.3, so resolving as fixed:
https://lists.freedesktop.org/archives/dri-devel/2019-July/227219.html |
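As a rough illustration of the reasoning in this thread, here is a toy model in C of why pointing the chipset at an MMU variant without compression support avoids the allocation failure. Every type, field and function name below (toy_mmu, toy_request, toy_alloc) is hypothetical and does not mirror the real nouveau structures. The silent downgrade when compression is unavailable is an assumption based on the comment that there is logic to turn compression off when it is not found.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the real nouveau structures. */
struct toy_mmu {
    const char *name;
    bool can_compress;   /* the single capability that differs in this model */
};

/* Two variants that differ only in compression support. */
static const struct toy_mmu toy_mmu_with_comp    = { "with-compression",    true  };
static const struct toy_mmu toy_mmu_without_comp = { "without-compression", false };

struct toy_request {
    bool want_compression;   /* userspace asked for a compressed kind */
    bool ram_is_stolen;      /* VRAM is carved out of system memory */
};

/*
 * Returns 0 on success or -EINVAL on rejection.
 * *compressed reports whether compression was actually granted.
 */
int toy_alloc(const struct toy_mmu *mmu, const struct toy_request *req,
              bool *compressed)
{
    assert(mmu != NULL && req != NULL && compressed != NULL);

    if (!mmu->can_compress) {
        /* Assumed fallback: the request is downgraded, not rejected. */
        *compressed = false;
        return 0;
    }
    if (req->want_compression && req->ram_is_stolen) {
        /* Models the "invalid -22" lines in the mmu debug output. */
        return -EINVAL;
    }
    *compressed = req->want_compression;
    return 0;
}
```

In this model, a compressed request on stolen RAM fails with -EINVAL under toy_mmu_with_comp but succeeds uncompressed under toy_mmu_without_comp, which is the behaviour the proposed change aims for.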
Created attachment 142101 [details]
glxinfo

Every OpenGL application that wants to use a depth buffer always crashes, including glxgears:

glxgears: dri2.c:906: dri2_allocate_textures: Assertion `*zsbuf' failed.

I debugged the assertion with gdb:

templ structure contents passed to resource_create():

$2 = {reference = {count = 0}, width0 = 300, height0 = 300, depth0 = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0}

In nv50_miptree_create() in gallium/drivers/nouveau/nv50/nv50_miptree.c:389 the call to nouveau_bo_new() returns -22, which causes nv50_miptree_create() to return NULL.

MESA_DEBUG=1 glxgears prints the following before segfaulting:

Mesa: User error: GL_OUT_OF_MEMORY in Resizing framebuffer

Computer: MacBook Pro 2010 (NVIDIA GeForce 320M "MCP89")
Resolution: 1280x800
OS: Lubuntu 18.10
Mesa: 18.2.2, but happens also with the versions that come with Lubuntu 16.04 and 18.04

I can compile and run mesa from sources, if it helps debugging.
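For readers unfamiliar with the -22 return value mentioned above: it follows the common convention of returning a negated errno code, and 22 is EINVAL on Linux. A minimal sketch of decoding such a value is shown below; describe_neg_errno and is_einval are hypothetical helpers written for illustration and are not part of Mesa or libdrm.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a negative errno-style return value,
 * such as the -22 reported for nouveau_bo_new(), into a readable
 * message. By convention 0 means success and failures are -errno.
 */
const char *describe_neg_errno(int ret)
{
    assert(ret <= 0);
    if (ret == 0)
        return "success";
    return strerror(-ret);   /* strerror() expects the positive code */
}

/* True when the return value corresponds to EINVAL. */
int is_einval(int ret)
{
    return ret == -EINVAL;
}
```

Calling describe_neg_errno(-22) on a Linux system with glibc yields the text "Invalid argument", which matches the EINVAL reading given later in the thread.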