Created attachment 115504 (Kernel Log)
When I launch the 3D game Xonotic or 0ad, I get a systematically reproducible crash: the screen freezes and the system can no longer be shut down, not even over ssh.
How to reproduce: launch the game Xonotic.
This has been the case since at least Linux 3.18; it does not happen on Linux 3.14.
lspci -v:
	Flags: bus master, VGA palette snoop, 66MHz, medium devsel, latency 64
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=64
	Memory behind bridge: fe100000-fe1fffff
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 2843
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at f0000000 (64-bit, prefetchable) [size=128M]
	Memory at f8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at fe000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau
Versions:
mesa 10.5.4-1
libdrm 2.4.60-2
xf86-video-nouveau 1.0.11-3
linux 4.0.1
The screen freeze is most likely unrelated to the error you quote, but is rather related to the PDISP errors earlier in the log. Can you try bisecting? (My bet is the display rework in 3.16/17...)
I've never done that, but I'll try (so it might take a while).
Created attachment 115506 (kernel_log_3.17)
Linux 3.17 fails to probe nouveau:
nouveau: probe of 0000:01:00.0 failed with error -12
Probably because of your NvBIOS=PRAMIN thing
Every NvBios option gives me the same result.
How about removing it? :)
Same result :)
Created attachment 115513 (bisect_log)
Found the patch that caused the PDISP errors.
Great, thanks for doing that. The commit you landed on certainly *seems* related to the whole DISP thing, which is good:
commit 7a014a872914a6bb5af8b67eba603f8546794ab9
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri May 16 14:36:15 2014 +1000
    drm/nouveau/disp: add internal representaion of output paths and connectors
    This will, at some point, be used to replace various bits and pieces of code doing direct bios parsing. For now, it'll just be used for some DP improvements.
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Arthur, can you attach your VBIOS (cat /sys/kernel/debug/dri/0/vbios.rom) and tell us which connectors have monitors connected? Ben, any ideas?
Created attachment 115514 (vbios.rom)
Using Linux v4.0. xrandr:
Screen 0: minimum 320 x 200, current 3200 x 1080, maximum 8192 x 8192
DVI-I-1 connected 1280x1024+1920+0 (normal left inverted right x axis y axis) 338mm x 270mm
   1280x1024     60.02*+
   1152x864      75.00
   1024x768      75.08    75.03    60.00
   832x624       74.55
   800x600       75.00    60.32
   640x480       75.00    60.00
   720x400       70.08
DVI-D-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 510mm x 287mm
   1920x1080     60.00*+
   1280x1024     75.02    60.02
   1152x864      75.00
   1024x768      75.08    60.00
   800x600       75.00    60.32
   640x480       75.00    60.00
   720x400       70.08
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-1 disconnected (normal left inverted right x axis y axis)
A suggestion from Ben, in drivers/gpu/drm/nouveau/nv50_display.c:
-	if (show && nv_crtc->cursor.nvbo)
+	if (show && nv_crtc->base.enabled && nv_crtc->cursor.nvbo)
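To make the effect of that one-line change easier to see, here is a minimal standalone sketch of the old and new conditions. The types and field names (`fake_crtc`, `fake_bo`, `enabled`, `cursor_nvbo`) are hypothetical simplifications, not the real nouveau structures; they only mirror `nv_crtc->base.enabled` and `nv_crtc->cursor.nvbo` from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nouveau types (NOT the real structs). */
struct fake_bo {
	int placeholder;
};

struct fake_crtc {
	bool enabled;                /* mirrors nv_crtc->base.enabled */
	struct fake_bo *cursor_nvbo; /* mirrors nv_crtc->cursor.nvbo */
};

/* Old condition: show the cursor whenever a cursor buffer exists. */
static bool cursor_visible_old(const struct fake_crtc *crtc, bool show)
{
	assert(crtc != NULL);
	return show && crtc->cursor_nvbo != NULL;
}

/* Patched condition: additionally require the CRTC to be enabled, so a
 * disabled CRTC that still holds a cursor buffer yields false. */
static bool cursor_visible_new(const struct fake_crtc *crtc, bool show)
{
	assert(crtc != NULL);
	return show && crtc->enabled && crtc->cursor_nvbo != NULL;
}
```

With the patched condition, a disabled CRTC that still has a cursor buffer attached no longer satisfies the "show" test, whereas the old condition would still have returned true for it.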
I can confirm that this fixes the PDISP errors! But I still get the original error (screen freeze) while launching some games:
nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a940000 [PTE] from CE2/GR_CE on channel 0x007f369000 [unknown]
Using:
mesa 10.5.5-1
linux-git with this fix
libdrm 2.4.61-1
xf86-video-nouveau 1.0.11-3
Should I file a new bug for that issue?
I tried bisecting the read fault error. Before commit 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 there are no errors. After that commit, modesetting fails, and it keeps failing until commit 13dfe1286d1ea1af4c9330b039c2316d0d92c484, with which modesetting works again. The read fault error is present at that commit, so the cause is somewhere between those two commits.
You can work around it by applying the contents of these commits at each step of the bisection:
79456e1a10d5f4e708822287ed0e97af469bf49b
d979ab975ecdb336ed4da77a808be813a293b59e
d7bda18c9102b65078c132fd7d7ffd835058f021
13dfe1286d1ea1af4c9330b039c2316d0d92c484
Of course, it's possible that those 5 patches (the 4 above plus 3d9e3921f4d77bcaeea913c48b894d1208f0cb06) are the culprit, so first check whether 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 plus the above patches works.
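Applying the fix-up commits at every step makes each commit in the range testable again, so the search becomes an ordinary binary search for the first bad commit, which is what `git bisect` automates. Below is a minimal C sketch of that search, assuming the bug first appears at one commit and persists in every later one. Commits are modelled as indices, and `first_bad_commit` and `simulated_is_bad` are hypothetical names; the latter stands in for "build, boot, launch the game".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: true if the kernel built from commit
 * `idx` (with the fix-up patches applied) shows the bug. */
typedef bool (*bad_fn)(size_t idx, void *ctx);

/* Binary search for the first bad commit in [0, n), assuming commit 0
 * is known good, commit n-1 is known bad, and once a commit is bad all
 * later ones are bad too. *steps receives the number of tests run. */
static size_t first_bad_commit(size_t n, bad_fn is_bad, void *ctx,
                               size_t *steps)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2);
	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		++*steps;
		if (is_bad(mid, ctx))
			bad = mid;  /* culprit is at mid or earlier */
		else
			good = mid; /* culprit is after mid */
	}
	return bad;
}

/* Simulated test: every commit at or after *ctx is "bad". */
static bool simulated_is_bad(size_t idx, void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

For a range of 100 commits this needs at most 7 tests, which is why bisecting stays practical even though each step means building and booting a kernel.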
3d9e3921f4d77bcaeea913c48b894d1208f0cb06 + those 4 patches still has the read fault error!
(In reply to Arthur Heymans from comment #15)
> 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 + those 4 patches has the read
> fault error!
Out of all of the changes in the aforementioned patches, this one would seem the most likely to cause the issue:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/gpu/drm/nouveau/core/core/mm.c?id=d979ab975ecdb336ed4da77a808be813a293b59e
Any chance you can revert that change in isolation and see if the faults disappear?
Reverting 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 solves the problem. However, c39f472e9f14e49a9bc091977ced0ec45fc00c57 renames some things, so I don't know what to do for recent kernels.
(In reply to Arthur Heymans from comment #17)
> Reversing 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 solves the problem.
>
> Howerever c39f472e9f14e49a9bc091977ced0ec45fc00c57 changes some names so I
> don't know what to for recent kernels.
The code in question is still here:
http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/subdev/fb/ramgf100.c#n600
It's surprising that reverting that commit helps... it fixed issues for people with funny memory partitioning, IIRC.
OK, there are still crashes, but they happen less often and take longer to trigger. Applications that used to crash instantly and reproducibly now crash with the same errors after ~30 minutes or more. (The revert also works on more recent kernels.)
(In reply to Arthur Heymans from comment #17)
> Reversing 3d9e3921f4d77bcaeea913c48b894d1208f0cb06 solves the problem.
>
> Howerever c39f472e9f14e49a9bc091977ced0ec45fc00c57 changes some names so I
> don't know what to for recent kernels.
One possible issue (albeit unlikely) is that, due to u32 maths, the extra multiplication (combined with << 8) is causing an overflow. FWIW, latest upstream explicitly uses u64-typed variables.
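To illustrate the kind of overflow described above, here is a minimal C sketch comparing a size computed as `(x << 8) * parts` in 32-bit and in 64-bit arithmetic. The function and parameter names are hypothetical and the units are not taken from ramgf100.c; the sketch only shows how unsigned 32-bit math wraps silently and why widening to u64 before the shift avoids it.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "uint64_t must be 64 bits");

/* Hypothetical computation of the shape "(x << 8) * parts", done
 * entirely in 32-bit unsigned arithmetic: any result above UINT32_MAX
 * silently wraps modulo 2^32. */
static uint32_t size_u32(uint32_t per_part, uint32_t parts)
{
	return (per_part << 8) * parts;
}

/* Same computation, widened to 64 bits before the shift; it only wraps
 * once the true result exceeds UINT64_MAX. */
static uint64_t size_u64(uint32_t per_part, uint32_t parts)
{
	return ((uint64_t)per_part << 8) * parts;
}

/* Nonzero if the 32-bit result differs from the 64-bit one. */
static int wraps_in_u32(uint32_t per_part, uint32_t parts)
{
	return (uint64_t)size_u32(per_part, parts) != size_u64(per_part, parts);
}
```

For example, `size_u32(1u << 22, 4)` returns 0, while `size_u64(1u << 22, 4)` returns 4294967296 (2^32). Because unsigned wraparound is well defined in C, nothing flags this at run time; the result is simply wrong.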
Well, freezes also happen on versions before that particular patch. They are less common and I have not found a way to reproduce them reliably (similar to what happens when reverting that patch on a recent kernel).
(In reply to Arthur Heymans from comment #21)
> Well freezes also happen on versions before that particular patch. They are
> less common and I have not found a way to make them reproducible (similar to
> reverting that patch on recent kernel).
Can you see if you still have issues with mesa 11.0.3 and a regular (and recent) upstream kernel? Could it be that you were just getting lucky/unlucky with the other kernel changes?
Using Linux 4.2.3 and mesa 11.0.3, I still frequently get errors like:
nouveau E[ PFIFO][0000:03:00.0] read fault at 0x0011990000 [UNSUPPORTED_KIND] from CE2/GR_CE on channel 0x007f121000 [unknown]
which freeze the display. So I would say nothing really changed.
I have exactly the same problem: screen freezes, inability to shut down the desktop remotely, and this in the logs:
read fault at 0x00bca00000 [UNSUPPORTED_KIND] from CE2/GR_CE on channel 0x007f6ef000 [unknown]
Versions:
4.2.5-1-ARCH
mesa 11.0.7-1
libdrm 2.4.65-1
xf86-video-nouveau 1.0.11+31+g1ff13a9-1
04:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 770] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 8465
	Physical Slot: 4
	Flags: bus master, fast devsel, latency 0, IRQ 70
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at f0000000 (64-bit, prefetchable) [size=128M]
	Memory at f8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at fb000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau
	Kernel modules: nouveau
The game that triggers the issue is Wasteland 2 (native, x64). It is the only game I have, so I can't generalize.
I suggest trying again with kernel 4.6; all freezes have stopped here on a 660 Ti (NVE4).
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/184.