Bug 103421

Summary: Kernel 4.13+ nouveau breaks screen output
Product: xorg
Component: Driver/nouveau
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: kbaikov
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: adedominic, ahippo, freedesktopbmw, jan.public, m-mickan
Attachments:
Description                        Flags
kernel log                         none
kernel log                         none
kernel log                         none
dmesg after plugging monitor in    none

Description kbaikov 2017-10-23 13:28:16 UTC
Created attachment 135010 [details]
kernel log

There is a regression with my

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)

where I no longer get any graphics output with Linux kernel
4.13.x, not even (framebuffer) text consoles.

There is a NULL pointer dereference around nvkm_dp_train_drive
visible in dmesg (attached).

Looking at the nouveau.ko disassembly, the faulting instruction is
an indirect call (a call to a variable address), e.g. from
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:
	ior->func->dp.drive(ior, i, ocfg.pc, ocfg.dc,
				    ocfg.pe, ocfg.tx_pu);

This call was introduced in commit af85389c614ae,
so maybe you have a hint at what might be wrong or how to further debug
this problem...
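
If the faulting instruction really is that indirect call, one plausible reading is that the dp.drive pointer is simply unset for this chipset. Below is a minimal user-space sketch of that failure mode and of a defensive check. The struct layouts and the names demo_drive, func_with_drive, func_without_drive and train_drive_checked are hypothetical stand-ins, not the nouveau definitions, and this is not meant to describe the fix that was eventually applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types (illustration only). */
struct ior;

struct ior_func {
	struct {
		/* Optional hook: a chipset table that omits it leaves this NULL. */
		void (*drive)(struct ior *ior, int lane, int pc, int dc,
			      int pe, int tx_pu);
	} dp;
};

struct ior {
	const struct ior_func *func;
	int last_lane;	/* records the last lane programmed, for the demo */
};

/* Hypothetical hook implementation: just remembers which lane it saw. */
static void demo_drive(struct ior *ior, int lane, int pc, int dc,
		       int pe, int tx_pu)
{
	(void)pc; (void)dc; (void)pe; (void)tx_pu;
	ior->last_lane = lane;
}

/* One table with the hook filled in, one where it was left out. */
static const struct ior_func func_with_drive = { .dp = { .drive = demo_drive } };
static const struct ior_func func_without_drive = { .dp = { .drive = NULL } };

/*
 * Calls the hook only if it is present. Returns 0 on success and -1 if
 * the hook is missing, instead of jumping to address 0.
 */
static int train_drive_checked(struct ior *ior, int lane)
{
	assert(ior != NULL && ior->func != NULL);
	if (ior->func->dp.drive == NULL)
		return -1;
	ior->func->dp.drive(ior, lane, 0, 0, 0, 0);
	return 0;
}
```

An unchecked call through func_without_drive would branch to address 0, which is what a NULL instruction pointer in an oops would look like. Whether a check like this or a populated table entry is the right remedy depends on the hardware, which this sketch does not model.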


With 4.12.8, there was a different NULL pointer deref but some working
graphics.
I also tried 4.14-rc3 or such but that still had broken graphics.

Thanks in advance for any help
Comment 1 Andrey Mazo 2017-11-10 00:13:30 UTC
I'm seeing the same problem on 4.13.11.

I have the same video card:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)

And a NULL pointer dereference:
[    2.354551] nouveau 0000:03:00.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffffa3167a6b2000
[    2.455441] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    2.455444] IP:           (null)
[    2.455445] PGD 0
[    2.455446] P4D 0

[    2.455448] Oops: 0010 [#1] SMP
[    2.455449] Modules linked in:
...
[    2.455471] Call Trace:
[    2.455476]  ? nvkm_dp_train_drive+0x1e8/0x2d0
[    2.455479]  nvkm_dp_acquire+0x8eb/0xcd0
[    2.455482]  nv50_disp_super_2_2+0x6b/0x430
[    2.455486]  ? nvkm_devinit_pll_set+0xa/0x10
[    2.455487]  gf119_disp_super+0x1ac/0x2f0
[    2.455492]  process_one_work+0x193/0x430
[    2.455493]  ? process_one_work+0x136/0x430
[    2.455495]  worker_thread+0x49/0x450
[    2.455499]  kthread+0x109/0x140
[    2.455500]  ? create_worker+0x1a0/0x1a0
[    2.455502]  ? kthread_create_on_node+0x40/0x40
[    2.455509]  ret_from_fork+0x27/0x40
[    2.455510] Code:  Bad RIP value.
[    2.455514] RIP:           (null) RSP: ffffbeed062afc10
[    2.455514] CR2: 0000000000000000
[    2.455517] ---[ end trace 85d5c96bafea0973 ]---

Please see the full kernel log attached.
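
For anyone reading the trace above: the value after "Oops:" is the x86 page-fault error code, and 0x0010 has only bit 4 (instruction fetch) set. Together with RIP and CR2 both being zero, that points to the CPU trying to execute code at address 0, i.e. a call through a NULL function pointer. A small sketch of the decoding follows. The bit meanings are the architectural ones; the helper names decode_pf_error and looks_like_null_call are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code. */
struct pf_error {
	bool present;		/* bit 0: page was present (protection fault) */
	bool write;		/* bit 1: access was a write */
	bool user;		/* bit 2: fault happened in user mode */
	bool reserved_bit;	/* bit 3: reserved bit set in a page-table entry */
	bool instr_fetch;	/* bit 4: fault was an instruction fetch */
};

/* Splits the error code into its low five flag bits. */
static struct pf_error decode_pf_error(uint32_t code)
{
	struct pf_error e = {
		.present      = (code >> 0) & 1,
		.write        = (code >> 1) & 1,
		.user         = (code >> 2) & 1,
		.reserved_bit = (code >> 3) & 1,
		.instr_fetch  = (code >> 4) & 1,
	};
	return e;
}

/*
 * Heuristic (illustrative): an instruction fetch from a non-present page
 * with both the faulting address (CR2) and the instruction pointer at 0.
 */
static bool looks_like_null_call(uint32_t code, uint64_t cr2, uint64_t rip)
{
	struct pf_error e = decode_pf_error(code);

	return e.instr_fetch && !e.present && cr2 == 0 && rip == 0;
}
```

With the values from the trace (error code 0x0010, CR2 0, RIP 0), looks_like_null_call returns true.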
Comment 2 Andrey Mazo 2017-11-10 00:18:37 UTC
Created attachment 135364 [details]
kernel log
Comment 3 m-mickan 2017-11-10 15:12:23 UTC
I think I have the same issue with the mobile version of that card:

01:00.0 VGA compatible controller: NVIDIA Corporation GF119M [Quadro NVS 4200M] (rev a1)

It is an Optimus setup, so the NVIDIA card is only used when I plug a monitor into the DisplayPort.

I compiled kernels from commit af85389c614ae04970c0eea7a5c50fb889c8a480 and from its parent.
With the parent, there were no error messages from the kernel, but the output was static and full of glitches. With the mentioned commit, there were error messages from the kernel and no output at all.

I attached the kernel log (with kernel version 4.13.11) and the dmesg output after plugging a monitor into the DisplayPort (kernel compiled from commit af85389c614ae).
Comment 4 m-mickan 2017-11-10 15:14:28 UTC
Created attachment 135374 [details]
kernel log

4.13.11
Comment 5 m-mickan 2017-11-10 15:16:00 UTC
Created attachment 135375 [details]
dmesg after plugging monitor in

Commit: af85389c614ae04970c0eea7a5c50fb889c8a480
Comment 6 Rob Clark 2018-01-06 16:02:38 UTC
This patch should fix the issue:

https://patchwork.freedesktop.org/patch/196301/
Comment 7 kbaikov 2018-01-08 12:44:40 UTC
I can confirm that this patch fixes the issue:

https://patchwork.freedesktop.org/patch/196301/
Comment 8 Nikita 2018-01-16 21:09:50 UTC
I tested this patch and I can confirm that it fixed the issue.
Comment 9 Sven Joachim 2018-01-17 16:25:27 UTC
The patch in Comment 6 has been in Linus' tree since 4.15-rc8, and I have just requested that it be applied to 4.14 as well.
