Bug 94803

Summary: nouveau bug crashes kernel 4.4.6 on warm boot
Product: xorg
Reporter: Michael Daum <mic.daum>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: NEW
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Attachments:
picture from stack trace taken with camera (flags: none)
dump of the nv50_disp_intr_supervisor function (flags: none)

Description Michael Daum 2016-04-02 23:14:46 UTC
Created attachment 122684 [details]
picture from stack trace taken with camera

On every warm boot (reboot), kernel 4.4.6 crashes with the stack trace shown in the attached picture. The kernel boots fine on a cold start or after pressing the reset button.

The GPU in question is a "PNY Quadro FX 580" (Tesla / G96) with two attached monitors, both connected via DisplayPort.

The crash happens quite early, when the kernel sets the mode on the framebuffer. The resolution is set to 1920x1200 px (240x75 columns/lines).

attached:

picture of the screen showing the stack trace
dump of "nv50_disp_intr_supervisor" function from the 4.4.6 vmlinux
Comment 1 Michael Daum 2016-04-02 23:16:41 UTC
Created attachment 122685 [details]
dump of the nv50_disp_intr_supervisor function
Comment 2 Ilia Mirkin 2016-04-02 23:36:31 UTC
Offset 0x80c (+2060) in the nv50_disp_intr_supervisor dump looks like this:

   0xffffffff814daf32 <+2050>:	xor    %edx,%edx
   0xffffffff814daf34 <+2052>:	mov    %rax,%rdi
   0xffffffff814daf37 <+2055>:	mov    $0xc,%eax
   0xffffffff814daf3c <+2060>:	divl   -0x88(%rbp)

which has gotta come from

nv50_disp_intr_unk20_2_dp(...) {

	u32 dpctrl = nvkm_rd32(device, 0x61c10c + loff);
	link_nr = hweight32(dpctrl & 0x000f0000);

...

 	value = value - (3 * !!(dpctrl & 0x00004000)) - (12 / link_nr);

Which means that on boot link_nr is 0, so the 12 / link_nr term divides by zero. Michael, if you're up for some kernel patching, can you just add a

if (!link_nr) {
  nvkm_error(subdev, "link_nr = 0; dpctrl: %08x\n", dpctrl);
  return;
}

right after the link_nr assignment in that function?
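
For anyone following along, here is a rough user-space sketch of why the guard helps. It is not the driver code: popcount32() is a hypothetical stand-in for the kernel's hweight32(), adjust_value() and DP_LANE_MASK are made-up names, and value is treated as a plain int for simplicity. Only the two masks and the arithmetic are taken from the snippet above.

```c
#include <stdint.h>

/* Mask from the snippet above; the macro name is made up for this sketch. */
#define DP_LANE_MASK 0x000f0000u

/* Hypothetical stand-in for the kernel's hweight32(): counts set bits. */
static unsigned popcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Mirrors the arithmetic quoted above, with the proposed guard.
 * Returns 0 and stores the adjusted value in *out on success,
 * or -1 when no bit inside DP_LANE_MASK is set (link_nr == 0).
 */
static int adjust_value(uint32_t dpctrl, int value, int *out)
{
    unsigned link_nr = popcount32(dpctrl & DP_LANE_MASK);

    if (link_nr == 0)
        return -1; /* without this check, 12 / link_nr traps */

    *out = value - (3 * !!(dpctrl & 0x00004000u)) - (12 / (int)link_nr);
    return 0;
}
```

Calling adjust_value() with a dpctrl that has no bits set inside the mask returns -1 instead of reaching the division, which is the same effect the early return has in the patch.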
Comment 3 Michael Daum 2016-04-03 01:11:49 UTC
The patch from Ilia prevents the kernel from crashing. Reboot does not fail any longer. Additionally _most_ of the time both monitors come up at reboot.

But on some reboots one of the monitors stays black and the kernel logs the following errors (from dmesg):

[    0.955154] nouveau 0000:01:00.0: disp: outp 04:0006:0384: link training failed
[    0.973972] nouveau 0000:01:00.0: disp: outp 04:0006:0384: link training failed
[    0.974279] nouveau 0000:01:00.0: disp: link_nr = 0; dpctrl: 00401101


Whenever this happens, the value of dpctrl is always the same (00401101).
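
As a quick check of that logged value against the mask used in the driver snippet (0x000f0000), here is a small user-space sketch. dp_lane_field() and the macro names are hypothetical, introduced only for illustration; the sketch makes no claim about what the other bits in the register mean.

```c
#include <stdint.h>

/* Mask taken from the driver snippet; names and shift are illustrative. */
#define DP_LANE_MASK  0x000f0000u
#define DP_LANE_SHIFT 16

/* Extracts the 4-bit field that link_nr is counted from. */
static uint32_t dp_lane_field(uint32_t dpctrl)
{
    return (dpctrl & DP_LANE_MASK) >> DP_LANE_SHIFT;
}
```

For dpctrl = 0x00401101 the masked field is 0, so hweight32() of it is 0 as well, which matches the "link_nr = 0" message in the log.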
