Bug 101184

Summary: [NVE6] [bisected] Panic on boot with GK106
Product: xorg Reporter: S. Gilles <sgilles>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: medium CC: mrblooter
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
Transcription of kernel panic (flags: none)
output of `lspci -nnnnnnn -vvvvvvvvv' (flags: none)
output of dmesg on (working) 4.11.0-rc4 with nouveau.debug=debug (flags: none)
test fix (flags: none)

Description S. Gilles 2017-05-25 10:17:09 UTC
Created attachment 131503
Transcription of kernel panic

Since e4311ee51d1e2676001b2d8fcefd92bdd79aad85 "drm/nouveau/therm: remove ineffective workarounds for alarm bugs", my machine with a GTX 650 panics on boot. I will attach lspci output and a transcription of the panic, and I can provide more information or test patches as needed.
Comment 1 S. Gilles 2017-05-25 10:17:49 UTC
Created attachment 131504
output of `lspci -nnnnnnn -vvvvvvvvv'
Comment 2 Ben Skeggs 2017-05-29 09:35:47 UTC
That's, uh, rather strange.  Is this fully reproducible?

Could I also see a kernel log with "nouveau.debug=debug" from a working kernel?
Comment 3 S. Gilles 2017-05-29 10:57:02 UTC
It certainly feels reproducible.  I've tested ~20 boots past the listed commit, and ~10 from very close to before it, and if that commit isn't the dividing line, it's doing a very good job of pretending to be. (I agree with you that it looks pretty harmless, though.)

I'm about to attach a dmesg from a working kernel (4.11.0-rc4) with "nouveau.modeset=1 nouveau.config=NvGrUseFW=1 nouveau.debug=debug" (my standard command line has "nouveau.modeset=1 nouveau.config=NvGrUseFW=1" if it matters).
Comment 4 S. Gilles 2017-05-29 10:57:53 UTC
Created attachment 131565
output of dmesg on (working) 4.11.0-rc4 with nouveau.debug=debug
Comment 5 S. Gilles 2017-05-29 11:37:19 UTC
I just reverted e4311ee51d1e2676001b2d8fcefd92bdd79aad85 on -mainline, and the resulting kernel doesn't panic.  It doesn't receive input from any USB devices, but I think that's probably unrelated. :)

Following that, I reset to -mainline, then went through the four changes in the commit and tried reverting each one individually. Reverting any of three of them has no effect, but reverting the change to drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c prevents the panic.
Comment 6 Ben Skeggs 2017-06-05 07:27:14 UTC
Created attachment 131704
test fix

Can you give this patch a try please?

I'm not 100% convinced this is the issue here, but I'd like to rule it in/out.  I have identical hardware to a couple of the other reporters of this issue, but have been completely unable to reproduce for unknown reasons.
Comment 7 Karol Herbst 2017-06-05 12:55:03 UTC
(In reply to Ben Skeggs from comment #6)
> Created attachment 131704
> test fix
> 
> Can you give this patch a try please?
> 
> I'm not 100% convinced this is the issue here, but I'd like to rule it
> in/out.  I have identical hardware to a couple of the other reporters of
> this issue, but have been completely unable to reproduce for unknown reasons.

I can reproduce the issue, and I never got the crash with this patch applied, although I only tried a few times. If you ask others affected by this issue to test it, maybe you will get enough confirmations?
Comment 8 Karol Herbst 2017-06-05 17:23:49 UTC
*** Bug 101273 has been marked as a duplicate of this bug. ***
Comment 9 S. Gilles 2017-06-06 01:33:48 UTC
(In reply to Ben Skeggs from comment #6)
> Created attachment 131704
> test fix
> 
> Can you give this patch a try please?

I built against -mainline just now. Without the patch, I get the panic, and with the patch, I get no panic.  I'll call that successful.
Comment 10 ingo66 2017-06-07 17:46:33 UTC
Hello,

I have tested the patch on Linux 4.11.3 with the driver xf86-video-nouveau 1.0.15, and my system now starts without a kernel panic.

Without the patch, only kernels older than 4.11.3 start correctly.
Comment 11 mrblooter 2017-06-09 16:34:50 UTC
I can also confirm that it fixed the kernel panics for me with the patch applied.
Comment 12 mrblooter 2017-06-15 06:57:25 UTC
Using kernel 4.11.5 fixed this.
Comment 13 S. Gilles 2017-06-15 07:35:08 UTC
It appears the fix has been merged into mainline as b4e382ca7586a63b6c1e5221ce0863ff867c2df6 "drm/nouveau/tmr: fully separate alarm execution/pending lists". I can also confirm that unpatched -mainline now boots - thank you for the fix!
