Bug 101184 - [NVE6] [bisected] Panic on boot with GK106
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 101273
Depends on:
Blocks:
 
Reported: 2017-05-25 10:17 UTC by S. Gilles
Modified: 2017-06-15 07:35 UTC (History)

Attachments
Transcription of kernel panic (2.74 KB, text/plain)
2017-05-25 10:17 UTC, S. Gilles
output of `lspci -nnnnnnn -vvvvvvvvv' (31.62 KB, text/plain)
2017-05-25 10:17 UTC, S. Gilles
output of dmesg on (working) 4.11.0-rc4 with nouveau.debug=debug (47.07 KB, text/plain)
2017-05-29 10:57 UTC, S. Gilles
test fix (1.36 KB, patch)
2017-06-05 07:27 UTC, Ben Skeggs

Description S. Gilles 2017-05-25 10:17:09 UTC
Created attachment 131503
Transcription of kernel panic

Since e4311ee51d1e2676001b2d8fcefd92bdd79aad85 "drm/nouveau/therm: remove ineffective workarounds for alarm bugs", my machine with a GTX 650 panics on boot. I will attach lspci output and a transcription of the panic, and I can provide more information or test patches as needed.
Comment 1 S. Gilles 2017-05-25 10:17:49 UTC
Created attachment 131504
output of `lspci -nnnnnnn -vvvvvvvvv'
Comment 2 Ben Skeggs 2017-05-29 09:35:47 UTC
That's, uh, rather strange.  Is this fully reproducible?

Could I also see a kernel log with "nouveau.debug=debug" from a working kernel?
Comment 3 S. Gilles 2017-05-29 10:57:02 UTC
It certainly feels reproducible.  I've tested ~20 boots past the listed commit, and ~10 from very close to before it, and if that commit isn't the dividing line, it's doing a very good job of pretending to be. (I agree with you that it looks pretty harmless, though.)

I'm about to attach a dmesg from a working kernel (4.11.0-rc4) with "nouveau.modeset=1 nouveau.config=NvGrUseFW=1 nouveau.debug=debug" (my standard command line has "nouveau.modeset=1 nouveau.config=NvGrUseFW=1" if it matters).
Comment 4 S. Gilles 2017-05-29 10:57:53 UTC
Created attachment 131565
output of dmesg on (working) 4.11.0-rc4 with nouveau.debug=debug
Comment 5 S. Gilles 2017-05-29 11:37:19 UTC
I just reverted e4311ee51d1e2676001b2d8fcefd92bdd79aad85 on -mainline, and the resulting kernel doesn't panic.  It doesn't receive input from any USB devices, but I think that's probably unrelated. :)

Following that, I reset to -mainline, then went through the four changes of the commit and tried reverting them individually.  Reverting three of them does nothing, but reverting the change to drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c does prevent the panic.
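
The comment does not say which commands were used for the per-file test. One way to undo a single file's share of a commit is to pipe that file's part of the diff into `git apply -R`. The sketch below shows the idea in a throwaway repository; the file names `fan.c` and `other.c` and their contents are stand-ins for illustration, not the real kernel sources.

```shell
#!/bin/sh
# Toy demonstration: undo one file's part of a commit, keep the rest.
# fan.c and other.c are placeholder files, not the real kernel sources.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"
git config user.email "demo@example.invalid"

# Baseline commit.
echo "before" > fan.c
echo "before" > other.c
git add fan.c other.c
git commit -q -m "baseline"

# The "suspect" commit touches both files.
echo "after" > fan.c
echo "after" > other.c
git commit -q -a -m "suspect commit"
suspect=$(git rev-parse HEAD)

# Reverse-apply only the fan.c part of the suspect commit to the working tree.
git show --format= "$suspect" -- fan.c | git apply -R

# fan.c is back to its pre-commit content; other.c keeps the change.
cat fan.c
cat other.c
```

In a kernel checkout the same idea would read `git show --format= e4311ee51d1e2676001b2d8fcefd92bdd79aad85 -- drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c | git apply -R`, followed by a rebuild and a test boot.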
Comment 6 Ben Skeggs 2017-06-05 07:27:14 UTC
Created attachment 131704
test fix

Can you give this patch a try please?

I'm not 100% convinced this is the issue here, but I'd like to rule it in/out.  I have identical hardware to a couple of the other reporters of this issue, but have been completely unable to reproduce for unknown reasons.
Comment 7 Karol Herbst 2017-06-05 12:55:03 UTC
(In reply to Ben Skeggs from comment #6)
> Created attachment 131704
> test fix
> 
> Can you give this patch a try please?
> 
> I'm not 100% convinced this is the issue here, but I'd like to rule it
> in/out.  I have identical hardware to a couple of the other reporters of
> this issue, but have been completely unable to reproduce for unknown reasons.

I can reproduce the issue and I never got the crash with this patch, but I also only tried a few times. Maybe if you ask others with that issue to try it out you get enough confirmations?
Comment 8 Karol Herbst 2017-06-05 17:23:49 UTC
*** Bug 101273 has been marked as a duplicate of this bug. ***
Comment 9 S. Gilles 2017-06-06 01:33:48 UTC
(In reply to Ben Skeggs from comment #6)
> Created attachment 131704
> test fix
> 
> Can you give this patch a try please?

I built against -mainline just now. Without the patch, I get the panic, and with the patch, I get no panic.  I'll call that successful.
Comment 10 ingo66 2017-06-07 17:46:33 UTC
Hello,

I have tested the patch on Linux 4.11.3 with the xf86-video-nouveau 1.0.15 driver, and now my system starts without a kernel panic.

Without the patch, only Linux versions before 4.11.3 start correctly.
Comment 11 mrblooter 2017-06-09 16:34:50 UTC
I can also confirm that it fixed the kernel panics for me with the patch applied.
Comment 12 mrblooter 2017-06-15 06:57:25 UTC
Using kernel 4.11.5 fixed this.
Comment 13 S. Gilles 2017-06-15 07:35:08 UTC
It appears the fix has been merged into mainline as b4e382ca7586a63b6c1e5221ce0863ff867c2df6 "drm/nouveau/tmr: fully separate alarm execution/pending lists". I can also confirm that unpatched -mainline now boots - thank you for the fix!

