Created attachment 117518 [details]
dmesg log from freeze with runpm=0
I have a Medion X7827 laptop with a GK104 GPU in an Optimus setup, running:
Linux 4.1.2 x86_64
I've been experiencing some freezes:
- a total freeze (no ping, no sysrq, only hard reset) shortly after xorg start - if nouveau is loaded *without* runpm=0
- a recoverable freeze (sysrq+K worked) when exiting xorg - if nouveau is loaded *with* runpm=0
On the #nouveau IRC channel I was told to try the hack-gk106m branch of this repository: http://... , with runpm=0
At first I thought it helped, but then I noticed the freezes still happen randomly.
When runpm=0 is set, the freeze has about a 60% chance of happening. I've tested it with both the in-tree nouveau.ko and one built from the hack-gk106m branch, and it looks like the chance is the same on both.
When the freeze happens, there's either a "HUB_INIT timed out" message or "grctx template channel unload timeout" message in dmesg.
If the freeze is to happen, the error message shows up at nouveau module load time, and then again when Xorg starts. Full logs in attachments.
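For anyone sorting through several dmesg logs, the two failure signatures mentioned above can be checked for mechanically. This is only a minimal sketch, assuming the messages appear verbatim as quoted in this report. The function name, the labels and the sample line are made up for illustration and are not taken from the attached logs:

```python
# Substrings as quoted in this report; the surrounding log format is an assumption.
SIGNATURES = {
    "HUB_INIT timed out": "hub_init_timeout",
    "grctx template channel unload timeout": "grctx_timeout",
}


def classify_dmesg(text):
    """Return the set of failure labels found in a dmesg dump (empty set = clean load)."""
    found = set()
    for line in text.splitlines():
        for needle, label in SIGNATURES.items():
            if needle in line:
                found.add(label)
    return found


if __name__ == "__main__":
    # Hypothetical sample line, not copied from the attachments.
    sample = "[   12.345678] nouveau: HUB_INIT timed out\n"
    print(sorted(classify_dmesg(sample)))
```

An empty result would correspond to the "loaded successfully" case, so the same helper could be run over a batch of logs to count how often each outcome occurs.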
I did mmiotraces of the nouveau.ko from hack-gk106m branch (can repeat with in-tree nouveau.ko if necessary), with runpm=0, for all of the cases:
- the driver loading successfully
- the driver loading with HUB_INIT timeout error
- the driver loading with grctx timeout error
The traces and corresponding dmesg logs are in attachments. I have more traces, but included only one per case.
I did not try to start xorg and trigger the freeze during the mmiotraces, because:
a) I believe the problem happens at nouveau load time, when it tries to initialize the GPU
b) The traces compressed with `xz -9` barely fit in the max attachment size of bugzilla; if they were longer, I doubt I could make them fit.
I hope these traces will be useful and help figure out why it sometimes works and sometimes doesn't, and how to make it always work.
Let me know if there's anything more I could do to help you figure this out.
Created attachment 117519 [details]
Xorg log from freeze with runpm=0
Created attachment 117520 [details]
netconsole log from freeze with default runpm setting
Created attachment 117521 [details]
Xorg log from freeze with default runpm settings
Created attachment 117522 [details]
journalctl output (kernel messages only) from freeze with default runpm settings
(In reply to wolf480 from comment #0)
> On #nouveau IRC channel I've been told to try the hack-gk106m branch of this
> repository: http://... , with runpm=0
I mean http://cgit.freedesktop.org/~darktama/nouveau/log/?h=hack-gk106m
Created attachment 117523 [details]
mmiotrace of successful nouveau initialization
Created attachment 117524 [details]
dmesg log from successful nouveau initialization
Created attachment 117525 [details]
from successful nouveau initialization, dunno if that matters
Created attachment 117526 [details]
mmiotrace of nouveau loading with HUB_INIT timeout
Created attachment 117527 [details]
dmesg log from nouveau loading with HUB_INIT timeout
Created attachment 117528 [details]
mmiotrace of nouveau loading with grctx timeout
Created attachment 117529 [details]
dmesg log from nouveau loading with grctx timeout
I also did an mmiotrace of the proprietary nvidia driver being loaded when running nvidia-smi. Even compressed it doesn't fit in a bugzilla attachment, so I uploaded it here: https://www.dropbox.com/s/14mchinykvfqrg9/nvidia-trace.txt.xz?dl=1
Created attachment 117531 [details]
dmesg log from nvidia-smi with proprietary driver
the corresponding mmiotrace (over 3MiB compressed) is here:
Please try (a) kernel v4.3-rc7, and (b) kernel v4.3-rc7 booted with nouveau.config=War00C800_0=1
The former has an updated pgob protocol, the latter enables an additional workaround necessary for some laptops. If it works, we can whitelist your specific subdevice, so please provide the output from
lspci -vnn -d 10de::300
Update from OP on IRC: War00C800_0=1 makes it work
01:00.0 VGA compatible controller : NVIDIA Corporation GK104M [GeForce GTX 870M] [10de:1199] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:1106]
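For whitelisting, the values of interest are the bracketed PCI ID pairs on the device line and the Subsystem line. A rough sketch of pulling them out of output like the above, assuming the `[vvvv:dddd]` format shown here (the function name is hypothetical, and this is not how the driver itself identifies the board):

```python
import re

# Matches a "[vendor:device]" pair of 4-digit hex IDs, as printed by lspci -nn.
ID_PAIR = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def extract_ids(lspci_text):
    """Return (device_ids, subsystem_ids), each a (vendor, device) tuple or None."""
    device = subsystem = None
    for line in lspci_text.splitlines():
        match = ID_PAIR.search(line)
        if not match:
            continue
        if line.strip().startswith("Subsystem:"):
            subsystem = match.groups()
        elif device is None:
            # First non-Subsystem line carrying an ID pair is taken as the device line.
            device = match.groups()
    return device, subsystem


if __name__ == "__main__":
    sample = (
        "01:00.0 VGA compatible controller : NVIDIA Corporation GK104M "
        "[GeForce GTX 870M] [10de:1199] (rev a1) (prog-if 00 [VGA controller])\n"
        "Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:1106]\n"
    )
    print(extract_ids(sample))  # (('10de', '1199'), ('1462', '1106'))
```

Bracketed names such as `[GeForce GTX 870M]` or `[MSI]` are skipped because they do not match the hex `xxxx:xxxx` pattern.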