Bug 91557 - [NVE4] freezes: HUB_INIT timed out
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-04 17:34 UTC by wolf480
Modified: 2015-10-31 18:43 UTC



Attachments
dmesg log from freeze with runpm=0 (72.47 KB, text/plain), 2015-08-04 17:34 UTC, wolf480
Xorg log from freeze with runpm=0 (30.94 KB, text/plain), 2015-08-04 17:35 UTC, wolf480
netconsole log from freeze with default runpm setting (15.64 KB, text/plain), 2015-08-04 17:36 UTC, wolf480
Xorg log from freeze with default runpm settings (31.42 KB, text/plain), 2015-08-04 17:38 UTC, wolf480
journalctl output (kernel messages only) from freeze with default runpm settings (219.29 KB, text/plain), 2015-08-04 17:39 UTC, wolf480
mmiotrace of successful nouveau initialization (1.70 MB, application/x-xz), 2015-08-04 17:42 UTC, wolf480
dmesg log from successful nouveau initialization (82.20 KB, text/plain), 2015-08-04 17:45 UTC, wolf480
lspci output (12.98 KB, text/plain), 2015-08-04 17:47 UTC, wolf480
mmiotrace of nouveau loading with HUB_INIT timeout (1.89 MB, application/x-xz), 2015-08-04 17:49 UTC, wolf480
dmesg log from nouveau loading with HUB_INIT timeout (82.81 KB, text/plain), 2015-08-04 17:50 UTC, wolf480
mmiotrace of nouveau loading with grctx timeout (2.37 MB, application/x-xz), 2015-08-04 17:53 UTC, wolf480
dmesg log from nouveau loading with grctx timeout (81.96 KB, text/plain), 2015-08-04 17:54 UTC, wolf480
dmesg log from nvidia-smi with proprietary driver (81.96 KB, text/plain), 2015-08-04 19:47 UTC, wolf480

Description wolf480 2015-08-04 17:34:38 UTC
Created attachment 117518 [details]
dmesg log from freeze with runpm=0

I have a Medion X7827 laptop with a GK104 GPU in an Optimus setup, running:
Linux 4.1.2 x86_64
Mesa 10.6.2
Xorg 1.17.2

I've been experiencing some freezes:
- a total freeze (no ping, no sysrq, only hard reset) shortly after xorg start - if nouveau is loaded *without* runpm=0
- a recoverable freeze (sysrq+K worked) when exiting xorg - if nouveau is loaded *with* runpm=0

On the #nouveau IRC channel I was told to try the hack-gk106m branch of this repository: http://... , with runpm=0
At first I thought it helped, but then I noticed the freezes happen randomly.

When runpm=0 is set, the freeze has about a 60% chance of happening. I've tested it with both the in-tree nouveau.ko and one built from the hack-gk106m branch, and the chance looks the same on both.
When the freeze happens, there's either a "HUB_INIT timed out" message or "grctx template channel unload timeout" message in dmesg.
If the freeze is to happen, the error message shows up at nouveau module load time, and then again when Xorg starts. Full logs in attachments.
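
Editorial sketch, not part of the original report: a minimal way to check a saved dmesg log for the two error messages quoted above. The function name `classify_dmesg` and the result keys are hypothetical; only the two message strings come from this report.

```python
import re

# The two messages quoted in this report. The key names are made up
# for this sketch and are not nouveau identifiers.
PATTERNS = {
    "hub_init_timeout": re.compile(r"HUB_INIT timed out"),
    "grctx_timeout": re.compile(r"grctx template channel unload timeout"),
}


def classify_dmesg(text):
    """Count how many log lines contain each known error message."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the contents of one of the attached dmesg logs should show which of the two failure modes, if any, occurred during that load. A message that appears once at module load and again when Xorg starts is counted once per matching line.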

I did mmiotraces of the nouveau.ko from hack-gk106m branch (can repeat with in-tree nouveau.ko if necessary), with runpm=0, for all of the cases:
- the driver loading successfully
- the driver loading with HUB_INIT timeout error
- the driver loading with grctx timeout error
The traces and corresponding dmesg logs are in attachments. I have more traces, but included only one per case.
I did not try to start xorg and trigger the freeze during the mmiotraces, because:
a) I believe the problem happens at nouveau load time, when it tries to initialize the GPU
b) The traces compressed with `xz -9` barely fit in the max attachment size of bugzilla; if they were longer, I doubt I could make them fit.
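
Editorial sketch, not part of the original report: one way to check in advance whether a trace would fit under an attachment limit after xz compression at preset 9 (the same level as `xz -9`). The helper names and the `limit_bytes` parameter are hypothetical; the actual Bugzilla limit is not stated in this report.

```python
import lzma


def xz_size(data, preset=9):
    """Return the size in bytes of data after xz compression."""
    return len(lzma.compress(data, preset=preset))


def fits_attachment(data, limit_bytes):
    """True if the xz-compressed data is no larger than limit_bytes.

    limit_bytes is an assumption supplied by the caller.
    """
    return xz_size(data) <= limit_bytes
```

For a trace on disk, the input would be the raw file contents, for example `open(path, "rb").read()`.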

I hope these traces will be useful and help figure out why it sometimes works and sometimes doesn't, and how to make it always work.
Let me know if there's anything more I could do to help you figure this out.
Comment 1 wolf480 2015-08-04 17:35:49 UTC
Created attachment 117519 [details]
Xorg log from freeze with runpm=0
Comment 2 wolf480 2015-08-04 17:36:45 UTC
Created attachment 117520 [details]
netconsole log from freeze with default runpm setting
Comment 3 wolf480 2015-08-04 17:38:13 UTC
Created attachment 117521 [details]
Xorg log from freeze with default runpm settings
Comment 4 wolf480 2015-08-04 17:39:53 UTC
Created attachment 117522 [details]
journalctl output (kernel messages only) from freeze with default runpm settings
Comment 5 wolf480 2015-08-04 17:41:02 UTC
(In reply to wolf480 from comment #0)
> On #nouveau IRC channel I've been told to try the hack-gk106m branch of this
> repository: http://... , with runpm=0
I mean http://cgit.freedesktop.org/~darktama/nouveau/log/?h=hack-gk106m
Comment 6 wolf480 2015-08-04 17:42:55 UTC
Created attachment 117523 [details]
mmiotrace of successful nouveau initialization
Comment 7 wolf480 2015-08-04 17:45:27 UTC
Created attachment 117524 [details]
dmesg log from successful nouveau initialization
Comment 8 wolf480 2015-08-04 17:47:22 UTC
Created attachment 117525 [details]
lspci output

from successful nouveau initialization, dunno if that matters
Comment 9 wolf480 2015-08-04 17:49:05 UTC
Created attachment 117526 [details]
mmiotrace of nouveau loading with HUB_INIT timeout
Comment 10 wolf480 2015-08-04 17:50:20 UTC
Created attachment 117527 [details]
dmesg log from nouveau loading with HUB_INIT timeout
Comment 11 wolf480 2015-08-04 17:53:37 UTC
Created attachment 117528 [details]
mmiotrace of nouveau loading with grctx timeout
Comment 12 wolf480 2015-08-04 17:54:24 UTC
Created attachment 117529 [details]
dmesg log from nouveau loading with grctx timeout
Comment 13 wolf480 2015-08-04 19:44:49 UTC
I also did an mmiotrace of the proprietary nvidia driver being loaded when running nvidia-smi. Even compressed it doesn't fit in a bugzilla attachment, so I uploaded it here: https://www.dropbox.com/s/14mchinykvfqrg9/nvidia-trace.txt.xz?dl=1
Comment 14 wolf480 2015-08-04 19:47:01 UTC
Created attachment 117531 [details]
dmesg log from nvidia-smi with proprietary driver

the corresponding mmiotrace (over 3MiB compressed) is here:
https://www.dropbox.com/s/14mchinykvfqrg9/nvidia-trace.txt.xz?dl=1
Comment 15 Ilia Mirkin 2015-10-26 05:02:20 UTC
Please try (a) kernel v4.3-rc7, and (b) kernel v4.3-rc7 booted with nouveau.config=War00C800_0=1

The former has an updated pgob protocol, the latter enables an additional workaround necessary for some laptops. If it works, we can whitelist your specific subdevice, so please provide the output from

lspci -vnn -d 10de::300
Comment 16 Ilia Mirkin 2015-10-31 18:43:49 UTC
Update from OP on IRC: War00C800_0=1 makes it work

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104M [GeForce GTX 870M] [10de:1199] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:1106]
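
Editorial sketch, not part of the original comment: a minimal way to pull the vendor:device and subsystem ID pairs out of `lspci -vnn` output like the two lines above. The function name `parse_pci_ids` and the result keys are hypothetical, and the parser only handles the two line shapes shown here.

```python
import re

# Matches a bracketed vendor:device pair such as [10de:1199].
# The class code [0300] has no colon, so it is not matched.
ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def parse_pci_ids(lspci_text):
    """Return the first device and subsystem ID pairs found in the text."""
    ids = {}
    for line in lspci_text.splitlines():
        match = ID_PAIR.search(line)
        if not match:
            continue
        key = "subsystem" if line.strip().startswith("Subsystem:") else "device"
        # Keep only the first pair seen for each key.
        ids.setdefault(key, (match.group(1), match.group(2)))
    return ids


# The two lines quoted in this comment.
SAMPLE = """01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104M [GeForce GTX 870M] [10de:1199] (rev a1) (prog-if 00 [VGA controller])
\tSubsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:1106]
"""
```

Calling `parse_pci_ids(SAMPLE)` yields the device pair `10de:1199` and the subsystem pair `1462:1106`, which is the subdevice information requested in comment 15.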

