Bug 70354 - [NVE6,NVE7] HUB_INIT timeout on graph init, blob fw doesn't help
Status: NEW
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 80627

Reported: 2013-10-10 19:35 UTC by Fred New
Modified: 2015-03-21 22:02 UTC (History)

Attachments
Xorg.0.log (37.21 KB, text/plain), 2013-10-10 19:35 UTC, Fred New
dmesg (91.68 KB, text/plain), 2013-10-10 19:36 UTC, Fred New
Full dmesg on linux-3.12-rc4 - without config=NvGrUseFw=1 (72.21 KB, text/plain), 2013-10-12 20:17 UTC, Joey 4712
Full dmesg on linux-3.12-rc4 - with config=NvGrUseFw=1 (72.22 KB, text/plain), 2013-10-12 20:20 UTC, Joey 4712
Full dmesg on linux-3.11.4-1 - without config=NvGrUseFw=1 (73.90 KB, text/plain), 2013-10-12 20:23 UTC, Joey 4712
Full dmesg on linux-3.11.4-1 - with config=NvGrUseFw=1 (73.02 KB, text/plain), 2013-10-12 20:25 UTC, Joey 4712
another dmesg with some more messages at the end (86.07 KB, text/plain), 2013-11-06 06:52 UTC, Martin
Full dmesg on linux-3.13.0-1 - loading nouveau.ko built from Ben's repository (70.61 KB, text/plain), 2013-12-14 09:41 UTC, Joey 4712
probably "fix" (1.40 KB, patch), 2014-03-05 04:47 UTC, Ben Skeggs
dmesg | egrep -i "nouveau|drm" (5.93 KB, text/plain), 2014-04-03 10:08 UTC, D. Moens
kernel log file in case it doesn't HUB_INIT timeout (746.24 KB, text/plain), 2014-08-29 09:21 UTC, Karol Herbst
mmiotrace when module loads sucessfully (299.24 KB, application/x-xz), 2014-09-27 17:04 UTC, Karol Herbst
mmiotrace when failing (433.64 KB, text/plain), 2014-09-27 17:11 UTC, Karol Herbst
Xorg log (37.39 KB, text/plain), 2015-01-18 13:32 UTC, Vitaly Torshyn
dmesg (215.81 KB, text/plain), 2015-01-18 13:32 UTC, Vitaly Torshyn
lspci -v (8.24 KB, text/plain), 2015-01-18 13:32 UTC, Vitaly Torshyn

Description Fred New 2013-10-10 19:35:40 UTC
Created attachment 87411 [details]
Xorg.0.log

It looks like my new HP Envy 17, Intel Core i7-4702MQ (Haswell), Nvidia GeForce GT 750M is a little too bleeding edge. Initialisation of the GT 750M fails and the integrated graphics controller is used.

My operating system is Fedora 20 (beta) with Linux kernel 3.11.3-301.bz105920.fc20.x86_64. This is the 3.11.3-301.fc20.x86_64 kernel with two patches from freedesktop bug 70208 applied - patches from comment 6 and comment 10.

The xorg nouveau driver from Fedora is xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64.
Comment 1 Fred New 2013-10-10 19:36:31 UTC
Created attachment 87412 [details]
dmesg
Comment 2 Ilia Mirkin 2013-10-10 19:43:03 UTC
This bug appears to be identical to the later issue presented in #70208 (comment 15) -- of which you were aware since you even posted in that issue. What was your motivation for opening a separate issue?
Comment 3 Fred New 2013-10-10 20:01:38 UTC
Sorry, reading the comments in bug 70208, I was under the impression that the vbios problem was resolved and a new bug was needed to resolve the next problem that appeared. Feel free to close this as duplicate if that isn't the case.
Comment 4 Ilia Mirkin 2013-10-10 20:37:19 UTC
Erm, you're right. My bad. Let's keep this separate. I'll put up a note in the other issue.
Comment 5 Ilia Mirkin 2013-10-10 22:13:17 UTC
Looks like a PGRAPH init failure, here's the relevant bit:

[   34.393155] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x300000 [ IBUS ]
[   34.393201] nouveau E[   PIBUS][0000:01:00.0] GPC0: 0x419eb4 0xbadf1000 (0x3800820c)
[   36.395442] nouveau E[  PGRAPH][0000:01:00.0] HUB_INIT timed out
[   36.395451] nouveau E[  PGRAPH][0000:01:00.0] 409000 - done 0x00000244
[   36.395462] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   36.395472] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000002 0x00000009
[   36.395477] nouveau E[  PGRAPH][0000:01:00.0] 502000 - done 0x00000300
[   36.395484] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   36.395490] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   36.395492] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16
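
For reference, the lines above can be picked apart mechanically. The following is a minimal sketch, assuming the log format shown in this report; the names `LINE_RE`, `parse_line` and `init_errno` are made up for illustration and are not part of nouveau. It also maps the trailing `-16` to its errno name using Python's standard `errno` table.

```python
import errno
import re

# Matches the nouveau log format seen in this report, e.g.
# "[   36.395492] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16"
# The severity letter and the PCI address are optional.
LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+nouveau\s+(?P<level>[A-Z])?"
    r"\[\s*(?P<unit>\w+)\](?:\[(?P<pci>[0-9a-f:.]+)\])?\s+(?P<msg>.*)"
)


def parse_line(line):
    """Split one nouveau dmesg line into its fields, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "time": float(m.group("time")),
        "level": m.group("level") or "",
        "unit": m.group("unit"),
        "pci": m.group("pci"),
        "msg": m.group("msg"),
    }


def init_errno(msg):
    """Return the errno name for an 'init failed, -N' message, else None."""
    m = re.search(r"init failed, -(\d+)", msg)
    return errno.errorcode.get(int(m.group(1))) if m else None


if __name__ == "__main__":
    rec = parse_line("[   36.395492] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16")
    print(rec["unit"], init_errno(rec["msg"]))
```

On Linux, errno 16 is EBUSY, so the final line of the excerpt reads as PGRAPH init returning -EBUSY after the HUB_INIT wait expired.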
Comment 6 Joey 4712 2013-10-12 19:58:16 UTC
I think I have the same problem with the GeForce GT 750M in my Asus N750JV laptop.

I'm using Manjaro Linux and tried it with Kernel 3.11 and 3.12. 

After fixing the loading of the vbios (see https://bugs.freedesktop.org/show_bug.cgi?id=70208) I get the following message:

 Failed to initialise context object: 2D_NVC0 (0)

When I extract the firmware and load the nouveau kernel module with config=NvGrUseFw=1, I get this error in dmesg when starting X:

 [   43.161759] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x300000 [ IBUS ]
 [   43.161803] nouveau E[   PIBUS][0000:01:00.0] GPC0: 0x419eb4 0xbadf1000 (0x3800820c)
 [   45.164110] nouveau E[  PGRAPH][0000:01:00.0] fuc09 req 0x10 timeout
 [   45.164114] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16


I will attach the full dmesg for both kernels, with and without using external firmware. Please also note that I extracted the firmware myself and I'm not sure if the generated firmware files are correct.

In addition, I will send an mmiotrace of the working nvidia blob driver to mmio dot dumps at gmail dot com.

If you need more information or if I can support you in any way having a closer look at this, please just let me know :-)
Comment 7 Joey 4712 2013-10-12 20:13:42 UTC
(In reply to comment #6)

Without external firmware (without config=NvGrUseFw=1) the dmesg when starting X looks like this: (very similar to the one reported above, as far as I can see)


[  969.762998] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x300000 [ IBUS ]
[  969.763048] nouveau E[   PIBUS][0000:01:00.0] GPC0: 0x419eb4 0xbadf1000 (0x3800820c)
[  971.764879] nouveau E[  PGRAPH][0000:01:00.0] HUB_INIT timed out
[  971.764886] nouveau E[  PGRAPH][0000:01:00.0] 409000 - done 0x00000244
[  971.764896] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  971.764906] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000002 0x00000009
[  971.764909] nouveau E[  PGRAPH][0000:01:00.0] 502000 - done 0x00000300
[  971.764915] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  971.764920] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  971.764922] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16
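
The two variants reported so far (with and without config=NvGrUseFw=1) end in different messages, which makes it easy to sort attached logs by failure mode. A rough sketch, assuming only the two message fragments quoted in this report; the label strings and the function name are hypothetical and chosen for illustration:

```python
# Message fragments copied from the dmesg excerpts in this report.
# The labels on the right are hypothetical names chosen for this sketch.
SIGNATURES = {
    "HUB_INIT timed out": "builtin-fw-hub-init-timeout",
    "fuc09 req 0x10 timeout": "external-fw-request-timeout",
}


def classify_graph_failure(dmesg_text):
    """Return the sorted labels of all known failure signatures found."""
    return sorted(
        label for needle, label in SIGNATURES.items() if needle in dmesg_text
    )


if __name__ == "__main__":
    sample = "[  971.764879] nouveau E[  PGRAPH][0000:01:00.0] HUB_INIT timed out"
    print(classify_graph_failure(sample))
```

An empty list means neither signature was found, which does not by itself prove that PGRAPH initialised correctly.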
Comment 8 Joey 4712 2013-10-12 20:17:07 UTC
Created attachment 87530 [details]
Full dmesg on linux-3.12-rc4 - without config=NvGrUseFw=1
Comment 9 Joey 4712 2013-10-12 20:20:17 UTC
Created attachment 87531 [details]
Full dmesg on linux-3.12-rc4 - with config=NvGrUseFw=1
Comment 10 Joey 4712 2013-10-12 20:23:39 UTC
Created attachment 87532 [details]
Full dmesg on linux-3.11.4-1 - without config=NvGrUseFw=1
Comment 11 Joey 4712 2013-10-12 20:25:38 UTC
Created attachment 87533 [details]
Full dmesg on linux-3.11.4-1 - with config=NvGrUseFw=1
Comment 12 Joey 4712 2013-10-12 20:35:57 UTC
I've sent the mmiotrace to mmio dot dumps at gmail dot com.
Comment 13 Ivan Havlicek 2013-11-06 00:55:26 UTC
Hi,

I have the same issue (kernel-3.11.6); my card is a GeForce GTX 765M (10de:11e2).
I'm also ready to help if needed. My log:

$ dmesg | grep -i -e nouveau -e drm
[    4.607166] [drm] Initialized drm 1.1.0 20060810
[    4.656791] [drm] Memory usable by graphics device = 2048M
[    4.656796] fb: conflicting fb hw usage inteldrmfb vs VESA VGA - removing generic driver
[    5.235194] fb: conflicting fb hw usage inteldrmfb vs VGA16 VGA - removing generic driver
[    5.249530] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    5.249533] [drm] Driver supports precise vblank timestamp query.
[    5.300252] fbcon: inteldrmfb (fb0) is primary device
[    6.727359] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[    8.082531] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    8.084219] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[    8.085730] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0e6220a1
[    8.085770] nouveau  [  DEVICE][0000:01:00.0] Chipset: GK106 (NVE6)
[    8.085771] nouveau  [  DEVICE][0000:01:00.0] Family : NVE0
[    8.086406] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[    8.134605] nouveau  [   VBIOS][0000:01:00.0] ... signature not found
[    8.134696] nouveau  [   VBIOS][0000:01:00.0] checking PROM for image...
[    8.309024] nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
[    8.310003] nouveau  [   VBIOS][0000:01:00.0] using image from PROM
[    8.311106] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
[    8.312066] nouveau  [   VBIOS][0000:01:00.0] version 80.06.5b.00.05
[    8.313602] nouveau  [ DEVINIT][0000:01:00.0] adaptor not initialised
[    8.314594] nouveau  [   VBIOS][0000:01:00.0] running init tables
[    8.419401] nouveau  [     PFB][0000:01:00.0] RAM type: GDDR5
[    8.420382] nouveau  [     PFB][0000:01:00.0] RAM size: 2048 MiB
[    8.421371] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
[    8.445223] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
[    8.446209] nouveau  [  PTHERM][0000:01:00.0] fan management: disabled
[    8.447206] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
[    8.487489] nouveau  [     DRM] VRAM: 2048 MiB
[    8.488465] nouveau  [     DRM] GART: 1048576 MiB
[    8.489437] nouveau E[     DRM] Pointer to TMDS table invalid
[    8.490396] nouveau  [     DRM] DCB version 4.0
[    8.491369] nouveau E[     DRM] Pointer to flat panel table invalid
[    8.492335] nouveau W[     DRM] voltage table 0x50 unknown
[    8.493301] nouveau  [     DRM] 3 available performance level(s)
[    8.494246] nouveau  [     DRM] 0: core 202MHz shader 405MHz memory 405MHz voltage 100mV
[    8.495207] nouveau  [     DRM] 1: core 405MHz shader 810MHz memory 1080MHz voltage 80mV
[    8.496147] nouveau  [     DRM] 3: core 1002MHz shader 2004MHz memory 1080MHz voltage 40mV
[    8.497098] nouveau  [     DRM] c:
[    8.509834] nouveau  [     DRM] MM: using COPY for buffer copies
[    8.510744] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 1
[    8.645176] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x418880 [ IBUS ]
[    8.645188] nouveau E[   PIBUS][0000:01:00.0] GPC2: 0x419f74 0x00000555 (0x3800820c)
[    8.645321] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000001 FAULT at 0x503018 [ IBUS ]
[    8.645360] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000002 FAULT at 0x4188ac [ IBUS ]
[    8.645389] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0xbadf1008 FAULT at 0x419cc0 [ IBUS ]
[    8.645399] nouveau E[   PIBUS][0000:01:00.0] GPC2: 0x419cc0 0xbadf1008 (0x3800820c)
[    8.645547] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0xbadf1000 FAULT at 0x419eb4 [ IBUS ]
[    8.645564] nouveau E[   PIBUS][0000:01:00.0] GPC2: 0x419eb4 0xbadf1000 (0x3800820c)
[   10.648199] nouveau E[  PGRAPH][0000:01:00.0] HUB_INIT timed out
[   10.648204] nouveau E[  PGRAPH][0000:01:00.0] 409000 - done 0x00000244
[   10.648208] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648212] nouveau E[  PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000002 0x00000009
[   10.648214] nouveau E[  PGRAPH][0000:01:00.0] 502000 - done 0x00000300
[   10.648220] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648226] nouveau E[  PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648228] nouveau E[  PGRAPH][0000:01:00.0] 50a000 - done 0x00000300
[   10.648234] nouveau E[  PGRAPH][0000:01:00.0] 50a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648239] nouveau E[  PGRAPH][0000:01:00.0] 50a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648242] nouveau E[  PGRAPH][0000:01:00.0] 512000 - done 0x00000300
[   10.648247] nouveau E[  PGRAPH][0000:01:00.0] 512000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648253] nouveau E[  PGRAPH][0000:01:00.0] 512000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[   10.648255] nouveau E[  PGRAPH][0000:01:00.0] init failed, -16

which results in this error while running Xorg:
 
[     8.621] (II) NOUVEAU(G0): Opened GPU channel 0
[    10.622] (EE) NOUVEAU(G0): Failed to initialise context object: 2D_NVC0 (0)
[    10.622] (EE) NOUVEAU(G0): Error initialising acceleration.  Falling back to NoAccel
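
For anyone comparing logs across machines, the same filtering as the grep command above, plus a list of the faulting MMIO addresses, can be sketched in a few lines. This assumes the PBUS message format shown in this log; the function names are illustrative only.

```python
import re

# Format taken from the PBUS lines in this report, e.g.
# "MMIO write of 0x00000000 FAULT at 0x418880 [ IBUS ]"
FAULT_RE = re.compile(r"MMIO (read|write) of 0x([0-9a-f]+) FAULT at 0x([0-9a-f]+)")


def grep_nouveau_drm(lines):
    """Case-insensitive filter, like `grep -i -e nouveau -e drm`."""
    return [l for l in lines if re.search(r"nouveau|drm", l, re.IGNORECASE)]


def mmio_faults(lines):
    """Return (operation, value, address) for every MMIO FAULT line."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return faults


if __name__ == "__main__":
    log = [
        "[    8.645176] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x418880 [ IBUS ]",
        "[    8.645321] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000001 FAULT at 0x503018 [ IBUS ]",
    ]
    for op, value, addr in mmio_faults(grep_nouveau_drm(log)):
        print(f"{op} 0x{value:08x} at 0x{addr:06x}")
```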
Comment 14 Martin 2013-11-06 06:51:18 UTC
I think this is my bug too, but as far as I can see I get some more dmesg messages. I'll try to upload them...
Comment 15 Martin 2013-11-06 06:52:47 UTC
Created attachment 88738 [details]
another dmesg with some more messages at the end
Comment 16 Emil Velikov 2013-11-06 11:46:14 UTC
(In reply to comment #14)
> This is my bug I think, but i got some more dmesg messages, as far as I can
> see, I'll try to upload...

Your card and dmesg output are very different from the ones this bug report concentrates on.
I would suggest taking a look at our wiki [1] [2] and filing a separate bug.

[1] http://nouveau.freedesktop.org/wiki/Bugs/
[2] http://nouveau.freedesktop.org/wiki/TroubleShooting/
Comment 17 Ilia Mirkin 2013-12-10 13:12:05 UTC
If any of you are up for it, there's a HUB_INIT timeout fix in Ben's repository. It's set up as an out-of-tree module:

git clone git://people.freedesktop.org/~darktama/nouveau
cd nouveau/drm; make

That should create a nouveau.ko that you can use. The tree it will compile against probably has to be 3.13-rc1+. (You can also port the patches over to the linux tree with a bunch of sed work.)

P.S. The patch in question is: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=0c463bb767a0af0781256ec118c5890077c2f46c
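
Since the out-of-tree module probably needs 3.13-rc1 or newer, it may be worth checking the running kernel before building. A minimal sketch, assuming release strings shaped like the ones quoted in this report (for example "3.11.3-301.fc20.x86_64"); it compares major and minor numbers only and ignores -rc suffixes. The helper names are made up for illustration.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def new_enough(release, minimum=(3, 13)):
    """True if the release is at least the given (major, minor)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    running = platform.release()
    print(running, "OK" if new_enough(running) else "too old for this tree")
```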
Comment 18 Joey 4712 2013-12-14 09:41:40 UTC
Created attachment 90760 [details]
Full dmesg on linux-3.13.0-1 - loading nouveau.ko built from Ben's repository

Thanks for the information about the HUB_INIT timeout fix in Ben's repository.

I tried it for Geforce GT 750M on Asus N750JV laptop on linux-3.13.0-1 but the module fails to load with a lot of messages like "nouveau: Unknown symbol ttm_bo_mmap (err 0)".

Attaching full dmesg output.
Comment 19 Joey 4712 2013-12-14 10:26:37 UTC
Sorry for posting too quickly: found it myself, I only had to "modprobe ttm" first to fix these "Unknown symbol ttm_bo_mmap..." errors.

Now the nouveau.ko from Ben's repo loads successfully, but I'm getting the HUB_INIT timed out message again:

[   99.171109] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 1
[   99.314206] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x300000 [ IBUS ]
[   99.314232] nouveau E[   PIBUS][0000:01:00.0] GPC0: 0x419eb4 0xbadf1000 (0x3800820c)
[   99.314464] nouveau E[   PIBUS][0000:01:00.0] HUB0: 0x404170 0x00000012 (0x0e008201)
[  101.316059] nouveau E[  PGRAPH][0000:01:00.0] HUB_INIT timed out
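
A quick pre-check that the modules nouveau.ko depends on are already loaded can be sketched by reading /proc/modules, where each line starts with the module name. Only "ttm" is known from this report to be required; any other entry in the list would be an assumption. The function names are illustrative.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_dependencies(proc_modules_text, required=("ttm",)):
    """List the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    with open("/proc/modules") as f:
        missing = missing_dependencies(f.read())
    print("missing:", ", ".join(missing) if missing else "none")
```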
Comment 20 Joey 4712 2013-12-14 11:18:18 UTC
Btw: The commit message of the patch says it is for GK110/GK208 but the GT 750M this bug is about is a GK107, am I right? Should the patch also work for GK107 or is it possible to make it work for GK107 as well?
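
The chipset can be cross-checked against the BOOT0 line that nouveau prints at probe time. The sketch below assumes the bit layout nouveau uses for NV10 and later cards, as far as I recall it from the driver source; it does reproduce the BOOT0/Chipset pairs printed in the logs in this report (0x0e6220a1 with NVE6, 0x0e7240a2 with NVE7). The function name is made up for illustration.

```python
def decode_boot0(boot0):
    """Split a BOOT0 value into (chipset id, chip revision).

    Assumed layout: bits 20-28 hold the chipset id, bits 0-7 the revision.
    """
    chipset = (boot0 & 0x1FF00000) >> 20
    chiprev = boot0 & 0x000000FF
    return chipset, chiprev


if __name__ == "__main__":
    for value in (0x0E6220A1, 0x0E7240A2):
        chipset, rev = decode_boot0(value)
        print(f"BOOT0 0x{value:08x}: NV{chipset:X} rev {rev:02x}")
```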
Comment 21 Ben Skeggs 2014-03-05 04:47:03 UTC
Created attachment 95131 [details] [review]
probably "fix"
Comment 22 Joey 4712 2014-03-05 21:24:51 UTC
Thanks soooo much.

It's working now with your patch applied against the Linux 3.13 kernel source on Manjaro Linux.

  $ xrandr --setprovideroffloadsink nouveau Intel
  $ DRI_PRIME=1 glxgears -info
  Running synchronized to the vertical refresh.  The framerate should be
  approximately the same as the monitor refresh rate.
  GL_RENDERER   = Gallium 0.4 on NVE7
  GL_VERSION    = 3.0 Mesa 10.0.3
  GL_VENDOR     = nouveau

Do you need any more info? Like dmesg of the now working nouveau setup?

Once again, thanks to all of you!
Comment 23 Richard 2014-03-07 21:17:54 UTC
Same issue on a Lenovo Y510P. Tried compiling the module from Ben's git sources in comment 17 on a 3.13.5 kernel, but it complains:

/hidden_tmp/nouveau/drm/core/subdev/mxm/base.c:109:2: error: implicit declaration of function ‘acpi_evaluate_dsm’ [-Werror=implicit-function-declaration]

grep -R acpi_evaluate_dsm /usr/src/linux/* comes up with nothing, so I'm guessing this function disappeared?

Also tried simply applying the patch (link in comment 17) to a copy of my source tree, and I seem to be missing (at least) the "engine/graph/fuc/hubnv108.fuc5.h" file in my source tree.
Comment 24 Ilia Mirkin 2014-03-07 21:24:30 UTC
(In reply to comment #23)
> Same issue on a Lenovo Y510P. Tried compiling the module from Ben's git
> sources in comment 17 on a 3.13.5 kernel, but it complains:
> 
> /hidden_tmp/nouveau/drm/core/subdev/mxm/base.c:109:2: error: implicit
> declaration of function ‘acpi_evaluate_dsm’
> [-Werror=implicit-function-declaration]
> 
> grep -R acpi_evaluate_dsm /usr/src/linux/* comes up with nothing, so I'm
> guessing this function disappeared?

More like 'appeared'. That tree is against the ~latest kernel (3.14-rcX right now).

> 
> Also tried simply applying the patch (link in comment 17) to a copy of my
> source tree, and I seem to be missing (at least) the
> "engine/graph/fuc/hubnv108.fuc5.h" file in my source tree.

Check the patch in comment 21 (attachment 95131 [details] [review]). It's not in Ben's git repo, since he's not sure what effect it'll have on other cards.
Comment 25 D. Moens 2014-03-18 13:43:52 UTC
Experiencing the same issue on a Dell M4800 QHD+ (NVE6).

- Fedora rawhide :
kernel-3.14.0-0.rc7.git0.1.fc21.x86_64
xorg-x11-server-Xorg-1.15.0-5.fc21.x86_64
xorg-x11-drv-nouveau-1.0.10-1.fc21.x86_64

- lspci :
01:00.0 VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1)

- Kernel command line:
BOOT_IMAGE=/vmlinuz-rawhide-nouveau root=/dev/mapper/vg01-rootfs4 ro rd.lvm.lv=vg01/rootfs4 vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 nouveau.debug=PDISP=debug,VBIOS=trace drm.debug=0xe


- Recompiled latest http://cgit.freedesktop.org/~darktama/nouveau/ as of 2014-03-18 :
Comment 26 D. Moens 2014-03-18 19:58:28 UTC
(In reply to comment #25)
> Experiencing the same issue on a Dell M4800 QHD+ (NVE6).

Argh, my apologies.
Please disregard comment #25 (filed under bug #76319).
Comment 27 D. Moens 2014-04-03 10:08:30 UTC
Created attachment 96836 [details]
dmesg | egrep -i "nouveau|drm"

Testing with GK106GLM [Quadro K2100M], QHD+ (3200x1800) screen.

MMIO FAULT is reproducible with kernel 3.14 & darktama's git  633e16bb8571071b9da8ed03513a2266cbf21eb5.
Comment 28 Paul Bredbury 2014-05-25 11:21:25 UTC
Seems fixed with kernel 3.15-rc6, and in bootloader:

rcutree.rcu_idle_gp_delay=1
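
To confirm that a boot parameter such as this one actually reached the kernel, /proc/cmdline can be parsed. A minimal sketch, assuming whitespace-separated tokens with no quoted values containing spaces; the function name is illustrative.

```python
def parse_cmdline(cmdline):
    """Parse a kernel command line into a dict.

    Tokens without '=' (such as 'ro') map to None. Only the first '='
    splits key from value, so 'nouveau.debug=PDISP=debug' keeps its value.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    print("rcu_idle_gp_delay =", params.get("rcutree.rcu_idle_gp_delay"))
```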
Comment 29 buhman 2014-08-21 22:20:12 UTC
*** Bug 80627 has been marked as a duplicate of this bug. ***
Comment 30 Karol Herbst 2014-08-29 09:20:28 UTC
I have the strange problem that the failure sometimes does not happen. I'm uploading a kernel log file for that case. The other 9 times out of 10, I get this HUB_INIT timeout instead.
Comment 31 Karol Herbst 2014-08-29 09:21:51 UTC
Created attachment 105421 [details]
kernel log file in case it doesn't HUB_INIT timeout
Comment 32 Karol Herbst 2014-09-27 17:04:52 UTC
Created attachment 106964 [details]
mmiotrace when module loads sucessfully
Comment 33 Karol Herbst 2014-09-27 17:11:10 UTC
Created attachment 106965 [details]
mmiotrace when failing

The diff between the two traces is quite interesting. Both were captured on the same machine, with the same kernel, the same nouveau.ko file and the same GPU.

Especially the part in the failing trace starting with this:

"[0] 0.000000, FB32 28c <= 1
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
[0] 0.000000, FB32 219010 => 0
... until timeout
"

because this part never happens in the "working" trace.
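
A polling loop like this can be located automatically by searching a trace for the longest run of identical consecutive reads. The sketch below assumes the line format of the excerpt quoted above and that "=>" marks a read and "<=" a write, which is how the excerpt appears to read; the names are illustrative.

```python
import re
from itertools import groupby

# Matches lines such as "[0] 0.000000, FB32 219010 => 0"
ACCESS_RE = re.compile(
    r"\]\s*[\d.]+,\s*\w+\s+([0-9a-f]+)\s*(<=|=>)\s*([0-9a-f]+)"
)


def longest_poll(lines):
    """Return ((register, direction, value), count) for the longest run
    of identical consecutive reads, or (None, 0) if there are no reads."""
    accesses = [m.groups() for m in map(ACCESS_RE.search, lines) if m]
    best = (None, 0)
    for key, run in groupby(accesses):
        count = len(list(run))
        if key[1] == "=>" and count > best[1]:
            best = (key, count)
    return best


if __name__ == "__main__":
    trace = ["[0] 0.000000, FB32 28c <= 1"] + ["[0] 0.000000, FB32 219010 => 0"] * 11
    print(longest_poll(trace))
```

Running the same function over the working and the failing trace and comparing the results is a cheap way to spot the register the driver is waiting on.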
Comment 34 bruno.pagani 2015-01-01 21:40:09 UTC
Per bug 87942, it seems I’m affected too (Dell XPS 9530, GT750M).

And rcutree.rcu_idle_gp_delay=1 doesn’t fix this for me. I’ve been running with it for a while because nvidia needs it, but it doesn’t change anything for nouveau.

Is Ben’s patch still working for everyone here (I have not tested it myself, since I’m not sure how to do so), and if so, is there a chance it will land somewhere in the future?

Or is there another solution to try?
Comment 35 nshp 2015-01-02 22:20:34 UTC
Bruno, the patch does not seem to help (at least for me).
I also have this issue with GK107. Not sure if there's anything useful in this kernel log, I just tried ignoring the HUB_INIT timeout to see if it spat out anything more interesting: http://ix.io/fzc
Comment 36 Vitaly Torshyn 2015-01-18 13:32:05 UTC
Created attachment 112418 [details]
Xorg log
Comment 37 Vitaly Torshyn 2015-01-18 13:32:31 UTC
Created attachment 112419 [details]
dmesg
Comment 38 Vitaly Torshyn 2015-01-18 13:32:58 UTC
Created attachment 112420 [details]
lspci -v
Comment 39 Vitaly Torshyn 2015-01-18 13:37:12 UTC
Hi folks,
Please see the previously attached dmesg and Xorg logs.
I used Ben's git repo to apply a patch to the 3.18.2 kernel.
Actually, the patch only removed the HUB_INIT error. In addition, X managed to start with AIGLX enabled, but nouveau crashed on starting any GL-enabled application.
Please note that the behavior is the same with SLI enabled.
The hardware is a Lenovo Y510P with dual GT 755M video.

Please, feel free to request additional information.
Comment 40 aidan 2015-01-18 19:07:14 UTC
(In reply to Vitaly Torshyn from comment #39)
> Hi folks,
> Please see previously attached dmesg and xorg logs.
> I was using Ben's git repo to apply a patch for 3.18.2 kernel.
> Actually, the patch is only removed INIT HUB error. In addition, X was
> managed to start with AIGLX enabled but nouveau crashed on start any GL
> enabled application.
> Please note, the same behavior with SLI enabled. 
> HW is Lenovo Y510P with dual GT 755M video.
> 
> Please, feel free to request additional information.

I have a similar laptop, and I think the X crashing is this (https://bugs.freedesktop.org/show_bug.cgi?id=88514) issue.
Comment 41 Karol Herbst 2015-01-19 09:54:03 UTC
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=74b51ee152b6d99e61ba329799a039453fb9438f this kernel patch fixed the PGRAPH timeout issue
Comment 42 Karol Herbst 2015-01-19 10:01:49 UTC
At least, this patch made it more likely to succeed for me.
Comment 43 Vitaly Torshyn 2015-01-19 15:58:59 UTC
(In reply to aidan from comment #40)
> I have a similar laptop, and I think the X crashing is this
> (https://bugs.freedesktop.org/show_bug.cgi?id=88514) issue.

I don't think so. X crashes even with the second nvidia GPU installed, in which case the Intel GPU is turned off by the notebook's firmware.
Comment 44 Vitaly Torshyn 2015-01-19 16:02:26 UTC
(In reply to Karol Herbst from comment #42)
> at least this patch made it more likely to succeed for me.

Have you tested it? Are you sure it doesn't break other ACPI-related stuff?
It's weird to fix ACPI stuff for our cards.
Also, please respond if you applied this fix together with Ben's fixes.
Comment 45 bruno.pagani 2015-01-19 16:14:34 UTC
(In reply to Vitaly Torshyn from comment #43)
> I don't think so. The X crashing even with second nvidia GPU installed. In
> case intel gpu is turned off by notebooks firmware.

I don’t understand your message Vitaly, but I think he is right, this is the same bug. I’m having the same symptoms he has.
Comment 46 Vitaly Torshyn 2015-01-19 16:21:57 UTC
(In reply to bruno.pagani from comment #45)
> I don’t understand your message Vitaly, but I think he is right, this is the
> same bug. I’m having the same symptoms he has.

Sorry, I wasn't clear in my earlier message. I have an SLI-enabled laptop, and the second GPU can be removed. Without the second GPU, the Intel GPU is turned on by the BIOS (firmware).
So, X crashed with and without the second GPU.
Comment 47 bruno.pagani 2015-01-19 16:48:04 UTC
(In reply to Vitaly Torshyn from comment #46)
> Sorry, I wasn't clear in my earlier message. I have an SLI enabled laptop
> and second GPU can be removed. Without second GPU intel's GPU is turned on
> by BIOS (firmware). 
> So, X crashed with/without second GPU.

OK, I think you’re the only one with an SLI laptop. Everyone else here seems to have an Intel+NVIDIA setup, without being able to deactivate Intel.
Comment 48 Karol Herbst 2015-01-20 19:31:57 UTC
(In reply to Vitaly Torshyn from comment #44)
> (In reply to Karol Herbst from comment #42)
> > at least this patch made it more likely to succeed for me.
> 
> Have you tested it? Are you sure it doesn't break other ACPI related stuff?
> It's weird to fix ACPI stuff for our cards.
> Also, please respond if you applied this fix with Ben's fixes.

No, I don't use Ben's fixes, because I wanted to try this patch out alone. For me, nouveau fails to load my card most of the time, but with this patch it seems more likely to succeed. It may be unrelated, but it seemed to help.

On a side note: when it succeeds, the kernel module loads in under a second, so it is far below the timeout.
Comment 49 Karol Herbst 2015-01-20 19:36:02 UTC
(In reply to Vitaly Torshyn from comment #44)
> (In reply to Karol Herbst from comment #42)
> > at least this patch made it more likely to succeed for me.
> 
> Have you tested it? Are you sure it doesn't break other ACPI related stuff?
> It's weird to fix ACPI stuff for our cards.
> Also, please respond if you applied this fix with Ben's fixes.

Also note that rcutree.rcu_idle_gp_delay=1 helped for some, and the kernel patch tries to make this parameter no longer needed. So it kinda makes sense, even if not for all.
Comment 50 Karol Herbst 2015-01-20 19:45:12 UTC
And no, Ben's patch doesn't help me at all.
Comment 51 Richard 2015-01-26 19:43:44 UTC
(In reply to bruno.pagani from comment #47)
> OK, I think you’re the only one with a SLI laptop. Every one else here seems
> to have an Intel+NVIDIA setup, without being able to deactivate Intel.
I wouldn't say he's the only one... I have the same setup myself (comment 23). I have noted that the nouveau module loads reliably with no patches since at least kernel 3.16.x (well after my original post), but it does go nuts whenever I do anything with opengl. I haven't attempted to capture logs when it does this - it shoots the CPU up to near 100% and becomes nearly unresponsive. I usually have to shut it down hard to recover (and I'm careful now to only load the nvidia driver when I intend to do opengl, and nouveau at all other times).

(Not related to this bug: I haven't been able to get it to fire up with SLI enabled (even with nvidia driver) outside of windows. Vitaly, was there anything special you had to do to get SLI working on your Y510P?)
Comment 52 Richard 2015-02-23 19:11:07 UTC
I think I should update my post... I should say that nouveau loads reliably, but I still get the HUB_INIT timeout error from the OP. Applying the patch does not fix it. The driver works for 2D applications, but 3D applications cause it to hang the GPU.
Comment 53 bruno.pagani 2015-03-21 22:02:47 UTC
Using 3.19, little has changed with the RCU patches, and still no luck.

[  173.852656] nouveau  [  DEVICE][0000:02:00.0] BOOT0  : 0x0e7240a2
[  173.852658] nouveau  [  DEVICE][0000:02:00.0] Chipset: GK107 (NVE7)
[  173.852659] nouveau  [  DEVICE][0000:02:00.0] Family : NVE0
[  173.867029] nouveau  [   VBIOS][0000:02:00.0] using image from ACPI
[  173.867119] nouveau  [   VBIOS][0000:02:00.0] BIT signature found
[  173.867121] nouveau  [   VBIOS][0000:02:00.0] version 80.07.b3.00.21
[  173.867339] nouveau  [ DEVINIT][0000:02:00.0] adaptor not initialised
[  173.867370] nouveau  [   VBIOS][0000:02:00.0] running init tables
[  174.004784] nouveau  [     PMC][0000:02:00.0] MSI interrupts enabled
[  174.004849] nouveau  [     PFB][0000:02:00.0] RAM type: GDDR5
[  174.004850] nouveau  [     PFB][0000:02:00.0] RAM size: 2048 MiB
[  174.004851] nouveau  [     PFB][0000:02:00.0]    ZCOMP: 0 tags
[  174.004927] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000002 FAULT at 0x4188ac [ IBUS ]
[  174.006964] nouveau  [    VOLT][0000:02:00.0] GPU voltage: 600000uv
[  174.056647] nouveau  [  PTHERM][0000:02:00.0] FAN control: none / external
[  174.056662] nouveau  [  PTHERM][0000:02:00.0] fan management: automatic
[  174.056687] nouveau  [  PTHERM][0000:02:00.0] internal sensor: yes
[  174.056753] nouveau  [     CLK][0000:02:00.0] 07: core 405 MHz memory 810 MHz 
[  174.056811] nouveau  [     CLK][0000:02:00.0] 0a: core 405-1058 MHz memory 1600 MHz 
[  174.056887] nouveau  [     CLK][0000:02:00.0] 0f: core 405-1058 MHz memory 5000 MHz 
[  174.057024] nouveau  [     CLK][0000:02:00.0] --: core 405 MHz memory 810 MHz 
[  174.091539] nouveau E[    PBUS][0000:02:00.0] MMIO read of 0x00000000 FAULT at 0x500c30 [ IBUS ]
[  174.091572] vga_switcheroo: enabled
[  174.091701] [TTM] Zone  kernel: Available graphics memory: 8169492 kiB
[  174.091702] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[  174.091703] [TTM] Initializing pool allocator
[  174.091706] [TTM] Initializing DMA pool allocator
[  174.091713] nouveau  [     DRM] VRAM: 2048 MiB
[  174.091714] nouveau  [     DRM] GART: 1048576 MiB
[  174.091716] nouveau E[     DRM] Pointer to TMDS table invalid
[  174.091717] nouveau  [     DRM] DCB version 4.0
[  174.091718] nouveau E[     DRM] Pointer to flat panel table invalid
[  174.098167] nouveau  [     DRM] MM: using COPY for buffer copies
[  174.098177] [drm] Initialized nouveau 1.2.1 20120801 for 0000:02:00.0 on minor 1
[  174.116049] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x418880 [ IBUS ]
[  174.116122] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x418e08 [ IBUS ]
[  174.116138] nouveau E[   PIBUS][0000:02:00.0] GPC0: 0x419f74 0x00000555 (0x3800820c)
[  174.116157] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000010 FAULT at 0x418980 [ IBUS ]
[  174.116175] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000002 FAULT at 0x4188ac [ IBUS ]
[  174.116186] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0xbadf1008 FAULT at 0x419cc0 [ IBUS ]
[  174.116202] nouveau E[   PIBUS][0000:02:00.0] GPC0: 0x419cc0 0xbadf1008 (0x3800820c)
[  174.116235] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0xbadf1000 FAULT at 0x419eb4 [ IBUS ]
[  174.116250] nouveau E[   PIBUS][0000:02:00.0] GPC0: 0x419eb4 0xbadf1000 (0x3800820c)
[  174.116267] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x405804 [ IBUS ]
[  174.116284] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x3f800000 FAULT at 0x405804 [ IBUS ]
[  174.116319] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x405804 [ IBUS ]
[  174.116345] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x3f800000 FAULT at 0x405804 [ IBUS ]
[  174.116359] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x405818 [ IBUS ]
[  174.116405] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x3f800000 FAULT at 0x405818 [ IBUS ]
[  174.116422] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000009 FAULT at 0x405820 [ IBUS ]
[  174.116606] nouveau E[   PIBUS][0000:02:00.0] HUB0: 0x404170 0x00000012 (0x0e008201)
[  176.118600] nouveau E[  PGRAPH][0000:02:00.0] HUB_INIT timed out
[  176.118607] nouveau E[  PGRAPH][0000:02:00.0] 409000 - done 0x00000204
[  176.118616] nouveau E[  PGRAPH][0000:02:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  176.118626] nouveau E[  PGRAPH][0000:02:00.0] 409000 - stat 0x00000000 0x00000000 0x00000002 0x00000009
[  176.118629] nouveau E[  PGRAPH][0000:02:00.0] 502000 - done 0x00000300
[  176.118634] nouveau E[  PGRAPH][0000:02:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  176.118638] nouveau E[  PGRAPH][0000:02:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[  176.118640] nouveau E[  PGRAPH][0000:02:00.0] init failed, -16
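
In case it helps when comparing logs like the one above across kernel versions, here is a minimal sketch that tallies the PBUS MMIO faults per register address. It is not part of nouveau or any official tooling; the function name `summarize_faults` and the regular expression are my own assumptions, based only on the line format visible in this dmesg output.

```python
import re
from collections import Counter

# Assumed pattern: matches the "MMIO <op> of 0x... FAULT at 0x..." lines
# exactly as they appear in the dmesg output in this comment.
FAULT_RE = re.compile(
    r"MMIO (?P<op>read|write) of 0x(?P<value>[0-9a-fA-F]+) "
    r"FAULT at 0x(?P<addr>[0-9a-fA-F]+)"
)


def summarize_faults(lines):
    """Count MMIO faults per (operation, register address)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            key = (match.group("op"), int(match.group("addr"), 16))
            counts[key] += 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[  174.004927] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000002 FAULT at 0x4188ac [ IBUS ]
[  174.091539] nouveau E[    PBUS][0000:02:00.0] MMIO read of 0x00000000 FAULT at 0x500c30 [ IBUS ]
[  174.116175] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000002 FAULT at 0x4188ac [ IBUS ]
[  174.116267] nouveau E[    PBUS][0000:02:00.0] MMIO write of 0x00000000 FAULT at 0x405804 [ IBUS ]
[  176.118600] nouveau E[  PGRAPH][0000:02:00.0] HUB_INIT timed out
""".splitlines()

if __name__ == "__main__":
    # Print one line per faulting register, sorted by address.
    for (op, addr), n in sorted(summarize_faults(SAMPLE).items(),
                                key=lambda item: item[0][1]):
        print(f"{op:5s} 0x{addr:06x}  x{n}")
```

To run it on a full log, pass the lines of a saved dmesg file to `summarize_faults` instead of `SAMPLE`. Comparing the resulting counts between a failing and a working boot might show which register accesses differ, though that is only a guess at what would be useful here.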