Bug 86537 - [NV96] Unable to handle page request from nv50_crtc_(prepare,disable) on dual Nvidia setup
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2014-11-21 21:39 UTC by Pierre Moreau
Modified: 2015-10-12 18:14 UTC

Attachments
dmesg (129.76 KB, text/plain)
2014-11-21 21:39 UTC, Pierre Moreau
journalctl when I modprobe nouveau (20.34 KB, text/plain)
2014-11-24 13:02 UTC, l3iggs
Kmesg with EVO debug messages (262.19 KB, text/plain)
2015-03-08 19:45 UTC, Pierre Moreau
Disable bits 5 and 6 of 0x8841c (818 bytes, patch)
2015-09-01 21:50 UTC, Pierre Moreau
Enable EXT_TAG if hardware allows it (986 bytes, patch)
2015-09-02 01:16 UTC, Pierre Moreau

Description Pierre Moreau 2014-11-21 21:39:46 UTC
Created attachment 109818
dmesg

Hardware:
* 9600M GT (NV96)
* 9400M (NVAC)

Tested on a kernel newer than 3.17-rc5.
Although it is not visible in the dmesg, I set "config=NvForcePost=1", which had no effect on the issue.

When the NV96 drives the display (this is selected before booting), nv50_crtc_prepare triggers a "BUG: unable to handle kernel paging request" after the fb has been allocated for the NV96. At that point the NVAC has not been initialised yet.
Comment 1 l3iggs 2014-11-21 22:27:49 UTC
I've also experienced this bug with my MacBook Pro 5,1 (same GPUs as parent). 

I've seen it with Linux 3.18-rc5 when I run 'modprobe nouveau': http://hastebin.com/qusumotusi.m
Comment 2 Emil Velikov 2014-11-22 04:35:52 UTC
l3iggs, in the future please attach logs (in text/plain format) here, as links to other sites tend to expire. Esp. hastebin ones :P
Comment 3 l3iggs 2014-11-24 13:02:59 UTC
Created attachment 109938
journalctl when I modprobe nouveau

I've attached the journalctl output from running 'modprobe nouveau' on my MacBook Pro 5,1, which has GeForce 9400M and GeForce 9600M GPUs.

This bug generally occurs during boot, which prevents the machine from booting.
Comment 4 l3iggs 2014-11-24 13:22:40 UTC
I've found a workaround for this bug (tested with 3.18-rc5):

(1) Append 'modprobe.blacklist=nouveau' to the kernel boot parameters
This allows the system to boot.
(2) Once booted, as root, reset the 9600M GPU: 'echo 1 > /sys/bus/pci/devices/0000:02:00.0/reset'
(3) 'echo 1 > /sys/bus/pci/devices/0000:02:00.0/rescan'
(4) 'modprobe nouveau'

Note that 0000:02:00.0 is the PCI bus address of my 9600M GT GPU, which I learned by inspecting the output of lspci.

The trick here is getting the GPU into the proper state before the nouveau module is loaded. I've found that the internal state of the GPU can frequently be inconsistent across reboots making troubleshooting difficult. Resetting and then rescanning it before loading nouveau seems to solve things for every scenario I've tested.

With the above workaround I can start a display manager (GDM for gnome 3.14.2) and everything seems to work as expected, ex. glxgears.

The 9400M GPU can optionally be disabled between steps (3) and (4): 'echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove'

where 0000:03:00.0 is the PCI address of my 9400M.
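
For convenience, steps (2) and (3) above could be collected into a small shell function, with step (4) run afterwards. This is only a sketch: the function name and the SYSFS_ROOT variable are made up here (the latter exists only so the function can be tried against a scratch directory), and the PCI address must be replaced by whatever lspci reports on the machine in question.

```shell
#!/bin/sh
# Sketch of the reset/rescan part of the workaround described above.
# SYSFS_ROOT and reset_and_rescan are illustrative names, not existing tools.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# reset_and_rescan PCI_ADDRESS
# Writes 1 to the device's "reset" node, then to its "rescan" node.
reset_and_rescan() {
    dev="$SYSFS_ROOT/bus/pci/devices/$1"
    if [ ! -e "$dev/reset" ]; then
        echo "no reset node for $1" >&2
        return 1
    fi
    echo 1 > "$dev/reset" || return 1
    echo 1 > "$dev/rescan" || return 1
}

# Intended use, as root, after booting with modprobe.blacklist=nouveau:
#   reset_and_rescan 0000:02:00.0 && modprobe nouveau
```

The function stops at the first failing write, so nouveau is only loaded if both the reset and the rescan were accepted.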
Comment 5 Pierre Moreau 2014-11-26 17:34:26 UTC
(In reply to l3iggs from comment #4)
> I've found a workaround for this bug (tested with 3.18-rc5):
> 
> (1) Append 'modprobe.blacklist=nouveau' to the kernel boot parameters
> This allows the system to boot.
> (2) Once booted, as root, reset the 9600M GPU: 'echo 1 >
> /sys/bus/pci/devices/0000:02:00.0/reset'
> (3) 'echo 1 > /sys/bus/pci/devices/0000:02:00.0/rescan'
> (4) 'modprobe nouveau'

It does indeed work, and it seems to solve some other issues related to that card: powering it down using vgaswitcheroo and suspending the laptop now work (resuming doesn't, though).

> 
> Note that 0000:02:00.0 is the PCI bus address of my 9600M GT GPU, which I
> learned by inspecting the output of lspci.
> 
> The trick here is getting the GPU into the proper state before the nouveau
> module is loaded. I've found that the internal state of the GPU can
> frequently be inconsistent across reboots making troubleshooting difficult.
> Resetting and then rescanning it before loading nouveau seems to solve
> things for every scenario I've tested.
> 
> With the above workaround I can start a display manager (GDM for gnome
> 3.14.2) and everything seems to work as expected, ex. glxgears.
> 
> The 9400M GPU can optionally be disabled between steps (3) and (4): 'echo 1
> > /sys/bus/pci/devices/0000:03:00.0/remove'
> 
> where 0000:03:00.0 is the PCI address of my 9400M.

vgaswitcheroo should be able to connect the integrated GPU to the display using the apple-gmux driver (see http://nouveau.freedesktop.org/wiki/Optimus/).
Comment 6 Pierre Moreau 2015-03-08 19:45:39 UTC
Created attachment 114138
Kmesg with EVO debug messages

Still an issue. I've been playing with PDISPLAY and PFB regs by following the blob, but never managed to make it work.
Comment 7 Pierre Moreau 2015-04-04 12:32:49 UTC

*** This bug has been marked as a duplicate of bug 82714 ***
Comment 8 Pierre Moreau 2015-07-05 22:37:02 UTC
Reopening this bug, as its core issue differs from that of the other bug.

Setting the 8 LSBs of register 0x8841c (PPCI.CONFIG) to 0 makes it possible to interact with the card normally, without any crashes. However, after powering off the G96 (NV96), switching to it from the MCP79 (NVAC) results in a blank screen: no error message is output and nothing hangs, the screen just stays blank. I'll push a hacky patch for those who want to use their G96 or power it down; alternatively, you may run `nvapoke -c0 0x8841c 28400` **before** loading Nouveau.

The PCI reset step also clears these bits.
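
To make the bit manipulation concrete, here is a small sketch in shell arithmetic. It only computes values and touches no hardware; the function names are made up, and the starting value 0x28460 used below is hypothetical (the register's actual initial contents are not given in this report). The mask 0xff covers the 8 LSBs, and 0x60 covers bits 5 and 6.

```shell
#!/bin/sh
# Pure arithmetic sketch; nothing here reads or writes the GPU.

# clear_low_byte VALUE: print VALUE with its 8 least significant bits cleared.
clear_low_byte() {
    printf '0x%x\n' $(( $1 & ~0xff ))
}

# clear_bits_5_6 VALUE: print VALUE with only bits 5 and 6 (mask 0x60) cleared.
clear_bits_5_6() {
    printf '0x%x\n' $(( $1 & ~0x60 ))
}

# Hypothetical starting value with bits 5 and 6 set:
#   clear_low_byte 0x28460
#   clear_bits_5_6 0x28460
```

For the hypothetical input 0x28460 both functions yield 0x28400, the value passed to nvapoke above. They differ as soon as other bits in the low byte are set.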
Comment 9 Pierre Moreau 2015-09-01 21:50:36 UTC
Created attachment 118038
Disable bits 5 and 6 of 0x8841c

The attached patch is extremely hacky, but it gets the laptop into a working state: Nouveau and X load properly, switching between cards works, and the inactive card can be powered down and up.
Comment 10 Pierre Moreau 2015-09-02 01:16:39 UTC
Created attachment 118042
Enable EXT_TAG if hardware allows it

Apparently, enabling EXT_TAG when the hardware supports it is enough to fix the bug. So perhaps bits 5 and 6 can only be enabled if EXT_TAG is also enabled?
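
As an illustration of "enable it if the hardware allows it", the sketch below classifies a device from two register values. It assumes that EXT_TAG refers to the PCI Express Extended Tag field, for which the PCIe specification defines a "supported" flag (bit 5 of Device Capabilities) and an "enable" flag (bit 8 of Device Control); the attached patch itself is not reproduced here, and the function name is made up.

```shell
#!/bin/sh
# Assumption: EXT_TAG is the PCI Express Extended Tag field.
#   Device Capabilities, bit 5 (0x20):  Extended Tag Field Supported
#   Device Control,      bit 8 (0x100): Extended Tag Field Enable

# ext_tag_state DEVCAP DEVCTL
# Prints "unsupported", "enabled" or "supported-but-disabled".
ext_tag_state() {
    if [ $(( $1 & 0x20 )) -eq 0 ]; then
        echo unsupported
    elif [ $(( $2 & 0x100 )) -ne 0 ]; then
        echo enabled
    else
        echo supported-but-disabled
    fi
}

# Example with made-up register values:
#   ext_tag_state 0x20 0x000
```

Only the "supported-but-disabled" case is one where setting the enable bit would change anything. On a real system the two values would have to be read from the device's PCI Express capability, for instance with setpci from pciutils.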