Bug 58556 - MacBook Pro 5,1 with nVidia 9400m and 9600m, scrambled screen
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) other
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2012-12-20 07:23 UTC by Joanand
Modified: 2014-09-24 06:53 UTC


Attachments
Kernel messages with nouveau verbose debug enabled (491.34 KB, text/plain)
2013-07-23 22:27 UTC, Matyas

Description Joanand 2012-12-20 07:23:51 UTC
System:
MacBook Pro 15", late 2008 (Core2 duo, 4GB RAM, 500GB HDD)
nVidia 9600m GT at PCI=02:00.0
nVidia 9400m at PCI=03:00.0

OS:
Gentoo Linux, kernel gentoo-sources-3.7.1
Booting with EFI-Stub (by copying kernel-image to first fat-partition/EFI/BOOT/BOOTX86.efi)

Problems:
- System "crashes" when nouveau is loaded without any parameters. It would be nice if one could set individual parameters for each adapter (as in "modprobe nouveau -dev-pci=03:00.0 -noaccel=1" and "modprobe nouveau -dev-pci=02:00.0").
- EFIFB is required to get nouveau to bind the panel to the graphics adapter(s). If EFIFB is not loaded, nouveau selects no graphics adapter and switches off the display on "echo OFF > ...".
- EFIFB handover seems to be broken: As soon as nouveau loads, the screen gets scrambled on the graphics adapter which is active. After switching to the other adapter (e.g. "echo DIGD > /sys..."), the screen works fine. Switching back does not solve the problem.


Kernel is compiled with EFIFB, otherwise nouveau is unable to bind panel to any of the two adapters.

Resolution for kernels < 3.4.9: I made use of gpupwr to disable the discrete adapter and then loaded nouveau (without any parameters). The system was "programmed" to use 9400m as it starts. This worked quite fine over long time.

Temporary resolution for 3.7.1: Use MacOSX to change the default adapter to the one which is "not" desired. Reboot, load nouveau with noaccel=1 (now screen gets scrambled), switch to the other device (in my case "echo DIGD > /sys/kernel/debug/vgaswitcheroo/switch"). Voila you have a readable screen.
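
The switch step can also be scripted. A minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root. The helper name switch_gpu and the list of accepted commands are illustrative and not part of any existing tool:

```python
from pathlib import Path

# Location used in the report above; needs root and a mounted debugfs.
SWITCH_PATH = Path("/sys/kernel/debug/vgaswitcheroo/switch")

# Subset of vga_switcheroo commands relevant to this report.
KNOWN_COMMANDS = {"ON", "OFF", "IGD", "DIS", "DIGD", "DDIS"}


def switch_gpu(command: str, path: Path = SWITCH_PATH) -> None:
    """Write a vga_switcheroo command such as "DIGD" to the switch file."""
    command = command.strip().upper()
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown vga_switcheroo command: {command!r}")
    # Same effect as: echo DIGD > /sys/kernel/debug/vgaswitcheroo/switch
    path.write_text(command + "\n")
```

Calling switch_gpu("DIGD") corresponds to the echo command in the workaround. The path argument exists only so the helper can be tried against an ordinary file first.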
Comment 1 John Flatness 2013-02-01 18:31:03 UTC
I have the exact same symptoms on kernel 3.7.4, and nouveau has been that way for many kernel versions. It's never worked with this system (9600M GT/9400M), whether in-tree or out.

My setup is slightly different, as I'm booting with grub2, not the EFI stub, but the behavior is exactly the same: a scrambled, non-updating screen after the handover from efifb. I've never tried using vgaswitcheroo by typing blindly, though.
Comment 2 Jamie Macdonald 2013-07-17 21:59:32 UTC
This is still an issue with Linux 3.9.9
on my Macbook Pro 17" Early 2009 (5,2)

According to the Wikipedia article: https://en.wikipedia.org/wiki/Macbook_pro , MBP 5,1 (15" late 2008/early 2009) and MBP 5,2 (17" early 2009) share the same graphics setup.

MBP 5,1 5,3 5,4 5,5 (13", 15", 17" mid 2009) look the same too.
Comment 3 Matyas 2013-07-23 21:56:27 UTC
I am also experiencing this issue.

I am booting the kernel EFI stub using rEFInd on my MacBook 5,1.
This model is fitted only with nvidia 9400M.

I use nouveau.noaccel=1 as a boot parameter.
This way I at least get a scrambled screen after the efifb handover.
Otherwise the screen would just freeze.

When I can see that the init script finished, I blindly log in and type pm-suspend.
Then I press the power button to wake from suspend.
Now, I am presented with a usable screen, but not accelerated.
I can also run X.

I am running Gentoo. Tried vanilla and git kernels in and out of tree without luck, from version 3.7.4 till 3.10.1

Clearly it is a handover problem. If there is any useful information I can provide let me know how.
Comment 4 Matyas 2013-07-23 22:27:33 UTC
Created attachment 82879
Kernel messages with nouveau verbose debug enabled

I remembered that I had nouveau debug on already, so here are the kernel messages. Kernel: vanilla 3.10.0 from Gentoo Packages
Comment 5 Emil Velikov 2013-07-23 22:52:07 UTC
Great find Matyas

If you're interested in cutting short the sequence you can forcepost the card, thus it should save you the suspend/resume trick. Use
nouveau.config=NvForcePost=1

Curious if the above will give you acceleration or only a working output/monitor
(ie. try it with and without noaccel=1)

Cheers
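
When comparing the combinations suggested above, it helps to confirm which nouveau options the running kernel was actually booted with. A minimal Python sketch, assuming the options were passed on the kernel command line in the nouveau.<name>=<value> form used in this thread; the function names are illustrative:

```python
def nouveau_options(cmdline: str) -> dict:
    """Return the nouveau.* options found on a kernel command line."""
    options = {}
    for token in cmdline.split():
        if not token.startswith("nouveau."):
            continue
        # Split only at the first "=", so "config=NvForcePost=1" keeps its value.
        key, _, value = token[len("nouveau."):].partition("=")
        options[key] = value
    return options


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel."""
    with open(path) as f:
        return f.read()
```

On a booted system, nouveau_options(read_cmdline()) shows whether noaccel and NvForcePost were both in effect for a given test run.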
Comment 6 Joanand 2013-07-24 11:58:49 UTC
(In reply to comment #5)
> Great find Matyas
> 
> If you're interested in cutting short the sequence you can forcepost the
> card, thus it should save you the suspend/resume trick. Use
> nouveau.config=NvForcePost=1
> 
> Curious if the above will give you acceleration or only a working
> output/monitor
> (ie. try it with and without noaccel=1)
> 
> Cheers

Hi,
On my MBP, if I do not use noaccel=1, then the whole system crashes.

I have tried NvForcePost=1 with noaccel, the result was a switched off screen, but system seems to boot.
With accel, screen switches off and crashes.

Is there any config which would activate accel for 9400 but no accel for 9600? This would be very helpful for me.

Thanks.
Comment 7 Emil Velikov 2013-07-24 13:17:29 UTC
Hi Joanand

(In reply to comment #0)
> Resolution for kernels < 3.4.9: I made use of gpupwr to disable the discrete
> adapter and then loaded nouveau (without any parameters). The system was
> "programmed" to use 9400m as it starts. This worked quite fine over long
> time.
> 
"Worked" with or without acceleration?

> Temporary resolution for 3.7.1: Use MacOSX to change the default adapter to
> the one which is "not" desired. Reboot, load nouveau with noaccel=1 (now
> screen gets scrambled), switch to the other device (in my case "echo DIGD >
> /sys/kernel/debug/vgaswitcheroo/switch"). Voila you have a readable screen.

Did you have a chance to narrow down what caused the change (<3.4.9 vs 3.7.1) ? It may be due to nouveau, vgaswitcheroo and/or another kernel driver


(In reply to comment #6)
> (In reply to comment #5)
> > Great find Matyas
> > 
> > If you're interested in cutting short the sequence you can forcepost the
> > card, thus it should save you the suspend/resume trick. Use
> > nouveau.config=NvForcePost=1
> > 
> > Curious if the above will give you acceleration or only a working
> > output/monitor
> > (ie. try it with and without noaccel=1)
> > 
> > Cheers
> 
> Hi,
> On my MBP, if I do not use noaccel=1, then the whole system crashes.
> 
> I have tried NvForcePost=1 with noaccel, the result was a switched off
> screen, but system seems to boot.
> With accel, screen switches off and crashes.
> 
> Is there any config which would activate accel for 9400 but no accel for
> 9600? This would be very helpful for me.
> 
> Thanks.

There has been a brief discussion about the best way to handle this (passing nouveau params to a specific card, disabling a certain card, etc.), although an implementation may be far off ;(

Meanwhile add a hack for your card by checking the PCI address and returning early rather than initialising the card - not sure which location is better, nouveau_drm_probe or nouveau_drm_load. Keep in mind to keep it symmetric (ie. handle the case in nouveau_drm_remove/nouveau_drm_unload)

Your code will look something similar to

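/* struct pci_dev packs slot and function into devfn; bus is a pointer */
if ((pdev->bus->number == xx) &&
    (PCI_SLOT(pdev->devfn) == xx) &&
    (PCI_FUNC(pdev->devfn) == xx)) {
      return 0; /* you can also try return -E* */
}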

Cheers
Emil
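
To fill in the xx placeholders, the address printed by lspci (for example 03:00.0, in hexadecimal) has to be split into bus, device and function. A minimal userspace Python sketch of that split, plus the devfn packing the kernel uses for slot and function. PciAddress, parse_pci_address and devfn are illustrative names, not kernel or library APIs:

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Accepts "03:00.0" or "0000:03:00.0"; all fields are hexadecimal.
_PCI_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_pci_address(text: str) -> PciAddress:
    match = _PCI_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = match.groups()
    return PciAddress(int(domain or "0", 16), int(bus, 16),
                      int(device, 16), int(function, 16))


def devfn(addr: PciAddress) -> int:
    """Pack device (slot) and function the way the kernel stores them."""
    return ((addr.device & 0x1F) << 3) | (addr.function & 0x07)
```

For the 9400m at 03:00.0 this gives bus 3, device 0, function 0, so the values to compare against in the driver hack are 3, 0 and 0.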
Comment 8 Emil Velikov 2013-07-24 13:22:52 UTC
Same/similar to bug 27501 ?
Either way, let's link both bugs
Comment 9 Joanand 2013-07-24 17:49:20 UTC
(In reply to comment #7)
> Hi Joanand
Hi Emil

> "Worked" with or without acceleration ?

With GPUPWR, the discrete adapter (nVidia 9600M GT) was no longer available. And as nouveau works with the 9400 and acceleration, it worked with acceleration.

> Did you have a chance to narrow down what caused the change (<3.4.9 vs
> 3.7.1) ? It may be due to nouveau, vgaswitcheroo and/or other kernel driver

3.4.9 did not have a viable vgaswitcheroo, but 3.7+ did. GPUPWR no longer helped, as vgaswitcheroo reactivated the 9600m. So the sole solution was to use the nouveau driver without acceleration.

> 
> There has been a brief discussion what is the best way to handle this
> (passing nouveau params to specific card, disabling certain card etc.)
> although implementation may be far off ;(
> 
> Meanwhile add a hack for your card by checking the PCI and returning early
> rather than initialising the card - not sure which location is better
> nouveau_drm_probe or nouveau_drm_load. Keep in mind to keep it symmetric
> (ie. handle the case in nouveau_drm_remove/nouveau_drm_unload)
> 
> Your code will look something similar to
> 
> if ((pdev->bus == xx) &&
>     (pdev->dev == xx) &&
>     (pdev->func == xx)) {
>       return 0; // you can also try return -E*
> }
> 
> Cheers
> Emil

Thanks for these pointers. I will try this hack on 3.10.2. I will report if I get any further.

BR
Joanand
Comment 10 Matyas 2013-07-24 22:51:47 UTC
Thanks for the pointers Emil.

I have tried the NvForcePost=1 configuration with noaccel=1, it only resulted in the backlight being bumped to 100%. My screen was still messed up. Doing the suspend/resume trick fixed it.

Without noaccel I get a system crash too.

It would be nice to get acceleration.

Joanand, am I understanding it correctly that you could get your 9400 with accel? If so was that prior to 3.7 kernels?
Comment 11 Joanand 2013-07-25 17:11:53 UTC
(In reply to comment #10)
> Joanand, am I understanding it correctly that you could get your 9400 with
> accel? If so was that prior to 3.7 kernels?
Kernel 3.4.9 and below has worked with gpupwr and nouveau WITH acceleration.
gpupwr program deactivated 9600m graphics adapter and nouveau was unable to load the driver for 9600 (PCI xxxx has fallen off the bus, was the message).
Comment 12 Pierre Moreau 2013-08-06 13:59:29 UTC
Hi Joanand,

(In reply to comment #6)
> Is there any config which would activate accel for 9400 but no accel for
> 9600? This would be very helpful for me.

You might also try, in nouveau_accel_init:

    if (device->chipset == 0x96)
        return;

It works without nouveau.noaccel=1 and has no scrambled screen (at least for the GUI and the console, except for a brief moment of full garbage (boot logo?) that gets cleared away afterwards), but it is unstable (it hung a few times at boot) and it spams a lot (more than 1600 lines) of

     nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT


Strangely, when connecting an external monitor to the laptop (MacBook Pro mid 2009, same cards), the GUI isn't scrambled any more and the console still is, but in a "better way".


I'll try to find why with an external monitor or with acceleration on the 9400, the handover goes well.
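
With more than 1600 such lines in dmesg, counting them per reason is easier than reading them. A minimal Python sketch, assuming every message follows the exact layout of the line quoted above (other nouveau messages may be formatted differently); the names parse_trap and count_reasons are illustrative:

```python
import re
from collections import Counter

# Layout taken from the single PFB message quoted above.
TRAP_RE = re.compile(
    r"nouveau E\[PFB\]\[(?P<device>[^\]]+)\] trapped (?P<access>\w+) "
    r"at 0x(?P<address>[0-9a-fA-F]+) on channel 0x(?P<channel>[0-9a-fA-F]+) "
    r".*reason: (?P<reason>\S+)"
)


def parse_trap(line: str):
    """Return the fields of one PFB trap message, or None if it does not match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["address"] = int(fields["address"], 16)
    fields["channel"] = int(fields["channel"], 16)
    return fields


def count_reasons(lines) -> Counter:
    """Count trap messages per reason string."""
    traps = (parse_trap(line) for line in lines)
    return Counter(t["reason"] for t in traps if t is not None)
```

Feeding it the output of dmesg, one line at a time, gives a per-reason total that is easier to attach to a bug report than the raw log.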
Comment 13 Ilia Mirkin 2013-08-21 04:07:01 UTC
There are two long bugs about this same issue, I'm giving the older one precedence.

*** This bug has been marked as a duplicate of bug 27501 ***
Comment 14 Pierre Moreau 2013-08-21 13:44:52 UTC
Hi Ilia,

It seems to me bug 27501 is about being unable to boot, which is not the "main" problem here, but rather having a garbage screen after a successful boot.

I'm bisecting it, and it seems it appeared between kernel 3.4 and 3.5-rc7. I'll post the full bisection here as soon as I can.
Comment 15 Pierre Moreau 2013-08-22 19:43:31 UTC
Bisected to:
commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Mon Apr 30 11:33:43 2012 -0500

    drm/nouveau: create real execution engine for software object class
    
    Just a cleanup more or less, and to remove the need for special handling of
    software objects.
    
    This removes a heap of documentation on dma/graph object formats.  The info
    is very out of date with our current understanding, and is far better
    documented in rnndb in envytools git.
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


I'll try to look for a patch this week-end.
Comment 16 Jamie Macdonald 2013-08-23 17:47:08 UTC
I patched my kernel with this diff: http://pastebin.com/q3MVep1f

to screen the 9400m from being initialized.
this: http://pastebin.com/JMrbbFVA is the dmesg output. (search "jamie")

with this patch, my command line is *not scrambled* when I use nouveau.noaccel=1, but crashes without that argument.

When I screen the 9600m from being initialized in the same way instead, the console gets scrambled with nouveau.noaccel=1 and crashes without that argument.

But, in the case of screening the 9400m, and using nouveau.noaccel=1, I cannot start X - it gives a "No screens found" error. Using that patched kernel, and trying to use the Nvidia proprietary driver fails as well with a "No screens found" error - but the Nvidia driver works when I use the unpatched kernel.

And now I'm stuck .. any help?
Comment 17 Joanand 2013-08-23 21:33:36 UTC
(In reply to comment #16)

> But, in the case of screening the 9400m, and using nouveau.noaccel=1, I
> cannot start X - it gives a "No screens found" error. Using that patched
> kernel, and trying to use the Nvidia proprietary driver fails as well with a
> "No screens found" error - but the Nvidia driver works when I use the
> unpatched kernel.
> 
> And now I'm stuck .. any help?

Yes this is "normal". X tries to bind to the PCI with the lowest ID, in our case 9600M has ID=2 and 9400M has ID=3. So you will have to use BusID:

Section "Device"
    Identifier  "NOUVEAU"
    Driver      "nouveau"
    BusID       "PCI:03:00:0"
    Screen      0
EndSection

Now if you are screening the 9400m, X should start without problem.

BR.
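
One detail to watch when writing the BusID line: lspci prints bus, device and function in hexadecimal, while xorg.conf, as far as I know, expects decimal values. For 03:00.0 both notations agree, but for higher bus numbers they differ. A minimal Python sketch of the conversion; xorg_bus_id is an illustrative name:

```python
def xorg_bus_id(lspci_address: str) -> str:
    """Convert an lspci-style address such as "03:00.0" to an xorg.conf BusID."""
    bus_dev, _, function = lspci_address.strip().rpartition(".")
    parts = bus_dev.split(":")
    if len(parts) < 2 or not function:
        raise ValueError(f"not a PCI address: {lspci_address!r}")
    # The last two fields are bus and device; an optional leading domain is ignored.
    bus, device = parts[-2], parts[-1]
    return f"PCI:{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"
```

For the 9400M in this thread, xorg_bus_id("03:00.0") yields PCI:3:0:0, which addresses the same device as the BusID shown above.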
Comment 18 Pierre Moreau 2013-09-13 14:35:20 UTC
It took me some time, but here is a patch correcting commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e:
---------------------------------------------------------------------------------
diff --git a/drivers/gpu/drm/nouveau/nouveau_software.h b/drivers/gpu/drm/nouveau/nouveau_software.h
index fe30a8f..7adfcb9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_software.h
+++ b/drivers/gpu/drm/nouveau/nouveau_software.h
@@ -20,10 +20,10 @@ struct nouveau_software_chan {
 static inline void
 nouveau_software_vblank(struct drm_device *dev, int crtc)
 {
-       struct nouveau_software_priv *psw = nv_engine(dev, NVOBJ_ENGINE_SW);
+       struct drm_nouveau_private *dev_priv = dev->dev_private;
        struct nouveau_software_chan *pch, *tmp;

-       list_for_each_entry_safe(pch, tmp, &psw->vblank, vblank.list) {
+       list_for_each_entry_safe(pch, tmp, &dev_priv->vbl_waiting, vblank.list) {
                if (pch->vblank.head != crtc)
                        continue;

---------------------------------------------------------------------------------
(The above empty line is needed)

However, the code was later modified, and the patch can't be applied to recent kernels; I'll try to get a new patch for it this week-end.
Comment 19 chr[] 2013-10-03 22:08:55 UTC
Hi!

Regarding suspend/resume scrambling screen issues, try this patch:


[PATCH] drm/nouveau/fb: fix suspend/resume fbcon

http://lists.freedesktop.org/archives/nouveau/2013-October/014656.html

chr[]
Comment 20 Pierre Moreau 2013-11-18 20:36:02 UTC
Here is some news about my small progress.

I found out why commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e introduced a bug: it initialises the psw->vblank field only if acceleration is enabled, even though the field is used in both cases; calling nv50_software_create even without acceleration solves the problem.
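
The pattern can be illustrated with a toy model. This is a Python sketch of the failure mode only, not of the driver: the class and function names are made up, and where the C code would walk an uninitialised list, the model raises an error instead:

```python
class SoftwareEngine:
    """Stand-in for the software engine that owns the vblank waiter list."""

    def __init__(self):
        self.vblank = []  # (channel, crtc) pairs waiting for a vblank


class Device:
    def __init__(self, accel, always_create_sw=False):
        # The engine, and with it the vblank list, is only set up on the
        # acceleration path unless always_create_sw is set.
        self.sw = SoftwareEngine() if (accel or always_create_sw) else None


def vblank_waiters(dev, crtc):
    """Vblank handler: it runs with or without acceleration."""
    if dev.sw is None:
        raise RuntimeError("vblank list was never initialised")
    return [chan for chan, head in dev.sw.vblank if head == crtc]
```

With Device(accel=False) the handler fails, while Device(accel=False, always_create_sw=True) behaves like the accelerated case, which mirrors the fix of always creating the software engine.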

I reverted a few more commits, however I'm stuck on

  commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69 
  Author: Ben Skeggs <bskeggs@redhat.com>
  Date:   Thu, 19 Jul 2012 22:17:34 +0000

It seems like the init issue was fixed, but the screen stays scrambled. After some testing, it seems some of the work done by nouveau_channel_new (which is only called when acceleration is enabled) is needed, and therefore also nv84_fence_create, but I couldn't find which parts. When enabling nv84_fence_create and nouveau_channel_new for the NVAC card (boot hangs if enabling it for the NV96 card), nv50_disp_intr spams lots of

  nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT

but it seems harmless, apart from producing a really big dmesg and, sometimes, hanging on boot.


Booting with 'nouveau.accel=0 nouveau.modeset=0 3' results in a clean console mode, and running startx manually after boot will also give a clean GUI.
Comment 21 Pierre Moreau 2013-12-09 11:04:50 UTC
The necessary parts of nouveau_channel_new to get a clean screen are:
- nouveau_channel_ind
- nouveau_channel_init, but only the beginning:
  - vram creation
  - gart creation
  - dma variables initialisation

There are some MEM_CACHE errors, but at least it boots and the screen is clean.

I found out that nouveau_abi16_ioctl_channel_alloc was also calling nouveau_channel_new, but with other arguments for vram and gart, and it is called whether or not acceleration is enabled.
Is there a specific reason to call at first nouveau_channel_new only when acceleration is enabled, and later on when starting the GUI, to always call it?
Comment 22 Pierre Moreau 2014-03-05 13:31:49 UTC
Using commit e18833a518777e249b6badf54f65b37b741b6864 (http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=e18833a518777e249b6badf54f65b37b741b6864) fixes the issue (tested on Git HEAD and on 3.13.5).
Thanks for the pointer Ilia!
Comment 23 Joanand 2014-03-07 08:53:04 UTC
(In reply to comment #22)
> Using commit e18833a518777e249b6badf54f65b37b741b6864
> (http://cgit.freedesktop.org/~darktama/nouveau/commit/
> ?id=e18833a518777e249b6badf54f65b37b741b6864) fixes the issue (tested on Git
> HEAD and on 3.13.5).
> Thanks for the pointer Ilia!

Hi Pierre,
I have tried the diff patch (3 changed lines) on  Kernel 3.13.5-gentoo, and did a quick test on nVidia 9600m GT: The screen gets scrambled as soon as nouveau is loaded. At the startup, EFIFB is used and works with full screen resolution, but the colors are "incorrect". EFI is setup to use 9600m as default.

I am still using nouveau without acceleration. I will be testing the patched Kernel on nVidia 9400m, by modifying EFI to use 9400m as default.

Could you do a diff on your patched kernel and unpatched, so that we could find other differences?

Thanks.

BR
Comment 24 Pierre Moreau 2014-03-07 23:21:06 UTC
Hi Joanand,

My HEAD before applying the patch was commit 34d5950 (http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=drm-nouveau-next&id=34d595081812da62b5357579267c4ab5eae64ac1).
The HEAD after the patch is just: 34d5950 + e18833a, nothing more. So I'm running with both cards enabled.

I tried Ilia's patch (https://bugs.freedesktop.org/show_bug.cgi?id=27501#c27) to test each card alone, and I got:

*   NVAC only: corrupted screen (without e18833a) + acceleration not working;
*   NV96 only: no corrupted screen (even without e18833a) + acceleration working.

I wonder why we do have quite different results for the NV96...

Cheers,

Pierre
Comment 25 Joanand 2014-03-08 21:31:36 UTC
(In reply to comment #24)
Hi Pierre,
I have applied/changed these lines on drivers/gpu/drm/nouveau/core/subdev/bar/nv50.c:
- line 233: int ret, i;
- line 351: for(i = 0; i < 8; i++)
- line 352: nv_wr32(priv, 0x001900 + (i * 4), 0x00000000);

Booting with both adapters enabled (9600 is the boot adapter), EFI-stub and EFIFB, loading nouveau without acceleration leads to a scrambled screen. The resolution is native 1440x900.

Booting with both adapters enabled (9600 is the boot adapter) EFI-stub and EFIFB, loading nouveau with acceleration, screen/adapter is frozen. System seems to "hang"/lag.

I am now trying the setup without EFIFB. I will report back once I have tested it.

BR.

PS: Do you have a MacBook Pro 5,1 15.4"?
Comment 26 Pierre Moreau 2014-03-08 23:39:11 UTC
Hi Joanand,

Shouldn't the patch for drivers/gpu/drm/nouveau/core/subdev/bar/nv50.c rather be
+ line 233: int ret, i;
+ line 351: for(i = 0; i < 8; i++)
+ line 352: nv_wr32(priv, 0x001900 + (i * 4), 0x00000000);
By the way, on top of which commit/kernel are you applying the patch?

Yeah, if you try with acceleration you end up with bug 27501 (https://bugs.freedesktop.org/show_bug.cgi?id=27501).

EFIFB should not be the problem: it is removed by nouveau at some point and replaced by nouveaufb, which seems to be configured incorrectly (or the accesses to it are), causing the screen corruption.

I have a 5,3 (iirc) MacBook Pro (mid 2009) 15.6", with the same graphic cards, resolution is also 1440x900.

Pierre
Comment 27 krestfallen 2014-03-09 10:36:30 UTC
(In reply to comment #25)

> Booting with both adapters enabled (9600 is the boot adapter), EFI-stub and
> EFIFB, loading nouveau without acceleration leads to scrambled screen. The
> resolution is native 1440x900.
> 
> Booting with both adapters enabled (9600 is the boot adapter) EFI-stub and
> EFIFB, loading nouveau with acceleration, screen/adapter is frozen. System
> seems to "hang"/lag.

doesn't efifb want to pick the 9400M?

efifb picks the framebuffer base at 0xC0010000
from efifb.c:
  [M_MBP_5_1] = { "mbp5,1", 0xc0010000, 2048 * 4, 1440, 900 }

these are the values for each gpu:
9400M:    0xC0010000
9600M GT: 0xB0030000

i tried nouveau yesterday and couldn't even load the kernel. with the nvidia driver it's even possible to switch to the desired gpu on boot. but with the 9400m screening and logging out afterwards and back in (no reboot) the screen also gets scrambled, slow and unstable.

perhaps it's something with the gmux values or memory allocation!?
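
The numbers above can be checked with a little arithmetic. A minimal Python sketch using only the values quoted in this comment (the efifb.c entry for "mbp5,1" and the two per-GPU base addresses); the helper names are illustrative:

```python
# Values quoted in the comment above.
EFIFB_BASE = 0xC0010000
STRIDE = 2048 * 4          # bytes per scanline in the efifb entry
WIDTH, HEIGHT = 1440, 900

# Framebuffer base reported for each GPU in the comment above.
GPU_FB_BASE = {"9400M": 0xC0010000, "9600M GT": 0xB0030000}


def gpu_for_base(base: int):
    """Return the GPU whose framebuffer base matches, or None."""
    for name, gpu_base in GPU_FB_BASE.items():
        if gpu_base == base:
            return name
    return None


def framebuffer_bytes(stride: int, height: int) -> int:
    """Memory covered by the visible framebuffer: one stride per scanline."""
    return stride * height
```

Under these values the hard-coded efifb base matches the 9400M only, so with the 9600M GT as the boot adapter efifb would be pointing at the other GPU's memory. The visible framebuffer spans 8192 * 900 = 7372800 bytes.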
Comment 28 Pierre Moreau 2014-05-09 20:55:04 UTC
I tested the drm-fixes branch from airlied's repo (http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes) and it works, I can boot with just nouveau.noaccel=1 without having any garbage screen. Grabbing any 3.15-rc* should also work. Without any parameters seems to be working sometimes, until I launch X, but this is another bug obviously.
Comment 29 Pierre Moreau 2014-08-17 10:14:04 UTC
A fix went into kernel 3.15.

If you're still experiencing this issue with kernel 3.15+, please reopen the bug report.
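
A quick way to check whether a machine runs a kernel that should contain the fix is to compare the release string against 3.15. A minimal Python sketch; it only looks at the major and minor numbers and cannot detect distribution kernels that backported the fix, and the function names are illustrative:

```python
import platform
import re


def kernel_version(release: str):
    """Extract (major, minor) from a release string like "3.16.3-gentoo"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_handover_fix(release=None) -> bool:
    """True if the kernel release is 3.15 or newer."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    return kernel_version(release) >= (3, 15)
```

Called without an argument, has_handover_fix() checks the running kernel.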
Comment 30 krestfallen 2014-08-17 10:29:43 UTC
is noaccel=1 still needed?
Comment 31 Pierre Moreau 2014-08-17 10:50:38 UTC
As long as bug 27501 isn't fixed, noaccel=1 is still needed unfortunately.
Comment 32 Joanand 2014-09-24 06:53:01 UTC
Pierre's patch from bug 27501, comment 29 (https://bugs.freedesktop.org/show_bug.cgi?id=27501#c29) works on my system.

Patch is tested on gentoo-sources 3.16.3.

Next step is to check if Ilia's patch works to deactivate the 9600M on boot up. For the moment I am using gpupwr to shut down the 9600M.

BR

